<a href="https://colab.research.google.com/github/aaron-abrams-uva/DS1002-S24/blob/main/05_python_data_structures_student.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Metadata

```
Course:   DS1002
Module:   04 Basics
Topic:    Data Structures
```

# Data Structures in Python

-----------------------
### PREREQUISITES
- variables
- data types

### REFERENCE
- Python documentation on data structures
https://docs.python.org/3/tutorial/datastructures.html


- Python data structures
https://learning.oreilly.com/library/view/python-in-a/9781098113544/ch03.html#data_types


### OBJECTIVES
- Present the essential Python data structures and some of their basic functionality

NOTE:
1. See the references above for more details. As the course progresses, our use of data structures will become more complex.
2. We jump ahead a bit, showing functionality involving `for` loops, which are covered in more detail later.


### CONCEPTS
- zero-based indexing
- list
- tuple
- set
- dictionary (dict)
- enumerate
- range

# Data Structures

In contrast to primitive data types, data structures organize values into collections that have certain properties, such as **order**, **mutability**, and **addressing scheme**, e.g. by index.

# Lists

A list is an ordered sequence of items.

Each element of a list is associated with an integer that represents the order in which the element appears.

Lists are indexed with **brackets** `[]`.

List elements are accessed by providing their order number in the brackets.

Lists are **mutable**, meaning you can modify them after they have been created.

They can contain mixed types.
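
As a minimal sketch of what mutability and mixed types look like in practice (the name `items` is made up for this illustration and is not used elsewhere in the notebook):

```python
# `items` is a hypothetical list used only for this illustration
items = [3, "seven", 8.0]   # mixed types: int, str, float

items[0] = 30               # replace the first element in place
items.append(True)          # add a new element to the end

print(items)                # [30, 'seven', 8.0, True]
```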

## Constructing

They can be **constructed** in several ways:

In [None]:
list1 = [3, 7, 8]
list2 = list((4, 6, "hello"))
list3 = "some string".split()
numbers = [1,2,3,4]

# play with print: in a notebook, only the last bare expression in a cell is displayed automatically
list1
print(list1, list2, list3)
numbers


> Using any of the above methods, make a list containing your numerical birth month, the first letter of your name, and a boolean answering the statement "I like coffee."

## Indexing

**Zero-based indexing**  

Python uses zero-based indexing, which means that for a collection `mylist`:

`mylist[0]` references the first element  
`mylist[1]` references the second element, etc.

For any sequence (such as a list) of length *N* and any *n* between 1 and *N*:  
`mylist[:n]` returns the first *n* elements, from index *0* to *n-1*  
`mylist[-n:]` returns the last *n* elements, from index *N-n* to *N-1*

In [None]:
numbers = [1,2,3,4,10,17,30]

In [None]:
numbers[0] # Access first element (output: 1)

In [None]:
numbers[0] + numbers[3] # doing arithmetic with the values (output: 5)

In [None]:
len(numbers)


In [None]:
numbers[-1:]

In [None]:
numbers[:2] # returns the first two elements, index 0 through index 1 (output: [1, 2])

In [None]:
numbers[-2:] # returns the last two elements, index 5 (7 minus 2) through index 6 (7 minus 1) (output: [17, 30])

You can find the index of an element by using `.index()`:

In [None]:
numbers.index(3)
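
Two details of `.index()` are worth a quick sketch: it reports only the first match, and it raises an error when the value is absent. The list `values` is hypothetical, and the `try`/`except` construct is something we have not covered yet; it is used here only so the cell keeps running.

```python
# `values` is a hypothetical list with a repeated element
values = [5, 9, 5, 2]

first_pos = values.index(5)   # position of the first 5 only
print(first_pos)              # 0

# Asking for a value that is not in the list raises a ValueError
try:
    values.index(100)
except ValueError as err:
    print("not found:", err)
```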

> Using the list you created above, use indexing to pull out your numerical birth month.

## Slicing

`a[start:stop]` items from start through stop-1

`a[start:]`     items from start through the end of the list

`a[:stop]`      items from the beginning through stop-1

`a[:]`          a copy of the whole list


In [None]:
numbers[0:2] # Output: [1, 2]

In [None]:
numbers[1:3] # Output: [2, 3]

In [None]:
numbers[2:]  # Output: [3, 4, 10, 17, 30]
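
The `a[:]` form is easy to overlook, so here is a small sketch showing that the copy is independent of the original. The names below are hypothetical. The last two lines go slightly beyond the forms listed above by using an optional third slice value, the step.

```python
# Hypothetical example list, separate from `numbers`
letters_list = ['a', 'b', 'c', 'd', 'e']

copy_of_list = letters_list[:]    # a new list with the same elements
copy_of_list[0] = 'z'             # changing the copy...

print(letters_list)               # ['a', 'b', 'c', 'd', 'e'] (original unchanged)
print(copy_of_list)               # ['z', 'b', 'c', 'd', 'e']

every_other = letters_list[::2]   # optional third value is the step
print(every_other)                # ['a', 'c', 'e']
```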

> Slice your list using a method from above. For more see https://www.learnbyexample.org/python-list-slicing/

## Multiply lists by a scalar

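A scalar is a single numeric value. Multiplying a list by an integer repeats the list that many times; it does not multiply each element.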

In [None]:
numbers * 2
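
To see the difference between repeating a list and multiplying its elements, compare the two results in this sketch. The list `small` is hypothetical, and the second line uses a list comprehension, a compact kind of loop that is not covered in this notebook.

```python
# `small` is a hypothetical list used only for this comparison
small = [1, 2, 3]

repeated = small * 2                        # repeats the whole list
doubled = [value * 2 for value in small]    # multiplies each element

print(repeated)   # [1, 2, 3, 1, 2, 3]
print(doubled)    # [2, 4, 6]
```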

> Multiply your list by 4.

## Concatenate lists with `+`

In [None]:
numbers2 = [30, 40, 50]

In [None]:
numbers + numbers2 # concatenate two lists

> Add `numbers` to your list    

## Lists can mix types

In [None]:
myList = ['coconuts', 777, 7.25, 'Sir Robin', 80.0, True]

In [None]:
myList

**What happens if we multiply a list with strings?**

In [None]:
myList * 2

In [None]:
myList * myList # raises TypeError: a list can only be multiplied by an integer

## Lists can be nested

In [None]:
names = ['Darrell', 'Clayton', ['Billie', 'Arthur'], 'Samantha']
print(names[2]) # returns a *list*
print(names[0]) # returns a *string*
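
Because `names[2]` is itself a list, a second pair of brackets reaches inside it. A minimal sketch, using a hypothetical copy of the list called `team`:

```python
# `team` is a hypothetical copy of the nested list above
team = ['Darrell', 'Clayton', ['Billie', 'Arthur'], 'Samantha']

inner = team[2]       # the nested list ['Billie', 'Arthur']
print(inner[0])       # Billie
print(team[2][1])     # Arthur (index the outer list, then the inner one)
```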

# Dictionaries `dict`

A dictionary is like a hash table: it contains key-value pairs.

Elements are indexed using brackets `[]` (like lists).

But they are constructed using braces `{}` or `dict()`.

Key names must be unique. If you re-use a key, you overwrite its value.

Keys don't have to be strings -- they can be numbers or tuples or expressions that evaluate to one of these. Mutable objects such as lists cannot be used as keys.
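
A short sketch of the "keys must be unique" rule, using a hypothetical dictionary named `ages`:

```python
# `ages` is a hypothetical dictionary used only for this illustration
ages = {'Ann': 30, 'Ann': 31}   # same key twice: the last value wins
print(ages)                      # {'Ann': 31}

ages['Ann'] = 32                 # assigning to an existing key overwrites its value
ages['Raj'] = 28                 # assigning to a new key adds a key-value pair
print(ages)                      # {'Ann': 32, 'Raj': 28}
```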

### Constructing

In [None]:
dictionary1 = {
    'a': 1,
    'b': 2,
    'c': 3
}
dictionary1

In [None]:
dictionary2 = dict(x=55, y=29, z=99) # Note the absence of quotes around keys

In [None]:
dictionary2

In [None]:
dictionary3 = {'A': 'foo', 99: 'bar', (1,2): 'baz'}

In [None]:
dictionary3

### Retrieve a value

Write the key inside the brackets, as you would an *index*.

In [None]:
phonelist = {'Tom':123, 'Bob':456, 'Sam':897}

In [None]:
phonelist['Bob']
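
Looking up a key that is not in the dictionary raises a `KeyError`. The sketch below shows that, along with the `.get()` method, which returns `None` (or a default you supply) instead. The dictionary `contacts` is hypothetical, and `try`/`except` is used only so the cell keeps running.

```python
# `contacts` is a hypothetical dictionary used only for this illustration
contacts = {'Tom': 123, 'Bob': 456}

print(contacts.get('Bob'))       # 456
print(contacts.get('Zed'))       # None (no error)
print(contacts.get('Zed', 0))    # 0, a default value we chose

# Bracket lookup of a missing key raises a KeyError
try:
    contacts['Zed']
except KeyError as err:
    print("missing key:", err)
```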

### Print list of keys, values, or both

Use the `.keys()`, `.values()`, or `.items()` methods.

Keys are not sorted. Since Python 3.7, they are kept (and printed) in the order entered.

`.keys()`

In [None]:
phonelist.keys() # Returns a dict_keys view; wrap in list() to get a list

`.values()`

In [None]:
phonelist.values() # Returns a dict_values view; wrap in list() to get a list

`.items()`

In [None]:
phonelist.items() # Returns a dict_items view of (key, value) tuples
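
In Python 3, `.keys()`, `.values()`, and `.items()` return view objects that track the dictionary, and wrapping one in `list()` gives an ordinary list. A minimal sketch, with a hypothetical dictionary named `extensions`:

```python
# `extensions` is a hypothetical dictionary used only for this illustration
extensions = {'Tom': 123, 'Bob': 456}

keys_view = extensions.keys()     # a view, not a list
extensions['Sam'] = 897           # change the dictionary afterwards
print(list(keys_view))            # ['Tom', 'Bob', 'Sam'] (the view sees the new key)

key_list = list(extensions.keys())    # an ordinary list supports indexing
print(key_list[0])                    # Tom

pairs = list(extensions.items())      # list of (key, value) tuples
print(pairs)                          # [('Tom', 123), ('Bob', 456), ('Sam', 897)]
```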

In [None]:
phonelist

Example of printing the keys in sorted order using a **for loop**:

In [None]:
for key in sorted(phonelist.keys()):
    print(key)

In [None]:
spl = sorted(phonelist.items()) # list of (key, value) tuples, sorted by key
spl

# Tuples

A tuple is like a list but with one big difference: **a tuple is an immutable object!**

You can't change a tuple once it's created.

A tuple can contain any number of elements of any datatype.

Accessed with brackets `[]` and constructed with or without parentheses `()`.

In [None]:
# for comparison: numbers is still a list here, so assignment works
numbers[3] = 30
numbers

In [None]:
# tuples are immutable, but their elements can still be read by index
numbers4 = (26, 27, 28)
numbers4[2]

## Constructing

Created with comma-separated values, with or without parentheses.


In [None]:
letters = 'a', 'b', 'c', 'd'

In [None]:
letters

In [None]:
numbers = (1,2,3,4) # numbers 1,2,3,4 stored in a tuple

A single-valued tuple must include a trailing comma `,`, e.g.

In [None]:
# missing the comma
tuple0 = (29)

In [None]:
tuple0, type(tuple0)

In [None]:
# comma included
tuple1 = (29,)

In [None]:
tuple1, type(tuple1)

In [None]:
# attempting to reassign a tuple value (numbers is now a tuple)
numbers[0] = 5 # Trying to assign a new value 5 to the first position raises a TypeError
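
The sketch below puts reading, a failed assignment, and a workaround side by side. The tuple `point` is hypothetical, and `try`/`except` is used only so the cell keeps running after the error.

```python
# `point` is a hypothetical tuple used only for this illustration
point = (26, 27, 28)
print(point[2])            # 28 (reading by index is allowed)

# Assigning to an element raises a TypeError
try:
    point[2] = 99
except TypeError as err:
    print("cannot modify a tuple:", err)

# To "change" a tuple, build a new one from pieces of the old one
new_point = point[:2] + (99,)
print(new_point)           # (26, 27, 99)
```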

> List one important attribute of list, dictionary, and tuple

# Functions and operators common to all sequences

```
len()
in
+
*
```

In [None]:
[1, 3] * 8

In [None]:
(1, 3) * 8

## Membership with `in`

Returns a boolean.

In [None]:
'Sam' in phonelist
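
Note that for a dictionary, `in` checks the keys, not the values. A minimal sketch with a hypothetical dictionary named `directory`:

```python
# `directory` is a hypothetical dictionary used only for this illustration
directory = {'Tom': 123, 'Sam': 897}

print('Sam' in directory)            # True: 'Sam' is a key
print(897 in directory)              # False: 897 is a value, not a key
print(897 in directory.values())     # True: search the values explicitly

# `in` works the same way on lists and tuples
print(3 in [1, 3])                   # True
print('b' in ('a', 'c'))             # False
```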

# Sets

A `set` is an unordered collection of unique objects.

They support set operations such as union and intersection.

In [None]:
peanuts = {'snoopy','snoopy','woodstock'}

In [None]:
peanuts

Note the set is deduped, i.e., duplicates are automatically removed.
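
One common use of this property is counting the distinct values in a list by converting it with `set()`. A sketch with a hypothetical list named `votes`:

```python
# `votes` is a hypothetical list containing repeated entries
votes = ['R', 'python', 'R', 'SQL', 'python']

unique_votes = set(votes)        # duplicates are dropped
print(len(votes))                # 5
print(len(unique_votes))         # 3

# Sets have no order, so sort them when a stable display is needed
print(sorted(unique_votes))      # ['R', 'SQL', 'python']
```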

Since sets are unordered, they don't have an index. This will break:

In [None]:
peanuts[0] # raises TypeError: 'set' object is not subscriptable

In [None]:
for peanut in peanuts:
    print(peanut)

**Check if a value is in the set using `in`**

In [None]:
'snoopy' in peanuts

Combine two sets

In [None]:
set1 = {'python','R'}
set2 = {'R','SQL'}

This fails, because sets do not support `+`:

In [None]:
set1 + set2 # raises TypeError

This succeeds:

In [None]:
set1.union(set2)

Get the set intersection

In [None]:
set1.intersection(set2)
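
Sets also accept operator shorthand for these methods: `|` for union, `&` for intersection, and `-` for difference. The sketch uses hypothetical sets `langs_a` and `langs_b` so that `set1` and `set2` are left untouched.

```python
# Hypothetical sets mirroring set1 and set2 above
langs_a = {'python', 'R'}
langs_b = {'R', 'SQL'}

either = langs_a | langs_b     # union: in one set or the other
both = langs_a & langs_b       # intersection: in both sets
only_a = langs_a - langs_b     # difference: in langs_a but not langs_b

print(either)                  # all three languages (display order may vary)
print(both)                    # {'R'}
print(only_a)                  # {'python'}
```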

# Ranges

A range is a sequence of integers, from `start` to `stop` by `step`.
- The `start` point is zero by default.  
- The `step` is one by default.  
- The `stop` point is NOT included.  

Ranges can be assigned to a variable.

In [None]:
rng = range(5)
rng
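
Displaying a range shows only its description, not its values. Wrapping it in `list()` makes the values visible, which is a handy way to check the `start`, `stop`, and `step` rules above. The variable names in this sketch are hypothetical.

```python
first_five = list(range(5))           # start defaults to 0, stop is excluded
two_to_five = list(range(2, 6))       # explicit start, default step of 1
odds = list(range(1, 11, 2))          # step of 2
countdown = list(range(10, 0, -3))    # a negative step counts down

print(first_five)    # [0, 1, 2, 3, 4]
print(two_to_five)   # [2, 3, 4, 5]
print(odds)          # [1, 3, 5, 7, 9]
print(countdown)     # [10, 7, 4, 1]
```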

More often, ranges are used for iteration, which we will cover later.

In [None]:
for rn in rng:
    print(rn)

Another range, from 1 up to (but not including) 11 in steps of 2:

In [None]:
rangy = range(1, 11, 2)
for rn in rangy:
    print(rn)
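
The CONCEPTS list at the top also names `enumerate`. As a brief sketch, `enumerate` pairs each element of a sequence with its zero-based position, which is often more convenient than looping over a range of indexes. The list `pets` is hypothetical.

```python
# `pets` is a hypothetical list used only for this illustration
pets = ['cat', 'dog', 'fish']

# Each pass through the loop yields a (position, element) pair
for position, pet in enumerate(pets):
    print(position, pet)

numbered = list(enumerate(pets))
print(numbered)    # [(0, 'cat'), (1, 'dog'), (2, 'fish')]
```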