# Python Data Structures

# Lists

Python lists are flexible containers that hold other objects in an ordered arrangement.  **A list is a general-purpose, ordered data structure that allows you to change the data it holds (it is mutable).**

In [None]:
my_bag_has = ['lunch', 1.50, True, ["HitchHiker's Guide", 'Fluent Python', 'Tensorflow'], 42]
print(my_bag_has[3])
my_bag_has[3][0] += " to Python"
print(my_bag_has)
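
As a small sketch of what "ordered" and "mutable" mean in practice, the cell below indexes, slices, and then changes a list in place. The list `shopping` and its contents are made up for illustration.

```python
# Hypothetical list used only for illustration
shopping = ['eggs', 'milk', 'bread']

# Ordered: items keep their position and can be indexed or sliced
first_item = shopping[0]
last_two = shopping[-2:]        # slicing returns a new list

# Mutable: the same list object can be changed in place
shopping.append('tea')          # add to the end
shopping[1] = 'oat milk'        # replace an item by position
removed = shopping.pop(0)       # remove and return the first item

print(first_item, last_two)
print(shopping, removed)
```

Note that `last_two` is a copy made before the changes, so it still holds the original values.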

## Simple list comprehensions
A readable, efficient way to perform operations on lists.

In [None]:
# Evaluate ways to return the absolute value of each element in a list

# In Python 3 this returns a range object: a lazy, immutable sequence (not a generator)
a = range(-10,11)

# Let's cast it to a list
a = list(a)

# return the absolute value of each element in the list a using
# 1) a for loop
# 2) a functional built-in, map
# 3) a list comprehension

# using for loop (try to avoid this)
abs_a_for = []
for elem in a:
    abs_a_for.append(abs(elem))

## with functional builtins (map)
abs_a_map = list(map(abs,a))

## using a list comprehension (very Pythonic and fast)
abs_a_lstcomp = [abs(x) for x in a]

print("Here are the resulting lists")
print("           The original list: {0}\n".format(a))
print("              Using for loop: {0}".format(abs_a_for))
print("                   Using map: {0}".format(abs_a_map))
print("Using the list comprehension: {0}".format(abs_a_lstcomp))
print("\nAre the resulting lists the same?")
print(abs_a_for == abs_a_map == abs_a_lstcomp)

### So which version is best?
Code readability is very important in Python, and that's one reason why list comprehensions are "Pythonic."
Of course, speed matters too.  So which one is fastest?

In [None]:
# import the timeit module
import timeit
number_executions = 1000

In [None]:
# define a larger list so there is more to time
lst = list(range(-20000, 20001))

In [None]:
# define absolute value functions

def abs_using_for(lst):
    """Returns a list containing the absolute values of all 
       elements in lst using a for loop.
    """
    abs_values = []
    for val in lst:
        abs_values.append(abs(val))
    return abs_values

def abs_using_map(lst):
    """Returns a list containing the absolute values of all 
       elements in lst using map.
    """
    return list(map(abs, lst))

def abs_using_lstcomp(lst):
    """Returns a list containing the absolute values of all 
       elements in lst using a list comprehension.
    """
    return [abs(val) for val in lst]

In [None]:
# pass globals() so timeit can see lst and the functions defined above
time_for = timeit.timeit('abs_using_for(lst)', globals=globals(), number=number_executions) 
print("The for function takes {0:0.2f} seconds for {1} executions.".format(time_for, number_executions))

In [None]:
time_map = timeit.timeit('abs_using_map(lst)', globals=globals(), number=number_executions) 
print("The map function takes {0:0.2f} seconds for {1} executions.".format(time_map, number_executions))

In [None]:
time_lstcomp = timeit.timeit('abs_using_lstcomp(lst)', globals=globals(), number=number_executions) 
print("The list comp. takes {0:0.2f} seconds for {1} executions.".format(time_lstcomp, number_executions))

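**Summary** `map` with a built-in function such as `abs` is often a little faster than the equivalent comprehension, but functional tools (`map`, `filter`, `reduce`) are usually less readable and less flexible. Exact timings depend on your machine and Python version.  
Advice: aim to code in comprehensions (list, dictionary, set) and go functional only if speed becomes an issue.  We'll get into NumPy later in the course; for numerical work on large arrays, its vectorized operations are typically much faster than any of the approaches above.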
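
The advice above mentions dictionary and set comprehensions, which use the same pattern as list comprehensions with curly braces. A minimal sketch, using a made-up `words` list:

```python
words = ['fee', 'fi', 'foo', 'fum', 'fee']   # illustrative data

# dictionary comprehension: map each word to its length
# (the repeated 'fee' simply overwrites the same key)
word_lengths = {word: len(word) for word in words}

# set comprehension: collect the distinct lengths
distinct_lengths = {len(word) for word in words}

print(word_lengths)
print(distinct_lengths)
```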

## Filtering list comprehensions

In [None]:
## filter
a = ['', 'fee', '', '', '', 'fi', '', '', '', '', 'foo', '', '', '', '', '', 'fum']
b = [x for x in a if len(x) > 0]
print(b)
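
For comparison, here is a sketch of two other ways to write the same filter, plus a conditional expression that transforms items instead of dropping them. The `'<empty>'` placeholder is an arbitrary choice for illustration.

```python
a = ['', 'fee', '', 'fi', '', 'foo', '', 'fum']

# Empty strings are falsy, so the condition can be written as `if x`
b_truthy = [x for x in a if x]

# filter() keeps items for which the function returns a truthy value;
# it is lazy, so wrap it in list() to see the result
b_filter = list(filter(len, a))

# `if` BEFORE the `for` is a conditional expression: every item is kept,
# but empty strings are replaced
b_labeled = [x if x else '<empty>' for x in a]

print(b_truthy)
print(b_filter)
print(b_labeled[:3])
```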

### <font color='gray'>List Comprehension Question</font>
Write a list comprehension that squares an item in the list below if it's an integer.

In [None]:
a_lst = [1, '4', 5, 'a', 0, 4]


<details><summary>
Click here for solution…
</summary>
`a_sqrd = [elem**2 for elem in a_lst if type(elem) is int]`
</details>

## Nested list comprehensions
You can do them, but do you want to?

In [None]:
nest_lst = [[1,2,3], [4,5,6], [7,8]]
print("The nested list:")
print(nest_lst)

flat_lst = []
for lst in nest_lst:
    for item in lst:
        flat_lst.append(item)
print("Flattened:")
print(flat_lst)
        
flat_lst2 = [item for lst in nest_lst for item in lst]

print("\nDo both methods for flattening result in the same list?")
print(flat_lst == flat_lst2)
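
If the nested comprehension feels hard to read, the standard library offers an alternative. The sketch below uses `itertools.chain.from_iterable`, which flattens exactly one level of nesting; the variable names are chosen for illustration.

```python
from itertools import chain

nest_lst = [[1, 2, 3], [4, 5, 6], [7, 8]]

# Nested comprehension: read the `for` clauses left to right,
# in the same order as the equivalent nested for loops
flat_comp = [item for sub_lst in nest_lst for item in sub_lst]

# chain.from_iterable lazily yields the items of each inner list in turn
flat_chain = list(chain.from_iterable(nest_lst))

print(flat_chain)
print(flat_comp == flat_chain)
```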

## Having fun with Zip
A useful way to combine iterables of the same length, element by element.  If the lengths differ, `zip` stops at the shortest input.

In [None]:
a1 = [1,2,3]
a2 = ['a','b','c']
print("The lists, separately:")
print(a1)
print(a2)

print("\nZipped together:")
a1a2 = list(zip(a1,a2))
print(a1a2)

print("\nHow it's often used:")
for v1, v2 in zip(a1, a2):
    print(f"a1: {v1}, a2: {v2}")
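
A short sketch of what happens when the inputs are not the same length, and of a common idiom for building a dictionary from two lists. The `keys` and `values` lists are invented for this example.

```python
from itertools import zip_longest

keys = ['a', 'b', 'c']
values = [1, 2]          # deliberately one item short

# zip silently stops at the shortest input
short = list(zip(keys, values))

# zip_longest pads the shorter input with fillvalue instead
padded = list(zip_longest(keys, values, fillvalue=None))

# dict() accepts an iterable of (key, value) pairs
lookup = dict(zip(keys, values))

print(short)
print(padded)
print(lookup)
```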

# Tuples

Tuples are the lightweight (meaning relatively small in memory), immutable sibling of the `list`: immutable, ordered collections.  You can create one by passing an iterable to the `tuple()` constructor, or by writing comma-separated values with or without the surrounding parentheses (the commas, not the parentheses, are what make a tuple).  **If you want an ordered, lightweight data structure to hold unchanging data, use tuples.**

In [None]:
my_first_tuple = tuple([1, 2])
print(type(my_first_tuple))
my_first_tuple

In [None]:
my_second_tuple = (1, 2)
print(type(my_second_tuple))
my_second_tuple

In [None]:
my_third_tuple = 1, 2
print(type(my_third_tuple))
my_third_tuple
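
Because the comma is what makes a tuple, two related idioms follow: single-element tuples need a trailing comma, and tuples can be unpacked into separate names. A minimal sketch with made-up values:

```python
point = (3, 4)           # hypothetical (x, y) pair

# Unpacking assigns each element to its own name
x, y = point

# A one-element tuple needs the trailing comma
single = (1,)
not_a_tuple = (1)        # just the integer 1 in parentheses

# Swapping works by building and unpacking a tuple on the fly
x, y = y, x

print(type(single), type(not_a_tuple))
print(x, y)
```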

### <font color='gray'>Tuple Questions</font>

1. Make a tuple called `my_tuple` with the values `1` and `"hello"` in it. 
    1. How do you access the `1` in `my_tuple`?
    2. How do you access the `"hello"` in `my_tuple`?
2. Can you change the `"hello"` entry in `my_tuple` to `"hello there"`? Why or why not?

In [None]:
my_tuple = (1, 'hello')

<details><summary>
Click here for solution 1.A
</summary>
`my_tuple = (1, 'hello')`
`print(my_tuple[0])`
</details>

In [None]:
my_tuple[0]

<details><summary>
Click here for solution 1.B
</summary>
`my_tuple = (1, 'hello')`
`print(my_tuple[1])`
</details>

<details><summary>
Click here for solution 2
</summary>
You can't.  Tuples are immutable (you can't modify them after creation).
</details>
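
One nuance worth seeing in code: immutability applies to the tuple's slots, not to the objects stored in them. The sketch below uses a hypothetical tuple `record` that holds a string and a list.

```python
record = ('hello', [1, 2])   # hypothetical tuple holding a string and a list

try:
    record[0] = 'hello there'    # reassigning a slot is not allowed
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# The list stored inside the tuple is still a mutable object
record[1].append(3)
print(record)

# To "change" a tuple, build a new one from the pieces
new_record = ('hello there',) + record[1:]
print(new_record)
```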

# Dictionaries

So far, the collections we have talked about are sequences: you look items up by position.  That works well when the data has some intrinsic order. However, there are plenty of times when position isn't what matters, because the data are associated with each other in a different way.  Dictionaries link data (the value) to a key for fast look-up.  (Since Python 3.7 dictionaries also remember insertion order, but you still access items by key, not by position.)  **When you want to link data or any object to some entity, use a dictionary.**

In [None]:
states_caps_dict = {'Georgia': 'Atlanta', 'Colorado': 'Denver', 'Indiana': 'Indianapolis'}
states_caps_dict

In [None]:
# for a standard dictionary, asking for a key that hasn't been assigned
# previously raises a KeyError
states_caps_dict['Washington']

In [None]:
# .get method allows a default value to be supplied
states_caps_dict.get('Washington', 'State not found')

In [None]:
# but we could have assigned it:
states_caps_dict['Washington'] = 'Olympia'
states_caps_dict
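
A few more everyday dictionary operations, sketched with a small made-up `capitals` dictionary: testing for a key, looping over key-value pairs, and removing entries.

```python
capitals = {'Georgia': 'Atlanta', 'Colorado': 'Denver'}   # illustrative subset

# `in` checks the keys, not the values
has_colorado = 'Colorado' in capitals
has_denver = 'Denver' in capitals

# .items() yields (key, value) pairs
for state, capital in capitals.items():
    print(f"{state}: {capital}")

# .pop() removes a key and returns its value; the second argument
# is returned instead of raising KeyError when the key is missing
removed = capitals.pop('Georgia')
missing = capitals.pop('Texas', None)

print(has_colorado, has_denver, removed, missing)
```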

In [None]:
# default dictionaries allow a default value to be set
from collections import defaultdict
states_caps = defaultdict(lambda: 'State not found')
states_caps.update(states_caps_dict)
print(states_caps)
# note: looking up a missing key also inserts it with the default value
print(states_caps['Oregon'])
print(states_caps['Washington'])
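
A common use of `defaultdict` is counting or grouping without first checking whether a key exists. The sketch below counts letters in an arbitrary example word and also shows the side effect of reading a missing key.

```python
from collections import defaultdict

counts = defaultdict(int)          # missing keys start at int(), which is 0
for letter in 'mississippi':
    counts[letter] += 1
print(dict(counts))

# Reading a missing key with [] stores the default as a side effect
size_before = len(counts)
counts['z']
size_after = len(counts)
print(size_before, size_after)

# .get() returns a fallback without inserting anything
fallback = counts.get('q', 0)
print(fallback, 'q' in counts)
```

If the silent insertion is unwanted, `.get()` on either a plain `dict` or a `defaultdict` avoids it.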

### <font color='gray'>Dictionary Questions</font>

1. Make a dictionary called `restaurant_types` that has the following associated `key-value` pairs: `('Red Lobster', 'Seafood')`, `('Burger King', 'Fast Food')`, `('Safeway', 'Groceries')`.

2. How do you find the restaurant type for `'Burger King'`?
3. What if you don't know whether or not `'Outback Steakhouse'` is in the `restaurant_types` dictionary? How would you try to get its restaurant type and make sure that you won't get an error?

<details><summary>
Click here for solution 1
</summary>
```
restaurant_types = {'Red Lobster': 'Seafood', 'Burger King': 'Fast Food', 'Safeway': 'Groceries'}
print(restaurant_types)
```
</details>

<details><summary>
Click here for solution 2
</summary>
```
restaurant_types['Burger King']
```
</details>

<details><summary>
Click here for solution 3
</summary>
`restaurant_types.get('Outback Steakhouse', 'Restaurant not in dictionary.')`
</details>

# Sets

A set combines some of the features of both the `list` and the `dictionary`. A set is an unordered, mutable collection of unique, hashable items. This means that a `set` is a data structure where you can store items without caring about their order, knowing that each item appears at most once.  Sets hash each item, so a membership test (`x in my_set`) takes roughly constant time on average instead of scanning every element as a list does.  **If you are going to check membership repeatedly, use a set.**

In [None]:
# how to initialize
my_set = set([1, 2, 3])
my_other_set = {1, 2, 3}
my_empty_set = {}  # uh, no... empty braces create an empty dict, not a set
print(type(my_set))
print(type(my_other_set))
print(type(my_empty_set)) # see? <class 'dict'>

In [None]:
empty_set_2 = set()
print(type(empty_set_2))
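
Two typical uses of a set, sketched with an invented `visitors` list: removing duplicates and checking membership.

```python
visitors = ['ann', 'bob', 'ann', 'cat', 'bob']   # illustrative data with repeats

# Converting to a set drops the duplicates
unique_visitors = set(visitors)
print(len(visitors), len(unique_visitors))

# Membership tests use the item's hash rather than scanning every element
print('cat' in unique_visitors)
print('dan' in unique_visitors)

# Sets have no guaranteed order; sort when you need a stable one
ordered_visitors = sorted(unique_visitors)
print(ordered_visitors)
```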

In [None]:
my_set = {1, 2, 3}
my_other_set = {5, 6, 7}
my_set.union(my_other_set)

In [None]:
my_set.add(4)
my_set

In [None]:
my_set.update(my_other_set)
my_set

In [None]:
my_set.remove(5)
my_set

In [None]:
my_set.intersection(my_other_set)
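
The set methods above also have operator forms, and `.discard()` is a forgiving alternative to `.remove()`. A minimal sketch with two made-up sets:

```python
evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}

union = evens | small          # same as evens.union(small)
common = evens & small         # same as evens.intersection(small)
only_evens = evens - small     # difference: in evens but not in small
either = evens ^ small         # symmetric difference: in exactly one set

print(union, common, only_evens, either)

# .discard() does nothing if the item is absent...
small.discard(99)

# ...while .remove() raises KeyError
try:
    small.remove(99)
    removed_missing = True
except KeyError:
    removed_missing = False
print(removed_missing)
```

Unlike `.update()` and `.add()`, the operators and the `.union()`/`.intersection()` methods return new sets and leave the originals unchanged.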

### <font color='gray'>Set Questions</font>

1. Make a set called `first_set` with the values 1-10 and another with the values 5-15 called `second_set`.
2. Add the value 11 to `first_set`.
3. Add the string `'hello'` to `second_set`.
4. Using one of the methods discussed above, find what elements `first_set` and `second_set` have in common.

<details><summary>
Click here for answers
</summary>
```
first_set = set(range(1,11))
second_set = set(range(5,16))
print(first_set)
print(second_set)
first_set.add(11)
second_set.add('hello')
print(first_set)
print(second_set)
intersection = first_set.intersection(second_set)
print(intersection)  
```
</details>