# Collections and dictionaries

In this notebook, we're going to cover various collections of data, including dictionaries. These are important topics in Python and data-driven linguistics. Fundamentally, what we're learning about today are ways to store multiple related data values.


- array: ordered, items may be changed, but the size is fixed. the same value may appear multiple times. convenient for performing numerical operations.
- list: ordered, items may be changed, size may be changed. the same value may appear multiple times. less convenient than an array for performing numerical operations.
- tuple: ordered, and unchangeable. the same value may appear multiple times.
- set: unordered, impossible to index (i.e. you cannot use bracket notation). contains only unique values.
- dictionary: key-value pairs.

All of these data types come up often in Python, and the goal of this lecture is to distinguish them and show common uses. We'll also cover list comprehensions, which you can think of as shorthand for a for loop.

In [None]:
from datascience import *
import numpy as np

## Arrays

Arrays can be easily created using the datascience function `make_array`:

In [None]:
a = make_array(1, 2, 3)
a

In [None]:
type(a)

Recall that `Table` columns are actually arrays, and can be retrieved using the `.column(column_name)` method:

In [None]:
t = Table().with_columns('Name', make_array('Tom', 'Maria', 'Shrenek'),
                        'Age', make_array(22, 31, 25))
t

In [None]:
t.column('Age')

Operations using arrays are performed in an element-wise fashion. For instance, adding two arrays together adds each of their matching elements:

In [None]:
a = make_array(1, 2, 3)
b = make_array(2, 3, 4)

a + b

In [None]:
a * b

Values can be changed using bracket notation:

In [None]:
a = make_array(1, 2, 3)
a[2] = 17
a

Unique to arrays, you can change multiple values at once using a boolean array:

In [None]:
a = make_array(1, 2, 3, 1, 2, 2)
print(a == 2) # the boolean array used for indexing
# changing the value
a[a == 2] = 13
a

It is not possible to directly add to an array. However, you can make a new array with the added values using `np.append(array, values)`.

In [None]:
a = make_array(1, 2, 3)
b = make_array(2, 3, 4)

np.append(a, b)

Note that neither `a` nor `b` is changed by `np.append`, so you should store the result of `np.append` in a variable if you want to keep it.

In [None]:
print(a)
print(b)

In [None]:
np.append(a, 6)
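
A minimal sketch of storing the result. It uses `np.array` in place of `make_array` only so that it runs without the `datascience` package; that substitution is an assumption made for portability.

```python
import numpy as np

a = np.array([1, 2, 3])

# np.append returns a new array; the original is left untouched.
np.append(a, 6)
print(a)  # [1 2 3]

# Reassign the name to keep the longer array.
a = np.append(a, 6)
print(a)  # [1 2 3 6]
```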

## List
A list is a data structure composed of ordered values. A list is defined by a pair of square brackets:

In [None]:
x = [1, 2, 3]
x

Because the list is a built-in data type, it is the output of many Python functions. For instance, the `s.split(sep)` method returns a list (not an array):

In [None]:
s = 'cat,dog,fish'
x = s.split(',')
print(x)
type(x)

Converting between an array and a list is easy. To convert a list to an array, use `np.array(x)`. (This is the canonical way to make an array without `make_array`.)

In [None]:
x = [1, 2, 3]
np.array(x)

To convert an array to a list, use the `list()` function. 

*Note: similar functions exist for other data types: `int()`, `float()`, `str()`, ... This is technically referred to as **casting**.* 

In [None]:
a = make_array(1, 2, 3)
x = list(a)
print(x)
type(x)

You can use the `+` and `*` operators on lists, but unlike `numpy` arrays, these alter the *shape* of the data structure, not its values. Compare the following two cells with the equivalent operations using a `numpy` array:

In [None]:
x = [1, 2, 3]
y = [4, 5, 6]
x + y

In [None]:
x = [1, 2, 3]
x * 5

However, just like with a `numpy` array, you can use *slicing* to access pieces of a `list`:

In [None]:
a = make_array(1, 2, 3, 4, 5)
x = [1, 2, 3, 4, 5]
print(a[2:5])
print(x[2:5])

With lists, values are changed one index at a time; boolean-array assignment is not available:

In [None]:
x = [1, 2, 3, 4, 5]
x[2] = 7
print(x)
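
Bracket assignment with a single index changes one value. To change several, two common workarounds are sketched below: building a new list with a list comprehension (covered later in this notebook) and assigning to a slice. The variable names are only illustrative.

```python
x = [1, 2, 3, 1, 2, 2]

# Build a new list in which every 2 is replaced by 13.
replaced = [13 if value == 2 else value for value in x]
print(replaced)  # [1, 13, 3, 1, 13, 13]

# Slice assignment replaces a contiguous run of values in place.
x[0:2] = [10, 20]
print(x)  # [10, 20, 3, 1, 2, 2]
```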

You can directly add to a list using the method `list.append(value)`:

In [None]:
x = [1, 2, 3]
x.append(4)
print(x)

Note that the value of `x` is changed with `.append(value)`. More technically, `.append(value)` is an *in-place* operation, meaning that it modifies the existing list rather than returning a new object. You can see this if you try to store the result of `.append(value)`: it returns `None`, a special value that Python uses when a function or method does not return anything.

In [None]:
x = [1, 2, 3]
y = x.append(4)
print(y)

However, `.append(value)` still did its job: the value of `x` has been changed.

In [None]:
print(x)

Appending a list to another list adds the second list as an element in the list. This probably isn't what you want to do.

In [None]:
x = [1, 2, 3]
y = [4, 5]

x.append(y)
x

You can append the values of one list to another using the method `x.extend(y)`:

In [None]:
x = [1, 2, 3]
y = [4, 5]

x.extend(y)
x
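
As a quick comparison, here is a sketch of the two ways to join lists seen so far: `+` builds a new list, while `+=` (like `.extend`) changes the existing one.

```python
x = [1, 2, 3]
y = [4, 5]

# `+` returns a new list; x and y are unchanged.
combined = x + y
print(combined)  # [1, 2, 3, 4, 5]
print(x)         # [1, 2, 3]

# `+=` modifies x in place, just like x.extend(y).
x += y
print(x)         # [1, 2, 3, 4, 5]
```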

## Copying arrays and lists
One extremely important thing to remember about arrays and lists is that variables simply point to them, meaning that altering an array or list changes the data seen through every variable pointing to it. This is probably easier to show in action:

In [None]:
a = make_array(1, 2, 3)
b = a # just another pointer to the array

a[2] = 4
print('a:', a)
print('b:', b)

Note that even though `a` was used to alter the array, `b` was changed too. This can be avoided by using `.copy()`:

In [None]:
a = make_array(1, 2, 3)
b = a.copy() # creates a new array

a[2] = 4
print('a:', a)
print('b:', b)

Lists work the same way, including the `.copy()` method. A shorthand way to copy the values of a list into a new object is to use `list[:]`:

In [None]:
a = [1, 2, 3]
b = a # just another pointer to the list
c = a.copy() # creates a new list
d = a[:] # shorthand for creating a new list; with an array, a[:] gives a view of the same data, not a copy

a[2] = 20
print('a:', a)
print('b:', b)
print('c:', c)
print('d:', d)
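
One caveat worth sketching: `.copy()` and `[:]` copy only the outer list. If the list contains other lists, the inner lists are still shared. The standard-library function `copy.deepcopy` copies every level. The variable names below are illustrative.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested.copy()        # new outer list, same inner lists
deep = copy.deepcopy(nested)   # new outer list and new inner lists

nested[0][0] = 99
print('nested: ', nested)   # [[99, 2], [3, 4]]
print('shallow:', shallow)  # [[99, 2], [3, 4]]
print('deep:   ', deep)     # [[1, 2], [3, 4]]
```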

## Tuple
A *tuple* is a data structure composed of ordered values. Unlike a list, it cannot be changed. Tuples are created with parentheses:

In [None]:
tup = (1, 2, 3)
tup

Since tuples cannot be changed, they are not particularly useful data structures. However, the inputs to functions are treated as tuples, so error messages may occasionally reference them.
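
One place tuples do show up is when a function returns more than one value. The sketch below uses a hypothetical helper, `min_and_max`, written only for this illustration; it also shows what happens when you try to change a tuple.

```python
def min_and_max(values):
    '''Returns the smallest and largest value as a tuple.'''
    return min(values), max(values)

result = min_and_max([3, 1, 4, 1, 5])
print(result)  # (1, 5)

# A tuple can be unpacked into separate variables.
lowest, highest = result
print(lowest, highest)  # 1 5

# Assigning to an element of a tuple raises a TypeError.
try:
    result[0] = 0
except TypeError:
    print('tuples cannot be changed')
```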

## Set
A *set* is a data structure composed of unordered unique values. Sets can be created using curly brackets, though I cannot think of a reason to do this directly. Note that duplicate values are ignored.

In [None]:
s = {1, 2, 3, 1, 2, 3}
print(s)
type(s)

More commonly, the `set(collection)` function is an easy way to find unique values in a list, an array or a tuple (alongside the other technique we learned, using `np.unique(collection)`):

In [None]:
a = [1, 2, 1, 2, 3]
set(a)

In [None]:
set(a)[0] # raises a TypeError: sets cannot be indexed

In [None]:
list(set(a))[0]

Some useful methods include `set.intersection(other)`, `set.difference(other)` and `set.union(other)`. `other` does not have to be a set and can be any collection object:

In [None]:
a = set([1, 2, 3])
b = [2, 3, 4]

print(a.intersection(b))
print(a.difference(b))
print(a.union(b))
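
As a small linguistic illustration, the sketch below compares the vocabularies of two made-up sentences. It uses the operators `&`, `-` and `|`, which do the same jobs as the three methods above but require both sides to be sets. `sorted` is used only to print the results in a predictable order.

```python
first = 'the cat sat on the mat'.split(' ')
second = 'the dog sat on the log'.split(' ')

vocab_first = set(first)
vocab_second = set(second)

shared = vocab_first & vocab_second        # intersection
only_first = vocab_first - vocab_second    # difference
everything = vocab_first | vocab_second    # union

print(sorted(shared))      # ['on', 'sat', 'the']
print(sorted(only_first))  # ['cat', 'mat']
print(sorted(everything))  # ['cat', 'dog', 'log', 'mat', 'on', 'sat', 'the']
```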

## Check if item exists
The keyword `in` came up in our discussion of `for` loops/iteration:

In [None]:
for name in make_array('Tom', 'Maria', 'Shrenek'):
    print(name)

The `in` keyword can also be used to check if a value is in any collection object. It returns a boolean indicating whether the value was in the collection.

In [None]:
a = make_array('Tom', 'Maria', 'Shrenek')
print('Tom' in a)
print('Keith' in a)

**This is extremely useful.** For instance, this can be paired with the control statement `if` to guide the flow of a program:

In [None]:
def shout_new(names, new):
    '''Prints, in uppercase, each name in new that is not already in names.'''
    # loop through the new names
    for n in new:
        if n not in names:
            n_upper = n.upper()
            print('NEW NAME: {}'.format(n_upper))

In [None]:
names = make_array('Tom', 'Maria', 'Shrenek')
people = make_array('Sally', 'Tom', 'Leroy')

shout_new(names, people)
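
A few details of `in` are easy to trip over, so here is a short sketch using a plain list and a string. With a list, the value must match a whole element exactly; with a string, `in` checks for a substring.

```python
names = ['Tom', 'Maria', 'Shrenek']

print('Tom' in names)    # True
print('tom' in names)    # False: the comparison is case-sensitive
print('Mar' in names)    # False: 'Mar' is not a whole element of the list

# With a string on the right, `in` tests for a substring instead.
print('Mar' in 'Maria')  # True
```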

## List comprehensions
*List comprehensions* can be used to write loops in a single line. The syntax essentially involves running a loop "within" a list, as follows: 

### `[<some operation on i> for i in <some list>]`

In [None]:
original_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
identical_list = [i for i in original_list]
identical_list

In [None]:
same_thing = []
for i in original_list:
    same_thing.append(i)
same_thing

A list comprehension does not alter the original list:

In [None]:
list_incremented_by_one = [i+1 for i in original_list]
list_incremented_by_one

In [None]:
original_list

To be explicit about the compactness of list comprehension, note that the following two cells perform identical operations:

In [None]:
new_list = []
for i in original_list:
    new_list.append(i**2)
new_list

In [None]:
[i**2 for i in original_list]
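
A list comprehension can also filter as it goes by adding an `if` at the end. The sketch below keeps only the even numbers before squaring them, then applies a string method to every word in a made-up list.

```python
original_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Keep only the even numbers, then square them.
even_squares = [i**2 for i in original_list if i % 2 == 0]
print(even_squares)  # [4, 16, 36, 64, 100]

# The operation can be any expression, such as a string method.
words = ['Cat', 'dog', 'FISH']
lowered = [w.lower() for w in words]
print(lowered)  # ['cat', 'dog', 'fish']
```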

As an aside, list comprehensions can be done with `numpy` arrays as well, though the result will be a list, not a `numpy` array. As with all lists, the result can be turned into a `numpy` array with `np.array(list)`.

In [None]:
a = make_array(4, 5, 6)
[i**2 for i in a]

In [None]:
np.array([i**2 for i in a])

`make_array(list)` from the `datascience` package operates a bit differently because it turns each argument into an element, so the equivalent functionality is `make_array(list)[0]`.

In [None]:
tmp = make_array([i**2 for i in a])
tmp

In [None]:
tmp[0]

## Dictionaries

*Dictionaries* are data structures composed of *key-value* pairs. Dictionaries can be used to map keys to values:
<img src="https://raw.githubusercontent.com/learn-co-curriculum/cssi-4.10-python-dictionaries/master/images/dictionary.png">

In [None]:
l = ["Eggs", "Milk", "Cheese", "Yogurt", "Butter", "More Cheese"]

In [None]:
l[4]

Dictionaries are defined using curly brackets, in the form `{key1: value1, key2: value2, key3: value3, ...}`:

In [None]:
d = {"Eggs" : 2.59, "Milk": 3.19, "Cheese": 4.80, "Yogurt": 1.35, "Butter": 2.59, "More Cheese": 6.19}
d

In [None]:
d['Butter']

You can build a dictionary incrementally as well. Instantiate an empty dictionary with `<name of dictionary> = {}`, and add values with `<name of dictionary>[<name of a key>] = <new value>`. The following dictionary maps the names of fruits to their colors.

In [None]:
fruits = {} # initialize an empty dictionary
fruits['apple'] = "red"
fruits['pear'] = "green"
fruits['banana'] = "yellow"
fruits

In [None]:
fruits['banana']
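
Building a dictionary incrementally is how word counts are usually collected. Here is a minimal sketch on a made-up sentence; it uses `in` to check whether a word already has an entry.

```python
sentence = 'the cat saw the dog and the bird'

counts = {}  # initialize an empty dictionary
for word in sentence.split(' '):
    if word in counts:
        counts[word] = counts[word] + 1  # seen before: add one
    else:
        counts[word] = 1                 # first time: start at one

print(counts)
# {'the': 3, 'cat': 1, 'saw': 1, 'dog': 1, 'and': 1, 'bird': 1}
```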

### `<name of dictionary>[<name of a key>]`
Use square brackets to access the values of a dictionary. For instance, suppose I want to know how much my pet dinosaur weighs:

In [None]:
pets = {"dog" : 5, "cat": 2, "parrot": 0.5, "dinosaur": 100}
pets

In [None]:
weight = pets['dinosaur'] # How much does my dinosaur weigh?
print("My dinosaur weighs", weight, "kg.")
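
Asking for a key that is not in the dictionary raises a `KeyError`. The built-in method `dict.get(key, default)` is a gentler alternative, sketched below with a key that I made up for the example.

```python
pets = {"dog": 5, "cat": 2, "parrot": 0.5, "dinosaur": 100}

# Square brackets raise a KeyError for a missing key.
try:
    pets['unicorn']
except KeyError:
    print('no unicorn in pets')

# .get returns None, or a default that you supply, instead of raising.
print(pets.get('unicorn'))     # None
print(pets.get('unicorn', 0))  # 0
print(pets.get('dog', 0))      # 5
```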

### `<name of dictionary>.keys()`
The method `keys` can be used to get all of a dictionary's keys at once. 

In [None]:
pets.keys()

The outputs of `dict.keys()`, `dict.values()` and `dict.items()` are special Python objects called *views*. Unfortunately, you cannot index them with bracket notation:

In [None]:
pets.keys()[0] # raises a TypeError: the keys object cannot be indexed

However, these objects can easily be turned into lists with the `list()` function.

In [None]:
list(pets.keys())

In [None]:
list(pets.keys())[0]

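Don't try to turn the output directly into a numpy array; convert it to a list first with `np.array(list(pets.keys()))`. The direct call does not raise an error, but it returns a zero-dimensional array wrapping the whole keys object rather than an array of the individual keys: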

In [None]:
np.array(pets.keys())
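
To make the difference concrete, the sketch below compares the shapes of the two results. The variable names `wrapped` and `keys_array` are only for illustration.

```python
import numpy as np

pets = {"dog": 5, "cat": 2, "parrot": 0.5, "dinosaur": 100}

# Passing the keys object directly wraps it as a single element.
wrapped = np.array(pets.keys())
print(wrapped.shape)     # ()

# Converting to a list first gives one element per key.
keys_array = np.array(list(pets.keys()))
print(keys_array.shape)  # (4,)
print(keys_array[0])     # dog
```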

### `<name of dictionary>.values()`
Similarly, the method `values` can be used to get all of a dictionary's values at once. 

In [None]:
pets.values()

### `<name of dictionary>.items()`
The method `items` returns all of the dictionary's keys and values at once. Each element in the result is a *tuple* of the form `(key, value)`.

In [None]:
pets

In [None]:
pets.items()

In [None]:
list(pets.items())[3]

One easy way to access the values of a dictionary is by iterating over it using one of the access methods (`.keys()`, `.values()` or `.items()`). For instance, if I wanted to access all the weights of my animals at once, I could do the following:

In [None]:
for key in pets.keys():
    weight = pets[key]
    print("My", key, "weighs", weight, "kg.")

In [None]:
for key, value in pets.items():
    print("My", key, "weighs", value, "kg.")
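
The access methods also combine well with built-in functions. As a sketch, the cell below totals the weights and orders the animals from lightest to heaviest; passing `key=pets.get` tells `sorted` and `max` to compare the keys by their values.

```python
pets = {"dog": 5, "cat": 2, "parrot": 0.5, "dinosaur": 100}

# Add up all the values.
total = sum(pets.values())
print(total)  # 107.5

# Order the keys by the weight stored under each one.
by_weight = sorted(pets, key=pets.get)
print(by_weight)  # ['parrot', 'cat', 'dog', 'dinosaur']

# Find the key with the largest value.
heaviest = max(pets, key=pets.get)
print(heaviest)  # dinosaur
```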

## Dictionary comprehensions

*Dictionary comprehensions* are like list comprehensions but for dictionaries. The key difference is the use of curly brackets.

For instance, consider the dictionary `pets` from earlier. You could reverse the dictionary in a single line using a dictionary comprehension:

In [None]:
pets

In [None]:
{pets[i] : i for i in pets}

Remember that curly brackets are also used to create a `set` object, so it is important to make sure that you include the `:` with key/value pairs. 

In [None]:
{pets[i] for i in pets}

In [None]:
type({pets[i] for i in pets})
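
Two further points are sketched below. First, a dictionary comprehension can filter with `if`, just like a list comprehension. Second, swapping keys and values only works cleanly when the values are unique: if two keys share a value, the later one wins. The `prices` dictionary is a shortened version of the grocery example from earlier.

```python
pets = {"dog": 5, "cat": 2, "parrot": 0.5, "dinosaur": 100}

# Keep only the animals that weigh at least 5 kg.
heavy = {animal: weight for animal, weight in pets.items() if weight >= 5}
print(heavy)  # {'dog': 5, 'dinosaur': 100}

# Eggs and Butter share a price, so only one survives the swap.
prices = {"Eggs": 2.59, "Butter": 2.59, "Milk": 3.19}
swapped = {price: item for item, price in prices.items()}
print(swapped)  # {2.59: 'Butter', 3.19: 'Milk'}
```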

## Nested dictionaries

A dictionary can have many levels. For example, let's pretend that I teach language courses. I could use a dictionary to store information on all of the classes that I teach. The class data could itself be organized as a dictionary, so that it contained each student's name connected to their grades:

In [None]:
teach = {"Greek": {"Katie":[100, 85],
                   "Bob":[70, 95],
                   "Div":[50, 65]},
        "Spanish": {"Vasilis":[10, 20],
                    "Elva":[100, 100]},
        "French":{'Laura':[165, 200, 187],
                  'Lars':[134, 182, 200],
                  'Tony':[200, 200, 200]}
        }
teach

The value linked to the key "Spanish" is another dictionary:

In [None]:
spanish_class = teach['Spanish']
spanish_class

In [None]:
spanish_class['Vasilis']

Keys can be integers as well. Other immutable data types (such as floats and tuples) also work, but most often the keys are strings. Note that the keys of a dictionary do not have to share a data type: after the next cell, `spanish_class` has a mixture of string and integer keys.

In [None]:
spanish_class[50] = [50, 50]

In [None]:
spanish_class
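
Nested dictionaries are usually processed with nested loops. The sketch below uses a shortened version of `teach` and builds a new nested dictionary, which I have called `averages`, holding each student's mean grade.

```python
teach = {"Greek": {"Katie": [100, 85],
                   "Bob": [70, 95]},
         "Spanish": {"Elva": [100, 100]}}

averages = {}
for language, students in teach.items():
    averages[language] = {}  # one inner dictionary per class
    for student, grades in students.items():
        averages[language][student] = sum(grades) / len(grades)

print(averages)
# {'Greek': {'Katie': 92.5, 'Bob': 82.5}, 'Spanish': {'Elva': 100.0}}
```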