In [1]:
from IPython.display import HTML, YouTubeVideo

def css_styling():
    # Read the stylesheet and close the file as soon as we are done
    with open("../Data/www/styles/custom.css", "r") as css_file:
        styles = css_file.read()
    return HTML(styles)
css_styling()

In [8]:
import glob
import datetime
import string

# Synopsis

In this unit we will learn that:

1. **Dictionaries** are **mutable unordered** collections whose elements are accessed using **keys**.

    1. Dictionaries are created using the `{}` syntax.
    
    2. Dictionaries are composed of `key, value` pairs.

    3. Each `key, value` pair is called an *item*.
    
    4. `Items` can be added to a dictionary by assigning to a new `key` or with the built-in method `update()`.
    
    5. `Items` can be changed using assignment.
    
    6. `Items` can be removed using the `del` statement and the method `pop()`.
    
2. Dictionaries allow nesting with all data types.

3. We can access all `items`, `keys`, and `values` in a dictionary.


## Dictionaries

A Python dictionary is an extraordinarily useful data type that expands on the possibilities offered by lists.  In a list one keeps track of the elements by an index that must be an integer.  **Dictionaries keep track of elements by `key`!**

Each element in a dictionary is an **item**, and every `item` has both a **key** and a **value**. You use the `key` to "look up" the `value`, just as you would look up the meaning of a word in a real dictionary. Also, just like in a real dictionary, all of the `keys` **must** be unique. If a `key` appeared multiple times, we wouldn't know which `value` to look up. Remember `sets`?  **The `keys` in a dictionary form a set!**

Dictionaries are also created with the `{}` syntax. If we are initializing a dictionary, we enter `key-value` pairs separated by commas; for each `item`, the key is separated from the value by a colon.

`a_dict = {key : value, another_key : another_value}`
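
As a minimal sketch of the syntax above, the cell below builds two small dictionaries. The names and values are invented for illustration and do not come from the roster files. It also shows what happens when a `key` is repeated in the literal: only the last value is kept, so the `keys` stay unique.

```python
# Hypothetical record, invented for illustration
student = {'Name': 'Ada Example', 'Department': 'Physics'}

print(student['Department'])   # look up a value by its key

# Repeating a key in the literal keeps only the last value
colors = {'sky': 'grey', 'sky': 'blue'}
print(colors)
print(len(colors))
```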


## What are dictionaries good for?

Great that you would ask! Recall the project involving all the student records?  Dictionaries are **the** data type to deal with records.  What are `Date of Birth` and `Age` if not keys? 

Let's retrieve our code so that we can start seeing how great dictionaries are.

In [9]:
def find_student_records(directory):
    """
    Return all roster filenames in a directory
    
    input:
        directory - str, Directory that contains the roster files
    output:
        filenames - list, List of roster filenames in directory
    """
    filenames = glob.glob( directory + '/*.txt' )
    
    return filenames

In [10]:
def string2tuple_dob(dob_string):
    '''
    Takes a date string of "M/D/YY" and converts it to a tuple with month, day, and year as integers

    input:
        * dob_string - str, birthday string of form "M/D/YY"
    output:
        * dob - tuple, (month, day, year) as integers; 1900 is added to the two-digit year
    '''
    month, day, year = dob_string.split('/')
    month = int(month)
    day = int(day)
    year = int(year) + 1900
    dob = (month, day, year)
    
    return dob


def parse_student_record(filename):
    '''
    Parses a student record file into a list of tuples
    
    input:
        filename - str, path to the file
    output:
        data - list of tuples, student attribute data
    '''
    data = []
    # The with-statement closes the file even if parsing fails
    with open(filename, "r") as file_in:
        for line in file_in:
            # Skip comment lines and blank lines
            if line.startswith('#') or not line.strip():
                continue
            # Split on the first colon only, so values may contain colons
            field, value = line.split(':', 1)
            field = field.strip()
            value = value.strip()
            if field == 'Date of Birth':
                value = string2tuple_dob( value )
            data.append( (field, value) )

    return data


In [None]:
filenames = find_student_records('../../data/roster')
print(filenames[:2])
data = parse_student_record(filenames[0])

The list of tuples that we create for each student contains all the information we have available. However, it is not particularly easy to access a given field.  Imagine we want to find the `Department` where `Agatha Brooks` studies. We need to find the index of the tuple for which the first element equals `Department` and then print the second element of that tuple.

In [None]:
for i in range( len(data) ):
    if data[i][0] == 'Department':
        print( data[i][1])
        break

Let's transform our list of tuples into a dictionary and check how much easier it is to retrieve the same information.

In [None]:
data_dict = {}
for i in range( len(data) ):
    data_dict[data[i][0]] = data[i][1]

print(data_dict)

In [None]:
data_dict['Department']

In [None]:
data_dict['Favorite Color']
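
The conversion loop above can also be written without indices. The sketch below uses a short hypothetical list of `(field, value)` pairs, shaped like the parser's output but invented for illustration, and compares a loop that unpacks each tuple with a direct call to the built-in `dict()`, which accepts any iterable of pairs.

```python
# Hypothetical (field, value) pairs, shaped like the parser's output
pairs = [('Name', 'Ada Example'), ('Department', 'Physics')]

record_loop = {}
for field, value in pairs:       # unpack each tuple directly
    record_loop[field] = value

record_direct = dict(pairs)      # dict() accepts an iterable of pairs

print(record_loop == record_direct)
print(record_direct['Department'])
```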

Let's look into the properties of dictionaries.  Dictionaries are **mutable**. You can change the value of an element by re-assigning its value.

Dictionaries are **unordered** in the sense that an `item` is looked up by its `key`, never by its position. In Python versions before 3.7, the order in which `items` were printed was not guaranteed; since Python 3.7, a dictionary preserves the order in which its `items` were inserted.

If we want to add a new `key-value` pair to the dictionary, we assign a value to a new `key`. If we want to add multiple `items` at once, we can use the built-in method `update()`.

In [None]:
data_dict['Favorite Sport'] = 'Soccer'
print(data_dict)
data_dict.update({'Favorite Sports Team': 'S. L. Benfica', 'Favorite Sports Team Mascot': 'Eagle'})
print(data_dict)

To remove a `key-value` pair from a `dict` variable, we can use `del` and provide the `key`. Guess what happens if you provide a `key` that does not exist? 

Alternatively, you can use the built-in method `pop()` and provide a `key`. This method deletes the `item` with `key` and returns the `value`.


In [None]:
del data_dict['Favorite Sports Team Mascot']
print( data_dict.pop('Favorite Sports Team') )
print( data_dict.pop('Favorite Sport') )

In [None]:
data_dict
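
Returning to the question about a `key` that does not exist: the sketch below, on a hypothetical record invented for illustration, shows that `del` raises a `KeyError`. It also shows that `pop()` accepts an optional second argument, which is returned instead of raising an error when the `key` is missing.

```python
# Hypothetical record, invented for illustration
record = {'Name': 'Ada Example', 'Favorite Sport': 'Soccer'}

try:
    del record['Favorite Food']          # this key does not exist
except KeyError as error:
    print('KeyError:', error)

# pop() returns the default (here None) when the key is missing
print(record.pop('Favorite Food', None))
print(record.pop('Favorite Sport'))      # removes the item, returns its value
print(record)
```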

We could access all elements of a list using constructions such as `for element in a_list` or `for i in range(len(a_list))`. 

Dictionaries have more types of information. We can access all `items`, or all `keys`, or all `values`.

In [None]:
print(data_dict.keys())   # It looks like a list of strings
print()
print(data_dict.items())  # It looks like a list of tuples
print()
print(data_dict.values()) # It looks like a list

Even though all these objects look like lists, they are not lists. They are **view objects**.  This means that you can loop over them and access each element in turn, but you cannot access their elements by index.

In [None]:
for value in data_dict.values():
    print(value)

print()
print( type( data_dict.values() ) )
print()
print( list(data_dict.values())[1] )
print( data_dict.values()[1] )  # Raises TypeError: a view cannot be indexed
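
The difference between a view and a list can be sketched on a small hypothetical record (invented for illustration). A view stays connected to its dictionary, so it reflects later changes, while `list()` takes a snapshot that supports indexing.

```python
record = {'Name': 'Ada Example', 'Department': 'Physics'}  # hypothetical
values_view = record.values()

record['Favorite Color'] = 'Teal'   # the view reflects later changes
print(len(values_view))

try:
    values_view[0]                  # views do not support indexing
except TypeError as error:
    print('TypeError:', error)

values_list = list(values_view)     # a real list supports indexing
print(values_list[0])
```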

Working with dictionaries can be challenging when you are starting out.  Accessing information by `key` is less natural for many of us.  Moreover, things can quickly become rather complex when nesting is involved. Keeping track of the elements in a list of dictionaries that contain lists of lists is no easy task.  

As in many other situations, being organized and working out specific cases with paper and pen can make all the difference.

In order to gain experience with these challenges, let's create a list of dictionaries using the code for processing the roster files.  First, we have to change the function `parse_student_record()` to create a dictionary instead of a list of tuples.

In [None]:
def string2tuple_dob(dob_string):
    '''
    Takes a date string of "M/D/YY" and converts it to a tuple with month, day, and year as integers

    input:
        * dob_string - str, birthday string of form "M/D/YY"
    output:
        * dob - tuple, (month, day, year) as integers; 1900 is added to the two-digit year
    '''
    month, day, year = dob_string.split('/')
    month = int(month)
    day = int(day)
    year = int(year) + 1900
    dob = (month, day, year)
    
    return dob


def parse_student_record(filename):
    '''
    Parses a student record file into a list of tuples <-- CHANGE THIS
    
    input:
        filename - str, path to the file
    output:
        data - list of tuples, student attribute data <-- CHANGE THIS
    '''
    data = []   # CHANGE THIS
    # The with-statement closes the file even if parsing fails
    with open(filename, "r") as file_in:
        for line in file_in:
            # Skip comment lines and blank lines
            if line.startswith('#') or not line.strip():
                continue
            # Split on the first colon only, so values may contain colons
            field, value = line.split(':', 1)
            field = field.strip()
            value = value.strip()
            if field == 'Date of Birth':
                value = string2tuple_dob( value )
            data.append( (field, value) ) # CHANGE THIS

    return data


In [None]:
filenames = find_student_records('../Data/Day2-Collections-and-Files/Roster/')

all_records = []
for file in filenames[:20]:
    data = parse_student_record(file)
    all_records.append(data)
    
print(all_records[:2])

In order to see what we are up against, it is useful to know how to read the output above.  The information is enclosed inside `[]`.  We expect this, since `all_records` is a list.

We also know that `all_records` is a list of dictionaries. Thus, each element in the list is enclosed inside `{}` and separated from the next by a comma. Look for `}, {`. That marks where one dictionary ends and the next begins.

Inside each dictionary, we have `key: value` pairs separated by commas. 

If you can read them, you can write a command that accesses them!
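
As a minimal sketch of reading such a structure from the outside in, the cell below uses two hypothetical records (invented for illustration, not taken from the roster). Each pair of brackets peels off one layer: a list index selects a dictionary, a `key` selects a value, and a position selects an element of the tuple. Remember that indices start at 0, so the Nth record sits at index N-1.

```python
# Two hypothetical records, invented for illustration
records = [
    {'Name': 'Ada Example', 'Date of Birth': (12, 10, 1985)},
    {'Name': 'Bo Sample', 'Date of Birth': (3, 31, 1986)},
]

second = records[1]                      # list index -> one dictionary
print(second['Name'])                    # key -> value
print(records[1]['Date of Birth'][2])    # index, then key, then tuple position
```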

In [None]:
# Print the Department of the 5th student record


In [None]:
# Print the Height of the 15th student record


In [None]:
# Print the string "My name is ______ and I was born in _month___ of ___year." for the 15th student record


Well. That does not look good, does it?  It would be nice to have the actual month name instead of the number...  Why don't we find a way to do that conversion?

In [None]:
def convert_number2month( number ):
    """
    This function takes a number between 1 and 12 and returns the name of the corresponding month.
    
    input:
        number - int
    output:
        month - string
    """

    return month
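
One possible approach, sketched here under a different and hypothetical name (`number_to_month_name`) so that it does not replace your own solution, is to store the month names in a dictionary keyed by month number. The conversion then becomes a single lookup.

```python
# Month names keyed by month number
MONTH_NAMES = {
    1: 'January', 2: 'February', 3: 'March', 4: 'April',
    5: 'May', 6: 'June', 7: 'July', 8: 'August',
    9: 'September', 10: 'October', 11: 'November', 12: 'December',
}

def number_to_month_name(number):
    """Return the month name for an integer between 1 and 12."""
    return MONTH_NAMES[number]   # raises KeyError for numbers outside 1-12

print(number_to_month_name(3))
```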

In [None]:
# Print the string "My name is ______ and I was born in _month___ of ___year." for the 15th student record


In [None]:
# Print the string "My name is ______ and I love _color_." for the 20th student record


In [None]:
# Print the string "My name is ______ and I love _color_." for the first 10 student records


Ok, since you seem to get this, let's make it more interesting.

For each student, please find all other students whose age is within 5 years of theirs.  Then add to the record of each student a list with the indices in `all_records` of those other students.
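
One possible way to approach this task is sketched below on three hypothetical records (invented for illustration). It makes two simplifying assumptions that you may want to revisit: ages are compared through the birth year only, and the result is stored under a made-up key, `'Close in Age'`.

```python
# Hypothetical records; only the birth year (tuple position 2) is compared
records = [
    {'Name': 'Ada Example', 'Date of Birth': (12, 10, 1985)},
    {'Name': 'Bo Sample', 'Date of Birth': (3, 31, 1989)},
    {'Name': 'Cy Placeholder', 'Date of Birth': (7, 24, 1975)},
]

for i, record in enumerate(records):
    close_in_age = []
    for j, other in enumerate(records):
        if i == j:
            continue                       # skip the student themselves
        year_gap = abs(record['Date of Birth'][2] - other['Date of Birth'][2])
        if year_gap <= 5:
            close_in_age.append(j)
    record['Close in Age'] = close_in_age  # hypothetical key name

print([record['Close in Age'] for record in records])
```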

## Using JSON to store and retrieve data

So far, we have performed calculations and used the results inside our program without the need to store complex data structures in a file.  However, many times we will want to store the result of a calculation for later instead of having to repeat it whenever we need the result.  

`JSON` (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. `JSON` is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. **These properties make JSON an ideal data-interchange language**.

JSON is built on two structures:

* A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
* An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.

These are universal data structures. Virtually all modern programming languages support them in one form or another. 
It makes sense that a data format that is interchangeable with programming languages also be based on these structures.

In JSON, they take on these forms:

* An `object` is an unordered set of `name/value` pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).
* An `array` is an ordered collection of `values`. An array begins with [ (left bracket) and ends with ] (right bracket). `Values` are separated by , (comma).

A `value` can be a string in double quotes, or a number, or true or false or null, or an object or an array. These structures can be nested.


As you would expect, there is a library in Python for using JSON.

In [None]:
import json
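
Before working with files, here is a minimal sketch of how the JSON forms described above map onto Python objects. The JSON text is invented for illustration. `json.loads` parses a string and `json.dumps` produces one.

```python
import json

# Hypothetical JSON text, invented for illustration
text = '{"name": "Ada Example", "scores": [90, 85.5], "enrolled": true, "advisor": null}'
parsed = json.loads(text)           # JSON text -> Python objects

print(type(parsed))                 # a JSON object becomes a dict
print(type(parsed['scores']))       # a JSON array becomes a list
print(parsed['enrolled'], parsed['advisor'])   # true -> True, null -> None

print(json.dumps(parsed))           # Python objects -> JSON text
```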

The `json` [library](https://docs.python.org/3.4/library/json.html) gives us access to a number of functions.  In order to explore some of those functions, we will write to a file the data we created earlier.

In [None]:
with open('../Data/Day4-Dictionaries/records.json', 'w') as file_out:
    json.dump(all_records, file_out)

This is what the contents of the file look like:

`[{"Date of Birth": [1, 10, 1975], "Zodiac Sign": "January", "Department": "Engineering", "Weight": "220lbs", "Favorite Color": "Lime", "Height": "6ft,0in", "Favorite Animal": "Turtle", "Email Address": "agatha.bailey@northwestern.edu", "Name": "Agatha A. Bailey"}, {"Date of Birth": [3, 31, 1986], "Zodiac Sign": "March", "Department": "Music", "Weight": "194lbs", "Favorite Color": "Blue", "Height": "5ft,5in", "Favorite Animal": "Cat", "Email Address": "agatha.brooks@northwestern.edu", "Name": "Agatha A. Brooks"}, {"Date of Birth": [7, 24, 1989], "Zodiac Sign": "July", "Department": "Psychology", "Weight": "164lbs", "Favorite Color": "Yellow", "Height": "5ft,10in", "Favorite Animal": "Mouse", "Email Address": "agatha.campbell@northwestern.edu", "Name": "Agatha N. Campbell"},...`


We can read back the information stored in the file into a variable:


In [None]:
with open('../Data/Day4-Dictionaries/records.json', 'r') as file_in:
    loaded_records = json.load(file_in)

In [None]:
print(type(all_records))
print(type(loaded_records))
print(all_records[1])
print(loaded_records[1])

Notice any changes?
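
One change worth checking for: JSON has arrays but no tuple type, so a tuple written with the `json` library comes back as a list. The sketch below reproduces this on a hypothetical record (invented for illustration), using `json.dumps` and `json.loads` on a string instead of a file.

```python
import json

original = {'Name': 'Ada Example', 'Date of Birth': (12, 10, 1985)}  # hypothetical
restored = json.loads(json.dumps(original))   # round trip through JSON text

print(type(original['Date of Birth']))
print(type(restored['Date of Birth']))

# Converting back to a tuple recovers the original value
print(tuple(restored['Date of Birth']) == original['Date of Birth'])
```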