# L13 - Dictionaries
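Let's start with an example that motivates the use of a dictionary. We have a text file, hanks.txt, that lists many different actors with the last name Hanks along with the movies they have been in. Each line holds the actor's name, the movie title, and the year, separated by | characters, as in the two sample lines below. We want to count how many movies each Hanks has been in.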

     Hanks, Jim                          | Goofyfoot                         |            2010
     Hanks, Colin                        | High School                       |            2010

In [None]:
all_names = []
for line in open('hanks.txt', encoding = "ISO-8859-1"):
    words = line.strip().split('|')
    name = words[0].strip()
    all_names.append(name)
    
all_names.sort()
counts = []
for i in range(len(all_names)):
    if i == 0:
        counts.append([all_names[i], 1])
    elif all_names[i] == all_names[i-1]:
        counts[-1][1] += 1
    else:
        counts.append([all_names[i], 1])
        
print(counts)

This works with lists, but it is tedious and inefficient. We have to go through the entire text file to collect all the names, sort them, and then go back through the sorted list while keeping track of the counts list. There must be a better way.
# Introducing Dictionaries
A dictionary is a type that maps keys to values: each key is associated with one value. Keys are commonly strings or numbers, but values can be anything you want.

In [None]:
weights = dict() # create empty dictionary
weights = {} # also creates an empty dictionary
weights['Chihuahua'] = 5.2
weights['German Shepherd'] = 62.1
weights['Cockapoo'] = 15.2
weights['Husky'] = 49.8

print(weights)

You can add keys and values to a dictionary using square brackets, []. Inside the brackets you put the key, and on the right side of the assignment operator you put the value. If the key does not exist in the dictionary yet, it will be created. If it already exists, its value will be overwritten. Notice the format of the printed output: you can also define a dictionary directly using that same curly-brace format.

In [None]:
weights = {
    'Chihuahua' : 5.2,
    'German Shepherd' : 62.1,
    'Cockapoo' : 15.2,
    'Husky' : 49.8
}
print(weights)
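
Here is a small sketch of the create-versus-overwrite behavior described above. The 'Beagle' weight and the new 'Husky' weight are made-up values for illustration only.

```python
# A small stand-alone dictionary (breed weights; new values below are made up)
weights = {'Chihuahua': 5.2, 'Husky': 49.8}

weights['Husky'] = 51.0    # key already exists: the old value 49.8 is overwritten
weights['Beagle'] = 22.0   # key is new: a new key-value pair is created

print(weights)       # {'Chihuahua': 5.2, 'Husky': 51.0, 'Beagle': 22.0}
print(len(weights))  # 3
```

Notice that overwriting 'Husky' did not add a second 'Husky' entry: each key appears in a dictionary at most once.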

If you are unsure whether something exists in a dictionary, you can use the 'in' keyword to check in a couple of different ways.

In [None]:
print('Chihuahua' in weights)
print('Lab' in weights)
print(list(weights.keys()))
print('Cockapoo' in weights.keys())

You can also check to see if certain values exist in the dictionary.

In [None]:
print(15.2 in weights)
print(15.2 in weights.values())
print(list(weights.values()))

Note that by default, using 'in' on a dictionary checks the keys, not the values.

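Another thing to notice is that you don't access dictionary items by position. To extract a value, you use its key, so the ordering of the items doesn't matter for lookups. (Since Python 3.7, dictionaries do remember the order in which keys were inserted, which is why the print output above matches the order we added the breeds, but you still can't write weights[0] to get the first item.) In that sense, the keys of a dictionary behave like a set: each key appears only once and is looked up by membership rather than by position.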
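
To make the set comparison concrete, here's a sketch using the weights dictionary from above. The small_dogs set is a hypothetical collection of breed names; the point is that weights.keys() supports set operations such as & (intersection) and - (difference) directly.

```python
weights = {'Chihuahua': 5.2, 'German Shepherd': 62.1, 'Cockapoo': 15.2, 'Husky': 49.8}
small_dogs = {'Chihuahua', 'Cockapoo', 'Pug'}   # hypothetical set of breed names

# weights.keys() behaves like a set of the keys
both = weights.keys() & small_dogs        # keys that are also in small_dogs
only_weights = weights.keys() - small_dogs  # keys that are not in small_dogs

# sorted() gives a predictable order for printing
print(sorted(both))          # ['Chihuahua', 'Cockapoo']
print(sorted(only_weights))  # ['German Shepherd', 'Husky']
print(set(weights) == set(weights.keys()))  # True: iterating a dict gives its keys
```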

So how does this help with counting all the different Hanks? You can use the actors' names as keys and their movie total as values. Look how much easier it is with dictionaries.

In [None]:
actors = dict()
for line in open('hanks.txt', encoding = "ISO-8859-1"):
    words = line.strip().split('|')
    name = words[0].strip()
    if name in actors.keys():
        actors[name] += 1
    else:
        actors[name] = 1
        
for actor, num in actors.items():
    print("{:<18} has been in {:>2} movie(s).".format(actor, num))

All you need to do is check whether the name is already a key. If it is, you increase that actor's count; if not, you give them a starting count of 1. Also notice how we iterated through the dictionary using the items() method. This is the most common way to loop over a dictionary when you need both the keys and the values: you give two loop variables, and on every iteration the first takes on a new key and the second takes on the corresponding value. You can get the same output by iterating over just the keys and looking up each value in the dictionary where you need it.

In [None]:
for actor in actors.keys():
    print("{:<18} has been in {:>2} movie(s).".format(actor, actors[actor]))
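
The check-then-increment pattern can also be written with the get() method, which returns a default value when the key is missing. Since hanks.txt isn't available here, this sketch counts a hypothetical list of names instead. It also shows collections.Counter, a standard-library class that does this kind of counting for you; it's a swapped-in alternative, not what the cells above use.

```python
from collections import Counter

# Hypothetical names standing in for the names read from hanks.txt
names = ['Husky', 'Cockapoo', 'Husky', 'Chihuahua', 'Husky']

counts = {}
for name in names:
    # get() returns the current count, or 0 if name is not a key yet
    counts[name] = counts.get(name, 0) + 1

for name, num in counts.items():
    print("{:<10} appears {} time(s).".format(name, num))

# Counter builds the same counts in one step
print(dict(Counter(names)) == counts)  # True
```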

# More specifications
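Dictionary keys must be immutable (more precisely, hashable). Strings, ints, floats, booleans, and tuples all work as keys, although floats and booleans are rarely used, and a tuple only works if everything inside it is immutable too. Mutable types such as lists, sets, and dictionaries cannot be keys.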
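
Here's a quick way to check which types are allowed as keys. The helper can_be_key() is hypothetical, written just for this sketch: it tries to build a one-item dictionary and reports whether Python raised a TypeError.

```python
def can_be_key(candidate):
    """Return True if candidate can be used as a dictionary key."""
    try:
        {candidate: None}  # building a one-item dict requires hashing the key
        return True
    except TypeError:      # raised for unhashable (mutable) keys
        return False

candidates = ['Husky', 42, 3.5, True, (1, 2), (1, [2]), [1, 2], {1, 2}, {'a': 1}]
for candidate in candidates:
    print(repr(candidate), can_be_key(candidate))
```

The tuple (1, [2]) fails even though it's a tuple, because the list inside it is mutable.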

Values, however, can be literally anything. This means they can even be lists or other dictionaries. Let's see how this works with the Hanks example.

In [None]:
actors = dict()
for line in open('hanks.txt', encoding = "ISO-8859-1"):
    words = line.strip().split('|')
    name = words[0].strip()
    if name in actors.keys():
        actors[name]['titles'].add((words[1].strip(), int(words[2].strip())))
        actors[name]['count'] += 1
    else:
        actors[name] = {
            'titles' : {(words[1].strip(), int(words[2].strip()))},
            'count' : 1
        }
        
for actor, details in actors.items():
    print("{} has been in {:>2} movie(s):".format(actor, details['count']))
    for movie, year in details['titles']:
        print('\t{} in {}'.format(movie, year))

You may be picking up on the theme of nesting things by now, and here it is again. You can nest dictionaries as deeply as your computer's memory allows. As with nested lists, every layer requires another index. Look at line 6 of the cell above, actors[name]['titles'].add(...). We want to update the current actor's personal dictionary, so the first pair of brackets, actors[name], returns that dictionary. The second pair, ['titles'], then accesses the 'titles' key inside the actor's personal dictionary. The next line works the same way, except now we access the 'count' key in that actor's personal dictionary.

When we print out our findings, notice that there are now nested loops. The outer loop grabs the key-value pairs from the big dictionary of actors, where the key is the actor's name and the value is that actor's personal dictionary, which we call details. To get the actor's movie count, we look it up in details. To print all the movie titles for that actor, we access the set of movies and years using details['titles']. The inner loop uses two looping variables because each item in that set is a tuple (movie, year).
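
Here's a sketch of the same nested structure built from the two sample lines at the top of this lesson, so it runs without hanks.txt. Instead of the if/else in the cell above, it uses the dictionary method setdefault(), which returns the value for a key if the key exists, and otherwise inserts the given default and returns that.

```python
# The two sample lines shown at the top of this lesson
lines = [
    "     Hanks, Jim                          | Goofyfoot                         |            2010",
    "     Hanks, Colin                        | High School                       |            2010",
]

actors = {}
for line in lines:
    name, title, year = [part.strip() for part in line.split('|')]
    # Get this actor's personal dictionary, creating it on first sight
    details = actors.setdefault(name, {'titles': set(), 'count': 0})
    details['titles'].add((title, int(year)))
    details['count'] += 1

# Two levels of keys: the actor's name, then 'titles' or 'count'
print(actors['Hanks, Jim']['count'])     # 1
print(actors['Hanks, Colin']['titles'])  # {('High School', 2010)}
```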
# Removing Values from Sets and Dictionaries
There are two common ways to remove elements from sets:

- The discard() method removes the specified element and does nothing if it isn't there.
- The pop() method removes an arbitrary element from the set and returns it (it raises a KeyError if the set is empty).

The del statement removes a key (and with it, its value) from a dictionary.


The clear() method works on both and removes everything.

In [None]:
s = {1, 4, 8, 'dog', 'cat'}
s.discard('cat')
print(s)

In [None]:
s.discard('cat')  # 'cat' is already gone, so this does nothing
removed = s.pop()
print('Removed', removed)
print(s)
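
Here is a short sketch of the edge cases for sets. It also uses remove(), another set method that works like discard() except that it raises a KeyError when the element is missing.

```python
s = {1, 4, 8, 'dog'}

s.discard('cat')      # 'cat' isn't there: discard() silently does nothing
try:
    s.remove('cat')   # remove() raises KeyError for a missing element
except KeyError:
    print("remove('cat') raised KeyError")

s.clear()             # clear() empties the set
print(s)              # set()

try:
    s.pop()           # pop() on an empty set raises KeyError
except KeyError:
    print('pop() on an empty set raised KeyError')
```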

In [None]:
print(weights)
del weights['Chihuahua']
print(weights)

Yes, del has unusual syntax. That's because it's a statement, like return or import, rather than a function or method. That's just the way it is.
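
Dictionaries have similar edge cases. This sketch shows that del raises a KeyError for a missing key, and uses the dictionary's pop() method, which removes a key and returns its value (and accepts a default so that a missing key doesn't raise an error). It finishes with clear() on a dictionary.

```python
weights = {'German Shepherd': 62.1, 'Cockapoo': 15.2, 'Husky': 49.8}

del weights['Husky']          # removes the key and its value
try:
    del weights['Lab']        # del on a missing key raises KeyError
except KeyError:
    print("'Lab' is not a key")

# dict.pop() removes a key and returns its value; the default avoids a KeyError
removed = weights.pop('Cockapoo')
missing = weights.pop('Lab', None)
print(removed, missing)       # 15.2 None

weights.clear()               # clear() works on dictionaries too
print(weights)                # {}
```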

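Another useful dictionary method is update(). It merges another dictionary into the one you call it on, adding any new keys and overwriting the values of keys that already exist.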

In [None]:
grades_1 = {'Math':98, 'Science':97}
grades_2 = {'English':80}
grades_1.update(grades_2)
print(grades_1)
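
To see the overwriting side of update(), here's a sketch where the two dictionaries share a key. The grades_3 values are made up for illustration.

```python
grades_1 = {'Math': 98, 'Science': 97}
grades_3 = {'Science': 90, 'History': 88}  # hypothetical second set of grades

grades_1.update(grades_3)  # 'Science' already exists, so its value is replaced
print(grades_1)            # {'Math': 98, 'Science': 90, 'History': 88}

# update() changes grades_1 in place and returns None
print(grades_1.update({}))  # None
```

Because update() returns None, writing grades_1 = grades_1.update(grades_3) would replace your dictionary with None.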

Lastly, remember aliasing? It turns out that it happens with dictionaries too, which makes sense since they can get pretty big. Knowing that dictionaries get aliased, what do you think the output of this code is?

In [None]:
d = dict()
d[15] = 'hi' 
L = []
L.append(d) 
d[20] = 'bye' 
L.append(d.copy()) 
d[15] = 'hello' 
del d[20] 
print(L) 

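Kind of a weird result, right? The first element of L is not a copy: it is d itself, so it reflects every change we made to d afterwards and ends up as {15: 'hello'}. The second element is a copy made when d was {15: 'hi', 20: 'bye'}, so later changes to d don't affect it. Keep this in mind when storing dictionaries in containers: use the copy() method if you want to preserve a dictionary as it is at that moment.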
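
One more detail: copy() makes a shallow copy. The outer dictionary is new, but any lists or dictionaries stored inside it (like the nested values in the Hanks example) are still shared with the original. The standard library's copy.deepcopy() copies those inner objects too. The movie titles below are placeholders.

```python
import copy

actor = {'count': 1, 'titles': ['Movie A']}  # placeholder data

shallow = actor.copy()       # new outer dict, but the inner list is shared
deep = copy.deepcopy(actor)  # new outer dict and a new inner list

actor['count'] = 2                 # top-level change: neither copy sees it
actor['titles'].append('Movie B')  # change inside the shared list

print(shallow)  # {'count': 1, 'titles': ['Movie A', 'Movie B']}
print(deep)     # {'count': 1, 'titles': ['Movie A']}
```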