

# Fall 2018 Status: Just in Time ok

# Dictionaries

Dictionaries work like the mental model you probably have of a dictionary. There are keys, which are the things that will be defined, and there are values, which are the definitions of the keys. Dictionaries in Python are a lot like lists in that they hold collections of data.  However, as we discussed with for loops, they have fundamentally different purposes. As a result, they act in very different ways.  

*Important Tips:* Don't try to understand dictionaries like you understand lists.  Yes, they each hold things, but that's where the similarities stop.

Perhaps the best way to consider dictionaries is as a small database. There are two pieces: unique IDs, and associated records. As noted in the first sentence, we call these keys and values.  

## Think of an apartment building

This type of data is all around us. In fact, we tend to sort things out this way when given a chance. Take, for example, an apartment. For ease of use, let's call it *your* apartment. 

Your apartment has a unique address.  First off, the street address tells you which unique building you want to find.  There may be several identical buildings, but each has a unique address.

Inside that apartment building there are unit IDs.  These may be letters (A, B, etc), labels (Garden, Penthouse, etc), or numbers (6, 303, etc).  Many apartment buildings share the same numbering systems, and this is fine because each apartment belongs to its own building.  So the combination of street address and unit number can uniquely identify a unit within the same city.

Inside these units you can hold whatever you want.  Usually household goods, food, pets, collectables, human bodies, etc.  The idea is that in order to access a unit, you must go into a building and enter a specific ID.  The unit IDs in that building must be unique.  

You also have to access things in order.  So you can't just directly go into an apartment number.  Even if that apartment door is outside, you still have to find the building.  Imagine saying, "Hi, I live in New York City and I live in apartment 3.  The party will be at 8pm."  Will anyone be at your party?  Not so much.

In this analogy, the building is our dictionary.  The street address is our variable name, and the unit number is our key.  You use the key with the variable name to access the contents of that unit.

That's pretty much it.

In [1]:
mybuilding = {1: ['cat', 'adult human', 'adult human',
                  'human child', 'rosy boa', "Dumeril's boa"],
              2: ['adult human'],
              3: ['adult human', 'adult human', 'human child', 'dog?']}

If I'm using this to represent my apartment, there are 3 units:  `1`, `2`, and `3`.  Each of these unit labels is actually my key.  See the : in there after each key?  That is separating the key and the value, and the order does matter.  It is always `key: value`.  Meanwhile, each of my values is a list that contains strings of the occupants.

As we start covering syntax patterns, I suggest that you write them down or make a small cheat sheet for yourself.  You'll likely want to look each one up every time you use it until you know them by heart.

# Common patterns that you will want to look up constantly

## Accessing values with a key

Here's how we can look up the contents:  `dictvariable[key]`

In [2]:
print(mybuilding[1])

['cat', 'adult human', 'adult human', 'human child', 'rosy boa', "Dumeril's boa"]


In [3]:
print(mybuilding[2])

['adult human']


In [4]:
print(mybuilding[3])

['adult human', 'adult human', 'human child', 'dog?']


And if I try to look up a key that isn't in there?  I get a key error.

In [5]:
print(mybuilding[4])

KeyError: 4
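
If you'd rather not crash on a missing key, dictionaries also have a `.get()` method.  Here's a minimal sketch of it, using a small stand-in dictionary (the name `building` and its contents are made up for this example, not the full one from above).

```python
# A small stand-in for the building dictionary (made up for this sketch)
building = {1: ['cat', 'adult human'], 2: ['adult human']}

# .get() returns None instead of raising a KeyError when the key is missing
missing = building.get(4)

# A second argument supplies a fallback value of your choosing
fallback = building.get(4, [])

print(missing)
print(fallback)
print(building.get(2))
```

Note that `.get()` only looks things up.  It doesn't add the missing key to the dictionary.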

Something to note is that values can be of any data type, and keys can be of any immutable type (strings, numbers, and tuples all work, but lists don't).  You can have a combination of them in the same dictionary.  You just need to match how that data type is actually typed in.  Don't forget the quotes if you have a string!

In [6]:
anotherbuilding = {'Garden': ['Adult human'],
                    2: ['Adult human', 'Adult human', 'cat']}
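
One caveat worth trying for yourself: keys need to be immutable things like strings, numbers, or tuples.  The sketch below uses a made-up dictionary called `units` to show a tuple working as a key and a list being refused.

```python
# Strings, numbers, and tuples are immutable, so they work as keys
units = {'Garden': ['adult human'], 2: ['cat'], ('Main St', 3): ['dog']}

# A list is mutable, so Python refuses to use it as a key
try:
    units[['Main St', 4]] = ['parrot']
    list_key_worked = True
except TypeError as error:
    list_key_worked = False
    print("Can't use a list as a key:", error)

print(units[('Main St', 3)])
```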

## Adding a new key/value pair

Say that I learn more about my neighbors and I want to add more things in.  I can add garages as well.

The syntax for adding a key/value looks like an assignment statement extending our lookup syntax.

`dictvariable[new_key] = new_value`

So you place the new key that you want to add in the `[]` and whatever the value is after the `=`.

In [7]:
mybuilding['garage 1'] = ['car', 'innumerable crap']

In [8]:
print(mybuilding)

{1: ['cat', 'adult human', 'adult human', 'human child', 'rosy boa', "Dumeril's boa"], 2: ['adult human'], 3: ['adult human', 'adult human', 'human child', 'dog?'], 'garage 1': ['car', 'innumerable crap']}


Take a closer look at that syntax.  It has the `[]` with the new key on the left side of the `=` statement and only 1 `=`.  So I don't need to completely reassign my dictionary to change it.

## Changing the value with a given key

Warning!  This syntax is also shared for the reassignment.  If you reuse a key that is in the dictionary, it will overwrite that value without warning.

In [9]:
mybuilding['garage 1'] = ['car', 'bike', 'bike', 'innumerable crap']

In [10]:
print(mybuilding)

{1: ['cat', 'adult human', 'adult human', 'human child', 'rosy boa', "Dumeril's boa"], 2: ['adult human'], 3: ['adult human', 'adult human', 'human child', 'dog?'], 'garage 1': ['car', 'bike', 'bike', 'innumerable crap']}


Alternatively, since the value for each key is in fact a mutable object (a list), we can directly reference that object using our access syntax and alter it.

Say that the person in apartment 2 has a baby and I want to add that to their record.  Since the data type of that value is a list, I can call `.append()` on that lookup statement.

In [11]:
mybuilding[2].append('human child')

In [12]:
print(mybuilding)

{1: ['cat', 'adult human', 'adult human', 'human child', 'rosy boa', "Dumeril's boa"], 2: ['adult human', 'human child'], 3: ['adult human', 'adult human', 'human child', 'dog?'], 'garage 1': ['car', 'bike', 'bike', 'innumerable crap']}


## Checking to see if a key is in there

You'll often be looping over some data and building up a dictionary in the process.  Oftentimes you'll want to handle things differently if that key is not already in the dictionary.  For example, let's say I want to add some new people to a unit not already in my dictionary.

In [13]:
mybuilding[5].append('adult human')

KeyError: 5

We can check this with the `in` keyword that we've seen elsewhere.  This will create a boolean expression that will return `True` if the key does exist in that dictionary, and `False` if it doesn't.

In [14]:
5 in mybuilding

False
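
Keep in mind that `in` only looks at the keys, not at what's stored inside the values.  Here's a small sketch of the difference, using a made-up stand-in dictionary called `building`.

```python
# A small stand-in dictionary (made up for this sketch)
building = {1: ['cat', 'adult human'], 'garage 1': ['car']}

key_found = 1 in building            # 1 is a key
string_key_found = 'garage 1' in building  # this string is a key
value_found = 'car' in building      # 'car' lives inside a value, not a key

print(key_found, string_key_found, value_found)

# To search the values, you have to look inside each one yourself
has_car = False
for contents in building.values():
    if 'car' in contents:
        has_car = True

print(has_car)
```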

I can't append something to a key that doesn't already exist, but if I use my assignment statement I'll destroy the data I already have in there.  Here's a fragment showing how we might handle such a situation.

In [15]:
new_member = 'adult human'
key = 5

if key in mybuilding:
    mybuilding[key].append(new_member)
else:
    mybuilding[key] = [new_member]

In [16]:
print(mybuilding)

{1: ['cat', 'adult human', 'adult human', 'human child', 'rosy boa', "Dumeril's boa"], 2: ['adult human', 'human child'], 3: ['adult human', 'adult human', 'human child', 'dog?'], 'garage 1': ['car', 'bike', 'bike', 'innumerable crap'], 5: ['adult human']}


This is an incredibly common pattern.  You'll also notice that I can reference my key as a variable.  Hint hint, this means you can loop over a set of keys and access the values in turn.
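
Because this pattern is so common, Python has a couple of shortcuts for it.  The sketch below shows two of them, `dict.setdefault()` and `collections.defaultdict`.  The `arrivals` data and the variable names are invented for this example.  The `if`/`else` version is still the one to learn first, since it shows exactly what is going on.

```python
from collections import defaultdict

# Hypothetical (unit, occupant) records to pile into a dictionary
arrivals = [(5, 'adult human'), (5, 'cat'), (6, 'dog')]

# Option 1: setdefault() stores an empty list the first time a key appears,
# then hands back whatever list is stored under that key
by_unit = {}
for unit, occupant in arrivals:
    by_unit.setdefault(unit, []).append(occupant)

# Option 2: defaultdict(list) creates the empty list for you on first access
by_unit_dd = defaultdict(list)
for unit, occupant in arrivals:
    by_unit_dd[unit].append(occupant)

print(by_unit)
print(dict(by_unit_dd))
```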

## Getting the keys and values out separately

There are several helpful dictionary methods to use.

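Warning!  Don't treat dictionaries as having a meaningful order.  Since Python 3.7 a dictionary remembers the order in which its keys were added, but it is never sorted for you, and older versions made no promise about order at all.  The following methods give you view objects (wrap them in `list()` to get a real list), and the order you see should not be depended on for anything beyond that.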

`mydict.keys()` will give you the keys and `mydict.values()` will give you the values.

In [17]:
print(list(mybuilding.keys()))

[1, 2, 3, 'garage 1', 5]


In [18]:
print(list(mybuilding.values()))

[['cat', 'adult human', 'adult human', 'human child', 'rosy boa', "Dumeril's boa"], ['adult human', 'human child'], ['adult human', 'adult human', 'human child', 'dog?'], ['car', 'bike', 'bike', 'innumerable crap'], ['adult human']]


I know they may look like they are in order, but that's just the order the keys were added in.  It isn't sorted, so you shouldn't depend on it.

## Getting the keys and values out together

Here's how you can get the pairs of data out such that the pair relationship can be reliably maintained.

In [19]:
print(list(mybuilding.items()))

[(1, ['cat', 'adult human', 'adult human', 'human child', 'rosy boa', "Dumeril's boa"]), (2, ['adult human', 'human child']), (3, ['adult human', 'adult human', 'human child', 'dog?']), ('garage 1', ['car', 'bike', 'bike', 'innumerable crap']), (5, ['adult human'])]


These are tuple pairs, and are valuable when looping over them.

## Looping over dictionaries

You may want to loop over the entire thing at once.  Using `.items()` for this is great for when you need both the key and value items out at the same time, and you want to keep your code very tidy.

Recall that `.items()` gives you the key and value pairs as tuples.  You'll be looping over these tuples and unpacking each one within the for loop.  This is why you see `key, value` in the loop variable spot on the for loop declaration line.

The key for the pair is stored in `key` and the value is stored in the `value` variable.  This means that you can directly reference them within the loop without having to look up anything else in the process.

``` python
for key, value in mydict.items():
    print(key, value)
```

This pattern will print out lines that contain the key and the value content for each of the key/value pairs in your dictionary.

In [20]:
for key, value in mybuilding.items():
    print("Unit", key, "has", ", ".join(value))

Unit 1 has cat, adult human, adult human, human child, rosy boa, Dumeril's boa
Unit 2 has adult human, human child
Unit 3 has adult human, adult human, human child, dog?
Unit garage 1 has car, bike, bike, innumerable crap
Unit 5 has adult human


However, sometimes you might want to mess with the keys and only look up a few things.  (massive homework hint) In this case, you can extract out the list of keys, do what you need with it, and then loop through those remaining (LITERALLY DYING OF HINT HERE) keys.

Say that I only want the integer value keys.

In [21]:
allkeys = mybuilding.keys()

justints = []

for key in allkeys:
    if type(key) == int:
        justints.append(key)
    else:
        print("rejected!", key)
        
print("Remaining keys are:", justints)

for key in justints:
    print(key, "has:", ", ".join(mybuilding[key]))

rejected! garage 1
Remaining keys are: [1, 2, 3, 5]
1 has: cat, adult human, adult human, human child, rosy boa, Dumeril's boa
2 has: adult human, human child
3 has: adult human, adult human, human child, dog?
5 has: adult human
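
Once that pattern feels comfortable, the same filter can be squeezed into one line with a list comprehension.  This is just a sketch of an alternative, using a made-up stand-in dictionary called `building`.  It uses `isinstance()` for the type check, which is the more common way to ask "is this an int?".

```python
# A small stand-in dictionary (made up for this sketch)
building = {1: ['cat'], 2: ['adult human'], 'garage 1': ['car'], 5: ['dog']}

# Looping over a dictionary directly gives you its keys
int_keys = [key for key in building if isinstance(key, int)]
print("Remaining keys are:", int_keys)

for key in int_keys:
    print(key, "has:", ", ".join(building[key]))
```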


## Counting occurrences

There are several things we can do to count occurrences, but I suggest using the `Counter` object first.

Let's revisit last week's example of randomly counting to 100.  Below we've got code that runs the simulation 10,000 times and collects up the number of steps taken for each.

In [22]:
import random

# priming all the values
count_sum = 0
top = 100
step_count = 0

all_step_counts = []

for i in range(10000):
    while count_sum < top:
        my_num =  random.randint(0, 10)

        count_sum += my_num # count_sum = count_sum + my_num
        step_count += 1 # step_count = step_count + 1
#     print(count_sum, step_count)
    all_step_counts.append(step_count)
    # reset!
    count_sum = 0
    step_count = 0

print("The average number of steps to take is:", sum(all_step_counts) / len(all_step_counts))

The average number of steps to take is: 20.5947


# Make a dictionary of the data

In [24]:
counts_dict = {} # make an empty dictionary

for one_count in all_step_counts:
    if one_count not in counts_dict:
        counts_dict[one_count] = 1 # start at one if it isn't already there
    else:
        counts_dict[one_count] += 1 # increment up by one if it is already there

In [25]:
print(counts_dict)

{24: 596, 17: 753, 22: 1156, 27: 158, 18: 1080, 20: 1385, 23: 834, 19: 1274, 21: 1314, 15: 175, 26: 251, 16: 390, 25: 418, 30: 19, 29: 31, 28: 83, 14: 64, 31: 7, 34: 2, 13: 6, 32: 3, 33: 1}
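
The same tally can be written without the `if`/`else` by leaning on `.get()` with a default of 0.  Here's a sketch of that, with a short made-up list standing in for `all_step_counts`.

```python
step_counts = [20, 21, 20, 19, 20, 21]  # made-up stand-in data

tally = {}
for one_count in step_counts:
    # .get() returns the current tally, or 0 if we haven't seen this value yet
    tally[one_count] = tally.get(one_count, 0) + 1

print(tally)
```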


# Sorting

See how that previous one is in kind of a random order?  We might want to sort it by things.

Sorting an unordered collection by key or by value while retaining the pair information along the way is rough.

It's like sorting one column in a spreadsheet, but oops, you forgot to tell it to move the rest of the columns along with the rows you are sorting.  

## Sorting by key (the easy one)

Totally not a problem!  That's the advantage.  You can extract all the keys out, mess around with the order, and then use that content to re-look up the values.

There are three steps:

1. Get the keys out
    * Use `list(the_dict.keys())` to make them a list
2. Sort the keys however you want
    * Use your usual list methods for this. (page 345 for the list of list methods)
    * `.sort()` works on your original list in place, so it will change it!
    * There's no assignment statement, because `.sort()` returns `None`.
3. Loop through the new list, using the elements you're getting out to look up the values out of the dictionary.

You'll have to decide what to do about the things you're getting out.  As a start, you'll want to print out the pairs.


In [27]:
count_keys = list(counts_dict.keys()) # step 1, get the keys

count_keys.sort() # step 2, sort it

for one_count_key in count_keys:
    print(one_count_key, "appeared", counts_dict[one_count_key], "times")

13 appeared 6 times
14 appeared 64 times
15 appeared 175 times
16 appeared 390 times
17 appeared 753 times
18 appeared 1080 times
19 appeared 1274 times
20 appeared 1385 times
21 appeared 1314 times
22 appeared 1156 times
23 appeared 834 times
24 appeared 596 times
25 appeared 418 times
26 appeared 251 times
27 appeared 158 times
28 appeared 83 times
29 appeared 31 times
30 appeared 19 times
31 appeared 7 times
32 appeared 3 times
33 appeared 1 times
34 appeared 2 times
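
If you don't want to change a list in place, the built-in `sorted()` function is another option.  Here's a sketch using just a few of the pairs from the output above (the variable names are made up for this example).

```python
counts = {24: 596, 17: 753, 22: 1156, 13: 6}  # a few pairs from the output above

# sorted() returns a NEW sorted list of the keys; the dictionary is untouched
for key in sorted(counts):
    print(key, "appeared", counts[key], "times")

# reverse=True flips the order
largest_first = sorted(counts, reverse=True)
print(largest_first)
```

Unlike `.sort()`, `sorted()` does hand something back, so you can assign its result or loop over it directly.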


## Sort by values, the one that sucks

So this isn't even a dictionary thing, this is kind of a hack for lists that happens to be useful for dicts.

In [29]:
def byFreq(pair):
    return pair[1] # get the second item out, which is the value

pair_list = list(counts_dict.items()) # get all the item pairs

pair_list.sort(key = byFreq)

for pair in pair_list:
    print(pair[0], "appeared", pair[1], "times") 
    # this is a list! so I'm using positions, even though ya it totes looks like a key lookup

33 appeared 1 times
34 appeared 2 times
32 appeared 3 times
13 appeared 6 times
31 appeared 7 times
30 appeared 19 times
29 appeared 31 times
14 appeared 64 times
28 appeared 83 times
27 appeared 158 times
15 appeared 175 times
26 appeared 251 times
16 appeared 390 times
25 appeared 418 times
24 appeared 596 times
17 appeared 753 times
23 appeared 834 times
18 appeared 1080 times
22 appeared 1156 times
19 appeared 1274 times
21 appeared 1314 times
20 appeared 1385 times
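
Usually you want the biggest counts first.  Here's a sketch of the same idea using `sorted()` with `reverse=True`, and unpacking each pair in the loop so there are no `[0]` and `[1]` positions to squint at.  It uses a few pairs from the output above, and the names `by_value` and `most_first` are made up for this example.

```python
counts = {24: 596, 17: 753, 22: 1156, 13: 6}  # a few pairs from the output above

def by_value(pair):
    return pair[1]  # the second item in each (key, value) pair is the value

# sort the pairs by value, largest first, without changing the dictionary
most_first = sorted(counts.items(), key=by_value, reverse=True)

for key, value in most_first:
    print(key, "appeared", value, "times")
```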





Recall that `all_step_counts` is a list of 10,000 data points, and we want to count them up.  We built that tally by hand above, but we can use `Counter` to do just this.

First we'll need to import it to bring it into our namespace, then we can use it.

In [23]:
from collections import Counter

print(Counter("hello howdy"))
print(Counter("hello howdy").most_common(2))

Counter({'h': 2, 'l': 2, 'o': 2, 'e': 1, ' ': 1, 'w': 1, 'd': 1, 'y': 1})
[('h', 2), ('l', 2)]


This tool takes a sequence (string, list, etc) and counts how many times each item occurs.  A `Counter` is a special kind of dictionary with a few extra methods, so the dictionary patterns we've covered work on it too.  Let's apply it to our results.

In [24]:
countedsteps = Counter(all_step_counts)
print(countedsteps)

Counter({20: 1365, 19: 1347, 21: 1331, 22: 1117, 18: 1056, 23: 890, 17: 737, 24: 624, 25: 390, 16: 384, 26: 230, 15: 184, 27: 134, 28: 71, 29: 49, 14: 41, 30: 18, 13: 15, 31: 13, 32: 2, 34: 1, 33: 1})
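
One place where a `Counter` behaves differently from a plain dictionary is missing keys.  Here's a quick sketch with the same short string as before (the variable names are made up for this example).

```python
from collections import Counter

letters = Counter("hello howdy")

h_count = letters['h']   # 'h' appears twice
z_count = letters['z']   # never counted, so we get 0 instead of a KeyError

print(h_count, z_count)
print('z' in letters)             # looking it up did not add it
print(isinstance(letters, dict))  # a Counter is still a dictionary underneath
```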


In [25]:
# here are the unique values

counts = list(countedsteps.keys()) 

# now I can sort them

counts.sort()

print(counts)

[13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]


In [26]:
# now I can loop in order

for step in counts:
    print(step, ":", countedsteps[step])

13 : 15
14 : 41
15 : 184
16 : 384
17 : 737
18 : 1056
19 : 1347
20 : 1365
21 : 1331
22 : 1117
23 : 890
24 : 624
25 : 390
26 : 230
27 : 134
28 : 71
29 : 49
30 : 18
31 : 13
32 : 2
33 : 1
34 : 1


In [27]:
# now we can do a janky histogram

for step in counts:
    print(step, ":", (countedsteps[step] // 15) * 'X')

13 : X
14 : XX
15 : XXXXXXXXXXXX
16 : XXXXXXXXXXXXXXXXXXXXXXXXX
17 : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
18 : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
19 : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
20 : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
21 : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
22 : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
23 : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
24 : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
25 : XXXXXXXXXXXXXXXXXXXXXXXXXX
26 : XXXXXXXXXXXXXXX
27 : XXXXXXXX
28 : XXXX
29 : XXX
30 : X
31 : 
32 : 
33 : 
34 : 
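
Notice that the rare values end up with no bar at all, because dividing by 15 rounds them down to zero.  One possible tweak, sketched here with a few pairs from the output above, is to scale every bar against the biggest count and give anything nonzero at least one `X`.  The `width` of 40 is an arbitrary choice for this example.

```python
counts = {13: 15, 14: 41, 20: 1365, 32: 2}  # a few pairs from the output above

width = 40                       # longest bar we want to print (arbitrary)
biggest = max(counts.values())   # the count that gets the full-width bar

bars = {}
for step in sorted(counts):
    # scale to the widest bar, but give every nonzero count at least one X
    length = max(1, round(counts[step] / biggest * width))
    bars[step] = 'X' * length
    print(step, ":", bars[step])
```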


Likewise, we can filter things on the fly.  Say that we only want to see the values for steps 13 through 20.

In [28]:
for step in counts:
    if step >= 13 and step <= 20:
        print(step, ':', countedsteps[step])

13 : 15
14 : 41
15 : 184
16 : 384
17 : 737
18 : 1056
19 : 1347
20 : 1365
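
Python also lets you chain the two comparisons together, which reads more like the math.  A small sketch, using a few pairs from the output above in a plain dictionary named `counted` (a made-up name for this example):

```python
counted = {13: 15, 14: 41, 19: 1347, 20: 1365, 21: 1331}  # a few pairs from the output above

# 13 <= step <= 20 means the same thing as step >= 13 and step <= 20
kept = []
for step in sorted(counted):
    if 13 <= step <= 20:
        kept.append(step)
        print(step, ':', counted[step])
```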


However, if we want to see the most common things, we don't need to do anything fancy.  The `Counter` object comes with a `most_common()` method.  Called with no argument, it gives you every item from most to least common; called as `most_common(10)`, it gives you only the first 10.  Be careful here, because things get weird when you have repeated counts.  `most_common(10)` will give you the 10 most common items, not all of the items that share the 10 highest counts.

In [29]:
countedsteps.most_common()

[(20, 1365),
 (19, 1347),
 (21, 1331),
 (22, 1117),
 (18, 1056),
 (23, 890),
 (17, 737),
 (24, 624),
 (25, 390),
 (16, 384),
 (26, 230),
 (15, 184),
 (27, 134),
 (28, 71),
 (29, 49),
 (14, 41),
 (30, 18),
 (13, 15),
 (31, 13),
 (32, 2),
 (34, 1),
 (33, 1)]
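
Here's a tiny sketch of that tie problem, using a made-up string where three letters share the top count.  Asking for the top 2 cuts one of them off, so the second half shows one way to grab everything tied for first.

```python
from collections import Counter

letters = Counter("aabbccd")   # a, b, and c are tied with 2 each

top_two = letters.most_common(2)
print(top_two)   # only two items, even though three letters share the top count

# To get everything tied for the highest count, compare against that count
highest = letters.most_common(1)[0][1]
tied_for_first = [item for item, count in letters.items() if count == highest]
print(tied_for_first)
```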

Here's a little program we can run to actually grab the items with the top 6 frequencies.  We're doing this by way of flipping the dictionary around.  Instead of having each unique value with its count, we want each unique count with all of the values that have it.  

In [30]:
s = Counter("hello howdy how are you fine day more letters here hahahaha")
print(s.most_common())

tops = {}

howmany = 6

# most_common() is ordered from highest to lowest count,
# so items with the same count sit next to each other
for content, count in s.most_common():
    if count in tops:
        tops[count].append(content)
    elif len(tops) < howmany:
        tops[count] = [content]
    else:
        break # a new, lower count showed up and we already have enough

print(tops)

[(' ', 10), ('h', 8), ('e', 8), ('a', 6), ('o', 5), ('r', 4), ('l', 3), ('y', 3), ('w', 2), ('d', 2), ('t', 2), ('u', 1), ('f', 1), ('i', 1), ('n', 1), ('m', 1), ('s', 1)]
{10: [' '], 8: ['h', 'e'], 6: ['a'], 5: ['o'], 4: ['r'], 3: ['l', 'y']}


# More crappy state data

# heeeey I didn't finish this part 👍

Sometimes you'll need to create your own dictionary of things before you can get into the fun stuff. We're going to play with the data that we collected to create a dictionary of what we found, so that we have the states as our keys and the cities as the values.

Remember, this is the code that we previously did.  We could add in the dictionary stuff into this code, but we don't have to.  Instead, we can directly use the list that we create to make the dictionary.

In [1]:
# read the whole file; the with block closes it for us when we're done
with open('crappystatedata.txt', 'r') as fileio:
    text = fileio.read()

lines = text.split("\n")

allchunks = []
statechunk = []

foundfirst = False

for line in lines:
    if line.endswith("[edit]") and not foundfirst:
#         print("This is the first state!", line)
        statechunk.append(line)
        foundfirst = True
    elif line.endswith("[edit]"):
#         print("a new state has begun!", line)
        # add completed state
        allchunks.append(statechunk)
        # reset the chunk
        statechunk = [line]
    else:
        statechunk.append(line)

# the last state never sees another "[edit]" line, so add it after the loop
allchunks.append(statechunk)
        

This will be another accumulator pattern, and `{}` represents an empty dictionary.  Remember that we can't add something to a dictionary that doesn't exist yet, and we can't create the dictionary inside our loop, because it'll be erased each time the loop starts over.  So as we've done elsewhere, we need to create our dictionary outside of and before our for loop through the data.

Remember the structure of our `allchunks` list.  This is a list of lists, where the first element of each inner list is the state name and the remaining elements are all the cities.

Let's remember our list position lookups.

* "first element of" is `list[0]`
* "everything else (after the first)" is `list[1:]` We're omitting the stop position because there are a variable number of values in each, so when we don't provide a stop position it says "go until the end".
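
Here's a quick sketch of those two lookups on one chunk.  The state and city names are placeholders invented for this example, shaped like the real data.

```python
# One made-up chunk: the state line first, then its cities
statechunk = ['State A[edit]', 'City 1', 'City 2', 'City 3']

first = statechunk[0]    # just the first element
rest = statechunk[1:]    # everything after the first, however many there are

print(first)
print(rest)
```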

The state will be our key and the cities will be our value, so let's remind ourselves of the assignment syntax.

`dict[state] = cities`

In [32]:
collegecities = {}

for statechunk in allchunks:
    state = statechunk[0]
    cities = statechunk[1:]
    collegecities[state] = cities

This is great!  Except we've got some junk happening in the state name with the edit.  We can go back in and edit that to make it more readable.

In [34]:
collegecities = {}

for statechunk in allchunks:
    state = statechunk[0].replace("[edit]", "")
    cities = statechunk[1:]
    collegecities[state] = cities


In [35]:
collegecities.keys()

dict_keys(['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 'District of Columbia', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming'])
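
Once the dictionary is built, all of the earlier patterns apply to it.  As a sketch, here's the same build step on a tiny made-up `allchunks` (placeholder state and city names, not the real data), followed by a sorted loop over the keys that reports how many cities each state has.

```python
# A tiny made-up stand-in for allchunks, deliberately out of order
allchunks = [['State B[edit]', 'City 3'],
             ['State A[edit]', 'City 1', 'City 2']]

collegecities = {}

for statechunk in allchunks:
    state = statechunk[0].replace("[edit]", "")
    cities = statechunk[1:]
    collegecities[state] = cities

# sort the keys, then use each one to look up its value
for state in sorted(collegecities):
    print(state, "has", len(collegecities[state]), "cities")
```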