<img src="https://github.com/christopherhuntley/BUAN5405-docs/blob/master/Slides/img/Dolan.png?raw=true" width="180px" align="right">

# Lesson 9: Dictionaries
_Associative arrays by another name_

# Learning Objectives

## Theory / Be able to explain ...
- The purpose and usage of associative arrays
- Python dictionaries as associative arrays
- Hashing and its implications for dictionary keys

## Skills / Know how to  ...
- Display the hash for any dictionary key
- Iterate over dictionary items, keys, and values 
- Generate dictionaries from keys and values
- Use a dictionary comprehension for efficient dictionary generation

**What follows is adapted from Chapter 9 of the _Python For Everybody_ book. If you have not read it, then please do so before continuing on.**

---

In [None]:
#@title Lesson 9 Introduction
%%html
<div style="max-width: 1000px">
  <div style="position: relative;padding-bottom: 56.25%;height: 0;">
    <iframe style="position: absolute;top: 0;left: 0;width: 100%;height: 100%;" rel="0" modestbranding="1"  src="https://www.youtube.com/embed/9-P5eWyZCfQ" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
  </div>
</div>


## The Magic of Associative Arrays
>"A very little key will open a very heavy door." -― Charles Dickens, _Hunted Down_

After so many years of programming in C, I found myself using it for basically everything, until one day in 1994 I was asked by a very wise boss to try [AWK](https://en.wikipedia.org/wiki/AWK). AWK was a text processing language developed at Bell Labs in the 1970s by the same team that created C and Unix. It was designed to be a tiny domain-specific language for working with streaming text data. One would feed data to an AWK script one line at a time. AWK would then output text to an output file, also one line at a time. It could, of course, remember things from one line to the next, allowing it to accumulate information along the way. 

Soon I was using AWK for lots of text processing tasks. One notable application was to translate mainframe data into SQL code for loading into a relational database. Data would come in one line at a time and then right into the database. I think I got at least one promotion from just this one parlor trick. A year or so later, in late 1995 or early 1996, I used the same trick to develop a dashboarding web app that was cobbled together with AWK and bash scripts. No Perl. No Python. No PHP. No Java. Just AWK and bash on a Unix command line. I am still amazed that it worked but we never had a crash or any other bug reported.   

One of the reasons why I loved AWK so much was a feature called "associative arrays," where we could index a variable-length array with text **keys** instead of integers. We could even mix keys with integer indexes if we liked. This meant that, for example, I could have an array of birthdays indexed by people's names. Or vice versa if that was what I wanted. Or I could create a histogram for words in a file with two lines of code. The potential uses seemed endless. Nothing could have been more convenient for a wannabe smart and lazy programmer. 

The Python equivalent of an associative array is a **dictionary**. It does many of the same things as a list but with keys instead of positions. As with associative arrays, there is an endless array of uses. If you have ever pulled data from a web API or added a Series to a DataFrame then you have used something like a dictionary. It's just how it's done. 

---
## Dictionaries as Collections of Key-Value Pairs
Python dictionaries have the type `dict`. Here's a brief example, followed by a few notes.

In [None]:
birthdays = {'Washington':'1732-02-22','Jefferson':'1743-04-13','Lincoln':'1809-02-12'}
birthdays['Madison']='1751-03-16'
for president in birthdays:
    print(president,"was born", birthdays[president])

Washington was born 1732-02-22
Jefferson was born 1743-04-13
Lincoln was born 1809-02-12
Madison was born 1751-03-16


- `dict` literals work like `list` literals except they use curly brackets `{}` instead of square brackets `[]`.
- `dict` indexes use **keys** of any **Hashable** type (more about this in a minute) instead of just integers. 
- The bracket operator `[]` is used for retrieving specific values, just as with a list. 
- Dictionaries are mutable. We can add or remove key-value pairs as needed. The `+=` operator doesn't work though.  

In [None]:
birthdays += {'Adams Sr.':'1735-10-30'}

TypeError: unsupported operand type(s) for +=: 'dict' and 'dict'
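
Since `+=` is not defined for dictionaries, merging is usually done with the `update()` method, and removal with `pop()` or `del`. Here is a minimal sketch of both. The names `sample` and `more` are made up for this illustration so that the lesson's `birthdays` dictionary is left untouched.

```python
# A stand-alone dictionary so this cell does not disturb `birthdays`
sample = {'Washington': '1732-02-22', 'Jefferson': '1743-04-13'}

# Hypothetical extra entries to merge in
more = {'Adams Sr.': '1735-10-30', 'Madison': '1751-03-16'}

sample.update(more)                 # merge in place (what += was attempting)
print(len(sample))                  # 4

removed = sample.pop('Madison')     # remove a key and get its value back
del sample['Adams Sr.']             # remove a key without returning anything
print(removed)                      # 1751-03-16
print('Madison' in sample)          # False
```

On Python 3.9 and later, `sample |= more` performs the same in-place merge as `update()`.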

### Hashing
To ensure data integrity, dictionary keys are required to be:
- **Unique**: If two items have the same key, then how do we know which is which?
- **Immutable**: If we can change the value of a key (e.g., via aliasing) then how does the dictionary let everybody know about it?
- **Printable**: If not printable/visible, then how can we humans use them safely? 

When passed an object, a **hashing** function generates a _printable_ **hash** or **digest** value that is _almost_ guaranteed to be unique. The odds of a "collision" (i.e., two different objects with the same hash) are very, very, very remote. Further, if the object being hashed is itself immutable then we have met all three requirements for dictionary keys:

1. Each key has a unique hash. If two keys are the same then they generate the same hash.
2. Because the key is required to be immutable, then so is the hash.
3. Hashes are printable as (typically) very long strings of characters or digits. So, even if the key itself isn't printable, its hash is. 

Besides its obvious integrity advantages, hashing of keys is also highly efficient. Since Python hashes are integers, a dictionary can use a key's hash to jump almost directly to the place where the value is stored, without scanning the other keys. That makes using a key to look up a value in a dictionary about as efficient as using an integer index to look up a value in a list. (Ever used a primary key or index to speed up a SQL query? It's much the same idea.)
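
To make the efficiency point concrete, here is a toy sketch of one way a hash can pick a storage slot. It is an illustration only and not how CPython actually lays out a `dict`; the names `N_SLOTS`, `slots`, `toy_put`, and `toy_get` are invented for this example.

```python
N_SLOTS = 8                               # hypothetical, fixed table size
slots = [[] for _ in range(N_SLOTS)]      # each slot holds (key, value) pairs

def toy_put(key, value):
    """Store value under key in the slot chosen by the key's hash."""
    bucket = slots[hash(key) % N_SLOTS]
    for i, (k, _) in enumerate(bucket):
        if k == key:                      # key already present: overwrite it
            bucket[i] = (key, value)
            return
    bucket.append((key, value))

def toy_get(key):
    """Search only the one slot that the key's hash points to."""
    for k, v in slots[hash(key) % N_SLOTS]:
        if k == key:
            return v
    raise KeyError(key)

toy_put('Washington', '1732-02-22')
toy_put('Lincoln', '1809-02-12')
print(toy_get('Lincoln'))                 # 1809-02-12
```

Note that `toy_get()` never looks at more than one slot, no matter how many keys are stored. Two keys that land in the same slot are told apart with `==`, which is why a rare collision costs a little time but does not return the wrong value.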

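While the precise hashing function varies from data type to data type, the default for strings and bytes in current versions of CPython is a variant of SipHash (older versions used a variant of the Fowler-Noll-Vo algorithm), the details of which are outside the scope of this course. However, we can call the built-in `hash()` function on any hashable object and get the same result every time within a session. String and bytes hashes are salted differently each time Python starts, so your values for those will likely differ from the ones shown below: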

In [None]:
print(hash( 1 ))                         # int
print(hash( 2.3 ))                       # float
print(hash( "Mary Had a Little Lamb" ))  # string
print(hash( b'Mary Had a Little Lamb' )) # bytes (same as string)
print(hash( (1,2,3) ))                   # tuple, which is immutable
print(hash( hash ))                      # the hash function object
print(hash( [1,2,3] ))                   # list; oops that's mutable!

1
691752902764107778
-6819508771906632067
-6819508771906632067
2528502973977326415
873565


TypeError: unhashable type: 'list'
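
Two consequences of hashing are easy to miss, so here is a short sketch of each. First, values that compare equal also hash equal, so `1` and `1.0` act as the same key. Second, a tuple is only hashable if everything inside it is. The name `lookup` is made up for this illustration.

```python
# Equal values hash the same, so 1 and 1.0 count as the same key
print(hash(1) == hash(1.0))      # True
lookup = {1: 'int one'}
lookup[1.0] = 'float one'        # overwrites the value stored under 1
print(lookup)                    # {1: 'float one'}

# A tuple is hashable only if everything inside it is hashable
try:
    hash((1, [2, 3]))
except TypeError as err:
    print("TypeError:", err)     # TypeError: unhashable type: 'list'
```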

### Dictionary Traversal
When iterating over a `dict`, we can use one of three methods, each returning a list-like _view_ that we can loop over:
- `keys()` which returns all keys
- `values()` which returns all the values
- `items()` which returns all the key-value pairs (a.k.a., "items")

When used in a `for` loop the default is to use the `keys()` iterator:

In [None]:
# the default iteration order
for president in birthdays:
    print(president,"was born", birthdays[president])
print("---")
# explicitly iterating over keys()
for key in birthdays.keys():
    print(key,"was born", birthdays[key])

Washington was born 1732-02-22
Jefferson was born 1743-04-13
Lincoln was born 1809-02-12
Madison was born 1751-03-16
---
Washington was born 1732-02-22
Jefferson was born 1743-04-13
Lincoln was born 1809-02-12
Madison was born 1751-03-16


However, we can also iterate over items or even values, though with somewhat differing results.

In [None]:
# iterating over items; each item is a tuple
for item in birthdays.items():
    print(item)
print("---")
# iterating over values()
for v in birthdays.values():
    print(v)

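('Washington', '1732-02-22')
('Jefferson', '1743-04-13')
('Lincoln', '1809-02-12')
('Madison', '1751-03-16')
---
1732-02-22
1743-04-13
1809-02-12
1751-03-16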

You may have noticed that the order is the same each time. As of Python 3.7, each iterator is guaranteed to follow the order in which the keys were inserted into the dictionary. (CPython 3.6 already behaved this way, but only as an implementation detail.) 
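
Two related details may be easier to see in code: `items()` pairs can be unpacked directly in the `for` statement, and a view stays in sync with its dictionary rather than being a snapshot. The sketch below uses a stand-alone dictionary named `bd` (a name made up for this illustration).

```python
bd = {'Washington': '1732-02-22', 'Jefferson': '1743-04-13'}

# Unpack each (key, value) tuple right in the for statement
for name, date in bd.items():
    print(name, "was born", date)

# A view is not a snapshot: it reflects later changes to the dict
names = bd.keys()
bd['Lincoln'] = '1809-02-12'
print(list(names))       # ['Washington', 'Jefferson', 'Lincoln']

# sorted() gives a different order without changing the dict itself
print(sorted(bd))        # ['Jefferson', 'Lincoln', 'Washington']
```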

### Pulse Check ...
**Use the [`dict()` function](https://docs.python.org/3/library/stdtypes.html#dict) to create a new dictionary called `presidents` that swaps the keys and values of the `birthdays` dictionary.** Each key should be a birthdate and each value should be the associated president's last name.

In [None]:
# YOUR CODE HERE

In [None]:
#@title <--- Check your work
%%html
<div style="max-width: 1000px">
   <div style="position: relative;padding-bottom: 56.25%;height: 0;">
     <iframe style="position: absolute;top: 0;left: 0;width: 100%;height: 100%;" rel="0" modestbranding="1"  
     src="https://www.youtube.com/embed/DeJXfkTXFnk"
     frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
   </div>
</div>

---
## Pro Tips

### Generating `dict`s
In the examples so far, all of our `dict`s have been created as literals with `{}` or through the `dict()` function. However, dictionaries can be created in lots of curious ways. Just about any iteration process that generates paired sequences of keys and values can be used to create and populate a dictionary. 

In [None]:
d_keys =   ["Washington","Jefferson"]
d_values = ['1732-02-22','1743-04-13']

d = {}        # an empty dictionary
for i in range(len(d_keys)):
    d[d_keys[i]] = d_values[i]
d

{'Washington': '1732-02-22', 'Jefferson': '1743-04-13'}

While straightforward, this is not the most efficient way to generate a dictionary. There are actually two different one-line equivalents that are both less code and more efficient. Both are explained below.

### `dict` Comprehensions
A dictionary comprehension is a lot like a list comprehension, which we covered in Lesson 8:
```python
{ key : value for item in items }
```
The key and/or value will vary from item to item.

In [None]:
# reuses the d_keys and d_values from before

{ d_keys[i] : d_values[i] for i in range(len(d_keys)) }

{'Washington': '1732-02-22', 'Jefferson': '1743-04-13'}

There are other allowed forms (e.g., the loop can unpack `(key, value)` tuples, or an `if` clause can filter the items) but this is the most commonly used one. 
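
As a rough sketch of some of those other forms, the comprehensions below unpack tuples in the loop, filter with an `if` clause, and compute the value on the fly. The names `pairs`, `by_name`, `born_1700s`, and `birth_year` are made up for this illustration.

```python
pairs = [('Washington', '1732-02-22'),
         ('Jefferson', '1743-04-13'),
         ('Lincoln', '1809-02-12')]

# Unpack each tuple into name and date, then build key : value
by_name = {name: date for name, date in pairs}

# An optional if clause filters the items; here, only 1700s birthdays
born_1700s = {name: date for name, date in pairs if date.startswith('17')}
print(born_1700s)   # {'Washington': '1732-02-22', 'Jefferson': '1743-04-13'}

# The value can be computed; here, the birth year as an int
birth_year = {name: int(date[:4]) for name, date in pairs}
print(birth_year)   # {'Washington': 1732, 'Jefferson': 1743, 'Lincoln': 1809}
```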

### That One Weird Zip Dict Trick (Say that fast 3 times)
The `zip()` function combines several sequences into an iterator of tuples (immutable lists, covered in Lesson 10), where each tuple is composed of corresponding items. If the sequences differ in length, `zip()` stops at the end of the shortest one. 

In [None]:
bdays = ['1732-02-22','1743-04-13','1809-02-12']
presidents = ['Washington','Jefferson','Lincoln']

z = zip(bdays,presidents)  # z is an iterator
list(z)                    

[('1732-02-22', 'Washington'),
 ('1743-04-13', 'Jefferson'),
 ('1809-02-12', 'Lincoln')]

This can be very useful for generating dictionaries. Let one of the sequences be a list of keys and the other a list of values. When used with the `dict()` constructor we now have a quick and efficient way to zip the keys and values together into a single dict.

In [None]:
# bdays is the keys list
# presidents is the values list
dict(zip(bdays,presidents))  # Voila! a one line dict maker

{'1732-02-22': 'Washington',
 '1743-04-13': 'Jefferson',
 '1809-02-12': 'Lincoln'}
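
One caution with this trick: `zip()` stops at the shortest sequence, so a missing value silently drops a key. Here is a small sketch of the pitfall and a simple guard. The names `some_keys`, `some_values`, and `short` are made up for this illustration.

```python
some_keys = ['Washington', 'Jefferson', 'Lincoln']
some_values = ['1732-02-22', '1743-04-13']     # one value short, on purpose

short = dict(zip(some_keys, some_values))
print(short)   # {'Washington': '1732-02-22', 'Jefferson': '1743-04-13'}

# A simple guard to run before trusting the zipped result
if len(some_keys) != len(some_values):
    print("Warning: lengths differ, so some items were dropped")
```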

---
## Exercise

**1. Use your `waist2Hip_ratio()` function to process each of the dictionaries listed below.**
```python
[{'waist': 28, 'hip': 40, 'gender': 'F'},
 {'waist': 23, 'hip': 35, 'gender': 'F'},
 {'waist': 30, 'hip': 40, 'gender': 'M'},
 {'waist': 30, 'hip': 37, 'gender': 'M'},
 {'waist': 32, 'hip': 39, 'gender': 'M'}]
```

In [None]:
# YOUR CODE HERE

In [None]:
#@title <--- Check your work
%%html
<div style="max-width: 1000px">
   <div style="position: relative;padding-bottom: 56.25%;height: 0;">
     <iframe style="position: absolute;top: 0;left: 0;width: 100%;height: 100%;" rel="0" modestbranding="1"  
     src="https://www.youtube.com/embed/1Cz4eS2-wRA"
     frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
   </div>
</div>

**2. Rewrite your `roman2int()` function from Lesson 6 to use a dictionary and an iterator instead of the 14-clause `if` statement. The dictionary should have 13 keys, one for each of the patterns.** 

In [None]:
# YOUR CODE HERE

## Submit your work to GitHub
1. Save this Notebook.
2. Commit your changes and save/push to GitHub. Make sure to replace USERNAME with your GitHub username.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
#@title Sync to GitHub
github_email = "" #@param {type:"string"}
github_username = "" #@param {type:"string"}
repo_name = "buan5405-lessons-USERNAME" #@param {type:"string"}
commit_comment = "Completed Lesson 9" #@param {type:"string"}

if github_email and github_username and repo_name and commit_comment :  
  repo_dir = "/content/drive/My Drive/Colab Notebooks/"+repo_name
  %cd {repo_dir}
  !git config --global user.email {github_email}
  !git config --global user.name {github_username}
  !git add .
  !git commit -m "{commit_comment}"
  !git push
else:
  print("Please fill out all fields")