## The Python `dict`

`{key1: value1, key2: value2}`

Let's make a list of dictionaries:

In [None]:
data = [
    {'author': 'F. SCOTT FITZGERALD', 'text': 'Action is character'},
    {'author': 'RALPH WALDO EMERSON', 'text': 'Every man is my superior in some way. In that, I learn of him'},
    {'author': 'RALPH WALDO EMERSON', 'text': 'The purpose of life is not to be happy. It is to be useful, to be honorable, to be compassionate, to have it make some difference that you have lived and lived well'},
    {'author': 'Ralph Waldo Emerson', 'text': 'Every man alone is sincere.  At the entrance of a second person, hypocrisy begins'},
    {'author': 'Majjha Nikaya', 'text': 'This is, because that is.  This is not, because that is not.  This is like this, because this is like that'}
]

We can select our first quote dictionary:

In [None]:
fitz = data[0]

fitz
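
Once we have a single dictionary, values are looked up by key. A minimal sketch, re-creating `fitz` so the cell runs on its own; `'year'` is a hypothetical key that is not in our data, used only to show what happens when a key is missing:

```python
# re-create the first quote so this cell runs on its own
fitz = {'author': 'F. SCOTT FITZGERALD', 'text': 'Action is character'}

# square brackets look up a value by key (raises KeyError if the key is missing)
author = fitz['author']

# .get returns a default instead of raising when the key is missing
# 'year' is a hypothetical key that is not in the dict
year = fitz.get('year', 'unknown')

print(author, year)
```

Using `.get` is handy when records in a list may not all share the same keys.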

We can iterate over the keys:

In [None]:
for k in fitz.keys():
    print(k)

And the same for the values:

In [None]:
for v in fitz.values():
    print(v)

## Exercise

Use `zip` to iterate over the keys & values at the same time:

In [None]:
for k, v in zip(fitz.keys(), fitz.values()):
    pass

The `items` method lets us iterate over both the keys & values at the same time, without `zip`:

In [None]:
for k, v in fitz.items():
    print(k, v)

Be careful when iterating over dicts - since Python 3.7 they preserve insertion order, but in earlier versions the order is not guaranteed to be consistent.
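
A small sketch of the ordering behaviour, assuming Python 3.7 or later, where a dict keeps its keys in insertion order. The `'language'` key and its value are hypothetical additions for illustration:

```python
fitz = {'author': 'F. SCOTT FITZGERALD', 'text': 'Action is character'}

# a new key is placed after the existing ones
# 'language' is a hypothetical key added for illustration
fitz['language'] = 'en'

keys = list(fitz.keys())
print(keys)

# if a specific order matters, sort explicitly rather than relying on insertion order
sorted_keys = sorted(fitz)
print(sorted_keys)
```

Sorting explicitly keeps the code correct regardless of how the dict was built.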

## Dict comprehension

Similar syntax to the list comprehension.

Without a dict comprehension, we might do the following:

In [None]:
processed = {}
for k, v in fitz.items():
    processed[k.lower()] = v.lower()
    
processed 

A dict comprehension of the above:

In [None]:
{k.lower(): v.lower() for k, v in fitz.items()}
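
Like a list comprehension, a dict comprehension can also take a condition or transform the values. A short sketch, with `text_only` and `word_counts` as illustrative names that do not appear elsewhere in this notebook:

```python
fitz = {'author': 'F. SCOTT FITZGERALD', 'text': 'Action is character'}

# keep only the 'text' entry, lowercasing its value
text_only = {k: v.lower() for k, v in fitz.items() if k == 'text'}

# map each key to the number of whitespace-separated words in its value
word_counts = {k: len(v.split()) for k, v in fitz.items()}

print(text_only)
print(word_counts)
```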

## Dict & JSON

A Python dict looks very similar to a JSON object:

In [None]:
fitz

In [None]:
import json

js = json.dumps(fitz)

js

In [None]:
type(js)
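
`json.dumps` gives us a string, and `json.loads` goes the other way. A minimal round-trip sketch, where `restored` is an illustrative name:

```python
import json

fitz = {'author': 'F. SCOTT FITZGERALD', 'text': 'Action is character'}

# dumps serialises the dict to a JSON string
js = json.dumps(fitz)

# loads parses the JSON string back into a dict
restored = json.loads(js)

print(type(js), type(restored))
print(restored == fitz)
```

The round trip works here because both keys and values are strings; some Python types (a `set`, for example) cannot be serialised by `json.dumps` without conversion.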

## Exercise

Dump the `data` list to a `.json` file - one row per record.

## Sets

Sets are useful for finding uniques:

In [None]:
set(['Bob', 'Bob', 'Dylan'])

Like dicts, sets are **hashed**
- this makes membership lookup constant time on average

In [None]:
#  a membership test takes roughly the same time on average
#  even for very large sets (building the set is the slow part)
'Bob' in set(['Bob', 'Bob', 'Dylan'])
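
Being hashed also means every element of a set must be hashable, just like dict keys. A small sketch of what that allows and what it rejects, with `names` and `added` as illustrative names:

```python
names = {'Bob', 'Dylan'}

# tuples of strings are immutable and hashable, so they can be set members
names_with_tuple = names | {('Bob', 'Dylan')}

# lists are mutable and unhashable, so adding one raises TypeError
try:
    names.add(['Bob', 'Dylan'])
    added = True
except TypeError:
    added = False

print(added)
```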

Sets are **unordered**

We can't index them:

In [None]:
zimmerman = set(['Bob', 'Bob', 'Dylan'])

In [None]:
zimmerman[0]

But we can iterate over them:

In [None]:
[i for i in zimmerman]

Common set operations include the **union** (a join):

In [None]:
beatles = set(['john', 'paul', 'george', 'ringo'])

new_york = beatles.union(zimmerman)

new_york

The **intersection** gets items in both:

In [None]:
new_york.intersection(beatles)

The **difference** does what it says on the tin:

In [None]:
new_york.difference(beatles)

The **symmetric difference** has the elements that are in either set, but not in both:

In [None]:
new_york.symmetric_difference(beatles)

In [None]:
new_york.symmetric_difference(zimmerman)
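
Each of these methods also has an operator form. A sketch that re-creates the sets so the cell runs on its own; `both`, `only_new_york` and `either` are illustrative names:

```python
beatles = {'john', 'paul', 'george', 'ringo'}
zimmerman = {'Bob', 'Dylan'}

new_york = beatles | zimmerman      # union
both = new_york & beatles           # intersection
only_new_york = new_york - beatles  # difference
either = new_york ^ beatles         # symmetric difference

# the operators give the same results as the methods
print(both == new_york.intersection(beatles))
print(only_new_york == new_york.difference(beatles))
print(either == new_york.symmetric_difference(beatles))
```

One difference worth knowing: the methods accept any iterable (such as a list), while the operators require both sides to be sets.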

## Practical

For the quotes dataset:
- find the unique authors
- find the unique words
- find the words that appear in both Emerson's and Fitzgerald's quotes

In [None]:
quotes = [
    {'author': 'F. SCOTT FITZGERALD', 'text': 'Action is character'},
    {'author': 'RALPH WALDO EMERSON', 'text': 'Every man is my superior in some way. In that, I learn of him'},
    {'author': 'RALPH WALDO EMERSON', 'text': 'The purpose of life is not to be happy. It is to be useful, to be honorable, to be compassionate, to have it make some difference that you have lived and lived well'},
    {'author': 'Ralph Waldo Emerson', 'text': 'Every man alone is sincere.  At the entrance of a second person, hypocrisy begins'},
    {'author': 'Majjha Nikaya', 'text': 'This is, because that is.  This is not, because that is not.  This is like this, because this is like that'}
]