# Dictionaries and Sets


<small><a href="https://colab.research.google.com/github/brandeis-jdelfino/cosi-10a/blob/main/lectures/notebooks/10_dictionaries_sets.ipynb">Link to interactive slides on Google Colab</a></small>

## Announcements

* PS2 grades are in LATTE; reach out to me with questions
* PS3 due Sunday 11:59pm; don't count on availability of help over the weekend
* PS3 guidance: Read the instructions carefully! In some problems, **especially Pig Latin**, the provided test cases don't cover every possible situation. 
   * You are required to add unit tests for Vowel Filter, but you are free to add them to other problems if it is helpful.

In the next 2 lectures, we'll cover the last 2 widely used data types in Python:
* Dictionaries: mappings from keys to values
* Sets: unordered collections of unique values

## The exercise Exercise, revisited

We had 2 "parallel lists": names, and lists of which days those characters exercised. We used `zip` to iterate over them together.

In [None]:
names = ["Spongebob", "Batman", "Dora", "Peppa", "Bill Murray"]
data = [
    [1, 7, 15, 31],
    [2, 21],
    [5],
    [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29],
    [1, 2, 3, 4, 5, 6]
]

most = 0
most_name = ""
for (name, dates) in zip(names, data):
    if len(dates) > most:
        most = len(dates)
        most_name = name
print(f"{most_name} exercised the most, with {most} days of exercise!")

**Dictionaries** are a better data structure for this type of data. They map **keys** (names) to **values** (list of exercise dates).

Here's how that example would look with a dictionary.

In [None]:
data = {
    "Spongebob": [1, 7, 15, 31],
    "Batman": [2, 21],
    "Dora": [5],
    "Peppa": [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29],
    "Bill Murray": [1, 2, 3, 4, 5, 6]
}

most = 0
most_name = ""
for name in data:
    if len(data[name]) > most:
        most = len(data[name])
        most_name = name
print(f"{most_name} exercised the most, with {most} days of exercise!")

# Creating dictionaries

Dictionaries are created and represented with curly brackets: `{` and `}`.

In [None]:
distances_from_boston = {
    "NY, NY": 216,
    "Portland, ME": 110,
    "San Francisco, CA": 2692,
    "Auckland, NZ": 9002
}
print(distances_from_boston)

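`distances_from_boston` holds 4 key/value pairs. All the keys are strings, and all the values are integers. Keys can be any **immutable** type (strictly speaking, any *hashable* type, such as strings, numbers, and tuples of immutable values); values can be any type.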
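To make the key restriction concrete, here is a small sketch. The `grid_labels` dictionary is made up for illustration: a tuple works as a key, while a list (which is mutable) is rejected with a `TypeError`.

```python
# Tuples are immutable, so they can be used as dictionary keys.
grid_labels = {
    (0, 0): "origin",
    (1, 2): "treasure",
}
print(grid_labels[(1, 2)])

# Lists are mutable, so Python refuses to use one as a key.
message = ""
try:
    bad = {[0, 0]: "origin"}
except TypeError as error:
    message = str(error)
    print(f"TypeError: {message}")
```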

In [None]:
x = {}
print(x)

`x` is an empty dictionary.

Dictionary keys are unique. The same key can't appear more than once in a dictionary.

In [None]:
distances_from_boston = {
    "NY, NY": 216,
    "NY, NY": 110,
    "San Francisco, CA": 2692,
    "Auckland, NZ": 9002
}
print(distances_from_boston)

The first `NY, NY` value (`216`) was overwritten by the second (`110`).

## Accessing dictionary elements

Dictionaries are accessed using the same bracket notation (`[]`) as lists, but with keys instead of indices.

In [None]:
number_words = {
    "one": 1,
    "four": 4,
    "one hundred": 100
}
print(number_words["four"])

In [None]:
print(number_words["one " + "hundred"])

Accessing a key that doesn't exist raises a `KeyError`.

In [None]:
print(number_words[2])
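If you want to see the error without stopping the program, you can catch it. This is only a sketch to show which exception is raised; the variable `missing_key` is introduced here for illustration.

```python
number_words = {"one": 1, "four": 4, "one hundred": 100}

missing_key = None
try:
    print(number_words[2])
except KeyError as error:
    # The key that could not be found is stored in the exception's args.
    missing_key = error.args[0]
    print(f"KeyError: no key {missing_key!r} in number_words")
```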

You can use the `.get()` method if you want to access an element that may not exist.

In [None]:
number_words = {
    "one": 1,
    "four": 4,
    "one hundred": 100
}
print(number_words.get("one"))

In [None]:
print(number_words.get("two"))

`get()` takes an optional second argument that specifies the value to return if the key isn't found in the dictionary.

In [None]:
print(number_words.get("two", "unknown"))

You can check for the existence of a key with `in` (similar to lists and other sequences). Note that `in` looks only at the keys, not the values.

In [None]:
number_words = {
    "one": 1,
    "four": 4,
    "one hundred": 100
}
print("one" in number_words)
print(1 in number_words)

In [None]:
"ten" in number_words

## Modifying dictionaries

Assigning a value to a key adds a key/value pair to the dictionary if the key doesn't already exist. If the key does already exist, the value for that key is updated.

In [None]:
number_preferences = {
    1.5: False,
    2.7: True,
    3.9: False
}
print(number_preferences)

In [None]:
number_preferences[4.0] = "yes"
print(number_preferences)

In [None]:
number_preferences[1.5] = "no"
print(number_preferences)

You can delete from a dictionary using the `del` keyword.

In [None]:
stats = {
    "height": 67,
    "weight": 140,
    "eye color": "blue",
    "dominant hand": "left"
}
print(stats)
del stats["weight"]
print(stats)
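`del` raises a `KeyError` if the key isn't in the dictionary. As an aside, the built-in `pop()` method also removes a key, and it can take a fallback value. A minimal sketch (the `"shoe size"` key is made up to show the missing-key case):

```python
stats = {"height": 67, "weight": 140}

# pop() removes the key and hands back the value that was stored there.
removed = stats.pop("weight")
print(removed)
print(stats)

# With a second argument, pop() returns it instead of raising KeyError.
missing = stats.pop("shoe size", None)
print(missing)
```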

## Iterating over dictionaries

You can use `for` loops to iterate over the keys in a dictionary.

In [None]:
stats = {
    "height": 67,
    "weight": 140,
    "eye color": "blue",
    "dominant hand": "left"
}
for statname in stats:
    print(f"My {statname} is {stats[statname]}")

You can also iterate over key/value pairs with the `.items()` method.

In [None]:
stats = {
    "height": 67,
    "weight": 140,
    "eye color": "blue",
    "dominant hand": "left"
}
# Each item is a (key, value) tuple, unpacked here into two variables.
for (key, value) in stats.items():
    print(f"My {key} is {value}")
# Equivalent loop over the keys only:
# for key in stats:
#     print(f"My {key} is {stats[key]}")

`.items()` produces the keys/values as 2-tuples:

In [None]:
list(stats.items())

The `for` loop uses tuple unpacking: each 2-tuple from `.items()` is split into its key and its value. Here the key is unpacked into `statname` and the value into `value`:

In [None]:
for statname, value in stats.items():
    print(f"My {statname} is {value}")

## Exercise

Write a function to keep track of personal high scores for everyone who plays a game: `record_attempt(name, new_score, scores)`, where:
* `name` is a string
* `new_score` is an int
* `scores` is a dictionary where the keys are names, and the value is the best score for a name
* `record_attempt()` should modify `scores` in place

In [None]:
def record_attempt(name, new_score, scores):
    # ???
    return

In [None]:
# attempt 1
def record_attempt(name, new_score, scores):
    if new_score > scores[name]:
        scores[name] = new_score

In [None]:
scores = {}
record_attempt("Batman", 100, scores)
print(scores)

We can't use `[]` to check the existing score, because the name might not be in there yet.

In [None]:
# attempt 2 (correct)
def record_attempt(name, new_score, scores):
    if name not in scores:
        scores[name] = new_score
    elif scores[name] < new_score:
        scores[name] = new_score
        
    #if new_score > scores.get(name, 0):
    #    scores[name] = new_score

In [None]:
scores = {}
record_attempt("Batman", 100, scores)
print(scores)

In [None]:
scores = {}
record_attempt("Batman", 100, scores)
record_attempt("Superman", 10, scores)
record_attempt("Spongebob", 50, scores)
record_attempt("Spiderman", 1, scores)
record_attempt("Spiderman", 11, scores)
record_attempt("Spiderman", 110, scores)
print(scores)
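The commented-out `.get()` variant in attempt 2 uses `0` as the default, which would silently ignore a first score of zero or below. If such scores are possible in your game (an assumption, since the exercise doesn't say), one possible sketch uses negative infinity as the default. The name `record_attempt_with_get` is hypothetical, chosen so it doesn't replace the version above.

```python
def record_attempt_with_get(name, new_score, scores):
    # float("-inf") is lower than any real score, so a first attempt always counts.
    if new_score > scores.get(name, float("-inf")):
        scores[name] = new_score

low_scores = {}
record_attempt_with_get("Batman", -5, low_scores)
record_attempt_with_get("Batman", -10, low_scores)  # worse than -5, ignored
record_attempt_with_get("Robin", 0, low_scores)
print(low_scores)
```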

# [Slido](https://wall.sli.do/event/h4FwBMY6vLLm4qKujQS37V?section=53bf946d-921c-4bda-8f4f-0e7fa6363572)

## Exercise

Write a function that prints the person with the highest overall score: `print_winner(scores)`

In [None]:
# attempt 1
def print_winner(scores):
    best_score = 0
    best_player = ""
    for player, score in scores.items():
        if score > best_score:
            best_score = score
            best_player = player
    print(f"Congratulations, {player}, you win with {score} points!")

In [None]:
scores = {}
record_attempt("Batman", 100, scores)
record_attempt("Superman", 10, scores)
record_attempt("Spongebob", 50, scores)
record_attempt("Spiderman", 1, scores)
record_attempt("Spiderman", 11, scores)
record_attempt("Spiderman", 110, scores)
print_winner(scores)

That output looks correct... but can you spot the bug?

In [None]:
# attempt 1
def print_winner(scores):
    best_score = 0
    best_player = ""
    for player, score in scores.items():
        if score > best_score:
            best_score = score
            best_player = player 
    print(f"Congratulations, {player}, you win with {score} points!")

In [None]:
scores = {}
record_attempt("Batman", 150, scores)
record_attempt("Spongebob", 50, scores)
record_attempt("Spiderman", 1, scores)
record_attempt("Spiderman", 11, scores)
record_attempt("Spiderman", 110, scores)
record_attempt("Superman", 10, scores)
print_winner(scores)

We printed the wrong variables at the end! This is a common type of error - make sure you test your code with a variety of inputs!

In [None]:
# attempt 2 (correct)
def print_winner(scores):
    best_score = 0
    best_player = ""
    for player, score in scores.items():
        if score > best_score:
            best_score = score
            best_player = player
    print(f"Congratulations, {best_player}, you win with {best_score} points!")

In [None]:
scores = {}
record_attempt("Batman", 150, scores)
record_attempt("Superman", 10, scores)
record_attempt("Spongebob", 50, scores)
record_attempt("Spiderman", 1, scores)
record_attempt("Spiderman", 11, scores)
record_attempt("Spiderman", 110, scores)
print_winner(scores)
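Once the loop version makes sense, it's worth knowing that the built-in `max()` can do the same search. This is a sketch rather than the expected solution; `find_winner` and `example_scores` are names made up for this example. Note that `max()` raises a `ValueError` on an empty dictionary.

```python
def find_winner(scores):
    # max() walks over the keys; key=scores.get ranks each name by its score.
    best_player = max(scores, key=scores.get)
    return best_player, scores[best_player]

example_scores = {"Batman": 150, "Superman": 10, "Spongebob": 50, "Spiderman": 110}
winner, winning_score = find_winner(example_scores)
print(f"Congratulations, {winner}, you win with {winning_score} points!")
```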

## Exercise

Write a function that takes a string, and returns a dictionary representing the unique words in the string, and the number of times each word occurred.

In [None]:
def count_words(word_string):
    # ???
    return {}

In [None]:
# attempt 1
def count_words(word_string):
    words = word_string.split()
    counts = {}
    for word in words:
        counts[word] = counts[word] + 1
    return counts

In [None]:
count_words("It was the best of times it was the worst of times it was the age of wisdom it was the age of foolishness")

We need to handle the initial condition, where the word hasn't been found yet.

In [None]:
# attempt 2
def count_words(word_string):
    words = word_string.split()
    counts = {}
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] = counts[word] + 1
    return counts

In [None]:
count_words("It was the best of times it was the worst of times it was the age of wisdom it was the age of foolishness")

Whoops, looks like we have to deal with upper vs. lower case.

In [None]:
# attempt 3
def count_words(word_string):
    words = word_string.lower().split()
    counts = {}
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] = counts[word] + 1
    return counts

In [None]:
count_words("It was the best of times it was the worst of times it was the age of wisdom it was the age of foolishness")

That was perfectly good code! But we can make it a little cleaner using `get()`:

In [None]:
# an alternative, using get()
def count_words(word_string):
    words = word_string.lower().split()
    counts = {}
    for word in words:
        # get() returns 0 the first time a word is seen, so no if/else is needed
        counts[word] = counts.get(word, 0) + 1
    return counts

In [None]:
count_words("It was the best of times it was the worst of times it was the age of wisdom it was the age of foolishness")
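Once the counts are in a dictionary, a natural follow-up is ranking the words by how often they appear. A minimal sketch, using a hypothetical helper `words_by_frequency` that is not part of the exercise:

```python
def words_by_frequency(counts):
    # Sorting a dictionary sorts its keys; key=counts.get orders them by count,
    # and reverse=True puts the most frequent word first.
    return sorted(counts, key=counts.get, reverse=True)

sample_counts = {"times": 2, "was": 4, "best": 1}
ranked = words_by_frequency(sample_counts)
print(ranked)
```

Words with equal counts keep the order in which they were first added to the dictionary.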

## Dictionary comprehensions

Similar to lists, dictionaries can be created using "comprehensions":

In [None]:
squared_nums = {x: x**2 for x in range(5)}
print(squared_nums)

The formal syntax of a dictionary comprehension is:

`{<key expression>: <value expression> for <var> in <iterable>}`  
or  
`{<key expression>: <value expression> for <var> in <iterable> if <boolean expression>}`

Both the key and value can be any expression (as long as they evaluate to valid key and value types).

This code creates a dictionary that maps from vegetable name to the count of vowels in the vegetable name:

In [None]:
def count_vowels(word):
    count = 0
    
    for c in word:
        if c in 'aeiou':
            count += 1
    return count

In [None]:
# with a comprehension
vegetables = ["cucumber", "tomato", "pepper", "carrot"]
veggie_counts = {veggie: count_vowels(veggie) for veggie in vegetables}
print(veggie_counts)

In [None]:
# without a comprehension
vegetables = ["cucumber", "tomato", "pepper", "carrot"]
veggie_counts = {}
for veggie in vegetables:
    veggie_counts[veggie] = count_vowels(veggie)
print(veggie_counts)

This code does the same thing, except it limits the dictionary to vegetables that have at least 3 vowels in their names:

In [None]:
# with a comprehension
vegetables = ["cucumber", "tomato", "pepper", "carrot"]
veggie_counts = {veggie: count_vowels(veggie) for veggie in vegetables if count_vowels(veggie) > 2}
print(veggie_counts)

In [None]:
# without a comprehension
vegetables = ["cucumber", "tomato", "pepper", "carrot"]
veggie_counts = {}
for veggie in vegetables:
    num_vowels = count_vowels(veggie)
    if num_vowels > 2:
        veggie_counts[veggie] = num_vowels
print(veggie_counts)

One last example, which creates a dictionary with each vegetable as a key, and a random integer as each value:

In [None]:
import random 
vegetables = ["cucumber", "tomato", "pepper", "carrot"]
veggie_counts = {veggie: random.randint(0, 10) for veggie in vegetables}
print(veggie_counts)

# [Slido](https://wall.sli.do/event/h4FwBMY6vLLm4qKujQS37V?section=53bf946d-921c-4bda-8f4f-0e7fa6363572)

## Dictionaries review

Dictionaries are best used when you have data that consists of key/value pairs. 

A simple example is an old-school phone book: the names are keys, the phone numbers are the values.

In [None]:
phone_book = {
    "Harry": "555-1234",
    "Hermione": "555-4567",
    "Ron": "555-7890"
}

## Dictionaries review

Key/value data is not always that simple though. 

Think of a modern Contacts app: the keys may still be names, but the values might have multiple phone numbers, addresses, social media handles, etc. The values themselves might be lists, dictionaries, or combinations of the two.

In [None]:
contacts = {
    "Harry": {
        "mobile": "555-1234",
        "address": "Number 4 Privet Drive",
        "instagram": "@harryp"
    },
    "Hermione": {        
        "mobile": "555-4567",
        "address": "8 Heathgate, Hampstead Garden Suburb, London",
        "instagram": "@hermgran"
    },
    "Ron": {        
        "mobile": "555-7890",
        "address": "The Burrow",
        "instagram": "@weaslier"
    }
}
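Reading from a nested dictionary means chaining lookups: the first lookup returns the inner dictionary, and the second reads a field from it. The sketch below uses a trimmed-down copy of the data called `small_contacts`, and assumes (for illustration only) that Ron has no Instagram handle.

```python
small_contacts = {
    "Harry": {"mobile": "555-1234", "instagram": "@harryp"},
    "Ron": {"mobile": "555-7890"},  # assumption: no "instagram" entry
}

# Chained brackets: small_contacts["Harry"] is itself a dictionary.
harry_mobile = small_contacts["Harry"]["mobile"]
print(harry_mobile)

# .get() on the inner dictionary handles a field that may be missing.
ron_instagram = small_contacts["Ron"].get("instagram", "unknown")
print(ron_instagram)
```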

## Dictionaries review

Accessing elements:
* If you know the key will exist in the dictionary: use brackets `[...]`
* If you aren't sure the key will exist: use `.get(...)`

## Brackets `[]`

In [None]:
drink_temps = {
    "Red Wine": 16.0,
    "White Wine": 10.0,
    "Beer": 8.0,
    "Black tea": 85.0,
    "Green tea": 74.0
}

for drink in drink_temps:
    print(f"Drink {drink} at {drink_temps[drink]}˚C")

Here, we know that each key will exist in the dictionary, so using brackets makes sense.

## `.get()`

In [None]:
drink_temps = {
    "Red Wine": 16.0,
    "White Wine": 10.0,
    "Beer": 8.0,
    "Black tea": 85.0,
    "Green tea": 74.0
}

drink = input("What drink do you want to know about? ")
print(f"Drink {drink} at {drink_temps.get(drink)}˚C")

Here, we don't know if the key will exist in the dictionary. If we used brackets, the code would raise an error if the user typed in a drink that wasn't in `drink_temps`. We can use `get()` to prevent an error from being raised.

## `.get()` is a convenience

You can always explicitly check for the existence of a key before trying to access it with brackets.

In [None]:
drink_temps = {
    "Red Wine": 16.0,
    "White Wine": 10.0,
    "Beer": 8.0,
    "Black tea": 85.0,
    "Green tea": 74.0
}

drink = input("What drink do you want to know about? ")
if drink in drink_temps:
    print(f"Drink {drink} at {drink_temps[drink]}˚C")
else:
    print(f"I don't know about {drink}")

## `.get()` with a default value

You can tell `.get()` what to return if the key does not exist in the dictionary.

In [None]:
drink_temps = {
    "Red Wine": 16.0,
    "White Wine": 10.0,
    "Beer": 8.0,
    "Black tea": 85.0,
    "Green tea": 74.0
}

drink = input("What drink do you want to know about? ")
print(f"Drink {drink} at {drink_temps.get(drink, 'unknown')}˚C")

This code demonstrates passing a second argument to `get()`. Now, if the user types in an unknown drink, `.get()` will return `"unknown"` instead of `None`.

# Sets

Sets are unordered collections of unique values. 

They are most commonly used when you want to remove duplicates from a collection of items, easily check if an item is in a collection, or compare/combine multiple collections of unique items.

## Creating sets 

Sets can be created from any iterable.

In [None]:
names = set(["Batman", "Spiderman", "Batman", "Spongebob"])
print(names)
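Sets can also be written directly with curly brackets. One catch, shown in the sketch below: empty curly brackets create a dictionary, so an empty set has to be made with `set()`. The variable names here are chosen just for this example.

```python
# A set literal: duplicates are dropped, just like with set([...]).
hero_names = {"Batman", "Spiderman", "Batman", "Spongebob"}
print(len(hero_names))

# {} is an empty dictionary, not an empty set.
empty_dict = {}
empty_set = set()
print(type(empty_dict), type(empty_set))
```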

## Accessing set members

Because sets are unordered, there are no index-based accessors. You can use `in` to check for membership.

In [None]:
names = set(["Batman", "Spiderman", "Batman", "Spongebob"])

In [None]:
"Batman" in names

In [None]:
"Peppa" in names

## Iterating over sets

Sets are iterable, which means you can loop through every member in a set with a `for` loop. 

They aren't sequences though - they don't have a well-defined order.

In [None]:
names = set(["Batman", "Spiderman", "Batman", "Spongebob"])
for name in names:
    print(name)

## Modifying sets

Sets are mutable: `add()` inserts one item, `remove()` deletes one item, and `clear()` empties the set.

In [None]:
names = set(["Batman", "Spiderman", "Batman", "Spongebob"])
names.add("Peppa")
print(names)

In [None]:
names = set(["Batman", "Spiderman", "Batman", "Spongebob"])
names.remove("Batman")
print(names)

In [None]:
names = set(["Batman", "Spiderman", "Batman", "Spongebob"])
names.clear()
print(names)
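One detail worth knowing: `remove()` raises a `KeyError` if the item isn't in the set. The built-in `discard()` method does the same job but stays quiet when the item is missing. A short sketch (the flag `remove_failed` exists only to record what happened):

```python
heroes = {"Batman", "Spiderman"}

remove_failed = False
try:
    heroes.remove("Peppa")  # "Peppa" is not in the set
except KeyError:
    remove_failed = True
print(remove_failed)

# discard() does nothing if the item is missing.
heroes.discard("Peppa")
print(len(heroes))
```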

## Modifying sets

`update()` will add multiple items at once.

In [None]:
names = set(["Batman", "Spiderman", "Batman", "Spongebob"])
names.update(["Wolverine", "Batman", "Wolverine", "Magneto"])
print(names)

## Set operations

Sets provide equality, intersection, union, and difference operators.

## Equality

Sets can be compared with `==`

In [None]:
fluffy = set(["cat", "dog", "dandelion", "pillow"])
loud = set(["dog", "cat", "baby", "horn", "alarm"])
fluffy == loud
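Two sets are equal when they contain exactly the same members. Order and duplicates in the original data don't matter, as this small sketch with made-up sets shows:

```python
pets_a = set(["cat", "dog", "cat"])
pets_b = set(["dog", "cat"])

# Same members, so the sets are equal even though the input lists differ.
same_pets = pets_a == pets_b
print(same_pets)
```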

## Intersection 

The "intersection" of 2 (or more) sets is the set of elements which appear in each of the sets. The `&` operator performs an intersection on 2 sets.

In [None]:
fluffy = set(["cat", "dog", "dandelion", "pillow"])
loud = set(["dog", "cat", "baby", "horn", "alarm"])
fluffy & loud

## Union

The "union" of 2 (or more) sets is the set of elements that appear in any of the sets. The `|` operator performs a union on 2 sets.

In [None]:
fluffy = set(["cat", "dog", "dandelion", "pillow"])
loud = set(["dog", "cat", "baby", "horn", "alarm"])
fluffy | loud

## Subtraction / difference

The "difference" of 2 sets is the set of elements that appear in set A, but not in set B. The `-` operator performs a difference on 2 sets.

In [None]:
fluffy = set(["cat", "dog", "dandelion", "pillow"])
loud = set(["dog", "cat", "baby", "horn", "alarm"])
fluffy - loud

Order matters for subtraction.

In [None]:
loud - fluffy

## Symmetric difference ("xor")

The "symmetric difference" of 2 sets is the set of items that appear in either set, but not both. 

This is also sometimes referred to as `xor` ("exclusive or"), from boolean/digital logic. 

The `^` operator performs a symmetric difference.

In [None]:
fluffy = set(["cat", "dog", "dandelion", "pillow"])
loud = set(["dog", "cat", "baby", "horn", "alarm"])
fluffy ^ loud
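Each operator also has a method form: `intersection()`, `union()`, `difference()`, and `symmetric_difference()`. The sketch below shows one practical difference: the methods accept any iterable (here a list named `loud_list`, introduced for this example), while the operators need a set on both sides.

```python
fluffy = set(["cat", "dog", "dandelion", "pillow"])
loud_list = ["dog", "cat", "baby", "horn", "alarm"]

common = fluffy.intersection(loud_list)
everything = fluffy.union(loud_list)
only_fluffy = fluffy.difference(loud_list)
print(len(common), len(everything), len(only_fluffy))

# The operator form does not accept a list.
operator_failed = False
try:
    fluffy & loud_list
except TypeError:
    operator_failed = True
print(operator_failed)
```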

## Superset / subset

Superset: a set contains all elements of another set  
Subset: a set is made up only of elements from another set

In [None]:
letters = set("abcdefghijklmnopqrstuvwxyz")
vowels = set("aeiou")
letters.issuperset(vowels)

In [None]:
vowels.issuperset(letters)

In [None]:
vowels.issubset(letters)

In [None]:
letters.issubset(vowels)
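The comparison operators work on sets too: `<=` means "is a subset of" and `>=` means "is a superset of". The strict forms `<` and `>` additionally require the two sets to be different. A quick sketch (result names are chosen for this example):

```python
letters = set("abcdefghijklmnopqrstuvwxyz")
vowels = set("aeiou")

vowels_in_letters = vowels <= letters      # same as vowels.issubset(letters)
letters_cover_vowels = letters >= vowels   # same as letters.issuperset(vowels)

# A set is a subset of itself, but not a strict (proper) subset of itself.
subset_of_self = letters <= letters
strict_subset_of_self = letters < letters
print(vowels_in_letters, letters_cover_vowels, subset_of_self, strict_subset_of_self)
```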

## Example

Write a program that prompts the user to enter the postal abbreviations for all 50 U.S. states. The player loses if they repeat a guess or guess something that isn't valid.

In [None]:
states = set([ 'AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA',
           'HI', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME',
           'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM',
           'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX',
           'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY'])

In [None]:
states = set([ 'AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA',
           'HI', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME',
           'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM',
           'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX',
           'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY'])
guesses = set()

print("Guess all 50 US state postal abbreviations, without repeating.")
while True:
    print("Guess? ")
    answer = input()
    if answer in guesses:
        print(f"You already guessed {answer}!")
        break
    if answer not in states:
        print(f"{answer} is not a valid postal code!")
        break
    guesses.add(answer)
    if guesses == states:
        print("You win!")
        break
    print(f"You've guessed {len(guesses)}/{len(states)} so far.")
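A possible extension, not part of the original program: set difference makes it easy to report which abbreviations are still missing. The sketch below uses a hypothetical helper `missing_guesses` and a small stand-in set of six states, so it runs without user input.

```python
def missing_guesses(valid, guessed):
    # valid - guessed keeps only the items that have not been guessed yet;
    # sorted() turns the result into an alphabetized list for display.
    return sorted(valid - guessed)

new_england = {"CT", "MA", "ME", "NH", "RI", "VT"}  # stand-in for the full states set
guessed_so_far = {"MA", "VT"}
still_missing = missing_guesses(new_england, guessed_so_far)
print(still_missing)
```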