# 01_03: Dictionaries and sets

The other essential data structure in Python is the dictionary, or dict. While lists and tuples let us retrieve values by their numerical index, dictionaries associate values with unique keys.

In [32]:
import math
import collections
import dataclasses
import datetime

import numpy as np
import pandas as pd
import matplotlib.pyplot as pp  

In [2]:
# Dicts are written with curly braces, with items
# separated by commas. Each item is given as key: value.
# In this example, the countries are the keys and their capitals are the values.
capitals = {'United States': 'Washington, DC', 'France': 'Paris', 'Italy': 'Rome'}

In [3]:
capitals

{'United States': 'Washington, DC', 'France': 'Paris', 'Italy': 'Rome'}

In [None]:
#Like the preceding data structures, the length of a dictionary is given by len()
#The empty dictionary is written as empty curly braces
len(capitals), len({})

(3, 0)

In [5]:
#As with lists, values are accessed with bracket notation,
#though with dictionaries we use a key (often a string) rather than an index
capitals['Italy']

'Rome'

In [6]:
#We can use the same bracket notation with a new key to add an item to the dictionary
capitals['Spain'] = 'Madrid'

In [7]:
capitals

{'United States': 'Washington, DC',
 'France': 'Paris',
 'Italy': 'Rome',
 'Spain': 'Madrid'}

In [8]:
#Querying a dictionary for a nonexistent key raises a KeyError
capitals['Germany']

KeyError: 'Germany'

In [11]:
# We can avoid this by using the in operator to determine whether a key is present in a dictionary before we query it
'Germany' in capitals, 'Italy' in capitals
#This should produce False, True

(False, True)
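The check-then-query pattern can be sketched as below. The sketch also uses `dict.get`, a built-in dict method that this section does not otherwise cover; it returns a fallback value (or `None`) instead of raising `KeyError`. The variable names are illustrative only.

```python
# Self-contained copy of the dictionary used in this section.
capitals = {'United States': 'Washington, DC', 'France': 'Paris', 'Italy': 'Rome'}

# Check membership first, then query only if the key is present.
if 'Germany' in capitals:
    germany = capitals['Germany']
else:
    germany = None

# dict.get performs the check and the lookup in one step.
# The optional second argument is the value returned for a missing key.
italy = capitals.get('Italy')
unknown = capitals.get('Germany', 'unknown')

print(germany, italy, unknown)
```

Neither approach modifies the dictionary, so `'Germany' in capitals` is still `False` afterwards.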

In [12]:
morecapitals = {'Germany': 'Berlin', 'United Kingdom': 'London'}

In [None]:
#Dictionaries also have an unpacking operator, **
#so we can combine two dictionaries by unpacking both of them inside a new pair of curly braces, as below
{**capitals, **morecapitals}
#Note that if a key is repeated, the last value is used.

{'United States': 'Washington, DC',
 'France': 'Paris',
 'Italy': 'Rome',
 'Spain': 'Madrid',
 'Germany': 'Berlin',
 'United Kingdom': 'London'}
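The rule about repeated keys is easier to see with a small example. This is a minimal sketch using two made-up dictionaries (`first` and `second` are illustrative names, and the wrong capital in `first` is deliberate).

```python
# 'Australia' appears in both dictionaries, with a deliberately wrong value in the first.
first = {'Italy': 'Rome', 'Australia': 'Sydney'}
second = {'Australia': 'Canberra', 'Spain': 'Madrid'}

# Unpacking both into a new dict: the value from the later dict wins.
merged = {**first, **second}
print(merged)

# The merge builds a new dictionary; the originals are left unchanged.
print(first)
```

The repeated key keeps the position of its first appearance but takes the value of its last appearance.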

In [14]:
#We can also update a dict in place using another dict. 
capitals.update(morecapitals)

In [None]:
#This adds morecapitals to capitals
capitals

{'United States': 'Washington, DC',
 'France': 'Paris',
 'Italy': 'Rome',
 'Spain': 'Madrid',
 'Germany': 'Berlin',
 'United Kingdom': 'London'}

In [16]:
# Similar to lists, we can delete items by key.
del capitals['United Kingdom']

In [17]:
capitals

{'United States': 'Washington, DC',
 'France': 'Paris',
 'Italy': 'Rome',
 'Spain': 'Madrid',
 'Germany': 'Berlin'}

In [18]:
# In fact, keys do not need to be strings. Any hashable Python object may be used as a key. Hashable means that Python can convert the object to a number (its hash) that does not change during the object's lifetime
birthdays = {(7,15): 'Michele', (3,14): 'Albert'}
#Strings, numbers, and tuples (as long as all of their elements are hashable) can be used as keys, but lists can NOT.

In [19]:
birthdays[(7,15)]

'Michele'

In [None]:
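#We can see the hash values of the keys with hash(). By default, string hashes are randomized per Python process, so the first number will differ from run to run.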
hash('Italy'), hash((7,15))

(-135269660706365527, -2471369287409462312)
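To illustrate the hashability requirement, here is a small sketch of what happens when a list is used as a key. The exact wording of the error message can vary between Python versions, so the example only prints it.

```python
birthdays = {(7, 15): 'Michele', (3, 14): 'Albert'}

# A list is mutable, so Python refuses to hash it and raises TypeError.
message = None
try:
    birthdays[[1, 1]] = 'Nobody'
except TypeError as err:
    message = str(err)
    print('TypeError:', message)

# Within a single run, equal objects always produce equal hashes,
# which is what lets a dict find a key again.
same_hash = hash((7, 15)) == hash((7, 15))
print(same_hash)
```

Converting the list to a tuple, as in `birthdays[tuple([1, 1])]`, is the usual workaround when the data starts out as a list.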

In [None]:
#Looping over a dictionary is very similar to looping over a list, but there are three kinds of loops we can write
#over keys...
for country in capitals:
    print(country)

United States
France
Italy
Spain
Germany


In [None]:
#(same as above)
for country in capitals.keys():
    print(country)

United States
France
Italy
Spain
Germany


In [23]:
capitals.keys()

dict_keys(['United States', 'France', 'Italy', 'Spain', 'Germany'])

In [24]:
list(capitals.keys())

['United States', 'France', 'Italy', 'Spain', 'Germany']

In [25]:
#...over values...
for capital in capitals.values():
    print(capital)

Washington, DC
Paris
Rome
Madrid
Berlin


In [None]:
#..or over both
#the pairs are known as items
for country, capital in capitals.items():
    print(country, capital)

United States Washington, DC
France Paris
Italy Rome
Spain Madrid
Germany Berlin


We can also write this more explicitly

In [28]:
# here are the keys of the dictionary
capitals.keys()

dict_keys(['United States', 'France', 'Italy', 'Spain', 'Germany'])

In [27]:
#This is a special iterable object (a view of the keys), similar to range, that can be turned into a list
list(capitals.keys())

['United States', 'France', 'Italy', 'Spain', 'Germany']

The other two dict looping constructs are over values, or over keys and values together using tuple unpacking.

From Python 3.7 onwards, dicts preserve insertion order.
This means that when we loop over keys, values, or items, we get them in the order in which we originally inserted them, which was not always the case in earlier versions.
The standard library defines a special class, collections.OrderedDict, to preserve order; it is no longer necessary for that purpose.
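A short sketch of what insertion order means in practice, assuming Python 3.7 or later. The variable names are illustrative.

```python
capitals = {'United States': 'Washington, DC', 'France': 'Paris'}
capitals['Spain'] = 'Madrid'

# Updating the value of an existing key does not change its position.
capitals['France'] = 'PARIS'
order_after_update = list(capitals)
print(order_after_update)

# Deleting a key and inserting it again moves it to the end.
del capitals['France']
capitals['France'] = 'Paris'
order_after_reinsert = list(capitals)
print(order_after_reinsert)
```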

In [33]:

#There is another specialized dictionary data structure that is very useful: defaultdict, which we set up to return a default value instead of raising an error when a key has not been set
#Here our default value is "I don't know!"
capitals_default = collections.defaultdict(lambda: "I don't know!")

In [34]:
#Now we will add the original capitals to this object
capitals_default.update(capitals)

In [35]:
#Now if we look up a key that isn't defined in the dict, we get our default value, "I don't know!"
capitals_default['Canada']

"I don't know!"
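One detail worth knowing is that a bracket lookup on a defaultdict stores the default under the missing key as a side effect. The sketch below shows this, along with `.get`, which does not call the default factory. The variable names are illustrative.

```python
import collections

capitals_default = collections.defaultdict(lambda: "I don't know!")
capitals_default['Italy'] = 'Rome'

before = 'Canada' in capitals_default    # key not present yet
answer = capitals_default['Canada']      # bracket lookup calls the factory
after = 'Canada' in capitals_default     # the default has now been stored

# .get bypasses the factory and leaves the dictionary unchanged.
missing = capitals_default.get('Mexico')

print(before, answer, after, missing)
print(len(capitals_default))
```

If the stored entries are unwanted, testing with `in` or using `.get` avoids growing the dictionary during lookups.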

Dicts are very important in Python because they underlie many aspects of the language itself; for example, the methods and attributes of classes are stored internally in dicts. The interface by which we access dict values using keys is also adopted by the Python data analysis library pandas, so it's helpful for us to be familiar with it.
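As a rough illustration of attributes living in dicts, the sketch below defines a hypothetical `City` class (not part of this course's code) and inspects it with the built-in `vars()`. It assumes an ordinary class that does not define `__slots__`.

```python
class City:
    """Hypothetical class used only for this illustration."""

    def __init__(self, name, country):
        self.name = name
        self.country = country

    def describe(self):
        return f'{self.name} is the capital of {self.country}'


rome = City('Rome', 'Italy')

# Instance attributes are stored in a dict, reachable through vars() or __dict__.
instance_attributes = dict(vars(rome))
print(instance_attributes)

# Methods are stored in the class's own namespace mapping.
has_describe = 'describe' in vars(City)
print(has_describe)

# Writing to the instance dict has the same effect as setting an attribute.
vars(rome)['nickname'] = 'The Eternal City'
print(rome.nickname)
```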

In [36]:
#Finally, we will mention sets. They are unordered collections of unique items, like the mathematical construct
#The items can be of any hashable type, and duplicates are never stored
#This means that, below, only one of the two 'Africa' entries is kept
continents = {'America', 'Europe', 'Asia', 'Oceania', 'Africa', 'Africa'}

In [37]:
#See, only one Africa
continents

{'Africa', 'America', 'Asia', 'Europe', 'Oceania'}

In [38]:
#You can still use in to check whether an item is in a set
'Africa' in continents

True

In [39]:
#You can add items
continents.add('Antarctica')

In [40]:
#You can remove them
continents.remove('Antarctica')
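A small sketch of how adding and removing behave at the edges. It uses `set.discard`, a built-in set method not covered above, which removes an item if present and does nothing otherwise. The variable names are illustrative.

```python
continents = {'America', 'Europe', 'Asia', 'Oceania', 'Africa'}

# Adding an item that is already present has no effect.
continents.add('Antarctica')
continents.add('Antarctica')
size_after_add = len(continents)

# remove() raises KeyError if the item is not in the set.
continents.remove('Antarctica')
second_remove_failed = False
try:
    continents.remove('Antarctica')
except KeyError:
    second_remove_failed = True

# discard() silently ignores a missing item.
continents.discard('Antarctica')

print(size_after_add, second_remove_failed, len(continents))
```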

In [41]:
#You can even loop over them, but be careful: sets are unordered and do NOT support indexing
for c in continents:
    print(c)

Asia
America
Africa
Oceania
Europe
