![Py4Eng](img/logo.png)

# Dictionaries
## Yoav Ram

## Reminder: Lists
Lists are a data structure used to store **ordered** collections of elements (`int`, `float`, `str`, etc.).

In [1]:
organisms = ['Pan troglodytes', 'Gallus gallus', 'Xenopus laevis', 'Vipera palaestinae']

We access elements of lists by using their _index_:

In [2]:
print(organisms[0])
print(organisms[2])

Pan troglodytes
Xenopus laevis


## Dictionaries

__Dictionaries__ are _hash tables_: a data structure used to store collections of elements that are accessed by a _key_. Keys can be of any _hashable_ type, which in practice means immutable types such as strings, integers, floats, and tuples of immutable elements. Each key refers to a _value_.

### Defining dictionaries

In [9]:
taxonomy = {
    'Pan troglodytes': 'Mammalia', 
    'Gallus gallus': 'Aves', 
    'Xenopus laevis': 'Amphibia', 
    'Vipera palaestinae': 'Reptilia'
}

In this dictionary, the _keys_ are the organisms and the _values_ are the taxonomic classification of each organism. Both are of type `str`.

Another example would be a dictionary representing the number of observations of various species:

In [4]:
observations = {
    'Equus zebra': 143,
    'Hippopotamus amphibius': 27,
    'Giraffa camelopardalis': 71,
    'Panthera leo': 112
}

Here, the keys are of type `str` and the values are of type `int`. Values can be of any type and keys of any immutable type, in any combination.
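Keys have to be hashable, which is why immutable types work. Here is a small sketch (the `grid` dictionary is purely illustrative) in which a tuple serves as a key, while trying to use a list as a key raises a `TypeError`:

```python
# A tuple of immutable values is hashable, so it can be a dictionary key
grid = {(0, 0): 'start', (2, 3): 'goal'}
print(grid[(2, 3)])  # → goal

# A list is mutable and therefore unhashable, so it cannot be a key
try:
    grid[[1, 1]] = 'wall'
except TypeError as err:
    print('TypeError:', err)
```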

### Accessing dictionary records
Accessing a dictionary record is similar to what we did with lists, only this time we'll use a _key_ instead of an _index_:

In [10]:
print(taxonomy['Pan troglodytes'])
print(taxonomy['Gallus gallus'])

Mammalia
Aves


### Changing and adding records
We can change the dictionary by simply assigning a new value to a key.

In [11]:
taxonomy['Pan troglodytes'] = 'Mammals'
print(taxonomy['Pan troglodytes'])

Mammals


Similarly, we can use this syntax to add new records: 

In [12]:
taxonomy['Danio rerio'] = 'Actinopterygii'
print(taxonomy['Danio rerio'])

Actinopterygii


__Note 1__: The fact that we can change elements of the dictionary and dynamically add more elements shows that `dict` is a **mutable** type.

__Note 2__: A dictionary may not contain multiple records with the same _key_, but it may contain many keys with the same _value_.
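A quick way to see both rules (the dictionaries below are illustrative): when a key is repeated in a dictionary literal, the later value silently replaces the earlier one, while different keys can freely share a value.

```python
# Repeated key: only one record survives, holding the last value
versions = {'a': 1, 'b': 2, 'a': 3}
print(versions)  # → {'a': 3, 'b': 2}

# Different keys with the same value are two separate records
classes = {'Pan troglodytes': 'Mammalia', 'Equus zebra': 'Mammalia'}
print(len(classes))  # → 2
```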

### Looping over dictionary items

By default, `for` loops over the dictionary keys:

In [13]:
for organism in taxonomy:
    print('{} is of class {}'.format(organism, taxonomy[organism]))

Pan troglodytes is of class Mammals
Gallus gallus is of class Aves
Xenopus laevis is of class Amphibia
Vipera palaestinae is of class Reptilia
Danio rerio is of class Actinopterygii


**Note**: the order of the keys in a dictionary is **arbitrary** in Python 3.5 and earlier. In Python 3.6 (CPython), dictionaries preserve insertion order, but only as an implementation detail; since Python 3.7, preserving insertion order is part of the language specification. If your code must also run on older versions, or you need order-specific operations such as moving a key to the end, use [OrderedDict](https://docs.python.org/3/library/collections.html#collections.OrderedDict).
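When the loop body needs both the key and the value, iterating over `dict.items()` gives `(key, value)` pairs directly, so there is no second lookup. A short sketch on a small, illustrative `sample_taxonomy` dictionary (a separate name, so the `taxonomy` used above is left untouched):

```python
sample_taxonomy = {'Pan troglodytes': 'Mammalia', 'Gallus gallus': 'Aves'}

# .items() yields (key, value) tuples, unpacked here into two loop variables
for organism, cls in sample_taxonomy.items():
    print('{} is of class {}'.format(organism, cls))

# .values() gives just the values (.keys() gives just the keys)
print(sorted(sample_taxonomy.values()))  # → ['Aves', 'Mammalia']
```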

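We can change values while looping, because assigning to an existing key doesn't change the collection of keys we are looping over (adding or removing keys while looping is dangerous, and Python raises a `RuntimeError` if the dictionary changes size during iteration):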

In [14]:
for animal in observations:
    observations[animal] = observations[animal] > 50
print(observations)

{'Equus zebra': True, 'Hippopotamus amphibius': False, 'Giraffa camelopardalis': True, 'Panthera leo': True}
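To remove records while looping, a common approach is to loop over a snapshot of the keys made with `list()`, so the loop is not affected by changes to the dictionary. The sketch below (reusing the original observation counts in a new, illustrative `original` dictionary) first shows the `RuntimeError` and then the safe version:

```python
original = {'Equus zebra': 143, 'Hippopotamus amphibius': 27, 'Panthera leo': 112}

# Unsafe: deleting from the dict we are looping over changes its size
counts = dict(original)
try:
    for animal in counts:
        if counts[animal] < 50:
            del counts[animal]
except RuntimeError as err:
    print('RuntimeError:', err)

# Safe: loop over a list copy of the keys and delete from the dict itself
counts = dict(original)
for animal in list(counts):
    if counts[animal] < 50:
        del counts[animal]
print(counts)  # → {'Equus zebra': 143, 'Panthera leo': 112}
```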


We can check if a dictionary contains a *key* using the `in` operator:

In [15]:
'Vipera palaestinae' in taxonomy

True

In [17]:
'Bos taurus' in taxonomy

False

In [19]:
for organism in ('Vipera palaestinae', 'Bos taurus', 'Drosophila melanogaster'):
    if organism in taxonomy:
        print('{} is of class {}'.format(organism, taxonomy[organism]))
    else:
        print('{} not found'.format(organism))

Vipera palaestinae is of class Reptilia
Bos taurus not found
Drosophila melanogaster not found


The above code uses an idiom called _look before you leap_ (LBYL): checking whether a key is in the dictionary before getting its value, to avoid a `KeyError`.

Another way to do it, which is usually preferred in Python, is _easier to ask for forgiveness than permission_ (EAFP), which uses exceptions:

In [20]:
for organism in ('Vipera palaestinae', 'Bos taurus', 'Drosophila melanogaster'):
    try:
        print('{} is of class {}'.format(organism, taxonomy[organism]))
    except KeyError:
        print('{} not found'.format(organism))

Vipera palaestinae is of class Reptilia
Bos taurus not found
Drosophila melanogaster not found


Raising and catching an exception is somewhat slower than an `if` check when the key is missing, but the latter example performs only a single lookup (no separate `in`). It is also safer in multi-threaded applications: in the former example, a different thread could in principle change the dictionary between the check (`in`) and the access (`[...]`).
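When a fallback value makes sense, a third option is the `dict.get(key, default)` method: it performs a single lookup and returns `default` (or `None` if no default is given) instead of raising a `KeyError`. A sketch with a small, illustrative `known` dictionary:

```python
known = {'Vipera palaestinae': 'Reptilia'}

for organism in ('Vipera palaestinae', 'Bos taurus'):
    # Missing keys return the default 'unknown' rather than raising KeyError
    print('{} is of class {}'.format(organism, known.get(organism, 'unknown')))

print(known.get('Bos taurus'))  # → None
```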

## Exercise - secret

Given in the code below is a dictionary (named `code`) where the keys are encrypted characters and the values are the corresponding decrypted characters. Use the dictionary to decrypt the encrypted message (named `secret`) and print out the resulting cleartext message. Characters that are not keys in `code`, such as spaces and punctuation, should be kept unchanged.

In [21]:
secret = """Mq osakk le eh ue usq qhp, mq osakk xzlsu zh Xcahgq,
mq osakk xzlsu eh usq oqao ahp egqaho,
mq osakk xzlsu mzus lcemzhl gehxzpqhgq ahp lcemzhl oucqhlus zh usq azc, mq osakk pqxqhp ebc Zokahp, msauqjqc usq geou dat rq,
mq osakk xzlsu eh usq rqagsqo,
mq osakk xzlsu eh usq kahpzhl lcebhpo,
mq osakk xzlsu zh usq xzqkpo ahp zh usq oucqquo,
mq osakk xzlsu zh usq szkko;
mq osakk hqjqc obccqhpqc, ahp qjqh zx, mszgs Z pe heu xec a dedqhu rqkzqjq, uszo Zokahp ec a kaclq iacu ex zu mqcq obrfblauqp ahp ouacjzhl, usqh ebc Qdizcq rqtehp usq oqao, acdqp ahp lbacpqp rt usq Rczuzos Xkqqu, mebkp gacct eh usq oucbllkq, bhuzk, zh Lep’o leep uzdq, usq Hqm Meckp, mzus akk zuo iemqc ahp dzlsu, ouqio xecus ue usq cqogbq ahp usq kzrqcauzeh ex usq ekp."""

code = {'w': 'x', 'L': 'G', 'c': 'r', 'x': 'f', 'G': 'C', 'E': 'O', 'h': 'n', 'O': 'S', 'y': 'q', 'R': 'B', 'd': 'm', 'f': 'j', 'i': 'p', 'o': 's', 'g': 'c', 'a': 'a', 'u': 't', 'k': 'l', 'q': 'e', 'r': 'b', 'V': 'Z', 'X': 'F', 'N': 'K', 'B': 'U', 'T': 'Y', 'M': 'W', 'U': 'T', 'm': 'w', 'C': 'R', 'J': 'V', 't': 'y', 'S': 'H', 'v': 'z', 'e': 'o', 'D': 'M', 'p': 'd', 'K': 'L', 'A': 'A', 'P': 'D', 'l': 'g', 's': 'h', 'W': 'X', 'H': 'N', 'j': 'v', 'z': 'i', 'I': 'P', 'b': 'u', 'Z': 'I', 'F': 'J', 'Y': 'Q', 'Q': 'E', 'n': 'k'}





# Sets

A [set](https://docs.python.org/3.5/tutorial/datastructures.html#sets) is an **unordered collection** with **unique elements**, similar to the mathematical concept of a [set](https://en.wikipedia.org/wiki/Set_%28mathematics%29).

Curly braces (`{}`) or the `set()` function can be used to create sets. 

In [22]:
basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}
print(basket) # duplicates have been removed
type(basket)

{'pear', 'orange', 'apple', 'banana'}


set

Basic uses include eliminating duplicate entries (as above, one apple and one orange were eliminated), and fast membership testing:

In [23]:
print('orange' in basket)
print('crabgrass' in basket)

True
False
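Since a set has no order, the order in which its elements are printed is not something to rely on (for strings it can even differ between runs, because string hashing is randomized per process). A common pattern, sketched here with an illustrative `fruits` list, is to remove duplicates with `set()` and then use `sorted()` to get a list in a predictable order:

```python
fruits = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']

# set() drops the duplicates; sorted() returns a new list in alphabetical order
unique_fruits = sorted(set(fruits))
print(unique_fruits)  # → ['apple', 'banana', 'orange', 'pear']
```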


Set objects also support set-theoretical operations like union, intersection, difference, and symmetric difference.

In [24]:
a = set('abracadabra')
b = set('alacazam')
print(a)
print(b)
type(b)

{'a', 'd', 'c', 'r', 'b'}
{'a', 'l', 'm', 'z', 'c'}


set

Letters in `a` but not in `b`:

In [25]:
a - b

{'b', 'd', 'r'}

Letters in either `a` or `b`:

In [26]:
a | b

{'a', 'b', 'c', 'd', 'l', 'm', 'r', 'z'}

Letters in both `a` and `b`:

In [27]:
a & b

{'a', 'c'}

Letters in `a` or `b` but not both:

In [28]:
a ^ b

{'b', 'd', 'l', 'm', 'r', 'z'}
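Each of these operators also has a named method. One practical difference, shown in this sketch: the operators require both operands to be sets, whereas the methods accept any iterable, such as a string or a list.

```python
a = set('abracadabra')
b = set('alacazam')

# Named-method equivalents of -, |, & and ^
print(a.difference(b) == a - b)            # → True
print(a.union(b) == a | b)                 # → True
print(a.intersection(b) == a & b)          # → True
print(a.symmetric_difference(b) == a ^ b)  # → True

# The methods accept any iterable; a & 'cab' would raise TypeError instead
print(sorted(a.intersection('cab')))  # → ['a', 'b', 'c']
```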

To create an empty set you have to use `set()`, not `{}`; the latter creates an empty dictionary.

In [29]:
Ø = set()
print(Ø)
type(Ø)

set()


set
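The difference between `{}` and `set()` is easy to check with `type()`; a minimal sketch with illustrative variable names:

```python
empty_braces = {}
empty_set = set()

print(type(empty_braces))  # → <class 'dict'>
print(type(empty_set))     # → <class 'set'>
print(len(empty_set))      # → 0
```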

Note that a `set` is mutable:

In [30]:
print(a)
a.add('z')
print(a)

{'a', 'd', 'c', 'r', 'b'}
{'a', 'd', 'z', 'c', 'r', 'b'}
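Elements can also be removed. In this sketch (with an illustrative set `s`), `discard()` ignores a missing element while `remove()` raises a `KeyError`; and, just like dictionary keys, set elements must be hashable, so adding a list fails with a `TypeError`:

```python
s = {'a', 'b', 'c'}

s.discard('x')  # missing element: nothing happens
s.discard('c')  # present element: it is removed

try:
    s.remove('x')  # missing element: raises KeyError
except KeyError:
    print("'x' is not in the set")

# Set elements must be hashable, so a list cannot be added
try:
    s.add(['d'])
except TypeError as err:
    print('TypeError:', err)

print(sorted(s))  # → ['a', 'b']
```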


There is also an immutable set type, called `frozenset`:

In [31]:
a = frozenset('abracadabra')
print(type(a), a)
a.add('z')

<class 'frozenset'> frozenset({'a', 'd', 'c', 'r', 'b'})


AttributeError: 'frozenset' object has no attribute 'add'
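Because a `frozenset` cannot change, it is hashable, so, unlike a regular `set`, it can be used as a dictionary key or as an element of another set. A sketch with illustrative data, using a `frozenset` of two hypothetical node names as an order-insensitive key:

```python
# The key {'A', 'B'} is the same no matter which order the elements are listed in
edges = {frozenset({'A', 'B'}): 5}
print(edges[frozenset({'B', 'A'})])  # → 5

# A set of frozensets: frozenset('ab') and frozenset('ba') are equal, so one is dropped
groups = {frozenset('ab'), frozenset('ba'), frozenset('cd')}
print(len(groups))  # → 2
```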

## Colophon
This notebook was written by [Yoav Ram](http://python.yoavram.com) and is part of the [_Python for Engineers_](https://github.com/yoavram/Py4Eng) course.

The notebook was written using [Python](http://python.org/) 3.6.1.
Dependencies listed in [environment.yml](../environment.yml), full versions in [environment_full.yml](../environment_full.yml).

This work is licensed under a CC BY-NC-SA 4.0 International License.

![Python logo](https://www.python.org/static/community_logos/python-logo.png)