## 8.7 Summary

### 8.7.1 Maps and dictionaries

A map is an unordered collection of key–value pairs with unique keys.
The same value may be associated with different keys.
Python's `dict` data type implements the map ADT.

Operation | English | Python
:-|:-|:-
new | let _m_ be an empty map | `d = dict()`
size | │*m*│ | `len(d)`
membership | _key_ in _m_ | `key in d`
associate | let _m_(_key_) be _value_ | `d[key] = value`
lookup | _m_(_key_) | `d[key]`
remove | remove _m_(_key_) | `d.pop(key)`
equality | _m1_ = _m2_ | `d1 == d2`
inequality | _m1_ ≠ _m2_ | `d1 != d2`

Looking up or removing a key that the dictionary doesn't contain raises a `KeyError`.
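The operations in the table can be tried out on a small dictionary. This is only an illustrative sketch: the name `capitals` and its contents are made up for the example.

```python
# Hypothetical example data: countries mapped to cities.
capitals = dict()                 # new
capitals['France'] = 'Paris'      # associate
capitals['Japan'] = 'Kyoto'
capitals['Japan'] = 'Tokyo'       # associating an existing key replaces its value
size = len(capitals)              # size: 2, because keys are unique
has_japan = 'Japan' in capitals   # membership: True
removed = capitals.pop('France')  # remove: pop returns the removed value

try:
    capitals['France']            # lookup of a key that is no longer in the map
    lookup_failed = False
except KeyError:
    lookup_failed = True          # the missing key raised a key error
```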

Dictionaries are implemented with hash tables, described below.
Dictionaries take more memory than sequences with the same key–value pairs.
We assume all dictionary operations take constant time, except (in)equality,
which takes linear time in the size of the dictionary in the worst case.

Dictionaries are iterable:
`for key in a_dict` iterates over all keys in `a_dict` and
`for (key, value) in a_dict.items()` iterates over all key–value pairs.
While iterating over a dictionary, no key–value pair can be added or removed.
Iterating over a dictionary's keys or items takes linear time
in the size of the dictionary.
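The two forms of iteration look as follows in a short sketch. The dictionary `stock` is invented for the example. The last part shows one possible way to remove pairs without changing the dictionary during the iteration: collect the keys first, then remove them.

```python
# Hypothetical example data: item names mapped to quantities in stock.
stock = {'apples': 3, 'pears': 0, 'plums': 5}

names = []
for key in stock:                    # iterates over all keys
    names.append(key)

total = 0
for (key, value) in stock.items():   # iterates over all key–value pairs
    total = total + value

# Collect the keys to remove, then change the dictionary afterwards.
# The second loop iterates over a list, not over the dictionary.
sold_out = [key for key in stock if stock[key] == 0]
for key in sold_out:
    stock.pop(key)
```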

### 8.7.2 Lookup and hash tables

A map with natural numbers as keys can be implemented with a lookup table,
an array in which the indices are the keys and the items are the values.
Lookup tables are often used to store pre-computed values.
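As a minimal sketch of a lookup table with pre-computed values, the list below stores the factorial of each index. The names `factorials` and `factorial` and the choice of ten entries are assumptions made for this example.

```python
# Hypothetical lookup table: index n (the key) holds the value n!.
factorials = [1]
for n in range(1, 10):
    factorials.append(factorials[n - 1] * n)

def factorial(n: int) -> int:
    """Return n! for 0 <= n < 10 by indexing the table, in constant time."""
    return factorials[n]
```

After the table has been filled once, each call is a single indexing operation instead of a loop of multiplications.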

A hash table with separate chaining is a lookup table of sequences of key–value
pairs. Each sequence is called a slot or bucket.

The load factor is the number of pairs (size of the map) divided by the number
of slots (size of the table), i.e. the mean number of pairs per slot.
The lower the load factor, the more memory is used, but the higher the chance
that each slot has at most one pair.
If two different keys are mapped to the same slot, a collision occurs.
With separate chaining, the collision resolution algorithm simply adds both
pairs to the same slot.

The hash table is implemented with a dynamic array, to increase the
number of slots as the dictionary size increases and keep a low load factor.
When the table grows or shrinks, the slots of all pairs have to be recomputed.

To search, add, replace or delete a value by key, we compute for the given key
the slot it must be in, and then do a linear search of the key in that slot.
With a low load factor, a hash function that reduces collisions,
and short keys, map operations take amortised constant time,
which we assume is the usual situation. In the worst case
(all pairs in the same slot), operations are linear in the size of the map.
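The ideas in this subsection can be put together in a small class. This is a sketch under stated assumptions, not how Python's dictionaries are actually implemented: the class name `ChainedHashTable`, its method names, the initial 8 slots, the 0.75 load factor threshold and the doubling of the table are all choices made for illustration, and the table only grows, never shrinks.

```python
class ChainedHashTable:
    """A sketch of a map implemented as a hash table with separate chaining."""

    def __init__(self, slots: int = 8) -> None:
        self.table = [[] for _ in range(slots)]  # each slot is a list of pairs
        self.size = 0                            # number of key–value pairs

    def slot_of(self, key) -> list:
        """Return the slot the key must be in."""
        return self.table[hash(key) % len(self.table)]

    def load_factor(self) -> float:
        """Return the mean number of pairs per slot."""
        return self.size / len(self.table)

    def associate(self, key, value) -> None:
        slot = self.slot_of(key)
        for index in range(len(slot)):           # linear search within the slot
            if slot[index][0] == key:
                slot[index] = (key, value)       # replace the existing value
                return
        slot.append((key, value))                # a collision just extends the slot
        self.size = self.size + 1
        if self.load_factor() > 0.75:            # assumed threshold
            self.resize(2 * len(self.table))

    def lookup(self, key):
        for (other_key, value) in self.slot_of(key):
            if other_key == key:
                return value
        raise KeyError(key)

    def remove(self, key) -> None:
        slot = self.slot_of(key)
        for index in range(len(slot)):
            if slot[index][0] == key:
                slot.pop(index)
                self.size = self.size - 1
                return
        raise KeyError(key)

    def resize(self, slots: int) -> None:
        """Create a new table and recompute the slot of every pair."""
        pairs = [pair for slot in self.table for pair in slot]
        self.table = [[] for _ in range(slots)]
        for (key, value) in pairs:
            self.slot_of(key).append((key, value))
```

Every operation first computes the slot and then searches only that slot, so the cost depends on the slot's length rather than on the size of the map, as long as the pairs are spread evenly over the slots.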

In Python, lists and dictionaries are mutable and therefore aren't hashable,
i.e. can't be used as keys,
to avoid inadvertently changing a key after it was inserted in the dictionary.
A tuple is hashable only if all its items are.
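One way to check these rules is to call Python's built-in `hash` function, which raises a type error for unhashable values. The helper `is_hashable` below is a hypothetical function written for this example, not part of Python.

```python
def is_hashable(item) -> bool:
    """Return True if the built-in hash function accepts the item."""
    try:
        hash(item)
        return True
    except TypeError:
        return False

tuple_of_ints = is_hashable((1, 2))      # True: all items are hashable
tuple_with_list = is_hashable((1, [2]))  # False: one item is a list
a_list = is_hashable([1, 2])             # False
a_dict = is_hashable({'a': 1})           # False
```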

### 8.7.3 Sets

Sets are unordered collections without duplicate items.
Python's `set` class is implemented like a dictionary and thus
requires items to be hashable. Python's sets are iterable but not hashable.

Operation | Maths | English | Python | Complexity (best/worst)
:-|:-|:-|:-|:-
new | let _s_ be {}  |  let _s_ be the empty set | `s = set()` | Θ(1)
size | │*s*│ |  |`len(s)` | Θ(1)
membership | $i \in s$  | _i_ in _s_ | `item in s` or `item not in s` | Θ(1)
add  |  |   add _i_ to _s_  | `s.add(item)` | Θ(1)
remove   |   | remove _i_ from _s_   | `s.discard(item)` | Θ(1)
union | $s1 \cup s2$  | union of _s1_ and _s2_ | `s1.union(s2)` | Θ(│`s1`│ + │`s2`│)
intersection | $s1 \cap s2$  |  intersection of _s1_ and _s2_ | `s1 & s2` or `s1.intersection(s2)` | Θ(min(│`s1`│, │`s2`│))
difference | _s1_ − _s2_ | | `s1 - s2` or `s1.difference(s2)` | Θ(│`s1`│)
(proper) subset  | _s1_ $\subset$ _s2_ and _s1_ $\subseteq$ _s2_ | | `s1 < s2` and `s1 <= s2` | Θ(1) / Θ(│`s1`│)
(proper) superset |  _s1_ $\supset$ _s2_ and _s1_ $\supseteq$ _s2_ | | `s1 > s2` and `s1 >= s2` | Θ(1) /  Θ(│`s2`│)
(in)equality | _s1_ = _s2_ and _s1_ ≠ _s2_ | | `s1 == s2` and `s1 != s2` | Θ(1) / Θ(│`s1`│)

Set union can also be written as `s1 | s2`.

Two sets are disjoint if their intersection is empty.
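The following sketch applies some of the operations in the table to two small sets. The sets `evens` and `primes` are made up for the example.

```python
# Hypothetical example data.
evens = {0, 2, 4, 6, 8}
primes = {2, 3, 5, 7}

union = evens | primes                # {0, 2, 3, 4, 5, 6, 7, 8}
intersection = evens & primes         # {2}
difference = evens - primes           # {0, 4, 6, 8}
is_subset = {3, 5} <= primes          # True
is_proper_subset = primes < primes    # False: a set isn't a proper subset of itself
disjoint = (evens & primes) == set()  # False: both sets contain 2

primes.add(11)                        # add
primes.discard(9)                     # discarding an absent item does nothing
```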

### 8.7.4 Bags

A bag is an unordered collection of possibly duplicate items.
Bags are a generalisation of sets, allowing each item to occur more than once.
The number of occurrences is called the item's multiplicity.
The bag operations take the multiplicity of each item into account.

The `Counter` class in module `collections` is a subclass of `dict` where
each key is a unique item and the value is that item's multiplicity.
The items must be hashable.
Accessing a nonexistent key returns zero instead of raising an error.
Instances of `Counter` are iterable (in the same way as dictionaries)
but not hashable.

The `Counter` class supports the operations in the table above,
with the same complexities, except:

Operation | Python
:-|:-
new | `bag = Counter()`
size | N/A
number of unique items  | `len(bag)`
add  | `bag[item] = bag[item] + 1`
remove | `bag[item] = bag[item] - 1`
multiplicity | `bag[item]`
inclusion | N/A

Methods `union`, `intersection` and `difference` aren't available either.
The multiplicity can be directly set, or increased/decreased by more than one.
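As a short illustration, the sketch below counts the letters of a word with a bag. The word is arbitrary. Summing the values is one possible way to obtain the bag's size; it is not one of the operations in the table.

```python
from collections import Counter

bag = Counter()
for letter in 'banana':
    bag[letter] = bag[letter] + 1  # add one occurrence of the letter

multiplicity = bag['a']            # 3
missing = bag['z']                 # 0: no error is raised and 'z' isn't added
unique = len(bag)                  # 3 unique items: 'b', 'a' and 'n'
size = sum(bag.values())           # 6 occurrences in total
bag['n'] = bag['n'] - 2            # decrease the multiplicity by two at once
```

After the last line, the key `'n'` is still in the bag, with multiplicity zero, so `len(bag)` remains 3.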
