# Dicts and Sets

Dictionaries are a keystone of Python. Beyond the basic dict, the standard library offers handy, ready-to-use specialized mappings like defaultdict, OrderedDict, ChainMap and Counter, all defined in the collections module. The same module also provides the easy-to-extend UserDict class.
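As a rough illustration of three of the specialized mappings named above, here is a minimal sketch. The sample words and the `defaults`/`overrides` settings are made up for the example.

```python
from collections import ChainMap, Counter, defaultdict

# defaultdict: a missing key is created on first access by calling the factory.
groups = defaultdict(list)
for word in ['apple', 'avocado', 'banana']:
    groups[word[0]].append(word)

# Counter: maps each item to the number of times it occurs.
letter_counts = Counter('abracadabra')

# ChainMap: searches several mappings in order; the first hit wins.
defaults = {'color': 'red', 'user': 'guest'}   # illustrative data
overrides = {'user': 'admin'}                  # illustrative data
settings = ChainMap(overrides, defaults)

print(dict(groups))
print(letter_counts['a'], letter_counts['b'])
print(settings['user'], settings['color'])
```

Note that ChainMap does not copy its inputs: a later change to `defaults` or `overrides` shows up in `settings`.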

Two powerful methods available in most mappings are setdefault and update. The setdefault method is used to update items holding mutable values, for example, in a dict of list values, to avoid redundant searches for the same key. The update method allows bulk insertion or overwriting of items from any other mapping, from iterables providing (key, value) pairs and from keyword arguments. 
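The difference setdefault makes, and the three kinds of argument update accepts, can be sketched as follows. The keys and values are invented for illustration.

```python
index = {}

# Without setdefault: two or three lookups for the same key.
if 'spam' not in index:
    index['spam'] = []
index['spam'].append(1)

# With setdefault: one call fetches the list, creating it if the key is missing.
index.setdefault('spam', []).append(2)
index.setdefault('eggs', []).append(3)

# update accepts a mapping, an iterable of (key, value) pairs, and keyword arguments.
prices = {'tea': 2}
prices.update({'coffee': 3})   # from another mapping
prices.update([('tea', 4)])    # from pairs; overwrites the existing 'tea'
prices.update(milk=1)          # from keyword arguments

print(index)
print(prices)
```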

Mapping constructors also use update internally, allowing instances to be initialized from mappings, iterables or keyword arguments. A clever hook in the mapping API is the __missing__ method, which lets you customize what happens when a key is not found. The collections.abc module provides the Mapping and MutableMapping abstract base classes for reference and type checking. The little-known MappingProxyType from the types module creates read-only views of mappings: the view cannot be modified, but it reflects changes made to the underlying mapping. There are also ABCs for Set and MutableSet.
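A minimal sketch of the two less familiar tools mentioned above. `StrKeyDict` is a hypothetical class written for this example, not part of the standard library; it retries a failed lookup with the key converted to str.

```python
from types import MappingProxyType


class StrKeyDict(dict):
    """Hypothetical dict subclass: falls back to str(key) on a failed lookup."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only, when the key is not found.
        if isinstance(key, str):
            raise KeyError(key)   # already a str: give up, avoid infinite recursion
        return self[str(key)]


d = StrKeyDict({'2': 'two'})
found = d[2]            # __missing__ retries the lookup with '2'
not_found = d.get(2)    # dict.get does not call __missing__

# MappingProxyType: a read-only view that tracks the underlying dict.
source = {'a': 1}
view = MappingProxyType(source)
try:
    view['b'] = 2
    write_failed = False
except TypeError:
    write_failed = True   # the proxy does not support item assignment

source['b'] = 2           # a change to the source is visible through the view
print(found, not_found, write_failed, view['b'])
```

Because `__missing__` is consulted only by `d[key]`, methods such as `get` and the `in` operator keep their normal behaviour unless they are overridden too.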


## Dictionaries

In [None]:
a = dict(one=1, two=2, three=3) 
b = {'one': 1, 'two': 2, 'three': 3} 
c = dict(zip(['one', 'two', 'three'], [1, 2, 3])) 
d = dict([('two', 2), ('one', 1), ('three', 3)]) 
e = dict({'three': 3, 'one': 1, 'two': 2}) 
a == b == c == d == e

In [None]:
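# Build an index mapping each word to the (line, column) locations where it occurs
import re

WORD_RE = re.compile(r'\w+')  # raw string, so \w is not treated as a string escape

index = {}

with open('text.txt', encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            col_no = match.start() + 1  # columns are 1-based
            location = (line_no, col_no)
            # fetch the list for this word, creating it if the key is missing
            index.setdefault(word, []).append(location)

# print the words in case-insensitive alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])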

## Sets

Sets are a comparatively recent addition to Python and are often underused.

In [None]:
import numpy as np

a = {1,2,3,4,5}
b = {3,8,9}

'basic operations a OR b {} a AND b {} a MINUS b {} a XOR b {}'.format(a | b, a & b, a - b, a ^ b)

In [None]:
a = np.arange(1000)
b = np.random.randint(low=0, high=1000, size=10)

# the elements of b that also occur in a, found without explicit loops
set(a) & set(b)

## Set Comprehensions

In [None]:
from unicodedata import name
{chr(i) for i in range(32, 256) if 'SIGN' in name(chr(i), '')}



## Practical consequences of implementation
1. Keys must be hashable
2. Dicts have significant memory overhead
3. Key search is very fast
4. Key ordering depends on insertion order
5. It is a bad idea to add items while iterating
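
Points 1, 4 and 5 in the list above can be checked directly. The sketch below assumes Python 3.7 or later, where dict insertion order is guaranteed by the language. The variable names are chosen only for this example.

```python
# 1. Keys must be hashable: a list is mutable and unhashable, so it is rejected.
try:
    bad = {[1, 2]: 'list as key'}
    unhashable_rejected = False
except TypeError:
    unhashable_rejected = True

# 4. Iteration follows insertion order, not sorted order.
ordered = {}
ordered['z'] = 1
ordered['a'] = 2
keys_in_order = list(ordered)

# 5. Adding items while iterating changes the dict's size mid-loop.
growing = {'x': 1}
try:
    for key in growing:
        growing[key + '_copy'] = growing[key]
    resize_detected = False
except RuntimeError:
    resize_detected = True

# A safer pattern: iterate over a snapshot of the keys, then modify the dict.
safe = {'x': 1, 'y': 2}
for key in list(safe):
    safe[key + '_copy'] = safe[key]

print(unhashable_rejected, keys_in_order, resize_detected, safe)
```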
