# Chapter 3: Dictionaries and Sets

## Dictionary

In [1]:
>>> a = dict(one=1, two=2, three=3)

>>> b = {'one': 1, 'two': 2, 'three': 3}

>>> c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))

>>> d = dict([('two', 2), ('one', 1), ('three', 3)])

>>> e = dict({'three': 3, 'one': 1, 'two': 2})

>>> a == b == c == d == e

True

### Dict Comprehensions

In [4]:
>>> DIAL_CODES = [ 
... (86, 'China'), 
... (91, 'India'), 
... (1, 'United States'), 
... (62, 'Indonesia'), 
... (55, 'Brazil'), 
... (92, 'Pakistan'), 
... (880, 'Bangladesh'), 
... (234, 'Nigeria'), 
... (7, 'Russia'), 
... (81, 'Japan'),
... ]

>>> country_code = {country: code for code, country in DIAL_CODES}

In [5]:
country_code

{'Bangladesh': 880,
 'Brazil': 55,
 'China': 86,
 'India': 91,
 'Indonesia': 62,
 'Japan': 81,
 'Nigeria': 234,
 'Pakistan': 92,
 'Russia': 7,
 'United States': 1}

In [9]:
{code: country.upper() for country, code in country_code.items() if code < 66}

{1: 'UNITED STATES', 7: 'RUSSIA', 55: 'BRAZIL', 62: 'INDONESIA'}

### defaultdict: Missing Keys

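A defaultdict is configured with a callable (its default_factory) that creates a value on demand whenever a missing key is looked up with d[key]. The script below builds an index mapping each word in a file to the list of (line, column) positions where it occurs.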

In [None]:
import sys 
import re 
import collections

WORD_RE = re.compile(r'\w+')  # raw string so \w is not treated as a string escape

index = collections.defaultdict(list) 

with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line): 
            word = match.group() 
            column_no = match.start()+1 
            location = (line_no, column_no) 
            index[word].append(location)

# print in alphabetical order 
for word in sorted(index, key=str.upper):
    print(word, index[word])
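
The script above needs a file path on the command line. As a minimal sketch of the same indexing idea on an in-memory string, the snippet below uses a made-up sample text (the names text and missing are illustrative assumptions, not part of the original script). It also shows that only d[key] triggers the default factory, while .get() and the in operator do not.

```python
import collections
import re

WORD_RE = re.compile(r'\w+')

# Hypothetical sample text standing in for the file read via sys.argv[1]
text = "spam eggs\neggs spam spam"

index = collections.defaultdict(list)
for line_no, line in enumerate(text.splitlines(), 1):
    for match in WORD_RE.finditer(line):
        # index[word] calls list() the first time each word is seen
        index[match.group()].append((line_no, match.start() + 1))

# print in alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])

# .get() and `in` do not invoke the default factory, so no key is created
missing = index.get('bacon')
```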

### Variations of dict
- collections.OrderedDict: Maintains keys in insertion order, allowing iteration over items in a predictable order.
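
Since Python 3.7 a plain dict also preserves insertion order, so the following is a small sketch of what OrderedDict still adds: explicit reordering, popping from either end, and order-sensitive equality. The variable names are illustrative.

```python
from collections import OrderedDict

od = OrderedDict([('one', 1), ('two', 2), ('three', 3)])

od.move_to_end('one')            # 'one' becomes the last key
order_after_move = list(od)

first = od.popitem(last=False)   # removes and returns the oldest remaining item

# Equality between two OrderedDicts takes key order into account;
# equality between two plain dicts does not.
ordered_equal = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
plain_equal = dict(a=1, b=2) == dict(b=2, a=1)
```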

## Set

### Set Theory

A set is a collection of unique objects. A basic use case is removing duplication.

In addition to guaranteeing uniqueness, the set types implement the essential set operations as infix operators, so, given two sets a and b, $a | b$ returns their **union**, $a \& b$ computes the **intersection**, and $a - b$ the **difference**. Smart use of set operations can reduce both the line count and the runtime of Python programs, at the same time making code easier to read and reason about by removing loops and lots of conditional logic.

In [17]:
>>> l = ['spam', 'spam', 'eggs', 'spam']

>>> set(l)

{'eggs', 'spam'}

In [18]:
list(set(l))

['eggs', 'spam']

In [None]:
# Count how many needles occur in haystack; both must be sets
found = len(needles & haystack)

# Equivalent for-loop version, which is slower
found = 0
for n in needles:
    if n in haystack:
        found += 1
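
The cell above assumes that needles and haystack already exist. A minimal runnable sketch with made-up data follows; the sample values and the names needle_list and found_from_lists are assumptions for illustration. It also shows one way to handle inputs that are not sets yet.

```python
# Hypothetical sample data
needles = {'spam', 'eggs', 'bacon'}
haystack = {'spam', 'toast', 'eggs', 'beans'}

# Number of needles that also appear in the haystack
found = len(needles & haystack)

# The & operator requires both operands to be sets, so build them
# on the fly when starting from other iterables
needle_list = ['spam', 'spam', 'bacon']
found_from_lists = len(set(needle_list) & set(haystack))

# The method form accepts any iterable as its argument
found_via_method = len(set(needle_list).intersection(haystack))
```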

### Set Comprehensions

In [24]:
>>> from unicodedata import name

>>> {chr(i) for i in range(32, 256) if 'SIGN' in name(chr(i), '')}
{'§', '=', '¢', '#', '¤', '<', '¥', 'µ', '×', '$', '¶', '£', '©', '°', '+', '÷', '±', '>', '¬', '®', '%'}


In [29]:
# Set operation:
a = {1, 2, 3}
b = {1, 4, 5}

a & b

{1}

In [26]:
a | b

{1, 2, 3, 4, 5}

In [30]:
a - b

{2, 3}

In [32]:
a ^ b # = (a - b) | (b - a)

{2, 3, 4, 5}

In [33]:
3 in a

True

In [35]:
a <= b

False

In [37]:
{1, 2} <= a # subset

True

In [38]:
{1, 2, 3, 4, 5} >= a # superset

True

### How dicts work

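- Dicts have significant memory overhead.
- Key search is very fast.
- Key ordering follows insertion order: since Python 3.7, iterating a dict yields keys in the order they were first added.
- In CPython 3.5 and earlier, adding items could change the order of existing keys; since Python 3.7, new keys are always appended at the end.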
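
As a small sketch of these properties, assuming Python 3.7 or later, the snippet below checks that keys come back in insertion order and that an unhashable key is rejected. The variable names are illustrative.

```python
d = {}
for key in ['one', 'two', 'three']:
    d[key] = len(key)

d['four'] = 4        # a new key goes to the end on Python 3.7+
keys = list(d)

# Keys are located by hash, so they must be hashable:
# a list is rejected, while a tuple of hashable items is accepted
try:
    d[[1, 2]] = 'list key'
except TypeError as exc:
    error = str(exc)

d[(1, 2)] = 'tuple key'
```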

### How sets work

- Set elements must be hashable objects.
- Sets have a significant memory overhead.
- Membership testing is very efficient.
- Element ordering is arbitrary: it depends on hash values and insertion history, so it should not be relied upon.
- Adding elements to a set may change the order of other elements.
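
The hashability requirement can be sketched as follows: lists cannot be set elements, while frozenset can, which is one way to build a set of sets. The names here are illustrative.

```python
# A list is unhashable, so it cannot be a set element
try:
    bad = {[1, 2], [3, 4]}
except TypeError:
    list_rejected = True
else:
    list_rejected = False

# A set is itself mutable and unhashable; frozenset is the hashable variant.
# Equal frozensets collapse into a single element.
nested = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
count = len(nested)

# Membership tests use the hash table rather than scanning every element
big = set(range(100_000))
present = 99_999 in big
absent = -1 in big
```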