## defaultdict

`defaultdict` is a specialized dictionary found in the `collections` module. It is a subclass of `dict` that supplies a default value for missing keys by calling a factory function.

Approach with a traditional dictionary:

In [1]:
d = {}

# d['a']   # KeyError: 'a'

result = d.get('a')
print(type(result))

result = d.get('a', 100)
print(result)

d # the dictionary itself doesn't change

<class 'NoneType'>
100


{}

<br>

Approach with a `defaultdict`:

In [2]:
from collections import defaultdict

In [3]:
dd = defaultdict(lambda: 47)

print(dd['a'])

dict(dd)

47


{'a': 47}

In [4]:
# A variable of type defaultdict is still a regular dictionary
print(isinstance(dd, defaultdict))
print(isinstance(dd, dict))

True
True
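One detail worth spelling out, as a minimal sketch (the name `probe` is only illustrative): the default factory runs only for square-bracket lookups. `get` and `in` behave as they do for a plain `dict` and do not insert anything.

```python
from collections import defaultdict

probe = defaultdict(lambda: 47)

# get() and membership tests do not call the default factory
missing = probe.get('a')
has_a = 'a' in probe
print(missing, has_a, dict(probe))

# a square-bracket lookup calls the factory and stores the result
value = probe['a']
print(value, dict(probe))
```

This is why `dict(dd)` above already contains `'a'`: the earlier `dd['a']` lookup stored the default.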


<br>

Very often the default factory looks like a type name, such as `int` or `list`. A type is itself callable, and calling it with no arguments returns its "empty" value, so it works as the factory function.<br>
For example:

In [5]:
d = defaultdict(int)   # defaultdict(lambda: 0)
d['a']

0

In [6]:
d = defaultdict(bool)  # defaultdict(lambda: False)
d['a']

False

In [7]:
d = defaultdict(str)   # defaultdict(lambda: '')
d['a']

''

In [8]:
d = defaultdict(list)   # defaultdict(lambda: list())
d['a']

[]
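As a rough illustration of why these factories are handy, the sketch below counts letters with `defaultdict(int)` and then uses an ordinary function as the factory. The names `letter_counts`, `make_record` and `records` are made up for this example.

```python
from collections import defaultdict

# int() called with no arguments returns 0, so every new key starts at 0
letter_counts = defaultdict(int)
for letter in 'abracadabra':
    letter_counts[letter] += 1

print(dict(letter_counts))

# any zero-argument callable works, including a regular function
def make_record():
    return {'visits': 0}

records = defaultdict(make_record)
records['home']['visits'] += 1
print(dict(records))
```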

<br>

Example.<br>
We have a dictionary with people's names as keys and, as values, dictionaries that may contain the person's eye color. We want to build a dictionary keyed by eye color, where each value is a list of the names of the people with that eye color (with `'Unknown'` as the key if the eye color was not specified).

In [9]:
persons = {
    'john': {'age': 20, 'eye_color': 'blue'},
    'jack': {'age': 25, 'eye_color': 'brown'},
    'jill': {'age': 22, 'eye_color': 'blue'},
    'eric': {'age': 35},
    'michael': {'age': 27}
}

In [10]:
eye_colors = defaultdict(list)

for person, details in persons.items():
    color = details.get('eye_color', 'Unknown')
    eye_colors[color].append(person)
    
dict(eye_colors)

{'blue': ['john', 'jill'], 'brown': ['jack'], 'Unknown': ['eric', 'michael']}

<br>

Alternative approach:

In [11]:
from functools import partial

In [12]:
eyedict = partial(defaultdict, lambda: 'unknown')

In [13]:
persons = {
    'john': eyedict(age=20, eye_color='blue'),
    'jack': eyedict(age=25, eye_color='brown'),
    'jill': eyedict(age=22, eye_color='blue'),
    'eric': eyedict(age=35),
    'michael': eyedict(age=27)
}

In [14]:
eye_colors = defaultdict(list)

for person, details in persons.items():
    eye_colors[details['eye_color']].append(person)
    
dict(eye_colors)

{'blue': ['john', 'jill'], 'brown': ['jack'], 'unknown': ['eric', 'michael']}
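To make the `partial` trick less opaque, here is a small sketch of what a single `eyedict(...)` call does. It only assumes the `eyedict` definition used above.

```python
from collections import defaultdict
from functools import partial

eyedict = partial(defaultdict, lambda: 'unknown')

# eyedict(age=35) is the same call as defaultdict(<factory>, age=35)
eric = eyedict(age=35)

print(eric['age'])
print(eric['eye_color'])   # missing key, so the factory supplies 'unknown'
print(dict(eric))          # the lookup above also stored the default
```

One trade-off of this approach: reading `details['eye_color']` writes the default back into each person's dictionary, so the input data is modified as a side effect.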

<br>

<br>

## defaultdict

See notebook 'Standard data types/collections'.

<br>

<br>

## Counter

In [15]:
from collections import Counter

In [16]:
# sentence = 'the quick brown fox jumps over the lazy dog'
sentence = 'anti-aging technologies'

In [17]:
c1 = Counter()
for c in sentence:
    c1[c] += 1
    
c1

Counter({'a': 2,
         'n': 3,
         't': 2,
         'i': 3,
         '-': 1,
         'g': 3,
         ' ': 1,
         'e': 2,
         'c': 1,
         'h': 1,
         'o': 2,
         'l': 1,
         's': 1})

In [18]:
# or, easier
c1 = Counter(sentence)

c1

Counter({'a': 2,
         'n': 3,
         't': 2,
         'i': 3,
         '-': 1,
         'g': 3,
         ' ': 1,
         'e': 2,
         'c': 1,
         'h': 1,
         'o': 2,
         'l': 1,
         's': 1})

`Counter` has a slew of additional methods which make sense in the context of counters:

1. Iterate through all the elements of the counter, repeating each element as many times as its frequency
2. Find the `n` most common (by frequency) elements
3. Decrement the counters based on another `Counter` (or iterable)
4. Increment the counters based on another `Counter` (or iterable)
5. Specialized constructor for additional flexibility

If you are familiar with multisets, then this is essentially a data structure that can be used for multisets.
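As a quick sketch of the multiset idea: two counters compare equal when they hold the same elements with the same multiplicities, which gives a compact anagram test. The helper name `is_anagram` is hypothetical.

```python
from collections import Counter

def is_anagram(first, second):
    """Return True if both strings contain the same letters with the same counts."""
    return Counter(first) == Counter(second)

print(is_anagram('listen', 'silent'))    # True
print(is_anagram('listen', 'silence'))   # False
```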

In [19]:
import random

In [20]:
random.seed(0)

my_list = [random.randint(0, 10) for _ in range(1_000)]

c2 = Counter(my_list)
c2

Counter({6: 95,
         0: 97,
         4: 91,
         8: 76,
         7: 94,
         5: 89,
         9: 85,
         3: 80,
         2: 88,
         1: 107,
         10: 98})

We can also initialize a `Counter` object by passing in keyword arguments, or a dictionary:

In [21]:
c3 = Counter(a=1, b=10)
c3

Counter({'a': 1, 'b': 10})

In [22]:
c3 = Counter({'a': 1, 'b': 10})
c3

Counter({'a': 1, 'b': 10})
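A related behavior, sketched here with an illustrative counter named `stock`: reading a key that is not present returns `0` rather than raising `KeyError`, and the read does not add the key.

```python
from collections import Counter

stock = Counter(a=1, b=10)

# a missing key reads as 0 instead of raising KeyError ...
print(stock['z'])
# ... and, unlike a defaultdict lookup, reading it does not add the key
print('z' in stock)
print(len(stock))
```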

<br>

#### Finding the n Most Common Elements

In [23]:
import re

In [24]:
sentence = "Life extension is the concept of extending the human lifespan, either modestly through improvements in medicine or dramatically by increasing the maximum lifespan beyond its generally-settled limit of 125 years. Several researchers in the area, along with 'life extensionists', 'immortalists' or 'longevists' (those who wish to achieve longer lives themselves), postulate that future breakthroughs in tissue rejuvenation, stem cells, regenerative medicine, molecular repair, gene therapy, pharmaceuticals and organ replacement (such as with artificial organs or xenotransplantations) will eventually enable humans to have indefinite lifespans through complete rejuvenation to a healthy youthful condition."

words = re.split(r'\W+', sentence)  # raw string avoids an invalid-escape warning for \W

words

['Life',
 'extension',
 'is',
 'the',
 'concept',
 'of',
 'extending',
 'the',
 'human',
 'lifespan',
 'either',
 'modestly',
 'through',
 'improvements',
 'in',
 'medicine',
 'or',
 'dramatically',
 'by',
 'increasing',
 'the',
 'maximum',
 'lifespan',
 'beyond',
 'its',
 'generally',
 'settled',
 'limit',
 'of',
 '125',
 'years',
 'Several',
 'researchers',
 'in',
 'the',
 'area',
 'along',
 'with',
 'life',
 'extensionists',
 'immortalists',
 'or',
 'longevists',
 'those',
 'who',
 'wish',
 'to',
 'achieve',
 'longer',
 'lives',
 'themselves',
 'postulate',
 'that',
 'future',
 'breakthroughs',
 'in',
 'tissue',
 'rejuvenation',
 'stem',
 'cells',
 'regenerative',
 'medicine',
 'molecular',
 'repair',
 'gene',
 'therapy',
 'pharmaceuticals',
 'and',
 'organ',
 'replacement',
 'such',
 'as',
 'with',
 'artificial',
 'organs',
 'or',
 'xenotransplantations',
 'will',
 'eventually',
 'enable',
 'humans',
 'to',
 'have',
 'indefinite',
 'lifespans',
 'through',
 'complete',
 'rejuvenation',
 ...]

In [25]:
word_count = Counter(map(str.lower, words))

word_count

Counter({'life': 2,
         'extension': 1,
         'is': 1,
         'the': 4,
         'concept': 1,
         'of': 2,
         'extending': 1,
         'human': 1,
         'lifespan': 2,
         'either': 1,
         'modestly': 1,
         'through': 2,
         'improvements': 1,
         'in': 3,
         'medicine': 2,
         'or': 3,
         'dramatically': 1,
         'by': 1,
         'increasing': 1,
         'maximum': 1,
         'beyond': 1,
         'its': 1,
         'generally': 1,
         'settled': 1,
         'limit': 1,
         '125': 1,
         'years': 1,
         'several': 1,
         'researchers': 1,
         'area': 1,
         'along': 1,
         'with': 2,
         'extensionists': 1,
         'immortalists': 1,
         'longevists': 1,
         'those': 1,
         'who': 1,
         'wish': 1,
         'to': 3,
         'achieve': 1,
         'longer': 1,
         'lives': 1,
         'themselves': 1,
         'postulate': 1,
         'that': 1,
         ...})

In [26]:
word_count.most_common(10)

[('the', 4),
 ('in', 3),
 ('or', 3),
 ('to', 3),
 ('life', 2),
 ('of', 2),
 ('lifespan', 2),
 ('through', 2),
 ('medicine', 2),
 ('with', 2)]
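`most_common` can also be called without an argument, in which case it returns every element ordered from most to least frequent. The sketch below, using a made-up `votes` counter, slices that full ranking to get the least common elements instead.

```python
from collections import Counter

votes = Counter('aaaabbbccd')

# with no argument, most_common() returns every element, most frequent first
ranking = votes.most_common()
print(ranking)

# walking the full ranking backwards gives the least common elements
least_two = ranking[:-3:-1]
print(least_two)
```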

<br>

#### Using Repeated Iteration

In [27]:
c1 = Counter('abba')
c1

Counter({'a': 2, 'b': 2})

In [28]:
for c in c1:
    print(c)

a
b


However, we can have an iteration that repeats the counter keys as many times as the indicated frequency:

In [29]:
for c in c1.elements():
    print(c)

a
a
b
b
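One caveat, shown as a small sketch with an illustrative `inventory` counter: `elements()` skips keys whose count is zero or negative, even though those keys are still stored.

```python
from collections import Counter

inventory = Counter(a=2, b=0, c=-1)

# keys with zero or negative counts are skipped by elements()
print(list(inventory.elements()))

# the keys themselves are still stored in the counter
print(sorted(inventory))
```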


<br>

#### Updating from another Iterable or Counter

In [30]:
c1 = Counter(a=1, b=2, c=3)
c2 = Counter(b=1, c=2, d=3)

c1.update(c2)
print(c1)

Counter({'c': 5, 'b': 3, 'd': 3, 'a': 1})


In [31]:
# Instead of adding counters we can subtract them
c1 = Counter(a=1, b=2, c=3)
c2 = Counter(b=1, c=2, d=3)

c1.subtract(c2)
print(c1)

Counter({'a': 1, 'b': 1, 'c': 1, 'd': -3})


Notice the key `d`: it was not present in `c1`, and since a `Counter` treats missing keys as `0`, subtracting the `d: 3` found in `c2` left `d` at `-3`. The `subtract` method keeps zero and negative counts.

In [32]:
# We can also use other types of arguments
c1 = Counter('aabbccddee')
print(c1)
c1.update('abcdef')
print(c1)

Counter({'a': 2, 'b': 2, 'c': 2, 'd': 2, 'e': 2})
Counter({'a': 3, 'b': 3, 'c': 3, 'd': 3, 'e': 3, 'f': 1})
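It may help to contrast this with `dict.update`, which replaces values rather than adding to them. A minimal sketch, with `plain` and `counts` as illustrative names:

```python
from collections import Counter

plain = {'a': 1, 'b': 2}
plain.update({'b': 10})        # dict.update replaces the value
print(plain)

counts = Counter(a=1, b=2)
counts.update({'b': 10})       # Counter.update adds to the count
print(counts['b'])

counts.subtract('ab')          # subtract also accepts an iterable
print(counts['a'], counts['b'])
```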


<br>

#### Mathematical Operations

* `+`: adds counts like `update`, but returns a new `Counter` object instead of updating in place, and keeps only positive counts
* `-`: subtracts one counter from another, but discards zero and negative values
* `&`: keeps the **minimum** of the key values
* `|`: keeps the **maximum** of the key values

In [33]:
c1 = Counter(a=2, b=2, c=2)
c2 = Counter(a=1, b=2, c=3)
c1 + c2

Counter({'a': 3, 'b': 4, 'c': 5})

In [34]:
c1 - c2

Counter({'a': 1})

In [35]:
c1 = Counter(a=5, b=1)
c2 = Counter(a=1, b=10)

c1 & c2

Counter({'a': 1, 'b': 1})

In [36]:
c1 | c2

Counter({'a': 5, 'b': 10})

The **unary** `+` can also be used to remove any non-positive count from the Counter:

In [37]:
c1 = Counter(a=10, b=0, c=-10)
+c1

Counter({'a': 10})

The **unary** `-` changes the sign of each counter, and removes any non-positive result:

In [38]:
-c1

Counter({'c': 10})
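Because `-` discards non-positive results, an empty difference means one multiset is fully covered by another. The sketch below uses that to check whether a word can be spelled from a pool of letters; `can_build` is a hypothetical helper, not part of `collections`.

```python
from collections import Counter

def can_build(word, letters):
    """Return True if `word` can be spelled using the characters in `letters`."""
    # subtraction drops non-positive counts, so an empty result means
    # every letter of `word` is covered by `letters`
    shortfall = Counter(word) - Counter(letters)
    return not shortfall

print(can_build('aging', 'anti-aging technologies'))   # True
print(can_build('zebra', 'anti-aging technologies'))   # False
```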

<br>

<br>

## ChainMap

In [39]:
from collections import ChainMap

Remember the `chain` function in the `itertools` module? That allowed us to chain multiple iterables together to look like a single iterable.

The `ChainMap` in the `collections` module is somewhat similar - it allows us to chain multiple dictionaries (<i>mapping types more generally</i>) so that they look like a single mapping.
But there are some wrinkles.

Let's look at some simple examples where we do not have key collisions first:

In [40]:
d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'd': 4}
d3 = {'e': 5, 'f': 6}

d = ChainMap(d1, d2, d3)

print(d)

ChainMap({'a': 1, 'b': 2}, {'c': 3, 'd': 4}, {'e': 5, 'f': 6})


In [41]:
print(isinstance(d, dict))
print(type(d))

False
<class 'collections.ChainMap'>


The result is not a dictionary, but a mapping type that we can use almost like a dictionary.

In [42]:
for k, v in d.items():
    print(k, v)

e 5
f 6
c 3
d 4
a 1
b 2


**Note** that the iteration order here may be surprising: a `ChainMap` is iterated by scanning the underlying mappings from **last** to first, which is why the keys of `d3` come out first.

<br>
CAUTION.<br>
What happens if we have a key collision?

In [43]:
d1 = {'a': 1, 'b': 2}
d2 = {'b': 20, 'c': 3}    # 'b' in two dictionaries

d = ChainMap(d1, d2)

d['b']

2

As we can see, the value returned is the one from the first dictionary in the chain that contains the key.

In [44]:
for k, v in d.items():
    print(k, v)

b 2
c 3
a 1
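The lookup rule can be sketched by hand. The function `chain_lookup` below is a hypothetical stand-in written for illustration, not how `ChainMap` is actually implemented; it searches the mappings in order and returns the first match.

```python
from collections import ChainMap

def chain_lookup(maps, key):
    """Return the value for `key` from the first mapping that contains it."""
    for mapping in maps:
        if key in mapping:
            return mapping[key]
    raise KeyError(key)

d1 = {'a': 1, 'b': 2}
d2 = {'b': 20, 'c': 3}
d = ChainMap(d1, d2)

# d.maps is the list of underlying dictionaries, in lookup order
print(chain_lookup(d.maps, 'b'), d['b'])
print(chain_lookup(d.maps, 'c'), d['c'])
```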


<br>

Now let's look at how ChainMap objects handle inserts, deletes and updates.

In [45]:
d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'd': 4}

d = ChainMap(d1, d2)

d['z'] = 100

print(d)
print(d1)
print(d2)

ChainMap({'a': 1, 'b': 2, 'z': 100}, {'c': 3, 'd': 4})
{'a': 1, 'b': 2, 'z': 100}
{'c': 3, 'd': 4}


As we can see, the new element was added through the chain map to the **first** underlying dictionary.

Let's try to update `c`, which is in the second dictionary:

In [46]:
d['c'] = 300

print(d)
print(d1)
print(d2)

ChainMap({'a': 1, 'b': 2, 'z': 100, 'c': 300}, {'c': 3, 'd': 4})
{'a': 1, 'b': 2, 'z': 100, 'c': 300}
{'c': 3, 'd': 4}


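The update did not touch the second dictionary. Instead, a new key `c` was added to the first dictionary, where it shadows the original value.

Deleting an item: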

In [47]:
del d['c']

print(d)
print(d1)
print(d2)

ChainMap({'a': 1, 'b': 2, 'z': 100}, {'c': 3, 'd': 4})
{'a': 1, 'b': 2, 'z': 100}
{'c': 3, 'd': 4}


In [48]:
d['c']  # now we get the value of key 'c' from the second underlying dictionary

3

In [49]:
# del d['c']  # KeyError: "Key not found in the first mapping: 'c'"

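Through the chain map we can't delete a key that exists only in the second underlying dictionary: deletions, like inserts and updates, apply to the first mapping only.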
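If a deeper mapping really does need to change, one option is to reach it directly through the `maps` attribute, which holds the underlying mappings as a list. A minimal sketch:

```python
from collections import ChainMap

d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'd': 4}
d = ChainMap(d1, d2)

# d.maps is the list of underlying mappings, in lookup order
d.maps[1]['c'] = 300    # update 'c' in the second dictionary
del d.maps[1]['d']      # delete 'd' from the second dictionary

print(d)
print(d2)
```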

A `ChainMap` is built as a view on top of a sequence of mappings, and those maps are incorporated **by reference**.
This means that if an underlying map is mutated, then the `ChainMap` instance will **see** the change:

In [50]:
d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'd': 4}
d = ChainMap(d1, d2)

d1['a'] = 100

print(d)

ChainMap({'a': 100, 'b': 2}, {'c': 3, 'd': 4})


A `ChainMap` also offers a few other attributes and methods, such as `maps`, `parents` and `new_child()`.
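As a brief sketch of two of these, `new_child()` returns a new chain with a fresh dictionary placed in front, and `parents` gives the chain without its first mapping. The names `defaults`, `base` and `scoped` are illustrative.

```python
from collections import ChainMap

defaults = {'color': 'blue', 'size': 'M'}
base = ChainMap(defaults)

# new_child() returns a new ChainMap with a fresh dict placed in front
scoped = base.new_child()
scoped['color'] = 'red'

print(scoped['color'], scoped['size'])
# parents is a new ChainMap without the first mapping
print(scoped.parents['color'])
print(defaults)
```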

##### Example

A typical application of a chain map is to create a mutable version of merged dictionaries that does not mutate the underlying dictionaries.

Remember that mutating elements of a chain map mutates the elements only in the first map.

Let's say we have a dictionary with some settings and we want to temporarily modify these settings, but without modifying the original dictionary.

We could certainly copy the dictionary and work with the copy, discarding the copy when we no longer need it - but this incurs some overhead copying all the data.

Instead we can use a chain map, by making the first dictionary in the chain a new empty dictionary - any updates we make will be made to that dictionary only, thereby preserving the other dictionaries.

In [51]:
config = {
    'host': 'prod.deepdive.com',
    'port': 5432,
    'database': 'deepdive',
    'user_id': 'my_user',
    'user_pwd': 'my_pwd'
}

In [52]:
local_config = ChainMap({}, config)

In [53]:
list(local_config.items())

[('host', 'prod.deepdive.com'),
 ('port', 5432),
 ('database', 'deepdive'),
 ('user_id', 'my_user'),
 ('user_pwd', 'my_pwd')]

And we can make changes to `local_config`:

In [54]:
local_config['user_id'] = 'test_user'
local_config['user_pwd'] = 'test_pwd'

list(local_config.items())

[('host', 'prod.deepdive.com'),
 ('port', 5432),
 ('database', 'deepdive'),
 ('user_id', 'test_user'),
 ('user_pwd', 'test_pwd')]

But notice that our original dictionary is unaffected:

In [55]:
list(config.items())

[('host', 'prod.deepdive.com'),
 ('port', 5432),
 ('database', 'deepdive'),
 ('user_id', 'my_user'),
 ('user_pwd', 'my_pwd')]

That's because the changes we made were stored in the **first** dictionary in the chain, which was initially empty:

In [56]:
local_config.maps

[{'user_id': 'test_user', 'user_pwd': 'test_pwd'},
 {'host': 'prod.deepdive.com',
  'port': 5432,
  'database': 'deepdive',
  'user_id': 'my_user',
  'user_pwd': 'my_pwd'}]
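Two follow-ups that seem to fit this pattern, sketched with a shortened `config`: deleting an override brings the original value back into view, and `dict()` produces an independent snapshot of the merged settings when a plain dictionary is needed.

```python
from collections import ChainMap

config = {'host': 'prod.deepdive.com', 'port': 5432}
local_config = ChainMap({}, config)

local_config['port'] = 5433          # stored in the empty first dict
print(local_config['port'])

del local_config['port']             # removes only the override
print(local_config['port'])

# dict() produces an independent snapshot of the merged view
snapshot = dict(local_config)
print(snapshot)
```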