# Chapter 3. Dictionaries and Sets
---

## ToC


### [Automatic Handling of Missing Keys](#automatic-handling-of-missing-keys)
1. [Approach I. defaultdict: Another Take on Missing Keys](#approach-i-defaultdict-another-take-on-missing-keys)
2. [Approach II. The `__missing__` Method](#approach-ii-the-__missing__-method)
3. [Inconsistent Usage of `__missing__` in the Standard Library](#inconsistent-usage-of-__missing__-in-the-standard-library)


---

In [2]:
import strkeydict0

## Automatic Handling of Missing Keys

Sometimes it is convenient to have mappings that return some made-up value when a
missing key is looked up. There are two main approaches to this:  

I. Use a `defaultdict` instead of a plain `dict`.  
II. Subclass `dict` or any other mapping type and add a `__missing__` method.

### Approach I. defaultdict: Another Take on Missing Keys

A `collections.defaultdict` instance creates items with a default value on demand whenever a missing key is searched using `d[k]` syntax.

When instantiating a `defaultdict`, you provide a callable to produce a default value whenever `__getitem__` is passed a nonexistent key argument.

For example, given a `defaultdict` created as `dd = defaultdict(list)`, if `'new-key'` is not in `dd`, the expression `dd['new-key']` performs the following steps:

1. Calls `list()` to create a new list.
2. Inserts the list into `dd` using `'new-key'` as key.
3. Returns a reference to that list.

The callable that produces the default values is held in an instance attribute named `default_factory`.
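
The three steps above can be observed directly. The following is a minimal sketch; the key `'new-key'` comes from the text, while the variable name `value` is only illustrative.

```python
from collections import defaultdict

dd = defaultdict(list)
was_missing = 'new-key' not in dd   # the key does not exist yet

# d[k] on a missing key: calls list(), inserts the result, returns it
value = dd['new-key']
value.append(1)                     # mutates the list stored in dd

print(dd['new-key'])                # the same list object, now holding 1
print(dd.default_factory)           # the callable passed to the constructor
```

Because `dd['new-key']` returns a reference to the stored list, appending to `value` changes what `dd` holds.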

Revisiting the example from the previous section:

In [None]:
import collections
import re
import sys

WORD_RE = re.compile(r'\w+')

# Create a defaultdict with the list constructor as default_factory
index = collections.defaultdict(list)
# for terminal 
# with open(sys.argv[1], encoding='utf-8') as fp:
# for notebook
with open("zen.txt", encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            index[word].append(location)
# display in alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])

a [(19, 48), (20, 53)]
Although [(11, 1), (16, 1), (18, 1)]
ambiguity [(14, 16)]
and [(15, 23)]
are [(21, 12)]
aren [(10, 15)]
at [(16, 38)]
bad [(19, 50)]
be [(15, 14), (16, 27), (20, 50)]
beats [(11, 23)]
Beautiful [(3, 1)]
better [(3, 14), (4, 13), (5, 11), (6, 12), (7, 9), (8, 11), (17, 8), (18, 25)]
break [(10, 40)]
by [(1, 20)]
cases [(10, 9)]
complex [(5, 23)]
Complex [(6, 1)]
complicated [(6, 24)]
counts [(9, 13)]
dense [(8, 23)]
do [(15, 64), (21, 48)]
Dutch [(16, 61)]
easy [(20, 26)]
enough [(10, 30)]
Errors [(12, 1)]
explain [(19, 34), (20, 34)]
Explicit [(4, 1)]
explicitly [(13, 8)]
face [(14, 8)]
first [(16, 41)]
Flat [(7, 1)]
good [(20, 55)]
great [(21, 28)]
guess [(14, 52)]
hard [(19, 26)]
honking [(21, 20)]
idea [(19, 54), (20, 60), (21, 34)]
If [(19, 1), (20, 1)]
implementation [(19, 8), (20, 8)]
implicit [(4, 25)]
In [(14, 1)]
is [(3, 11), (4, 10), (5, 8), (6, 9), (7, 6), (8, 8), (17, 5), (18, 16), (19, 23), (20, 23)]
it [(15, 67), (19, 43), (20, 43)]
let [(21, 42)]
...

If no `default_factory` is provided, the usual `KeyError` is raised for missing keys.
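
A short sketch of this case follows. It also illustrates a related detail that the text above does not state explicitly: `default_factory` is only invoked by `d[k]` lookups, so `d.get(k)` on a missing key still returns `None`. The variable names are illustrative.

```python
from collections import defaultdict

# No default_factory: behaves like a plain dict for missing keys
no_factory = defaultdict()
try:
    no_factory['missing']
    raised = False
except KeyError:
    raised = True

# With a default_factory, get() still does not create the item
dd = defaultdict(list)
got = dd.get('missing')

print(raised, got, 'missing' in dd)
```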


![Figure 37](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/37.PNG)

The mechanism that makes `defaultdict` work by calling `default_factory` is the
`__missing__` special method, described next.

### Approach II. The `__missing__` Method

Underlying the way mappings deal with missing keys is the aptly named `__missing__`
method. This method is not defined in the base `dict` class, but `dict` is aware of it: if
you subclass `dict` and provide a `__missing__` method, the standard `dict.__getitem__` will call it whenever a key is not found, instead of raising `KeyError`.

**Example:** When searching for a nonstring key, `StrKeyDict0` converts it to `str`
when it is not found.

Tests for item retrieval using `d[key]` notation:

In [38]:
from strkeydict0 import StrKeyDict0
d = StrKeyDict0([('2', 'two'), ('4', 'four')])
d

{'2': 'two', '4': 'four'}

In [39]:
d['2']

'two'

In [40]:
d[4]

'four'

In [41]:
d[1]

KeyError: '1'

In [49]:
d['one']

KeyError: 'one'

Tests for item retrieval using `d.get(key)` notation:

In [42]:
d.get('2')

'two'

In [43]:
d.get(4)

'four'

In [44]:
d.get(1, 'N/A')

'N/A'

In [45]:
d.get(1, 'Dummy Key')

'Dummy Key'

Tests for the `in` operator:

In [46]:
2 in d

True

In [47]:
1 in d

False

![Figure 38](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/38.PNG)

**Q:** Why is the test `isinstance(key, str)` necessary in the `__missing__` implementation?  
Without that test, the `__missing__` method would work fine for any key `k`, `str` or not, whenever `str(k)` produced an existing key. But if `str(k)` were not an existing key, the result would be infinite recursion: in the last line of `__missing__`, `self[str(key)]`
would call `__getitem__` with that `str` key, which in turn would call `__missing__` again.

The author had a specific reason to use `self.keys()` in the `__contains__` method: writing `key in self` there would call `__contains__` recursively.
The check for the unmodified key, `key in self.keys()`, is necessary for correctness because `StrKeyDict0` does not enforce that all keys in the dictionary must be of type `str`. The only goal of this simple example is to make searching
"friendlier", not to enforce types.
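
For readers who cannot view the figures, here is a sketch of a class that is consistent with the tests and the discussion above. It is a reconstruction, not the contents of the `strkeydict0` module; the name `StrKeyDict0Sketch` is hypothetical and chosen to keep the two apart.

```python
class StrKeyDict0Sketch(dict):
    """Hypothetical reconstruction: retry missing non-str keys as str."""

    def __missing__(self, key):
        # Without this guard, a missing str key would recurse forever
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def get(self, key, default=None):
        # Delegate to __getitem__ so __missing__ gets a chance to run
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        # self.keys() avoids a recursive call to __contains__
        return key in self.keys() or str(key) in self.keys()


d = StrKeyDict0Sketch([('2', 'two'), ('4', 'four')])
print(d[4], d.get(1, 'N/A'), 2 in d, 1 in d)
```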

![Figure 39](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/39.PNG)

### Inconsistent Usage of `__missing__` in the Standard Library

Consider the following scenarios, and how the missing key lookups are affected:

- #### `dict` subclass
    A subclass of `dict` implementing only `__missing__` and no other method. In this case, `__missing__` may be called only on `d[k]`, which will use the `__getitem__` inherited from `dict`.



- #### `collections.UserDict` subclass
    Likewise, a subclass of `UserDict` implementing only `__missing__` and no other
    method. The `get` method inherited from `UserDict` calls `__getitem__`. This
    means `__missing__` may be called to handle lookups with `d[k]` and `d.get(k)`.




- #### `abc.Mapping` subclass with the simplest possible `__getitem__` 
    A minimal subclass of `abc.Mapping` implementing `__missing__` and the required abstract methods, including an implementation of `__getitem__` that does not call `__missing__`. The `__missing__` method is never triggered in this class.


The scenarios just described assume minimal implementations. If your subclass
implements `__getitem__`, `get`, and `__contains__`, then you can make those methods
use `__missing__` or not, depending on your needs. The point of this section is to
show that you must be careful when subclassing standard library mappings to use
`__missing__`, because the base classes support different behaviors by default.
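
The differences can be checked with a small experiment. The sketch below uses hypothetical class names (`DictSub`, `UserDictSub`, `SimpleMapping`) and a placeholder return value `'fallback'`. The result of `get` on the `UserDict` subclass is printed without a stated expectation, because `UserDict.get` appears to have been changed in newer Python releases to behave like `dict.get`, so the outcome may depend on the interpreter version.

```python
from collections import UserDict
from collections.abc import Mapping


class DictSub(dict):
    def __missing__(self, key):
        return 'fallback'


class UserDictSub(UserDict):
    def __missing__(self, key):
        return 'fallback'


class SimpleMapping(Mapping):
    def __init__(self, data=()):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]      # never consults __missing__

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __missing__(self, key):
        return 'fallback'


ds, us, sm = DictSub(), UserDictSub(), SimpleMapping()

print(ds['x'], ds.get('x'))         # d[k] uses __missing__; dict.get does not
print(us['x'], us.get('x'))         # get() result is version dependent
print(sm.get('x'))                  # __missing__ is never triggered
```

None of these lookups inserts the key, since each `__missing__` here only returns a value.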

Don’t forget that the behavior of `setdefault` and `update` is also affected by key lookup.