<h1>Chapter 03. Dictionaries and Sets</h1>

<h2>Modern Dictionary Syntax</h2>

<h3>Dict Comprehensions</h3>

Using dict comprehensions to build two dictionaries from the same list of tuples:

In [1]:
DIAL_CODES = [
    (880, 'Bangladesh'),
    (55, 'Brazil'),
    (86, 'China'),
    (91, 'India'),
    (62, 'Indonesia'),
    (81, 'Japan'),
    (234, 'Nigeria'),
    (92, 'Pakistan'),
    (7, 'Russian Federation'),
    (1, 'United States of America')
]

country_dial = {
    country: code
    for code, country in DIAL_CODES
}
country_dial

{'Bangladesh': 880,
 'Brazil': 55,
 'China': 86,
 'India': 91,
 'Indonesia': 62,
 'Japan': 81,
 'Nigeria': 234,
 'Pakistan': 92,
 'Russian Federation': 7,
 'United States of America': 1}

In [2]:
{
    code: country.upper()
    for country, code in sorted(country_dial.items())
    if code < 70
}

{55: 'BRAZIL',
 62: 'INDONESIA',
 7: 'RUSSIAN FEDERATION',
 1: 'UNITED STATES OF AMERICA'}

<h3>Unpacking mappings</h3>

The `**` operator unpacks dictionaries, making it easier to pass multiple keyword arguments to functions or merge dictionaries quickly.

In [3]:
def dump(**kwargs):
    return kwargs

In [4]:
dump(**{'x': 1}, y=2, **{'z': 3})

{'x': 1, 'y': 2, 'z': 3}

In [5]:
{
    'a': 0,
    **{'x': 1},
    'y': 2,
    **{'z': 3, 'x': 4}
}

{'a': 0, 'x': 4, 'y': 2, 'z': 3}
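
The two uses of `**` shown above treat duplicate keys differently, which a short sketch can make concrete. In a dict literal a repeated key is allowed and the last occurrence wins, while in a function call a repeated keyword raises `TypeError`. The snippet redefines `dump` so that it runs on its own.

```python
def dump(**kwargs):
    """Return the received keyword arguments as a plain dict."""
    return kwargs


# Dict literal: a duplicate key is fine, the later value overwrites the earlier one.
merged = {**{'x': 1}, **{'x': 4}}
print(merged)  # {'x': 4}

# Function call: the same keyword arriving twice is an error.
error = None
try:
    dump(**{'x': 1}, **{'x': 4})
except TypeError as exc:
    error = exc
    print(f'TypeError: {exc}')
```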

<h3>Merging mappings with the <code>|</code> operator</h3>

In [6]:
d1 = {'a': 1, 'b': 3}
d2 = {'a': 2, 'b': 4, 'c': 6}

In [7]:
d1 | d2

{'a': 2, 'b': 4, 'c': 6}

In [8]:
d1

{'a': 1, 'b': 3}

The `|=` operator updates an existing mapping in place:

In [9]:
d1 |= d2

In [10]:
d1

{'a': 2, 'b': 4, 'c': 6}
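
As a rough illustration of where the two operators differ in practice, the sketch below merges a dictionary of defaults with a dictionary of overrides. The names `defaults`, `overrides`, and `settings` are made up for this example, and both operators require Python 3.9 or later.

```python
defaults = {'host': 'localhost', 'port': 8000}   # hypothetical settings
overrides = {'port': 9000, 'debug': True}

# `|` builds a new dict; on duplicate keys the right-hand operand wins.
settings = defaults | overrides
print(settings)   # {'host': 'localhost', 'port': 9000, 'debug': True}
print(defaults)   # {'host': 'localhost', 'port': 8000} (unchanged)

# `|=` behaves like dict.update(), so it also accepts an iterable of pairs.
defaults |= [('port', 8080)]
print(defaults)   # {'host': 'localhost', 'port': 8080}

# `|` is stricter: a plain dict cannot be merged with a list of pairs.
rejected = False
try:
    defaults | [('port', 8080)]
except TypeError:
    rejected = True
print(rejected)   # True
```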

<h2>Pattern Matching with Mappings</h2>

`get_creator()` extracts the names of the creators from records that describe media (books and movies):

In [11]:
def get_creator(record: dict) -> list:
    match record:
        case {'type': 'book', 'api': 2, 'authors': [*names]}:
            return names
        case {'type': 'book', 'api': 1, 'author': name}:
            return [name]
        case {'type': 'book'}:
            raise ValueError(f"Invalid 'book' record: {record!r}")
        case {'type': 'movie', 'director': name}:
            return [name]
        case _:
            raise ValueError(f"Invalid record: {record!r}")

In [12]:
b1 = dict(
    api=1,
    author='Douglas Hofstadter',
    type='book',
    title='Gödel, Escher, Bach'
)

In [13]:
get_creator(b1)

['Douglas Hofstadter']

In [14]:
from collections import OrderedDict


b2 = OrderedDict(
    api=2,
    type='book',
    title='Python in a Nutshell',
    authors='Martelli Ravenscroft Holden'.split()
)

In [15]:
get_creator(b2)

['Martelli', 'Ravenscroft', 'Holden']

In [16]:
get_creator({'type': 'book', 'pages': 770})

ValueError: Invalid 'book' record: {'type': 'book', 'pages': 770}

In [17]:
get_creator('Spam, spam, spam')

ValueError: Invalid record: 'Spam, spam, spam'

In [18]:
food = dict(category='ice cream', flavor='vanilla', cost=199)

In [19]:
match food:
    case {'category': 'ice cream', **details}:
        print(f"Ice cream details: {details}")

Ice cream details: {'flavor': 'vanilla', 'cost': 199}


<h2>Standard API of mapping types</h2>

<h3>What does "hashable" mean?</h3>

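An object is hashable if it has a hash value that never changes during its lifetime (it implements `__hash__()`) and it can be compared to other objects (it implements `__eq__()`). Hashable objects that compare equal must have the same hash value. Only hashable objects can be used as dictionary keys or set elements. A container such as a tuple is hashable only if all of its items are hashable.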

In [20]:
tt = (1, 2, (3, 4))
tt.__hash__()

3794340727080330424

In [21]:
tl = (1, 2, [3, 4])
tl.__hash__()

TypeError: unhashable type: 'list'

`frozenset()` creates an immutable set from an iterable. Because every element of a `frozenset` must itself be hashable, a `frozenset` is always hashable.

In [22]:
tf = (1, 2, frozenset([3, 4]))
tf.__hash__()

131250961768736263
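
The checks in the three cells above can be wrapped in a small helper. This is only a sketch: `is_hashable` is a hypothetical function written for this note, not part of the standard library. The last lines illustrate the rule that equal objects must have equal hashes, using `1` and `1.0`.

```python
def is_hashable(obj) -> bool:
    """Return True if hash(obj) succeeds (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable((1, 2, (3, 4))))             # True
print(is_hashable((1, 2, [3, 4])))             # False: the tuple contains a list
print(is_hashable((1, 2, frozenset([3, 4]))))  # True

# 1 == 1.0, so their hashes must match and they address the same dict entry.
print(hash(1) == hash(1.0))  # True
d = {1: 'int'}
d[1.0] = 'float'             # replaces the value stored under the existing key 1
print(d)                     # {1: 'float'}
```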

<h3>Inserting and updating mutable values</h3>

In [23]:
ZEN_TEXT = """
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
"""

Building an index of the words in `ZEN_TEXT`. Each output line shows a word followed by the list of its occurrences, given as `(line number, column number)` pairs.

In [24]:
import re


WORD_RE = re.compile(r'\w+')

index = {}
for line_no, line in enumerate(ZEN_TEXT.split('\n'), start=1):
    for match in WORD_RE.finditer(line):
        word = match.group()
        column_no = match.start() + 1
        location = (line_no, column_no)
        
        occurrences = index.get(word, [])
        occurrences.append(location)
        
        index[word] = occurrences

# display in alphabetical order, ignoring case
for word in sorted(index, key=str.upper)[:10]:  # show only the first 10 words
    print(word, index[word])

a [(20, 48), (21, 53)]
Although [(12, 1), (17, 1), (19, 1)]
ambiguity [(15, 16)]
and [(16, 23)]
are [(22, 12)]
aren [(11, 15)]
at [(17, 38)]
bad [(20, 50)]
be [(16, 14), (17, 27), (21, 50)]
beats [(12, 23)]


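The same index built with `dict.setdefault()`, which fetches the list stored for `word`, inserting an empty list first if the key is missing, in a single lookup: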

In [25]:
index = {}
for line_no, line in enumerate(ZEN_TEXT.split('\n'), start=1):
    for match in WORD_RE.finditer(line):
        word = match.group()
        column_no = match.start() + 1
        location = (line_no, column_no)
        
        index.setdefault(word, []).append(location)

for word in sorted(index, key=str.upper)[:10]:
    print(word, index[word])

a [(20, 48), (21, 53)]
Although [(12, 1), (17, 1), (19, 1)]
ambiguity [(15, 16)]
and [(16, 23)]
are [(22, 12)]
aren [(11, 15)]
at [(17, 38)]
bad [(20, 50)]
be [(16, 14), (17, 27), (21, 50)]
beats [(12, 23)]
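
To see what `setdefault()` does apart from the index-building loop, here is a minimal sketch that groups words by their first letter. The word list and the name `groups` are invented for the illustration.

```python
groups = {}
for word in ['apple', 'avocado', 'banana', 'blueberry', 'cherry']:
    # Return the list stored under word[0]; if the key is missing,
    # store a new empty list first and return that.
    groups.setdefault(word[0], []).append(word)

print(groups)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# setdefault() returns the stored object itself, not a copy,
# which is why appending to its result updates the dictionary.
bucket = groups.setdefault('a', [])
print(bucket is groups['a'])  # True
```

One caveat: the default argument (`[]` here) is evaluated on every call, even when the key already exists, so an expensive default is better handled another way.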


<h2>Automatic handling of missing keys</h2>

<h3><code>defaultdict</code>: another approach to handling missing keys</h3>

Using a `defaultdict` instance instead of the `setdefault` method:

In [26]:
from collections import defaultdict
import re


WORD_RE = re.compile(r'\w+')

index = defaultdict(list)  # create a 'defaultdict' with the 'list' constructor as its 'default_factory'
for line_no, line in enumerate(ZEN_TEXT.split('\n'), start=1):
    for match in WORD_RE.finditer(line):
        word = match.group()
        column_no = match.start() + 1
        location = (line_no, column_no)

        index[word].append(location)  # if 'word' is not in 'index', 'default_factory' creates an empty list

# display in alphabetical order, ignoring case
for word in sorted(index, key=str.upper)[:10]:  # show only the first 10 words
    print(word, index[word])

a [(20, 48), (21, 53)]
Although [(12, 1), (17, 1), (19, 1)]
ambiguity [(15, 16)]
and [(16, 23)]
are [(22, 12)]
aren [(11, 15)]
at [(17, 38)]
bad [(20, 50)]
be [(16, 14), (17, 27), (21, 50)]
beats [(12, 23)]
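
A detail that the index example does not exercise: `default_factory` is invoked only by `__getitem__` (the `d[key]` syntax). Lookups with `get()` or `in` leave the dictionary untouched. The sketch below uses a hypothetical `counts` dictionary to show this.

```python
from collections import defaultdict

counts = defaultdict(int)

# Neither get() nor `in` calls default_factory, so nothing is inserted.
print(counts.get('x'))   # None
print('x' in counts)     # False
print(len(counts))       # 0

# d[key] on a missing key calls default_factory (int() -> 0) and stores the result.
counts['x'] += 1
print(dict(counts))      # {'x': 1}

# Even a plain read with [] inserts the default value.
value = counts['y']
print(value)             # 0
print(sorted(counts))    # ['x', 'y']
```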


<h3>Method <code>__missing__</code></h3>

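`__missing__` is a special method that `dict.__getitem__()` calls when a requested key is not found, provided the class defines it. The base `dict` class does not define `__missing__`, so it only takes effect in subclasses. It is not called by `get()` or `__contains__()`, which is why the `StrKeyDict` class below overrides both.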

In [27]:
class StrKeyDict(dict):

    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        return key in self.keys() or str(key) in self.keys()
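
A short usage sketch for `StrKeyDict` follows. The class is repeated so that the snippet runs on its own, and the sample data is made up for the illustration. Keys are stored as strings, but lookups also succeed with non-string keys whose `str()` form is present.

```python
class StrKeyDict(dict):

    def __missing__(self, key):
        # A missing str key really is missing; this check also stops the recursion.
        if isinstance(key, str):
            raise KeyError(key)
        # Retry the lookup with the key converted to str.
        return self[str(key)]

    def get(self, key, default=None):
        # Delegate to __getitem__ so that __missing__ gets a chance to run.
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        # Search the keys view explicitly; `key in self` would recurse here.
        return key in self.keys() or str(key) in self.keys()


d = StrKeyDict([('2', 'two'), ('4', 'four')])

print(d['2'])           # two
print(d[4])             # four  (falls back to d['4'] via __missing__)
print(d.get(4))         # four
print(d.get(1, 'N/A'))  # N/A
print(2 in d)           # True
print(1 in d)           # False

missing_key = None
try:
    d[1]
except KeyError as exc:
    missing_key = exc.args[0]
print(repr(missing_key))  # '1'
```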