<h1>Chapter 03. Dictionaries and Sets.</h1>

<h2>The Modern Syntax of Dictionaries.</h2>

<h3>Dict Comprehensions</h3>

Using dict comprehensions to build two dictionaries from the same list of tuples

In [1]:
DIAL_CODES = [
    (880, 'Bangladesh'),
    (55, 'Brazil'),
    (86, 'China'),
    (91, 'India'),
    (62, 'Indonesia'),
    (81, 'Japan'),
    (234, 'Nigeria'),
    (92, 'Pakistan'),
    (7, 'Russian Federation'),
    (1, 'United States of America')
]

country_dial = {
    country: code
    for code, country in DIAL_CODES
}
country_dial

{'Bangladesh': 880,
 'Brazil': 55,
 'China': 86,
 'India': 91,
 'Indonesia': 62,
 'Japan': 81,
 'Nigeria': 234,
 'Pakistan': 92,
 'Russian Federation': 7,
 'United States of America': 1}

In [2]:
{
    code: country.upper()
    for country, code in sorted(country_dial.items())
    if code < 70
}

{55: 'BRAZIL',
 62: 'INDONESIA',
 7: 'RUSSIAN FEDERATION',
 1: 'UNITED STATES OF AMERICA'}

<h3>Unpacking mappings</h3>

The `**` operator unpacks dictionaries, making it easier to pass multiple keyword arguments to functions or merge dictionaries quickly.

In [3]:
def dump(**kwargs):
    return kwargs

In [4]:
dump(**{'x': 1}, y=2, **{'z': 3})

{'x': 1, 'y': 2, 'z': 3}

In [5]:
{
    'a': 0,
    **{'x': 1},
    'y': 2,
    **{'z': 3, 'x': 4}
}

{'a': 0, 'x': 4, 'y': 2, 'z': 3}

<h3>Merging mappings with the <code>|</code> operator</h3>

In [6]:
d1 = {'a': 1, 'b': 3}
d2 = {'a': 2, 'b': 4, 'c': 6}

In [7]:
d1 | d2

{'a': 2, 'b': 4, 'c': 6}

In [8]:
d1

{'a': 1, 'b': 3}

The `|=` operator updates an existing mapping in place

In [9]:
d1 |= d2

In [10]:
d1

{'a': 2, 'b': 4, 'c': 6}

<h2>Pattern Matching with Mappings</h2>

`get_creator()` extracts the names of creators from media records

In [11]:
def get_creator(record: dict) -> list:
    match record:
        case {'type': 'book', 'api': 2, 'authors': [*names]}:
            return names
        case {'type': 'book', 'api': 1, 'author': name}:
            return [name]
        case {'type': 'book'}:
            raise ValueError(f"Invalid 'book' record: {record!r}")
        case {'type': 'movie', 'director': name}:
            return [name]
        case _:
            raise ValueError(f"Invalid record: {record!r}")

In [12]:
b1 = dict(
    api=1,
    author='Douglas Hofstadter',
    type='book',
    title='Gödel, Escher, Bach'
)

In [13]:
get_creator(b1)

['Douglas Hofstadter']

In [14]:
from collections import OrderedDict


b2 = OrderedDict(
    api=2,
    type='book',
    title='Python in a Nutshell',
    authors='Martelli Ravenscroft Holden'.split()
)

In [15]:
get_creator(b2)

['Martelli', 'Ravenscroft', 'Holden']

In [16]:
get_creator({'type': 'book', 'pages': 770})

ValueError: Invalid 'book' record: {'type': 'book', 'pages': 770}

In [17]:
get_creator('Spam, spam, spam')

ValueError: Invalid record: 'Spam, spam, spam'

In [18]:
food = dict(category='ice cream', flavor='vanilla', cost=199)

In [19]:
match food:
    case {'category': 'ice cream', **details}:
        print(f"Ice cream details: {details}")

Ice cream details: {'flavor': 'vanilla', 'cost': 199}


<h2>Standard API of mapping types</h2>

<h3>What does "hashable" mean?</h3>

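An object is hashable if it has a hash value that never changes during its lifetime (it implements `__hash__()`) and can be compared to other objects (it implements `__eq__()`). Only hashable objects can be used as dictionary keys or set elements. A tuple is hashable only if all of its items are hashable.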

In [20]:
tt = (1, 2, (3, 4))
tt.__hash__()

3794340727080330424

In [21]:
tl = (1, 2, [3, 4])
try:
    tl.__hash__()
except TypeError as e:
    print(e.__repr__())

TypeError("unhashable type: 'list'")


`frozenset()` creates an immutable set from an iterable. A `frozenset` is hashable, so a tuple that contains one is hashable too.

In [22]:
tf = (1, 2, frozenset([3, 4]))
tf.__hash__()

131250961768736263
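
As a hedged illustration of the definition above: a user-defined class can be used as a dictionary key when `__eq__` and `__hash__` are defined consistently. The `Point` class below is hypothetical and exists only for this sketch.

```python
class Point:
    """Hypothetical 2D point, treated as immutable by convention."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Equal objects must have equal hashes, so hash the fields compared in __eq__
        return hash((self._x, self._y))


locations = {Point(1, 2): 'home'}
print(locations[Point(1, 2)])  # an equal Point finds the same entry: home
```

If the hashed fields could change after the object is stored in a dictionary, the entry would become unreachable, which is why hashable objects should not be mutated in ways that affect their hash.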

<h3>Inserting and updating mutable values</h3>

In [23]:
ZEN_TEXT = """
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
"""

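Processing `ZEN_TEXT` to build an index. Each output line shows a word followed by a list of its occurrences, each given as a (line number, column number) pair. Line numbers count from the empty first line of `ZEN_TEXT`, since the string starts with a newline.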

In [24]:
import re


WORD_RE = re.compile(r'\w+')

index = {}
for line_no, line in enumerate(ZEN_TEXT.split('\n'), start=1):
    for match in WORD_RE.finditer(line):
        word = match.group()
        column_no = match.start() + 1
        location = (line_no, column_no)
        
        occurrences = index.get(word, [])
        occurrences.append(location)
        
        index[word] = occurrences

# display in alphabetical order
for word in sorted(index, key=str.upper)[:10]:  # let's reduce the output to 10 lines
    print(word, index[word])

a [(20, 48), (21, 53)]
Although [(12, 1), (17, 1), (19, 1)]
ambiguity [(15, 16)]
and [(16, 23)]
are [(22, 12)]
aren [(11, 15)]
at [(17, 38)]
bad [(20, 50)]
be [(16, 14), (17, 27), (21, 50)]
beats [(12, 23)]


Using the `dict.setdefault()` method

In [25]:
index = {}
for line_no, line in enumerate(ZEN_TEXT.split('\n'), start=1):
    for match in WORD_RE.finditer(line):
        word = match.group()
        column_no = match.start() + 1
        location = (line_no, column_no)
        
        index.setdefault(word, []).append(location)

for word in sorted(index, key=str.upper)[:10]:
    print(word, index[word])

a [(20, 48), (21, 53)]
Although [(12, 1), (17, 1), (19, 1)]
ambiguity [(15, 16)]
and [(16, 23)]
are [(22, 12)]
aren [(11, 15)]
at [(17, 38)]
bad [(20, 50)]
be [(16, 14), (17, 27), (21, 50)]
beats [(12, 23)]
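
For comparison, the `setdefault` call gives the same result as the longer form sketched below, but with a single key lookup. The sample word and location are made up for illustration.

```python
word, location = 'better', (4, 14)

# Long form: two key lookups, or three when the key is missing
long_form = {}
if word not in long_form:
    long_form[word] = []
long_form[word].append(location)

# setdefault returns the existing value, or stores and returns the default
one_liner = {}
one_liner.setdefault(word, []).append(location)

print(long_form == one_liner)  # True
```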


<h2>Automatic handling of missing keys</h2>

<h3><code>defaultdict</code>: another approach to handling missing keys</h3>

Using a `defaultdict` instance instead of the `setdefault` method

In [26]:
from collections import defaultdict
import re


WORD_RE = re.compile(r'\w+')

index = defaultdict(list)  # create a 'defaultdict' with the 'list' constructor as 'default_factory'
for line_no, line in enumerate(ZEN_TEXT.split('\n'), start=1):
    for match in WORD_RE.finditer(line):
        word = match.group()
        column_no = match.start() + 1
        location = (line_no, column_no)

        index[word].append(location)  # if 'word' is not in 'index', 'default_factory' creates an empty list

# display in alphabetical order
for word in sorted(index, key=str.upper)[:10]:  # let's reduce the output to 10 lines
    print(word, index[word])

a [(20, 48), (21, 53)]
Although [(12, 1), (17, 1), (19, 1)]
ambiguity [(15, 16)]
and [(16, 23)]
are [(22, 12)]
aren [(11, 15)]
at [(17, 38)]
bad [(20, 50)]
be [(16, 14), (17, 27), (21, 50)]
beats [(12, 23)]
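
One detail worth a small sketch: `default_factory` is called only by `__getitem__` (the `d[key]` syntax). Other lookups, such as `get()` and the `in` operator, neither call it nor create the key. The key name `'missing'` is arbitrary.

```python
from collections import defaultdict

dd = defaultdict(list)

print(dd.get('missing'))  # None: get() does not call default_factory
print('missing' in dd)    # False: membership tests do not create keys

dd['missing']             # __getitem__ calls list() and stores the result
print('missing' in dd)    # True
print(dd['missing'])      # []
```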


<h3>Method <code>__missing__</code></h3>

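`__missing__` is a special method that `dict` itself does not define but knows about: if a `dict` subclass provides it, `dict.__getitem__` calls it whenever a key is not found, instead of raising `KeyError`. Other lookups, such as `get()` and the `in` operator, do not call it, which is why `StrKeyDict` below overrides `get` and `__contains__` as well.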

In [27]:
class StrKeyDict(dict):

    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        return key in self.keys() or str(key) in self.keys()
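
A minimal sketch of which lookups consult `__missing__`, using a hypothetical `ZeroDict` subclass that is not part of the original notes. It shows why a class like `StrKeyDict` has to override `get` and `__contains__` to behave consistently.

```python
class ZeroDict(dict):
    """Hypothetical subclass: missing keys read as 0 without being stored."""

    def __missing__(self, key):
        return 0


zd = ZeroDict(a=1)
print(zd['a'])      # 1
print(zd['b'])      # 0: dict.__getitem__ fell back to __missing__
print(zd.get('b'))  # None: get() never consults __missing__
print('b' in zd)    # False: neither does the `in` operator
```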

<h2>Variations on <code>dict</code></h2>

<h3><code>collections.OrderedDict</code></h3>

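`collections.OrderedDict` is a mapping that preserves the insertion order of its key-value pairs. Since Python 3.7 the built-in `dict` preserves insertion order as well; the main remaining differences are that equality between two `OrderedDict` instances is order-sensitive and that `OrderedDict` offers order-manipulating methods such as `move_to_end()`.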

In [28]:
from collections import OrderedDict


ord_dict = OrderedDict()
ord_dict['one'] = 1
ord_dict['two'] = 2
ord_dict['three'] = 3

ord_dict

OrderedDict([('one', 1), ('two', 2), ('three', 3)])

In [29]:
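# d1 and d2 still hold the keys 'a', 'b', and 'c' from the `|=` cells above
d1['one'] = 1
d1['two'] = 2
d1['three'] = 3

d2['three'] = 3
d2['two'] = 2
d2['one'] = 1

In [30]:
print(d1 == d2)        # True: same items; insertion order is ignored
print(d2 == ord_dict)  # False: d2 also contains 'a', 'b', and 'c'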

True
False
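
To isolate the effect of ordering on equality, here is a small sketch with freshly created mappings (the variable names are illustrative). Comparing two `OrderedDict` instances takes order into account, while comparing an `OrderedDict` with a plain `dict` does not.

```python
from collections import OrderedDict

first = OrderedDict(one=1, two=2, three=3)
second = OrderedDict(three=3, two=2, one=1)
plain = dict(three=3, two=2, one=1)

ordered_equal_before = first == second  # False: order differs
mixed_equal = first == plain            # True: comparison with dict ignores order

# Reorder `second` in place until it matches `first`
second.move_to_end('three')            # now: two, one, three
second.move_to_end('one', last=False)  # now: one, two, three

print(ordered_equal_before, mixed_equal)  # False True
print(list(second))                       # ['one', 'two', 'three']
print(first == second)                    # True
```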


<h3><code>collections.ChainMap</code></h3>

`collections.ChainMap` groups several mappings into a single view. A lookup searches the mappings in the order given to the constructor and returns the first match; the underlying mappings are referenced, not copied.

In [31]:
d1 = dict(a=1, b=3)
d2 = dict(a=2, b=4, c=6)

In [32]:
from collections import ChainMap


chain = ChainMap(d1, d2)
chain

ChainMap({'a': 1, 'b': 3}, {'a': 2, 'b': 4, 'c': 6})

In [33]:
chain['a']

1

In [34]:
chain['c']

6

All `ChainMap` modifications and insertions apply only to the first of the input mappings.

In [35]:
chain['c'] = -1

In [36]:
d1

{'a': 1, 'b': 3, 'c': -1}

In [37]:
d2

{'a': 2, 'b': 4, 'c': 6}
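
A common use of this lookup order is layering overrides on top of defaults. The sketch below assumes made-up `defaults` and `overrides` dictionaries; `new_child()` and `parents` are `ChainMap` features for adding and skipping the front mapping.

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
overrides = {'user': 'admin'}

settings = ChainMap(overrides, defaults)
print(settings['user'])   # admin: found in the first mapping
print(settings['color'])  # red: falls through to the defaults

# new_child() returns a new ChainMap with an extra mapping in front
scoped = settings.new_child({'color': 'blue'})
print(scoped['color'])          # blue
print(scoped.parents['color'])  # red: parents skips the first mapping
print(defaults)                 # the original dictionaries are untouched
```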

<h3><code>collections.Counter</code></h3>

`collections.Counter` is used for counting the occurrences of elements in a collection, typically in an iterable like a list or a string. The elements and their counts are stored as dictionary-like key-value pairs.

In [38]:
from collections import Counter


ct = Counter('abracadabra')
ct

Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})

In [39]:
ct.update('aaazzz')
ct

Counter({'a': 8, 'z': 3, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
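
Two more `Counter` behaviors, sketched on the same data: `most_common(n)` returns the `n` highest counts as (element, count) pairs, and looking up an element that was never counted returns 0 without adding it.

```python
from collections import Counter

ct = Counter('abracadabra')
ct.update('aaazzz')

print(ct.most_common(2))  # [('a', 8), ('z', 3)]
print(ct['x'])            # 0: missing elements have a count of zero
print('x' in ct)          # False: the lookup did not create the key
```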

<h3>Creating a <code>UserDict</code> subclass instead of <code>dict</code></h3>

`UserDict` is a class in the `collections` module designed to be subclassed when you need a custom mapping type. It does not inherit from `dict`; instead it wraps a regular dictionary, available as the `data` attribute, and subclasses override only the methods whose behavior they want to change.

In [40]:
from collections import UserDict


class MyDict(UserDict):

    def __contains__(self, key):
        return key in self.data

    def __setitem__(self, key, value):
        # plain assignment both updates an existing key and adds a new one
        self.data[key] = value

    def __delitem__(self, key):
        if key in self.data:
            del self.data[key]
        else:
            raise KeyError(key)
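
As a sketch of why `UserDict` is convenient, the `StrKeyDict` class shown earlier can be rewritten on top of it so that every key is stored as a `str`. This variant is an assumption about how such a rewrite could look, not a definitive implementation.

```python
from collections import UserDict


class StrKeyDict(UserDict):
    """Sketch: a mapping that converts every key to str on insertion."""

    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def __contains__(self, key):
        # all stored keys are str, so one check against self.data is enough
        return str(key) in self.data

    def __setitem__(self, key, item):
        self.data[str(key)] = item


d = StrKeyDict([(2, 'two'), ('4', 'four')])
print(d[2], d['2'], d[4])  # two two four
print(d.get(1, 'N/A'))     # N/A
print(1 in d, 2 in d)      # False True
```

Because the constructor and `update()` inherited from `UserDict` go through `__setitem__`, the key conversion applies to them too, and no separate `get` override is needed.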

<h2>Immutable mappings</h2>

`MappingProxyType` from the `types` module builds a read-only proxy around a mapping such as a `dict`. The proxy allows reading the data but not changing it, which is useful for exposing internal state without letting callers modify it.

In [41]:
from types import MappingProxyType


d = {1: 'A'}
d_proxy = MappingProxyType(d)

d_proxy

mappingproxy({1: 'A'})

In [42]:
d_proxy[1]

'A'

In [43]:
try:
    d_proxy[2] = 'B'
except TypeError as e:
    print(e.__repr__())

TypeError("'mappingproxy' object does not support item assignment")


`d_proxy` is a dynamic view: any change made to `d` is immediately visible through it

In [44]:
d[2] = 'B'

d_proxy

mappingproxy({1: 'A', 2: 'B'})

In [45]:
d_proxy[2]

'B'
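
One way this could be used in practice is a class that owns a dictionary and exposes it read-only. The `Registry` class and its method names are hypothetical, chosen only for this sketch.

```python
from types import MappingProxyType


class Registry:
    """Hypothetical class that owns a dict but exposes it read-only."""

    def __init__(self):
        self._entries = {}

    def register(self, name, value):
        # the only supported way to change the underlying dict
        self._entries[name] = value

    @property
    def entries(self):
        return MappingProxyType(self._entries)


registry = Registry()
registry.register('pi', 3.14)
view = registry.entries
print(view['pi'])  # 3.14

try:
    view['e'] = 2.71
except TypeError as exc:
    print(repr(exc))  # item assignment through the proxy is rejected

registry.register('e', 2.71)
print('e' in view)  # True: the proxy reflects later changes
```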

<h2>Dictionary views</h2>

In [46]:
d = dict(a=10, b=20, c=30)
d

{'a': 10, 'b': 20, 'c': 30}

In [47]:
values = d.values()
values

dict_values([10, 20, 30])

In [48]:
len(values)

3

In [49]:
list(values)

[10, 20, 30]

In [50]:
reversed(values)

<dict_reversevalueiterator at 0x10aae8180>

In [51]:
try:
    values[0]
except TypeError as e:
    print(e.__repr__())

TypeError("'dict_values' object is not subscriptable")


In [52]:
d['z'] = 99
d

{'a': 10, 'b': 20, 'c': 30, 'z': 99}

In [53]:
values

dict_values([10, 20, 30, 99])

In [54]:
values_class = type({}.values())
try:
    v = values_class()
except TypeError as e:
    print(e.__repr__())

TypeError("cannot create 'dict_values' instances")
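
Unlike `dict_values`, the views returned by `.keys()` and `.items()` also support set operations. A brief sketch with made-up dictionaries follows; the results are sorted only to make the printed order predictable.

```python
stock = dict(a=1, b=2, c=3, d=4)
orders = dict(b=20, d=40, e=50)

common = stock.keys() & orders.keys()      # keys present in both
only_stock = stock.keys() - orders.keys()  # keys present only in `stock`

print(sorted(common))      # ['b', 'd']
print(sorted(only_stock))  # ['a', 'c']
print(type(common))        # <class 'set'>: the result is a plain set
```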


<h2>Set theory</h2>

A `set` is a collection of unique objects.

In [55]:
l = ['spam', 'spam', 'eggs', 'spam', 'bacon', 'eggs']
set(l)

{'bacon', 'eggs', 'spam'}

In [56]:
list(set(l))

['eggs', 'bacon', 'spam']

Eliminating duplicates while preserving the order of first occurrence using `dict`

In [57]:
list(dict.fromkeys(l).keys())

['spam', 'eggs', 'bacon']

Counting the number of occurrences of `needles` in `haystack`

In [58]:
# if variables are of type set
haystack = set()
needles = set()
found = len(needles & haystack)

# if variables are of a different type, for example list
haystack = []
needles = []
found = len(set(needles) & set(haystack))  # option 1
found = len(set(needles).intersection(set(haystack)))  # option 2
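
The same counting idea with concrete, made-up data. Note that the `intersection()` method accepts any iterable, so only one of the operands has to be converted to a set.

```python
haystack = [3, 8, 15, 8, 42, 7]
needles = [8, 42, 99, 8]

# operator form: both operands must be sets
found = len(set(needles) & set(haystack))

# method form: the argument can be any iterable
found_method = len(set(needles).intersection(haystack))

print(found, found_method)  # 2 2
```

Duplicates in `needles` are counted once, because converting to a set discards them.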

<h3>Set comprehensions <i>(setcomp)</i></h3>

Construction of a `set` of Latin-1 characters with the word "SIGN" in their Unicode names 

In [59]:
from unicodedata import name


{
    chr(i) for i in range(32, 256)
    if 'SIGN' in name(chr(i), '')
}

{'#',
 '$',
 '%',
 '+',
 '<',
 '=',
 '>',
 '¢',
 '£',
 '¤',
 '¥',
 '§',
 '©',
 '¬',
 '®',
 '°',
 '±',
 'µ',
 '¶',
 '×',
 '÷'}