# Dictionaries and sets

## Introduction
_Python is basically dicts wrapped in loads of syntactic sugar._

We use dictionaries in all our programs, even if they are not explicitly defined, since they are a fundamental part of Python's implementation.

Class and instance attributes, module namespaces, and function keyword arguments are some of the core Python constructs represented by dictionaries in memory.

`__builtins__.__dict__` stores all built-in types, objects and functions.
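
A quick way to see these dictionaries is to inspect them directly. The sketch below is only an illustration: the `Point` class and the `describe` function are hypothetical names, and it reaches the built-ins through the standard `builtins` module instead of `__builtins__`.

```python
import builtins


class Point:
    """Hypothetical class used only for illustration."""

    dimensions = 2

    def __init__(self, x, y):
        self.x = x
        self.y = y


def describe(**kwargs):
    # Keyword arguments arrive packed in a regular dict
    return type(kwargs).__name__


p = Point(1, 2)

print(vars(p))                      # instance attributes: {'x': 1, 'y': 2}
print("dimensions" in vars(Point))  # class attributes live in a mapping: True
print(describe(a=1))                # dict
print("len" in vars(builtins))      # built-in functions are stored by name: True
```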

Because of their core role, `dict` objects are highly optimized. They are built on top of _hash tables_, which are the source of their performance.

Other built-in Python types that rely on _hash tables_ are `set` and its immutable variation `frozenset`. These types implement all the fundamental set arithmetic, and with them we can express algorithms in a more Pythonic way.
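
As a minimal sketch of that set arithmetic, the example below uses two made-up collections, `evens` and `primes`, which are my own illustration and not part of the original notes.

```python
evens = {0, 2, 4, 6, 8}
primes = frozenset({2, 3, 5, 7})

common = evens & primes        # intersection
only_evens = evens - primes    # difference
either = evens | primes        # union
exclusive = evens ^ primes     # symmetric difference

print(common)  # {2}

# A frozenset is immutable and hashable, so it can be used as a dict key
groups = {primes: "primes below 10"}
print(groups[frozenset({7, 5, 3, 2})])  # primes below 10
```

Note that `set` and `frozenset` operands can be mixed; the result takes the type of the left operand.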

### Dict comprehensions

Since Python 2.7, the syntax of _genexps_ and _listcomps_ has been adapted to _dict comprehensions_, so you'll find it very familiar.

In [2]:
codes = [
    (55, "CDMX"),
    (33, "GDL"),
    (81, "MTY"),
]

dial = {code: city for code, city in codes}
print(dial)

{55: 'CDMX', 33: 'GDL', 81: 'MTY'}


Please note that, in this particular case, the input for our `dial` dict comprehension is a list of tuples. The _dictcomp_ expression uses curly braces `{}` and expects `key: value` as its first element; after that you can use the typical `for` syntax of _listcomps_ and _genexps_.

This case unpacks each tuple into two loop variables, but you can always use a single loop variable. It all depends on the implementation you need.

In [4]:
words = [
    "stuff",
    "gain",
    "harmony",
    "integrity",
    "shoulder",
    "blind",
    "rugby",
    "marriage",
    "treat",
    "month",
]
letter_count = {word: len(word) for word in words}
print(letter_count)

{'stuff': 5, 'gain': 4, 'harmony': 7, 'integrity': 9, 'shoulder': 8, 'blind': 5, 'rugby': 5, 'marriage': 8, 'treat': 5, 'month': 5}


In the example above, I built a new dict with a single loop variable: each word from the list is a key, and its length is the value.
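
A dictcomp also accepts an `if` clause and arbitrary expressions for the key and the value, just like a listcomp. The following sketch uses shortened sample data of my own to show a filter and a key-value swap.

```python
words = ["stuff", "gain", "harmony", "integrity"]

# Keep only words longer than four letters, with upper-case keys
long_words = {word.upper(): len(word) for word in words if len(word) > 4}
print(long_words)
# {'STUFF': 5, 'HARMONY': 7, 'INTEGRITY': 9}

# Swap keys and values of an existing dict
dial = {55: "CDMX", 33: "GDL", 81: "MTY"}
city_to_code = {city: code for code, city in dial.items()}
print(city_to_code)
# {'CDMX': 55, 'GDL': 33, 'MTY': 81}
```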

Dictionaries have no `sort` method, but that doesn't mean you can't sort them. Later we'll see a specific variation of this type named `OrderedDict`, which is used in these cases. In the meantime, you can sort with a pipeline of the `items`, `sorted`, and `dict` functions; `sorted` lets you specify the ordering through its `key` callable argument. Since Python 3.7, `dict` preserves insertion order, so the rebuilt dictionary keeps the sorted order.

In [11]:
# Order alphabetically
dict(sorted(letter_count.items()))

{'blind': 5,
 'gain': 4,
 'harmony': 7,
 'integrity': 9,
 'marriage': 8,
 'month': 5,
 'rugby': 5,
 'shoulder': 8,
 'stuff': 5,
 'treat': 5}

In [13]:
# Order by length
dict(sorted(letter_count.items(), key=lambda k: k[1]))

{'gain': 4,
 'stuff': 5,
 'blind': 5,
 'rugby': 5,
 'treat': 5,
 'month': 5,
 'harmony': 7,
 'shoulder': 8,
 'marriage': 8,
 'integrity': 9}
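
The `key` callable can also return a tuple to combine several criteria. As a sketch, using a shortened copy of `letter_count`, this orders the longest words first and breaks ties alphabetically.

```python
letter_count = {"stuff": 5, "gain": 4, "harmony": 7, "blind": 5}

# Negate the length for descending order; the word itself breaks ties
by_length_desc = dict(
    sorted(letter_count.items(), key=lambda item: (-item[1], item[0]))
)
print(by_length_desc)
# {'harmony': 7, 'blind': 5, 'stuff': 5, 'gain': 4}
```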

### Unpacking dicts

Like tuples and other sequences, dictionaries can be unpacked. The first way to unpack a dictionary is the `**` operator in a function call. This works when the keys are all strings and unique across all arguments, since duplicate keyword arguments are forbidden.

In [21]:
def dump(**kwargs):
    return kwargs

print(dump(a=1, b=2, c=3))
print(dump(name="John", lastname="Doe"))

{'a': 1, 'b': 2, 'c': 3}
{'name': 'John', 'lastname': 'Doe'}


In [23]:
# You can apply ** to more than one argument in the same function call
dump(**{"x": 1}, y=2, **{"z": 3})

{'x': 1, 'y': 2, 'z': 3}
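
The `**` operator also works inside a dict literal, where repeated keys are allowed and later values overwrite earlier ones. The sketch below contrasts both behaviors; `defaults` and `overrides` are hypothetical names chosen for the example.

```python
def dump(**kwargs):
    return kwargs


defaults = {"host": "localhost", "port": 8080}
overrides = {"port": 9090}

# Inside a dict literal, later occurrences overwrite earlier ones
config = {**defaults, **overrides, "debug": True}
print(config)
# {'host': 'localhost', 'port': 9090, 'debug': True}

# In a function call, a repeated keyword argument raises TypeError
try:
    dump(**defaults, **overrides)
except TypeError as exc:
    print("TypeError:", exc)
```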

### Merging dicts

Since Python 3.9, the `|` and `|=` operators are available to merge dicts. The `|` operator does not modify either operand; it creates a new `dict` object with the merged contents. The `|=` operator, in contrast, updates the left operand in place. For repeated keys, the value from the rightmost operand wins.

In [27]:
person = {"name": "John", "lastname": "Doe", "age": 25}
occupation = {"role": "Python Dev", "seniority": "Senior"}

person | occupation

{'name': 'John',
 'lastname': 'Doe',
 'age': 25,
 'role': 'Python Dev',
 'seniority': 'Senior'}

If a key is repeated, the result keeps the value from its last occurrence.

In [29]:
desired_job = {"role": "Artist"}

person | occupation | desired_job

{'name': 'John',
 'lastname': 'Doe',
 'age': 25,
 'role': 'Artist',
 'seniority': 'Senior'}
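
To make the difference between `|` and `|=` concrete, here is a small sketch with shortened versions of the dicts above.

```python
person = {"name": "John", "age": 25}
occupation = {"role": "Python Dev"}

# | builds a new dict and leaves both operands untouched
merged = person | occupation
print(merged)   # {'name': 'John', 'age': 25, 'role': 'Python Dev'}
print(person)   # {'name': 'John', 'age': 25}

# |= updates the left operand in place, like dict.update()
same_object = person
person |= occupation
print(person)                 # {'name': 'John', 'age': 25, 'role': 'Python Dev'}
print(same_object is person)  # True
```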

### Pattern matching

The `match/case` statement supports subjects that are mapping objects. Patterns for mappings look like `dict` literals, but they can match instances of any actual or virtual subclass of `collections.abc.Mapping`.

Thanks to destructuring, pattern matching is a powerful tool for processing records structured like JSON and semi-structured data such as MongoDB documents.

In [1]:
def get_creators(record: dict) -> list:
    match record:
        case {"type": "book", "api": 2, "authors": [*names]}:
            return names
        case {"type": "book", "api": 1, "author": name}:
            return [name]
        case {"type": "book"}:
            raise ValueError(f"Invalid record {record!r}")
        case {"type": "movie", "director": name}:
            return [name]
        case _:
            raise ValueError(f"Invalid record {record!r}")

case_1 = {"type": "book", "api": 2, "authors": ["John Doe", "Jane Doe"]}
print("Case 1 ", get_creators(case_1))

case_2 = {"type": "book", "api": 1, "author": "Jane Doe"}
print("Case 2 ", get_creators(case_2))

case_3 = {"type": "movie", "director": "John Doe"}
print("Case 3 ", get_creators(case_3))

Case 1  ['John Doe', 'Jane Doe']
Case 2  ['Jane Doe']
Case 3  ['John Doe']


In [6]:
from collections import OrderedDict

case_ord = OrderedDict(type="book", api=2, authors=["Bradbury", "Gaiman"])

print("Case with ordered dict", get_creators(case_ord))

Case with ordered dict ['Bradbury', 'Gaiman']


In the last example above, note that the order of the keys in the pattern is irrelevant, even when the subject is an `OrderedDict`, a mapping that does care about order.

Let's build a more complete example. Sometimes the `Mapping` objects arrive with many more keys than the pattern lists. The pattern still matches, because mapping patterns succeed on partial matches. If you need to capture those extra key-value pairs as a dict, you can use `**extra`, which must come last in the pattern.

In [10]:
def get_creators(record: dict) -> list:
    match record:
        case {"type": "book", "api": 2, "authors": [*names], **extra}:
            print(extra)
            return names
        case {"type": "book", "api": 1, "author": name, **extra}:
            print(extra)
            return [name]
        case {"type": "book"}:
            raise ValueError(f"Invalid record {record!r}")
        case {"type": "movie", "director": name, **extra}:
            print(extra)
            return [name]
        case _:
            raise ValueError(f"Invalid record {record!r}")
            
case_extra = dict(type="movie", director="Francis Ford Coppola", title="apocalypse now", duration="2h33m")
print(get_creators(case_extra))

{'title': 'apocalypse now', 'duration': '2h33m'}
['Francis Ford Coppola']


### Standard API of mapping types

As usual, the `collections.abc` module provides the `Mapping` and `MutableMapping` ABCs, which describe the interfaces of `dict` and similar types.

In [12]:
from collections import abc

my_dict = dict()
isinstance(my_dict, abc.Mapping)

True

In [13]:
isinstance(my_dict, abc.MutableMapping)

True

To implement a custom mapping, it is usually easier to extend `collections.UserDict`, or to wrap a `dict` by composition, than to subclass the ABCs directly. Since these types are all built on hash tables, the main limitation is that *all keys must be hashable*, which in practice usually means *immutable*.
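
As a minimal sketch of the `UserDict` approach, the hypothetical `LowerKeyDict` below stores string keys in lowercase. It is only an illustration and assumes every key is a `str`.

```python
from collections import UserDict, abc


class LowerKeyDict(UserDict):
    """Hypothetical mapping that normalizes string keys to lowercase."""

    def __setitem__(self, key, value):
        # UserDict keeps the real storage in the self.data dict
        self.data[key.lower()] = value

    def __getitem__(self, key):
        return self.data[key.lower()]

    def __contains__(self, key):
        return key.lower() in self.data


headers = LowerKeyDict({"Content-Type": "text/html"})
headers["ACCEPT"] = "*/*"

print(headers["content-type"])                  # text/html
print("Accept" in headers)                      # True
print(isinstance(headers, abc.MutableMapping))  # True
```

Because `UserDict` routes `__init__` and `update` through `__setitem__`, overriding these three methods is enough for the normalization to apply consistently.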

What is hashable?
*According to the Python Glossary, an object is hashable if it has a hash code that never changes during its lifetime (it needs a `__hash__()` method), and can be compared to other objects (it needs an `__eq__()` method). Hashable objects which compare equal must have the same hash code.*

Numeric types and the flat immutable types `str` and `bytes` are all hashable. Container types are hashable if they are immutable and all the objects they contain are hashable as well.
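
The rule for containers can be checked with tuples, as in this short sketch: a tuple is hashable only when everything inside it is hashable.

```python
# A tuple of hashable items is hashable
flat = (1, 2, (30, 40))
print(isinstance(hash(flat), int))  # True

# A tuple holding a list is not, because lists are mutable
with_list = (1, 2, [30, 40])
try:
    hash(with_list)
except TypeError as exc:
    print("TypeError:", exc)

# Replacing the list with a frozenset makes the tuple hashable again
with_frozenset = (1, 2, frozenset([30, 40]))
print(isinstance(hash(with_frozenset), int))  # True
```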

User-defined types are hashable by default: in CPython their hash code is derived from their `id()`, and the `__eq__()` method inherited from the `object` class compares object IDs.

In [15]:
class User:
    
    def __init__(self, name, age):
        self.name = name
        self.age = age

user_1 = User("Aldo", 25)
user_2 = User("Santiago", 19)

print("user_1 id: ", id(user_1))
print("user_2 id: ", id(user_2))

print("user_1 hash: ", hash(user_1))
print("user_2 hash: ", hash(user_2))

user_1 id:  4424263392
user_2 id:  4421431600
user_1 hash:  276516462
user_2 hash:  276339475
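
If you want two instances with the same data to count as the same key, you can define `__eq__()` and `__hash__()` together. The sketch below is one possible approach, not the only one; `EqOnly` is a hypothetical class showing that defining `__eq__()` alone makes instances unhashable.

```python
class User:

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented
        return (self.name, self.age) == (other.name, other.age)

    def __hash__(self):
        # Equal objects must produce the same hash code
        return hash((self.name, self.age))


class EqOnly:
    """Hypothetical class that defines __eq__ but not __hash__."""

    def __eq__(self, other):
        return isinstance(other, EqOnly)


user_1 = User("Aldo", 25)
user_1_copy = User("Aldo", 25)

print(user_1 == user_1_copy)              # True
print(hash(user_1) == hash(user_1_copy))  # True
print(len({user_1, user_1_copy}))         # 1

try:
    hash(EqOnly())
except TypeError as exc:
    print("TypeError:", exc)
```

With this design, the attributes used in `__hash__()` should not change while the object is stored in a `dict` or `set`, otherwise lookups can fail.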
