### Section 8, 128. Named Tuples - Application - Alternative to Dictionaries

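Note: this only works for dictionaries whose keys are strings that are also valid field names, i.e., valid Python identifiers that are not keywords and do not start with an underscore.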
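The restriction is easy to see in a quick sketch. `namedtuple` raises `ValueError` for a field name that is not a valid identifier. With `rename=True`, invalid names (including keywords) are silently replaced by positional names such as `_0`. The renamed fields no longer match the dictionary keys, though, so `Renamed(**bad_dict)` would still fail. `bad_dict`, `Bad` and `Renamed` are made-up names for illustration.

```python
from collections import namedtuple

# string keys, but 'first name' is not an identifier and 'class' is a keyword
bad_dict = {'first name': 'John', 'class': 'A', 'age': 42}

try:
    namedtuple('Bad', bad_dict.keys())
except ValueError as ex:
    print('ValueError:', ex)

# rename=True swaps each invalid name for a positional name _<index>
Renamed = namedtuple('Renamed', bad_dict.keys(), rename=True)
print(Renamed._fields)  # ('_0', '_1', 'age')
```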

In [1]:
from collections import namedtuple

In [2]:
data_dict = dict(key1=100, key2=200, key3=300)

In [3]:
data_dict

{'key1': 100, 'key2': 200, 'key3': 300}

In [4]:
Data = namedtuple('Data', data_dict.keys())

In [6]:
Data._fields

('key1', 'key2', 'key3')

We can create an instance of the `Data` namedtuple using the data in the `data_dict` dictionary.

We could try the following (bad idea):

In [13]:
values = data_dict.values()
print(values)
print(type(values))

dict_values([100, 200, 300])
<class 'dict_values'>


In [14]:
d1 = Data(*data_dict.values())

In [15]:
d1

Data(key1=100, key2=200, key3=300)

This looks like it worked.

Now consider this second dictionary, where we do not create the keys in the same order.

In [16]:
data_dict_2 = dict(key1=100, key3=300, key2=200)

In [17]:
d2 = Data(*data_dict_2.values())

In [18]:
d2

Data(key1=100, key2=300, key3=200)

This is clearly wrong: the values for `key2` and `key3` have been swapped.

`values()` returns the values in the dictionary's insertion order (guaranteed since Python 3.7), and nothing guarantees that this matches the order of the named tuple's fields.

Instead, unpack the dictionary itself with `**`, which passes each key-value pair as a keyword argument:

In [19]:
d2 = Data(**data_dict_2)

In [20]:
d2

Data(key1=100, key2=200, key3=300)
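To make the difference concrete, here is a small self-contained check. It compares positional unpacking of `values()` with keyword unpacking of the dictionary, for two dictionaries that hold the same data in different insertion orders. `d_a` and `d_b` are illustrative names.

```python
from collections import namedtuple

Data = namedtuple('Data', 'key1 key2 key3')

d_a = dict(key1=100, key2=200, key3=300)
d_b = dict(key1=100, key3=300, key2=200)  # same data, different insertion order

# positional unpacking follows each dict's insertion order
print(Data(*d_a.values()) == Data(*d_b.values()))  # False

# keyword unpacking matches values to fields by name, so order does not matter
print(Data(**d_a) == Data(**d_b))  # True
```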

So, for any dictionary `d`, we can create a named tuple class and insert the data into it as follows:

- `Struct = namedtuple('Struct', d.keys())`
- `data = Struct(**d)`
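The two steps can be wrapped in a small helper. `dict_to_namedtuple` is a hypothetical name used only for this sketch, and it assumes every key is a valid field name. The sketch also shows a limitation: if the generated class is reused for a dictionary with a key that is not one of its fields, the `**` unpacking raises `TypeError`.

```python
from collections import namedtuple

def dict_to_namedtuple(d, name='Struct'):
    """Hypothetical helper: build a namedtuple class from d's keys and fill it."""
    Struct = namedtuple(name, d.keys())
    return Struct(**d)

person = dict_to_namedtuple({'first_name': 'John', 'age': 42})
print(person)  # Struct(first_name='John', age=42)

# reusing the generated class with an unexpected key fails
Struct = type(person)
try:
    Struct(**{'first_name': 'Eric', 'age': 40, 'height': 185})
except TypeError as ex:
    print('TypeError:', ex)
```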

In [21]:
data_dict = dict(first_name='John', last_name='Cleese', age=42, complaint='dead parrot')

In [22]:
data_dict

{'first_name': 'John',
 'last_name': 'Cleese',
 'age': 42,
 'complaint': 'dead parrot'}

In [23]:
data_dict.keys()

dict_keys(['first_name', 'last_name', 'age', 'complaint'])

In [24]:
type(data_dict.keys())

dict_keys

In [25]:
sorted(data_dict.keys())

['age', 'complaint', 'first_name', 'last_name']

In [26]:
Struct = namedtuple('Struct', sorted(data_dict.keys()))

In [28]:
Struct._fields

('age', 'complaint', 'first_name', 'last_name')

Even though the field order no longer matches the dictionary's key order, unpacking the dictionary (instead of its values) still puts each value into the correct field:

In [29]:
d1 = Struct(**data_dict)

In [30]:
d1

Struct(age=42, complaint='dead parrot', first_name='John', last_name='Cleese')

Since this is now a named tuple, we can access the data using dot notation with the field name:

In [31]:
d1.complaint

'dead parrot'

instead of the square-bracket lookup we would use with the dictionary:

In [32]:
data_dict['complaint']

'dead parrot'

With dictionaries, we often end up with code where the key is stored in a variable and then used for the lookup:

In [33]:
key_name = 'age'
data_dict[key_name]

42

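We cannot use this approach directly with named tuples: dot notation always looks up an attribute literally named `key_name`, not the field whose name is stored in that variable.

For example, this will not work: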

In [34]:
key_name = 'age'
d1.key_name

AttributeError: 'Struct' object has no attribute 'key_name'

However, we can use the `getattr` function:

In [36]:
key_name = 'age'
getattr(d1, key_name)

42
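Because `getattr` takes the field name as a string, it also lets us walk over all the fields generically using the `_fields` attribute, much like looping over dictionary keys. For comparison, `_asdict()` converts the named tuple back into a regular dictionary. The `Struct` below recreates the one used in this section so that the snippet runs on its own.

```python
from collections import namedtuple

Struct = namedtuple('Struct', ['age', 'complaint', 'first_name', 'last_name'])
d1 = Struct(age=42, complaint='dead parrot', first_name='John', last_name='Cleese')

# look up each field by its name, stored in a variable
for field in d1._fields:
    print(field, '->', getattr(d1, field))

# convert back to a dictionary (field order is preserved)
as_dict = d1._asdict()
print(as_dict['complaint'])  # dead parrot
```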

The dictionary has a `get` method that can specify a default value to return if the key does not exist:

In [38]:
data_dict.get('age', None), data_dict.get('invalid_key', None)

(42, None)

The same can be done with the `getattr` function:

In [39]:
getattr(d1, 'age', None), getattr(d1, 'invalid_field', None)

(42, None)
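Without a default, both lookups fail, but with different exceptions: the dictionary raises `KeyError`, while `getattr` raises `AttributeError`. A minimal sketch:

```python
from collections import namedtuple

Struct = namedtuple('Struct', 'age complaint')
d1 = Struct(age=42, complaint='dead parrot')
data_dict = {'age': 42, 'complaint': 'dead parrot'}

try:
    data_dict['invalid_key']
except KeyError as ex:
    print('KeyError:', ex)

try:
    getattr(d1, 'invalid_field')
except AttributeError as ex:
    print('AttributeError:', ex)
```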

This is not very useful if you are only working with a single dictionary.

You do not want to create a new named tuple class for every dictionary instance; that would be far too much overhead.

But when you have a collection of dictionaries that share a common set of keys, this can be very useful, as long as you are willing to live with the fact that the resulting structures are immutable.
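A sketch of the payoff: a single class, created once, serves every dictionary with the same keys. The resulting records are immutable, so assigning to a field raises `AttributeError`. `_replace` returns a modified copy instead. `Point` and `rows` are made-up names for illustration.

```python
from collections import namedtuple

rows = [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}, {'y': 6, 'x': 5}]

# create the class once and reuse it for every dictionary
Point = namedtuple('Point', ['x', 'y'])
points = [Point(**row) for row in rows]
print(points)  # [Point(x=1, y=2), Point(x=3, y=4), Point(x=5, y=6)]

p = points[0]
try:
    p.x = 100  # named tuples are immutable
except AttributeError as ex:
    print('AttributeError:', ex)

# _replace builds a new instance with the changed field
p2 = p._replace(x=100)
print(p, p2)  # Point(x=1, y=2) Point(x=100, y=2)
```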

Let's suppose we have this data list:

In [40]:
data_list = [
    {'key1': 1, 'key2': 2},
    {'key1': 3, 'key2': 4},
    {'key1': 5, 'key2': 6, 'key3': 7},
    {'key2': 100}
]

In [41]:
data_list

[{'key1': 1, 'key2': 2},
 {'key1': 3, 'key2': 4},
 {'key1': 5, 'key2': 6, 'key3': 7},
 {'key2': 100}]

First we need to figure out all the possible keys that have been used in the dictionaries in this list.

The easiest way to do this is to extract all the keys of all the dictionaries and put them into a `set`, which eliminates duplicate key names.

We could do it this way, using a simple loop:

In [43]:
keys = set()
for d in data_list:
    for key in d.keys():
        # adding a key that is already in the set has no effect
        keys.add(key)

In [46]:
keys

{'key1', 'key2', 'key3'}

A more compact way is to unpack a generator expression of all the key views into `set().union()`:

In [60]:
keys = set().union(*(dict_.keys() for dict_ in data_list))

In [61]:
keys

{'key1', 'key2', 'key3'}
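The same set can also be built with a set comprehension, which is the form used in the function at the end of this section. Both approaches visit every key once; which one reads better is mostly a matter of taste.

```python
data_list = [
    {'key1': 1, 'key2': 2},
    {'key1': 3, 'key2': 4},
    {'key1': 5, 'key2': 6, 'key3': 7},
    {'key2': 100},
]

keys_union = set().union(*(dict_.keys() for dict_ in data_list))
# iterating over a dict yields its keys
keys_comp = {key for dict_ in data_list for key in dict_}

print(keys_union == keys_comp)  # True
print(sorted(keys_comp))        # ['key1', 'key2', 'key3']
```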

Some exploratory code, to see what unpacking the generator expression actually passes as arguments:

In [55]:
def func(*args):
    print(*args)


In [56]:
func(1, 2, 3)

1 2 3


In [59]:
func(*(dict_.keys() for dict_ in data_list))

dict_keys(['key1', 'key2']) dict_keys(['key1', 'key2']) dict_keys(['key1', 'key2', 'key3']) dict_keys(['key2'])
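This confirms that the generator is unpacked into separate `dict_keys` arguments. `set.union` accepts any number of iterables, not only sets, so the call above is equivalent to writing the arguments out by hand. With no arguments at all (for example, an empty `data_list`) it simply returns an empty set.

```python
a = {'key1': 1, 'key2': 2}
b = {'key2': 100, 'key3': 7}

# union takes any number of iterables of any type
merged = set().union(a.keys(), b.keys())
print(sorted(merged))                            # ['key1', 'key2', 'key3']
print(sorted(set().union(['x'], ('y',), 'z')))   # ['x', 'y', 'z']

# nothing to unpack -> empty set
print(set().union(*[]))                          # set()
```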


Now we can create a named tuple with the keys as fields:

In [62]:
Struct = namedtuple('Struct', keys)

In [64]:
Struct._fields

('key2', 'key3', 'key1')

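Sets are unordered, so the field order above is arbitrary (it can even change between interpreter runs because of string hash randomization). To get a predictable field order, we can create the named tuple with the keys sorted: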

In [65]:
Struct = namedtuple('Struct', sorted(keys))

In [66]:
Struct._fields

('key1', 'key2', 'key3')

We need to provide default values, since not every dictionary has all the keys.

We'll set the default values to `None`.

In [67]:
Struct.__new__.__defaults__ = (None,) * len(Struct._fields)
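Assigning to `Struct.__new__.__defaults__` works, but since Python 3.7 `namedtuple` also accepts a `defaults` argument that sets the defaults when the class is created. A sketch with the same keys:

```python
from collections import namedtuple

keys = {'key1', 'key2', 'key3'}
fields = sorted(keys)

# defaults apply to the rightmost fields; here every field gets None
Struct = namedtuple('Struct', fields, defaults=(None,) * len(fields))

print(Struct._field_defaults)  # {'key1': None, 'key2': None, 'key3': None}
print(Struct(key2=100))        # Struct(key1=None, key2=100, key3=None)
```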

In [69]:
help(Struct)

Help on class Struct in module __main__:

class Struct(builtins.tuple)
 |  Struct(key1=None, key2=None, key3=None)
 |
 |  Struct(key1, key2, key3)
 |
 |  Method resolution order:
 |      Struct
 |      builtins.tuple
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  __getnewargs__(self) from collections.Struct
 |      Return self as a plain tuple.  Used by copy and pickle.
 |
 |  __replace__ = _replace(self, /, **kwds)
 |
 |  __repr__(self) from collections.Struct
 |      Return a nicely formatted representation string
 |
 |  _asdict(self) from collections.Struct
 |      Return a new dict which maps field names to their values.
 |
 |  _replace(self, /, **kwds) from collections.Struct
 |      Return a new Struct object replacing specified fields with new values
 |
 |  ----------------------------------------------------------------------
 |  Class methods defined here:
 |
 |  _make(iterable) from collections.Struct
 |      Make a new Struct object from a sequence or iterable
 

In [73]:
data_list

[{'key1': 1, 'key2': 2},
 {'key1': 3, 'key2': 4},
 {'key1': 5, 'key2': 6, 'key3': 7},
 {'key2': 100}]

In [74]:
tuple_list = [Struct(**dict_) for dict_ in data_list]

In [75]:
tuple_list

[Struct(key1=1, key2=2, key3=None),
 Struct(key1=3, key2=4, key3=None),
 Struct(key1=5, key2=6, key3=7),
 Struct(key1=None, key2=100, key3=None)]

Finally, package up the code in a function that takes an iterable of dictionaries and returns a list of named tuples:

In [81]:
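def tuplify_dicts(dicts):
    # materialize the input: we traverse it twice, and a generator would be exhausted
    dicts = list(dicts)
    # union of all keys used by any of the dictionaries
    keys = {key for dict_ in dicts for key in dict_.keys()}
    # sorted field names give a predictable field order
    Struct = namedtuple('Struct', sorted(keys))
    # fields missing from a dictionary default to None
    Struct.__new__.__defaults__ = (None,) * len(Struct._fields)
    return [Struct(**dict_) for dict_ in dicts]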

In [82]:
tuplify_dicts(data_list)

[Struct(key1=1, key2=2, key3=None),
 Struct(key1=3, key2=4, key3=None),
 Struct(key1=5, key2=6, key3=7),
 Struct(key1=None, key2=100, key3=None)]
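`tuplify_dicts` expects a single iterable. If you would also like to pass the dictionaries as separate positional arguments, a thin wrapper can collect them with `*dicts`. `tuplify_dict_args` is a hypothetical name. The sketch repeats a generator-safe version of `tuplify_dicts` (using the `defaults` argument of `namedtuple`) so that it runs on its own.

```python
from collections import namedtuple

def tuplify_dicts(dicts):
    dicts = list(dicts)  # materialize so a generator can be traversed twice
    keys = {key for dict_ in dicts for key in dict_}
    # every field defaults to None for dictionaries that lack that key
    Struct = namedtuple('Struct', sorted(keys), defaults=(None,) * len(keys))
    return [Struct(**dict_) for dict_ in dicts]

def tuplify_dict_args(*dicts):
    """Hypothetical variant: accept the dictionaries as positional arguments."""
    return tuplify_dicts(dicts)

rows = tuplify_dict_args({'a': 1}, {'b': 2}, {'a': 3, 'b': 4})
print(rows)
# [Struct(a=1, b=None), Struct(a=None, b=2), Struct(a=3, b=4)]

gen_rows = tuplify_dicts(d for d in [{'a': 1}, {'a': 2}])
print(gen_rows)
# [Struct(a=1), Struct(a=2)]
```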

This is a handy, reusable pattern that will be useful in **many situations!**