# The Collections Module

Python has a ```collections``` module which provides a number of collections that supplement those explored in Python's ```builtins```.

## Categorize_Identifiers Module

This notebook uses the functions ```dir2```, ```variables``` and ```view``` from the custom module ```categorize_identifiers```, which is found in the same directory as this notebook file. ```dir2``` is a variant of ```dir``` that groups identifiers into a ```dict``` under categories, ```variables``` is an IPython-based variable inspector and ```view``` is used to view a ```Collection``` in more detail:

In [1]:
from categorize_identifiers import dir2, variables, view

## Importing the Collections Module

The ```collections``` module can be imported using:

In [2]:
import collections

The identifiers for the ```collections``` module can be examined. The identifiers of most interest are the classes. Most of these are in CamelCase, however a couple of classes are lowercase as they were originally designed to be incorporated into ```builtins```:

In [3]:
dir2(collections)

{'module': ['abc'],
 'method': ['namedtuple'],
 'lower_class': ['defaultdict', 'deque'],
 'upper_class': ['ChainMap',
                 'Counter',
                 'OrderedDict',
                 'UserDict',
                 'UserList',
                 'UserString'],
 'datamodel_attribute': ['__all__',
                         '__builtins__',
                         '__cached__',
                         '__doc__',
                         '__file__',
                         '__loader__',
                         '__name__',
                         '__package__',
                         '__path__',
                         '__spec__'],
 'internal_attribute': ['_collections_abc', '_sys'],
 'internal_method': ['_Link',
                     '_OrderedDictItemsView',
                     '_OrderedDictKeysView',
                     '_OrderedDictValuesView',
                     '_chain',
                     '_count_elements',
                     '_deque_iterator',
                    

The modules docstring gives a quick overview of these classes:

In [4]:
collections?

Type:        module
String form: <module 'collections' from 'C:\\Users\\phili\\Anaconda3\\envs\\vscode-env\\Lib\\collections\\__init__.py'>
File:        c:\users\phili\anaconda3\envs\vscode-env\lib\collections\__init__.py
Docstring:  
This module implements specialized container datatypes providing
alternatives to Python's general purpose built-in containers, dict,
list, set, and tuple.

* namedtuple   factory function for creating tuple subclasses with named fields
* deque        list-like container with fast appends and pops on either end
* ChainMap     dict-like class for creating a single view of multiple mappings
* Counter      dict subclass for counting hashable objects
* OrderedDict  dict subclass that remembers the order entries were added
* defaultdict  dict subclass that calls a factory function to supply missing values
* UserDict     wrapper around dictionary objects for easier dict subclassing
* UserList     wrapper around list ob

## NamedTuple

A ```tuple``` can be conceptualised as an immutable archive of records. Each record has a numeric index associated with the value but no field name to describe what the value in the archive is. For example, in the following ```tuple``` it is hard to tell which field is which, such as whether the UK date format ```(day, month, year)``` or the US date format ```(month, day, year)``` is used:

In [5]:
date_archive = (4, 3, 2023)

A ```dict``` instead stores items, each of which is a ```key``` and ```value``` pair, so every value can be labelled:

In [6]:
day1 = dict(day=4, month=3, year=2023)

When dealing with a large number of archives, the keys need to be specified every time:

In [7]:
day1 = dict(day=4, month=3, year=2024)
day2 = dict(day=3, month=4, year=2024)
day3 = dict(day=4, month=5, year=2024)

Finally there is no check to make sure the keys are input consistently:

In [8]:
badday1 = dict(d=15, m=3, y=2023)
badday2 = dict(day=15, month=4)

It is more convenient to group all of the data above into a subclass that has a data structure similar to a ```tuple```, but is fixed length and associates each record with an appropriate field name. 

```namedtuple``` is a factory function that creates a ```tuple``` subclass with named fields, referred to here as a ```NamedTuple``` subclass. It is essentially a ```tuple```-based template that in this case has only three fields: ```'day'```, ```'month'``` and ```'year'```. Once this subclass is created, it can be instantiated for each date above.

Do not confuse the factory function ```namedtuple```, which is lowercase, with the name ```NamedTuple```, which is CamelCase. The ```collections``` module has no ```NamedTuple``` class to instantiate directly; instead the factory function creates subclasses of ```tuple``` with a predefined number of fields and field names.

The ```namedtuple``` factory function can be imported from the collections module using:

In [9]:
from collections import namedtuple

The docstring of the ```namedtuple``` factory function can be viewed:

In [10]:
namedtuple?

Signature:
namedtuple(
    typename,
    field_names,
    *,
    rename=False,
    defaults=None,
    module=None,
)
Docstring:
Returns a new subclass of tuple with named fields.

>>> Point = namedtuple('Point', ['x', 'y'])
>>> Point.__doc__                   # docstring for the new class
'Point(x, y)'
>>> p = Point(11, y=22)             # instantiate with positional args or keywords
>>> p[0] + p[1]                     # indexable like a plain tuple
33
>>> x, y = p                        # unpack like a regular tuple
>>> x, y
(11, 22)
>>> p.x + p.y                       # fields also accessible by name
33
>>> d = p._asdict()                 # convert to 

For example the ```NamedTuple``` subclass ```DateTuple``` can be created using:

In [11]:
DateTuple = namedtuple('DateTuple', ['day', 'month', 'year'])

The ```namedtuple``` factory function returns a custom subclass. PascalCase is conventionally used to name classes outside of ```builtins```, so in this case the ```class``` name ```DateTuple``` is selected.

The first positional input of the ```namedtuple``` factory function is ```typename``` and this is typically the ```class``` name as a ```str```, in this case ```'DateTuple'```.

The next input argument is ```field_names``` and this is typically provided as a ```list``` of ```str``` instances, which correspond to the field names. The field names must be valid identifier names.
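As a minimal sketch of the ```field_names``` argument, the cell below shows that the field names can be given either as a ```list``` of ```str``` instances or as a single ```str``` separated by spaces or commas. The class names ```FromList```, ```FromSpaces``` and ```FromCommas``` are hypothetical and only used for illustration:

```python
from collections import namedtuple

# field names supplied as a list of str instances
FromList = namedtuple('FromList', ['day', 'month', 'year'])

# the same field names supplied as one str, split on whitespace or commas
FromSpaces = namedtuple('FromSpaces', 'day month year')
FromCommas = namedtuple('FromCommas', 'day, month, year')

# _fields is a tuple of the field names of each generated class
print(FromList._fields)
print(FromSpaces._fields == FromList._fields)
print(FromCommas._fields == FromList._fields)
```

The ```list``` form is generally easier to read when there are many fields.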

Now that the factory function has created the ```DateTuple``` (```NamedTuple``` subclass), the docstring of this subclass can be examined. Notice the initialisation signature has the field names:

In [12]:
DateTuple?

Init signature: DateTuple(day, month, year)
Docstring:      DateTuple(day, month, year)
Type:           type
Subclasses:     

Notice that most of the identifiers are consistent with those of a ```tuple```:

In [13]:
dir2(DateTuple, tuple, consistent_only=True)

{'method': ['count', 'index'],
 'datamodel_attribute': ['__doc__'],
 'datamodel_method': ['__add__',
                      '__class__',
                      '__class_getitem__',
                      '__contains__',
                      '__delattr__',
                      '__dir__',
                      '__eq__',
                      '__format__',
                      '__ge__',
                      '__getattribute__',
                      '__getitem__',
                      '__getnewargs__',
                      '__getstate__',
                      '__gt__',
                      '__hash__',
                      '__init__',
                      '__init_subclass__',
                      '__iter__',
                      '__le__',
                      '__len__',
                      '__lt__',
                      '__mul__',
                      '__ne__',
                      '__new__',
                      '__reduce__',
                      '__reduce_ex__',
         

However there are additional identifiers. Notice that each of the field names is present as an attribute. There are also a handful of identifiers that begin with an underscore. Normally prefixing an identifier with an underscore indicates an internal identifier, but in the case of a named ```tuple``` the underscore is used to distinguish the identifiers from the field names:

In [14]:
dir2(DateTuple, tuple, unique_only=True)

{'attribute': ['day', 'month', 'year'],
 'datamodel_attribute': ['__match_args__', '__module__', '__slots__'],
 'internal_attribute': ['_field_defaults', '_fields'],
 'internal_method': ['_asdict', '_make', '_replace']}


The method resolution order can be examined using:

In [15]:
DateTuple.mro()

[__main__.DateTuple, tuple, object]

Here ```DateTuple``` is seen to be a subclass of ```tuple```, which is in turn a subclass of ```object```. If ```help``` is used, only a handful of methods are defined in ```DateTuple``` itself, meaning most of the methods are inherited from ```tuple``` and can therefore be used in an identical manner:

In [16]:
help(DateTuple)

Help on class DateTuple in module __main__:

class DateTuple(builtins.tuple)
 |  DateTuple(day, month, year)
 |
 |  DateTuple(day, month, year)
 |
 |  Method resolution order:
 |      DateTuple
 |      builtins.tuple
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  __getnewargs__(self)
 |      Return self as a plain tuple.  Used by copy and pickle.
 |
 |  __repr__(self)
 |      Return a nicely formatted representation string
 |
 |  _asdict(self)
 |      Return a new dict which maps field names to their values.
 |
 |  _replace(self, /, **kwds)
 |      Return a new DateTuple object replacing specified fields with new values
 |
 |  ----------------------------------------------------------------------
 |  Class methods defined here:
 |
 |  _make(iterable) from builtins.type
 |      Make a new DateTuple object from a sequence or iterable
 |
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |
 |  __new__(_cls, day, month,

The identifiers from a ```tuple``` behave identically and do not need to be revised. 
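As a brief sketch of this inherited behaviour, the cell below uses a hypothetical instance named ```example``` and exercises indexing, unpacking, ```len```, ```count``` and ```index``` exactly as they would be used on a plain ```tuple```:

```python
from collections import namedtuple

DateTuple = namedtuple('DateTuple', ['day', 'month', 'year'])
example = DateTuple(4, 3, 2024)  # hypothetical instance for illustration

print(example[0])               # numeric indexing, as with a tuple
print(example[-1])              # negative indexing also works
day, month, year = example      # unpacking into three variables
print(len(example))             # number of fields
print(example.count(4))         # occurrences of the value 4
print(example.index(2024))      # index of the first occurrence of 2024
print(example == (4, 3, 2024))  # compares equal to a plain tuple of the same values
print(example.year)             # additionally, access by field name
```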

Three instances can be instantiated:

In [17]:
day1 = DateTuple(day=4, month=3, year=2024)
day2 = DateTuple(day=3, month=4, year=2024)
day3 = DateTuple(day=4, month=5, year=2024)

The formal representation of one of these ```DateTuple``` instances matches that of the initialisation signature:

In [18]:
repr(day1)

'DateTuple(day=4, month=3, year=2024)'

In [19]:
str(day2)

'DateTuple(day=3, month=4, year=2024)'

And the instances can be seen under variables:

In [20]:
variables(['day1', 'day2', 'day3'])

| Instance Name | Type | Size/Shape | Value |
| --- | --- | --- | --- |
| day1 | DateTuple | 3 | `DateTuple(day=4, month=3, year=2024)` |
| day2 | DateTuple | 3 | `DateTuple(day=3, month=4, year=2024)` |
| day3 | DateTuple | 3 | `DateTuple(day=4, month=5, year=2024)` |


The attributes ```_fields``` and ```_field_defaults``` return a ```tuple``` of field names and a ```dict``` where the ```keys``` are the field names and the ```values``` are the default values, respectively:

In [21]:
day1._fields

('day', 'month', 'year')

In [22]:
day1._field_defaults

{}

The ```_asdict``` method casts the ```DateTuple``` to a ```dict``` instance:

In [23]:
day1._asdict()

{'day': 4, 'month': 3, 'year': 2024}

The alternative constructor ```_make``` is used to cast a ```tuple``` to a ```DateTuple```:

In [24]:
day1._make((5, 1, 3))

DateTuple(day=5, month=1, year=3)

The method ```_replace``` can be used to create another ```DateTuple``` instance based on the existing ```DateTuple``` instance with one or more of the fields replaced:

In [25]:
day1._replace(day=5)

DateTuple(day=5, month=3, year=2024)
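Because a named ```tuple``` is immutable, ```_replace``` does not modify the instance it is called from. The following sketch, which uses the hypothetical instance names ```original``` and ```updated```, shows that the original is left unchanged and that direct assignment to a field is rejected:

```python
from collections import namedtuple

DateTuple = namedtuple('DateTuple', ['day', 'month', 'year'])
original = DateTuple(day=4, month=3, year=2024)

# _replace builds a new instance; several fields can be replaced at once
updated = original._replace(day=5, year=2025)
print(original)
print(updated)

# fields are read-only, so direct assignment raises an AttributeError
try:
    original.day = 5
    assignment_failed = False
except AttributeError:
    assignment_failed = True
print(assignment_failed)
```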

The named parameter ```defaults``` can be assigned to an iterable of default values, in this case a ```tuple```:

In [26]:
DateTuple = namedtuple('DateTuple', ['day', 'month', 'year'], defaults=(1, 1, 2024))

In [27]:
day1 = DateTuple(day=4, month=3, year=2024)
day2 = DateTuple(day=3, month=4)
day3 = DateTuple()

Notice the unfilled fields now take on their default value:

In [28]:
variables(['day1', 'day2', 'day3'])

| Instance Name | Type | Size/Shape | Value |
| --- | --- | --- | --- |
| day1 | DateTuple | 3 | `DateTuple(day=4, month=3, year=2024)` |
| day2 | DateTuple | 3 | `DateTuple(day=3, month=4, year=2024)` |
| day3 | DateTuple | 3 | `DateTuple(day=1, month=1, year=2024)` |


And the ```dict``` instance ```_field_defaults``` is now not empty:

In [29]:
day1._field_defaults

{'day': 1, 'month': 1, 'year': 2024}
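When fewer default values than fields are supplied, the defaults are applied to the rightmost fields. A minimal sketch, using the hypothetical class name ```PartialDate```:

```python
from collections import namedtuple

# two defaults for three fields: they are matched to 'month' and 'year'
PartialDate = namedtuple('PartialDate', ['day', 'month', 'year'], defaults=(1, 2024))

print(PartialDate._field_defaults)
print(PartialDate(15))

# 'day' has no default, so it must always be supplied
try:
    PartialDate()
    missing_day_rejected = False
except TypeError:
    missing_day_rejected = True
print(missing_day_rejected)
```

This mirrors ordinary function signatures, where parameters with defaults must follow those without.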

When ```rename=True```, field names that are not valid identifiers (for example field names with spaces) are renamed to an underscore followed by their positional index:

In [30]:
BadDateTuple = namedtuple('DateTuple', [' day', 'month', ' year'], defaults=(1, 1, 2024), rename=True)

In [31]:
BadDateTuple._fields

('_0', 'month', '_2')

Notice that the field names are provided as a ```list``` instance and the default values are provided as a ```tuple```. This syntax is deliberate:

```python
DateTuple = namedtuple('DateTuple', ['day', 'month', 'year'], defaults=(1, 1, 2024))
```

There is a complementary CamelCase ```NamedTuple``` in the ```typing``` module, which can be used to assign the expected datatype of each field. Notice that the field names are now given as a ```list``` of 2-element ```tuple``` instances, where the first value is the field name and the second value is the type:

In [32]:
from typing import NamedTuple

In [33]:
DateTuple = NamedTuple('DateTuple', [('day', int), ('month', int), ('year', int)])

The type hinted ```DateTuple``` will display the expected datatype for each instance when the docstring is examined:

In [34]:
DateTuple?

Init signature: DateTuple(day: int, month: int, year: int)
Docstring:      DateTuple(day, month, year)
Type:           type
Subclasses:     

Unfortunately, when called in this way ```typing.NamedTuple``` does not accept the ```defaults``` parameter.
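Default values can still be combined with type hints by using the class-based syntax of ```typing.NamedTuple```, where each field is written as an annotated class attribute. The following is a sketch using the hypothetical class name ```TypedDate```. Note that the type hints are not enforced at runtime:

```python
from typing import NamedTuple

class TypedDate(NamedTuple):
    # each field is declared as name: type = default
    day: int = 1
    month: int = 1
    year: int = 2024

print(TypedDate())
print(TypedDate(day=4, month=3))
print(TypedDate._field_defaults)

# the annotations are hints only: a str is accepted without complaint
unchecked = TypedDate(day='4')
print(unchecked)
```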

## deque


The double-ended queue ```deque``` class was originally designed to be incorporated into the ```builtins``` module and is therefore lowercase like ```tuple``` and ```list```. However it was later compartmentalised into the ```collections``` module. It can be imported using:

In [35]:
from collections import deque

A ```list``` is optimised for operations at the end and has for example the methods ```append``` and ```extend```. The ```deque``` is similar and has the following consistent identifiers to a ```list```:

In [36]:
dir2(deque, list, consistent_only=True)

{'method': ['append',
            'clear',
            'copy',
            'count',
            'extend',
            'index',
            'insert',
            'pop',
            'remove',
            'reverse'],
 'datamodel_attribute': ['__doc__', '__hash__'],
 'datamodel_method': ['__add__',
                      '__class__',
                      '__class_getitem__',
                      '__contains__',
                      '__delattr__',
                      '__delitem__',
                      '__dir__',
                      '__eq__',
                      '__format__',
                      '__ge__',
                      '__getattribute__',
                      '__getitem__',
                      '__getstate__',
                      '__gt__',
                      '__iadd__',
                      '__imul__',
                      '__init__',
                      '__init_subclass__',
                      '__iter__',
                      '__le__',
                     

However as the name suggests it is doubly ended and optimised for operations at the front and back:

In [37]:
dir2(deque, list, unique_only=True)

{'attribute': ['maxlen'],
 'method': ['appendleft', 'extendleft', 'popleft', 'rotate'],
 'datamodel_attribute': ['__module__'],
 'datamodel_method': ['__copy__']}


```appendleft```, ```extendleft``` and ```popleft``` are the left-hand counterparts to ```append```, ```extend``` and ```pop```. The terms left and right assume the ```Collection``` is displayed over a single line (shown in the cell below) and not vertically (shown in the cell output):

In [38]:
[object(), object(), object(), object(), object()]

[<object at 0x20a8154c720>,
 <object at 0x20a8154c660>,
 <object at 0x20a8154c610>,
 <object at 0x20a8154c630>,
 <object at 0x20a8154c5e0>]

The second major difference between a ```list``` and a ```deque``` is that a ```deque``` can be of fixed length and therefore has a ```maxlen``` attribute.

The initialisation signature of the ```deque``` class can be viewed:

In [39]:
deque?

Init signature: deque(self, /, *args, **kwargs)
Docstring:     
deque([iterable[, maxlen]]) --> deque object

A list-like sequence optimized for data accesses near its endpoints.
File:           c:\users\phili\anaconda3\envs\vscode-env\lib\collections\__init__.py
Type:           type
Subclasses:     

The ```deque``` takes an iterable as input argument such as a ```list``` or ```tuple``` and has the optional input argument, ```maxlen``` which specifies the maximum length of the ```deque```:

In [40]:
active = [1, 2, 3, 4, 5, 6]
duoactive = deque(active, 9)

In [41]:
variables(['active', 'duoactive'])

| Instance Name | Type | Size/Shape | Value |
| --- | --- | --- | --- |
| active | list | 6 | `[1, 2, 3, 4, 5, 6]` |
| duoactive | deque | 6 | `deque([1, 2, 3, 4, 5, 6], maxlen=9)` |


Both ```Collections``` have the method ```append``` which can be used to append a single ```object``` to the right of the ```Collection```:

In [42]:
active.append(7) # mutable inplace

In [43]:
duoactive.append(7) # mutable inplace

In [44]:
variables(['active', 'duoactive'])

| Instance Name | Type | Size/Shape | Value |
| --- | --- | --- | --- |
| active | list | 7 | `[1, 2, 3, 4, 5, 6, 7]` |
| duoactive | deque | 7 | `deque([1, 2, 3, 4, 5, 6, 7], maxlen=9)` |


And the method ```extend``` which can be used to extend the right of the ```Collection``` by multiple elements in an iterable:

In [45]:
active.extend((8, 9, 10)) # mutable inplace

In [46]:
duoactive.extend((8, 9, 10)) # mutable inplace

In [47]:
variables(['active', 'duoactive'])

| Instance Name | Type | Size/Shape | Value |
| --- | --- | --- | --- |
| active | list | 10 | `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]` |
| duoactive | deque | 9 | `deque([2, 3, 4, 5, 6, 7, 8, 9, 10], maxlen=9)` |


Notice the ```size``` is different, because the ```deque``` has a ```maxlen``` of ```9```:

In [48]:
duoactive.maxlen

9

All the ```objects``` in the iterable have been included in ```duoactive``` however once the ```maxlen``` has been exceeded, appending or extending from the right further will result in the first ```object``` from the left being ejected. In this case the number ```1``` has been ejected.
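This ejection behaviour makes a bounded ```deque``` convenient for keeping only the most recent items. The sketch below uses a hypothetical buffer named ```recent``` with ```maxlen=3``` and prints its contents after each ```append```:

```python
from collections import deque

recent = deque(maxlen=3)  # hypothetical buffer holding the last three readings

for reading in [10, 20, 30, 40, 50]:
    recent.append(reading)  # once full, the leftmost value is ejected
    print(list(recent))
```

Only the last three readings remain once the loop has finished.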

The ```deque``` has the complementary method ```appendleft```:

In [49]:
duoactive.appendleft(-1) # mutable inplace

In [50]:
variables(['active', 'duoactive'])

| Instance Name | Type | Size/Shape | Value |
| --- | --- | --- | --- |
| active | list | 10 | `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]` |
| duoactive | deque | 9 | `deque([-1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=9)` |


Notice that the value ```10``` furthest on the right was ejected because the ```maxlen``` is exceeded.

And the complementary method ```extendleft```:

In [51]:
duoactive.extendleft([-2, -3, -4]) # mutable inplace

In [52]:
variables(['active', 'duoactive'])

| Instance Name | Type | Size/Shape | Value |
| --- | --- | --- | --- |
| active | list | 10 | `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]` |
| duoactive | deque | 9 | `deque([-4, -3, -2, -1, 2, 3, 4, 5, 6], maxlen=9)` |


Notice the order of the iterable has been reversed as the first value is essentially left appended, then the second value is left appended and so on.
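The reversal can be confirmed with a small sketch that compares ```extendleft``` against an explicit loop of ```appendleft``` calls. The instance names ```via_extendleft```, ```via_loop``` and ```ordered``` are hypothetical:

```python
from collections import deque

via_extendleft = deque([0])
via_extendleft.extendleft([-2, -3, -4])

# the equivalent loop: each value in turn is appended to the left
via_loop = deque([0])
for value in [-2, -3, -4]:
    via_loop.appendleft(value)

print(via_extendleft)
print(via_loop == via_extendleft)

# to preserve the original order, reverse the iterable first
ordered = deque([0])
ordered.extendleft(reversed([-3, -2, -1]))
print(ordered)
```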

Both ```Collections``` also have the method ```pop```. Recall this method mutates the ```Collection``` in place but also has a return value. By default ```pop``` pops off the last element from the right and returns it:

In [53]:
active.pop() # mutable inplace

10

In [54]:
duoactive.pop() # mutable inplace

6

In [55]:
variables(['active', 'duoactive'])

| Instance Name | Type | Size/Shape | Value |
| --- | --- | --- | --- |
| active | list | 9 | `[1, 2, 3, 4, 5, 6, 7, 8, 9]` |
| duoactive | deque | 8 | `deque([-4, -3, -2, -1, 2, 3, 4, 5], maxlen=9)` |


For a ```list```, ```pop``` can also be used to pop off an element by specifying an index as a positional parameter. For example the value ```2``` is at index ```1```:

In [56]:
active.pop(1) # mutable inplace

2

This positional parameter is not available for the ```deque```, which can only use ```pop``` to pop from the right or ```popleft``` to pop from the left:

In [57]:
duoactive.popleft() # mutable inplace

-4

In [58]:
variables(['active', 'duoactive'])

| Instance Name | Type | Size/Shape | Value |
| --- | --- | --- | --- |
| active | list | 8 | `[1, 3, 4, 5, 6, 7, 8, 9]` |
| duoactive | deque | 7 | `deque([-3, -2, -1, 2, 3, 4, 5], maxlen=9)` |
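The ```rotate``` method listed among the unique ```deque``` identifiers is not demonstrated above. As a minimal sketch, using a hypothetical instance named ```ring```, a positive argument rotates the elements to the right and a negative argument rotates them to the left:

```python
from collections import deque

ring = deque([1, 2, 3, 4, 5])

ring.rotate(2)   # the two rightmost values wrap around to the front
after_right = list(ring)
print(after_right)

ring.rotate(-2)  # rotating left by the same amount restores the order
restored = list(ring)
print(restored)

ring.rotate(-1)  # the leftmost value wraps around to the back
after_left = list(ring)
print(after_left)
```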


## OrderedDict

Originally a ```dict``` was closer in behaviour to a ```set``` and was unordered:

In [59]:
unique = {'c', 1, 'b', 2, 'a', 1}

In [60]:
unique

{1, 2, 'a', 'b', 'c'}

However in modern versions of Python, a ```dict``` is ordered:

In [61]:
mapping = {'c': 1, 'b': 2, 'a': 1}

In [62]:
mapping

{'c': 1, 'b': 2, 'a': 1}

The ```collections``` module contains an ```OrderedDict``` class which was used to complement the unordered ```dict``` in older versions of Python. However since the default behaviour of a ```dict``` is ordered, the ```OrderedDict``` is generally redundant. The ```OrderedDict``` can be imported:

In [63]:
from collections import OrderedDict

Notice most of its identifiers are consistent with the ```dict```:

In [64]:
dir2(OrderedDict, dict, consistent_only=True)

{'method': ['clear',
            'copy',
            'fromkeys',
            'get',
            'items',
            'keys',
            'pop',
            'popitem',
            'setdefault',
            'update',
            'values'],
 'datamodel_attribute': ['__doc__', '__hash__'],
 'datamodel_method': ['__class__',
                      '__class_getitem__',
                      '__contains__',
                      '__delattr__',
                      '__delitem__',
                      '__dir__',
                      '__eq__',
                      '__format__',
                      '__ge__',
                      '__getattribute__',
                      '__getitem__',
                      '__getstate__',
                      '__gt__',
                      '__init__',
                      '__init_subclass__',
                      '__ior__',
                      '__iter__',
                      '__le__',
                      '__len__',
                      '__lt__',


The only additional identifiers it has are the datamodel attribute ```__dict__```, which stores any attributes set on an instance, and the method ```move_to_end```, which is not present in a ```dict``` and moves an existing ```key``` to the end:

In [65]:
dir2(OrderedDict, dict, unique_only=True)

{'method': ['move_to_end'], 'datamodel_attribute': ['__dict__']}


In [66]:
ordered_mapping = OrderedDict({'c': 1, 'b': 2, 'a': 1})

In [67]:
ordered_mapping.move_to_end('b')

In [68]:
variables(['mapping', 'ordered_mapping'])

| Instance Name | Type | Size/Shape | Value |
| --- | --- | --- | --- |
| mapping | dict | 3 | `{'c': 1, 'b': 2, 'a': 1}` |
| ordered_mapping | OrderedDict | 3 | `OrderedDict({'c': 1, 'a': 1, 'b': 2})` |
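Two further behaviours of the ```OrderedDict``` are sketched below using hypothetical instance names. ```move_to_end``` accepts ```last=False``` to move a ```key``` to the front instead, and an equality comparison between two ```OrderedDict``` instances takes the order into account, whereas a comparison between ```dict``` instances does not:

```python
from collections import OrderedDict

ordered_example = OrderedDict({'c': 1, 'b': 2, 'a': 1})
ordered_example.move_to_end('a', last=False)  # move 'a' to the front
print(list(ordered_example))

first = OrderedDict({'x': 1, 'y': 2})
second = OrderedDict({'y': 2, 'x': 1})

print(first == second)              # order-sensitive between OrderedDict instances
print(dict(first) == dict(second))  # order-insensitive between dict instances
```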


## defaultdict

If the following ```mapping``` is instantiated:

In [69]:
mapping = {'red': '#FF0000', 'green': '#00B050', 'blue': '#0070C0'}

A known key can be used to retrieve a value:

In [70]:
mapping['red']

'#FF0000'

An unknown key:

```python
mapping['yellow']
```

will raise a ```KeyError```.

The ```defaultdict``` is a subclass of ```dict``` that has a ```default_factory``` callable. When indexed with a ```key``` that doesn't exist in the current keys, the ```key```: ```default_factory()``` pair is added, instead of flagging up a ```KeyError```. 

Like the ```deque``` class, the ```defaultdict``` was originally designed to be incorporated into ```builtins``` module however it was later compartmentalised into the ```collections``` module. It is therefore lowercase indicating how closely associated it is with a ```builtins``` class. It can be imported using:

In [71]:
from collections import defaultdict

Most of the identifiers in the ```defaultdict``` are consistent with the ```dict``` class:

In [72]:
dir2(defaultdict, dict, consistent_only=True)

{'method': ['clear',
            'copy',
            'fromkeys',
            'get',
            'items',
            'keys',
            'pop',
            'popitem',
            'setdefault',
            'update',
            'values'],
 'datamodel_attribute': ['__doc__', '__hash__'],
 'datamodel_method': ['__class__',
                      '__class_getitem__',
                      '__contains__',
                      '__delattr__',
                      '__delitem__',
                      '__dir__',
                      '__eq__',
                      '__format__',
                      '__ge__',
                      '__getattribute__',
                      '__getitem__',
                      '__getstate__',
                      '__gt__',
                      '__init__',
                      '__init_subclass__',
                      '__ior__',
                      '__iter__',
                      '__le__',
                      '__len__',
                      '__lt__',


There are a few additions, the ```default_factory``` attribute gives details about the factory function that is used to provide a default value. The datamodel method ```__missing__``` is also defined which creates a new ```key``` when a ```key``` is missing setting its ```value``` to the default value:

In [73]:
dir2(defaultdict, dict, unique_only=True)

{'attribute': ['default_factory'],
 'datamodel_attribute': ['__module__'],
 'datamodel_method': ['__copy__', '__missing__']}


Its initialisation signature can be viewed:

In [74]:
defaultdict?

Init signature: defaultdict(self, /, *args, **kwargs)
Docstring:     
defaultdict(default_factory=None, /, [...]) --> dict with default factory

The default factory is called without arguments to produce
a new value when a key is not present, in __getitem__ only.
A defaultdict compares equal to a dict with the same items.
All remaining arguments are treated the same as if they were
passed to the dict constructor, including keyword arguments.
File:           c:\users\phili\anaconda3\envs\vscode-env\lib\collections\__init__.py
Type:           type
Subclasses:     FreezableDefaultDict

Its first positional input argument is ```default_factory``` which is a function that is called when a ```key``` is not found. The second positional input argument is a ```dict``` instance:

In [75]:
default_mapping = defaultdict(str, {'red': '#FF0000', 'green': '#00B050', 'blue': '#0070C0'})

The ```default_factory``` function reference above is the ```str``` class:

In [76]:
str

str

When a ```key``` is missing, it is called and provides an empty ```str``` which is used for the ```value```: 

In [77]:
str()

''

```'red'``` and ```'green'``` are existing ```keys``` and can be indexed into ```default_mapping``` to retrieve their respective value:

In [78]:
variables(['default_mapping'])

| Instance Name | Type | Size/Shape | Value |
| --- | --- | --- | --- |
| default_mapping | defaultdict | 3 | `defaultdict(<class 'str'>, {'red': '#FF0000', 'green': '#00B050', 'blue': '#0070C0'})` |


In [79]:
default_mapping['red']

'#FF0000'

In [80]:
default_mapping['green']

'#00B050'

```'yellow'``` is not an existing ```key``` and therefore the ```default_factory``` is called to provide the value, which is the empty ```str``` instance:

In [81]:
default_mapping['yellow']

''

This is now added to ```default_mapping```:

In [82]:
variables(['default_mapping'])

| Instance Name | Type | Size/Shape | Value |
| --- | --- | --- | --- |
| default_mapping | defaultdict | 4 | `defaultdict(<class 'str'>, {'red': '#FF0000', 'green': '#00B050', 'blue': '#0070C0', 'yellow': ''})` |
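The docstring notes that the default factory is called "in __getitem__ only". The sketch below, using a hypothetical instance named ```colors```, illustrates that a membership test with ```in``` and the ```get``` method do not add the missing ```key```, whereas indexing with square brackets does:

```python
from collections import defaultdict

colors = defaultdict(str, {'red': '#FF0000'})

found = 'yellow' in colors        # membership test does not call the factory
fetched = colors.get('yellow')    # get returns None and adds nothing
length_before = len(colors)

value = colors['yellow']          # indexing calls the factory and stores the result
length_after = len(colors)

print(found, fetched, length_before)
print(repr(value), length_after)
```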


In other cases it may be more useful to set ```default_factory``` to the ```list``` class. Recall when the ```list``` class is called, an empty ```list``` is returned:

In [83]:
default_mapping = defaultdict(list, {'red': [1, 0, 0], 'green': [0, 1, 0], 'blue': [0, 0, 1]})

In [84]:
variables(['default_mapping'])

| Instance Name | Type | Size/Shape | Value |
| --- | --- | --- | --- |
| default_mapping | defaultdict | 3 | `defaultdict(<class 'list'>, {'red': [1, 0, 0], 'green': [0, 1, 0], 'blue': [0, 0, 1]})` |


In [85]:
default_mapping['red']

[1, 0, 0]

In [86]:
default_mapping['green']

[0, 1, 0]

In [87]:
default_mapping['yellow']

[]

In [88]:
variables(['default_mapping'])

| Instance Name | Type | Size/Shape | Value |
| --- | --- | --- | --- |
| default_mapping | defaultdict | 4 | `defaultdict(<class 'list'>, {'red': [1, 0, 0], 'green': [0, 1, 0], 'blue': [0, 0, 1], 'yellow': []})` |


Because this is an empty ```list``` by default, ```list``` methods can be called from it:

In [89]:
default_mapping['black'].extend([0, 0, 0])

In [90]:
variables(['default_mapping'])

| Instance Name | Type | Size/Shape | Value |
| --- | --- | --- | --- |
| default_mapping | defaultdict | 5 | `defaultdict(<class 'list'>, {'red': [1, 0, 0], 'green': [0, 1, 0], 'blue': [0, 0, 1], 'yellow': [], 'black': [0, 0, 0]})` |
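A common use of a ```list``` default is grouping values under a ```key``` without first checking whether the ```key``` exists. The following sketch uses hypothetical data in ```words``` and groups each word by its first letter:

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']  # hypothetical data

by_letter = defaultdict(list)
for word in words:
    # a missing key is created with an empty list, which is then appended to
    by_letter[word[0]].append(word)

print(dict(by_letter))
```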


```default_factory``` can be assigned to an anonymous function using a ```lambda``` expression. Recall the general form for a ```lambda``` expression is:

```python
default_factory = lambda input0, input1, ... : value
```

And recall ```default_factory``` is called without any input arguments by the ```defaultdict```, and therefore this is normally simplified down to a ```lambda``` expression that just returns a default value:

```python
lambda : value
```


Instead of an empty ```list```, a 3 element ```list``` of zero values can be provided as a default:

In [91]:
default_mapping = defaultdict(lambda : [0, 0, 0], {'red': [1, 0, 0], 'green': [0, 1, 0], 'blue': [0, 0, 1]})

In [92]:
variables(['default_mapping'])

| Instance Name | Type | Size/Shape | Value |
| --- | --- | --- | --- |
| default_mapping | defaultdict | 3 | `defaultdict(<function <lambda> at 0x0000020A829BCC20>, {'red': [1, 0, 0], 'green': [0, 1, 0], 'blue': [0, 0, 1]})` |


In [93]:
default_mapping['red']

[1, 0, 0]

In [94]:
default_mapping['green']

[0, 1, 0]

In [95]:
default_mapping['yellow']

[0, 0, 0]

In [96]:
default_mapping['black']

[0, 0, 0]

In [97]:
variables(['default_mapping'])

| Instance Name | Type | Size/Shape | Value |
|---|---|---|---|
| default_mapping | defaultdict | 5 | ```defaultdict(<function <lambda> at 0x0000020A829BCC20>, {'red': [1, 0, 0], 'green': [0, 1, 0], 'blue': [0, 0, 1], 'yellow': [0, 0, 0], 'black': [0, 0, 0]})``` |


Setting up a default behaviour like this can make operations such as looping a bit safer. Without it, indexing would raise an ```IndexError``` for any default value that isn't set up as the expected 3-element ```list```:

In [98]:
for color in default_mapping:
    red_channel = default_mapping[color][0]
    green_channel = default_mapping[color][1]
    blue_channel = default_mapping[color][2]
    print(red_channel, green_channel, blue_channel)

1 0 0
0 1 0
0 0 1
0 0 0
0 0 0
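As a minimal sketch of the behaviour described above, the following shows that ```default_factory``` is called afresh for each missing ```key```, so every default is a separate ```list``` object. The instance name ```channels``` and the keys used are only for illustration:

```python
from collections import defaultdict

# default_factory is called with no arguments each time a missing key is accessed
channels = defaultdict(lambda: [0, 0, 0])

channels['yellow'][0] = 1            # mutates the default created for 'yellow'
black = channels['black']            # a separate default is created for 'black'

print(channels['yellow'])            # [1, 0, 0]
print(black)                         # [0, 0, 0]
print(channels['yellow'] is black)   # False

# membership tests and get do not invoke default_factory
print('white' in channels)           # False
print(channels.get('white'))         # None
print(len(channels))                 # 2
```

Only indexing with square brackets creates the missing item, so ```in``` and ```get``` can be used when a ```key``` should be checked without adding it.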


## Counter

It is quite common to count the number of occurrences of each item in an ```iterable```. If the following ```str``` instance is examined:

In [99]:
text = 'hello world!'

It can be cast to a ```set```, to view the unique letters:

In [100]:
unique = set(text)

In [101]:
unique

{' ', '!', 'd', 'e', 'h', 'l', 'o', 'r', 'w'}

And a ```dict``` instance can be instantiated using the alternative constructor ```fromkeys``` using the ```set``` instance ```unique``` and the constant ```value``` of ```0```: 

In [102]:
frequency = dict.fromkeys(unique, 0)

In [103]:
frequency

{'h': 0, ' ': 0, 'd': 0, 'l': 0, 'e': 0, 'o': 0, '!': 0, 'r': 0, 'w': 0}

The frequency of each letter in the word can be counted using a ```for``` loop:

In [104]:
for letter in text:
    frequency[letter] += 1

In [105]:
frequency

{'h': 1, ' ': 1, 'd': 1, 'l': 3, 'e': 1, 'o': 2, '!': 1, 'r': 1, 'w': 1}
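The manual approach above needs the keys to be created before counting. As a hedged sketch, two other common ways of writing the same loop are shown below, one using ```dict.get``` and one using the ```defaultdict``` covered earlier. The instance names ```frequency_get``` and ```frequency_default``` are chosen here for illustration:

```python
from collections import defaultdict

text = 'hello world!'

# Variant 1: dict.get supplies 0 for letters that have not been seen yet
frequency_get = {}
for letter in text:
    frequency_get[letter] = frequency_get.get(letter, 0) + 1

# Variant 2: defaultdict(int) creates a 0 value for each missing key
frequency_default = defaultdict(int)
for letter in text:
    frequency_default[letter] += 1

print(frequency_get)
print(frequency_get == dict(frequency_default))   # True
```

Both variants insert keys in the order the letters are first seen, rather than in the arbitrary order of a ```set```.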

This can be obtained more conveniently using a ```Counter```:

In [106]:
from collections import Counter

The ```Counter``` is a ```dict``` subclass and has the following consistent identifiers:

In [107]:
dir2(Counter, dict, consistent_only=True)

{'method': ['clear',
            'copy',
            'fromkeys',
            'get',
            'items',
            'keys',
            'pop',
            'popitem',
            'setdefault',
            'update',
            'values'],
 'datamodel_attribute': ['__doc__', '__hash__'],
 'datamodel_method': ['__class__',
                      '__class_getitem__',
                      '__contains__',
                      '__delattr__',
                      '__delitem__',
                      '__dir__',
                      '__eq__',
                      '__format__',
                      '__ge__',
                      '__getattribute__',
                      '__getitem__',
                      '__getstate__',
                      '__gt__',
                      '__init__',
                      '__init_subclass__',
                      '__ior__',
                      '__iter__',
                      '__le__',
                      '__len__',
                      '__lt__',


It also has a number of additional identifiers. Some of the datamodel methods behave similarly to their counterparts in a ```set```:

In [108]:
dir2(Counter, dict, unique_only=True)

{'method': ['elements', 'most_common', 'subtract', 'total'],
 'datamodel_attribute': ['__dict__', '__module__', '__weakref__'],
 'datamodel_method': ['__add__',
                      '__and__',
                      '__iadd__',
                      '__iand__',
                      '__isub__',
                      '__missing__',
                      '__neg__',
                      '__pos__',
                      '__sub__'],
 'internal_method': ['_keep_positive']}


The initialisation signature of the ```Counter``` class can be examined:

In [109]:
Counter?

Init signature: Counter(iterable=None, /, **kwds)
Docstring:     
Dict subclass for counting hashable items.  Sometimes called a bag
or multiset.  Elements are stored as dictionary keys and their counts
are stored as dictionary values.

>>> c = Counter('abcdeabcdabcaba')  # count elements from a string

>>> c.most_common(3)                # three most common elements
[('a', 5), ('b', 4), ('c', 3)]
>>> sorted(c)                       # list all unique elements
['a', 'b', 'c', 'd', 'e']
>>> ''.join(sorted(c.elements()))   # list elements with repetitions
'aaaaabbbbcccdde'
>>> sum(c.values())                 # total of all counts
15

>>> c['a']                          # count of letter 'a'
5
>>> for elem in 'shazam':           # update counts from an iterable
...     c[elem] += 1                # by adding 1 to each element's coun

And in the example above, the simplification can be made:

In [110]:
frequency2 = Counter('hello world!')

In [111]:
frequency2

Counter({'l': 3,
         'o': 2,
         'h': 1,
         'e': 1,
         ' ': 1,
         'w': 1,
         'r': 1,
         'd': 1,
         '!': 1})

Notice the slight difference between the two variables. ```frequency``` has an arbitrary order of elements, as the ```dict``` was instantiated using a ```set``` and a ```set``` is unordered. ```frequency2```, on the other hand, is displayed in descending order by value:

In [112]:
variables(['frequency', 'frequency2'])

| Instance Name | Type | Size/Shape | Value |
|---|---|---|---|
| frequency | dict | 9 | ```{'h': 1, ' ': 1, 'd': 1, 'l': 3, 'e': 1, 'o': 2, '!': 1, 'r': 1, 'w': 1}``` |
| frequency2 | Counter | 9 | ```Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1, '!': 1})``` |


The ```most_common``` method returns a ```list``` of 2-element ```tuple``` instances, where the first element is the ```key``` and the second element is the ```value```. This is similar to ```items```; however, ```most_common``` lists the items in descending order of the values and ```items``` lists the items in insertion order:

In [113]:
frequency2.most_common()

[('l', 3),
 ('o', 2),
 ('h', 1),
 ('e', 1),
 (' ', 1),
 ('w', 1),
 ('r', 1),
 ('d', 1),
 ('!', 1)]

In [114]:
frequency2.items()

dict_items([('h', 1), ('e', 1), ('l', 3), ('o', 2), (' ', 1), ('w', 1), ('r', 1), ('d', 1), ('!', 1)])

Because each ```value``` is normally an ```int```, the ```Counter``` has additional methods such as ```total```, which returns the sum of the values:

In [115]:
frequency2.total()

12

In [116]:
len(text)

12

The mutable method ```subtract``` can be used to subtract values from a ```Counter``` instance using another iterable such as a second ```str``` instance:

In [117]:
frequency2.subtract('hello') # mutable inplace

In [118]:
variables(['frequency2'])

| Instance Name | Type | Size/Shape | Value |
|---|---|---|---|
| frequency2 | Counter | 9 | ```Counter({'l': 1, 'o': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1, '!': 1, 'h': 0, 'e': 0})``` |


If subtracted again, there are positive, zero and negative values:

In [119]:
frequency2.subtract('hello') # mutable inplace

In [120]:
variables(['frequency2'])

| Instance Name | Type | Size/Shape | Value |
|---|---|---|---|
| frequency2 | Counter | 9 | ```Counter({' ': 1, 'w': 1, 'r': 1, 'd': 1, '!': 1, 'o': 0, 'h': -1, 'e': -1, 'l': -1})``` |


The unary immutable datamodel methods ```__pos__``` and ```__neg__``` are defined, allowing use of the unary operators ```+``` and ```-``` respectively. ```+``` returns a ```Counter``` instance of the items with values greater than ```0```. ```-``` negates each value and returns a ```Counter``` instance of the items whose original values were less than ```0```. Both instances therefore only display positive values:

In [121]:
+frequency2

Counter({' ': 1, 'w': 1, 'r': 1, 'd': 1, '!': 1})

In [122]:
-frequency2

Counter({'h': 1, 'e': 1, 'l': 1})

The binary datamodel methods ```__add__``` and ```__sub__``` are defined, allowing addition or subtraction of two ```Counter``` instances. Only items with a positive value are kept in the result:

In [123]:
Counter('hello') + Counter('bye')

Counter({'e': 2, 'l': 2, 'h': 1, 'o': 1, 'b': 1, 'y': 1})

In [124]:
Counter('hello') - Counter('bye')

Counter({'l': 2, 'h': 1, 'o': 1})
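The ```-``` operator and the ```subtract``` method treat zero and negative values differently. A short sketch of the contrast, using the instance names ```hello``` and ```bye``` for illustration:

```python
from collections import Counter

hello = Counter('hello')
bye = Counter('bye')

# the operator returns a new Counter and keeps only positive values
difference = hello - bye
print(difference)         # Counter({'l': 2, 'h': 1, 'o': 1})

# the method works in place and keeps zero and negative values
hello.subtract(bye)
print(hello)              # Counter({'l': 2, 'h': 1, 'o': 1, 'e': 0, 'b': -1, 'y': -1})
```

```subtract``` is therefore the one to use when a deficit needs to be recorded.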

The inplace binary datamodel methods ```__iadd__``` and ```__isub__``` are also defined. These also discard any items whose value is not positive:

In [125]:
frequency2 += Counter('hello')

In [126]:
variables(['frequency2'])

| Instance Name | Type | Size/Shape | Value |
|---|---|---|---|
| frequency2 | Counter | 7 | ```Counter({'l': 1, 'o': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1, '!': 1})``` |


In [127]:
frequency2 += Counter('hello')

In [128]:
variables(['frequency2'])

| Instance Name | Type | Size/Shape | Value |
|---|---|---|---|
| frequency2 | Counter | 9 | ```Counter({'l': 3, 'o': 2, ' ': 1, 'w': 1, 'r': 1, 'd': 1, '!': 1, 'h': 1, 'e': 1})``` |


In [129]:
frequency2 -= Counter('hello')

In [130]:
variables(['frequency2'])

| Instance Name | Type | Size/Shape | Value |
|---|---|---|---|
| frequency2 | Counter | 7 | ```Counter({'l': 1, 'o': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1, '!': 1})``` |


The binary datamodel method ```__and__``` is also defined, allowing the ```&``` operator to be used. This returns a ```Counter``` of the elements that are common to both ```Counter``` instances, using the smaller of the two values:

In [131]:
Counter('hello') & Counter('byebye')

Counter({'e': 1})

The inplace binary datamodel method ```__iand__``` is also defined, allowing use of ```&=```.
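As a small sketch, the intersection can be compared with the union operator ```|```, which a ```Counter``` also supports and which takes the larger of the two values. The instance names ```left``` and ```right``` are chosen for illustration:

```python
from collections import Counter

left = Counter('aab')      # a: 2, b: 1
right = Counter('abbc')    # a: 1, b: 2, c: 1

# intersection keeps the smaller value of each common key
common = left & right
print(common)              # Counter({'a': 1, 'b': 1})

# union keeps the larger value of every key
largest = left | right
print(largest)             # Counter({'a': 2, 'b': 2, 'c': 1})

# inplace intersection reassigns left
left &= right
print(left)                # Counter({'a': 1, 'b': 1})
```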

Recall that a ```key``` behaves much like an instance name:

```python
frequency2['r']
```

It retrieves a ```value```, which in the case of a ```Counter``` is an ```int``` instance. It is common to use inplace assignment with this ```int```. Recall an ```int``` instance is immutable, so this involves two separate steps: a calculation using the original instance, and then reassignment of the ```key``` to the new value calculated:


In [132]:
frequency2['r'] += 3

In [133]:
variables(['frequency2'])

| Instance Name | Type | Size/Shape | Value |
|---|---|---|---|
| frequency2 | Counter | 7 | ```Counter({'r': 4, 'l': 1, 'o': 1, ' ': 1, 'w': 1, 'd': 1, '!': 1})``` |


The datamodel method ```__missing__``` is also defined for a ```Counter```. It is used when a ```key``` that doesn't exist is accessed, and returns a ```value``` of ```0```:

In [134]:
frequency2['z']

0

The ```key``` isn't added to the ```Counter``` until a ```value``` is assigned to it:

In [135]:
variables(['frequency2'])

| Instance Name | Type | Size/Shape | Value |
|---|---|---|---|
| frequency2 | Counter | 7 | ```Counter({'r': 4, 'l': 1, 'o': 1, ' ': 1, 'w': 1, 'd': 1, '!': 1})``` |


In [136]:
frequency2['z'] += 1

In [137]:
variables(['frequency2'])

| Instance Name | Type | Size/Shape | Value |
|---|---|---|---|
| frequency2 | Counter | 8 | ```Counter({'r': 4, 'l': 1, 'o': 1, ' ': 1, 'w': 1, 'd': 1, '!': 1, 'z': 1})``` |
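This differs from the ```defaultdict``` seen earlier, which does add the missing ```key``` when it is accessed. A brief sketch of the contrast, with instance names chosen for illustration:

```python
from collections import Counter, defaultdict

counts = Counter('abc')
print(counts['z'])             # 0
print('z' in counts)           # False, reading did not add the key
print(len(counts))             # 3

default_counts = defaultdict(int, {'a': 1, 'b': 1, 'c': 1})
print(default_counts['z'])     # 0
print('z' in default_counts)   # True, reading added the key
print(len(default_counts))     # 4
```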


The ```elements``` method returns an iterator over the keys, with each ```key``` repeated as many times as its ```value```:

In [138]:
forward = frequency2.elements()

In [139]:
forward

<itertools.chain at 0x20a829f34f0>

```next``` can be used to step through this iterator one letter at a time:

In [140]:
next(forward)

'l'

In [141]:
next(forward)

'o'

Alternatively it can be cast into a ```tuple```, exhausting all remaining letters:

In [142]:
tuple(forward)

(' ', 'w', 'r', 'r', 'r', 'r', 'd', '!', 'z')
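Keys with a value of less than one are skipped by ```elements```. A minimal sketch of this, using a hypothetical ```stock``` instance:

```python
from collections import Counter

stock = Counter(apples=2, pears=0, plums=-1, figs=1)

# pears and plums are skipped because their values are less than one
remaining = list(stock.elements())
print(remaining)               # ['apples', 'apples', 'figs']

# the zero and negative values still contribute to the sum of the values
print(sum(stock.values()))     # 2
```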

A ```Counter``` instance can also be initialised from a ```dict``` of keys and ```int``` values:

In [143]:
Counter({'a': 5, 'b': 7, 'c': 14})

Counter({'c': 14, 'b': 7, 'a': 5})

Or by using keyword input arguments assigned to ```int``` instances:

In [144]:
Counter(a=5, b=7, c=14)

Counter({'c': 14, 'b': 7, 'a': 5})

A ```Counter``` can be used for another iterable such as a ```tuple```:

In [145]:
colors = ('red', 'red', 'red', 
          'blue', 'blue', 
          'green')

In [146]:
Counter(colors)

Counter({'red': 3, 'blue': 2, 'green': 1})

## ChainMap

A ```ChainMap``` is used to chain ```dict``` instances together. It can be imported using:

In [147]:
from collections import ChainMap

The identifiers of the ```ChainMap``` are consistent with a ```dict```:

In [148]:
dir2(ChainMap, dict, consistent_only=True)

{'method': ['clear',
            'copy',
            'fromkeys',
            'get',
            'items',
            'keys',
            'pop',
            'popitem',
            'setdefault',
            'update',
            'values'],
 'datamodel_attribute': ['__doc__', '__hash__', '__reversed__'],
 'datamodel_method': ['__class__',
                      '__class_getitem__',
                      '__contains__',
                      '__delattr__',
                      '__delitem__',
                      '__dir__',
                      '__eq__',
                      '__format__',
                      '__ge__',
                      '__getattribute__',
                      '__getitem__',
                      '__getstate__',
                      '__gt__',
                      '__init__',
                      '__init_subclass__',
                      '__ior__',
                      '__iter__',
                      '__le__',
                      '__len__',
                

However it has the attribute ```parents```, which returns a new ```ChainMap``` of all the maps except the first, and the method ```new_child```, which returns a new ```ChainMap``` with a new map placed in front of the existing maps:

In [149]:
dir2(ChainMap, dict, unique_only=True)

{'attribute': ['parents'],
 'method': ['new_child'],
 'datamodel_attribute': ['__abstractmethods__',
                         '__dict__',
                         '__module__',
                         '__slots__',
                         '__weakref__'],
 'datamodel_method': ['__bool__', '__copy__', '__missing__'],
 'internal_attribute': ['_MutableMapping__marker', '_abc_impl']}


The initialisation signature of the ```ChainMap``` class can be examined:

In [150]:
ChainMap?

Init signature: ChainMap(*maps)
Docstring:     
A ChainMap groups multiple dicts (or other mappings) together
to create a single, updateable view.

The underlying mappings are stored in a list.  That list is public and can
be accessed or updated using the *maps* attribute.  There is no other
state.

Lookups search the underlying mappings successively until a key is found.
In contrast, writes, updates, and deletions only operate on the first
mapping.
Init docstring:
Initialize a ChainMap by setting *maps* to the given mappings.
If no mappings are provided, a single empty dictionary is used.
File:           c:\users\phili\anaconda3\envs\vscode-env\lib\collections\__init__.py
Type:           ABCMeta
Subclasses:     DeepChainMap

There is a variable number of positional input arguments ```*maps``` which correspond to a variable number of maps to be chained. Maps are normally ```dict``` instances but can be ```defaultdict``` and ```Counter``` instances.

A common use case is combining a ```dict``` instance of default options:

In [151]:
default = {'textcolor': '#000000', 
           'font': 'Times New Roman', 
           'fontsize': 12}

With one for user preferences:

In [152]:
settings = {'font': 'Arial Black'}

The total configuration is therefore:

In [153]:
config = ChainMap(settings, default)

In [154]:
variables(['default', 'settings', 'config'])

| Instance Name | Type | Size/Shape | Value |
|---|---|---|---|
| default | dict | 3 | ```{'textcolor': '#000000', 'font': 'Times New Roman', 'fontsize': 12}``` |
| settings | dict | 1 | ```{'font': 'Arial Black'}``` |
| config | ChainMap | 3 | ```ChainMap({'font': 'Arial Black'}, {'textcolor': '#000000', 'font': 'Times New Roman', 'fontsize': 12})``` |


If this ```ChainMap``` instance is cast to a ```dict```, notice that the ```dict``` returned uses the order of the parent mapping ```default``` but changes the values to those defined in ```settings``` when applicable:

In [155]:
config2 = dict(config)

In [156]:
variables(['default', 'settings', 'config', 'config2'])

| Instance Name | Type | Size/Shape | Value |
|---|---|---|---|
| default | dict | 3 | ```{'textcolor': '#000000', 'font': 'Times New Roman', 'fontsize': 12}``` |
| settings | dict | 1 | ```{'font': 'Arial Black'}``` |
| config | ChainMap | 3 | ```ChainMap({'font': 'Arial Black'}, {'textcolor': '#000000', 'font': 'Times New Roman', 'fontsize': 12})``` |
| config2 | dict | 3 | ```{'textcolor': '#000000', 'font': 'Arial Black', 'fontsize': 12}``` |


By default the ```ChainMap``` instance ```config``` is set up to behave like the explicitly cast ```dict``` instance ```config2```, for example when looping:

In [157]:
for key in config:
    print(key, config[key])

textcolor #000000
font Arial Black
fontsize 12


It has the attribute ```maps```, which returns the ```list``` of maps used to make the ```ChainMap```:

In [158]:
config.maps

[{'font': 'Arial Black'},
 {'textcolor': '#000000', 'font': 'Times New Roman', 'fontsize': 12}]

The map at index 0 is the primary map, in this case the ```dict``` instance ```settings```. The ```ChainMap``` is linked to this primary instance. If a new item is added to ```settings```, and another new item is added to ```config```:

In [159]:
settings['bordercolor'] = 'red'

In [160]:
config['borderwidth'] = 5

Notice that the item added to the primary map ```settings``` is also seen in the ```ChainMap``` instance ```config```, and the item added to the ```ChainMap``` instance is also added to the primary map ```settings```. In contrast, the ```dict``` instance ```config2``` that was previously cast from the ```ChainMap``` instance ```config``` is not updated:

In [161]:
variables(['default', 'settings', 'config', 'config2'])

| Instance Name | Type | Size/Shape | Value |
|---|---|---|---|
| default | dict | 3 | ```{'textcolor': '#000000', 'font': 'Times New Roman', 'fontsize': 12}``` |
| settings | dict | 3 | ```{'font': 'Arial Black', 'bordercolor': 'red', 'borderwidth': 5}``` |
| config | ChainMap | 5 | ```ChainMap({'font': 'Arial Black', 'bordercolor': 'red', 'borderwidth': 5}, {'textcolor': '#000000', 'font': 'Times New Roman', 'fontsize': 12})``` |
| config2 | dict | 3 | ```{'textcolor': '#000000', 'font': 'Arial Black', 'fontsize': 12}``` |
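The docstring states that lookups search every map, whereas writes and deletions only operate on the first map. A short sketch of this, reusing cut-down versions of ```default``` and ```settings``` for illustration:

```python
from collections import ChainMap

default = {'font': 'Times New Roman', 'fontsize': 12}
settings = {'font': 'Arial Black'}
config = ChainMap(settings, default)

# lookups search settings first and then fall back to default
print(config['font'])        # Arial Black
print(config['fontsize'])    # 12

# a write goes to the first map, even when the key exists in a later map
config['fontsize'] = 14
print(settings)              # {'font': 'Arial Black', 'fontsize': 14}
print(default['fontsize'])   # 12

# a deletion removes the key from the first map, revealing the default again
del config['fontsize']
print(config['fontsize'])    # 12

# a key found only in a later map cannot be deleted through the ChainMap
try:
    del config['fontsize']
except KeyError:
    print('fontsize is not in the first map')
```

The mapping of defaults is therefore never modified through the ```ChainMap```.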


```parents``` returns a ```ChainMap``` instance constructed from all the maps except the primary map:

In [162]:
config.parents

ChainMap({'textcolor': '#000000', 'font': 'Times New Roman', 'fontsize': 12})

i.e. it is equivalent to:

In [163]:
ChainMap(*config.maps[1:])

ChainMap({'textcolor': '#000000', 'font': 'Times New Roman', 'fontsize': 12})
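The method ```new_child``` goes in the opposite direction to ```parents``` and places a new map in front of the existing maps. A sketch under the assumption of a temporary ```session``` layer, which is a name chosen here for illustration:

```python
from collections import ChainMap

default = {'font': 'Times New Roman', 'fontsize': 12}
settings = {'font': 'Arial Black'}
config = ChainMap(settings, default)

# the new map becomes the primary map of a new ChainMap
session = config.new_child({'fontsize': 16})

print(session['fontsize'])                   # 16
print(session['font'])                       # Arial Black
print(len(session.maps))                     # 3

# the original ChainMap is unchanged and is recovered by parents
print(config['fontsize'])                    # 12
print(session.parents.maps == config.maps)   # True
```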

## UserString, UserList, UserDict

```UserList```, ```UserDict```, ```UserString``` behave similarly to the ```list```, ```dict``` and ```str``` classes in ```builtins```:

In [164]:
from collections import UserList, UserDict, UserString

If the method resolution order for these is examined, notice there are a number of abstract base classes showing the design pattern used in each case:

In [165]:
UserList.mro()

[collections.UserList,
 collections.abc.MutableSequence,
 collections.abc.Sequence,
 collections.abc.Reversible,
 collections.abc.Collection,
 collections.abc.Sized,
 collections.abc.Iterable,
 collections.abc.Container,
 object]

In [166]:
UserDict.mro()

[collections.UserDict,
 collections.abc.MutableMapping,
 collections.abc.Mapping,
 collections.abc.Collection,
 collections.abc.Sized,
 collections.abc.Iterable,
 collections.abc.Container,
 object]

In [167]:
UserString.mro()

[collections.UserString,
 collections.abc.Sequence,
 collections.abc.Reversible,
 collections.abc.Collection,
 collections.abc.Sized,
 collections.abc.Iterable,
 collections.abc.Container,
 object]

```list```, ```dict``` and ```str``` are the three ```Collections``` from the ```builtins``` module that are most commonly subclassed. When subclassing, it is recommended to use the counterparts ```UserList```, ```UserDict``` and ```UserString```, which simplify the initialisation signature and are configured to use abstract base classes in the method resolution order.
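One practical reason for this recommendation, in CPython at least, is that the methods of a ```builtins``` class often bypass a datamodel method that has been overridden in a subclass. The following sketch compares two hypothetical classes, ```UpperDict``` and ```UpperUserDict```, which both try to store every ```key``` in upper case:

```python
from collections import UserDict


class UpperDict(dict):
    """dict subclass (hypothetical) that upper cases keys on assignment"""
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)


class UpperUserDict(UserDict):
    """UserDict subclass (hypothetical) that upper cases keys on assignment"""
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)


# dict initialisation and update do not call the overridden __setitem__
builtin_based = UpperDict(a=1)
builtin_based.update(b=2)
builtin_based['c'] = 3
print(builtin_based)     # {'a': 1, 'b': 2, 'C': 3}

# UserDict routes initialisation and update through __setitem__
user_based = UpperUserDict(a=1)
user_based.update(b=2)
user_based['c'] = 3
print(user_based)        # {'A': 1, 'B': 2, 'C': 3}
```

With ```UserDict``` a single overridden method is enough to change the behaviour consistently.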

In [168]:
UserString?

Init signature: UserString(seq)
Docstring:     
All the operations on a read-only sequence.

Concrete subclasses must override __new__ or __init__,
__getitem__, and __len__.
File:           c:\users\phili\anaconda3\envs\vscode-env\lib\collections\__init__.py
Type:           ABCMeta
Subclasses:     

Recall that ```list```, ```dict``` and ```str``` instances are typically instantiated using the shorthand of square brackets, braces and quotations respectively, but the initialisation signature of the class can also be used directly:

In [169]:
'hello'

'hello'

In [170]:
str('hello')

'hello'

A custom class, ```HelloString``` can use ```UserString``` as a parent class. 

```python
class HelloString(UserString):
    """str subclass with 'hello ' prefix"""
    pass
```

If the initialisation method ```__init__``` is defined in the child class, then this version will be used instead of the version defined in the parent class:

```python
class HelloString(UserString):
    def __init__(self, value):
        pass
```

Instead of passing, the initialisation signature of the parent class can be invoked explicitly using ```super()```:

```python
class HelloString(UserString):
    def __init__(self, value):
        super(HelloString, self).__init__(value)

```

Because the first argument is the class being defined and the second is the instance (```self``` means *this instance*), both can be inferred from the context and the implicit form is usually used:

```python
class HelloString(UserString):
    def __init__(self, value):
        super().__init__(value)

```

The initialisation of the parent class creates an instance ```self``` attribute ```data``` which is accessed from the instance in the form ```self.data```. This attribute can be reassigned to include a ```'hello '``` prefix:

```python
class HelloString(UserString):
    def __init__(self, value):
        super(HelloString, self).__init__(value)
        self.data = f'hello {self.data}'
```

The ```__getitem__``` instance datamodel method is used for indexing using an index with square brackets. By default it will index into ```self.data``` provided by the updated initialisation signature, which recall includes the ```'hello '``` prefix. This is not wanted in the index and the prefix can be removed for indexing purposes:

```python
class HelloString(UserString):
    def __init__(self, value):
        super(HelloString, self).__init__(value)
        self.data = f'hello {self.data}'

    def __getitem__(self, index):
        return self.data.removeprefix('hello ')[index]
```

The ```__len__``` instance datamodel method by default returns the length of ```self.data```, which includes the prefix. The characters in the prefix should not be counted, so the length of the prefix is subtracted:

```python
class HelloString(UserString):
    def __init__(self, value):
        super(HelloString, self).__init__(value)
        self.data = f'hello {self.data}'

    def __getitem__(self, index):
        return self.data.removeprefix('hello ')[index]

    def __len__(self):
        return len(self.data) - len('hello ')
```

The class can be declared using:

In [171]:
class HelloString(UserString):
    """str subclass with 'hello ' prefix"""
    def __init__(self, value):
        super(HelloString, self).__init__(value)
        self.data = f'hello {self.data}'

    def __getitem__(self, index):
        return self.data.removeprefix('hello ')[index]

    def __len__(self):
        return len(self.data) - len('hello ')

Its method resolution order can be examined:

In [172]:
HelloString.mro()

[__main__.HelloString,
 collections.UserString,
 collections.abc.Sequence,
 collections.abc.Reversible,
 collections.abc.Collection,
 collections.abc.Sized,
 collections.abc.Iterable,
 collections.abc.Container,
 object]

Its initialisation signature can be examined:

In [173]:
HelloString?

Init signature: HelloString(value)
Docstring:      str subclass with 'hello ' prefix
Type:           ABCMeta
Subclasses:     

And an instance can be created using:

In [174]:
greeting = HelloString('world')

The value of ```greeting``` can be seen in a cell:

In [175]:
greeting

'hello world'

The letter at index ```0``` can be retrieved, recalling that ```__getitem__``` ignores the ```'hello '``` prefix:

In [176]:
greeting[0]

'w'

The length can be determined, recalling that ```__len__``` ignores the ```'hello '``` prefix:

In [177]:
len(greeting)

5

If ```help``` is used:

In [178]:
help(HelloString)

Help on class HelloString in module __main__:

class HelloString(collections.UserString)
 |  HelloString(value)
 |
 |  str subclass with 'hello ' prefix
 |
 |  Method resolution order:
 |      HelloString
 |      collections.UserString
 |      collections.abc.Sequence
 |      collections.abc.Reversible
 |      collections.abc.Collection
 |      collections.abc.Sized
 |      collections.abc.Iterable
 |      collections.abc.Container
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  __getitem__(self, index)
 |
 |  __init__(self, value)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  __len__(self)
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  __abstractmethods__ = frozenset()
 |
 |  ----------------------------------------------------------------------
 |  Methods inherited from collections.UserString:
 |
 |  __add__(self, other)
 |
 |  __complex__(self)
 |
 |  __cont

Notice ```HelloString``` will use the 3 methods defined in ```HelloString``` but also has access to all the other methods defined in ```UserString```. For example the method ```removesuffix``` can be used to remove the last ```'d'```:

In [179]:
greeting.removesuffix('d')

'hello hello worl'

Notice that because this is an immutable method that returns a new ```HelloString``` instance, an additional ```'hello '``` prefix is supplied (one from the new instance, and another from the original instance). The class could be tailored to only add the prefix when it doesn't already exist:

In [180]:
class HelloString(UserString):
    """str subclass with 'hello ' prefix"""
    def __init__(self, value):
        super(HelloString, self).__init__(value)
        if not self.data.startswith('hello '):
            self.data = f'hello {self.data}'

    def __getitem__(self, index):
        return self.data.removeprefix('hello ')[index]

    def __len__(self):
        return len(self.data) - len('hello ')

In [181]:
greeting = HelloString('world')

In [182]:
greeting

'hello world'

In [183]:
greeting.removesuffix('d')

'hello worl'

Typically the class would need to be further tested for all of the behaviour desired and any other method amended accordingly.
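As a rough sketch of such testing, the class can be exercised with a few checks. This version assumes ```__getitem__``` strips the ```'hello '``` prefix as described above, and it shows that a method such as ```upper``` still gives a result that would probably need amending:

```python
from collections import UserString


class HelloString(UserString):
    """UserString subclass with 'hello ' prefix (sketch for testing)"""
    def __init__(self, value):
        super().__init__(value)
        if not self.data.startswith('hello '):
            self.data = f'hello {self.data}'

    def __getitem__(self, index):
        # assumption: indexing ignores the 'hello ' prefix
        return self.data.removeprefix('hello ')[index]

    def __len__(self):
        return len(self.data) - len('hello ')


greeting = HelloString('world')

print(repr(greeting))                     # 'hello world'
print(greeting[0])                        # w
print(len(greeting))                      # 5
print(repr(greeting.removesuffix('d')))   # 'hello worl'

# upper returns a new instance whose upper case prefix is not recognised,
# so a second lower case prefix is added
print(repr(greeting.upper()))             # 'hello HELLO WORLD'
```

Each inherited method that returns a new instance passes through ```__init__```, so any check made there has to cope with every form of ```self.data``` these methods can produce.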

[Return to Python Tutorials](../readme.md)