# The Collections Module

Python has a ```collections``` module, which provides a number of collections that supplement those found in Python's ```builtins```.

## Categorize_Identifiers Module

This notebook uses the functions ```dir2```, ```variables``` and ```view``` from the custom module ```categorize_identifiers```, which is found in the same directory as this notebook file. ```dir2``` is a variant of ```dir``` that groups identifiers into a ```dict``` under categories, ```variables``` is an IPython-based variable inspector and ```view``` is used to view a ```Collection``` in more detail:

In [1]:
from categorize_identifiers import dir2, variables, view

## Importing the Collections Module

The ```collections``` module can be imported using:

In [2]:
import collections

The identifiers of the ```collections``` module can be examined. The identifiers of most interest are the classes. Most of these are in CamelCase; however, a couple of classes are lowercase as they were originally designed to be incorporated into ```builtins```:

In [3]:
dir2(collections)

{'attribute': ['abc'],
 'method': ['namedtuple'],
 'lower_class': ['defaultdict', 'deque'],
 'upper_class': ['ChainMap',
                 'Counter',
                 'OrderedDict',
                 'UserDict',
                 'UserList',
                 'UserString'],
 'datamodel_attribute': ['__all__',
                         '__builtins__',
                         '__cached__',
                         '__doc__',
                         '__file__',
                         '__loader__',
                         '__name__',
                         '__package__',
                         '__path__',
                         '__spec__'],
 'internal_attribute': ['_collections_abc', '_sys'],
 'internal_method': ['_Link',
                     '_OrderedDictItemsView',
                     '_OrderedDictKeysView',
                     '_OrderedDictValuesView',
                     '_chain',
                     '_count_elements',
                     '_deque_iterator',
                 

The module's docstring gives a quick overview of these classes:

In [4]:
collections?

Type:        module
String form: <module 'collections' from 'C:\\Users\\phili\\Anaconda3\\envs\\vscode-env\\Lib\\collections\\__init__.py'>
File:        c:\users\phili\anaconda3\envs\vscode-env\lib\collections\__init__.py
Docstring:
This module implements specialized container datatypes providing
alternatives to Python's general purpose built-in containers, dict,
list, set, and tuple.

* namedtuple   factory function for creating tuple subclasses with named fields
* deque        list-like container with fast appends and pops on either end
* ChainMap     dict-like class for creating a single view of multiple mappings
* Counter      dict subclass for counting hashable objects
* OrderedDict  dict subclass that remembers the order entries were added
* defaultdict  dict subclass that calls a factory function to supply missing values
* UserDict     wrapper around dictionary objects for easier dict subclassing
* UserList     wrapper around list ob

## NamedTuple

A ```tuple``` can be conceptualised as an immutable archive of records. Each record only has a numeric index associated with its value and no field name to describe what the value is. For example, in the following ```tuple``` it is hard to tell which field is which:

In [5]:
(4, 3, 2023)

(4, 3, 2023)

Some more details may be deduced from the variable name:

In [6]:
date = (4, 3, 2023)

However, it is still unclear whether the 4 is the day (UK format) or the month (US format). In such a case, a dictionary may be more appropriate. The ```dict``` class can be used to instantiate the dictionary:

In [7]:
date = dict(day=4, month=3, year=2023)

In [8]:
date

{'day': 4, 'month': 3, 'year': 2023}

When dealing with a large number of archives, the keys need to be specified every time:

In [9]:
day1 = dict(day=4, month=3, year=2024)
day2 = dict(day=3, month=4, year=2024)
day3 = dict(day=4, month=5, year=2024)

And there is no check to make sure the keys are input consistently:

In [10]:
day4 = dict(d=15, m=3, y=2023)

It is more convenient to define a class with a data structure similar to a ```tuple```, which has a fixed number of fields and associates each record with an appropriate field name.

```namedtuple``` is a factory function that creates a new subclass of ```tuple```; essentially a ```tuple```-based data structure or template that in this case has only three fields: ```'day'```, ```'month'``` and ```'year'```. Once this subclass is created, it can be instantiated for each date above.

Do not confuse the lowercase factory function ```namedtuple``` in the ```collections``` module with the CamelCase class ```NamedTuple``` in the ```typing``` module, which provides a class-based syntax for defining the same kind of ```tuple``` subclass. In this notebook ```NamedTuple``` is not used directly; instead ```namedtuple``` is called to create subclasses with a predefined number of fields and field names.

The ```namedtuple``` factory function can be imported from the collections module using:

In [11]:
from collections import namedtuple

The docstring of the ```namedtuple``` factory function can be viewed:

In [12]:
namedtuple?

Signature:
namedtuple(
    typename,
    field_names,
    *,
    rename=False,
    defaults=None,
    module=None,
)
Docstring:
Returns a new subclass of tuple with named fields.

>>> Point = namedtuple('Point', ['x', 'y'])
>>> Point.__doc__                   # docstring for the new class
'Point(x, y)'
>>> p = Point(11, y=22)             # instantiate with positional args or keywords
>>> p[0] + p[1]                     # indexable like a plain tuple
33
>>> x, y = p                        # unpack like a regular tuple
>>> x, y
(11, 22)
>>> p.x + p.y                       # fields also accessible by name
33
>>> d = p._asdict()                 # convert to 

For example, the named tuple class ```DateTuple``` can be created using:

In [13]:
DateTuple = namedtuple('DateTuple', ['day', 'month', 'year'])

The ```namedtuple``` factory function returns a custom class. By convention, CamelCase (also called PascalCase) is used for the names of user-defined and third-party classes, so the ```class``` name ```DateTuple``` is selected here.

The first positional input argument of the ```namedtuple``` factory function is ```typename```, which is typically the ```str``` of the ```class``` name, in this case ```'DateTuple'```.

The next input argument is ```field_names```, which is typically provided as a ```list``` of ```str``` instances corresponding to the field names. A single ```str``` with the field names separated by whitespace or commas is also accepted. The field names must be valid identifier names.
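As a quick check of the two forms of ```field_names```, the minimal sketch below creates three classes (the names ```DateTupleA```, ```DateTupleB``` and ```DateTupleC``` are hypothetical) and compares their ```_fields```:

```python
from collections import namedtuple

# field_names given as a list of str
DateTupleA = namedtuple('DateTupleA', ['day', 'month', 'year'])
# field_names given as a single str, separated by whitespace or by commas
DateTupleB = namedtuple('DateTupleB', 'day month year')
DateTupleC = namedtuple('DateTupleC', 'day, month, year')

print(DateTupleA._fields)                        # ('day', 'month', 'year')
print(DateTupleB._fields == DateTupleA._fields)  # True
print(DateTupleC._fields == DateTupleA._fields)  # True
```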

Now that the factory function has created ```DateTuple``` (a ```tuple``` subclass with named fields), the docstring of this subclass can be examined. Notice the initialisation signature has the field names:

In [14]:
DateTuple?

Init signature: DateTuple(day, month, year)
Docstring:      DateTuple(day, month, year)
Type:           type
Subclasses:

Notice that most of the identifiers are consistent with those of a ```tuple```:

In [15]:
dir2(DateTuple, tuple, consistent_only=True)

{'method': ['count', 'index'],
 'datamodel_attribute': ['__doc__'],
 'datamodel_method': ['__add__',
                      '__class__',
                      '__class_getitem__',
                      '__contains__',
                      '__delattr__',
                      '__dir__',
                      '__eq__',
                      '__format__',
                      '__ge__',
                      '__getattribute__',
                      '__getitem__',
                      '__getnewargs__',
                      '__getstate__',
                      '__gt__',
                      '__hash__',
                      '__init__',
                      '__init_subclass__',
                      '__iter__',
                      '__le__',
                      '__len__',
                      '__lt__',
                      '__mul__',
                      '__ne__',
                      '__new__',
                      '__reduce__',
                      '__reduce_ex__',
         

However, there are additional identifiers. Notice that each of the field names is present as an attribute. There are also a handful of identifiers that begin with an underscore. Normally, prefixing an identifier with an underscore indicates an internal identifier, but in the case of a named ```tuple```, the underscore is used to prevent these identifiers from clashing with the field names:

In [16]:
dir2(DateTuple, tuple, unique_only=True)

{'attribute': ['day', 'month', 'year'],
 'datamodel_attribute': ['__match_args__', '__module__', '__slots__'],
 'internal_attribute': ['_field_defaults', '_fields'],
 'internal_method': ['_asdict', '_make', '_replace']}


The method resolution order can be examined using:

In [17]:
DateTuple.mro()

[__main__.DateTuple, tuple, object]

Here ```DateTuple``` is seen to be a subclass of ```tuple```, which is in turn a subclass of ```object```. If ```help``` is used, only a handful of methods are seen to be defined in ```DateTuple```, meaning most of the methods are inherited from ```tuple``` and can therefore be used in an identical manner:

In [18]:
help(DateTuple)

Help on class DateTuple in module __main__:

class DateTuple(builtins.tuple)
 |  DateTuple(day, month, year)
 |
 |  DateTuple(day, month, year)
 |
 |  Method resolution order:
 |      DateTuple
 |      builtins.tuple
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  __getnewargs__(self)
 |      Return self as a plain tuple.  Used by copy and pickle.
 |
 |  __repr__(self)
 |      Return a nicely formatted representation string
 |
 |  _asdict(self)
 |      Return a new dict which maps field names to their values.
 |
 |  _replace(self, /, **kwds)
 |      Return a new DateTuple object replacing specified fields with new values
 |
 |  ----------------------------------------------------------------------
 |  Class methods defined here:
 |
 |  _make(iterable) from builtins.type
 |      Make a new DateTuple object from a sequence or iterable
 |
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |
 |  __new__(_cls, day, month,

The identifiers inherited from ```tuple``` behave identically and do not need to be revisited.

Three instances can be instantiated:

In [19]:
day1 = DateTuple(day=4, month=3, year=2024)
day2 = DateTuple(day=3, month=4, year=2024)
day3 = DateTuple(day=4, month=5, year=2024)

In [20]:
repr(day1)

'DateTuple(day=4, month=3, year=2024)'

In [21]:
type(day1)

__main__.DateTuple

In [22]:
variables()

|Instance Name|Type|Size/Shape|Value|
|---|---|---|---|
|date|dict|3|{'day': 4, 'month': 3, 'year': 2023}|
|day4|dict|3|{'d': 15, 'm': 3, 'y': 2023}|


In [23]:
isinstance(day1, tuple)

True

Returning to the namedtuple factory function, default values for the fields can be supplied using the keyword-only defaults input argument:

In [None]:
DateTuple = namedtuple('DateTuple', 
                       ('day', 'month', 'year'),
                       defaults=(14, 3, 2023))
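When fewer defaults than fields are supplied, the defaults are applied to the rightmost fields. A minimal sketch, using a hypothetical PartialDate class with a single default for year:

```python
from collections import namedtuple

# Only one default is supplied, so it applies to the rightmost field, 'year'
PartialDate = namedtuple('PartialDate', ['day', 'month', 'year'], defaults=(2023,))

print(PartialDate._field_defaults)  # {'year': 2023}
print(PartialDate(14, 3))           # PartialDate(day=14, month=3, year=2023)

# day and month have no defaults, so omitting them raises a TypeError
try:
    PartialDate()
except TypeError as error:
    print('TypeError:', error)
```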

DateTuple instances, that is archives with the fields day, month and year, can now be instantiated. Consistency in their format (UK date format) is maintained by following the docstring. Instances can be created using the default values, keyword arguments or positional arguments:

In [None]:
today = DateTuple()
today

In [None]:
tomorrow = DateTuple(day=15, month=3, year=2023)
tomorrow

In [None]:
yesterday = DateTuple(14, 3, 2023)
yesterday

The field names day, month and year are data descriptors that can be used to read off each named element:

In [None]:
today.day

In [None]:
today.month

In [None]:
today.year

The internal attribute _fields returns a tuple of the field names, and _field_defaults returns a dictionary where the keys are the field names and the values are the default values:

In [None]:
yesterday._fields

In [None]:
yesterday._field_defaults

_asdict is a method that returns a dictionary where the keys are the field names and the values are the corresponding values:

In [None]:
yesterday._asdict()

Note that attempting to cast a named tuple into a dictionary using the dict class raises a TypeError, because dict treats its input argument as an iterable of key-value pairs and the elements of DateTuple are integers rather than pairs.
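The sketch below reproduces the TypeError and shows two working alternatives; DateTuple is redefined here without defaults so that the cell is self-contained:

```python
from collections import namedtuple

DateTuple = namedtuple('DateTuple', ('day', 'month', 'year'))
yesterday = DateTuple(14, 3, 2023)

# dict() expects an iterable of (key, value) pairs; 14 is an int, not a pair
try:
    dict(yesterday)
except TypeError as error:
    print('TypeError:', error)

# _asdict pairs each field name with its value
print(yesterday._asdict())  # {'day': 14, 'month': 3, 'year': 2023}

# An equivalent manual construction pairs _fields with the values using zip
print(dict(zip(yesterday._fields, yesterday)))
```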

_replace creates a new instance with the specified fields replaced. In this example only the field year is replaced:

In [None]:
yesterday._replace(year=2024)

\_make is a class method that creates a new instance from an existing sequence or iterable, such as a tuple:

In [None]:
t1 = (17, 3, 2024)
DateTuple._make(t1)

Tuple and dict unpacking can also be used for this purpose:

In [None]:
t1 = (17, 3, 2024)
DateTuple(*t1)

In [None]:
d1 = {'day': 17, 'month': 3, 'year': 2024}
DateTuple(**d1)

The \_\_module\_\_ data model attribute will give the module that the DateTuple class template was defined in.

In [None]:
DateTuple.__module__

The \_\_match_args\_\_ data model attribute contains the same field names as the internal attribute _fields. It is used by structural pattern matching (the match statement, Python 3.10+) to map positional sub-patterns onto the field names:

In [None]:
DateTuple.__match_args__

In [None]:
yesterday._fields

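The data model identifier \_\_slots\_\_ is used for memory optimisation. For a named tuple it is an empty tuple, so instances do not get a per-instance \_\_dict\_\_ and take up no more memory than a regular tuple. It is not typically used by the end user as an attribute: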

In [None]:
DateTuple.__slots__

The formal string representation is updated to match the initialisation signature of DateTuple. Recall this is defined using the data model identifier \_\_repr\_\_, which controls the behaviour of the builtins function repr:

In [None]:
repr(today)

The formal and informal string representations match:

In [None]:
str(today)

The field names provided to the namedtuple factory function must, as mentioned, be valid Python identifiers; they also cannot be keywords or begin with an underscore. By default, rename is False, and therefore an invalid field name raises a ValueError.

If rename is set to True, the invalid names are renamed instead:

In [None]:
CustomNamedTuple = namedtuple('CustomNamedTuple', 
                              ('goodname', 'bad name', 'bad name 2', '3bad name'),
                              rename=True)

The bad names will be renamed using an underscore followed by the field index:

In [None]:
CustomNamedTuple._fields
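For comparison, the minimal sketch below uses the same field names with and without rename=True:

```python
from collections import namedtuple

fields = ('goodname', 'bad name', 'bad name 2', '3bad name')

# Without rename=True, the first invalid field name raises a ValueError
try:
    namedtuple('CustomNamedTuple', fields)
except ValueError as error:
    print('ValueError:', error)

# With rename=True, each invalid name is replaced by '_' followed by its index
CustomNamedTuple = namedtuple('CustomNamedTuple', fields, rename=True)
print(CustomNamedTuple._fields)  # ('goodname', '_1', '_2', '_3')
```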

## deque


The double-ended queue class deque was originally designed to be a builtins class and is therefore lowercase like tuple or list.

A list is optimised for operations at its end (the right) and has the methods append and extend; inserting or popping at its start requires every other reference to be shifted. The deque is list-like but, as the name suggests, is double-ended and optimised for fast appends and pops at both the front and the back. It can be imported using:

In [None]:
from collections import deque
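To illustrate operations at the front, a minimal sketch compares a list and a deque (the names queue_list and queue_deque are illustrative only):

```python
from collections import deque

# A list can add and remove at the front, but insert(0, x) and pop(0)
# shift every other reference, so they get slower as the list grows
queue_list = [1, 2, 3]
queue_list.insert(0, 0)
first_from_list = queue_list.pop(0)

# A deque has dedicated methods for the front
queue_deque = deque([1, 2, 3])
queue_deque.appendleft(0)
first_from_deque = queue_deque.popleft()

print(first_from_list, queue_list)    # 0 [1, 2, 3]
print(first_from_deque, queue_deque)  # 0 deque([1, 2, 3])
```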

The init signature of the deque class can be viewed:

In [None]:
? deque

The deque takes an iterable, such as a list or tuple, as its first input argument and has the optional input argument maxlen, which specifies the maximum length of the deque:

In [None]:
archive = (1, True, 3.14, 'hello', 'hello', 'bye')
duoactive = deque(archive, 9)
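A common use of maxlen is a buffer that keeps only the most recent items. A minimal sketch with a hypothetical buffer named recent:

```python
from collections import deque

# A deque with maxlen=3 keeps only the three most recent items
recent = deque(maxlen=3)
for reading in [10, 20, 30, 40, 50]:
    recent.append(reading)  # once full, the leftmost item is discarded

print(recent)         # deque([30, 40, 50], maxlen=3)
print(recent.maxlen)  # 3

# Without maxlen the deque is unbounded and maxlen is None
print(deque([1, 2, 3]).maxlen)  # None
```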

The formal string representation uses the \_\_repr\_\_ data model identifier and controls the behaviour of the builtins function repr:

In [None]:
repr(duoactive)

The formal and informal string representations are the same:

In [None]:
str(duoactive)

The identifiers of the deque class can be examined using:

In [None]:
print(dir(deque), end=' ')

Although deque inherits directly from object rather than from list, it behaves very similarly to a list and is a mutable collection of references:

In [None]:
deque.mro()

The only additional attribute is maxlen:

In [None]:
# Non-callable identifiers without a leading underscore that list lacks
for identifier in dir(deque):
    isfunction = callable(getattr(deque, identifier))
    isinlist = identifier in dir(list)
    isdatamodel = identifier[0] == '_'  # leading underscore
    if (not isfunction and not isdatamodel and not isinlist):
        print(identifier, end=' ')

In [None]:
# Non-callable identifiers with a leading underscore that list lacks
for identifier in dir(deque):
    isfunction = callable(getattr(deque, identifier))
    isinlist = identifier in dir(list)
    isdatamodel = identifier[0] == '_'  # leading underscore
    if (not isfunction and isdatamodel and not isinlist):
        print(identifier, end=' ')

And there are four additional methods:

In [None]:
# Callable identifiers without a leading underscore that list lacks
for identifier in dir(deque):
    isfunction = callable(getattr(deque, identifier))
    isinlist = identifier in dir(list)
    isdatamodel = identifier[0] == '_'  # leading underscore
    if (isfunction and not isdatamodel and not isinlist):
        print(identifier, end=' ')

In [None]:
# Callable identifiers with a leading underscore that list lacks
for identifier in dir(deque):
    isfunction = callable(getattr(deque, identifier))
    isinlist = identifier in dir(list)
    isdatamodel = identifier[0] == '_'  # leading underscore
    if (isfunction and isdatamodel and not isinlist):
        print(identifier, end=' ')

Three of these methods are left-hand counterparts of list methods, which operate on the right:

|left|right|
|---|---|
|appendleft|append|
|extendleft|extend|
|popleft|pop|

In [None]:
duoactive

In [None]:
duoactive.append('hi')

In [None]:
duoactive

In [None]:
duoactive.appendleft('howdy')

In [None]:
duoactive

The rotate method can be used to rotate the references in the deque instance one space to the right by default:

In [None]:
? duoactive.rotate

In [None]:
duoactive.rotate()

Notice that the reference at the end is now at the start:

In [None]:
duoactive

So far the maxlen of the deque has not been reached. The current length can be checked:

In [None]:
len(duoactive)

'world' can be appended to reach the maxlen:

In [None]:
duoactive.append('world')

In [None]:
duoactive

If 'earth' is also appended, recall the value is appended at the end, which is the right. Because the deque is already at its maxlen, appending this value shifts each reference one place to the left and discards 'hi', which was at index 0:

In [None]:
duoactive.append('earth')

In [None]:
duoactive

And if the value 'hi' is appended to the left, each reference shifts one place to the right and 'earth' is discarded from the right end:

In [None]:
duoactive.appendleft('hi')

In [None]:
duoactive

Recall that append adds a single reference at the right end, even if that item is a collection. appendleft likewise adds a single reference at the left end:

In [None]:
duoactive.appendleft(('bye1', 'bye2', 'bye3'))

In [None]:
duoactive

In contrast, extendleft adds each element of a collection as a separate reference, occupying multiple indexes:

In [None]:
duoactive.extendleft(('bye1', 'bye2', 'bye3'))

In [None]:
duoactive

Notice the order of the collection that was supplied as an input argument is reversed because extendleft was used.
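The reversal follows from extendleft adding each element to the left in turn. The minimal sketch below, on small hypothetical deques, compares it with repeated appendleft calls and shows one way to keep the original order:

```python
from collections import deque

items = ('bye1', 'bye2', 'bye3')

# extendleft adds each element on the left, one after another
d1 = deque(['hi'])
d1.extendleft(items)

# which is equivalent to calling appendleft once per element
d2 = deque(['hi'])
for item in items:
    d2.appendleft(item)

print(d1)        # deque(['bye3', 'bye2', 'bye1', 'hi'])
print(d1 == d2)  # True

# Reversing the input first keeps the original order at the front
d3 = deque(['hi'])
d3.extendleft(reversed(items))
print(d3)        # deque(['bye1', 'bye2', 'bye3', 'hi'])
```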

The first value can be popped using popleft:

In [None]:
duoactive.popleft()

In [None]:
duoactive

Notice that the length of the deque is now 8 (a nested collection occupies a single index) and the objects discarded earlier are not restored:

In [None]:
len(duoactive)

## defaultdict and OrderedDict

The defaultdict and OrderedDict are subclasses of the dict class. This can be seen by importing these collections and examining their method resolution order:

In [None]:
from collections import defaultdict, OrderedDict

In [None]:
defaultdict.mro()

In [None]:
OrderedDict.mro()

In earlier versions of Python the dict was unordered, as it was initially designed to be similar to a set, and the OrderedDict was added as a supplementary subclass of dict that instead maintained insertion order.

As insertion order is particularly useful for looping, the dict was updated to maintain insertion order (guaranteed by the language from Python 3.7), largely making the OrderedDict redundant:

In [None]:
mapping = {'r': 'red', 'g': 'green', 'b': 'blue'}

In [None]:
for key in mapping:
    print(key)

Although the dictionary is ordered, the set is still unordered. This can be seen below by comparing the ordered tuple to the unordered set:

In [None]:
tuple(mapping.keys())

In [None]:
set(mapping.keys())

Recall that it does not make sense to order a set. A set cannot contain duplicates, so when it is cast from an ordered collection there is no single obvious numeric index for each value; for example, should 'r' below take the first or the last index?

In [None]:
set(('r', 'r', 'g', 'b', 'r'))

Originally dict keys were more set like as they also have to be unique but having the insertion order is useful for looping.
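Although dict now maintains insertion order, OrderedDict still differs in a few ways: its equality test is order-sensitive, and it provides move_to_end and a popitem that can remove from either end. A minimal sketch:

```python
from collections import OrderedDict

a = {'r': 'red', 'g': 'green'}
b = {'g': 'green', 'r': 'red'}
print(a == b)  # True: dict equality ignores order

oa = OrderedDict(a)
ob = OrderedDict(b)
print(oa == ob)  # False: OrderedDict equality also compares order

# move_to_end moves an existing key to the end, or to the front with last=False
oa.move_to_end('r')
print(list(oa))  # ['g', 'r']
oa.move_to_end('r', last=False)
print(list(oa))  # ['r', 'g']

# popitem(last=False) removes the first item (first in, first out)
print(oa.popitem(last=False))  # ('r', 'red')
```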

The defaultdict is a subclass of dict that has a default_factory callable. When indexed with a key that doesn't exist in the current keys, the pair key: default_factory() is added and the new value is returned, instead of a KeyError being raised. The behaviour of a dict is shown first:


In [None]:
mapping = {'red': '#FF0000', 
           'green': '#00B050', 
           'blue': '#0070C0'}

In [None]:
mapping['red']

Indexing a key that is not among the keys, such as 'yellow', raises a KeyError.
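A minimal sketch of this KeyError, caught with try/except so the cell completes, alongside dict.get, which returns a fallback value instead:

```python
mapping = {'red': '#FF0000',
           'green': '#00B050',
           'blue': '#0070C0'}

# Indexing a missing key raises a KeyError
try:
    mapping['yellow']
except KeyError as error:
    print('KeyError:', error)  # KeyError: 'yellow'

# dict.get returns a fallback instead, without adding the key
print(mapping.get('yellow', '#000000'))  # #000000
print('yellow' in mapping)               # False
```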

Instead of instantiating the dict with items, it is possible to instantiate an empty dict and then add items to it:

In [None]:
mapping = {}
mapping['red'] = '#FF0000'
mapping['green'] = '#00B050'
mapping['blue'] = '#0070C0'

In [None]:
mapping

This can be used to examine the workflow of a defaultdict:

In [None]:
? defaultdict

The defaultdict has a single input argument default_factory. This is followed by a / in the signature, so it must be provided positionally; any further input arguments are passed on to the dict constructor. default_factory takes a callable that requires no input arguments, for example the str class:

In [None]:
mapping = defaultdict(str)

If the directory function dir is used, all the identifiers are identical to those of the dict class, with the exception of the identifier default_factory and the data model identifiers \_\_missing\_\_ and \_\_copy\_\_:

In [None]:
for identifier in dir(defaultdict):
    isindict = identifier in dir(dict)
    if (not isindict):
        print(identifier, end=' ')

When a missing key is indexed, dict looks for a \_\_missing\_\_ data model method. The dict class itself does not define \_\_missing\_\_, so a KeyError is raised. In the defaultdict, \_\_missing\_\_ calls the provided default_factory callable, inserts the returned value under the missing key and returns it, generating a new key: value pair.
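The mechanism can be sketched with a hypothetical dict subclass, DefaultingDict, that defines \_\_missing\_\_ itself. This is only an illustration of the hook, not the actual defaultdict implementation, which is written in C:

```python
class DefaultingDict(dict):
    """Hypothetical dict subclass illustrating the __missing__ hook."""

    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory

    def __missing__(self, key):
        # Called by dict indexing only when key is absent
        value = self.factory()
        self[key] = value  # store the new value, as defaultdict does
        return value

mapping = DefaultingDict(str, {'red': '#FF0000'})
print(repr(mapping['yellow']))  # ''
print(mapping)                  # {'red': '#FF0000', 'yellow': ''}

# dict.get does not call __missing__, so nothing is added here
print(mapping.get('blue'))      # None
```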

Each key is added and assigned to a value as before:


In [None]:
mapping['red'] = '#FF0000'
mapping['green'] = '#00B050'
mapping['blue'] = '#0070C0'

In [None]:
mapping

When a key is indexed that does not exist, the default_factory callable is used. For example: 


In [None]:
mapping['yellow']

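The formal string representation can be examined. Notice it has the default_factory as the first input argument and a dictionary of the existing items as the second:

In [None]:
repr(mapping)

A defaultdict can likewise be instantiated with existing items by supplying a mapping after the default_factory: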

In [None]:
mapping = defaultdict(str, {'red': '#FF0000',
                            'green': '#00B050',
                            'blue': '#0070C0'})

In this example the default_factory was str which returned an empty string:

In [None]:
mapping['yellow']

Another example is to use list as the default_factory, which returns an empty list:

In [None]:
mapping = defaultdict(list)

Each key is added and assigned to a value:

In [None]:
mapping['a'] = ['apples', 'apricots', 'avocado']

In [None]:
mapping['b'] = ['bananas', 'beetroot']

When a key is indexed that does not exist, the default_factory callable is used giving an empty list:

In [None]:
mapping['c']

In [None]:
mapping

Because the missing key is assigned a new empty list, list methods can be called on it directly:

In [None]:
mapping['d'].append('dragonfruit')

In [None]:
mapping
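A common application of defaultdict(list) is grouping. A minimal sketch that groups a hypothetical list of words by their first letter:

```python
from collections import defaultdict

words = ['apples', 'bananas', 'apricots', 'beetroot', 'avocado', 'dragonfruit']

# A letter seen for the first time starts with an empty list
groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)

print(dict(groups))
# {'a': ['apples', 'apricots', 'avocado'], 'b': ['bananas', 'beetroot'], 'd': ['dragonfruit']}
```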

The default_factory can be assigned to an anonymous function using a lambda expression. Recall the general form is ```lambda parameters: return_value```.

The default_factory has to be callable without any input arguments, therefore the form ```lambda: return_value``` is used.

And recall default_factory is positional-only, so it has to be supplied first, as in ```defaultdict(lambda: return_value)```.

A hexadecimal colour has the form #RRGGBB. A sensible default is to zero these values, giving #000000 (black), which can be supplied as the return value of a lambda expression:

In [None]:
mapping = defaultdict(lambda: '#000000', {'red': '#FF0000',
                                          'green': '#00B050',
                                          'blue': '#0070C0'})

When a key that exists is indexed, the value is returned:

In [None]:
mapping['red']

When a key is indexed that does not exist, the default_factory callable is used which returns the default value from the lambda expression which in this case is '#000000':

In [None]:
mapping['yellow']

In [None]:
mapping

The defaultdict inherits all the identifiers from its parent dict class, including setdefault. In this context, setdefault supplies a one-off default value for a single key and does not change the default_factory:

In [None]:
mapping.setdefault('white', '#ffffff')

In [None]:
mapping

In [None]:
mapping['black']

In [None]:
mapping

Note that the formal representation of a defaultdict whose default_factory is a lambda expression does not show the body of the lambda expression but merely indicates there is one:

In [None]:
repr(mapping)

## Counter

In Python it is quite common to count the number of occurrences in an iterable. For example:

In [None]:
text = 'hello world!'

Conventionally this is done by first casting text into a set, recalling that a set can only contain unique values:

In [None]:
unique = set(text)

In [None]:
unique

An initial value of 0 is used:

In [None]:
value = 0

And a dictionary is instantiated using the alternative constructor fromkeys:

In [None]:
frequency = dict.fromkeys(unique, value)

In [None]:
frequency

The frequency of each letter in the word can be counted using a for loop:

In [None]:
for letter in text:
    frequency[letter] += 1

Notice all of the keys are letters and all of the values are integers:

In [None]:
frequency

This can be obtained more conveniently using a Counter:

In [None]:
from collections import Counter

The initialisation signature of the Counter subclass can be examined:

In [None]:
? Counter

And the example above can be simplified to:

In [None]:
text = 'hello world!'
frequency = Counter(text)

In [None]:
frequency

The list of identifiers from the Counter subclass can be seen by using:

In [None]:
print(dir(Counter), end= ' ')

And the additional identifiers not in the dict parent class can be examined using:

In [None]:
for identifier in dir(Counter):
    isindict = identifier in dir(dict)
    if (not isindict):
        print(identifier, end=' ')

In [None]:
frequency

most_common returns a list of tuples ordered from the most to the least common, similar in form to the output of the dictionary method items cast to a list:

In [None]:
frequency.most_common()

It has an optional positional input argument that specifies the number of tuple pairs to return in the list:

In [None]:
frequency.most_common(2)

The total method (Python 3.10+) returns the sum of all the counts in the Counter, which in this example should equal the length of the string used to make the Counter:

In [None]:
frequency.total()

In [None]:
len(text)

The elements method returns an iterator over the elements, repeating each one as many times as its count (elements with a count less than 1 are ignored):

In [None]:
frequency.elements()

In [None]:
forward = frequency.elements()

The elements are returned in the order they were first encountered, with repeated letters placed beside one another:

In [None]:
tuple(forward)
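To illustrate how counts less than 1 are ignored, a minimal sketch with a Counter built from keyword arguments:

```python
from collections import Counter

c = Counter(a=2, b=0, c=-1, d=1)
# b (count 0) and c (count -1) produce no elements
print(list(c.elements()))  # ['a', 'a', 'd']

# sorted() can rebuild a sorted sequence from the counts
print(''.join(sorted(Counter('hello').elements())))  # ehllo
```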

The method subtract can be used to subtract values from a Counter using another iterable such as a second string:

In [None]:
frequency

In [None]:
frequency.subtract('hello')

In [None]:
frequency

If subtract is used a second time, some counts become zero or negative:

In [None]:
frequency.subtract('hello')

In [None]:
frequency

The unary data model identifiers \_\_pos\_\_ and \_\_neg\_\_ are defined. Unary + returns only the positive counts, while unary - returns only the negative counts, with their signs flipped to positive:

In [None]:
+frequency

In [None]:
-frequency

_keep_positive is an internal, mutating counterpart of \_\_pos\_\_: it removes all zero and negative counts from the instance in place. It also returns the instance itself, which is why a value is displayed:

In [None]:
frequency._keep_positive()

In [None]:
frequency

The binary data model operators \_\_add\_\_, \_\_sub\_\_ and \_\_and\_\_ are also defined:

In [None]:
frequency1 = Counter('hello')

In [None]:
frequency2 = Counter('world')

Counter addition and subtraction add and subtract the counts but discard any result that is zero or negative, so only positive counts are returned:

In [None]:
frequency1 + frequency2

In [None]:
frequency1 - frequency2
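The difference between the - operator and the subtract method can be seen in a small sketch with hypothetical counts:

```python
from collections import Counter

a = Counter(x=3, y=1)
b = Counter(x=1, y=2)

# The - operator returns a new Counter and drops zero or negative results
print(a - b)  # Counter({'x': 2})

# subtract() mutates a in place and keeps zero and negative counts
a.subtract(b)
print(a)      # Counter({'x': 2, 'y': -1})
```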

In [None]:
frequency1 & frequency2

In [None]:
frequency1.subtract('hello')

In [None]:
frequency1.subtract('hello')

In [None]:
frequency1 

In [None]:
frequency1 + frequency2

In [None]:
frequency2

In [None]:
frequency2 - frequency1

Counter & (\_\_and\_\_) returns, for each element present in both instances, the minimum of the two counts:

In [None]:
Counter('oll') & Counter('lll')
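Alongside & (minimum), Counter also defines | (\_\_or\_\_), which takes the maximum of each count. A minimal sketch:

```python
from collections import Counter

a = Counter('oll')  # Counter({'l': 2, 'o': 1})
b = Counter('lll')  # Counter({'l': 3})

print(a & b)  # Counter({'l': 2}): minimum of each count
print(a | b)  # Counter({'l': 3, 'o': 1}): maximum of each count
```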

\_\_add\_\_, \_\_sub\_\_ and \_\_and\_\_ all return a new instance, leaving the operands unchanged. They have in-place counterparts \_\_iadd\_\_, \_\_isub\_\_ and \_\_iand\_\_, used by +=, -= and &=, which mutate the instance in place and also discard non-positive counts:

In [None]:
frequency1

In [None]:
frequency2

In [None]:
frequency1 -= frequency2

In [None]:
frequency1

These binary data model methods should not be confused with the binary operators of an integer, which apply when an individual integer count is retrieved using its key:

In [None]:
frequency1['h'] += 5

In [None]:
frequency1

In [None]:
frequency1['h'] = 7

In [None]:
frequency1

The data model method \_\_missing\_\_ is defined for the Counter. When a missing letter is indexed, it is assumed to have a count of 0 (unlike a defaultdict, the key is not added):

In [None]:
frequency1['z']
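The difference from a defaultdict can be seen in a minimal sketch:

```python
from collections import Counter, defaultdict

counts = Counter('hello')
print(counts['z'])     # 0
print('z' in counts)   # False: Counter.__missing__ does not insert the key

tallies = defaultdict(int)
print(tallies['z'])    # 0
print('z' in tallies)  # True: defaultdict.__missing__ inserts the key
```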

Because Counter is a subclass of dict, the dict class can be used to cast the Counter to a plain dictionary:

In [None]:
dict(frequency1)

The data model attribute \_\_module\_\_ returns the module that the Counter class is defined in, which is collections:

In [None]:
frequency1.__module__

A Counter can be used for another collection such as a tuple:

In [None]:
archive = ('hello', 
           'hello', 
           'world', 
           'world', 
           'world', 
           'earth', 
           1, 
           3.14, 
           True)

In [None]:
Counter(archive)
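Notice that 1 and True in archive share a single count. Because True == 1 and both hash to the same value, a dictionary treats them as the same key, and the key inserted first, 1, is kept. A minimal sketch:

```python
from collections import Counter

# True == 1 and hash(True) == hash(1), so they are the same dictionary key
print(True == 1, hash(True) == hash(1))  # True True
print(Counter((1, True, 3.14)))          # Counter({1: 2, 3.14: 1})
```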

A Counter can also be initialised from a dictionary of keys and integer values:

In [None]:
Counter({'a': 5, 'b': 7, 'c': 14})

Or by using keyword input arguments assigned to integers:

In [None]:
Counter(a=5, b=7, c=14)

## ChainMap

A ChainMap is used to chain dictionaries together. It can be imported using:

In [None]:
from collections import ChainMap

The identifiers of the ChainMap can be examined:

In [None]:
print(dir(ChainMap), end=' ')

Its method resolution order can be examined using:

In [None]:
ChainMap.mro()

The ChainMap is a MutableMapping like a dict, and most of its identifiers are consistent with those found in a dict:

In [None]:
for identifier in dir(ChainMap):
    isindict = identifier in dir(dict)
    if (not isindict):
        print(identifier, end=' ')

The docstring of the ChainMap initialisation signature can be examined:

In [None]:
? ChainMap

*maps accepts a variable number of mappings as positional input arguments. If none are supplied, a single empty dict is used.

A common use case is combining a dictionary with default options:

In [None]:
default = {'textcolor': '#000000', 
           'font': 'Times New Roman', 
           'fontsize': 12}

With one for user preferences:

In [None]:
settings = {'textcolor': '#FF0000'}

A ChainMap is essentially a dict-like data structure that takes each setting from settings where available and, when not available, takes it from default:

In [None]:
config = ChainMap(settings, default)

settings will be the primary dict where custom preferences have been specified and default will be the secondary dict where default preferences are specified:

In [None]:
config

If a key is indexed, the value from the first mapping that contains the key is returned. In this case 'textcolor' is defined in both settings and default; however, since settings takes precedence over default, the value from the settings dict is provided:

In [None]:
config['textcolor']

In this case, 'font' is only provided in default and therefore this value is taken:

In [None]:
config['font']
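The search order can be sketched with a hypothetical helper, lookup, that checks each mapping in turn and returns the first match, mirroring the documented ChainMap lookup behaviour:

```python
from collections import ChainMap

default = {'textcolor': '#000000', 'font': 'Times New Roman', 'fontsize': 12}
settings = {'textcolor': '#FF0000'}
config = ChainMap(settings, default)

def lookup(maps, key):
    """Hypothetical helper mimicking ChainMap's search order."""
    for mapping in maps:
        if key in mapping:
            return mapping[key]
    raise KeyError(key)

print(config['textcolor'], lookup([settings, default], 'textcolor'))  # #FF0000 #FF0000
print(config['font'], lookup([settings, default], 'font'))  # Times New Roman Times New Roman
```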

The ChainMap instance config holds references to the two original dictionaries rather than copies. If a value in settings is changed, the change is seen in config:

In [None]:
settings['fontsize'] = 72

In [None]:
config

Likewise, if a value is assigned in config, it is written to settings, because writes, updates and deletions only ever affect the first mapping:

In [None]:
config['fontstyle'] = 'italic'

In [None]:
settings
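A minimal sketch of this rule, on fresh hypothetical dictionaries, also shows that deletion only searches the first mapping:

```python
from collections import ChainMap

default = {'textcolor': '#000000', 'font': 'Times New Roman'}
settings = {'textcolor': '#FF0000'}
config = ChainMap(settings, default)

# Assignment always targets the first mapping (maps[0])
config['font'] = 'Arial'
print(settings)  # {'textcolor': '#FF0000', 'font': 'Arial'}
print(default)   # {'textcolor': '#000000', 'font': 'Times New Roman'}

# Deletion also only looks in the first mapping
del config['textcolor']
print(config['textcolor'])  # #000000, now found in default
try:
    del config['textcolor']  # 'textcolor' is no longer in settings
except KeyError as error:
    print('KeyError:', error)
```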

The ChainMap instance config has keys, values and items methods. The formal representation of each of these views has been customised:

In [None]:
config.keys()

In [None]:
config.values()

In [None]:
config.items()

However, when used in a for loop, they behave identically to their counterparts in the dict class:

In [None]:
for key in config.keys():
    print(f'{key}: {config[key]}')

The attribute parents returns a new ChainMap containing all the maps except the first, that is the parent maps. The related attribute maps returns a list of all the maps in search order:

In [None]:
config.parents

In [None]:
config.maps
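A minimal sketch showing that parents is built from the same dictionary objects rather than copies:

```python
from collections import ChainMap

settings = {'textcolor': '#FF0000'}
default = {'textcolor': '#000000', 'font': 'Times New Roman'}
config = ChainMap(settings, default)

# parents is a new ChainMap of every map except the first
print(config.parents)                     # ChainMap({'textcolor': '#000000', 'font': 'Times New Roman'})
print(config.parents.maps[0] is default)  # True: the same dict object
print(config.parents['textcolor'])        # #000000
```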

The method new_child creates a new ChainMap instance with the map supplied as an input argument as the primary map, followed by all the maps of the current instance. For example, a company policy can be added:

In [None]:
company_policy = {'logo': 'anaconda'}

In [None]:
config2 = config.new_child(company_policy)

In [None]:
config2

In [None]:
company_policy['border'] = 5

In [None]:
config2
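As a sketch of the two forms of new_child on hypothetical mappings: the original ChainMap is left unchanged, and calling new_child with no argument puts an empty dict in front, which is useful for a temporary scope:

```python
from collections import ChainMap

default = {'font': 'Times New Roman'}
config = ChainMap({'textcolor': '#FF0000'}, default)

company_policy = {'logo': 'anaconda'}
config2 = config.new_child(company_policy)

print(config2.maps[0] is company_policy)    # True: searched first
print(len(config2.maps), len(config.maps))  # 3 2: config is unchanged

# With no argument, new_child adds an empty dict as the first map
scope = config.new_child()
scope['font'] = 'Arial'               # written to the new empty dict
print(scope['font'], config['font'])  # Arial Times New Roman
```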

## UserString, UserList, UserDict

UserString, UserList and UserDict behave similarly to the str, list and dict classes in builtins. Their main purpose is to simplify custom subclassing, which will be discussed in a subsequent tutorial.