<h1>collections</h1>

| Type | Description |
|:--|:--|
|namedtuple() | factory function for creating tuple subclasses with named fields|
|deque|list-like container with fast appends and pops on either end|
|ChainMap|dict-like class for creating a single view of multiple mappings|
|Counter|dict subclass for counting hashable objects|
|OrderedDict|dict subclass that remembers the order entries were added|
|defaultdict|dict subclass that calls a factory function to supply missing values|
|UserDict|wrapper around dictionary objects for easier dict subclassing|
|UserList|wrapper around list objects for easier list subclassing|
|UserString|wrapper around string objects for easier string subclassing|


<a href="https://docs.python.org/3/library/collections.html#:~:text=defaultdict%20is%20a%20subclass%20of,attribute%3B%20it%20defaults%20to%20None%20">Source</a>

## ChainMap objects
A ChainMap class is provided for quickly linking a number of mappings so they can be treated as a single unit. It is often much faster than creating a new dictionary and running multiple update() calls.

In [1]:
# %load command1.py
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity='all'

%config InlineBackend.figure_format='svg'
plt.rcParams['figure.dpi']=120

pd.options.display.float_format='{:,.2f}'.format
pd.set_option('display.max_colwidth', None)


In [2]:
from collections import ChainMap
baseline = {'music': 'bach', 'art': 'rembrandt'}
adjustments = {'art': 'van gogh', 'opera': 'carmen'}
list(ChainMap(adjustments, baseline))

combined = baseline.copy()
combined.update(adjustments)
list(combined)


['music', 'art', 'opera']

['music', 'art', 'opera']
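
The cell above only compares the keys. As a minimal sketch of how lookups and writes behave (the variable names `chain`, `art`, `music`, and `music_after` are illustrative, not from the original), a `ChainMap` searches its mappings from left to right and remains a live view of them:

```python
from collections import ChainMap

baseline = {'music': 'bach', 'art': 'rembrandt'}
adjustments = {'art': 'van gogh', 'opera': 'carmen'}

chain = ChainMap(adjustments, baseline)

# Lookups search the mappings left to right, so adjustments wins for 'art'.
art = chain['art']
music = chain['music']          # found only in baseline

# The chain is a view, not a copy: later changes to baseline show through.
baseline['music'] = 'mozart'
music_after = chain['music']

# Writes go to the first mapping only (adjustments here).
chain['dance'] = 'tango'

print(art, music, music_after)
print('dance' in adjustments, 'dance' in baseline)
```

This is the main practical difference from the `copy()` plus `update()` approach, which produces an independent snapshot.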

## Counter objects
A counter tool is provided to support convenient and rapid tallies.

In [3]:
from collections import Counter

# Tally occurrences of words in a list
li=['red','blue', 'red', 'green', 'blue', 'blue']
cnt = Counter()  # start with an empty counter
for word in li:
    cnt[word] += 1
cnt  # Counter is a dict subclass

cnt=Counter(li)
cnt


count={}
for item in li:
    if item in count:
        count[item]+=1
    else:
        count[item]=1   
count

# Find the ten most common words in Hamlet
import re
with open('hamlet.txt') as f:  # close the file once it has been read
    words = re.findall(r'\w+', f.read().lower())
Counter(words).most_common(10)

Counter({'red': 2, 'blue': 3, 'green': 1})

Counter({'red': 2, 'blue': 3, 'green': 1})

{'red': 2, 'blue': 3, 'green': 1}

[('the', 1091),
 ('and', 969),
 ('to', 767),
 ('of', 675),
 ('i', 633),
 ('a', 571),
 ('you', 558),
 ('my', 520),
 ('in', 451),
 ('it', 421)]

```python
class collections.Counter([iterable-or-mapping])
```

In [4]:
c = Counter()                           # a new, empty counter
c
c = Counter('gallahad')                 # a new counter from an iterable
c
c = Counter({'red': 4, 'blue': 2})      # a new counter from a mapping
c
c = Counter(cats=4, dogs=8)             # a new counter from keyword args
c

Counter()

Counter({'g': 1, 'a': 3, 'l': 2, 'h': 1, 'd': 1})

Counter({'red': 4, 'blue': 2})

Counter({'cats': 4, 'dogs': 8})

In [5]:
c = Counter(['eggs', 'ham'])
c['bacon']                              # count of a missing element is zero
c

c['sausage'] = 0                        # counter entry with a zero count
c
del c['sausage']                        # del actually removes the entry
c

0

Counter({'eggs': 1, 'ham': 1})

Counter({'eggs': 1, 'ham': 1, 'sausage': 0})

Counter({'eggs': 1, 'ham': 1})

In [6]:
# elements
c = Counter(a=4, b=2, c=0, d=-2)
sorted(c.elements())

# most_common
Counter('abracadabra').most_common(3)

# subtract
c = Counter(a=4, b=2, c=0, d=-2)
d = Counter(a=1, b=2, c=3, d=4)
c.subtract(d)
c

# total
c = Counter(a=10, b=5, c=0)
# c.total()  # sum of all counts; requires Python 3.10+

['a', 'a', 'a', 'a', 'b', 'b']

[('a', 5), ('b', 2), ('r', 2)]

Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})

In [7]:
# c.total()                       # total of all counts (Python 3.10+)
c.clear()                       # reset all counts
list(c)                         # list unique elements
set(c)                          # convert to a set
dict(c)                         # convert to a regular dictionary
c.items()                       # view of (elem, cnt) pairs
#Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
#c.most_common()[:-n-1:-1]       # n least common elements
+c                              # remove zero and negative counts

[]

set()

{}

dict_items([])

Counter()
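
Two of the idioms in the cell above are commented out, and the rest run on a cleared counter, so their effect is not visible. The following sketch runs them on a small, made-up `list_of_pairs` (hypothetical data, as is the choice of `n = 2`):

```python
from collections import Counter

# Hypothetical (elem, cnt) pairs used only for illustration.
list_of_pairs = [('a', 3), ('b', 1), ('c', 2), ('d', 0)]

c = Counter(dict(list_of_pairs))          # convert from a list of pairs

n = 2
least_common = c.most_common()[:-n-1:-1]  # n least common elements, rarest first

positive = +c                             # drops 'd', whose count is zero
total = sum(c.values())                   # works on any version; c.total() needs 3.10+

print(least_common)
print(positive)
print(total)
```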

In [8]:
c = Counter(a=3, b=1)
d = Counter(a=1, b=2)
c + d                       # add two counters together:  c[x] + d[x]

c - d                       # subtract (keeping only positive counts)

c & d                       # intersection:  min(c[x], d[x])

c | d                       # union:  max(c[x], d[x])

c == d                      # equality:  c[x] == d[x]

# c <= d                      # inclusion:  c[x] <= d[x]  (Python 3.10+)

Counter({'a': 4, 'b': 3})

Counter({'a': 2})

Counter({'a': 1, 'b': 1})

Counter({'a': 3, 'b': 2})

False
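
The `-` operator and the `subtract()` method shown earlier differ in how they treat results that fall to zero or below. A short sketch of the contrast, reusing the counters from the cell above (the names `diff` and `e` are illustrative):

```python
from collections import Counter

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)

# The operator returns a new Counter and keeps only positive counts.
diff = c - d

# subtract() works in place and keeps zero and negative counts.
e = Counter(a=3, b=1)
e.subtract(d)

print(diff)
print(e)
```

The operator form suits multiset arithmetic, while `subtract()` is useful when negative balances carry meaning.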

## defaultdict & setdefault
**Using list as the default_factory, it is easy to group a sequence of key-value pairs into a dictionary of lists:**

In [9]:
from collections import defaultdict

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

**grouping using `defaultdict`**

- The `default_factory` (here `list`) determines the value created for each missing key.

In [10]:
d = defaultdict(list)

for k, v in s:
    d[k].append(v)

sorted(d.items())

[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

- When each key is encountered for the first time, it is not already in the mapping; so an entry is automatically created using the default_factory function which returns an empty list. 
- The list.append() operation then attaches the value to the new list. When keys are encountered again, the look-up proceeds normally (returning the list for that key) and the list.append() operation adds another value to the list. This technique is simpler and faster than an equivalent technique using dict.setdefault():

**grouping using `setdefault`**

In [11]:
d = {}
for k, v in s:
    d.setdefault(k, []).append(v)

sorted(d.items())

[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
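
The automatic entry creation described above happens only on indexing (`d[key]`). As a small sketch, with illustrative variable names, `get()` and `in` leave the mapping untouched:

```python
from collections import defaultdict

d = defaultdict(list)

# get() and "in" do not call default_factory, so no entry is created.
missing = d.get('blue')
present = 'blue' in d
size_before = len(d)

# Indexing a missing key calls default_factory and stores the result.
d['blue'].append(2)
size_after = len(d)

print(missing, present, size_before, size_after)
```

This matters when a `defaultdict` is only being inspected: a stray `d[key]` lookup adds an empty entry as a side effect.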

**Setting the default_factory to int makes the defaultdict useful for counting (like a bag or multiset in other languages):**

In [12]:
# set default value to int

s='mississippi'
d=defaultdict(int)

for k in s:
    d[k]+=1
    
sorted(d.items())


[('i', 4), ('m', 1), ('p', 2), ('s', 4)]

When a letter is first encountered, it is missing from the mapping, so the default_factory function calls int() to supply a default count of zero. The increment operation then builds up the count for each letter. <br>
The function int() which always returns zero is just a special case of constant functions. A faster and more flexible way to create constant functions is to use a lambda function which can supply any constant value (not just zero):

In [13]:
def constant_factory(value):
    return lambda: value

d=defaultdict(constant_factory('<missing>'))
d.update(name='John', action='ran')
d


print('%(name)s %(action)s to %(object)s' % d)

print(f"{d['name']} {d['action']} to {d['object']}")

defaultdict(<function __main__.constant_factory.<locals>.<lambda>()>,
            {'name': 'John', 'action': 'ran'})

John ran to <missing>
John ran to <missing>
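
One caveat with the constant-function approach: the lambda returns the same object every time. That is harmless for immutable values such as strings, but with a mutable value every missing key ends up sharing one object. The sketch below contrasts it with `defaultdict(list)`; the names `shared` and `fresh` are illustrative.

```python
from collections import defaultdict

def constant_factory(value):
    return lambda: value

# Every missing key receives the very same list object.
shared = defaultdict(constant_factory([]))
shared['a'].append(1)
shared['b'].append(2)

# list() is called anew for each missing key, so the values are independent.
fresh = defaultdict(list)
fresh['a'].append(1)
fresh['b'].append(2)

print(shared['a'], shared['a'] is shared['b'])
print(fresh['a'], fresh['b'])
```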


**Setting the default_factory to set makes the defaultdict useful for building a dictionary of sets:**

In [14]:
# set default value to set

s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
d = defaultdict(set)

for k, v in s:
    d[k].add(v)

sorted(d.items())

e={}
for k, v in s:
    e.setdefault(k, set()).add(v)
    
sorted(e.items())

[('blue', {2, 4}), ('red', {1, 3})]

[('blue', {2, 4}), ('red', {1, 3})]

In [15]:
from collections import defaultdict

def def_value():
    return "Not present"

d=defaultdict(def_value)

d['a']=1
d['b']=2

d

defaultdict(<function __main__.def_value()>, {'a': 1, 'b': 2})

In [16]:
print(d['c'])

Not present


In [17]:
d=defaultdict(lambda:'Not present')
d['a']=1
d['b']=2

print(d['c'])

Not present


In [18]:
from collections import defaultdict

d=defaultdict(list)

for i in range(5):
    d[i].append(i)
print(d)

defaultdict(<class 'list'>, {0: [0], 1: [1], 2: [2], 3: [3], 4: [4]})


In [19]:
from collections import defaultdict

d=defaultdict(int)

L=[1,2,3,4,2,3,4,2]

for i in L:
    d[i]+=1
    
print(d)

defaultdict(<class 'int'>, {1: 1, 2: 3, 3: 2, 4: 2})


In [20]:
dict1={'A':'Geeks', 'B':'For'}

val=dict1.setdefault('A')
print(val)
print(dict1)
val=dict1.setdefault('C')
print(val)
print(dict1)
val=dict1.setdefault('D', 'Geeks')
print(val)
print(dict1)

Geeks
{'A': 'Geeks', 'B': 'For'}
None
{'A': 'Geeks', 'B': 'For', 'C': None}
Geeks
{'A': 'Geeks', 'B': 'For', 'C': None, 'D': 'Geeks'}
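
As the output shows, `setdefault` returns the existing value when the key is present and otherwise inserts and returns the default (`None` if none is given). One further difference from `defaultdict` is that the default argument is evaluated on every call, even when it is not needed. A sketch using a hypothetical helper, `make_list`, that records each time it runs:

```python
from collections import defaultdict

calls = []

def make_list():
    # Hypothetical helper that records each time a default is built.
    calls.append(1)
    return []

plain = {}
plain.setdefault('k', make_list()).append(1)
plain.setdefault('k', make_list()).append(2)   # default built again, then discarded
setdefault_calls = len(calls)

calls.clear()
dd = defaultdict(make_list)
dd['k'].append(1)
dd['k'].append(2)                              # key exists, factory not called
factory_calls = len(calls)

print(setdefault_calls, factory_calls)
```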


## namedtuple() Factory Function for Tuples with Named Fields

In [21]:
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(11, y=22)     # instantiate with positional or keyword arguments
p[0] + p[1]             # indexable like the plain tuple (11, 22)
x, y = p                # unpack like a regular tuple
x, y
p.x + p.y               # fields also accessible by name
p                       # readable __repr__ with a name=value style

33

(11, 22)

33

Point(x=11, y=22)

In [22]:
EmployeeRecord=namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

import csv

with open('employees.csv', 'r') as file:
    reader=csv.reader(file)
    for line in reader:
        print(line)
print()

with open('employees.csv', 'r') as file:  # ensure the file is closed afterwards
    for emp in map(EmployeeRecord._make, csv.reader(file)):
        print(emp.name, emp.title)

['kason', ' 35', ' CEO', ' sales', ' A']
['cloe', ' 32', ' Manager', ' ad', ' B']

kason  CEO
cloe  Manager
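
The leading spaces in the fields above (and the double space in `kason  CEO`) come from the spaces after each comma in the file. A possible fix, sketched here with an in-memory stand-in for `employees.csv` whose contents are an assumption reconstructed from the printed rows, is the `skipinitialspace` option of `csv.reader`:

```python
import csv
import io
from collections import namedtuple

EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

# Assumed file contents, reconstructed from the rows printed above.
data = io.StringIO("kason, 35, CEO, sales, A\ncloe, 32, Manager, ad, B\n")

# skipinitialspace=True drops whitespace that directly follows a delimiter.
reader = csv.reader(data, skipinitialspace=True)
employees = [EmployeeRecord._make(row) for row in reader]

for emp in employees:
    print(emp.name, emp.title)
```

Note that every field is still a string; `age` would need an explicit `int()` conversion if numeric values are required.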


**classmethod `somenamedtuple._make(iterable)` <br>
Class method that makes a new instance from an existing sequence or iterable.**

In [23]:
t=[11, 12]

t1=Point._make(t)
t1

Point(x=11, y=12)

**`somenamedtuple._asdict()` <br>
Return a new dict which maps field names to their corresponding values:**

In [24]:
p=Point(x=11, y=22)
p._asdict()

{'x': 11, 'y': 22}

**`somenamedtuple._replace(**kwargs)` <br>
Return a new instance of the named tuple replacing specified fields with new values:**

In [25]:
p = Point(x=11, y=22)
p._replace(x=33)


# for partnum, record in inventory.items():
#     inventory[partnum] = record._replace(price=newprices[partnum], timestamp=time.now())

Point(x=33, y=22)
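
Because named tuples are immutable, `_replace` never modifies the instance it is called on. A brief sketch (the names `q` and `assigned` are illustrative) showing that the original is unchanged and that direct attribute assignment fails:

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(x=11, y=22)

q = p._replace(x=33)      # a new Point; p is left as it was

try:
    p.x = 33              # fields cannot be assigned in place
    assigned = True
except AttributeError:
    assigned = False

print(p, q, assigned)
```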

**`somenamedtuple._fields` <br>
Tuple of strings listing the field names. Useful for introspection and for creating new named tuple types from existing named tuples.**

In [26]:
p._fields            # view the field names

Color = namedtuple('Color', 'red green blue')
Pixel = namedtuple('Pixel', Point._fields + Color._fields)
Pixel(11, 22, 128, 255, 0)

('x', 'y')

Pixel(x=11, y=22, red=128, green=255, blue=0)

In [27]:
getattr(p, 'x')

11

In [28]:
d = {'x': 11, 'y': 22}
Point(**d)

Point(x=11, y=22)