### Counter

#### Using Dictionaries to Maintain Counters

We've already seen how we can use a regular dict or a defaultdict to maintain counters:

In [None]:
d = {}
d[key] = d.get(key, 0) + 1

or

In [None]:
d = defaultdict(int)
d[key] += 1

Certain operations can be tedious:
- Count the frequency of characters in a string (or items in an iterable in general)
- Update one counter dictionary with another counter dictionary (adding or subtracting)
- From multiple counter dictionaries, find the max/min counter value for each key
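To make the tedium concrete, here is a minimal sketch of these three operations written with plain dicts. The helper names (`count_items`, `merge_counts`, `max_counts`) are hypothetical and chosen only for illustration.

```python
def count_items(iterable):
    """Build a frequency table from any iterable using a plain dict."""
    counts = {}
    for item in iterable:
        counts[item] = counts.get(item, 0) + 1
    return counts


def merge_counts(d1, d2, sign=1):
    """Return a new dict: d2's counts added to d1 (sign=1) or subtracted (sign=-1)."""
    result = dict(d1)
    for key, value in d2.items():
        result[key] = result.get(key, 0) + sign * value
    return result


def max_counts(*dicts):
    """For each key seen in any dict, keep the largest count."""
    result = {}
    for d in dicts:
        for key, value in d.items():
            result[key] = max(result.get(key, value), value)
    return result


d1 = count_items('aabbcc')
d2 = count_items('abcd')

added = merge_counts(d1, d2)
subtracted = merge_counts(d1, d2, sign=-1)
largest = max_counts(d1, d2)

print(added)       # {'a': 3, 'b': 3, 'c': 3, 'd': 1}
print(subtracted)  # {'a': 1, 'b': 1, 'c': 1, 'd': -1}
print(largest)     # {'a': 2, 'b': 2, 'c': 2, 'd': 1}
```

Each helper is short, but all three have to be written and tested by hand.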

#### The collections.Counter class

The Counter class is a specialized dictionary that makes certain operations easier:
- Acts like a defaultdict with a default of 0
- Supports the same constructor options as regular dicts
- Additional functionality to automatically calculate a frequency table based on any iterable
- Iterate through every key, repeating each key as many times as the corresponding counter value
- Find the n most common items (by count)
- Increment/decrement counters based on another Counter, dict, or iterable

Some caveats:
- fromkeys is not supported
- update works differently than a regular dict
 - in-place addition of counts
 - iterable is just a sequence of elements, not tuples
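
The default-of-0 behavior and the caveats about `fromkeys` and `update` can be checked with a short sketch (the variable names are illustrative only):

```python
from collections import Counter

c = Counter(a=1)

# Missing keys read as 0, but unlike defaultdict the key is not inserted.
missing = c['z']
has_z = 'z' in c

# update adds counts instead of replacing values...
c.update({'a': 5})
# ...and a plain iterable is treated as elements to count, not (key, value) pairs.
c.update(['a', 'b'])

# fromkeys is deliberately not implemented for Counter.
try:
    Counter.fromkeys('ab', 1)
    fromkeys_supported = True
except NotImplementedError:
    fromkeys_supported = False

print(missing, has_z)        # 0 False
print(c['a'], c['b'])        # 7 1
print(fromkeys_supported)    # False
```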

#### Code Examples

In [1]:
from collections import defaultdict, Counter

In [2]:
sentence = 'the quick brown fox jumps over the lazy dog'

In [3]:
counter = defaultdict(int)

In [4]:
for c in sentence:
    counter[c] += 1

In [5]:
counter

defaultdict(int,
            {'t': 2,
             'h': 2,
             'e': 3,
             ' ': 8,
             'q': 1,
             'u': 2,
             'i': 1,
             'c': 1,
             'k': 1,
             'b': 1,
             'r': 2,
             'o': 4,
             'w': 1,
             'n': 1,
             'f': 1,
             'x': 1,
             'j': 1,
             'm': 1,
             'p': 1,
             's': 1,
             'v': 1,
             'l': 1,
             'a': 1,
             'z': 1,
             'y': 1,
             'd': 1,
             'g': 1})

In [6]:
counter = Counter()
for c in sentence:
    counter[c] += 1

In [7]:
counter

Counter({'t': 2,
         'h': 2,
         'e': 3,
         ' ': 8,
         'q': 1,
         'u': 2,
         'i': 1,
         'c': 1,
         'k': 1,
         'b': 1,
         'r': 2,
         'o': 4,
         'w': 1,
         'n': 1,
         'f': 1,
         'x': 1,
         'j': 1,
         'm': 1,
         'p': 1,
         's': 1,
         'v': 1,
         'l': 1,
         'a': 1,
         'z': 1,
         'y': 1,
         'd': 1,
         'g': 1})

In [8]:
c1 = Counter('able was I ere I saw elba')

In [9]:
c1

Counter({'a': 4,
         'b': 2,
         'l': 2,
         'e': 4,
         ' ': 6,
         'w': 2,
         's': 2,
         'I': 2,
         'r': 1})

In [10]:
c1 = Counter([1, 2, 3, 2, 4, 5, 5, 5, 6])

In [11]:
c1

Counter({1: 1, 2: 2, 3: 1, 4: 1, 5: 3, 6: 1})

In [12]:
import random
random.seed(0)

In [13]:
my_list = [random.randint(0, 10) for _ in range(1_000)]

In [14]:
c2 = Counter(my_list)

In [15]:
c2

Counter({6: 95,
         0: 97,
         4: 91,
         8: 76,
         7: 94,
         5: 89,
         9: 85,
         3: 80,
         2: 88,
         1: 107,
         10: 98})

In [16]:
c2 = Counter(a=1, b=10)

In [17]:
c2

Counter({'a': 1, 'b': 10})

In [18]:
c3 = Counter({'a':1, 'b': 20})

In [19]:
c3

Counter({'a': 1, 'b': 20})

In [20]:
import re

In [21]:
sentence = '''
his module implements pseudo-random number generators for various distributions.

For integers, there is uniform selection from a range. For sequences, there is uniform selection of a random element, a function to generate a random permutation of a list in-place, and a function for random sampling without replacement.

On the real line, there are functions to compute uniform, normal (Gaussian), lognormal, negative exponential, gamma, and beta distributions. For generating distributions of angles, the von Mises distribution is available.

Almost all module functions depend on the basic function random(), which generates a random float uniformly in the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1. The underlying implementation in C is both fast and threadsafe. The Mersenne Twister is one of the most extensively tested random number generators in existence. However, being completely deterministic, it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes.'''

In [22]:
# raw string avoids the invalid escape sequence warning for \W
words = re.split(r'\W', sentence)

In [23]:
type(words)

list

In [24]:
words

['',
 'his',
 'module',
 'implements',
 'pseudo',
 'random',
 'number',
 'generators',
 'for',
 'various',
 'distributions',
 '',
 '',
 'For',
 'integers',
 '',
 'there',
 'is',
 'uniform',
 'selection',
 'from',
 'a',
 'range',
 '',
 'For',
 'sequences',
 '',
 'there',
 'is',
 'uniform',
 'selection',
 'of',
 'a',
 'random',
 'element',
 '',
 'a',
 'function',
 'to',
 'generate',
 'a',
 'random',
 'permutation',
 'of',
 'a',
 'list',
 'in',
 'place',
 '',
 'and',
 'a',
 'function',
 'for',
 'random',
 'sampling',
 'without',
 'replacement',
 '',
 '',
 'On',
 'the',
 'real',
 'line',
 '',
 'there',
 'are',
 'functions',
 'to',
 'compute',
 'uniform',
 '',
 'normal',
 '',
 'Gaussian',
 '',
 '',
 'lognormal',
 '',
 'negative',
 'exponential',
 '',
 'gamma',
 '',
 'and',
 'beta',
 'distributions',
 '',
 'For',
 'generating',
 'distributions',
 'of',
 'angles',
 '',
 'the',
 'von',
 'Mises',
 'distribution',
 'is',
 'available',
 '',
 '',
 'Almost',
 'all',
 'module',
 'functions',
 'depen

In [25]:
word_count = Counter(words)

In [26]:
word_count

Counter({'': 38,
         'his': 1,
         'module': 2,
         'implements': 1,
         'pseudo': 1,
         'random': 7,
         'number': 2,
         'generators': 2,
         'for': 4,
         'various': 1,
         'distributions': 3,
         'For': 3,
         'integers': 1,
         'there': 3,
         'is': 7,
         'uniform': 3,
         'selection': 2,
         'from': 1,
         'a': 8,
         'range': 2,
         'sequences': 1,
         'of': 5,
         'element': 1,
         'function': 3,
         'to': 2,
         'generate': 1,
         'permutation': 1,
         'list': 1,
         'in': 4,
         'place': 1,
         'and': 5,
         'sampling': 1,
         'without': 1,
         'replacement': 1,
         'On': 1,
         'the': 7,
         'real': 1,
         'line': 1,
         'are': 1,
         'functions': 2,
         'compute': 1,
         'normal': 1,
         'Gaussian': 1,
         'lognormal': 1,
         'negative': 1,
         'expon

In [27]:
word_count.most_common(5)

[('', 38), ('a', 8), ('random', 7), ('is', 7), ('the', 7)]
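
The empty string tops the list because splitting on `\W` yields an empty string wherever two non-word characters are adjacent (for example a comma followed by a space). One possible way around this, sketched here on a short sample string rather than the full text above, is to match the words directly with `re.findall`:

```python
import re
from collections import Counter

# Short sample string, used only for illustration
text = 'For integers, there is uniform selection. For sequences, there is more.'

# \w+ matches runs of word characters, so no empty strings are produced
words = re.findall(r'\w+', text)
word_count = Counter(words)
top = word_count.most_common(3)
print(top)
```

Filtering the original list with `Counter(w for w in words if w)` would be another option.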

In [28]:
c1 = Counter('abba')

In [29]:
c1

Counter({'a': 2, 'b': 2})

In [30]:
for c in c1:
    print(c)

a
b


In [31]:
for c in c1.elements():
    print(c)

a
a
b
b


In [32]:
list(c1.elements())

['a', 'a', 'b', 'b']

In [33]:
l = []
for i in range(1, 11):
    for _ in range(i):
        l.append(i)
print(l)

[1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]


In [34]:
c1 = Counter()
for i in range(1, 11):
    c1[i] = i

In [35]:
c1

Counter({1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10})

In [36]:
print(list(c1.elements()))

[1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]


In [37]:
c1.elements()

<itertools.chain at 0x1dee94e6788>
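
As the `itertools.chain` object above suggests, `elements()` returns a lazy iterator rather than a list. One detail worth noting is that keys whose count is zero or negative are skipped. A minimal sketch, with arbitrary counter values:

```python
from collections import Counter

c = Counter(a=2, b=0, c=-1)

# Only keys with a count of at least 1 are repeated
expanded = list(c.elements())
print(expanded)  # ['a', 'a']
```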

In [38]:
d = {'1': 1, '2': 2, '3': 3}

In [39]:
for key, value in d.items():
    for _ in range(value):
        print(key)

1
2
2
3
3
3


In [40]:
class RepeatIterable:
    def __init__(self, **kwargs):
        self.d = kwargs
        
    def __setitem__(self, key, value):
        self.d[key] = value
        
    def __getitem__(self, key):
        self.d[key] = self.d.get(key, 0)
        return self.d[key]

In [41]:
r = RepeatIterable(x=10, y=20)

In [42]:
r.d

{'x': 10, 'y': 20}

In [43]:
r['a']

0

In [44]:
r.d

{'x': 10, 'y': 20, 'a': 0}

In [45]:
r['x']

10

In [46]:
r.d

{'x': 10, 'y': 20, 'a': 0}

In [47]:
r['y'] = 100

In [48]:
r.d

{'x': 10, 'y': 100, 'a': 0}

In [49]:
class RepeatIterable:
    def __init__(self, **kwargs):
        self.d = kwargs
        
    def __setitem__(self, key, value):
        self.d[key] = value
        
    def __getitem__(self, key):
        self.d[key] = self.d.get(key, 0)
        return self.d[key]
    
    def elements(self):
        for k, frequency in self.d.items():
            for _ in range(frequency):
                yield k

In [50]:
r = RepeatIterable(a=2, b=3, c=1)

In [51]:
for e in r.elements():
    print(e)

a
a
b
b
b
c


In [52]:
c1 = Counter(a=1, b=2, c=3)
c2 = Counter(b=1, c=2, d=3)

In [53]:
c1 + c2

Counter({'a': 1, 'b': 3, 'c': 5, 'd': 3})

In [54]:
c1.update(c2)
print(c1)

Counter({'c': 5, 'b': 3, 'd': 3, 'a': 1})


In [55]:
c1 = Counter(a=1, b=2, c=3)
c2 = Counter(a=1, b=2, c=3)

In [56]:
(c1 - c2) - c2

Counter()

In [57]:
c1.subtract(c2)
print(c1)

Counter({'a': 0, 'b': 0, 'c': 0})


In [58]:
c1.subtract(c2)

In [59]:
c1

Counter({'a': -1, 'b': -2, 'c': -3})

In [60]:
c1 = Counter('aabbccddee')
print(c1)
c1.update('abcdef')
print(c1)

Counter({'a': 2, 'b': 2, 'c': 2, 'd': 2, 'e': 2})
Counter({'a': 3, 'b': 3, 'c': 3, 'd': 3, 'e': 3, 'f': 1})


In [61]:
c1 = Counter('aabbcc')
c2 = Counter('abc')
c1 + c2

Counter({'a': 3, 'b': 3, 'c': 3})

In [62]:
c1 - c2

Counter({'a': 1, 'b': 1, 'c': 1})

In [63]:
c1 = Counter(a=5, b=1)
c2 = Counter(a=1, b=10)

In [64]:
c1 & c2

Counter({'a': 1, 'b': 1})

In [65]:
c1 | c2

Counter({'a': 5, 'b': 10})
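
`&` keeps the minimum count for each key and `|` keeps the maximum. For more than two counters, one option is to fold the operator over the collection with `functools.reduce`. The counters below are made up for illustration:

```python
from collections import Counter
from functools import reduce
from operator import and_, or_

counters = [Counter(a=5, b=1), Counter(a=1, b=10), Counter(a=3, c=2)]

per_key_min = reduce(and_, counters)  # intersection: minimum count per key
per_key_max = reduce(or_, counters)   # union: maximum count per key

print(per_key_min)  # Counter({'a': 1})
print(per_key_max)  # Counter({'b': 10, 'a': 5, 'c': 2})
```

A key missing from any counter counts as 0 there, so it drops out of the intersection entirely.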

In [66]:
c1 = Counter(a=10, b=-10, c=0)

In [67]:
print(c1)

Counter({'a': 10, 'c': 0, 'b': -10})


In [68]:
+c1

Counter({'a': 10})

In [69]:
c1

Counter({'a': 10, 'b': -10, 'c': 0})

In [70]:
-c1

Counter({'b': 10})

In [71]:
c1

Counter({'a': 10, 'b': -10, 'c': 0})

In [72]:
import random
random.seed(0)

widgets = ['battery', 'charger', 'cable', 'case', 'keyboard', 'mouse']

orders = [(random.choice(widgets), random.randint(1, 5)) for _ in range(100)]
refunds = [(random.choice(widgets), random.randint(1, 5)) for _ in range(20)]

In [73]:
print(orders)

[('case', 4), ('battery', 3), ('keyboard', 4), ('case', 3), ('case', 3), ('keyboard', 2), ('keyboard', 2), ('cable', 2), ('battery', 5), ('cable', 5), ('mouse', 5), ('charger', 3), ('battery', 1), ('mouse', 3), ('case', 5), ('battery', 3), ('case', 3), ('keyboard', 2), ('keyboard', 4), ('case', 5), ('cable', 1), ('keyboard', 1), ('battery', 4), ('mouse', 1), ('keyboard', 4), ('cable', 2), ('mouse', 3), ('mouse', 1), ('charger', 5), ('charger', 2), ('charger', 5), ('case', 1), ('battery', 3), ('keyboard', 4), ('battery', 3), ('keyboard', 3), ('mouse', 1), ('keyboard', 3), ('keyboard', 2), ('keyboard', 5), ('keyboard', 3), ('case', 1), ('keyboard', 4), ('cable', 5), ('charger', 3), ('charger', 2), ('charger', 1), ('keyboard', 3), ('case', 1), ('battery', 2), ('charger', 1), ('battery', 5), ('mouse', 4), ('mouse', 5), ('cable', 5), ('charger', 2), ('mouse', 5), ('case', 5), ('cable', 4), ('case', 3), ('battery', 3), ('keyboard', 1), ('case', 5), ('mouse', 3), ('charger', 2), ('battery', 3

In [74]:
print(refunds)

[('battery', 2), ('charger', 3), ('keyboard', 3), ('battery', 5), ('case', 2), ('battery', 4), ('mouse', 4), ('keyboard', 5), ('cable', 3), ('case', 3), ('charger', 5), ('mouse', 1), ('case', 1), ('cable', 1), ('keyboard', 3), ('charger', 2), ('case', 3), ('keyboard', 3), ('mouse', 3), ('keyboard', 5)]


In [75]:
sold_counter = Counter()
refund_counter = Counter()

for order in orders:
    sold_counter[order[0]] += order[1]
    
for refund in refunds:
    refund_counter[refund[0]] += refund[1]

In [76]:
print(sold_counter)

Counter({'keyboard': 65, 'battery': 61, 'mouse': 46, 'case': 41, 'cable': 39, 'charger': 35})


In [77]:
print(refund_counter)

Counter({'keyboard': 19, 'battery': 11, 'charger': 10, 'case': 9, 'mouse': 8, 'cable': 4})


In [78]:
net_counter = sold_counter - refund_counter

In [79]:
net_counter

Counter({'case': 32,
         'battery': 50,
         'keyboard': 46,
         'cable': 35,
         'mouse': 38,
         'charger': 25})

In [80]:
net_counter.most_common(3)

[('battery', 50), ('keyboard', 46), ('mouse', 38)]

In [81]:
orders[0]

('case', 4)

In [82]:
orders[1]

('battery', 3)

In [83]:
# What we want (pseudocode): one element per unit sold, e.g.
# Counter(['case', 'case', 'case', 'case', 'battery', 'battery', 'battery', ...])

In [84]:
from itertools import repeat

In [85]:
repeat('battery', 3)

repeat('battery', 3)

In [86]:
list(repeat('battery', 3))

['battery', 'battery', 'battery']

In [87]:
list(repeat(*orders[0]))

['case', 'case', 'case', 'case']

In [88]:
list(repeat(*orders[1]))

['battery', 'battery', 'battery']

In [89]:
from itertools import chain

In [90]:
list(chain.from_iterable(repeat(*order) for order in orders))

['case',
 'case',
 'case',
 'case',
 'battery',
 'battery',
 'battery',
 'keyboard',
 'keyboard',
 'keyboard',
 'keyboard',
 'case',
 'case',
 'case',
 'case',
 'case',
 'case',
 'keyboard',
 'keyboard',
 'keyboard',
 'keyboard',
 'cable',
 'cable',
 'battery',
 'battery',
 'battery',
 'battery',
 'battery',
 'cable',
 'cable',
 'cable',
 'cable',
 'cable',
 'mouse',
 'mouse',
 'mouse',
 'mouse',
 'mouse',
 'charger',
 'charger',
 'charger',
 'battery',
 'mouse',
 'mouse',
 'mouse',
 'case',
 'case',
 'case',
 'case',
 'case',
 'battery',
 'battery',
 'battery',
 'case',
 'case',
 'case',
 'keyboard',
 'keyboard',
 'keyboard',
 'keyboard',
 'keyboard',
 'keyboard',
 'case',
 'case',
 'case',
 'case',
 'case',
 'cable',
 'keyboard',
 'battery',
 'battery',
 'battery',
 'battery',
 'mouse',
 'keyboard',
 'keyboard',
 'keyboard',
 'keyboard',
 'cable',
 'cable',
 'mouse',
 'mouse',
 'mouse',
 'mouse',
 'charger',
 'charger',
 'charger',
 'charger',
 'charger',
 'charger',
 'charger',
 'ch

In [91]:
sold_counter = Counter(chain.from_iterable(repeat(*order) for order in orders))
refund_counter = Counter(chain.from_iterable(repeat(*refund) for refund in refunds))
net_counter = sold_counter - refund_counter
print(net_counter.most_common(3))

[('battery', 50), ('keyboard', 46), ('mouse', 38)]


In [92]:
c1 = Counter(a=2, b=10)
c2 = Counter(a=10, b=5)
c1 - c2

Counter({'b': 5})

In [93]:
c1.subtract(c2)
c1

Counter({'a': -8, 'b': 5})
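
The two results differ because the `-` operator discards counts that are zero or negative, while `subtract` works in place and keeps them. Applying unary `+` after `subtract` recovers the operator result. A small sketch using the same values:

```python
from collections import Counter

c1 = Counter(a=2, b=10)
c2 = Counter(a=10, b=5)

difference = c1 - c2   # new Counter, only positive counts kept
c1.subtract(c2)        # in place, negative counts kept

print(difference)         # Counter({'b': 5})
print(c1['a'], c1['b'])   # -8 5
print(+c1 == difference)  # True
```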

In [94]:
net_sales = {}

for order in orders:
    key = order[0]
    cnt = order[1]
    net_sales[key] = net_sales.get(key, 0) + cnt
    
for refund in refunds:
    key = refund[0]
    cnt = refund[1]
    net_sales[key] = net_sales.get(key, 0) - cnt
    
net_sales = {k: v for k, v in net_sales.items() if v > 0}

sorted_net_sales = sorted(net_sales.items(), key=lambda el: el[1], reverse=True)

In [95]:
type(sorted_net_sales)

list

In [96]:
sorted_net_sales[:3]

[('battery', 50), ('keyboard', 46), ('mouse', 38)]

In [97]:
isinstance(c1, dict)

True