# Week 2 - Python Data Structures

### What is Counter
Counter is a dictionary-like class (a dict subclass) from the collections module that maps items to their counts. It is handy for counting words, letters, or any other repeated hashable values. It also provides helpers such as most_common, arithmetic between counters, and in-place updates.
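
A small sketch of the dictionary-like behavior, using a made-up list of pet names (the data and variable names are only for illustration). One difference from a plain dict worth noticing: looking up a missing key returns 0 instead of raising KeyError, and the lookup does not add the key.

```python
from collections import Counter

# hypothetical sample data, just for illustration
pets = ["cat", "dog", "cat", "bird", "cat"]
pet_counts = Counter(pets)

is_dict = isinstance(pet_counts, dict)   # Counter is a dict subclass
cat_count = pet_counts["cat"]            # normal dict-style lookup
fish_count = pet_counts["fish"]          # missing key gives 0, no KeyError
has_fish = "fish" in pet_counts          # the lookup above did not add the key

print(is_dict, cat_count, fish_count, has_fish)
```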


### Basic Counting

In [7]:
from collections import Counter

nums = [1, 2, 2, 3, 3, 3, 4]
cnt = Counter(nums)
cnt

Counter({3: 3, 2: 2, 1: 1, 4: 1})

### Counting Words From Text

In [9]:
text = """
Ariel heals with moonlight and ocean water, Kira codes and studies.
Ariel, Kira, Nova, Ariel.
""".lower()

# very light cleanup: split on non-letters and keep the non-empty pieces
import re
words = [w for w in re.split(r"[^a-z]+", text) if w]
Counter(words)

Counter({'ariel': 3,
         'and': 2,
         'kira': 2,
         'heals': 1,
         'with': 1,
         'moonlight': 1,
         'ocean': 1,
         'water': 1,
         'codes': 1,
         'studies': 1,
         'nova': 1})

### Top N Items With most_common

In [10]:
word_counts = Counter(words)
word_counts.most_common(3)  # top three words

[('ariel', 3), ('and', 2), ('kira', 2)]
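
A short sketch of two related uses of most_common, on a made-up string. Called without an argument it returns every item sorted from highest to lowest count, and a reverse slice of that list gives the least common items. Ties keep the order in which the items were first seen, assuming Python 3.7 or newer.

```python
from collections import Counter

letter_counts = Counter("mississippi")   # hypothetical sample input

all_sorted = letter_counts.most_common()         # no argument: every item, highest count first
top_two = letter_counts.most_common(2)           # the two most common letters
least_two = letter_counts.most_common()[:-3:-1]  # reverse slice: the two least common

print(all_sorted)
print(top_two)
print(least_two)
```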

### Update Counts From Another Iterable

In [11]:
c = Counter("banana")
c.update("bandana")
c

Counter({'a': 6, 'n': 4, 'b': 2, 'd': 1})
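
The key point of update is that it adds to existing counts, where a plain dict would replace the values. A minimal sketch with a made-up inventory (all names and numbers are hypothetical), showing that update accepts either a mapping of counts or an iterable of items:

```python
from collections import Counter

inventory = Counter({"apple": 2, "pear": 1})   # hypothetical starting counts
inventory.update({"apple": 3, "plum": 4})      # mapping: counts are added
inventory.update(["pear", "pear"])             # iterable: each item adds 1

plain = {"apple": 2, "pear": 1}
plain.update({"apple": 3, "plum": 4})          # plain dict: values are replaced

print(inventory)
print(plain)
```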

### Arithmetic With Counters

In [12]:
a = Counter("abracadabra")
b = Counter("barcelona")

a_plus_b = a + b          # add counts, only positive results are kept
a_minus_b = a - b         # subtract, zero and negative results are dropped
a_intersection = a & b    # min counts
a_union = a | b           # max counts

a_plus_b, a_minus_b, a_intersection, a_union

(Counter({'a': 7,
          'b': 3,
          'r': 3,
          'c': 2,
          'd': 1,
          'e': 1,
          'l': 1,
          'o': 1,
          'n': 1}),
 Counter({'a': 3, 'b': 1, 'r': 1, 'd': 1}),
 Counter({'a': 2, 'b': 1, 'r': 1, 'c': 1}),
 Counter({'a': 5,
          'b': 2,
          'r': 2,
          'c': 1,
          'd': 1,
          'e': 1,
          'l': 1,
          'o': 1,
          'n': 1}))
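
The operators always return a new Counter and keep only positive counts. When I need to see the negative values, the subtract method works in place and keeps them. A small sketch with made-up stock numbers (the names are hypothetical), which also shows unary plus as a way to strip zero and negative counts afterwards:

```python
from collections import Counter

stock = Counter(apples=3, pears=1)   # hypothetical counts
sold = Counter(apples=1, pears=2)

with_operator = stock - sold     # new Counter, pears (-1) is dropped
in_place = Counter(stock)        # copy, so stock itself stays unchanged
in_place.subtract(sold)          # in place, keeps pears at -1
cleaned = +in_place              # unary plus removes zero and negative counts

print(with_operator, in_place, cleaned)
```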

### Convert Back To A Regular Dict Or List

In [13]:
c = Counter("kira kira nova")
dict(c), list(c.elements())[:20]

({'k': 2, 'i': 2, 'r': 2, 'a': 3, ' ': 2, 'n': 1, 'o': 1, 'v': 1},
 ['k', 'k', 'i', 'i', 'r', 'r', 'a', 'a', 'a', ' ', ' ', 'n', 'o', 'v'])
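
One detail to keep in mind when converting: elements() only yields items whose count is at least one, while dict() and items() keep every stored key, including zero and negative counts. A minimal sketch with made-up counts (the variable names are hypothetical):

```python
from collections import Counter

scores = Counter(a=2, b=0, c=-1)       # hypothetical counts, including zero and negative

expanded = list(scores.elements())     # only items with count >= 1 are repeated
as_pairs = list(scores.items())        # every stored key with its count
total = sum(scores.values())           # 2 + 0 + (-1)

print(expanded, as_pairs, total)
```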

### What I learned
Counter simplifies frequency problems that would otherwise take several lines with a regular dictionary. The caveat is that I still need to normalize my inputs first, for example by lowercasing words and removing punctuation, so that the counts are meaningful.
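
To make both points concrete, here is a rough sketch on a made-up sentence (the sample text and variable names are only for illustration). It compares counting without normalization, manual counting with a regular dict, and Counter on the cleaned words.

```python
import re
from collections import Counter

raw = "Nova codes. NOVA studies; nova rests!"   # hypothetical sample sentence

# without normalization, case and punctuation split the same word apart
naive = Counter(raw.split())

# normalize first: lowercase, then split on anything that is not a letter
words = [w for w in re.split(r"[^a-z]+", raw.lower()) if w]

# manual counting with a regular dict
manual = {}
for w in words:
    manual[w] = manual.get(w, 0) + 1

# the same result in one line with Counter
counted = Counter(words)

print(naive["nova"], counted["nova"], manual == dict(counted))
```

In the naive version the three spellings of the name are counted as three different keys, which is the kind of result normalization is meant to avoid.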

### References

- Python docs, collections.Counter