The `collections` module is a standard-library module that implements specialized container data types. These provide alternatives to Python’s general-purpose built-in containers (`dict`, `list`, `set`, and `tuple`).

# Counter
`Counter` is a `dict` subclass for counting hashable objects. Elements are stored as dictionary keys, and their counts are stored as the corresponding values.
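The short sketch below shows the dictionary behaviour described above. It uses a small made-up list of fruit names for illustration only. Because `Counter` is a `dict`, the usual key lookups work. A missing key reports `0` rather than raising `KeyError`, and `update()` adds to existing counts instead of overwriting them.

```python
from collections import Counter

# Illustrative data only
c = Counter(['apple', 'pear', 'apple'])

# A Counter is a dict: elements are keys, counts are values
print(isinstance(c, dict))    # True
print(c['apple'])             # 2

# Unlike a plain dict, a missing key reports a count of 0 instead of raising KeyError
print(c['banana'])            # 0
print('banana' in c)          # False: the lookup above did not insert the key

# update() adds to existing counts rather than replacing them
c.update(['pear', 'plum'])
print(c)                      # Counter({'apple': 2, 'pear': 2, 'plum': 1})

# elements() repeats each key as many times as its count
print(sorted(c.elements()))   # ['apple', 'apple', 'pear', 'pear', 'plum']
```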

In [1]:
from collections import Counter

In [2]:
lst = [1, 2, 2, 2, 2, 3, 3, 3, 1, 2, 1, 12, 3, 2, 32, 1, 21, 1, 223, 1]
Counter(lst)

Counter({1: 6, 2: 6, 3: 4, 12: 1, 32: 1, 21: 1, 223: 1})

In [3]:
Counter('aabsbsbsbhshhbbsbs')

Counter({'b': 7, 's': 6, 'h': 3, 'a': 2})

In [5]:
s = 'How many times does each word show up in this sentence word times each each word'
words = s.split()
Counter(words)

Counter({'each': 3,
         'word': 3,
         'times': 2,
         'How': 1,
         'many': 1,
         'does': 1,
         'show': 1,
         'up': 1,
         'in': 1,
         'this': 1,
         'sentence': 1})

In [6]:
# Methods with Counter()
c = Counter(words)
c.most_common(2)

[('each', 3), ('word', 3)]
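In the output above, `'each'` and `'word'` both have a count of 3. `'each'` is listed first because `most_common()` orders equal counts by the order in which elements were first encountered. The sketch below uses a toy word list, chosen only for illustration, to show that tie-breaking rule. It also shows that calling `most_common()` with no argument returns every element.

```python
from collections import Counter

# Toy data: 'b' is encountered before 'c', and both end up with count 2
words = 'b a c a b c a'.split()
c = Counter(words)

# With no argument, most_common() returns every element, highest count first
print(c.most_common())   # [('a', 3), ('b', 2), ('c', 2)]

# Equal counts keep first-encountered order, so 'b' comes before 'c'
print(c.most_common(2))  # [('a', 3), ('b', 2)]
```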

### Common patterns when using a `Counter` object
```python
sum(c.values())                 # total of all counts (Python 3.10+ also offers c.total())
c.clear()                       # reset all counts
list(c)                         # list unique elements
set(c)                          # convert to a set
dict(c)                         # convert to a regular dictionary
list(c.items())                 # convert to a list of (elem, cnt) pairs
Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
c.most_common()[:-n-1:-1]       # n least common elements
c += Counter()                  # remove zero and negative counts
```
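In the cheat sheet above, `n` and `list_of_pairs` are placeholders. The runnable sketch below gives them illustrative values so a few of the less obvious patterns can be checked end to end. The "least common" slice walks the full ranking backwards. `subtract()` can leave zero or negative counts behind, and an in-place `+= Counter()` discards them.

```python
from collections import Counter

c = Counter(a=4, b=2, c=1, d=0)

# Illustrative value for the placeholder n
n = 2
# Slice the full ranking backwards to get the n least common elements
least = c.most_common()[:-n-1:-1]
print(least)                       # [('d', 0), ('c', 1)]

# Illustrative value for the placeholder list_of_pairs
list_of_pairs = [('x', 3), ('y', 1)]
from_pairs = Counter(dict(list_of_pairs))
print(from_pairs)                  # Counter({'x': 3, 'y': 1})

# subtract() keeps zero and negative results...
c.subtract({'b': 5})
print(c['b'])                      # -3

# ...while adding an empty Counter in place drops every count <= 0
c += Counter()
print(c)                           # Counter({'a': 4, 'c': 1})
print(sum(c.values()))             # 5
```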

#### Exercise: Word Frequency Counter

Write a Python function that takes a string of text and returns the top N most common words, ignoring punctuation and case.

**Instructions:**
1. Use `Counter` to count word occurrences.
2. Ignore punctuation (e.g., .,!?) and make the count case-insensitive.
3. Return the top N most common words as a list of tuples `(word, count)`.
4. If N is greater than the number of unique words, return all unique words sorted by frequency.

In [16]:
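from collections import Counter
import re

def top_n_words(text, N):
    # Lowercase the text, then extract runs of word characters (punctuation is dropped)
    words = re.findall(r'\b\w+\b', text.lower())
    # most_common(N) returns every word when N exceeds the number of unique words
    return Counter(words).most_common(N)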

In [19]:
text = "Hello, hello! How are you? You look great. Great great day!"
N = 30
print(top_n_words(text, N))

[('great', 3), ('hello', 2), ('you', 2), ('how', 1), ('are', 1), ('look', 1), ('day', 1)]
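A regex is not the only way to ignore punctuation. The sketch below is a hypothetical alternative, `top_n_words_no_regex`, which is not part of the exercise. It deletes every character in `string.punctuation` with `str.translate` and then splits on whitespace. On the sample text it gives the same result. The two approaches differ on edge cases. Here `"don't"` becomes `"dont"`, whereas `\b\w+\b` yields `"don"` and `"t"`. Underscores are also removed here but kept by `\w`. In addition, `string.punctuation` covers ASCII punctuation only, so curly quotes survive.

```python
from collections import Counter
import string

def top_n_words_no_regex(text, N):
    # Hypothetical alternative: delete every ASCII punctuation character, then split on whitespace
    table = str.maketrans('', '', string.punctuation)
    words = text.lower().translate(table).split()
    return Counter(words).most_common(N)

text = "Hello, hello! How are you? You look great. Great great day!"
print(top_n_words_no_regex(text, 30))
# [('great', 3), ('hello', 2), ('you', 2), ('how', 1), ('are', 1), ('look', 1), ('day', 1)]
```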
