# Objects in Python
*Baking code cookies with Plato*

## The Eternal Exercise

Let's start this class with our "eternal" exercise: constructing a character-level frequency distribution of a string, using a `dict`. We'll use the first lines of Virgil's *Aeneid*. You know the drill by now.

In [294]:
aeneid = """Arma virumque cano, Troiae qui primus ab oris
Italiam fato profugus Lavinaque venit
litora—multum ille et terris iactatus et alto
vi superum, saevae memorem Iunonis ob iram,
5 multa quoque et bello passus, dum conderet urbem
inferretque deos Latio; genus unde Latinum
Albanique patres atque altae moenia Romae.
Musa, mihi causas memora, quo numine laeso
quidve dolens regina deum tot volvere casus
10 insignem pietate virum, tot adire labores
impulerit. tantaene animis caelestibus irae?"""

Let us start with the simplest approach and explore some variations:

In [295]:
d = {}

for char in aeneid:
    if char in d:
        d[char] += 1
    else:
        d[char] = 1

print(d)

{'A': 2, 'r': 25, 'm': 26, 'a': 42, ' ': 63, 'v': 9, 'i': 36, 'u': 35, 'q': 10, 'e': 51, 'c': 6, 'n': 21, 'o': 25, ',': 7, 'T': 1, 'p': 7, 's': 26, 'b': 7, '\n': 10, 'I': 2, 't': 30, 'l': 17, 'f': 3, 'g': 4, 'L': 3, '—': 1, '5': 1, 'd': 8, ';': 1, 'R': 1, '.': 2, 'M': 1, 'h': 1, '1': 1, '0': 1, '?': 1}


Implement the following simple extensions, step by step:
- make sure all characters are lowercased
- ignore all punctuation symbols
- ignore all whitespace characters
- ignore digits (cf. line numbering)
- print the three characters with the highest frequency (i.e. "sort and slice")

In [296]:
# code for variations here
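If you get stuck, here is one possible sketch. It is not the only solution. The names `char_freqs` and `top_n` are our own. The sketch does not list punctuation, whitespace and digits separately. Instead, it keeps only the characters for which `str.isalpha()` is true. This also drops the non-ASCII dash `—`, which the ASCII-only `string.punctuation` would miss.

```python
def char_freqs(text):
    """Character frequencies: lowercased, letters only."""
    d = {}
    for char in text.lower():          # lowercase everything first
        if not char.isalpha():         # skip punctuation, whitespace and digits
            continue
        if char in d:
            d[char] += 1
        else:
            d[char] = 1
    return d


def top_n(d, n=3):
    """The n (char, count) pairs with the highest counts ("sort and slice")."""
    ranked = sorted(d.items(), key=lambda pair: pair[1], reverse=True)  # highest count first
    return ranked[:n]                                                  # slice off the top n


print(top_n(char_freqs("Arma virumque cano, Troiae qui primus ab oris")))
```

Calling `top_n(char_freqs(aeneid))` applies the same steps to the full passage.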

This exercise is important because, if you understand its components well, you can make minor adaptations to the code that will help you solve a wide range of related problems (like filtering). Today, we will use this standard block of code to introduce two additional concepts in Python: (1) exceptions and (2) objects.

## Exceptions

The `if`/`else` statement in our exercise is necessary because we cannot increment the frequency of a particular character if we haven't **initialized** that count in the first place:

In [297]:
d = {}

for char in aeneid:
    d[char] += 1

KeyError: 'A'

Already in the very first iteration, something goes wrong: `d` is empty, and yet we try to **index** it with the non-existent key `'A'`. We run into a **KeyError** and the execution of our program gets **interrupted** (that's a technical term that you should remember). Another common way to describe this is to say that "an exception is raised": the normal flow of the program is disrupted because an "exceptional" situation is encountered. The problem isn't so much that you run into an error (trust me, your computer won't explode in such cases). The real problem is that Python hasn't been told explicitly what to do in such situations. This is exactly why the program gets halted: Python doesn't know what to do next.

Luckily, we can work around this. If we anticipate that certain errors might come about, we can add explicit instructions to our scripts as to what needs to be done in these exceptional circumstances. That might sound a little abstract, so here is a straightforward example:

In [298]:
d = {}

for char in aeneid:
    try:
        d[char] += 1
    except:
        d[char] = 1

print(d)

{'A': 2, 'r': 25, 'm': 26, 'a': 42, ' ': 63, 'v': 9, 'i': 36, 'u': 35, 'q': 10, 'e': 51, 'c': 6, 'n': 21, 'o': 25, ',': 7, 'T': 1, 'p': 7, 's': 26, 'b': 7, '\n': 10, 'I': 2, 't': 30, 'l': 17, 'f': 3, 'g': 4, 'L': 3, '—': 1, '5': 1, 'd': 8, ';': 1, 'R': 1, '.': 2, 'M': 1, 'h': 1, '1': 1, '0': 1, '?': 1}


Here, you see that we use two new **keywords** (mind the code highlighting), namely `try` and `except`. In terms of syntax, there is nothing new under the sun: this construction is very similar to the `if`/`else` that you already know. What does it do? In a way, Python tells you already. You instruct Python to `try` and execute the first indented block, `except` if something unexpected happens, in which case the second block should be executed.

This construction is perfect for our problem. By default, our script assumes that a certain character is already present in `d`, and it will `try` to increment its count. For a character that we encounter for the first time, however, an exception will be raised (the `KeyError` that we anticipated). In that case, the second block is executed (i.e. the initialization of a new key:value pair). This behaviour is perfectly equivalent to our earlier solution with `if/else`.

> *Question: in many cases, especially for longer `for`-loops, a `try/except` construction will be faster than the more straightforward `if/else` construction. Can you guess why?*
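If you want to test this claim on your own machine, here is a rough sketch using the standard `timeit` module. The function names and the repeated test string are our own choices. Timings vary across machines and Python versions, so treat the numbers as indicative only.

```python
import timeit


def count_if_else(text):
    d = {}
    for char in text:
        if char in d:
            d[char] += 1
        else:
            d[char] = 1
    return d


def count_try_except(text):
    d = {}
    for char in text:
        try:
            d[char] += 1
        except KeyError:   # only catch the exception we expect
            d[char] = 1
    return d


# a long test string: 1000 copies of the first line of the Aeneid
text = "Arma virumque cano, Troiae qui primus ab oris\n" * 1000

for func in (count_if_else, count_try_except):
    seconds = timeit.timeit(lambda: func(text), number=20)
    print(f"{func.__name__}: {seconds:.3f} s for 20 runs")
```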

The previous code block will "catch" any exception. That may seem convenient, but it is typically considered bad practice. We want to be more specific and only catch the errors that we truly anticipate. Otherwise, we might silently ignore genuinely serious problems in our code. Therefore, we want to be more explicit and name the actual exception, like so:

In [299]:
d = {}

for char in aeneid:
    try:
        d[char] += 1
    except KeyError:
        d[char] = 1

print(d)

{'A': 2, 'r': 25, 'm': 26, 'a': 42, ' ': 63, 'v': 9, 'i': 36, 'u': 35, 'q': 10, 'e': 51, 'c': 6, 'n': 21, 'o': 25, ',': 7, 'T': 1, 'p': 7, 's': 26, 'b': 7, '\n': 10, 'I': 2, 't': 30, 'l': 17, 'f': 3, 'g': 4, 'L': 3, '—': 1, '5': 1, 'd': 8, ';': 1, 'R': 1, '.': 2, 'M': 1, 'h': 1, '1': 1, '0': 1, '?': 1}


Here, we restrict the "emergency" solution to the one scenario that we actually anticipate, which is safer.
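To see why naming the exception matters, consider the following sketch. We deliberately introduce a typo (`cahr` instead of `char`) that raises a `NameError`. With a bare `except`, that error is swallowed, and the counts come out silently wrong. With `except KeyError`, the bug surfaces immediately.

```python
def count_bare(text):
    d = {}
    for char in text:
        try:
            d[cahr] += 1    # typo! `cahr` is not defined, so this raises NameError
        except:             # the bare except swallows the NameError too...
            d[char] = 1     # ...so every count is reset to 1
    return d


def count_specific(text):
    d = {}
    for char in text:
        try:
            d[cahr] += 1    # the same typo
        except KeyError:    # only KeyError is caught: the NameError gets through
            d[char] = 1
    return d


print(count_bare("abca"))   # {'a': 1, 'b': 1, 'c': 1}: wrong, but no error!
try:
    count_specific("abca")
except NameError as e:
    print("The bug surfaces:", e)
```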

> *Question: Exceptions come in many forms. Can you think of other exceptions that we've already encountered?*

An `IndexError` is another commonly encountered exception, which can be very inconvenient:

In [300]:
cnt = 0
while True:
    print(aeneid[cnt], end='')
    cnt += 1

Arma virumque cano, Troiae qui primus ab oris
Italiam fato profugus Lavinaque venit
litora—multum ille et terris iactatus et alto
vi superum, saevae memorem Iunonis ob iram,
5 multa quoque et bello passus, dum conderet urbem
inferretque deos Latio; genus unde Latinum
Albanique patres atque altae moenia Romae.
Musa, mihi causas memora, quo numine laeso
quidve dolens regina deum tot volvere casus
10 insignem pietate virum, tot adire labores
impulerit. tantaene animis caelestibus irae?

IndexError: string index out of range

> *Question: change the code block above and jump out of the `while` loop once we have printed all characters.*
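If you want to check your answer, here is one possible solution, sketched as a function. The name `print_all` is our own. It catches the `IndexError` and uses `break` to leave the loop. The function also returns the number of characters it printed, which makes it easy to test.

```python
def print_all(text):
    """Print `text` character by character, stopping cleanly at the end."""
    cnt = 0
    while True:
        try:
            char = text[cnt]
        except IndexError:
            break              # we indexed past the last character: leave the loop
        print(char, end='')
        cnt += 1
    print()                    # finish with a newline
    return cnt                 # number of characters printed


print_all("Arma virumque cano")
```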

Reading files is another context in which exceptions are often used. If you work with large corpora, a small number of files will often contain encoding errors, and you might want to skip these. Catching the relevant exception can save you a lot of trouble. That exception is `UnicodeDecodeError`, which Python raises when a file's bytes are not valid in the expected encoding. Do you understand this dummy example?

```python
import glob

filenames = glob.glob('corpus/*.txt')  # hypothetical folder of plain-text files

for filename in filenames:
    try:
        with open(filename, encoding='utf-8') as f:
            text = f.read()
        print(filename, 'correctly parsed!')
    except UnicodeDecodeError:
        pass # do you remember what this does?
```
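If you don't have a messy corpus at hand, you can trigger the same exception by decoding bytes that are not valid UTF-8. In this small sketch, the helper name `try_decode` is our own. The bytes `b'caf\xe9'` are the word *café* encoded as Latin-1, which is not valid UTF-8:

```python
def try_decode(raw):
    """Decode bytes as UTF-8, or return None if that fails."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return None   # signal "could not decode" instead of crashing


print(try_decode('café'.encode('utf-8')))   # café
print(try_decode(b'caf\xe9'))               # None
```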

As always, remember that errors are your friend in Python, and you should always pay close attention to what they are saying. They might be annoying, but you can put them to good use with `try/except`. And remember that exceptions protect you from something even worse: a program that silently carries on with wrong results.

## The Secret Counter

We've done the frequency dictionary exercise a couple of times by now. You might hate us for saying this, but there's a little secret that we haven't told you about before... Don't hate us, but we are about to tell you about one of the best-kept secrets in the Python universe:

In [301]:
from collections import Counter

cnt = Counter()
cnt.update(aeneid)

print(cnt)

Counter({' ': 63, 'e': 51, 'a': 42, 'i': 36, 'u': 35, 't': 30, 'm': 26, 's': 26, 'r': 25, 'o': 25, 'n': 21, 'l': 17, 'q': 10, '\n': 10, 'v': 9, 'd': 8, ',': 7, 'p': 7, 'b': 7, 'c': 6, 'g': 4, 'f': 3, 'L': 3, 'A': 2, 'I': 2, '.': 2, 'T': 1, '—': 1, '5': 1, ';': 1, 'R': 1, 'M': 1, 'h': 1, '1': 1, '0': 1, '?': 1})


Wow, that's amazing! This gives us everything we need, in just two lines of code. We only introduce it now, for two reasons:

1. It's crucial to understand how low-level Python dictionaries work
2. `Counter` is an object, which is the real topic of this class.

Note that we explicitly have to import `Counter` from the `collections` module in Python's Standard Library (which contains many other really useful tools). After that, we can **instantiate** or **initialize** a `Counter` through what is known as the **constructor function**.

In [302]:
counter = Counter() # constructor
print(type(counter))

<class 'collections.Counter'>


Our variable `counter` has a type that indicates that it's not one of the primitive data types that we covered so far, but a more specific kind of object. Using `help()`, we can find out what it has to offer.

In [303]:
help(Counter)

Help on class Counter in module collections:

class Counter(builtins.dict)
 |  Dict subclass for counting hashable items.  Sometimes called a bag
 |  or multiset.  Elements are stored as dictionary keys and their counts
 |  are stored as dictionary values.
 |  
 |  >>> c = Counter('abcdeabcdabcaba')  # count elements from a string
 |  
 |  >>> c.most_common(3)                # three most common elements
 |  [('a', 5), ('b', 4), ('c', 3)]
 |  >>> sorted(c)                       # list all unique elements
 |  ['a', 'b', 'c', 'd', 'e']
 |  >>> ''.join(sorted(c.elements()))   # list elements with repetitions
 |  'aaaaabbbbcccdde'
 |  >>> sum(c.values())                 # total of all counts
 |  15
 |  
 |  >>> c['a']                          # count of letter 'a'
 |  5
 |  >>> for elem in 'shazam':           # update counts from an iterable
 |  ...     c[elem] += 1                # by adding 1 to each element's count
 |  >>> c['a']                          # now there are seven 'a'
 |  7
 

Interestingly, the documentation tells us that `Counter` behaves like a standard `dict` (it is in fact a `dict` subclass), but one that is specialized in counting. Because it expects the values to be integers (that's what counts are!), it returns a default value of 0 whenever we look up a key that it doesn't contain:

In [304]:
print(counter['a'])
print(counter['b'])
print(counter['c'])

0
0
0


NOT. A. SINGLE. ERROR. GETS. THROWN. How cool is that? This explains why we *can* do the following:

In [305]:
for char in aeneid:
    counter[char] += 1

print(counter)

Counter({' ': 63, 'e': 51, 'a': 42, 'i': 36, 'u': 35, 't': 30, 'm': 26, 's': 26, 'r': 25, 'o': 25, 'n': 21, 'l': 17, 'q': 10, '\n': 10, 'v': 9, 'd': 8, ',': 7, 'p': 7, 'b': 7, 'c': 6, 'g': 4, 'f': 3, 'L': 3, 'A': 2, 'I': 2, '.': 2, 'T': 1, '—': 1, '5': 1, ';': 1, 'R': 1, 'M': 1, 'h': 1, '1': 1, '0': 1, '?': 1})
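Why does `counter[char] += 1` work for characters we haven't seen before? Here is a small sketch on a fresh `Counter`. We name it `c` so as not to overwrite `counter`:

```python
from collections import Counter

c = Counter()
print(c['zzz'])        # 0: a missing key gets a count of zero...
print('zzz' in c)      # False: ...but looking it up does not add the key

# `c['zzz'] += 1` is short for `c['zzz'] = c['zzz'] + 1`:
# the lookup on the right gives 0, so the value stored is 1
c['zzz'] += 1
print(c)               # Counter({'zzz': 1})
```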


Amazing... And that is not all: remember how cumbersome it is in Python to sort a dictionary by its values. Linguists need this all the time, because they often want frequency lists that show the most common items first. With `Counter`, this is really easy:

In [306]:
print(counter.most_common(3))

[(' ', 63), ('e', 51), ('a', 42)]
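For comparison, this sketch on a toy string shows roughly what `most_common(3)` saves you from writing with a plain `dict`. You sort the key-value pairs by their value, in reverse order, and slice off the top. `itemgetter(1)` from the `operator` module simply picks the second element (the count) of each pair.

```python
from collections import Counter
from operator import itemgetter

text = "abbccc"   # toy example

d = {}
for char in text:
    try:
        d[char] += 1
    except KeyError:
        d[char] = 1

top3 = sorted(d.items(), key=itemgetter(1), reverse=True)[:3]
print(top3)                          # [('c', 3), ('b', 2), ('a', 1)]
print(Counter(text).most_common(3))  # the same list, in a single call
```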


> *Questions*
> - Can you analyze what is being returned by this function?
> - What happens when you change the **argument value** `3`?
> - What happens when you don't specify an argument at all, when calling the function?

The `Counter` object is really cool. However, the main reason we introduce it here is that it is a typical example of an **object** in Python. Objects are the topic of the next section.

## Objects, Plato and Cookies

Python is what people call an **object-oriented** programming language, meaning that objects are key to the way you program in it. In fact, you have already worked a lot with objects in Python, because *everything* in Python is an object, including the most primitive stuff like integers or strings. To understand what objects are, it is useful to turn to Plato's Allegory of the Cave.

> *Question: Can you remember what Plato's Ideas were all about?*

<img src="https://faculty.washington.edu/smcohen/320/platoscave.gif"/>

Whenever I think of Plato, I always think of cookies:

<img src='https://images.kitchenstories.io/wagtailOriginalImages/A1050-lisa-final/A1050-lisa-final-large-landscape-150.jpg'/>

In Plato's philosophy, there is something like the "ideal chocolate chip cookie": the ideal or "mould" from which all cookies are copied. The ideal is very abstract and is never actually observed, because no copy is perfect. All actual cookies are slightly different from one another, and from the ideal.

Now, when working with objects in Python, I want you to think of baking cookies with Plato. Execute this line:

In [307]:
cnt = Counter()

Executing this line, or calling the constructor, is like baking a single cookie using the `Counter` mould. The constructor returns an actual copy/cookie, constructed/baked on the basis of the abstract `Counter` **template**.

<img src="https://i.etsystatic.com/22388311/r/il/32b3e5/2322918611/il_1588xN.2322918611_8o8l.jpg"/>

Just like with real cookies, it's perfectly possible to instantiate multiple, independent cookies from the same mould (the mould always stays the same!).

In [309]:
cnt1 = Counter()
cnt2 = Counter()
cnt3 = Counter()
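To convince yourself that these cookies really are independent, you can fill one of them and inspect the others, as in this small sketch:

```python
from collections import Counter

cnt1 = Counter()
cnt2 = Counter()

cnt1.update("arma")   # only the first cookie gets counts
print(cnt1)           # Counter({'a': 2, 'r': 1, 'm': 1})
print(cnt2)           # Counter()
print(cnt1 is cnt2)   # False: two distinct objects
```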

Do you see the similarity between `Counter` and a Platonic ideal, blueprint, or template? In Python, however, such an abstract template is called a **class**. When you ask for the `type()` of any object in Python, you're really asking: with which mould was this cookie made? What was the **class** that you used to instantiate this object?

In [310]:
type(cnt3)

collections.Counter
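You can ask the same question about the most primitive values. This supports the claim that *everything* in Python is an object. Here is a quick sketch:

```python
print(type(42))               # <class 'int'>
print(type('et'))             # <class 'str'>
print(type(['et', 'tu']))     # <class 'list'>

# even a plain integer is an object, with methods of its own
print(isinstance(42, object))   # True
print((42).bit_length())        # 6: number of bits needed to write 42 in binary
```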

You can even pass any **iterable** to the constructor, and the resulting object will automatically hold the correct counts:

In [311]:
freqs = Counter(aeneid)
print(freqs)

Counter({' ': 63, 'e': 51, 'a': 42, 'i': 36, 'u': 35, 't': 30, 'm': 26, 's': 26, 'r': 25, 'o': 25, 'n': 21, 'l': 17, 'q': 10, '\n': 10, 'v': 9, 'd': 8, ',': 7, 'p': 7, 'b': 7, 'c': 6, 'g': 4, 'f': 3, 'L': 3, 'A': 2, 'I': 2, '.': 2, 'T': 1, '—': 1, '5': 1, ';': 1, 'R': 1, 'M': 1, 'h': 1, '1': 1, '0': 1, '?': 1})


> *Question: With `Counter`, you can extract crucial frequency information with intuitive **one-liners**. Can you figure out what the following block does?* (We won't win a beauty contest with it, but still.) Unpack the one-liner, step by step, and start with the innermost part.

In [313]:
virgils_favorite = Counter(aeneid.split()).most_common(1)[0][0]
virgils_favorite

'et'
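If you get stuck, here is one way to unpack the one-liner from the inside out. It uses a small sample sentence of our own; swap in `aeneid` to reproduce the result above:

```python
from collections import Counter

text = "et tu et ille et nos"      # toy sample; replace with `aeneid`

words = text.split()               # 1. split on whitespace into a list of words
freqs = Counter(words)             # 2. count the words
top = freqs.most_common(1)         # 3. a list with only the most common (word, count) pair
pair = top[0]                      # 4. take that tuple out of the list
favorite = pair[0]                 # 5. keep only the word, drop the count

print(top)        # [('et', 3)]
print(favorite)   # et
```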

## Methods versus functions

In the rest of your coding life, you will continue to work with many objects. Some of these can be very abstract. In Machine Learning, for instance, you'll often work with an abstract "learner", like a neural network. You will create a concrete **instance** of such a learner using the constructor (`learner = Learner()`). By convention, class names (and therefore constructor calls) are usually capitalized, while the actual instances get lowercase names. Next, you will call functions on the learner to train it on example data (`learner.train(train_data)`). You can also use the trained model to get predictions for new, unseen data (`learner.predict(new_data)`).

Note the language here: we say that we call functions *on* an instance. We don't just pass the instance to any function. To appreciate this distinction, let's work through an example:

In [314]:
l = list(aeneid) # constructor
print(l[:20])
print(type(l))

['A', 'r', 'm', 'a', ' ', 'v', 'i', 'r', 'u', 'm', 'q', 'u', 'e', ' ', 'c', 'a', 'n', 'o', ',', ' ']
<class 'list'>


By now, you should better understand what we're doing here. We're creating an instance of the class `list` by calling the constructor function `list()`. We pass it an iterable (`aeneid`, a string) to initialize it.

To sort the list, we now have two options, which differ in subtle but important ways. The first option is to pass `l` to the general function `sorted`:

In [315]:
sorted_list = sorted(l)
print(sorted_list)

['\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', ' ', ' ', ' ', ..., ',', ',', ',', ',', ',', ',', ',', '.', '.', '0', '1', '5', ';', '?', 'A', 'A', 'I', 'I', 'L', 'L', 'L', 'M', 'R', 'T', 'a', 'a', 'a', ..., 'b', 'b', 'b', ...]

Thus, `sorted(l)` takes the original list and returns a sorted version of it. Crucially, what it returns is a sorted **copy** of the original `l`: `sorted` leaves the original `l` intact:

In [316]:
print(l[:20])

['A', 'r', 'm', 'a', ' ', 'v', 'i', 'r', 'u', 'm', 'q', 'u', 'e', ' ', 'c', 'a', 'n', 'o', ',', ' ']


Note that we can pass many other iterables (like strings) to `sorted()`. It's a general function, not tied specifically to lists.

The second alternative is to call `sort()` **on** the original list itself:

In [317]:
l.sort()

Nothing was printed above... That's normal. `sort()` (tacitly) sorts the original list on which it was called, but it *doesn't return anything afterwards*:

In [318]:
print(l.sort())
print(l)

None
['\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', ' ', ' ', ' ', ..., ',', ',', ',', ',', ',', ',', ',', '.', '.', '0', '1', '5', ';', '?', 'A', 'A', 'I', 'I', 'L', 'L', 'L', 'M', 'R', 'T', 'a', 'a', 'a', ..., 'b', 'b', 'b', ...]

This kind of sorting is called **in-place** sorting. It is possible because `sort()` isn't just any function. It is a function that belongs specifically to lists (i.e. it was part of the cookie mould that we used to construct the object). It has privileged access to the object, which is why it can sort the list directly. In Python, functions are called **methods** when they are tied to a particular class of objects, instead of just floating around. This is an important distinction:

- all methods are functions, but not all functions are methods
- methods can only be called on the objects of a specific class (note that we cannot call `.sort()` on a string)
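A quick sketch checks both points on a string (the example word is arbitrary). The general function `sorted()` happily accepts a string. Strings, however, have no `.sort()` method, since they cannot be changed in place.

```python
word = "cano"

letters = sorted(word)     # works on any iterable and returns a new list
print(letters)             # ['a', 'c', 'n', 'o']
print(''.join(letters))    # acno: glue the list back into a string

try:
    word.sort()            # str has no sort() method
except AttributeError as e:
    print('AttributeError:', e)
```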

## Home-Baked Cookies

We should stress that this last section isn't crucial in your learning trajectory at this stage. What we want to demonstrate is how you can fairly easily implement your own classes in Python. It will probably take a while before you start doing this yourself, simply because you won't feel the need: there's a whole range of classes out there already that you can just import from external packages. Nevertheless, working through a simple example will enhance your grasp of what "object orientation" really means.

As an example we will re-implement a smaller version of the `Counter` class. The first thing that we will need is a name for our class and a constructor function. The syntax for this is as follows:

In [319]:
class OurCounter:
    
    def __init__(self):
        pass

This is a super-minimalistic example, since we don't do more than:
- declaring a name for our class (`class OurCounter:`)
- declaring a constructor method

Note that the constructor looks like a normal function definition, but:
- it is indented to make clear that it is a **method** that is tied to the `OurCounter` class
- it takes a special, dedicated name (`__init__`) -- so that Python can properly identify it as the constructor
- it requires a special first parameter, conventionally called **self**: this is a reference to the object (the instance) itself
- it is currently a **stub**: note the use of the **pass** placeholder (which doesn't do anything)

Nevertheless, we can already use this class definition to initialize an instance, and check its type:

In [321]:
cnt = OurCounter()
print(type(cnt))

<class '__main__.OurCounter'>


Above, the documentation showed that `Counter` is a subclass of the conventional `dict` class. It inherits most of the functionality of a `dict`, but adds a lot of extra counting functionality. For our purposes, we will keep things simpler: rather than subclassing `dict`, we will give each `OurCounter` object its own dict. We can achieve that as follows:

In [322]:
class OurCounter:
    
    def __init__(self):
        self.d = {}

This is where `self` comes in handy (note that it is a naming convention, not a keyword). We use it to attach a dictionary to the object, as an **attribute** called `d`. That means that we can actually access the dict from an `OurCounter` instance, after we've created it:

In [323]:
cnt = OurCounter()
print(cnt.d)

{}
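`__init__` runs anew for every instance, so each `OurCounter` gets its own, independent dict. This is much like the independent `Counter` cookies earlier. Here is a quick sketch to check this:

```python
class OurCounter:

    def __init__(self):
        self.d = {}   # runs once per instance, so every instance gets a fresh dict


cnt1 = OurCounter()
cnt2 = OurCounter()
cnt1.d['a'] = 1           # change the dict of the first instance only
print(cnt1.d)             # {'a': 1}
print(cnt2.d)             # {}
print(cnt1.d is cnt2.d)   # False
```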


However, the goal of object orientation is **abstraction**: we want to make sure that the user doesn't have to care about the dictionary. Let us now create an `add` method that takes a key and the value (an `int`) by which the count for that key should be incremented.

In [324]:
class OurCounter:
    
    def __init__(self):
        self.d = {}
        
    def add(self, key, value=1):
        try:
            self.d[key] += value
        except KeyError:
            self.d[key] = value

There are several important aspects here to be discussed:
- we still need to add that weird `self` thingy as the first parameter of the method definition
- we assign a default value of 1 for the increment value
- we abstract over the default behaviour of a dictionary and take care of checking whether the key is already present in the dictionary (the user shouldn't have to care about this: this is the essence of **abstraction** in programming)
- inside the method, we access the `dict` associated with our object as `self.d`, because it's an attribute stored with the object itself

We can now use and test this method as follows:

In [325]:
cnt = OurCounter()
cnt.add('a', 1)
cnt.add('b', 3)
cnt.add('a', 5)
print(cnt.d)

{'a': 6, 'b': 3}
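What is `self`, really? When you call a method on an instance, Python passes that instance as the first argument. The sketch below repeats the class so that it runs on its own. It shows that `cnt.add('a', 1)` and `OurCounter.add(cnt, 'a', 1)` do the same thing:

```python
class OurCounter:

    def __init__(self):
        self.d = {}

    def add(self, key, value=1):
        try:
            self.d[key] += value
        except KeyError:
            self.d[key] = value


cnt = OurCounter()
cnt.add('a', 1)               # the usual way
OurCounter.add(cnt, 'a', 5)   # the same kind of call, spelled out: `cnt` is passed as `self`
print(cnt.d)                  # {'a': 6}
```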


Adding a `most_common` method uses the same logic:

In [326]:
from operator import itemgetter

class OurCounter:
    
    def __init__(self):
        self.d = {}
        
    def add(self, key, value=1):
        try:
            self.d[key] += value
        except KeyError:
            self.d[key] = value
            
    def most_common(self, topn=None):
        items = self.d.items()                   # extract key-value pairs
        items = sorted(items, key=itemgetter(1)) # sort by value
        items = items[::-1]                      # decreasing instead of increasing order
        return items[:topn]                      # chop off the top using topn argument

In [327]:
cnt = OurCounter()
cnt.add('a', 1)
cnt.add('b', 3)
cnt.add('c', 15)
cnt.add('d', 4)
cnt.add('e', 9)
cnt.most_common(3)

[('c', 15), ('e', 9), ('d', 4)]

Note how we hide all of the ugly code from the user! From here, the options are endless. For instance, we can adapt the constructor of our class to allow users to fill the dictionary immediately upon construction:

In [328]:
class OurCounter:
    
    def __init__(self, iterable=None):
        self.d = {}
        if iterable:
            for value in iterable:
                self.add(value)
        
    def add(self, key, value=1):
        try:
            self.d[key] += value
        except KeyError:
            self.d[key] = value
            
    def most_common(self, topn=None):
        items = self.d.items()                   # extract key-value pairs
        items = sorted(items, key=itemgetter(1)) # sort by value
        items = items[::-1]                      # decreasing instead of increasing order
        return items[:topn]                      # chop off the top using topn argument

This is probably a lot to take in, but notice how cool it is that inside the constructor, we can actually already use the methods that are defined further down (`self.add(value)`, with a default of `1` for the increment)! Now, we are getting really close to a full-blown counter:

In [330]:
cnt = OurCounter(aeneid)
cnt.most_common(5)

[(' ', 63), ('e', 51), ('a', 42), ('i', 36), ('u', 35)]

One final gimmick is that we can gain control over how our object gets printed. Note how the standard `Counter` prints its underlying `dict`:

In [331]:
cnt = Counter(aeneid) # standard counter!
print(cnt)

Counter({' ': 63, 'e': 51, 'a': 42, 'i': 36, 'u': 35, 't': 30, 'm': 26, 's': 26, 'r': 25, 'o': 25, 'n': 21, 'l': 17, 'q': 10, '\n': 10, 'v': 9, 'd': 8, ',': 7, 'p': 7, 'b': 7, 'c': 6, 'g': 4, 'f': 3, 'L': 3, 'A': 2, 'I': 2, '.': 2, 'T': 1, '—': 1, '5': 1, ';': 1, 'R': 1, 'M': 1, 'h': 1, '1': 1, '0': 1, '?': 1})


We can easily tweak our class definition to do the same, by **overriding** the standard string representation method (`__str__`) that all objects in Python have:

In [332]:
class OurCounter:
    
    def __init__(self, iterable=None):
        self.d = {}
        if iterable:
            for value in iterable:
                self.add(value, 1)
        
    def add(self, key, value=1):
        try:
            self.d[key] += value
        except KeyError:
            self.d[key] = value
            
    def most_common(self, topn=None):
        items = self.d.items()                   # extract key-value pairs
        items = sorted(items, key=itemgetter(1)) # sort by value
        items = items[::-1]                      # decreasing instead of increasing order
        return items[:topn]                      # chop off the top using topn argument
    
    def __str__(self):
        info = str(self.d)
        info = 'OurCounter(' + info +')'
        return info

Note the fancy double underscores surrounding the method name (`__str__(self)`). They indicate that we are overriding some pretty basic, **low-level** functionality of the Python object. Check out what happens when we now print our object:

In [333]:
cnt = OurCounter(aeneid)
print(cnt)

OurCounter({'A': 2, 'r': 25, 'm': 26, 'a': 42, ' ': 63, 'v': 9, 'i': 36, 'u': 35, 'q': 10, 'e': 51, 'c': 6, 'n': 21, 'o': 25, ',': 7, 'T': 1, 'p': 7, 's': 26, 'b': 7, '\n': 10, 'I': 2, 't': 30, 'l': 17, 'f': 3, 'g': 4, 'L': 3, '—': 1, '5': 1, 'd': 8, ';': 1, 'R': 1, '.': 2, 'M': 1, 'h': 1, '1': 1, '0': 1, '?': 1})
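You may have noticed one difference with the standard `Counter`. Our version prints its items in insertion order, whereas `Counter` lists the most frequent items first. The sketch below shows one way to get the same ordering: a small variation of `__str__` that reuses our own `most_common` method.

```python
from operator import itemgetter


class OurCounter:

    def __init__(self, iterable=None):
        self.d = {}
        if iterable:
            for value in iterable:
                self.add(value, 1)

    def add(self, key, value=1):
        try:
            self.d[key] += value
        except KeyError:
            self.d[key] = value

    def most_common(self, topn=None):
        items = sorted(self.d.items(), key=itemgetter(1))  # sort by value
        items = items[::-1]                                 # decreasing order
        return items[:topn]

    def __str__(self):
        # rebuild a dict from the (key, count) pairs, most frequent first
        ordered = dict(self.most_common())
        return 'OurCounter(' + str(ordered) + ')'


print(OurCounter('abbccc'))   # OurCounter({'c': 3, 'b': 2, 'a': 1})
```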


Finally, remember that this section is for illustration purposes only: we don't expect you to be able to define your own classes in the near future. (Just working with existing ones can be challenging enough!) Nevertheless, we do think that reading through this section might help you understand what a class does and what it's like to interact with such objects.

#### Exercise

Download a plain-text novel from Project Gutenberg. Work on your `Counter` skills and use an instance of the class to:
- make a character-level frequency dictionary: what are the three most frequent characters?
- make a word-level frequency dictionary: what are the three most frequent word tokens?
- make a list of `Counter` objects, containing a character-level frequency dictionary for each line.