# Classes and Objects in Python

I've been avoiding talking about this for a while.  Classes and object-orientation aren't actually needed to write good code, and they introduce a lot of really abstract ideas that are kind of hard to explain concisely without leaving something out.  But, PyTorch is built around classes, so we need to do a quick tour of them.

Python is a weird language in a lot of ways.  It's a _multi-paradigm_ language, meaning you can write code basically however you want, but the implementation of the language is extremely _object-oriented._  This means that the language is built around things called "objects" and their methods.

We've already seen lots of methods:

In [1]:
print("blah blah blah".upper())

x = [1,2,3]
print(x)
x.append(4)
print(x)

BLAH BLAH BLAH
[1, 2, 3]
[1, 2, 3, 4]


And so on. We've also talked about types:

In [2]:
print(type("blah blah"))
print(type([1,2,3]))

<class 'str'>
<class 'list'>


But now, let's see how these actually work.

First, a definition.  An _object_, in programming terms, is just a fancy word for "thing."  It's the most abstract and general kind of thing that a thing can be.  (Unless you're working in one of those cool functional programming languages like Haskell, or a systems programming language like C.)  In Python, an object does two main things:
1. It stores data inside it.
2. It stores some functions inside it (these are the methods).

`class`es are most often used for a few things:
1. When you have _mutable data_, and functions that only make sense on that kind of data, and which usually need to make changes to that data.
2. (Specific to Python) Overriding some special methods to change how, e.g., addition works for your class.
3. Writing functions that need to _only_ work on certain kinds of data.  E.g.: if you create a class for a 6-sided die, you might write a method `roll()`, so you can say `die.roll()`.  But it doesn't make sense to `roll()`, say, a list.
4. Writing functions that have the same name, but do different things depending on the class they're attached to.
    - This is how scikit-learn uses them.  Every model's class has a `.fit()` method; each class defines this totally differently, but it does the same thing, conceptually.
    - This relates to a broader design pattern called _multiple dispatch_: the same function name behaves differently depending on the types of its arguments.  Methods implement a special case--_single dispatch_--where only the type of the first argument (the object the method is called on) decides which definition runs.
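
As a rough sketch of points 3 and 4 above: the `Die` and `Coin` classes below are made up for illustration (they aren't from any library).  Both define a `roll()` method with the same name but different behavior, and a plain list has no such method at all.

```python
import random


class Die:
    """A hypothetical six-sided die, invented for this example."""

    def __init__(self, sides=6):
        self.sides = sides

    def roll(self):
        # Returns a random integer between 1 and self.sides, inclusive
        return random.randint(1, self.sides)


class Coin:
    """A hypothetical coin: same method name, different behavior."""

    def roll(self):
        # "Rolling" a coin returns a string instead of a number
        return random.choice(["heads", "tails"])


# The same call works on both, and each class decides what it means
for thing in [Die(), Coin()]:
    print(type(thing).__name__, "->", thing.roll())

# A list has no roll() method, so this fails with an AttributeError
try:
    [1, 2, 3].roll()
except AttributeError as err:
    print("AttributeError:", err)
```

Which `roll()` runs is decided purely by the type of the object in front of the dot; that's the single dispatch idea in miniature.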

Python calls its object types `class`es.  Let's build a simple one.  We'll make a simple counter that we can increment.

In [3]:
class Counter:
    def __init__(self, count=0):
        self.count = count
        
    def increment(self, amount=1):
        # updates the .count attribute in-place
        self.count += amount
        # returns the count attribute
        return self.count
        
my_counter = Counter()
print(my_counter.count)
print(my_counter.increment(10))
print(my_counter.count)

0
10
10


Breaking this down:

- `class Counter` starts an indented code block that contains all the definitions for our class.
- `def` statements, inside of `class` blocks, define methods.
- Method `def`initions' first argument (which is almost always `self`, but can be any name) is always interpreted as a reference to the current instance of the object.  Every method must have at least this one positional argument, and you can use it to make the method act on the current instance of the class.
- `self.count` is accessing an _attribute_ of the instance.  An attribute is basically just a variable stored inside the object.  Python lets you write `self.count = [something]` to directly set the value.
- `__init__(self, ...)` is a special method that _initializes_ the class instance.  It is not allowed to return anything (other than `None`).  Usually, this sets any attributes that need to be there from the get-go.
- `Counter()` creates a new _instance_ of the class, and calls the `__init__()` method to initialize it.
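
To make the `self` business a bit more concrete, here's a small sketch using the same `Counter` class (repeated so the example runs on its own).  It shows that `a.increment(5)` is really just shorthand for calling the function stored on the class with the instance as the first argument.

```python
class Counter:
    def __init__(self, count=0):
        self.count = count

    def increment(self, amount=1):
        self.count += amount
        return self.count


a = Counter()
b = Counter(100)

# The usual way: Python passes `a` in as `self` automatically
a.increment(5)

# The same call spelled out: look the function up on the class
# and pass the instance explicitly as the first argument
Counter.increment(a, 5)

print(a.count)  # 10: a was incremented twice
print(b.count)  # 100: b is a separate instance and is untouched

# Attributes live on the instance and can be set directly
b.count = 0
print(vars(b))  # {'count': 0}: the instance's attributes as a dict
```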

`class`es are weird the first few times you see them, but they start to make sense pretty quickly.

The biggest reason to use classes in Python is the special "dunder" ("**d**ouble **under**score") methods.  There are a lot of these that every object has.  Any method whose name starts and ends with two underscores--like `__init__()`--is one of these special dunder methods.  There isn't actually anything special, code-wise, about them; the double underscores are just a naming convention to signal "hey, this is a special method; Python will automatically look for a method with this name when you ask it to do certain things."

Some common dunder methods that are available to all classes:
- `__init__()`, which we already saw.
- `__eq__()`, which defines how to check equality between an instance of this class and another object.
- `__add__()`, `__sub__()`, and others: define how the math infix operators work.
- `__str__()`, `__repr__()`: define how the `str()` and `repr()` functions work.
- `__iter__()`: defines how the object behaves when you iterate through it, e.g., when you write `for i in my_class_instance:`.  It's not super common to see this overridden.

Here's an example of messing with the above Counter class's special dunder methods.

In [4]:
class Counter:
    def __init__(self, count=0):
        self.count = count
        
    def increment(self, amount=1):
        # updates the .count attribute in-place
        self.count += amount
        # returns the count attribute
        return self.count
    
    def __add__(self, other):
        self.count += other
        return self
    
    def __str__(self):
        return f"The counter's current value is {self.count:,}."
        
my_counter = Counter()
print(my_counter) # print() implicitly calls str(), which calls the .__str__() method
my_counter = my_counter + 5 # the + operator calls the .__add__() method of the *left* argument.
print(my_counter)

The counter's current value is 0.
The counter's current value is 5.
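
The other dunder methods listed earlier work the same way.  As a sketch (this stripped-down variant of `Counter` is just for illustration, and what it means to "iterate over a counter" is an arbitrary choice I made up for the example):

```python
class Counter:
    def __init__(self, count=0):
        self.count = count

    def __eq__(self, other):
        # Two counters are equal when their counts match.
        # Returning NotImplemented lets Python fall back to its default
        # behavior for types we don't know how to compare against.
        if not isinstance(other, Counter):
            return NotImplemented
        return self.count == other.count

    def __repr__(self):
        # An unambiguous, developer-facing representation
        return f"Counter(count={self.count})"

    def __iter__(self):
        # Iterating over a counter yields 0, 1, ..., count - 1
        return iter(range(self.count))


print(Counter(3) == Counter(3))  # True: == calls __eq__
print(Counter(3) == 3)           # False: an int is not a Counter
print(repr(Counter(3)))          # Counter(count=3)
print(list(Counter(3)))          # [0, 1, 2]: list() uses __iter__
```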


In Python, _everything_ in the language is implemented using the class interface. Some things--like the built-in data types--aren't _actually_ implemented using `class` statements, like above, but they do present all the same methods that are common to all objects.  So, for all intents and purposes, _types are classes_ in Python.
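
A quick way to convince yourself of this is to poke at the built-in types directly.  This is only a sketch (the `Empty` class is a throwaway name for the example), but everything in it is plain Python:

```python
# Operators on built-in types are dunder method calls under the hood
print((1).__add__(2))              # 3, same as 1 + 2
print("abc".__len__())             # 3, same as len("abc")
print([1, 2, 3].__contains__(2))   # True, same as 2 in [1, 2, 3]


# Built-in types are instances of `type`, just like a class
# defined with a `class` statement
class Empty:
    pass


print(isinstance(int, type), isinstance(Empty, type))  # True True

# Every value is an instance of `object`, the root of the class hierarchy
print(all(isinstance(v, object) for v in [1, "a", [], None, print]))  # True
```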

Other things that are objects: functions (including lambda functions).  Imported modules.  Numbers.  Exceptions (error messages).

As a side note, the fact that _functions are objects_ is why we can pass functions as arguments to other functions.  Which is a bit of a weird bit of logical acrobatics: we can do functional programming because Python is so obsessively object-oriented!  (If that doesn't make sense, don't worry).
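
Here's a minimal sketch of what "functions are objects" looks like in practice.  The names `shout` and `apply` are hypothetical helpers made up for this example.

```python
import math


def shout(text):
    return text.upper() + "!"


def apply(func, value):
    # `func` is just another argument; it happens to be callable
    return func(value)


print(type(shout))                        # <class 'function'>
print(shout.__name__)                     # shout: functions carry attributes too
print(apply(shout, "hello"))              # HELLO!
print(apply(lambda s: s[::-1], "hello"))  # olleh

# Modules and exceptions are objects as well
print(type(math))                                 # <class 'module'>
print(isinstance(ValueError("oops"), Exception))  # True
```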

# Inheritance

Every `class` statement is basically just a template.  When you create an instance of that class, you make a copy of the template, and "fill in" a bunch of parts.  You can think of this as telling Python "give me a [whatever the class is called], _but_ with these few values changed."

You can tell Python to do something similar with the `class` definitions themselves: "give me a class definition that looks like `Counter`, but with this one method (or maybe more than one method) defined differently."  This is called _inheritance:_ when you tell Python to do this, the new class you're creating _inherits_ all the properties of whatever the starting/base/parent class is, and you choose what is different.  Sometimes this is also called _subclassing._  Let's see a quick example.  We'll subclass our `Counter` class to make a `BackwardsCounter`, which will count down instead of up.

In [5]:
class BackwardsCounter(Counter):      
    def __add__(self, other):
        self.count -= other
        return self
    
my_backwards_counter = BackwardsCounter()
my_backwards_counter = my_backwards_counter + 10
print(my_backwards_counter)

The counter's current value is -10.


The `(Counter)` part of the `class` definition tells Python to essentially make a copy of the `Counter` class definition, and to use all the methods/attributes defined there for `BackwardsCounter`, _unless_ you re-define some method or attribute here.  If you re-define something, use the new definition instead.  So in this example, the `__init__()`, `increment()`, and `__str__()` methods are all the same as for `Counter`; we've just changed what `__add__()` does.

Or in other words: we've told Python "make a new class.  It's exactly like `Counter`, except that `__add__()` works differently."
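
A subclass doesn't have to replace a method wholesale; it can also build on the parent's version using `super()`.  The sketch below assumes a pared-down `Counter` (repeated so it runs on its own) and a hypothetical `LoudCounter` subclass invented for illustration.

```python
class Counter:
    def __init__(self, count=0):
        self.count = count

    def increment(self, amount=1):
        self.count += amount
        return self.count


class LoudCounter(Counter):
    """A hypothetical subclass that extends, rather than replaces, a method."""

    def increment(self, amount=1):
        # super() looks the method up on the parent class, so we can
        # reuse Counter's logic and add behavior around it
        new_count = super().increment(amount)
        print(f"Count is now {new_count}!")
        return new_count


loud = LoudCounter()
loud.increment(3)  # prints "Count is now 3!"

# An instance of the subclass also counts as an instance of the parent
print(isinstance(loud, Counter))         # True
print(issubclass(LoudCounter, Counter))  # True

# The order Python searches when looking up a method
print([cls.__name__ for cls in LoudCounter.__mro__])
# ['LoudCounter', 'Counter', 'object']
```

That last line is worth a look: when you call a method, Python walks that list from left to right and uses the first definition it finds.  This is why a re-defined method in the subclass "wins" over the parent's.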

# Some musings on object-oriented design patterns

Classes are not the same as object-oriented design.  Object-oriented design has rapidly fallen out of fashion in recent years, after getting extremely popular with the advent of C++ and Java (and Modula and Smalltalk, even earlier).  I will occasionally write and use classes as an organizational tool in my code, but increasingly, I find myself not using them unless I need to override one of the special dunder methods.  Everything you can do with a class can be done just as well with some functions, and that is often easier to read (especially for non-trivial programs), and easier to re-use.

Here's the above counter logic, re-written using just functions:

In [6]:
def increment(count, amount=1):
    return count + amount

def print_count(count):
    print(f"The counter's current value is {count:,}.")
    
count = 0
count = count + 10
count = increment(count, 5)
print_count(count)

The counter's current value is 15.


I strongly prefer this sort of programming style.  It does have two major drawbacks, though, that are worth being aware of.

First, Python doesn't have any especially elegant ways to force a function to only accept arguments of a single type.  In the above code, `increment()` will try to run `count + amount` even if the types are incompatible.  And if they are compatible, but they're not numbers, you might get unexpected results (e.g.: passing two strings will lead to concatenation).  Other languages with more explicit type systems have ways to do this, e.g., in C:
```c
#include <stdio.h>

// This is roughly the C equivalent of a `class` statement
// in Python
typedef struct Counters {
    int count;
} Counter;

// the compiler will reject any call that passes this function
// something other than a value of the `Counter` type, even if it has
// a .count field, and `c.count + 1` would be a valid operation.
Counter increment(Counter c, int amount) {
    c.count = c.count + amount;
    return c;
}

int main(void) {
    Counter c;
    c.count = 10;
    // structs are passed by value, so keep the returned copy
    c = increment(c, 5);
    
    printf("The counter's value is %i.\n", c.count);
    
    return 0;
}
```

Or Julia:
```julia
mutable struct Counter
    count::Integer
end

function increment(counter::Counter, amount::Integer = 1)
    counter.count += amount
    return counter
end

c = Counter(0)
increment(c)
println("The counter's value is $(c.count).")
```

Python does have _a_ way to do this:
```python
def increment(c: Counter, amount: int) -> Counter:
    # type mismatches are conventionally reported with TypeError
    if not isinstance(c, Counter):
        raise TypeError(f"Expected Counter, got {type(c)}.")
    if not isinstance(amount, int):
        raise TypeError(f"Expected int, got {type(amount)}.")
    c.count += amount
    return c
```

Note that the type annotations in the function definition are _type hints,_ and are not enforced by Python when running code.  They're hints to people reading your code about what the intended types are.  The `raise TypeError` lines are what actually cause the program to stop executing and throw an error message.
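
To see that the hints really aren't enforced at runtime, here's a small sketch using the plain-function `increment()` from earlier with hints added and no `isinstance` checks:

```python
def increment(count: int, amount: int = 1) -> int:
    # No runtime check happens here; the hints are only metadata
    return count + amount


print(increment(1, 2))      # 3
print(increment("a", "b"))  # ab: strings are accepted without complaint

# The hints are stored on the function object, but nothing checks them
print(increment.__annotations__)
```

A separate static checker such as mypy would flag the second call before the program runs, but that's an extra tool you have to opt into; the Python interpreter itself doesn't care.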

If you want to _guarantee_ that the `increment` function can never be called on anything that isn't a `Counter`, the above code would work.  But putting `.increment()` as a method on `Counter` objects provides a much stronger guarantee, with a lot less code overhead and input validation.  However, this isn't really a concern when you're writing your own programs.

The second downside to functions versus methods is the whole single dispatch thing.  If we wanted to implement something like scikit-learn's models using purely functions, we'd need to do something like:

```python
def fit(model, x, y):
    if isinstance(model, LinearSVM):
        return fit_linear_svm(model, x, y)
    elif isinstance(model, RandomForest):
        return fit_random_forest(model, x, y)
    # and so on
```

Or, rather than the one big `fit()` function, maybe we just require the end user to swap out `fit_linear_svm()` for `fit_random_forest()` themselves.  It probably isn't too much more work, but it does make the code a bit harder to swap out.  We probably need to pass our fit parameters to that function, whereas with the current way scikit-learn is written, we can define our models all in one place, then pass them around interchangeably after creating them.  That's harder to do in Python if everything is done through plain functions.  But, that isn't the case in other languages, again, like Julia:

```julia
mutable struct LinearSVM
    # parameters go here
end

mutable struct RandomForest
    # parameters go here
end

function fit(model::LinearSVM)
    # do stuff to fit a linear SVM
end

function fit(model::RandomForest)
    # do stuff to fit a random forest
end

svm = LinearSVM()
fit(svm) # fits the SVM

rf = RandomForest()
fit(rf) # fits the random forest
```

Julia does this through a very pervasive multiple dispatch scheme.  Python doesn't have that, so the above code in Python would look like it does in scikit-learn:

```python
class LinearSVM:
    # __init__, etc methods get defined
    def fit(self, x, y):
        # do the SVM fitting
        ...

class RandomForest:
    # __init__, etc methods get defined
    def fit(self, x, y):
        # do the random forest fitting
        ...

# placeholder data, just so the example runs
x, y = [[0.0], [1.0]], [0, 1]

svm = LinearSVM()
svm.fit(x, y)

rf = RandomForest()
rf.fit(x, y)
```
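
For what it's worth, the standard library does offer a function-based version of _single_ dispatch, `functools.singledispatch`.  It only looks at the type of the first argument, so it's nowhere near as pervasive as what Julia does, but it gets you closer to the Julia style than the big `isinstance` chain.  A minimal sketch, with empty stand-in classes (these are not the scikit-learn classes) and placeholder return values:

```python
from functools import singledispatch


# Hypothetical stand-ins for model types; not scikit-learn classes
class LinearSVM:
    pass


class RandomForest:
    pass


@singledispatch
def fit(model, x, y):
    # Fallback for any type without a registered implementation
    raise NotImplementedError(f"Don't know how to fit {type(model).__name__}")


@fit.register
def _(model: LinearSVM, x, y):
    # the real SVM fitting would go here
    return "fit a linear SVM"


@fit.register
def _(model: RandomForest, x, y):
    # the real random forest fitting would go here
    return "fit a random forest"


x, y = [[0.0], [1.0]], [0, 1]  # placeholder data
print(fit(LinearSVM(), x, y))     # fit a linear SVM
print(fit(RandomForest(), x, y))  # fit a random forest
```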

However, all that being said: the number of times you will _need_ to write classes in your own code is pretty small (unless you're using PyTorch).  But they're worth learning, because sometimes they can be very useful.

As a final note, I'm not going to talk about object-oriented design paradigms for writing programs.  I think they're dumb, most of the time, and just add unnecessary complexity.  Avoid writing classes unless you have a specific need to, and even then, it's generally better to operate on those classes with functions just as often as methods.