In [None]:
%matplotlib inline
import matplotlib
import seaborn as sns
sns.set()
matplotlib.rcParams['figure.dpi'] = 144

In [None]:
import expectexception

# Object-oriented programming


There are two concepts at the core of object-oriented programming: classes and objects.

A **class** is a recipe for creating an object. It is like a template or set of instructions for creating multiple versions of similar things.

An **object** is an instance of a class. That is, it is the result of following a class's recipe. 

Often your programs will have multiple object instances for a given class. These are all distinct and different, but they all follow the same recipe.

We've actually encountered classes and objects quite a bit already. Consider the concepts of lists and dictionaries.

In [None]:
dict1 = {'cat':1, 'dog': 2, 'human': 3}
dict2 = {1: 1, 2: 4, 3: 9, 4: 16}

list1 = [0, 80, 20, 60, 50, 40, 30, 70, 10, 90]
list2 = ['milk', 'eggs', 'cheese', 'bread']

We've just created two instances of the dictionary (`dict`) class and two instances of the list (`list`) class.

We can examine these with the `type` function to show that they are in fact `dict`s and `list`s:

In [None]:
print 'dict1:', type(dict1)
print 'dict2:', type(dict2)
print 'list1:', type(list1)
print 'list2:', type(list2)

We can examine these with the `id` function to show that they point to different locations in memory and are therefore different objects:

In [None]:
print 'dict1:', id(dict1)
print 'dict2:', id(dict2)
print 'list1:', id(list1)
print 'list2:', id(list2)

The dictionaries `dict1` and `dict2` are two separate dictionaries storing different data, and yet they have similar properties and behave in a similar manner. They both offer the same functionality for manipulating the data.

The lists `list1` and `list2` are two separate lists, but also have similar properties and behaviors.

In [None]:
print dict1.keys()
print dict2.keys()

print list1.index(70)
print list2.index('bread')

Lists and dictionaries are called *built-in* classes in that Python provides these to you by default. What we're going to learn about in this lesson is how to create our own classes to create multiple instances of objects with similar behaviors.

## Everything is an Object


In your future Python studies you will sometimes hear the phrase "Everything is an Object." What does this mean?

First we must define what an object is. An object is the encapsulation or grouping of related data and/or methods (functions) into a self-contained thing (object) that exists in memory. Objects can be created, modified, passed around, copied, and destroyed.

In Python, all of the *things* you work with are in fact objects, each with associated data and methods. Even *things* you might not expect, like functions. This might seem odd at first, but it actually creates an opportunity. It means that anything that is an object can be passed around as a parameter or assigned to a variable, just like any other object.

The methods associated with an object are methods that are appropriate for acting upon the object's data. Consider a `list` object's methods.  They are a kind of natural collection of choices or options available to you to modify the list object. One of them is the `sort` method.

In [None]:
list1.sort()
list1

This sorts the `list` object in-place, in that it sorts the original `list` object. It does not create a new `list` object with sorted data and leave the original `list` unaffected.

I also could have sorted the `list` like this:

In [None]:
print sorted(list2)
print list2

Python comes with a standalone `sorted` function that can sort any iterable. It leaves the original `list` unaffected while creating a new sorted `list` object.

Do you see the difference between the two approaches to sorting a list? One uses a method that is associated with the object; the other uses a free-standing function that is not associated with the list.

It happens that `dict` objects do not have a `sort` method like `list`s do. It doesn't fully make sense to sort a dictionary, because it is an unordered structure mapping keys to values. You can probably remember that a `dict` does not have a `sort` method, but what would Python be like if no object had methods? You'd have to remember which functions were appropriate for acting upon which objects and would probably make a lot of mistakes. It would get confusing!
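
The following sketch illustrates the point with a made-up `prices` dictionary (the variable names are ours, chosen for illustration). The dictionary has no `sort` method, but the free-standing `sorted` function still accepts it, because iterating over a dictionary yields its keys.

```python
prices = {'milk': 3, 'eggs': 2, 'bread': 4}

# Dictionaries do not provide a sort method.
has_sort = hasattr(prices, 'sort')

# sorted() accepts any iterable; iterating over a dict yields its keys.
sorted_keys = sorted(prices)

# To order by value instead, sort the (key, value) pairs with a key function.
by_price = sorted(prices.items(), key=lambda item: item[1])

print(has_sort)
print(sorted_keys)
print(by_price)
```

In both cases the result is a new `list`; the dictionary itself is left untouched.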

## Defining a Python Class


Let's start by defining a simple class:

In [None]:
class Point(object):
    pass

This defines a class called `Point`. It could be called a dummy class, in that it doesn't really do anything. As we will see later, most classes have more definitions inside of them. Here we have none, so we must use Python's `pass` statement. The `pass` statement is just a syntactic placeholder, indicating that nothing needs to be done. (`object` is not a keyword but a built-in class. Inheriting from it makes `Point` a "new-style" class in Python 2, a distinction that you shouldn't worry about.)

What is `Point`? It's a class, which is a kind of *type* in Python.

In [None]:
Point, type(Point)

To make objects of class `Point`, we *instantiate* the class.  In Python, we do that by calling the class. Observe the parentheses after `Point`.

In [None]:
pt = Point()

Now `pt` is an instance of the `Point` class.

In [None]:
pt

There are two ways to programmatically verify this:

In [None]:
print type(pt) is Point
print isinstance(pt, Point)

We can make more `Point`s by instantiating the class again. This results in the creation of different objects.

In [None]:
pt2 = Point()
pt3 = Point()

We can use the Python built-in function `id` to show that the variables refer to different memory locations.

In [None]:
print id(pt)
print id(pt2)
print id(pt3)

## Adding Attributes and Methods

Objects can have data associated with them.  These are called **attributes**, and in Python they are indicated with the `object.attribute` syntax.  Python allows arbitrary attributes to be added to objects on the fly.

In [None]:
pt.banana = 5
print pt.banana

Now our point has 5 bananas.

Note that this does not affect other objects of the same class. The `banana` attribute was added to only that one instance of the `Point` class.

In [None]:
%%expect_exception AttributeError

print pt2.banana

Generally, we don't create attributes on an object-by-object basis. Instead, we create attributes in the class definition, so that we can guarantee that they will be there in all of our objects/instances of that class.

Let's add functionality to our Point class. We want Point to allow us to create objects similar to mathematical points on a Cartesian grid. For now we'll work with just two dimensions, $x$ and $y$.

In [None]:
class Point(object):
    """Cartesian point class capable of vector arithmetic."""
    
    def __init__(self, x, y):
        self.x = x
        self.y = y

When functions are defined within a class, these become **methods** of the class.  These methods are called with an `object.method()` syntax, parallel to the attributes. Let's add a method so we can add two points together using vector addition.

In [None]:
class Point(object):
    """Cartesian point class capable of vector arithmetic."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def add(self, other):
        """Vector addition of Points, return new Point."""
        return Point(
            self.x + other.x,
            self.y + other.y
        )

Question: Why don't we `return self.x + other.x, self.y + other.y`? What other things could you imagine our `add` method doing?

In [None]:
pt1 = Point(3, 5)
pt2 = Point(8, -3)
pt3 = pt1.add(pt2)

For the moment, let's consider just the call of `add`.  Why is it defined with two arguments (`self` and `other`), but called with just one?  The first argument of a method is always the object itself.  By convention, we call this argument `self`.  (`self` is actually *not* a keyword in Python.  But the convention is so strong, you'll almost never see another name used there!)  When you call `object.method(argument)`, behind the scenes, Python changes that to `Class.method(object, argument)`.  You can do that yourself, if you like:

In [None]:
pt4 = Point.add(pt1, pt2)

print pt3.x, pt3.y
print pt4.x, pt4.y
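
The same equivalence holds for built-in classes. As a small sketch (the variable names are ours), calling a method through the class and passing the object explicitly gives the same result as the usual `object.method()` call:

```python
word = "python"
via_instance = word.upper()     # the usual object.method() form
via_class = str.upper(word)     # the equivalent Class.method(object) form

numbers = [3, 1, 2]
list.sort(numbers)              # same effect as numbers.sort()

print(via_instance == via_class)
print(numbers)
```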

A common programming mistake is to forget the `self` argument. If we had defined the `Point` class like this:

    class Point(object):
        """Cartesian point class capable of vector arithmetic."""
        
        def __init__(self, x, y):
            self.x = x
            self.y = y

        def add(other):
            """Vector addition of Points, return new Point."""
            return Point(
                self.x + other.x,
                self.y + other.y
            )
        
We would have seen this error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-39-53c0be79a087> in <module>()
          1 pt1 = Point(3, 5)
          2 pt2 = Point(8, -3)
    ----> 3 pt1.add(pt2)

    TypeError: add() takes exactly 1 argument (2 given)
    
Any time you see a `TypeError` for a method call stating that the number of given arguments is one more than the number of arguments the method expects, check for this mistake.
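
If you would like to reproduce the mistake safely, the sketch below catches the exception instead of letting it stop the program. `BrokenPoint` is a hypothetical name, used so that the working `Point` class is not overwritten. The exact wording of the error message differs between Python versions, so the code only stores the message without assuming its text.

```python
class BrokenPoint(object):
    """Point-like class whose add method is missing the self argument."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def add(other):      # bug: should be add(self, other)
        return other

p = BrokenPoint(3, 5)
q = BrokenPoint(8, -3)

try:
    p.add(q)             # Python passes p AND q, but add accepts only one
    raised = False
except TypeError as err:
    raised = True
    message = str(err)   # wording varies between Python versions

print(raised)
print(message)
```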

Earlier we printed out `pt3.x, pt3.y`, which seems like a strange thing to do. If we had `pets = ['cat', 'dog', 'fish']` we wouldn't do `print pets[0], pets[1], pets[2]`, we would just `print pets`.

In [None]:
print pt3

What happened? Why didn't `print` show us meaningful output?

We need to define how Python should represent instances of our class (e.g. when we `print` them). This is done with a special method called `__repr__`. By adding a `__repr__` method, we tell Python how to use an object's attributes to represent that object to the user.

In [None]:
class Point(object):
    """Cartesian point class capable of vector arithmetic."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def add(self, other):
        """Vector addition of Points, return new Point."""
        return Point(
            self.x + other.x,
            self.y + other.y
        )

    def __repr__(self):
        return "Point({}, {})".format(self.x, self.y)

In [None]:
pt1 = Point(3, 5)
pt2 = Point(8, -3)
pt3 = pt1.add(pt2)
print pt3

Methods whose names start and end with two underscores (called "dunder" methods) are often special in Python.  Another example, `__init__()`, is an initializer method that is called right after the object is created.

Any arguments passed to the class at instantiation (e.g. 3 and 5) get passed on to the initializer.  The first argument of `__init__()` is `self`, and then we store `x` and `y` as attributes, so that any of the object's methods can reference them later.

Unlike many other object-oriented languages, Python doesn't have a concept of public or private attributes.  All are accessible to everyone.

In [None]:
pt1.y = 2
pt1

Python does have a convention, though, that addresses this issue.  Attributes and methods that start with an underscore are considered "internal" to the class.  There's nothing stopping you from accessing them, but there's no guarantee that they'll remain the same in new versions.  If you use them and your code breaks in the future, it's your own fault.
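
Here is a minimal sketch of that convention, using a hypothetical `Account` class that is not part of this lesson's running example. The balance is stored in an "internal" attribute, and the class offers ordinary methods as the supported way to work with it.

```python
class Account(object):
    """Toy bank account that keeps its balance in an internal attribute."""

    def __init__(self, balance):
        self._balance = balance      # leading underscore: internal by convention

    def deposit(self, amount):
        self._balance += amount

    def get_balance(self):
        return self._balance

acct = Account(100)
acct.deposit(50)

public_view = acct.get_balance()     # the supported way to read the balance
private_peek = acct._balance         # works, but relies on an internal detail

print(public_view)
print(private_peek)
```

Both lines print the same number. The difference is only in what the author of `Account` has promised to keep stable.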

Writing classes from scratch is often an extensive project. For example, let's try a few simple expressions to illustrate the number of features we still need to implement.

In [None]:
%%expect_exception TypeError

Point(3, 8) + Point(1, -3)

In [None]:
Point(3, 8) == Point(3, 8)

In [None]:
%%expect_exception TypeError

5 * Point(1, -2)

Let's fill in this missing functionality and other methods we might want a Point to have.

In [None]:
class Point(object):
    """Cartesian point class capable of vector arithmetic."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def add(self, other):
        """Vector addition of Points, return new Point."""
        return Point(
            self.x + other.x,
            self.y + other.y
        )

    def __add__(self, other):
        return None

    def __mul__(self, const):
        return None

    def __repr__(self):
        return "Point({}, {})".format(self.x, self.y)

    def __eq__(self, other):
        return None

    def norm(self):
        """Calculate distance from Point to origin."""
        return None
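
The stubs above are left for you to fill in. For comparison, here is one possible way to do it, offered as a sketch rather than the definitive answer. Note one addition that is not in the skeleton: `5 * Point(1, -2)` puts the number on the left, so Python looks for a `__rmul__` method, which we simply alias to `__mul__`.

```python
class Point(object):
    """Cartesian point class capable of vector arithmetic."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        """Vector addition, used for Point + Point."""
        return Point(self.x + other.x, self.y + other.y)

    def __mul__(self, const):
        """Scalar multiplication, used for Point * number."""
        return Point(self.x * const, self.y * const)

    # Our addition: lets number * Point work as well.
    __rmul__ = __mul__

    def __repr__(self):
        return "Point({}, {})".format(self.x, self.y)

    def __eq__(self, other):
        """Two Points are equal when both coordinates match."""
        return self.x == other.x and self.y == other.y

    def norm(self):
        """Calculate distance from Point to origin."""
        return (self.x ** 2 + self.y ** 2) ** 0.5

print(Point(3, 8) + Point(1, -3))
print(Point(3, 8) == Point(3, 8))
print(5 * Point(1, -2))
print(Point(3, 4).norm())
```

The three expressions that failed or gave a surprising answer earlier now behave the way a mathematical point would.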

## Inheritance


Many object-oriented languages have the concept of inheritance, and Python is no exception.  When one class **inherits** from another, it has all of the methods and attributes of the class it inherits from, plus whatever it adds itself.

We have developed a `Point` class that implements some simple vector arithmetic on a 2D Cartesian grid, a special case of a more general class that implements vector arithmetic on an n-dimensional Cartesian grid. If we defined a `Vector` class for this more general case, then we could easily define a `Point` class that inherits its functionality from `Vector`.

In [None]:
class Vector(object):

    def __init__(self, *args):
        if all(isinstance(arg, (int, float, long)) for arg in args):
            self.components = list(args)
            self.dim = len(args)
        else:
            raise TypeError("All arguments must be int, float, or long.")

    def __add__(self, other):
        if self.dim == other.dim:
            return Vector(
                *[self.components[i] + other.components[i]
                  for i in range(self.dim)]
            )
        else:
            raise ValueError("Can only add vectors with same dimension.")

    def __sub__(self, other):
        neg_other = Vector(*[-component for component in other.components])
        return self.__add__(neg_other)

    def __mul__(self, multiplier):
        if isinstance(multiplier, (int, float, long)):
            return Vector(*[multiplier * component for component in self.components])
        elif isinstance(multiplier, Vector):
            return self.inner_product(multiplier)
        else:
            raise ValueError("Multiplier must be int, float, long, or Vector.")

    def __repr__(self):
        return "Vector({})".format(', '.join(map(str, self.components)))

    def __eq__(self, other):
        if self.dim == other.dim:
            return self.components == other.components
        return False

    def norm(self):
        """Calculate the Euclidean norm (distance from the origin)."""
        return (sum(component**2 for component in self.components))**0.5

    def inner_product(self, other):
        if self.dim == other.dim:
            return sum(self.components[i] * other.components[i]
                       for i in range(self.dim))
        else:
            raise ValueError("Can only take inner product of vectors with same dimension")

    def cross_product(self, other):
        if self.dim == 3 and other.dim == 3:
            component1 = (self.components[1] * other.components[2]
                          - self.components[2] * other.components[1])
            component2 = (-self.components[0] * other.components[2]
                          + self.components[2] * other.components[0])
            component3 = (self.components[0] * other.components[1]
                          - self.components[1] * other.components[0])
            return Vector(
                component1, component2, component3
            )
        else:
            raise ValueError("Can only take cross product between three-dimensional vectors.")

In [None]:
v1 = Vector(1, 6, 2)
v2 = Vector(7, -2, -4)
v3 = Vector(6, 2, 8, 3, 5)

In [None]:
print v1 * 2
print v3 * -1
print v1 * v2
print v1 + v2
print v1.cross_product(v2)
print v1 == v2

Now we'll write our `Point` class that inherits from `Vector`.

In [None]:
class Point(Vector):
    def __init__(self, x, y):
        Vector.__init__(self, x, y)

    def __add__(self, other):
        return Point(*Vector.__add__(self, other).components)

    def __sub__(self, other):
        neg_other = Point(*[-component for component in other.components])
        return self.__add__(neg_other)

    def __mul__(self, multiplier):
        result = Vector.__mul__(self, multiplier)
        if isinstance(result, Vector):
            return Point(*result.components)
        return result

    def __repr__(self):
        return "Point({})".format(', '.join(map(str, self.components)))

    def cross_product(self, other):
        return Vector(0, 0,
                      self.components[0] * other.components[1]
                      - self.components[1] * other.components[0])

In [None]:
pt1 = Point(1, 6)
pt2 = Point(7, -2)

In [None]:
%%expect_exception TypeError

pt3 = Point(6, 2, 8, 3, 5)

In [None]:
print pt1 + pt2
print pt1 * pt2
print pt1 * 2
print pt1.cross_product(pt2)
print pt1.norm()

Here, we create a new class, `Point`, that inherits from `Vector`.  We would say that `Point` is a **subclass** of `Vector` and that `Vector` is a **superclass** of `Point`.   All two-dimensional vectors (i.e. points) are vectors, so this inheritance makes sense.

We only need to specify two vector components for a Point, since it's two-dimensional.  So we define a new initializer function.  It will call the underlying `Vector` initializer, but accepts only two arguments.

Since `Point` is a subclass of `Vector`, it has a `norm()` method without us having to explicitly code it. The magic of inheritance is that we can create a `Point` class without having to duplicate methods from the **superclass**. Furthermore, when we do define a method of the subclass, it's often abbreviated, only containing the code needed to modify output of superclass methods (e.g. addition etc. should return `Point`, not `Vector`).

Typically you will leverage this feature by implementing common functionality in a superclass and idiosyncratic functionality in the subclasses. We'll see an example of this later in the lesson.

If we want to check if an object is of type Point, we can use the two ways we have already seen, but what about checking if a Point is a Vector?

In [None]:
point = Point(5, 2)
print type(point) is Point
print isinstance(point, Point)
print type(point) is Vector
print isinstance(point, Vector)

`isinstance(obj, Class)` checks whether `obj` is an instance of `Class` or of any subclass of `Class`, whereas `type(obj) is Class` checks only whether `obj` is an instance of exactly `Class`.
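
The same distinction shows up with built-in types. In Python, `bool` is a subclass of `int`, so the sketch below (variable names are ours) gives different answers for the two kinds of check:

```python
flag = True

exact_bool = type(flag) is bool     # True: flag is exactly a bool
exact_int = type(flag) is int       # False: its exact type is not int
is_int = isinstance(flag, int)      # True: bool is a subclass of int
is_object = isinstance(flag, object)  # True: every class derives from object

print(exact_bool)
print(exact_int)
print(is_int)
print(is_object)
```

In most code `isinstance` is the check you want, since a subclass instance is meant to be usable wherever its superclass is expected.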

A class can inherit from multiple superclasses.  Here we make a barebones `Matrix` class that can perform matrix multiplication:

In [None]:
class Matrix(object):
    """Basic 2D matrix object implementing matrix multiplication."""

    def __init__(self, matrix):
        self.nrows = len(matrix)
        if self.nrows:
            self.ncols = len(matrix[0])
        else:
            self.ncols = 0

        # check rows are same length
        for row in matrix:
            if len(row) != self.ncols:
                raise ValueError("Matrix must have rows of constant length.")

        # check all values are numeric
        for row in matrix:
            for value in row:
                if not isinstance(value, (int, float, long)):
                    raise ValueError("Matrix may only contain numbers.")

        self.matrix = matrix

    def matrix_product(self, other):
        # check inner dimensions match
        if self.ncols != other.nrows:
            raise ValueError("ncols of left matrix must match nrows of right matrix")

        result = []
        for i in range(self.nrows):
            row = []
            for j in range(other.ncols):
                row.append(sum(self.matrix[i][k] * other.matrix[k][j]
                               for k in range(self.ncols)))
            result.append(row)

        return Matrix(result)

    def __repr__(self):
        return "Matrix({})".format(self.matrix)

Now we can make a class, `LAVector`, that combines the functionality of `Vector` with the ability to do matrix multiplication from `Matrix`.

In [None]:
class LAVector(Vector, Matrix):
    def __init__(self, vector):
        if not isinstance(vector[0], list):
            vector = [vector]
        Matrix.__init__(self, vector)
        if self.ncols != 1 and self.nrows != 1:
            raise TypeError('Vector must have rank 1.')
        self.dim = max(self.ncols, self.nrows)
        self.components = [elem for row in self.matrix for elem in row]

    def __mul__(self, multiplier):
        if isinstance(multiplier, Matrix):
            result = self.matrix_product(multiplier)
            if result.ncols == 1 and result.nrows == 1:
                result = result.matrix[0][0]
            elif result.ncols == 1 or result.nrows == 1:
                result = LAVector(result.matrix)
        else:
            result = Vector.__mul__(self, multiplier)

        return result

    def __repr__(self):
        return "LAVector({})".format(self.matrix)

In [None]:
lv1 = LAVector([1, 5, 2])
v1 = Vector(5, 2, 1)
lv2 = LAVector([[5], [2], [1]])
mat = Matrix([[2, 5, 5], [8, 1, 0], [1, -1, -2]])

In [None]:
print lv1 * lv2
print lv2 * lv1
print lv1 * mat
print v1*lv1
print lv1 * 5

In [None]:
print lv1 + lv2
print lv2 - lv1
print lv1.norm()

Note that multiplication draws on both `matrix_product` from `Matrix` and `__mul__` from `Vector`.  We also inherit methods like `__add__`, `__sub__`, and `norm`, which are defined only in `Vector`. This is fairly easy to track, since each of these methods is defined only once.  In the cases where both classes define a method (e.g. `__init__`), we explicitly specify which one we want to use. But what would happen if both superclasses defined a method with a given name, and we didn't specify which we wanted to invoke?

In [None]:
class A(object):
    
    def func(self):
        print "Class A"

class B(object):
    
    def func(self):
        print "Class B"

class C(A, B):
    pass

c = C()
c.func()

Python's [Method Resolution Order](https://www.python.org/download/releases/2.3/mro/) specifies the order in which parent classes are searched to find a method of a particular name.  The result of that algorithm is stored in a class attribute called `__mro__`.

In [None]:
C.__mro__

Thus, when we ask for a method called `func()`, Python checks first class `C` and then `A`.  Since `A` has a `func()` method, it doesn't proceed on to check `B` or `object`.  While the MRO is perfectly deterministic, it may be hard to reason about for non-experts.  For instance, if we inherit in the opposite order, we'll get class `B`'s method:

In [None]:
class C2(B, A):
    pass

c2 = C2()
c2.func()

In [None]:
C2.__mro__
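
If you do not want to rely on the inheritance order, you can name the parent class explicitly, as we did with `Vector.__init__` and `Matrix.__init__` earlier. The sketch below redefines `A` and `B` so that it is self-contained (here `func` returns a string instead of printing it) and adds a hypothetical subclass `C3`:

```python
class A(object):

    def func(self):
        return "Class A"

class B(object):

    def func(self):
        return "Class B"

class C3(A, B):

    def func(self):
        # Bypass the MRO and call B's implementation explicitly.
        return B.func(self)

mro_names = [cls.__name__ for cls in C3.__mro__]
chosen = C3().func()

print(mro_names)
print(chosen)
```

The MRO still lists `A` before `B`, but because `C3` defines its own `func`, that method is found first and decides which parent to delegate to.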

In general, it's probably a good idea to avoid complicated multiple inheritance.

There are two types of multiple inheritance that tend to be safer.  Some languages implement these as separate constructs, but in Python, it's all done with classes.

**Mixins** are classes that add some specific capability.  For example, a small class whose only job is to provide a `speak()` method could be mixed in to any class that needs it.  Because they provide only a few methods and attributes, they are unlikely to cause conflicts in the MRO.
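
A minimal sketch of a mixin might look like the following. The class names `SpeakMixin`, `Robot`, and `TalkingRobot` are hypothetical, and the mixin assumes that whatever class it is combined with sets a `name` attribute.

```python
class SpeakMixin(object):
    """Mixin adding a speak() method; assumes the host class sets self.name."""

    def speak(self):
        return "Hello, my name is {}.".format(self.name)

class Robot(object):

    def __init__(self, name):
        self.name = name

class TalkingRobot(SpeakMixin, Robot):
    """A Robot that gains speak() from the mixin and adds nothing itself."""
    pass

greeting = TalkingRobot("Robby").speak()
print(greeting)
```

The mixin defines no `__init__` and stores no data of its own, which is what keeps it from clashing with the other superclass.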

**Interfaces** are classes that define a particular set of methods (or attributes) that a subclass is supposed to implement.  In Python, this tends to be done by defining methods that `raise NotImplementedError`.  This is useful in languages with type checking, as the compiler can be sure that any subclass of the interface will implement certain methods.  With Python's duck-typing, it tends not to be used.

## Putting it all together...

Let's turn to another example to summarize what we've learned. Let's define a `Person` class.

In [None]:
class Person(object):
    
    def __init__(self, first, last, age):
        self._first = first
        self._last = last
        self._age = age
    
    def _get_full_name(self):
        """return person's name, last name first"""
        return '%s, %s' % (self._last, self._first)
    
    def get_info(self):
        """return string of basic info on person"""
        return 'name: %s age: %d' % (self._get_full_name(), self._age)

    def report(self):
        raise NotImplementedError('You forgot to implement the report method!')

We define the class with some generic methods and attributes that characterize a person. We've additionally defined a `report` method. However, this `report` method has no useful functionality: it raises an error if we call it directly. Instead, it signals to whoever writes subclasses of `Person` that each subclass is expected to provide a working `report` method.

Typically if you are working on a project by yourself you might omit the `report` method in `Person` entirely. If this code is the work of a large team, it might be helpful to your teammates to include it.
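
To see the pattern in isolation, here is a small sketch with hypothetical class names (`Reporter` and `DailyReporter`), chosen so that it does not overwrite the `Person` class above. Calling the method on the base class raises the error, while the subclass that overrides it works normally.

```python
class Reporter(object):
    """Base class that only declares what subclasses must provide."""

    def report(self):
        raise NotImplementedError('Subclasses must implement report().')

class DailyReporter(Reporter):
    """Subclass that supplies a working report method."""

    def report(self):
        return 'daily report'

try:
    Reporter().report()
    base_message = None
except NotImplementedError as err:
    base_message = str(err)          # the reminder left by the base class

child_output = DailyReporter().report()

print(base_message)
print(child_output)
```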

Let's define two subclasses of `Person`:

In [None]:
class Teacher(Person):
    
    def __init__(self, first, last, age, department):
        """ Create a teacher for this department """
        super(Teacher, self).__init__(first, last, age)
        self._department = department
        self._students = []
        
    def add_student(self, student):
        """add a student that is learning from this teacher"""
        self._students.append(student)

    def report(self):
        """report on a teacher's students"""
        title = '%s (%s)' % (self.get_info(), self._department)
        print title
        print '=' * len(title)
        for student in self._students:
            print student.get_info()
            
class Student(Person):
    """ Create a student with age and major """
    def __init__(self, first, last, age, major):
        super(Student, self).__init__(first, last, age)
        self._major = major

    def report(self):
        """report about student"""
        print '%s (%s)' % (self.get_info(), self._major)

In this example, instances of `Teacher` will contain a list of `Student` instances. We can use this to record which students have classes with the teachers.

In [None]:
teacher1 = Teacher('Stephen', 'Hawking', 74, 'Physics')
teacher2 = Teacher('Donald', 'Knuth', 78, 'Computer Science')

student1 = Student('John', 'Smith', 20, 'Mechanical Engineering')
student2 = Student('Mary', 'Johnson', 20, 'Physics')
student3 = Student('James', 'Williams', 19, 'Political Science')
student4 = Student('Patricia', 'Brown', 19, 'Computer Science')

All of the `Teacher` and `Student` instances have the `get_info` method. I did not explicitly code it for either class because it is inherited from the `Person` class. The same code is executed when instances of either class call `get_info`.

In [None]:
print teacher1.get_info()
print student3.get_info()
print student4.get_info()

Each `Teacher` instance is an instance of the `Teacher` class, and each `Student` instance is an instance of the `Student` class.

In [None]:
print isinstance(teacher1, Teacher)
print isinstance(teacher2, Teacher)
print isinstance(student1, Student)
print isinstance(student2, Student)
print isinstance(student3, Student)
print isinstance(student4, Student)

The `Teacher` instances are NOT instances of the `Student` class, and the `Student` instances are NOT instances of the `Teacher` class. Both are instances of the `Person` class.

In [None]:
print isinstance(teacher1, Student)
print isinstance(student1, Teacher)
print isinstance(teacher1, Person)
print isinstance(student1, Person)

We can add students who are in the teachers' classes.

In [None]:
teacher1.add_student(student1)
teacher1.add_student(student3)

teacher2.add_student(student2)
teacher2.add_student(student3)
teacher2.add_student(student4)

We will now use the `report` method of our `Person` instances. Observe that the `report` method behaves differently for the two kinds of classes.

In [None]:
teacher1.report()

In [None]:
teacher2.report()

In [None]:
student1.report()

In [None]:
student2.report()

### Exercises


1. Write classes `Cube`, `Sphere`, `Cone`, and `Cylinder`, all inheriting from a common `Shape` base class. Define methods `calculate_volume`, `calculate_surface_area`, `set_color`, and `get_color`. Which methods should be implemented in the base class and which should be implemented in the child classes?

### Exit Tickets


1. What is a class? Define it in your own words. When might a class be useful?
2. What is inheritance?
3. What is the difference between an attribute and a method?

*Copyright &copy; 2016 The Data Incubator.  All rights reserved.*