<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Object Oriented Programming: Coding a Linear Regression Class

_Authors: Kiefer Katovich (SF)_

---

### Learning Objectives
- Learn the fundamentals of object-oriented programming in Python
- Review the closed-form solution for the coefficients of multiple linear regression
- Apply object-oriented programming concepts to build a linear regression class by hand

### Lesson Guide
- [Review the linear algebra derivation of coefficients for MLR](#review-mlr)
- [Load the simple housing data](#load-data)
- [Classes and objects](#classes-objects)
- [Coding our own `LinearRegression` class](#coding-lr)
    - [Starting a basic python class](#starting-class)
    - [Adding a class function](#class-function)
    - [Assigning attributes during instantiation](#init-args)
    - [Add another function to add an intercept](#intercept-adder)
    - [Instantiate the class](#instantiate)
    - [Add a predict function](#predict)
    - [Add a score function](#score)
- [Verify your class against the sklearn implementation](#verify)
- [Inspecting a class](#inspection)
- [Some special class methods](#special)

<a id='review-mlr'></a>

## Review: solving for the coefficients that minimize the loss

---

### The "least squares" solution to linear regression

**Step 1:** With target vector $y$ and predictor matrix $X$, we can formulate a regression as:

### $$ y = X\beta + \epsilon $$

Where $\beta$ is our vector of coefficients and $\epsilon$ is our vector of errors, or residuals.

**Step 2:** We can equivalently formulate this as a calculation of the residuals:

### $$ \epsilon = y - X\beta $$

*Our goal is to minimize the sum of squared residuals.* This is also known as the "least squares" loss function.

**Step 3:** Write the sum of squared residuals in matrix form. Recall that the vector of errors is the vector of residuals. The sum of squared residuals is the dot product of the residual vector with itself:

### $$ \sum_{i=1}^n \epsilon_i^2 = 
\left[\begin{array}{ccc}
\epsilon_1 & \cdots & \epsilon_n
\end{array}\right] 
\left[\begin{array}{c}
\epsilon_1 \\ \vdots \\ \epsilon_n
\end{array}\right] = \epsilon' \epsilon
$$
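
As a quick numerical check of this identity, the sketch below uses a small made-up residual vector (`eps` is a hypothetical name, not part of the lesson's data) and confirms that summing the squares gives the same number as the dot product of the vector with itself.

```python
import numpy as np

# Hypothetical residual vector, chosen only for illustration
eps = np.array([1.0, -2.0, 0.5, 3.0])

sum_of_squares = np.sum(eps ** 2)   # sum over i of eps_i^2
dot_product = np.dot(eps, eps)      # eps' eps

print(sum_of_squares)
print(dot_product)
```

Both lines print the same value, which is why the loss can be written compactly as $\epsilon' \epsilon$.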

Therefore we can write the sum of squared residuals as:

### $$ \epsilon' \epsilon = (y - X\beta)' (y - X\beta) $$

Which becomes:

### $$ \epsilon' \epsilon = y'y - y'X\beta - \beta' X' y + \beta' X' X \beta $$

**Step 4:** We want to find the coefficients that minimize the loss function. We can use calculus, taking the derivative with respect to the $\beta$ vector:

### $$ \frac{\partial \epsilon' \epsilon}{\partial \beta} = 
-2X'y + 2X'X\beta$$

Since we want to minimize the loss function, and the loss function is convex, we set the derivative to zero and solve for the $\beta$ coefficient vector (assuming $X'X$ is invertible):

### $$ 0 = -2X'y + 2X'X\beta \\
X'X\beta = X'y \\
\beta = (X'X)^{-1}X'y$$
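
The closed-form solution can be sanity-checked numerically. The sketch below is only an illustration on synthetic data (the sizes, the seed and `true_beta` are assumptions, not the housing data): it computes $\beta = (X'X)^{-1}X'y$, confirms that the derivative from Step 4 is numerically zero at that point, and compares the result with NumPy's least-squares solver.

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic data (assumed for illustration): an intercept column plus 2 predictors
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([1.0, 2.0, -3.0])
y = np.dot(X, true_beta) + rng.normal(scale=0.1, size=n)

# beta = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(np.dot(X.T, X))
beta = np.dot(XtX_inv, np.dot(X.T, y))

# At the minimum, the derivative -2X'y + 2X'X beta should be (numerically) zero
gradient = -2 * np.dot(X.T, y) + 2 * np.dot(np.dot(X.T, X), beta)

# Independent check with NumPy's least-squares solver
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print(beta)
print(np.allclose(beta, beta_lstsq))
```

Inverting $X'X$ explicitly keeps the code close to the derivation. A solver such as `np.linalg.lstsq` is generally the more numerically robust choice when the predictors are highly correlated.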

In [3]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('darkgrid')
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

<a id='load-data'></a>

## Load the simple housing data

---

This dataset only has 4 columns. We can formulate simple regression problems with it to test our linear regression class down the line.

In [4]:
house = './datasets/housing-data.csv'
house = pd.read_csv(house)

<a id='classes-objects'></a>

## Classes and objects

---

In Python, everything is an "object" of some type. This is the basis of what is known as **Object Oriented Programming (OOP)**.

A *class* is a type of object. You can think of a class definition as a sort of "blueprint" that specifies the construction of a new object when instantiated.
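
To make the blueprint idea concrete, here is a minimal sketch. The `House` class and its attributes are hypothetical and exist only for this illustration.

```python
class House(object):
    """Hypothetical class used only to illustrate the blueprint idea."""

    def __init__(self, sqft, bdrms):
        self.sqft = sqft
        self.bdrms = bdrms


# Two objects built from the same blueprint, each holding its own data
small = House(sqft=850, bdrms=2)
large = House(sqft=3000, bdrms=5)

print(type(small))
print(isinstance(large, House))
print(isinstance(3, object))  # even a plain integer is an object
```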

> **Note:** Knowing how to define and use classes is essential to programming python at an intermediate or advanced level. I will cover the basics here, which will help you understand how things like `LinearRegression` in sklearn work.


<a id='coding-lr'></a>

## Coding our own version of the sklearn `LinearRegression` class

---

By now you are familiar with the `LinearRegression` class in sklearn. We will walk through the re-creation of this class (albeit a simplified version).


<a id='starting-class'></a>
### 1. Starting a basic python class

Below is the beginning of our class blueprint:

In [5]:
class SimpleLinearRegression(object):
    
    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

What are the components of this?

**`class`**

- `class` is a keyword like `def`, but instead of defining a function it defines a class.

**`object`**

- `object` in the parentheses of the class definition indicates that this class "inherits" from the `object` class. The `object` class is a very general, very fundamental class in Python. Inheritance means that whatever properties and functions are part of the `object` class are passed down to our `SimpleLinearRegression` class. (In Python 3 every class inherits from `object` implicitly, so writing it out is optional there.)

**`def __init__(self)`**

- The `def __init__(self):` is our class's initialization function. This function is called when you instantiate the class by typing `SimpleLinearRegression()`.

**`self`**

- `self` is the first argument of every method defined in a class. It is a variable that refers to the **current instance of the class**. What does this mean? When you instantiate a class and assign it to a variable with `slr = SimpleLinearRegression()` and then call one of its methods, Python passes `slr` in as `self`. The method therefore reads and writes the attributes of that specific object. This lets you have multiple instances of a class that share the same method names but each keep their own data.

**class attributes**

- `self.coef_` and `self.intercept_`, likewise, are "attributes" (variables) that are attached to the instance. When `self` is `slr`, for example, `self.coef_` refers to the same thing as `slr.coef_`.
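
The role of `self` can be seen in a small sketch. The `Counter` class below is hypothetical and unrelated to the regression code. It shows that each instance keeps its own attributes, and that calling a method on an instance is the same as calling it through the class with the instance passed in as `self`.

```python
class Counter(object):
    """Hypothetical class; not part of the lesson's regression code."""

    def __init__(self):
        self.count = 0

    def increment(self):
        # `self` is whichever instance the method was called on
        self.count += 1
        return self.count


a = Counter()
b = Counter()

a.increment()
a.increment()
a.increment()
b.increment()

# Calling through the class makes the `self` argument explicit
Counter.increment(b)

print(a.count)
print(b.count)
```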

In [6]:
'''
Below is a dumb example of class inheritance.
'''

class DumbClass(object):
    
    # class attribute: shared by every instance and inherited by subclasses
    variable_parent = 10
    
    def __init__(self, set_variable=1):
        self.variable = set_variable
        
    def multiply_variable(self, multiplier):
        # multiply this instance's variable by the multiplier
        return self.variable * multiplier
    
    
class DumbChild(DumbClass):
    """
    My documentation for this class.
    
    Functions:
        print_hello
    """
    
    def __init__(self, set_variable=2):
        # run the parent's __init__ so that self.variable gets set
        super(DumbChild, self).__init__(set_variable=set_variable)        
        print('called parents init')
    
    def print_hello(self):
        print(self)
        
child = DumbChild(set_variable=3)
print(child.variable)
child.print_hello()
child.multiply_variable(10)

called parents init
3
<__main__.DumbChild object at 0x1043f3f50>


30
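
Two further aspects of inheritance are worth a short sketch: a child class also inherits class-level attributes, and it can extend a parent method rather than replace it. The `Parent` and `Child` names below are hypothetical and chosen only for this illustration.

```python
class Parent(object):
    shared_value = 10  # class attribute, visible from every instance and subclass

    def describe(self):
        return 'I am a ' + type(self).__name__


class Child(Parent):
    def describe(self):
        # Extend, rather than replace, the parent's behaviour
        base = super(Child, self).describe()
        return base + ' (child of Parent)'


kid = Child()
print(kid.shared_value)
print(kid.describe())
print(isinstance(kid, Parent))
print(Child.__mro__)  # the order in which Python looks up attributes
```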

---

<a id='class-function'></a>
### 2. Adding a class function

Now, just like with `__init__`, we can add functions to the class.

**Let's add a `fit()` method that will calculate the coefficients for a linear regression.**
- The function should have arguments `self`, `X` and `y`.
- Use the linear algebra equations above to calculate the coefficients.
- Assign the coefficients to `self.coef_`. The intercept, `self.intercept_`, is handled in the following steps.

In [7]:
class SimpleLinearRegression(object):
    
    def __init__(self):
        self.coef_ = None
        self.intercept_ = None
        
    def fit(self, X, y):
        # betas formula
        # betas = (X'X)^-1 X'Y
        
        XtX = np.dot(X.T, X)
        XtX_inv = np.linalg.inv(XtX)
        XtX_inv_Xt = np.dot(XtX_inv, X.T)
        self.coef_ = np.dot(XtX_inv_Xt, y)

Notice how we assigned `self.coef_` inside of the `fit()` function.

This sets the attribute `self.coef_` on the instance, and this attribute can be accessed by _any other function in the class without passing it as an argument!_

It can also be accessed by you after instantiating the class.
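
A stripped-down sketch of this pattern, using a hypothetical `MeanModel` class that is not part of the lesson: `fit` stores a value on the instance, and `predict` later reads it without receiving it as an argument.

```python
import numpy as np


class MeanModel(object):
    """Hypothetical toy model that always predicts the training mean."""

    def __init__(self):
        self.mean_ = None

    def fit(self, y):
        # Store the result on the instance ...
        self.mean_ = np.mean(y)

    def predict(self, n_rows):
        # ... so another method can use it without it being passed in
        return np.tile(self.mean_, n_rows)


mm = MeanModel()
mm.fit(np.array([1.0, 2.0, 6.0]))

print(mm.mean_)       # accessible from outside the class as well
print(mm.predict(2))
```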

---

<a id='init-args'></a>
### 3. Assigning attributes during instantiation

There is an issue here: the `X` matrix we pass in may not include an intercept column (a column of ones).

**Add a keyword argument to the `__init__` function which will specify whether the `X` matrix should have an intercept added or not.**

In [8]:
class SimpleLinearRegression(object):
    
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept
        
    def fit(self, X, y):
        # betas formula
        # betas = (X'X)^-1 X'Y
        
        XtX = np.dot(X.T, X)
        XtX_inv = np.linalg.inv(XtX)
        XtX_inv_Xt = np.dot(XtX_inv, X.T)
        self.coef_ = np.dot(XtX_inv_Xt, y)
        

**Now, if we instantiate the class, it will assign `fit_intercept` to the class attribute `fit_intercept`. Try it out:**

In [9]:
slr = SimpleLinearRegression(fit_intercept=True)
slr.fit_intercept

True

In [10]:
slr = SimpleLinearRegression(fit_intercept=False)
slr.fit_intercept

False

---

<a id='intercept-adder'></a>
### 4. Add a function to add an intercept to the `X` matrix if necessary

This function will be called from inside the `fit` function and run conditional on the value of `self.fit_intercept`.

In [11]:
class SimpleLinearRegression(object):
    
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept
        
    def add_intercept(self, X):
        intercept = np.ones((X.shape[0], 1))
        X = np.concatenate([intercept, X], axis=1)
        return X
        
    def fit(self, X, y):
        
        if self.fit_intercept:
            X = self.add_intercept(X)
        
        # betas formula
        # betas = (X'X)^-1 X'Y
        
        XtX = np.dot(X.T, X)
        XtX_inv = np.linalg.inv(XtX)
        XtX_inv_Xt = np.dot(XtX_inv, X.T)
        betas = np.dot(XtX_inv_Xt, y)
        
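        if self.fit_intercept:
            # the column of ones was prepended, so the first beta is the intercept
            self.intercept_ = betas[0]
            self.coef_ = betas[1:]
        else:
            # no intercept column was added: every beta is a coefficient
            self.intercept_ = 0.
            self.coef_ = betas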
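
To see what prepending the intercept column does, here is the same `np.ones` and `np.concatenate` logic applied to a tiny made-up matrix (`X_small` is a hypothetical name used only here).

```python
import numpy as np

# Tiny made-up design matrix: 3 rows, 2 predictors
X_small = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0]])

ones = np.ones((X_small.shape[0], 1))                       # one row of 1.0 per observation
X_with_intercept = np.concatenate([ones, X_small], axis=1)  # ones become column 0

print(X_with_intercept.shape)
print(X_with_intercept)
```

Because the ones go in as the first column, the first fitted beta is the intercept and the remaining betas line up with the original columns of `X`.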

---

<a id='instantiate'></a>
### 5. Instantiate the class

At this point we can try out our class. 

**Instantiate the class and try out the coefficient fitting function on the housing data.**

In [12]:
y = house.price.values
X = house[['sqft','bdrms','age']].values

In [13]:
slr = SimpleLinearRegression(fit_intercept=True)
print(slr.fit_intercept)
print(slr.coef_)
print(slr.intercept_)

True
None
None


In [14]:
slr.fit(X, y)

In [15]:
print(slr.coef_)
print(slr.intercept_)

[  139.33484671 -8621.47045953   -81.21787764]
92451.6278416


Like in the sklearn `LinearRegression` class, we now have access to the assigned `coef_` and `intercept_` attributes after fitting the model.
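
One way to read these attributes is to pair each coefficient with its column name. The sketch below copies the fitted values from the output above. The example house (2000 for `sqft`, 3 for `bdrms`, 10 for `age`) is a made-up input, and the units of each column are an assumption.

```python
import numpy as np

feature_names = ['sqft', 'bdrms', 'age']

# Values copied from the printed output above
coef = np.array([139.33484671, -8621.47045953, -81.21787764])
intercept = 92451.6278416

for name, value in zip(feature_names, coef):
    print('{:>6}: {:12.2f}'.format(name, value))

# Hypothetical house, used only to show how the pieces combine
example = np.array([2000.0, 3.0, 10.0])
predicted_price = intercept + np.dot(example, coef)
print(predicted_price)
```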

---

<a id='predict'></a>
### 6. Add the `predict` function.

Let's add some more of the class methods that are in the real `LinearRegression` class.

**First off, add the `predict` function. It will take a design matrix `X` and return predictions for those rows.**

In [16]:
class SimpleLinearRegression(object):
    
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept
        
    def add_intercept(self, X):
        intercept = np.ones((X.shape[0], 1))
        X = np.concatenate([intercept, X], axis=1)
        return X
        
    def fit(self, X, y):
        
        if self.fit_intercept:
            X = self.add_intercept(X)
        
        # betas formula
        # betas = (X'X)^-1 X'Y
        
        XtX = np.dot(X.T, X)
        XtX_inv = np.linalg.inv(XtX)
        XtX_inv_Xt = np.dot(XtX_inv, X.T)
        betas = np.dot(XtX_inv_Xt, y)
        
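        if self.fit_intercept:
            # the column of ones was prepended, so the first beta is the intercept
            self.intercept_ = betas[0]
            self.coef_ = betas[1:]
        else:
            # no intercept column was added: every beta is a coefficient
            self.intercept_ = 0.
            self.coef_ = betas
        
    def predict(self, X):
        # the intercept is stored separately, so X needs no column of ones here
        return np.dot(X, self.coef_) + self.intercept_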

**Test out the predict function.**

In [17]:
slr = SimpleLinearRegression(fit_intercept=True)
slr.fit(X,y)
y_hat = slr.predict(X)

In [18]:
print(y.shape, y_hat.shape)

(47,) (47,)
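
A prediction is just the intercept plus the dot product of a row with the coefficients. The sketch below uses made-up numbers (not the fitted housing model) to show that prepending a column of ones and using the full beta vector gives the same answer as keeping the intercept separate.

```python
import numpy as np

# Made-up fitted values, for illustration only
intercept = 2.0
coef = np.array([0.5, -1.0])
X_new = np.array([[4.0, 1.0],
                  [0.0, 3.0]])

# Form 1: prepend a column of ones and use the full beta vector
X_aug = np.concatenate([np.ones((X_new.shape[0], 1)), X_new], axis=1)
betas = np.concatenate([[intercept], coef])
pred_aug = np.dot(X_aug, betas)

# Form 2: keep the intercept separate
pred_sep = np.dot(X_new, coef) + intercept

print(pred_aug)
print(pred_sep)
```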


---

<a id='score'></a>
### 7. Add a `score` function

This will calculate the $R^2$ of your model on a provided `X` and `y`.

> **Note:** You'll probably want to write a helper function to calculate the sum of squared errors, since this will be run for both the baseline model and the regression model in order to calculate the $R^2$.

In [19]:
class SimpleLinearRegression(object):
    
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept
        
    def add_intercept(self, X):
        intercept = np.ones((X.shape[0], 1))
        X = np.concatenate([intercept, X], axis=1)
        return X
        
    def fit(self, X, y):
        
        if self.fit_intercept:
            X = self.add_intercept(X)
        
        # betas formula
        # betas = (X'X)^-1 X'Y
        
        XtX = np.dot(X.T, X)
        XtX_inv = np.linalg.inv(XtX)
        XtX_inv_Xt = np.dot(XtX_inv, X.T)
        betas = np.dot(XtX_inv_Xt, y)
        
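        if self.fit_intercept:
            # the column of ones was prepended, so the first beta is the intercept
            self.intercept_ = betas[0]
            self.coef_ = betas[1:]
        else:
            # no intercept column was added: every beta is a coefficient
            self.intercept_ = 0.
            self.coef_ = betas
        
    def predict(self, X):
        # the intercept is stored separately, so X needs no column of ones here
        return np.dot(X, self.coef_) + self.intercept_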
    
    def _calculate_sse(self, y_true, y_hat):
        return np.sum((y_true - y_hat)**2)
        
    def _calculate_r2(self, sse_model, sse_baseline):
        return 1. - float(sse_model)/sse_baseline
    
    def score(self, X, y):
            
        baseline_sse = self._calculate_sse(y, np.tile(np.mean(y), len(y)))
        
        y_hat = self.predict(X)
        model_sse = self._calculate_sse(y, y_hat)
        
        return self._calculate_r2(model_sse, baseline_sse)
          

In [20]:
slr = SimpleLinearRegression(fit_intercept=True)
slr.fit(X,y)
r2 = slr.score(X, y)
print(r2)

0.733163999069
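
The $R^2$ calculation can also be written as a standalone function, which makes its behaviour easy to probe. This is a minimal sketch on made-up numbers. `r_squared` and `y_demo` are hypothetical names, and the baseline is assumed to be the mean of `y`, as in the class above.

```python
import numpy as np


def r_squared(y_true, y_hat):
    """R^2 = 1 - SSE_model / SSE_baseline, with the mean of y as the baseline."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sse_model = np.sum((y_true - y_hat) ** 2)
    sse_baseline = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - sse_model / sse_baseline


y_demo = np.array([1.0, 2.0, 3.0, 4.0])

r2_perfect = r_squared(y_demo, y_demo)                      # perfect predictions
r2_mean = r_squared(y_demo, np.tile(np.mean(y_demo), 4))    # always predict the mean
r2_reversed = r_squared(y_demo, y_demo[::-1])               # worse than the mean

print(r2_perfect)
print(r2_mean)
print(r2_reversed)
```

Perfect predictions give 1, predicting the mean gives 0, and a model that does worse than the mean gives a negative value, so $R^2$ is not bounded below by 0.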


<a id='verify'></a>

## Verify your class against the sklearn `LinearRegression` implementation.

---

Our class should return the same $R^2$ as the sklearn implementation.

In [21]:
from sklearn.linear_model import LinearRegression

In [22]:
lr = LinearRegression()
lr.fit(X,y)
lr.score(X,y)



0.73316399906900243
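
The two printed values agree to the digits shown but are not written out to the same precision, so comparing them with `==` would be misleading. A tolerance-based comparison such as `np.isclose` is the usual way to check floating-point results against each other. The sketch below uses the two printed values as literals. In the notebook you would compare `slr.score(X, y)` with `lr.score(X, y)` directly.

```python
import numpy as np

# Values copied from the two outputs above
r2_ours = 0.733163999069
r2_sklearn = 0.73316399906900243

print(r2_ours == r2_sklearn)            # strict equality on the printed digits
print(np.isclose(r2_ours, r2_sklearn))  # equality within a small tolerance
```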

<a id='inspection'></a>

## Inspecting a class

---

When we want to know more about an object, we can use the `inspect` module. Specifically, the `inspect.getmembers()` function takes an object (here, an instance of our class) as an argument and returns a list of `(name, value)` pairs, sorted by name.

This can be helpful for seeing which attributes and methods are available: essentially, the blueprint of a class object in memory. Depending on the way the class was implemented, you can usually find useful information hiding inside of `slr.__class__.__dict__`, which can be easier to look at. The "right way" is to use the `inspect` module.
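
Since the full `getmembers` listing is long, it often helps to filter it. The sketch below uses a hypothetical `Tiny` class (not the regression class) and shows the optional predicate argument of `inspect.getmembers`, together with the built-in `vars()`, which returns only the instance's own attributes.

```python
import inspect


class Tiny(object):
    """Hypothetical class used only to show what `inspect` returns."""

    def __init__(self):
        self.value = 1

    def double(self):
        return 2 * self.value


t = Tiny()

# Full listing: a list of (name, value) pairs, sorted by name
members = inspect.getmembers(t)

# Keep only bound methods defined in Python code
method_names = [name for name, _ in inspect.getmembers(t, inspect.ismethod)]

print(type(members))
print(method_names)
print(vars(t))  # instance attributes only, same as t.__dict__
```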

In [23]:
import inspect

In [24]:
inspect.getmembers(slr)

[('__class__', __main__.SimpleLinearRegression),
 ('__delattr__',
  <method-wrapper '__delattr__' of SimpleLinearRegression object at 0x114f58c90>),
 ('__dict__',
  {'coef_': array([  139.33484671, -8621.47045953,   -81.21787764]),
   'fit_intercept': True,
   'intercept_': 92451.627841645211}),
 ('__doc__', None),
 ('__format__', <function __format__>),
 ('__getattribute__',
  <method-wrapper '__getattribute__' of SimpleLinearRegression object at 0x114f58c90>),
 ('__hash__',
  <method-wrapper '__hash__' of SimpleLinearRegression object at 0x114f58c90>),
 ('__init__',
  <bound method SimpleLinearRegression.__init__ of <__main__.SimpleLinearRegression object at 0x114f58c90>>),
 ('__module__', '__main__'),
 ('__new__', <function __new__>),
 ('__reduce__', <function __reduce__>),
 ('__reduce_ex__', <function __reduce_ex__>),
 ('__repr__',
  <method-wrapper '__repr__' of SimpleLinearRegression object at 0x114f58c90>),
 ('__setattr__',
  <method-wrapper '__setattr__' of SimpleLinearRegressi

<a id='special'></a>

## Some special class methods

---

|Method|Description|Sample call|
|--|--|--|
|\_\_init\_\_( self [,args...] )|Constructor (with any optional arguments)|obj = className(args)|
|\_\_del\_\_( self )|Destructor, deletes an object|del obj|
|\_\_repr\_\_( self )|Evaluatable string representation|repr(obj)|
|\_\_str\_\_( self )|Printable string representation|str(obj)|
|\_\_cmp\_\_( self, x )|Object comparison (Python 2 only; removed in Python 3)|cmp(obj, x)|

The `__repr__` function returns a string that describes the object. You can basically do whatever you want with it, but its purpose is to convey something descriptive about your class.
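
As a minimal sketch of the difference between `__repr__` and `__str__`, consider the hypothetical `Point` class below. The class and its format strings are illustrative choices, not a required convention.

```python
class Point(object):
    """Hypothetical class; the names here are illustrative only."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Aim for something that looks like the call that built the object
        return 'Point(x={!r}, y={!r})'.format(self.x, self.y)

    def __str__(self):
        # Friendlier text, used by print() and str()
        return '({}, {})'.format(self.x, self.y)


p = Point(1, 2)
print(repr(p))
print(str(p))
print([p])  # containers display their items with __repr__
```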

The `__del__` method is the bookend function of `__init__`. You can use it to run cleanup code when an object is about to be destroyed.

Generally it works well, but in practice there are a few things to watch out for. Read more about [safely using Python destructors](http://eli.thegreenplace.net/2009/06/12/safely-using-destructors-in-python).
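
One of those things is that `del obj` removes a name, not necessarily the object. The sketch below assumes CPython's reference counting (other interpreters may run `__del__` later, or not at a predictable time) and uses a hypothetical `Resource` class that records when its destructor runs.

```python
class Resource(object):
    """Hypothetical class that records when its destructor runs."""

    log = []  # class attribute shared by all instances

    def __init__(self, name):
        self.name = name
        Resource.log.append('init ' + name)

    def __del__(self):
        Resource.log.append('del ' + self.name)


r = Resource('a')
alias = r                      # a second reference to the same object

del r                          # one reference remains, so __del__ has not run
log_after_first_del = list(Resource.log)

del alias                      # last reference gone; CPython runs __del__ here
print(log_after_first_del)
print(Resource.log)
```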