<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Object-Oriented Programming: Coding a Linear Regression Class

_Authors: Kiefer Katovich (SF)_

---

### Learning Objectives
- Learn the fundamentals of object-oriented programming in Python
- Review the closed-form solution for the coefficients of multiple linear regression
- Apply object-oriented programming concepts to build a linear regression class by hand

### Lesson Guide

- [Review the Linear Algebra Derivation of Coefficients for MLR](#review-mlr)
- [Load the Simple Housing Data](#load-data)
- [Classes and Objects](#classes-objects)
- [Coding our Own `LinearRegression` Class](#coding-lr)
    - [Starting a Basic Python Class](#starting-class)
    - [Adding a Class Function](#class-function)
    - [Assigning Attributes During Instantiation](#init-args)
    - [Add Another Function to Add an Intercept](#intercept-adder)
    - [Instantiate the Class](#instantiate)
    - [Add a Predict Function](#predict)
    - [Add a Score Function](#score)
- [Verify Your Class Against the Scikit-Learn Implementation](#verify)
- [Inspecting a Class](#inspection)
- [Some Special Class Methods](#special)

<a id='review-mlr'></a>

## Review: Solving for the Coefficients That Minimize the Loss

---

### The "Least Squares" Solution to Linear Regression

**Step 1:** With target vector $y$ and predictor matrix $X$, we can formulate a regression as:

### $$ y = X\beta + \epsilon $$

Where $\beta$ is our vector of coefficients and $\epsilon$ is our vector of errors, or residuals.

**Step 2:** Equivalently, we can formulate this as a calculation of the residuals:

### $$ \epsilon = y - X\beta $$

*Our goal is to minimize the sum of the squared residuals.* This is also known as the "least squares loss function." 

**Step 3:** Write out the sum of the squared residuals. Recall that the vector of errors is equivalent to the vector of residuals. The sum of the squared residuals is the dot product of the residual vector with itself.

### $$ \sum_{i=1}^n \epsilon_i^2 = 
\left[\begin{array}{c}
\epsilon_1 \cdots \epsilon_n
\end{array}\right] 
\left[\begin{array}{c}
\epsilon_1 \\ \vdots \\ \epsilon_n
\end{array}\right] = \epsilon' \epsilon
$$

Therefore we can write the sum of the squared residuals as:

### $$ \epsilon' \epsilon = (y - X\beta)' (y - X\beta) $$

Which becomes:

### $$ \epsilon' \epsilon = y'y - y'X\beta - \beta' X' y + \beta' X' X \beta $$

**Step 4:** We want to find the coefficients at the loss function's minimum. In this case we can use calculus, taking the derivative with respect to the $\beta$ vector:

### $$ \frac{\partial \epsilon' \epsilon}{\partial \beta} = 
-2X'y + 2X'X\beta$$

Because we want to minimize the loss function and the loss function is convex, we set the derivative to zero and solve for the beta coefficient vector:

### $$ 0 = -2X'y + 2X'X\beta \\
X'X\beta = X'y \\
\beta = (X'X)^{-1}X'y$$
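As a quick numerical check of the closed-form solution, here is a minimal sketch on synthetic data. The data, the variable names, and the choice to solve the linear system $X'X\beta = X'y$ instead of forming the inverse explicitly are assumptions made for illustration; they are not part of the lesson's housing example.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not the housing data).
rng = np.random.default_rng(0)
n = 200
# First column of ones plays the role of the intercept.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([1.0, 2.0, -3.0])
y = X @ true_beta + rng.normal(scale=0.1, size=n)

# beta = (X'X)^{-1} X'y, computed by solving X'X beta = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's built-in least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
print(beta_lstsq)
```

Both approaches should agree to floating-point precision, and both should land close to the coefficients used to generate the data.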

In [1]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('darkgrid')
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

<a id='load-data'></a>

## Load the Simple Housing Data

---

This data set only has four columns. We can formulate simple regression problems with the data set to test our linear regression class down the line.

In [2]:
# Path to the housing CSV, relative to this notebook.
house_path = './datasets/housing-data.csv'
house = pd.read_csv(house_path)

<a id='classes-objects'></a>

## Classes and Objects

---

In Python, everything is an "object" of some type. This is the basis of what is known as **object-oriented programming (OOP)**.

A *class* defines a type of object. You can think of a class definition as a blueprint that specifies how a new object is constructed when the class is instantiated.

> **Note:** Knowing how to define and use classes is essential for programming with Python at an intermediate or advanced level. We will cover the basics here, which will help you understand how classes like `LinearRegression` in scikit-learn work.


<a id='coding-lr'></a>

## Coding our Own Version of the Scikit-Learn `LinearRegression` Class

---

By now you're familiar with the `LinearRegression` class in scikit-learn. We'll walk through the re-creation of this class (albeit a simplified version).


<a id='starting-class'></a>
### 1) Starting a basic Python class.

Below is the beginning of our class blueprint:

In [3]:
# A:
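One possible version of the blueprint is sketched here. It is inferred from the components this section discusses (`class`, `object`, `__init__`, `self`, `coef_`, `intercept_`); initializing both attributes to `None` is an assumption.

```python
class SimpleLinearRegression(object):
    """Bare-bones blueprint: no fitting logic yet."""

    def __init__(self):
        # Attributes are created on the instance and start out empty.
        self.coef_ = None
        self.intercept_ = None


# Instantiating the class runs __init__ once for the new object.
slr = SimpleLinearRegression()
print(slr.coef_, slr.intercept_)
```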

What are the components of the blueprint?

**`class`**

- `class` works like `def`, but instead of defining a function, it defines a class.

**`object`**

- In the parentheses of the `class` definition, `object` indicates that this class "inherits" from the `object` class. The `object` class is a very general, fundamental class in Python. Inheritance means that whatever properties and functions are part of the `object` class are passed down to our `SimpleLinearRegression` class. (In Python 3, every class inherits from `object` whether or not you write it explicitly.)

**`def __init__(self)`**

- `def __init__(self):` is our class's initialization function. This function is called when you instantiate the class by typing `SimpleLinearRegression()`.

**`self`**

- `self` is the first argument of every method defined in a class. It's a variable that refers to the **current instance of the class**. What does this mean? When you instantiate a class and assign it to a variable with `slr = SimpleLinearRegression()`, the `self` argument inside the class's methods becomes a reference to that instance, `slr`. When you call a method on `slr`, Python passes `slr` in as `self`, so the method operates on that specific object's data. This allows you to have multiple instances of a class that share the same method names but hold different values.

**instance attributes**

- `self.coef_` and `self.intercept_` are both "attributes" (variables) that are attached to the instance of the class. When `self` refers to `slr`, for example, `self.coef_` is the same thing as `slr.coef_`.

In [4]:
# Inheritance example.
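A small sketch of inheritance follows. The class names `Parent` and `Child` and the `greet` method are hypothetical, chosen only to show a method being passed down.

```python
class Parent(object):
    def greet(self):
        return "hello from Parent"


class Child(Parent):
    # No methods defined here: greet() is inherited from Parent.
    pass


child = Child()
message = child.greet()
print(message)

# Child is a Parent, and (like every class) ultimately an object.
print(isinstance(child, Parent), issubclass(Child, object))
```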

---

<a id='class-function'></a>
### 2) Adding a class function.

Just like with `__init__`, we can add functions to a class.

**Let's add a `fit()` method that will calculate the coefficients for a linear regression.**
- The function should have arguments `self`, `X`, and `y`.
- Use the linear algebra equations above to calculate the coefficients and intercept.
- Assign the coefficients to `self.coef_` and the intercept to `self.intercept_`.

In [5]:
# A:
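One way `fit()` might look at this stage is sketched here. It assumes that `X` already contains a column of ones as its first column, so the first solved coefficient is treated as the intercept. Returning `self` is a convenience that allows chained calls; the toy data is made up for illustration.

```python
import numpy as np


class SimpleLinearRegression(object):
    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X, y):
        # Assumption: X already has a column of ones as its FIRST column.
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Solve X'X beta = X'y (the normal equations from the review section).
        beta = np.linalg.solve(X.T @ X, X.T @ y)
        self.intercept_ = beta[0]
        self.coef_ = beta[1:]
        return self


# Noise-free toy data: y = 1 + 2*x
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

slr = SimpleLinearRegression().fit(X, y)
print(slr.intercept_, slr.coef_)
```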

Notice how we assigned `self.coef_` inside of the `fit()` function.

This will set the instance attribute `self.coef_`, which can be accessed by _any other function in the class without passing it as an argument._

You can also access it after instantiating the class.

---

<a id='init-args'></a>
### 3) Assigning attributes during instantiation.

There's an issue here: we may pass in an `X` matrix that has no intercept column.

**Add a keyword argument to the `__init__` function, which will specify whether or not the `X` matrix should have an added intercept.**

In [6]:
# A:
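A minimal sketch of the updated `__init__` follows. The default value `True` is an assumption (it matches the default in scikit-learn's `LinearRegression`), and the `fit()` method is left out to keep the focus on instantiation.

```python
class SimpleLinearRegression(object):
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        # Store the keyword argument on the instance for later use in fit().
        self.fit_intercept = fit_intercept


with_intercept = SimpleLinearRegression()
without_intercept = SimpleLinearRegression(fit_intercept=False)
print(with_intercept.fit_intercept, without_intercept.fit_intercept)
```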

**Now, if we instantiate the class, it will assign the `fit_intercept` argument to the instance attribute `self.fit_intercept`. Try it out:**

In [7]:
# A:

---

<a id='intercept-adder'></a>
### 4) Include a function to add an intercept to the `X` matrix if necessary.

This function will be called from inside the `fit` function and run conditionally on the value of `self.fit_intercept`.

In [8]:
# A:
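The sketch here shows one way to wire in an intercept helper. The helper name `_add_intercept`, the choice to prepend the ones column, and the convention of reporting `intercept_` as `0.0` when no intercept is fitted are all assumptions. Synthetic data stands in for the housing data, since its column names are not given here.

```python
import numpy as np


class SimpleLinearRegression(object):
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept

    def _add_intercept(self, X):
        # Prepend a column of ones so the first coefficient is the intercept.
        ones = np.ones((X.shape[0], 1))
        return np.hstack([ones, X])

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        if X.ndim == 1:
            X = X.reshape(-1, 1)  # treat a 1-D input as a single feature
        if self.fit_intercept:
            X = self._add_intercept(X)
        beta = np.linalg.solve(X.T @ X, X.T @ y)
        if self.fit_intercept:
            self.intercept_, self.coef_ = beta[0], beta[1:]
        else:
            self.intercept_, self.coef_ = 0.0, beta
        return self


# Synthetic stand-in data: y = 5 + 3*x1 - 2*x2 (no noise).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 5.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1]

slr = SimpleLinearRegression().fit(X, y)
no_int = SimpleLinearRegression(fit_intercept=False).fit(X, y - 5.0)
print(slr.intercept_, slr.coef_)
```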

---

<a id='instantiate'></a>
### 5) Instantiate the class.

At this point, we can try out our class. 

**Instantiate the class and try out the coefficient-fitting function on the housing data.**

In [9]:
# A:

As with scikit-learn's `LinearRegression` class, after fitting the model, we now have access to the assigned `coef_` and `intercept_` attributes.

---

<a id='predict'></a>
### 6) Add the `predict` function.

Let's add some more of the class methods that are in the real `LinearRegression` class.

**First, add the `predict` function. It will take a design matrix `X` and return predictions for those rows.**

In [10]:
# A:
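A possible `predict` method is sketched here on top of the same assumed class layout (the `_add_intercept` helper name and the `0.0` intercept convention are illustrative choices). Because `coef_` and `intercept_` are stored separately, `predict` does not need to re-add a column of ones to `X`.

```python
import numpy as np


class SimpleLinearRegression(object):
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept

    def _add_intercept(self, X):
        return np.hstack([np.ones((X.shape[0], 1)), X])

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        if self.fit_intercept:
            X = self._add_intercept(X)
        beta = np.linalg.solve(X.T @ X, X.T @ y)
        if self.fit_intercept:
            self.intercept_, self.coef_ = beta[0], beta[1:]
        else:
            self.intercept_, self.coef_ = 0.0, beta
        return self

    def predict(self, X):
        # Uses the attributes set by fit(); no arguments besides X are needed.
        X = np.asarray(X, dtype=float)
        return X @ self.coef_ + self.intercept_


# Synthetic stand-in data: y = 5 + 3*x1 - 2*x2 (no noise).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 5.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1]

slr = SimpleLinearRegression().fit(X, y)
X_new = np.array([[0.0, 0.0], [1.0, 1.0]])
preds = slr.predict(X_new)
print(preds)
```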

**Test out the `predict` function.**

In [11]:
# A:

---

<a id='score'></a>
### 7) Add a `score` function.

This will calculate the $R^2$ of your model on a provided `X` and `y`.

> **Note:** You'll probably want to write a helper function to calculate the sum of the squared errors, as this will be run for both the baseline model and the regression model in order to calculate the $R^2$.

In [12]:
# A:
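One way to add `score` is sketched here, using $R^2 = 1 - SSE_{model} / SSE_{baseline}$, where the baseline always predicts the mean of `y`. The helper name `_sse` and the synthetic data are assumptions for illustration.

```python
import numpy as np


class SimpleLinearRegression(object):
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        if self.fit_intercept:
            X = np.hstack([np.ones((X.shape[0], 1)), X])
        beta = np.linalg.solve(X.T @ X, X.T @ y)
        if self.fit_intercept:
            self.intercept_, self.coef_ = beta[0], beta[1:]
        else:
            self.intercept_, self.coef_ = 0.0, beta
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_

    def _sse(self, y_true, y_pred):
        # Sum of squared errors between targets and predictions.
        return np.sum((y_true - y_pred) ** 2)

    def score(self, X, y):
        y = np.asarray(y, dtype=float)
        sse_model = self._sse(y, self.predict(X))
        sse_baseline = self._sse(y, np.mean(y))  # baseline: predict the mean
        return 1.0 - sse_model / sse_baseline


# Synthetic data with noise, so R^2 is high but below 1.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y_clean = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1]
y_noisy = y_clean + rng.normal(scale=0.5, size=100)

r2_noisy = SimpleLinearRegression().fit(X, y_noisy).score(X, y_noisy)
r2_clean = SimpleLinearRegression().fit(X, y_clean).score(X, y_clean)
print(r2_noisy, r2_clean)
```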

<a id='verify'></a>

## Verify Your Class Against the Scikit-Learn `LinearRegression` Implementation

---

Our class should return the same results for the $R^2$.

In [13]:
# A:
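A sketch of the comparison follows, assuming scikit-learn is installed. It uses a compact version of the hand-built class and synthetic data in place of the housing data; the class layout is the same illustrative one assumed throughout, not a definitive implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


class SimpleLinearRegression(object):
    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        if self.fit_intercept:
            X = np.hstack([np.ones((X.shape[0], 1)), X])
        beta = np.linalg.solve(X.T @ X, X.T @ y)
        if self.fit_intercept:
            self.intercept_, self.coef_ = beta[0], beta[1:]
        else:
            self.intercept_, self.coef_ = 0.0, beta
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_

    def score(self, X, y):
        y = np.asarray(y, dtype=float)
        sse_model = np.sum((y - self.predict(X)) ** 2)
        sse_baseline = np.sum((y - y.mean()) ** 2)
        return 1.0 - sse_model / sse_baseline


# Synthetic data (a stand-in for the housing data).
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = 4.0 + X @ np.array([1.5, -2.0, 0.7]) + rng.normal(scale=0.3, size=200)

ours = SimpleLinearRegression().fit(X, y)
theirs = LinearRegression().fit(X, y)

r2_ours = ours.score(X, y)
r2_theirs = theirs.score(X, y)
print(r2_ours, r2_theirs)
```

On well-conditioned data like this, the coefficients, intercept, and $R^2$ should match to floating-point precision.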

<a id='inspection'></a>

## Inspecting a Class

---

When we want to know more about a class object, we can use the `inspect` module. Specifically, the `inspect.getmembers()` function takes an object (such as an instantiated class) as an argument and returns a list of `(name, value)` pairs, sorted by name.

This helps us see which attributes and methods are available: in effect, the blueprint of a class object in memory. Depending on how the class was implemented, you can often find useful information in `slr.__class__.__dict__` (which can be easier to interpret). However, the "right way" is to use the `inspect` module.

In [14]:
import inspect

In [15]:
# A:
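A small sketch of `inspect.getmembers()` in use follows. The `Tiny` class is a hypothetical stand-in for the regression class, kept short so the output stays readable.

```python
import inspect


class Tiny(object):
    def __init__(self):
        self.coef_ = None

    def fit(self):
        return self


obj = Tiny()

# A list of (name, value) pairs, sorted by name.
members = inspect.getmembers(obj)
member_names = [name for name, _ in members]

# Passing a predicate narrows the result, here to bound methods only.
methods = [name for name, _ in inspect.getmembers(obj, predicate=inspect.ismethod)]

# The class's own namespace, as mentioned above.
class_dict_keys = list(obj.__class__.__dict__.keys())

print(methods)
```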

<a id='special'></a>

## Some Special Class Methods

---

|Method| Description|
|---|---|
|\_\_init\_\_ ( self [,args...] )| Initializer (with any optional arguments), run when an object is created. Sample call: `obj = className(args)`.
|\_\_del\_\_( self ) | Finalizer; called when an object is about to be destroyed, for example after `del obj` removes the last reference to it.
|\_\_repr\_\_( self ) | Evaluable string representation. Sample call: `repr(obj)`.
|\_\_str\_\_( self ) | Printable string representation. Sample call: `str(obj)`.
|\_\_cmp\_\_ ( self, x ) | Object comparison (Python 2 only; Python 3 removed `cmp()` and uses `__eq__`, `__lt__`, and related methods instead). Sample call: `cmp(obj, x)`.

The `__repr__` function returns a string describing the object. You can return whatever you want from it, but its purpose is to convey something descriptive about the instance, ideally enough to re-create it.

The `__del__` method is the bookend function of `__init__`. You can use it to run code when an instance is about to be destroyed. Generally it works well, but in practice there are a few considerations to keep in mind. Read more about safely using Python destructors [here](http://eli.thegreenplace.net/2009/06/12/safely-using-destructors-in-python).
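To make the special methods concrete, here is a minimal sketch that adds `__repr__`, `__str__`, and `__eq__` to a stripped-down version of the class. The exact strings returned, and the use of `__eq__` as the Python 3 counterpart to comparison, are illustrative choices.

```python
class SimpleLinearRegression(object):
    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept

    def __repr__(self):
        # Aim for a string that could be pasted back in to rebuild the object.
        return "SimpleLinearRegression(fit_intercept={!r})".format(self.fit_intercept)

    def __str__(self):
        # A friendlier description, used by print() and str().
        status = "with" if self.fit_intercept else "without"
        return "Simple linear regression {} an intercept".format(status)

    def __eq__(self, other):
        # Two instances compare equal when they share the same settings.
        if not isinstance(other, SimpleLinearRegression):
            return NotImplemented
        return self.fit_intercept == other.fit_intercept


slr = SimpleLinearRegression(fit_intercept=False)
print(repr(slr))
print(str(slr))
print(slr == SimpleLinearRegression(fit_intercept=False))
```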