
# Object-Oriented Programming: Coding a Linear Regression Class


---

### Learning Objectives

- Describe the fundamentals of object-oriented programming in Python
- Implement classes in Python 3.
- Apply object-oriented programming concepts to build a linear regression class by hand

### Lesson Guide

- [Review the Linear Algebra Derivation of Coefficients for MLR](#review-mlr)
- [Load the Simple Housing Data](#load-data)
- [Classes and Objects](#classes-objects)
- [Coding our Own `LinearRegression` Class](#coding-lr)
    - [Starting a Basic Python Class](#starting-class)
    - [Adding a Class Function](#class-function)
    - [Assigning Attributes During Instantiation](#init-args)
    - [Add Another Function to Add an Intercept](#intercept-adder)
    - [Instantiate the Class](#instantiate)
    - [Add a Predict Function](#predict)
    - [Add a Score Function](#score)
- [Verify Your Class Against the Scikit-Learn Implementation](#verify)
- [Inspecting a Class](#inspection)
- [Some Special Class Methods](#special)

<a id='review-mlr'></a>

> For a more in-depth review of the matrix form of OLS, see the [detailed review notebook](./ols_linear_algebra_review.ipynb) in this repo.

---

### The "Least Squares" Solution to Linear Regression

With target vector $y$ and prediction matrix $X$, we can formulate a regression as:

### $$ y = \beta X + \epsilon $$

We can calculate our parameter $\beta$ for each feature of $X$, using the following form.

### $$ \beta = (X'X)^{-1}X'y$$

> **Linear Algebra Reference**
>
> The operations we will be performing to solve for $\beta$ include:
> - Dot Product
$$
A = (a_1, a_2, a_3) \\
B = (a_1, a_2, a_3) \\
A \cdot B = a_1 b_1 + a_2 b_2 + a_3 b_3
$$
> - Matrix Transpose
> <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/e4/Matrix_transpose.gif/200px-Matrix_transpose.gif">
> - Inverse matrix: [Inverse Matrices (MIT)](https://math.mit.edu/~gs/linearalgebra/ila0205.pdf)


In [1]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('darkgrid')
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

examples.directory is deprecated; in the future, examples will be found relative to the 'datapath' directory.
  "found relative to the 'datapath' directory.".format(key))


In [2]:
house = './datasets/housing-data.csv'
house = pd.read_csv(house)

X = house[['sqft', 'bdrms', 'age']]
y = house['price']

house.describe().T

# np.ones((47, 1))

# np.ones((X.shape[0], 1))

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
sqft,47.0,2000.680851,794.702354,852.0,1432.0,1888.0,2269.0,4478.0
bdrms,47.0,3.170213,0.760982,1.0,3.0,3.0,4.0,5.0
age,47.0,42.744681,22.87344,5.0,24.5,44.0,61.5,79.0
price,47.0,340412.659574,125039.899586,169900.0,249900.0,299900.0,384450.0,699900.0


In [8]:
from sklearn.linear_model import LinearRegression

In [9]:
lr = LinearRegression()

In [30]:
class SimpleLinearRegression:
    
    coef_          = None
    intercept_     = None
    fit_intercept  = False
    
    if fit_intercept:
            self.fit_intercept = fit_intercept
    
    def __init__(self, fit_intercept = True):
        print("AHoy there matey..")
        
    def add_intercept(X):
        intercept = np.ones((X.shape[0], 1))
        return np.concatenate([intercept, X], axis = 1)
    
    
    def fit(self, X, y):
        # print("X shape:", X.shape, "y shape: ", y.shape)
        
        if self.fit_intercept:
            X = self.add_intercept(X)
        
        # Beta formula
        # (X'X)^-1X'y
        xt_x        = np.dot(X.T, X)         # (X'X)
        xt_x_inv    = np.linalg.inv(xt_x)    # (X'X)^-1
        xt_x_inv_xt = np.dot(xt_x_inv, X.T)  # (X'X)^-1X'
        
        self.coef_  = np.dot(xt_x_inv_xt, y) # (X'X)^-1X'y
        
    def predict(self, X):
        
        if self.fit_intercept:
            X = self.add_intercept(X)
        
        return np.dot(X, self.coef_)

In [25]:
slr = SimpleLinearRegression()

AHoy there matey..


In [26]:
slr.fit(X, y)

TypeError: fit() takes 2 positional arguments but 3 were given

In [None]:
        
        
        if fit_intercept:
            self.fit_intercept = fit_intercept

In [28]:

    def add_intercept(self, X):
        intercept = np.ones((X.shape[0], 1))
        return np.concatenate([intercept, X], axis = 1)
        
    def fit(self, X, y):
        # print("X shape:", X.shape, "y shape: ", y.shape)
        
        if self.fit_intercept:
            X = self.add_intercept(X)
        
        # Beta formula
        # (X'X)^-1X'y
        xt_x        = np.dot(X.T, X)         # (X'X)
        xt_x_inv    = np.linalg.inv(xt_x)    # (X'X)^-1
        xt_x_inv_xt = np.dot(xt_x_inv, X.T)  # (X'X)^-1X'
        
        self.coef_  = np.dot(xt_x_inv_xt, y) # (X'X)^-1X'y
    
    def predict(self, X):
        
        if self.fit_intercept:
            X = self.add_intercept(X)
        
        return np.dot(X, self.coef_)

In [31]:
slr = SimpleLinearRegression(fit_intercept = True)
slr.fit(X, y)

AHoy there matey..


In [32]:
house['slr_yhat'] = slr.predict(X)
house.head()

X.columns

Index(['sqft', 'bdrms', 'age'], dtype='object')

In [33]:
house.head()

Unnamed: 0,sqft,bdrms,age,price,slr_yhat,sklearn_yhat
0,2104,3,70,399900,354102.784233,354062.482512
1,1600,3,28,329900,274100.994856,287248.87063
2,2400,3,44,369000,389930.037375,397417.261957
3,1416,2,49,232000,238454.132742,268527.153863
4,3000,4,75,539900,495427.307249,469878.945319


In [7]:
from sklearn.linear_model import LinearRegression

linear = LinearRegression()
model = linear.fit(X, y)
model.coef_

pd.Series(slr.coef_)

house['sklearn_yhat'] = model.predict(X)
house.head()

Unnamed: 0,sqft,bdrms,age,price,slr_yhat,sklearn_yhat
0,2104,3,70,399900,354062.482512,354062.482512
1,1600,3,28,329900,287248.87063,287248.87063
2,2400,3,44,369000,397417.261957,397417.261957
3,1416,2,49,232000,268527.153863,268527.153863
4,3000,4,75,539900,469878.945319,469878.945319


In [35]:
??LinearRegression

<a id='load-data'></a>

## Load the Simple Housing Data

---

This data set only has four columns. We can formulate simple regression problems with the data set to test our linear regression class down the line.

<a id='classes-objects'></a>

## Classes and Objects

---

In Python, everything is an "object" of some type. This is the basis of what is known as **object-oriented programming (OOP)**.

A *class* is a type of object. You can think of a class definition as a sort of blueprint that specifies the construction of a new object when instantiated.

> **Note:** Knowing how to define and use classes is essential for programming with Python at an intermediate or advanced level. We will cover the basics here, which will help you understand how concepts like `LinearRegression` in scikit-learn work.


## Why using OO Python:  Variable Scope
We will build a basic class now and simply take a high level look at an "instance" of class and examine how class variables work, how they are similar to regular varibles, but how they are different.

In [40]:
# Create a basic student class and init 2 students!

<a id='coding-lr'></a>

## Coding our Own Version of the Scikit-Learn `LinearRegression` Class

---

By now you're familiar with the `LinearRegression` class in scikit-learn. We'll walk through the re-creation of this class (albeit a simplified version).


<a id='starting-class'></a>
### 1) Starting a basic Python class.

Below is the beginning of our class blueprint:

In [3]:
# A:

What are the components of the blueprint?

**`class`**

- `class` works like `def`, but instead of defining a function, it defines a class.

**`def __init__(self)`**

- `def __init__(self):` is our class' initialization function. This function is called when you instantiate the class by typing `SimpleLinearRegression()`.

**`self`**

- `self` is the first argument to class definitions. It's a variable that refers to the **current instantiation of the class**. What does this mean? When you instantiate a class and assign it to a variable with `slr = SimpleLinearRegression()`, the `self` argument becomes a reference to the current instantiation of the class `slr`. Now, when you use a function that is part of the class, it knows to use that specific object's function. This allows you to have multiple instantiations of a class with the same function name.

**class attributes**

- `self.coef_` and `self.intercept_` are both "attributes" (variables) that are connected to the instantiation of the class. When `self` becomes `slr`, for example, the `self` becomes `slr` and `self.coef_` becomes `slr.coef`.

---

<a id='class-function'></a>
### 2) Adding a class function.

Just like with `__init__`, we can add functions to a class.

**Let's add a `fit()` method that will calculate the coefficients for a linear regression.**
- The function should have arguments `self`, `X`, and `y`.
- Use the linear algebra equations above to calculate the coefficients and intercept.
- Assign the coefficients to `self.coef_` and the intercept to `self.intercept_`.

In [39]:
# A:

Notice how we assigned `self.coef_` inside of the `fit()` function.

This will set the class attribute `self.coef_`, which can be accessed by _any other function in the class without passing it as an argument._

You can also access it after instantiating the class.

---

<a id='init-args'></a>
### 3) Assigning attributes during instantiation.

There's an issue here — we may pass an `X` matrix in without an intercept. 

**Add a keyword argument to the `__init__` function, which will specify whether or not the `X` matrix should have an  added intercept.**

In [6]:
# A:

**Now, if we instantiate the class, it will assign `fit_intercept` to the class attribute `fit_intercept`. Try it out:**

In [7]:
# A:

---

<a id='intercept-adder'></a>
### 4) Include a function to add an intercept to the `X` matrix if necessary.

This function will be called from inside the `fit` function and run conditionally on the value of `self.fit_intercept`.

In [8]:
# A:

---

<a id='instantiate'></a>
### 5) Instantiate the class.

At this point, we can try out our class. 

**Instantiate the class and try out the coefficient-fitting function on the housing data.**

In [9]:
# A:

As with scikit-learn's `LinearRegression` class, after fitting the model, we now have access to the assigned `coef_` and `intercept_` attributes.

---

<a id='predict'></a>
### 6) Add the `predict` function.

Let's add some more of the class methods that are in the real `LinearRegression` class.

**First, add the `predict` function. It will take a design matrix `X` and return predictions for those rows.**

In [10]:
# A:

**Test out the `predict` function.**

In [11]:
# A:

---

<a id='score'></a>
### 7) Add a `score` function.

This will calculate the $R^2$ of your model on a provided `X` and `y`.

> **Note:** You'll probably want to write a helper function to calculate the sum of the squared errors, as this will be run for both the baseline model and the regression model in order to calculate the $R^2$.

In [12]:
# A:

<a id='verify'></a>

## Verify Your Class Against the Scikit-Learn `LinearRegression` Implementation

---

Our class should return the same results for the $R^2$.

In [13]:
# A:

# Our comparison model from SKlearn
from sklearn.linear_model import LinearRegression

linear = LinearRegression()
model = linear.fit(X, y)
model.coef_

<a id='inspection'></a>

## Inspecting a Class

---

When we want to know more about a class object, we can use the "inspect" module. Specifically, the `inspect.getmembers()` function takes an instantiated class as an argument and returns it as an information dictionary.

This help us know which attributes and methods are available and, basically, the blueprint of a class object in memory. Depending on the way the class was implemented, you can usually find useful information hiding inside of `slr.__class__.__dict__` (which can be easier to interpret). However, the "right way" is to use the "inspect" module.

In [14]:
import inspect

In [15]:
# A:

<a id='special'></a>

## Some Special Class Methods

---

|Method| Description|
|--|--|
|\_\_init\_\_ ( self [,args...] )| Constructor (with any optional arguments). Sample call: `obj = className(args)`.
|\_\_del\_\_( self ) | Destructor; deletes an object. Sample call: `del obj`.
|\_\_repr\_\_( self ) | Evaluable string representation. Sample call: `repr(obj)`.
|\_\_str\_\_( self ) | Printable string representation. Sample call: `str(obj)`.
|\_\_cmp\_\_ ( self, x ) | Object comparison. Sample call: `cmp(obj, x)`.

The `__repr__` function reports back a description of what the class represents. You can basically do whatever you want with it, but its purpose is to convey something descriptive about your class.

The `__del__` method is the bookend function of `__init__`. You can use it to run code once your class has executed. Generally it works well, but in practice there are a few considerations to keep in mind. Read more about safely using Python destructors [here](http://eli.thegreenplace.net/2009/06/12/safely-using-destructors-in-python).