# Object-Oriented Programming: Coding a Linear Regression Class

---

### Learning Objectives

- Describe the fundamentals of object-oriented programming in Python
- Implement classes in Python 3
- Apply object-oriented programming concepts to build a linear regression class by hand

### Lesson Guide

- [Review the Linear Algebra Derivation of Coefficients for MLR](#review-mlr)
- [Load the Simple Housing Data](#load-data)
- [Classes and Objects](#classes-objects)
- [Coding our Own `LinearRegression` Class](#coding-lr)
    - [Starting a Basic Python Class](#starting-class)
    - [Adding a Class Function](#class-function)
    - [Assigning Attributes During Instantiation](#init-args)
    - [Add Another Function to Add an Intercept](#intercept-adder)
    - [Instantiate the Class](#instantiate)
    - [Add a Predict Function](#predict)
    - [Add a Score Function](#score)
- [Verify Your Class Against the Scikit-Learn Implementation](#verify)
- [Inspecting a Class](#inspection)
- [Some Special Class Methods](#special)

<a id='review-mlr'></a>
### The "Least Squares" Solution to Linear Regression

With target vector $y$ and predictor matrix $X$, we can formulate a regression as:

### $$ y = X\beta + \epsilon $$

We can solve for the coefficient vector $\beta$, which holds one coefficient per column of $X$, using the following closed-form solution.

### $$ \beta = (X'X)^{-1}X'y$$

> **Linear Algebra Reference**
>
> The operations we will be performing to solve for $\beta$ include:
> - Dot Product
>
> $$
> A = (a_1, a_2, a_3) \\
> B = (b_1, b_2, b_3) \\
> A \cdot B = a_1 b_1 + a_2 b_2 + a_3 b_3
> $$
>
> - Matrix Transpose
> <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/e4/Matrix_transpose.gif/200px-Matrix_transpose.gif">
> - Inverse matrix: [Inverse Matrices (MIT)](https://math.mit.edu/~gs/linearalgebra/ila0205.pdf)
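
As a minimal sketch of how these three operations combine, the snippet below solves $\beta = (X'X)^{-1}X'y$ on a tiny made-up data set. The names `X_toy` and `y_toy` and their values are assumptions for illustration only, not the housing data. Because the target is generated exactly as $y = 1 + 2x$, the recovered coefficients should come out close to 1 and 2.

```python
import numpy as np

# Toy design matrix (assumed values): a column of ones for the intercept
# plus a single feature x = 0, 1, 2, 3.
X_toy = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [1.0, 2.0],
    [1.0, 3.0],
])

# Target generated exactly as y = 1 + 2 * x.
y_toy = np.array([1.0, 3.0, 5.0, 7.0])

XtX = np.dot(X_toy.T, X_toy)                       # X'X: transpose, then dot product
XtX_inv = np.linalg.inv(XtX)                       # (X'X)^-1: matrix inverse
beta = np.dot(np.dot(XtX_inv, X_toy.T), y_toy)     # (X'X)^-1 X'y

print(beta)
```

The first entry of `beta` corresponds to the column of ones (the intercept) and the second to the feature.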


In [1]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('darkgrid')
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

In [2]:
a = np.array([2, 2, 2])
b = np.array([3, 4, 5])
np.dot(a, b)

24

In [3]:
a = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

In [4]:
b = np.array([
    [1, 2],
    [4, 5],
    [3, 6]
])

In [5]:
np.dot(a, b)

array([[18, 30],
       [42, 69]])

<a id='load-data'></a>

## Load the Simple Housing Data

---

This data set only has four columns. We can formulate simple regression problems with the data set to test our linear regression class down the line.

In [6]:
house = 'data/housing-data.csv'
house = pd.read_csv(house)
house.head()

Unnamed: 0,sqft,bdrms,age,price
0,2104,3,70,399900
1,1600,3,28,329900
2,2400,3,44,369000
3,1416,2,49,232000
4,3000,4,75,539900


In [7]:
X = house[['sqft', 'bdrms', 'age']]
y = house['price']

<a id='classes-objects'></a>

## Classes and Objects

---

In Python, everything is an "object" of some type. This is the basis of what is known as **object-oriented programming (OOP)**.

A *class* is a type of object. You can think of a class definition as a sort of blueprint that specifies the construction of a new object when instantiated.

> **Note:** Knowing how to define and use classes is essential for programming with Python at an intermediate or advanced level. We will cover the basics here, which will help you understand how concepts like `LinearRegression` in scikit-learn work.
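
To make the "everything is an object" idea concrete, here is a small sketch that asks a few values for their type. The list `examples` is an arbitrary selection chosen for illustration.

```python
# Numbers, strings, lists, functions, and even classes are all objects.
examples = [42, 'text', [1, 2, 3], len, int]

for value in examples:
    # type() reports the class that the object was instantiated from.
    print(repr(value), '->', type(value).__name__)

# Every one of them is an instance of the base class `object`.
all_objects = all(isinstance(value, object) for value in examples)
print(all_objects)
```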


## Why Use Object-Oriented Python: Variable Scope
We will now build a basic class, take a high-level look at an "instance" of that class, and examine how the variables attached to it work: how they are similar to regular variables, and how they differ.

In [8]:
# Create a basic student class and init 2 students!

class Student:
    
    def __init__(self, name, course):
        self.name = name
        self.course = course
        
    def attend_class(self):
        print('Learned all the things!')

In [9]:
amy = Student('Amy', 'DSI')
jake = Student('Jake Peralta', 'Detectiv-ing 101')

jake.attend_class()

Learned all the things!
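
The scope point can be sketched with a slightly extended class. `StudentWithSchool`, its `school` attribute, and the `_demo` variable names are hypothetical additions for this illustration, so the `Student` class above is left untouched.

```python
class StudentWithSchool:
    # Class attribute: a single value shared by every instance.
    school = 'Example School'

    def __init__(self, name, course):
        # Instance attributes: each object keeps its own values.
        self.name = name
        self.course = course


amy_demo = StudentWithSchool('Amy', 'DSI')
jake_demo = StudentWithSchool('Jake Peralta', 'Detectiv-ing 101')

# A regular variable named `course` lives in the surrounding scope and is
# unrelated to the `course` attribute stored on each instance.
course = 'Some other course'

# Reassigning one instance's attribute leaves the other instance untouched.
amy_demo.course = 'Statistics'

print(course, '|', amy_demo.course, '|', jake_demo.course)
print(amy_demo.school, '|', jake_demo.school)
```

The same name (`course`) can therefore hold three different values at once, because each one lives in a different namespace.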


<a id='coding-lr'></a>

## Coding our Own Version of the Scikit-Learn `LinearRegression` Class

---

By now you're familiar with the `LinearRegression` class in scikit-learn. We'll walk through the re-creation of this class (albeit a simplified version).


<a id='starting-class'></a>
### 1) Starting a basic Python class.

Below is the beginning of our class blueprint:

In [10]:
def fit(self, X, y):
    pass

# fit()

In [11]:
class SimpleLinearRegression:
    
    def __init__(self):
        pass

    def fit(self, X, y):
        pass
    
    def predict(self, X):
        pass 
    
slr = SimpleLinearRegression()
slr.fit(X, y)
# slr.coef_

What are the components of the blueprint?

**`class`**

- `class` works like `def`, but instead of defining a function, it defines a class.

**`def __init__(self)`**

- `def __init__(self):` is our class' initialization function. This function is called when you instantiate the class by typing `SimpleLinearRegression()`.

**`self`**

- `self` is the first argument of every method defined in a class. It's a variable that refers to the **current instance of the class**. What does this mean? When you instantiate a class and assign it to a variable with `slr = SimpleLinearRegression()`, calling a method such as `slr.fit(X, y)` passes `slr` in as `self`. The method therefore works with that specific object's data, which allows you to have multiple instances of a class that share the same method names but hold different values.

**class attributes**

- `self.coef_` and `self.intercept_` are both "attributes" (variables) that are connected to an instance of the class. When `self` refers to `slr`, for example, `self.coef_` inside the class is the same attribute as `slr.coef_` outside of it. The blueprint above does not assign them yet; we will do that in the steps that follow.
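
The role of `self` can be sketched with a toy class. `Greeter` and its `greet` method are hypothetical names used only for this illustration.

```python
class Greeter:

    def __init__(self, name):
        # Store the value on this particular instance.
        self.name = name

    def greet(self):
        # `self` is whichever instance the method was called on.
        return 'Hello from ' + self.name


first = Greeter('first')
second = Greeter('second')

# Calling a method on an instance passes that instance in as `self`...
implicit = first.greet()

# ...which is equivalent to calling the function on the class and
# handing over the instance explicitly.
explicit = Greeter.greet(first)

print(implicit, '|', explicit, '|', second.greet())
```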

---

<a id='class-function'></a>
### 2) Adding a class function.

Just like with `__init__`, we can add functions to a class.

**Let's add a `fit()` method that will calculate the coefficients for a linear regression.**
- The function should have arguments `self`, `X`, and `y`.
- Use the linear algebra equations above to calculate the coefficients and intercept.
- Assign the coefficients to `self.coef_` and the intercept to `self.intercept_`.

In [12]:
house.head()

Unnamed: 0,sqft,bdrms,age,price
0,2104,3,70,399900
1,1600,3,28,329900
2,2400,3,44,369000
3,1416,2,49,232000
4,3000,4,75,539900


In [13]:
sqft = 2400
bdrms = 3
age = 44
sqft*140 + bdrms*14441 + age*220
# print(y2)

389003

In [14]:
def fit(X, y):
    step1 = np.dot(X.T, X)
    step2 = np.linalg.inv(step1)
    step3 = np.dot(step2, X.T)
    return np.dot(step3, y)

coef_ = fit(X, y)
coef_

array([  140.38185917, 14441.26620439,   220.22219895])

In [15]:
Xcopy = X.copy()
Xcopy['intercept'] = 1
Xcopy

# y = b0*x0 + b1*x1 + b2*x2

Unnamed: 0,sqft,bdrms,age,intercept
0,2104,3,70,1
1,1600,3,28,1
2,2400,3,44,1
3,1416,2,49,1
4,3000,4,75,1
5,1985,4,61,1
6,1534,3,12,1
7,1427,3,57,1
8,1380,3,14,1
9,1494,3,15,1


In [16]:
class SimpleLinearRegression:
    
    def __init__(self):
        pass
    
    def fit(self, X, y):
        step1 = np.dot(X.T, X)
        step2 = np.linalg.inv(step1)
        step3 = np.dot(step2, X.T)
        self.coef_ = np.dot(step3, y)
        return self

### $$ \beta = (X'X)^{-1}X'y$$

In [17]:
model = SimpleLinearRegression()

model.coef_ # doesn't work... on purpose

AttributeError: 'SimpleLinearRegression' object has no attribute 'coef_'

In [18]:
model.fit(X, y)

<__main__.SimpleLinearRegression at 0x1a1b4eba58>

In [19]:
model.coef_

array([  140.38185917, 14441.26620439,   220.22219895])

Notice how we assigned `self.coef_` inside of the `fit()` function.

This sets the attribute `coef_` on the instance, so it can be read through `self` by _any other function in the class without passing it as an argument._

You can also access it after instantiating the class.
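
Here is a minimal sketch of that behavior, using a hypothetical `MeanModel` class (not part of the lesson's code) that stores a value in `fit` and reads it back in `predict`. It also shows why `model.coef_` raised an `AttributeError` earlier: the attribute does not exist until `fit` has run.

```python
import numpy as np


class MeanModel:
    """Hypothetical toy model that always predicts the mean of y."""

    def fit(self, y):
        # Assigning to self creates the attribute on this instance.
        self.mean_ = np.mean(y)
        return self

    def predict(self, n_rows):
        # No argument carries the mean in; it is read back from self.
        return np.full(n_rows, self.mean_)


toy = MeanModel()
fitted_before = hasattr(toy, 'mean_')   # attribute not created yet

toy.fit([1.0, 2.0, 3.0])
fitted_after = hasattr(toy, 'mean_')    # attribute now exists

preds = toy.predict(2)
print(fitted_before, fitted_after, preds)
```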

In [20]:
intercept = np.ones((X.shape[0], 1))

np.set_printoptions(suppress=True)

np.concatenate([intercept, X], axis = 1)

array([[   1., 2104.,    3.,   70.],
       [   1., 1600.,    3.,   28.],
       [   1., 2400.,    3.,   44.],
       [   1., 1416.,    2.,   49.],
       [   1., 3000.,    4.,   75.],
       [   1., 1985.,    4.,   61.],
       [   1., 1534.,    3.,   12.],
       [   1., 1427.,    3.,   57.],
       [   1., 1380.,    3.,   14.],
       [   1., 1494.,    3.,   15.],
       [   1., 1940.,    4.,    7.],
       [   1., 2000.,    3.,   27.],
       [   1., 1890.,    3.,   45.],
       [   1., 4478.,    5.,   49.],
       [   1., 1268.,    3.,   58.],
       [   1., 2300.,    4.,   77.],
       [   1., 1320.,    2.,   62.],
       [   1., 1236.,    3.,   78.],
       [   1., 2609.,    4.,    5.],
       [   1., 3031.,    4.,   21.],
       [   1., 1767.,    3.,   44.],
       [   1., 1888.,    2.,   79.],
       [   1., 1604.,    3.,   13.],
       [   1., 1962.,    4.,   53.],
       [   1., 3890.,    3.,   36.],
       [   1., 1100.,    3.,   60.],
       [   1., 1458.,    3.,   29.],
 

---

<a id='init-args'></a>
### 3) Assigning attributes during instantiation.

There's an issue here: we may pass in an `X` matrix that has no intercept column.

**Add a keyword argument to the `__init__` function, which will specify whether or not the `X` matrix should have an added intercept.**

In [21]:
class SimpleLinearRegression:
    
    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept
    
    def add_intercept(self, X):
        intercept = np.ones((X.shape[0], 1))
        return np.concatenate([intercept, X], axis = 1)
    
    def fit(self, X, y):
        if self.fit_intercept:
            X = self.add_intercept(X)
            
        step1 = np.dot(X.T, X)
        step2 = np.linalg.inv(step1)
        step3 = np.dot(step2, X.T)
        step4 = np.dot(step3, y)
        
        if self.fit_intercept:
            self.intercept_ = step4[0]
            self.coef_ = step4[1:]
        else:
            self.coef_ = step4 

In [22]:
model = SimpleLinearRegression(fit_intercept=True)
model.fit(X, y)
model.coef_
model.intercept_

92451.62784164713

In [23]:
sqft = 2400
bdrms = 3
age = 44

sqft*140 + bdrms*14441 + age*220 + 0

389003

In [24]:
sqft*139 + bdrms*-8621 + age*-81 + 92451

396624

**Now, if we instantiate the class, it will assign the `fit_intercept` argument to the instance attribute `self.fit_intercept`. Try it out:**

In [25]:
# A: See above

---

<a id='intercept-adder'></a>
### 4) Include a function to add an intercept to the `X` matrix if necessary.

This function will be called from inside the `fit` function and run conditionally on the value of `self.fit_intercept`.

In [26]:
# A: See above

---

<a id='instantiate'></a>
### 5) Instantiate the class.

At this point, we can try out our class. 

**Instantiate the class and try out the coefficient-fitting function on the housing data.**

In [27]:
# A:
slr = SimpleLinearRegression(fit_intercept = True)
slr.fit(X, y)
slr.coef_

array([  139.33484671, -8621.47045953,   -81.21787764])

As with scikit-learn's `LinearRegression` class, after fitting the model, we now have access to the assigned `coef_` and `intercept_` attributes.

---

<a id='predict'></a>
### 6) Add the `predict` function.

Let's add some more of the class methods that are in the real `LinearRegression` class.

**First, add the `predict` function. It will take a design matrix `X` and return predictions for those rows.**

In [28]:
coef_ = np.array([  139.33484671, -8621.47045953,   -81.21787764])
Xnew = np.array([560, 0.75, 11])

np.dot(coef_, Xnew)

# sqft*140 + bdrms*14441 + age*220 + 0

# Draft of the method: it needs `self` because it will live inside the class.
def predict(self, X):
    if self.fit_intercept:
        X = self.add_intercept(X)
        i = np.concatenate([[self.intercept_], self.coef_])
        return np.dot(X, i)
    else:
        return np.dot(X, self.coef_)

In [29]:
class SimpleLinearRegression:
    
    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept
    
    def add_intercept(self, X):
        intercept = np.ones((X.shape[0], 1))
        return np.concatenate([intercept, X], axis = 1)
    
    def fit(self, X, y):
        if self.fit_intercept:
            X = self.add_intercept(X)
            
        step1 = np.dot(X.T, X)
        step2 = np.linalg.inv(step1)
        step3 = np.dot(step2, X.T)
        step4 = np.dot(step3, y)
        
        if self.fit_intercept:
            self.intercept_ = step4[0]
            self.coef_ = step4[1:]
        else:
            self.coef_ = step4 
    
    def predict(self, X):
        if self.fit_intercept:
            X = self.add_intercept(X)
            i = np.concatenate([[self.intercept_], self.coef_])
            return np.dot(X, i)
        else:
            return np.dot(X, self.coef_)
        

**Test out the `predict` function.**

In [30]:
slr = SimpleLinearRegression()
slr.fit(X, y)
slr.coef_

array([  139.33484671, -8621.47045953,   -81.21787764])

In [31]:
X.iloc[0:1].shape

(1, 3)

In [32]:
Xmax = np.array([560, 0.75, 11])
Xmax.reshape(1, -1)

array([[560.  ,   0.75,  11.  ]])

In [33]:
slr.predict(Xmax.reshape(1, -1))

array([163119.64250209])

In [34]:
X.head(5).shape

(5, 3)

In [35]:
np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6]])

In [36]:
slr.predict(X.iloc[0].values.reshape(1, -1))

array([354062.48251176])
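
As a sanity check on what `predict` is doing, the sketch below reproduces the prediction for the first housing row by hand, using the intercept and coefficients printed earlier in this lesson. The variable names (`demo_intercept`, `demo_coef`, `first_row`) are assumptions for illustration. It computes the prediction two ways: by prepending a column of ones, as the class does, and by adding the intercept after the dot product.

```python
import numpy as np

# Values copied from the fitted model's output earlier in the lesson.
demo_intercept = 92451.62784164713
demo_coef = np.array([139.33484671, -8621.47045953, -81.21787764])

# First row of the housing data: sqft, bdrms, age.
first_row = np.array([[2104, 3, 70]])

# Route 1: prepend a column of ones and multiply by [intercept, coefficients].
with_ones = np.concatenate([np.ones((first_row.shape[0], 1)), first_row], axis=1)
beta = np.concatenate([[demo_intercept], demo_coef])
pred_concat = np.dot(with_ones, beta)

# Route 2: multiply by the coefficients, then add the intercept.
pred_direct = np.dot(first_row, demo_coef) + demo_intercept

print(pred_concat, pred_direct)
```

Both routes agree with each other and, up to rounding of the copied coefficients, with the value returned by `slr.predict` above.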

---

<a id='score'></a>
### 7) Add a `score` function.

This will calculate the $R^2$ of your model on a provided `X` and `y`.

> **Note:** You'll probably want to write a helper function to calculate the sum of the squared errors, as this will be run for both the baseline model and the regression model in order to calculate the $R^2$.

In [37]:
from sklearn.metrics import r2_score

class SimpleLinearRegression:
    
    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept
    
    def add_intercept(self, X):
        intercept = np.ones((X.shape[0], 1))
        return np.concatenate([intercept, X], axis = 1)
    
    def fit(self, X, y):
        if self.fit_intercept:
            X = self.add_intercept(X)
            
        step1 = np.dot(X.T, X)
        step2 = np.linalg.inv(step1)
        step3 = np.dot(step2, X.T)
        step4 = np.dot(step3, y)
        
        if self.fit_intercept:
            self.intercept_ = step4[0]
            self.coef_ = step4[1:]
        else:
            self.coef_ = step4 
    
    def predict(self, X):
        if self.fit_intercept:
            X = self.add_intercept(X)
            i = np.concatenate([[self.intercept_], self.coef_])
            return np.dot(X, i)
        else:
            return np.dot(X, self.coef_)
        
    def score(self, X, y):
        y_hat = self.predict(X)
        return r2_score(y, y_hat)

In [38]:
model = SimpleLinearRegression()
model.fit(X, y)
# model.predict(X)
model.score(X, y)

0.7331639990690024
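
The class above delegates to scikit-learn's `r2_score`. For comparison, here is a sketch of the by-hand route suggested in the note, with a helper for the sum of squared errors that is reused for both the baseline (mean) model and the regression model. The function names and the demo values are assumptions for illustration.

```python
import numpy as np


def sum_squared_errors(y_true, y_pred):
    """Sum of squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sum((y_true - y_pred) ** 2)


def r2_by_hand(y_true, y_pred):
    """R^2 = 1 - SSE(model) / SSE(baseline that predicts the mean)."""
    y_true = np.asarray(y_true, dtype=float)
    baseline = np.full(y_true.shape, y_true.mean())
    sse_model = sum_squared_errors(y_true, y_pred)
    sse_baseline = sum_squared_errors(y_true, baseline)
    return 1 - sse_model / sse_baseline


# Small made-up example: baseline SSE is 20, model SSE is 0.5.
y_true_demo = [3.0, 5.0, 7.0, 9.0]
y_pred_demo = [2.5, 5.5, 7.0, 9.0]
r2_demo = r2_by_hand(y_true_demo, y_pred_demo)
print(r2_demo)
```

Inside the class, the `score` method could call a helper like this with `self.predict(X)` in place of `y_pred_demo`.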

<a id='verify'></a>

## Verify Your Class Against the Scikit-Learn `LinearRegression` Implementation

---

Our class should return the same coefficients, and therefore the same $R^2$, as scikit-learn's implementation.

In [39]:
our_model = SimpleLinearRegression(fit_intercept=True)
our_model.fit(X, y)
our_model.coef_

array([  139.33484671, -8621.47045953,   -81.21787764])

In [40]:
# A:

# Our comparison model from SKlearn
from sklearn.linear_model import LinearRegression

linear = LinearRegression()
linear.fit(X, y)
linear.coef_

array([  139.33484671, -8621.47045953,   -81.21787764])
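
Comparing printed arrays by eye works for three coefficients, but a programmatic check scales better. The sketch below shows the idea with `np.allclose`. So that it only needs NumPy, it swaps scikit-learn for `np.linalg.lstsq` as the reference solver and uses synthetic data. Both of those choices, and all variable names, are assumptions for illustration. With the objects from this lesson, the analogous check would be `np.allclose(our_model.coef_, linear.coef_)`.

```python
import numpy as np

# Synthetic data (assumed): 50 rows, 3 features, known coefficients plus noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = 4.0 + X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

# Add the intercept column, as add_intercept() does in the class.
X1 = np.concatenate([np.ones((X_demo.shape[0], 1)), X_demo], axis=1)

# Our approach: the normal equation.
beta_normal = np.dot(np.dot(np.linalg.inv(np.dot(X1.T, X1)), X1.T), y_demo)

# Reference: NumPy's least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X1, y_demo, rcond=None)

# allclose tolerates tiny floating point differences between the two routes.
matches = np.allclose(beta_normal, beta_lstsq)
print(matches)
```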

<a id='inspection'></a>

## Inspecting a Class

---

When we want to know more about a class object, we can use the `inspect` module. Specifically, the `inspect.getmembers()` function takes an object (such as an instantiated class) as an argument and returns a list of `(name, value)` pairs, sorted by name, for all of its members.

This helps us see which attributes and methods are available and, basically, the blueprint of a class object in memory. Depending on the way the class was implemented, you can usually find useful information hiding inside of `slr.__class__.__dict__` (which can be easier to interpret). However, the "right way" is to use the `inspect` module; the built-in `dir()` function used below only lists the member names.

In [41]:
import inspect

In [42]:
slr = SimpleLinearRegression(fit_intercept = True)
dir(slr)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'add_intercept',
 'fit',
 'fit_intercept',
 'predict',
 'score']
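
The cell above uses `dir()`. As a sketch of the `inspect` route, the snippet below calls `inspect.getmembers()` on an instance of a small hypothetical class (`Demo`, defined here only for illustration) and filters the result down to its methods.

```python
import inspect


class Demo:

    def __init__(self, flag=True):
        self.flag = flag

    def run(self):
        return self.flag


demo = Demo()

# Every member as a list of (name, value) pairs.
members = inspect.getmembers(demo)

# Only the bound methods defined in Python on the class.
method_names = [
    name for name, value in inspect.getmembers(demo, predicate=inspect.ismethod)
]

# Attributes stored on the instance itself.
instance_attributes = vars(demo)

print(method_names)
print(instance_attributes)
```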

<a id='special'></a>

## Some Special Class Methods

---

|Method| Description|
|--|--|
|\_\_init\_\_ ( self [,args...] )| Constructor (with any optional arguments). Sample call: `obj = className(args)`.|
|\_\_del\_\_( self ) | Destructor; called when the object is about to be destroyed. Sample call: `del obj` (this runs `__del__` only once the last reference is gone).|
|\_\_repr\_\_( self ) | Evaluable string representation. Sample call: `repr(obj)`.|
|\_\_str\_\_( self ) | Printable string representation. Sample call: `str(obj)`.|
|\_\_cmp\_\_ ( self, x ) | Object comparison. Sample call: `cmp(obj, x)`. Python 2 only; Python 3 removed `cmp()` and uses rich comparison methods such as `__eq__` and `__lt__` instead.|

The `__repr__` function reports back a description of what the class represents. You can basically do whatever you want with it, but its purpose is to convey something descriptive about your class.
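
A short sketch of a few of these special methods on a hypothetical `Point` class (the class and its attributes are assumptions for illustration). It defines `__repr__`, `__str__`, and `__eq__`, the rich comparison method that Python 3 calls for `==`.

```python
class Point:

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Aim for a string that looks like the call that would recreate the object.
        return 'Point(x={!r}, y={!r})'.format(self.x, self.y)

    def __str__(self):
        # Friendlier form used by print() and str().
        return '({}, {})'.format(self.x, self.y)

    def __eq__(self, other):
        # Called for `point == other`.
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
print(repr(p))
print(str(p))
print(p == Point(1, 2))
```

If `__str__` is not defined, `str(obj)` and `print(obj)` fall back to `__repr__`, which is why `__repr__` is usually the first of the two to implement.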

The `__del__` method is the bookend function of `__init__`. You can use it to run cleanup code when an instance is about to be destroyed. Generally it works well, but in practice there are a few considerations to keep in mind; for example, `del obj` only removes one reference, and `__del__` runs once no references remain. Read more about safely using Python destructors [here](http://eli.thegreenplace.net/2009/06/12/safely-using-destructors-in-python).