# Classes and Objects

In Python, everything is an "object" of a specific type. This is the basis of what is known as object-oriented programming.

A class is a type of object. You can think of a class definition as a sort of "blueprint" that specifies the construction of a new object when instantiated.
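
To make the "everything is an object" idea concrete, here is a small sketch that uses only built-ins. The values in the list are arbitrary and chosen purely for illustration.

```python
# Every value in Python is an instance of some class (its type).
values = [3, 2.5, "house", [1, 2, 3], len]

for value in values:
    # type() returns the class the object was built from
    print(type(value).__name__)

# Even plain numbers are instances of the most general class, `object`.
print(isinstance(3, object))

# Classes are objects as well: the type of the class `int` is `type`.
print(isinstance(int, type))
```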

Knowing how to define and use classes is essential to programming Python at an intermediate or advanced level. I will cover the basics here, which will help you understand how things like `LinearRegression` in sklearn work.



---

## Coding a simple version of `LinearRegression`

By now you are quite familiar with the `LinearRegression` class in sklearn. We will walk through the re-creation of this class (in the simplest possible sense).

---

### The class definition

Below is the beginning of our class blueprint:

In [1]:
class SimpleLinearRegression(object):
    
    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

What are the components of this?

**`class`**

- The `class` keyword is like `def`, but instead of defining a function it defines a class.

**`object`**

- `object` in the parentheses of the class definition indicates that this class "inherits" from the `object` class. The `object` class is a very general, very fundamental class in Python. Inheritance means that whatever properties and functions are part of the `object` class are passed down to our `SimpleLinearRegression` class.
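
A quick way to see the inheritance is to ask Python directly. The sketch below repeats the class definition so it runs on its own. Note that in Python 3 the `(object)` is implied even when it is left out, so writing it is optional there.

```python
class SimpleLinearRegression(object):

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

# The class is a subclass of `object` ...
print(issubclass(SimpleLinearRegression, object))

# ... so its instances pick up behavior we never wrote ourselves,
# such as a default __repr__ and a default equality check.
slr = SimpleLinearRegression()
print(hasattr(slr, "__repr__"))
print(slr == slr)
```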

**`def __init__(self)`**

- The `def __init__(self):` is our class's initialization function. This function is called automatically when you instantiate the class by typing `SimpleLinearRegression()`.

**`self`**

- `self`, the ever-confusing first argument to the functions defined inside a class, is a variable that refers to the **current instance of the class**. What does this mean? When you instantiate a class and assign it to a variable with `slr = SimpleLinearRegression()`, `self` inside the class's functions refers to that specific object, `slr`. When you call a function that is part of the class on `slr`, Python passes `slr` in as `self` automatically, so the function works with that specific object's data. This lets you have multiple instances of a class that share the same function names but keep their own attribute values.

**class attributes**

- `self.coef_` and `self.intercept_`, likewise, are "attributes" (variables) that are attached to the instance of the class. When `self` is `slr`, for example, `self.coef_` is the same thing as `slr.coef_`.
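
The sketch below illustrates the point about `self` and attributes with two separate instances. The names `slr_a` and `slr_b` and the assigned values are made up for the example.

```python
class SimpleLinearRegression(object):

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

# Two instances built from the same blueprint
slr_a = SimpleLinearRegression()
slr_b = SimpleLinearRegression()

# Assigning an attribute on one instance does not touch the other
slr_a.coef_ = [1.0, 2.0]

print(slr_a.coef_)
print(slr_b.coef_)
print(slr_a is slr_b)
```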

---

### Adding a class function

Now, just like with `__init__`, we can add functions to the class.

Let's add a `fit()` method that will calculate the coefficients for a linear regression.

In [2]:
import numpy as np

class SimpleLinearRegression(object):
    
    def __init__(self):
        self.coef_ = None
        self.intercept_ = None
        
    def fit(self, X, y):
        # betas formula
        # betas = (X'X)^-1 X'Y
        
        XtX = np.dot(X.T, X)
        XtX_inv = np.linalg.inv(XtX)
        XtX_inv_Xt = np.dot(XtX_inv, X.T)
        self.coef_ = np.dot(XtX_inv_Xt, y)
        

Notice that we assigned `self.coef_` inside of the `fit()` function.

This will set the class attribute `self.coef_`, and this attribute can be accessed by _any other function in the class without passing it as an argument!_

It can also be accessed by you after instantiating the class.
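
Here is a minimal sketch of both kinds of access on toy data. The `n_coefs` method is a hypothetical helper added only to show one method reading what another method stored. The toy `X_toy` already contains a column of ones, and `y_toy` was generated from y = 1 + 2x, so the expected betas are 1 and 2.

```python
import numpy as np

class SimpleLinearRegression(object):

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X, y):
        # betas = (X'X)^-1 X'y
        XtX_inv = np.linalg.inv(np.dot(X.T, X))
        self.coef_ = np.dot(np.dot(XtX_inv, X.T), y)

    def n_coefs(self):
        # hypothetical helper: reads self.coef_ without it being passed in
        return len(self.coef_)

# Toy data: first column is all ones, second column is x
X_toy = np.array([[1., 0.], [1., 1.], [1., 2.]])
y_toy = np.array([1., 3., 5.])

slr = SimpleLinearRegression()
slr.fit(X_toy, y_toy)

print(slr.coef_)      # attribute read from outside the class
print(slr.n_coefs())  # attribute read by another method of the class
```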

---

### Assigning attributes during instantiation

There is an issue here: we are probably going to pass in an `X` without an intercept column. We can add arguments to the `__init__` function, which are supplied when the class is instantiated:

In [10]:
class SimpleLinearRegression(object):
    
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept
        
    def fit(self, X, y):
        # betas formula
        # betas = (X'X)^-1 X'Y
        
        XtX = np.dot(X.T, X)
        XtX_inv = np.linalg.inv(XtX)
        XtX_inv_Xt = np.dot(XtX_inv, X.T)
        self.coef_ = np.dot(XtX_inv_Xt, y)
        

Now, if we instantiate the class, it will assign `fit_intercept` to the class attribute `fit_intercept`, like so:

In [11]:
slr = SimpleLinearRegression(fit_intercept=True)
slr.fit_intercept

True

In [12]:
slr = SimpleLinearRegression(fit_intercept=False)
slr.fit_intercept

False
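
Because `fit_intercept` has a default value in the signature, the argument can also be left out or passed positionally. A short sketch, repeating only the `__init__` part of the class:

```python
class SimpleLinearRegression(object):

    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept

# No argument: the default from the signature is used
print(SimpleLinearRegression().fit_intercept)

# Positional and keyword forms set the same attribute
print(SimpleLinearRegression(False).fit_intercept)
print(SimpleLinearRegression(fit_intercept=False).fit_intercept)
```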

Let's add a function that will add the intercept column to the X matrix, and call it during fit if necessary:

In [40]:
class SimpleLinearRegression(object):
    
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept
        
    def add_intercept(self, X):
        intercept = np.ones((X.shape[0], 1))
        X = np.concatenate([intercept, X], axis=1)
        return X
        
    def fit(self, X, y):
        
        if self.fit_intercept:
            X = self.add_intercept(X)
        
        # betas formula
        # betas = (X'X)^-1 X'Y
        
        XtX = np.dot(X.T, X)
        XtX_inv = np.linalg.inv(XtX)
        XtX_inv_Xt = np.dot(XtX_inv, X.T)
        betas = np.dot(XtX_inv_Xt, y)
        
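        if self.fit_intercept:
            # the first beta belongs to the column of ones
            self.intercept_ = betas[0]
            self.coef_ = betas[1:]
        else:
            # no column of ones was added, so every beta is a coefficient
            self.intercept_ = 0.
            self.coef_ = betas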
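
To see what the intercept step does to the data, here is the same logic written as a standalone function (an illustration only, not part of the class) and applied to a two-row array:

```python
import numpy as np

def add_intercept(X):
    # prepend a column of ones so the first beta acts as the intercept
    intercept = np.ones((X.shape[0], 1))
    return np.concatenate([intercept, X], axis=1)

# Two rows with two predictors each
X_small = np.array([[2104., 3.], [1600., 3.]])
X_aug = add_intercept(X_small)

print(X_small.shape)
print(X_aug.shape)
print(X_aug)
```

The row count is unchanged and the column count grows by one, which is why `fit` ends up with one more beta than there are predictors.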

---

### Trying out the class...

Let's instantiate the class and try out the beta fitting function. I'll load in the old housing data we used when making the linear regression the first time:

In [16]:
import pandas as pd

house = '/Users/kiefer/github-repos/DSI-SF-2/datasets/housing_data/housing-data.csv'
house = pd.read_csv(house).dropna()
house.head(2)

| Unnamed: 0 | sqft | bdrms | age | price |
|---|---|---|---|---|
| 0 | 2104 | 3 | 70 | 399900 |
| 1 | 1600 | 3 | 28 | 329900 |


In [62]:
y = house.price.values
X = house[['sqft','bdrms','age']].values

In [41]:
slr = SimpleLinearRegression(fit_intercept=True)
print(slr.fit_intercept)
print(slr.coef_)
print(slr.intercept_)

True
None
None


In [42]:
slr.fit(X, y)

In [43]:
print(slr.coef_)
print(slr.intercept_)

[  139.33484671 -8621.47045953   -81.21787764]
92451.6278416


Like in the real `LinearRegression` class, we now have access to the assigned `coef_` and `intercept_` attributes after fitting the model.
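
If you do not have the housing file at hand, one way to sanity-check the class is to fit data whose true coefficients are known. The sketch below uses a condensed copy of the class and synthetic data. Every number in it (the seed, the sample size, the "true" coefficients) is made up for illustration, and no noise is added, so the fit should recover the values almost exactly.

```python
import numpy as np

class SimpleLinearRegression(object):

    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept

    def add_intercept(self, X):
        intercept = np.ones((X.shape[0], 1))
        return np.concatenate([intercept, X], axis=1)

    def fit(self, X, y):
        if self.fit_intercept:
            X = self.add_intercept(X)
        # betas = (X'X)^-1 X'y
        XtX_inv = np.linalg.inv(np.dot(X.T, X))
        betas = np.dot(np.dot(XtX_inv, X.T), y)
        if self.fit_intercept:
            self.intercept_, self.coef_ = betas[0], betas[1:]
        else:
            self.intercept_, self.coef_ = 0., betas

# Synthetic data with a known answer
rng = np.random.RandomState(0)
X_syn = rng.normal(size=(50, 3))
true_coef = np.array([2., -1., 0.5])
true_intercept = 5.
y_syn = true_intercept + np.dot(X_syn, true_coef)  # no noise added

slr = SimpleLinearRegression(fit_intercept=True)
slr.fit(X_syn, y_syn)

# Both checks compare the fitted values against the known ones
print(np.allclose(slr.coef_, true_coef))
print(np.allclose(slr.intercept_, true_intercept))
```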

---

### Adding more class methods

Let's add some more of the class methods that are in the real `LinearRegression` class.

First off, we can add the `predict` function.

In [56]:
class SimpleLinearRegression(object):
    
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept
        
    def add_intercept(self, X):
        intercept = np.ones((X.shape[0], 1))
        X = np.concatenate([intercept, X], axis=1)
        return X
        
    def fit(self, X, y):
        
        if self.fit_intercept:
            X = self.add_intercept(X)
        
        # betas formula
        # betas = (X'X)^-1 X'Y
        
        XtX = np.dot(X.T, X)
        XtX_inv = np.linalg.inv(XtX)
        XtX_inv_Xt = np.dot(XtX_inv, X.T)
        betas = np.dot(XtX_inv_Xt, y)
        
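        if self.fit_intercept:
            # the first beta belongs to the column of ones
            self.intercept_ = betas[0]
            self.coef_ = betas[1:]
        else:
            # no column of ones was added, so every beta is a coefficient
            self.intercept_ = 0.
            self.coef_ = betas
        
    def predict(self, X):
        if self.fit_intercept:
            X = self.add_intercept(X)
            betas = np.concatenate([[self.intercept_], self.coef_])
        else:
            # without an intercept column the coefficients are the full beta vector
            betas = self.coef_
            
        return np.dot(X, betas)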

In [58]:
slr = SimpleLinearRegression(fit_intercept=True)
slr.fit(X,y)
y_hat = slr.predict(X)

In [59]:
print(y.shape, y_hat.shape)

(47,) (47,)
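
The concatenation inside `predict` may look odd at first. The sketch below, with made-up numbers, shows that multiplying the ones-augmented matrix by `[intercept, coefficients]` gives the same predictions as the more familiar "dot product plus intercept" form.

```python
import numpy as np

X_small = np.array([[1., 2.], [3., 4.], [5., 6.]])
intercept_ = 10.0               # made-up values for illustration
coef_ = np.array([0.5, -1.0])

# Route 1: prepend a column of ones, then a single dot product
X_aug = np.concatenate([np.ones((X_small.shape[0], 1)), X_small], axis=1)
y_hat_a = np.dot(X_aug, np.concatenate([[intercept_], coef_]))

# Route 2: dot product with the coefficients, then add the intercept
y_hat_b = np.dot(X_small, coef_) + intercept_

print(y_hat_a)
print(y_hat_b)
```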


Next, let's add the `score` method:

In [77]:
class SimpleLinearRegression(object):
    
    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self.fit_intercept = fit_intercept
        
    def add_intercept(self, X):
        intercept = np.ones((X.shape[0], 1))
        X = np.concatenate([intercept, X], axis=1)
        return X
        
    def fit(self, X, y):
        
        if self.fit_intercept:
            X = self.add_intercept(X)
        
        # betas formula
        # betas = (X'X)^-1 X'Y
        
        XtX = np.dot(X.T, X)
        XtX_inv = np.linalg.inv(XtX)
        XtX_inv_Xt = np.dot(XtX_inv, X.T)
        betas = np.dot(XtX_inv_Xt, y)
        
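        if self.fit_intercept:
            # the first beta belongs to the column of ones
            self.intercept_ = betas[0]
            self.coef_ = betas[1:]
        else:
            # no column of ones was added, so every beta is a coefficient
            self.intercept_ = 0.
            self.coef_ = betas
        
    def predict(self, X):
        if self.fit_intercept:
            X = self.add_intercept(X)
            betas = np.concatenate([[self.intercept_], self.coef_])
        else:
            # without an intercept column the coefficients are the full beta vector
            betas = self.coef_
            
        return np.dot(X, betas)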
    
    def _calculate_sse(self, y_true, y_hat):
        return np.sum((y_true - y_hat)**2)
        
    def _calculate_r2(self, sse_model, sse_baseline):
        return 1. - float(sse_model)/sse_baseline
    
    def score(self, X, y):
            
        baseline_sse = self._calculate_sse(y, np.tile(np.mean(y), len(y)))
        
        y_hat = self.predict(X)
        model_sse = self._calculate_sse(y, y_hat)
        
        return self._calculate_r2(model_sse, baseline_sse)
            
    

In [78]:
slr = SimpleLinearRegression(fit_intercept=True)
slr.fit(X,y)
r2 = slr.score(X, y)
print(r2)

0.733163999069
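
The `score` method computes R² as one minus the ratio of the model's sum of squared errors to the sum of squared errors of a baseline that always predicts the mean of `y`. The hand-sized sketch below walks through that arithmetic with made-up targets and predictions. The standalone `sse` function stands in for `_calculate_sse`.

```python
import numpy as np

def sse(y_true, y_hat):
    # sum of squared errors between targets and predictions
    return np.sum((y_true - y_hat) ** 2)

y_true = np.array([1., 2., 3., 4.])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])   # made-up predictions

# Baseline model: always predict the mean of the targets
baseline = np.tile(np.mean(y_true), len(y_true))

sse_model = sse(y_true, y_hat)
sse_baseline = sse(y_true, baseline)

r2 = 1. - sse_model / sse_baseline
print(r2)

# Two reference points: perfect predictions and the baseline itself
r2_perfect = 1. - sse(y_true, y_true) / sse_baseline
r2_baseline = 1. - sse(y_true, baseline) / sse_baseline
print(r2_perfect)
print(r2_baseline)
```

Perfect predictions give a score of 1 and the mean-only baseline gives 0, which is a useful way to read the 0.73 above: the model removes roughly 73% of the squared error that the baseline leaves.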


Check against sklearn's implementation:

In [79]:
from sklearn.linear_model import LinearRegression

In [80]:
lr = LinearRegression()
lr.fit(X,y)
lr.score(X,y)

0.73316399906900243