In this notebook, I first fit scikit-learn's `LinearRegression` model, then build my own linear regression class and compare the coefficients and intercept of the two models.

**Matrix Form (compact way to write it)**
$$y = X\beta$$

where:

- $y$ = vector of outputs ($n \times 1$)
- $X$ = design matrix holding a leading column of ones (for the intercept) followed by the $p$ predictors ($n \times (p + 1)$)
- $\beta$ = vector of coefficients ($(p + 1) \times 1$)

**Formula for coefficients**
$$ \beta = (X^TX)^{-1}X^Ty$$

This gives:
- $\beta_0$ = intercept
- $\beta_1, \beta_2, \ldots, \beta_p$ = slope coefficients, one per predictor
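To make the formula concrete, here is a minimal sketch on made-up toy data (the arrays `features` and `targets` are illustrative assumptions, not part of the diabetes dataset). The targets are generated without noise from $y = 1 + 2x_1 + 3x_2$, so the formula should recover values close to those three numbers.

```python
import numpy as np

# Toy data (assumed for illustration): y = 1 + 2*x1 + 3*x2, with no noise
features = np.array([[0.0, 0.0],
                     [1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0],
                     [2.0, 1.0]])
targets = 1.0 + 2.0 * features[:, 0] + 3.0 * features[:, 1]

# Design matrix: a column of ones followed by the two predictors
design = np.column_stack([np.ones(len(features)), features])

# beta = (X^T X)^{-1} X^T y
beta = np.linalg.inv(design.T @ design) @ design.T @ targets

intercept = beta[0]   # beta_0
slopes = beta[1:]     # beta_1, beta_2
print(intercept, slopes)
```

The first entry of `beta` plays the role of $\beta_0$ and the remaining entries are the slopes, which is the same split the class later in this notebook uses.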

In [81]:
import numpy as np
import pandas as pd

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

In [82]:
X, y = load_diabetes(return_X_y=True)

In [83]:
print(X.shape)
print(y.shape)

(442, 10)
(442,)


## Using scikit-learn's Linear Regression

In [84]:
from sklearn.linear_model import LinearRegression

In [85]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [86]:
print(X_train.shape)
print(X_test.shape)

(353, 10)
(89, 10)


In [87]:
reg = LinearRegression()

In [88]:

reg.fit(X_train,y_train)

In [89]:
y_pred = reg.predict(X_test)
y_pred

array([139.5475584 , 179.51720835, 134.03875572, 291.41702925,
       123.78965872,  92.1723465 , 258.23238899, 181.33732057,
        90.22411311, 108.63375858,  94.13865744, 168.43486358,
        53.5047888 , 206.63081659, 100.12925869, 130.66657085,
       219.53071499, 250.7803234 , 196.3688346 , 218.57511815,
       207.35050182,  88.48340941,  70.43285917, 188.95914235,
       154.8868162 , 159.36170122, 188.31263363, 180.39094033,
        47.99046561, 108.97453871, 174.77897633,  86.36406656,
       132.95761215, 184.53819483, 173.83220911, 190.35858492,
       124.4156176 , 119.65110656, 147.95168682,  59.05405241,
        71.62331856, 107.68284704, 165.45365458, 155.00975931,
       171.04799096,  61.45761356,  71.66672581, 114.96732206,
        51.57975523, 167.57599528, 152.52291955,  62.95568515,
       103.49741722, 109.20751489, 175.64118426, 154.60296242,
        94.41704366, 210.74209145, 120.2566205 ,  77.61585399,
       187.93203995, 206.49337474, 140.63167076, 105.59

In [90]:
print(reg.coef_)

[  37.90402135 -241.96436231  542.42875852  347.70384391 -931.48884588
  518.06227698  163.41998299  275.31790158  736.1988589    48.67065743]


In [91]:
print(reg.intercept_)

151.34560453985995


## Writing My Own Linear Regression Class

In [92]:
class MeraLRM:

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X_train, y_train):
        # Prepend a column of ones (column index 0, axis=1) so that betas[0] is the intercept
        X_train = np.insert(X_train, 0, 1, axis=1)

        # Normal equation: betas = (X^T X)^{-1} X^T y
        betas = np.linalg.inv(np.dot(X_train.T, X_train)).dot(X_train.T).dot(y_train)
        self.coef_ = betas[1:]
        self.intercept_ = betas[0]

    def predict(self, X_test):
        # Linear prediction: X_test . coef_ + intercept_
        y_pred = np.dot(X_test, self.coef_) + self.intercept_
        return y_pred
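Inverting $X^TX$ explicitly mirrors the formula, but it can lose accuracy when the predictors are nearly collinear. As a hedged alternative sketch, the same coefficients can be obtained with `np.linalg.lstsq`, which solves the least-squares problem without forming the inverse. The class name `LstsqLRM` and the synthetic `X_demo`/`y_demo` data are hypothetical and only serve this illustration.

```python
import numpy as np


class LstsqLRM:  # hypothetical name, not part of the original notebook
    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X_train, y_train):
        # Same design matrix as before: a column of ones, then the predictors
        X_design = np.insert(X_train, 0, 1, axis=1)
        # lstsq returns (solution, residuals, rank, singular_values); keep the solution
        betas, *_ = np.linalg.lstsq(X_design, y_train, rcond=None)
        self.intercept_ = betas[0]
        self.coef_ = betas[1:]
        return self

    def predict(self, X_test):
        return np.dot(X_test, self.coef_) + self.intercept_


# Synthetic, noise-free data (assumed for illustration): y = 4 + 1.5*x1 - 2*x2 + 0.5*x3
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = 4.0 + X_demo @ np.array([1.5, -2.0, 0.5])

model = LstsqLRM().fit(X_demo, y_demo)
print(model.intercept_, model.coef_)
```

On well-conditioned data the two approaches should agree to within floating-point error, so the choice mostly matters for ill-conditioned design matrices.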

In [93]:
mlrm = MeraLRM()

In [94]:
mlrm.fit(X_train, y_train)

In [95]:
y_pred = mlrm.predict(X_test)
y_pred

array([139.5475584 , 179.51720835, 134.03875572, 291.41702925,
       123.78965872,  92.1723465 , 258.23238899, 181.33732057,
        90.22411311, 108.63375858,  94.13865744, 168.43486358,
        53.5047888 , 206.63081659, 100.12925869, 130.66657085,
       219.53071499, 250.7803234 , 196.3688346 , 218.57511815,
       207.35050182,  88.48340941,  70.43285917, 188.95914235,
       154.8868162 , 159.36170122, 188.31263363, 180.39094033,
        47.99046561, 108.97453871, 174.77897633,  86.36406656,
       132.95761215, 184.53819483, 173.83220911, 190.35858492,
       124.4156176 , 119.65110656, 147.95168682,  59.05405241,
        71.62331856, 107.68284704, 165.45365458, 155.00975931,
       171.04799096,  61.45761356,  71.66672581, 114.96732206,
        51.57975523, 167.57599528, 152.52291955,  62.95568515,
       103.49741722, 109.20751489, 175.64118426, 154.60296242,
        94.41704366, 210.74209145, 120.2566205 ,  77.61585399,
       187.93203995, 206.49337474, 140.63167076, 105.59

In [96]:
mlrm.coef_

array([  37.90402135, -241.96436231,  542.42875852,  347.70384391,
       -931.48884588,  518.06227698,  163.41998299,  275.31790158,
        736.1988589 ,   48.67065743])

In [97]:
mlrm.intercept_

151.34560453986
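Comparing printed arrays by eye works for ten coefficients, but a numerical check is less error-prone. The sketch below repeats the split used in this notebook, fits scikit-learn's model, recomputes the normal-equation solution inline, and compares the two with `np.allclose` and `np.isclose`. The variable names `X_design`, `coef_match`, and `intercept_match` are assumptions introduced for this example.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same data and split as in the notebook
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# scikit-learn fit
reg = LinearRegression().fit(X_train, y_train)

# Normal-equation fit: prepend a column of ones, then apply (X^T X)^{-1} X^T y
X_design = np.insert(X_train, 0, 1, axis=1)
betas = np.linalg.inv(X_design.T @ X_design) @ X_design.T @ y_train

# Compare within floating-point tolerance instead of exact equality
coef_match = np.allclose(reg.coef_, betas[1:])
intercept_match = np.isclose(reg.intercept_, betas[0])
print("coefficients match:", coef_match)
print("intercepts match:", intercept_match)
```

A tolerance-based comparison is used because the two solvers take different numerical routes, so the last few digits can differ, as the two intercept printouts above already show.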