# Algo1 Multiple Linear Regression P3

### Multiple Linear Regression Using Ordinary Least Squares (OLS)

  <center>
Formula for the beta (coefficient) vector:
</center> 

$$
\beta = \left( X^{\top} X \right)^{-1} X^{\top} Y
$$
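
The formula follows from minimizing the residual sum of squares. A brief sketch of that step, assuming $X$ already contains a leading column of ones for the intercept and that $X^{\top} X$ is invertible:

$$
\begin{aligned}
L(\beta) &= (Y - X\beta)^{\top}(Y - X\beta) \\
\frac{\partial L}{\partial \beta} &= -2\, X^{\top}(Y - X\beta) = 0 \\
X^{\top} X \, \beta &= X^{\top} Y \\
\beta &= \left( X^{\top} X \right)^{-1} X^{\top} Y
\end{aligned}
$$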


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score
from sklearn.datasets import load_diabetes

In [2]:
X,y = load_diabetes(return_X_y=True)

In [3]:
print(X.shape)
print(y.shape)

(442, 10)
(442,)


In [4]:
# Split the data into training and test sets (80% / 20%)
Xtrain,Xtest,ytrain,ytest = train_test_split(X,y,test_size = 0.2,random_state = 2)

In [6]:
print(Xtrain.shape)   # 353 rows go to the training set
print(Xtest.shape)    # 89 rows go to the test set; the number of columns stays the same

(353, 10)
(89, 10)


#### First, fit scikit-learn's LinearRegression as a baseline

In [7]:
skLr = LinearRegression()

In [8]:
skLr.fit(Xtrain,ytrain)

In [9]:
skprediction = skLr.predict(Xtest)

In [10]:
r2Score = r2_score(ytest,skprediction)
r2Score

0.4399338661568968
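
For reference, the R² score reported above is one minus the ratio of the residual sum of squares to the total sum of squares. The snippet below is a minimal sketch of that calculation for a single-output target, assuming the targets are not all identical. The helper name `r2_manual` and the four-point toy data are hypothetical and are not part of the notebook.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot for a 1-D target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

# Toy data for illustration only (not taken from the diabetes dataset)
score = r2_manual([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print(score)
```

Calling `r2_manual(ytest, skprediction)` in the notebook should agree with `r2_score(ytest, skprediction)` up to floating-point rounding.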

In [11]:
skLr.intercept_

np.float64(151.88331005254167)

In [12]:
skLr.coef_

array([  -9.15865318, -205.45432163,  516.69374454,  340.61999905,
       -895.5520019 ,  561.22067904,  153.89310954,  126.73139688,
        861.12700152,   52.42112238])

### My Own Multiple Linear Regression Using OLS

In [13]:
class Multiple_Linear_Regression:
    def __init__(self):
        self.coeficient_ = None
        self.intercept_ = None
        
    def fit(self,Xtrain,ytrain):
        # Prepend a column of ones so that the first beta is the intercept
        Xtrain = np.insert(Xtrain, 0, 1, axis=1)
        # Alternative: np.linalg.pinv(Xtrain.T @ Xtrain) @ Xtrain.T @ ytrain
        # gives the same betas here and also works when X^T X is singular
        Betas = np.linalg.inv(np.dot(Xtrain.T, Xtrain)).dot(Xtrain.T).dot(ytrain)
        print(Betas)
        self.intercept_ = Betas[0]
        self.coeficient_ = Betas[1:]
        
    def predict(self,Xtest):
        return self.intercept_ + np.dot(Xtest,self.coeficient_)
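
The explicit inverse in `fit` mirrors the formula directly. As a rough sketch of two alternatives that avoid forming the inverse, the snippet below compares it with `np.linalg.solve` on the normal equations and with `np.linalg.lstsq` on the design matrix. The synthetic data, the seed, and the names `true_beta`, `beta_inv`, `beta_solve`, and `beta_lstsq` are assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np

# Synthetic data (hypothetical, for illustration only): 50 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_beta = np.array([2.0, -1.0, 0.5, 3.0])   # intercept first, then 3 slopes
X1 = np.insert(X, 0, 1, axis=1)               # prepend the column of ones
y = X1 @ true_beta + rng.normal(scale=0.1, size=50)

# Same formula as in the class above: explicit inverse of X^T X
beta_inv = np.linalg.inv(X1.T @ X1) @ X1.T @ y

# Solve (X^T X) beta = X^T y as a linear system, without forming the inverse
beta_solve = np.linalg.solve(X1.T @ X1, X1.T @ y)

# Least-squares solver applied to the design matrix itself
beta_lstsq, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(np.allclose(beta_inv, beta_solve), np.allclose(beta_inv, beta_lstsq))
```

On well-conditioned data such as this, all three approaches should give the same coefficients up to rounding. The solver-based variants are generally preferred when features are strongly correlated.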

In [14]:
mlr = Multiple_Linear_Regression()

In [15]:
mlr.fit(Xtrain,ytrain)

[ 151.88331005   -9.15865318 -205.45432163  516.69374454  340.61999905
 -895.5520019   561.22067904  153.89310954  126.73139688  861.12700152
   52.42112238]


In [28]:
prediction = mlr.predict(Xtest)


In [20]:
df = pd.DataFrame(X)
df.head(3)

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.038076 | 0.05068 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019907 | -0.017646 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068332 | -0.092204 |
| 2 | 0.085299 | 0.05068 | 0.044451 | -0.00567 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002861 | -0.02593 |


In [86]:
r2Score = r2_score(ytest,prediction)
r2Score

0.43993386615689756

In [87]:
mlr.intercept_

np.float64(151.8833100525417)

In [88]:
mlr.coeficient_

array([  -9.15865318, -205.45432163,  516.69374454,  340.61999905,
       -895.5520019 ,  561.22067904,  153.89310954,  126.73139688,
        861.12700152,   52.42112238])

#### Conclusion: Both implementations give the same intercept, coefficients, and R² score, up to floating-point rounding