# Linear Regression From Scratch

- We use the following formulas to calculate the coefficients and the intercept:
  - $\hat{Y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$
  - $\beta = (X^T X)^{-1} X^T Y$

- We add an extra column of 1s to $X$ so that $\beta_0$ (the intercept) is estimated together with the other coefficients.
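
As a quick check of the closed-form solution, the sketch below applies it to a tiny made-up dataset that follows y = 1 + 2x exactly. The data and the variable names (`X_b`, `betas`) are illustrative assumptions and are not part of the Iris example that follows.

```python
import numpy as np

# Hypothetical toy data that follows y = 1 + 2x exactly
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Prepend a column of 1s so the first beta acts as the intercept
X_b = np.insert(X, 0, 1, axis=1)

# beta = (X^T X)^(-1) X^T y
betas = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

print(betas)  # approximately [1. 2.]
```

The first entry is the intercept and the second is the slope, which is the same layout the class below relies on when it splits `betas` into `intercept_` and `coef_`.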

In [17]:
import numpy as np


class LinearR:
    """Ordinary least squares fitted with the normal equation."""

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X_train, y_train):
        # Prepend a column of 1s so that betas[0] is the intercept
        X_train = np.insert(X_train, 0, 1, axis=1)

        # Normal equation: beta = (X^T X)^(-1) X^T y
        betas = np.linalg.inv(np.dot(X_train.T, X_train)).dot(X_train.T).dot(y_train)
        self.intercept_ = betas[0]
        self.coef_ = betas[1:]

    def predict(self, X_test):
        # The intercept is added explicitly, so X_test needs no column of 1s
        y_pred = np.dot(X_test, self.coef_) + self.intercept_
        return y_pred
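
Explicitly inverting $X^T X$ can become numerically unreliable when features are strongly correlated. One common alternative is `np.linalg.lstsq`, which solves the same least-squares problem without forming the inverse. The sketch below compares the two on synthetic data; the data, the seed, and all variable names are assumptions made for illustration. On well-conditioned data like this, the two approaches should agree closely.

```python
import numpy as np

# Synthetic data (an assumption for this sketch): y = 4 + 1.5*x1 - 2*x2 + 0.5*x3 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = 4.0 + X @ true_coef + rng.normal(scale=0.01, size=50)

# Column of 1s for the intercept, as in LinearR.fit
X_b = np.insert(X, 0, 1, axis=1)

# Normal equation with an explicit inverse
betas_inv = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# Least-squares solver; the first returned value is the solution vector
betas_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print(np.allclose(betas_inv, betas_lstsq))
```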
        
    

## Testing our code using the Iris dataset

Importing the required libraries

In [18]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import preprocessing

#### Importing the Iris dataset

In [19]:
df = pd.read_csv('iris_csv.csv')
df.head()

|   | sepallength | sepalwidth | petallength | petalwidth | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


#### Label encoding the class names

In [20]:
label_encoder = preprocessing.LabelEncoder()

# Encode the string labels in column 'class' as integers.
df['class'] = label_encoder.fit_transform(df['class'])
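
To make the encoding step concrete, here is a minimal NumPy-only sketch of the same idea: each distinct label is replaced by its position in the sorted list of unique labels, which is also the order `LabelEncoder` uses. The four sample labels are chosen for illustration and are not read from the CSV file.

```python
import numpy as np

# Illustrative labels (an assumption; not loaded from iris_csv.csv)
labels = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"])

# classes holds the sorted unique labels; codes holds each label's index in classes
classes, codes = np.unique(labels, return_inverse=True)

print(classes)  # sorted unique class names
print(codes)    # [0 1 2 0]
```

Because the codes are ordinary integers, the regression below treats the class as a numeric target with an implied ordering.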

In [21]:
## Separating the data into independent variables (features) and the response variable

X = df.iloc[:, 0:4].values  # the four measurement columns
y = df.iloc[:, 4].values    # the encoded class label

#### Splitting the data into training and test sets

In [22]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=2)
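
For intuition about what an 80/20 split does, the sketch below shuffles row indices and holds out 20% of them. It assumes the usual 150-row Iris data and uses NumPy's own random generator, so it will not select the same rows as `train_test_split` with `random_state=2`; it only illustrates the resulting sizes.

```python
import numpy as np

n_samples = 150   # assumed row count of the Iris data
test_size = 0.2

# Shuffle row indices with a fixed seed so the split is repeatable
rng = np.random.default_rng(2)
indices = rng.permutation(n_samples)

# Hold out the first 20% of the shuffled indices as the test set
n_test = int(round(test_size * n_samples))
test_idx, train_idx = indices[:n_test], indices[n_test:]

print(len(train_idx), len(test_idx))  # 120 30
```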

#### Creating object 'lr' for our class 'LinearR'

In [23]:
lr = LinearR()

In [24]:
lr.fit(X_train,y_train)

In [25]:
y_pred = lr.predict(X_test)

#### Calculating R2 Score

In [26]:
from sklearn.metrics import r2_score
print("R2 Score: ",r2_score(y_test,y_pred))

0.9370544014980843
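
The R2 score compares the model's squared error with that of a baseline that always predicts the mean of `y`: $R^2 = 1 - SS_{res} / SS_{tot}$. A minimal from-scratch sketch follows; the helper name `r2_from_scratch` and the sample values are hypothetical and are not taken from the Iris run above.

```python
import numpy as np

def r2_from_scratch(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (hypothetical helper for illustration)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

# Made-up targets and predictions
score = r2_from_scratch([0, 1, 2, 1], [0.1, 0.9, 1.8, 1.2])
print(score)  # approximately 0.95
```

A score of 1 means the predictions match the targets exactly, while 0 means the model does no better than predicting the mean.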

In [27]:
print(lr.coef_)

array([-0.10448422, -0.06030979,  0.19942356,  0.65858979])

In [None]:
print("LR Intercept: ",lr.intercept_)

## Testing our code against the LinearRegression class of sklearn

Importing the LinearRegression class from the 'linear_model' submodule of the sklearn library.

In [24]:
from sklearn.linear_model import LinearRegression

In [25]:
reg = LinearRegression()
reg.fit(X_train,y_train)

In [26]:
y_pred = reg.predict(X_test)

In [27]:
from sklearn.metrics import r2_score

In [28]:
print(r2_score(y_test,y_pred))

0.9370544014980828

In [29]:
print(reg.coef_)

array([-0.10448422, -0.06030979,  0.19942356,  0.65858979])

In [22]:
print(reg.intercept_)

0.26137221515508624