# Coding Linear Regression from scratch 
## The OLS (Ordinary Least Squares) Method:
In simple words, the OLS method does not rely on iterative approximation techniques (such as gradient descent) to find the solution; instead, the values are plugged into a closed-form formula that gives the result directly. Note that although applying OLS involves no differentiation, the derivation of the OLS estimator does. For more details visit: https://en.wikipedia.org/wiki/Ordinary_least_squares
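
For the single-feature case, the closed-form formula can be written out as below. The symbols $m$ (slope) and $b$ (intercept) follow the attribute names used in the code of this notebook; $\bar{x}$ and $\bar{y}$ denote the sample means, and $n$ is the number of training points.

```latex
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m\,\bar{x}
```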

In [45]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Simple Linear Regression from scratch

In [46]:
class simple_LinearRegressor():
    def __init__(self):
        self.m = None   # slope (coefficient)
        self.b = None   # intercept
    def fit(self,X_train,y_train):
        # compute the means once instead of on every loop iteration
        x_mean = X_train.mean()
        y_mean = y_train.mean()
        num = 0   # running sum of (x - x_mean)*(y - y_mean)
        den = 0   # running sum of (x - x_mean)**2
        for i in range(X_train.shape[0]):
            num = num + ((X_train[i] - x_mean)*(y_train[i] - y_mean))
            den = den + ((X_train[i] - x_mean)*(X_train[i] - x_mean))
        self.m = num/den
        self.b = y_mean - (self.m * x_mean)
    def predict(self,X_test):
        # apply the fitted line y = m*x + b
        y_pred = self.m*X_test + self.b
        return y_pred
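
The loop in `fit` can also be expressed with NumPy array operations. The following is only a minimal sketch under stated assumptions: the helper name `fit_simple_ols` and the demo data are hypothetical and not part of the original notebook. It computes the same slope and intercept formula on flattened arrays.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    # Hypothetical helper; accepts inputs of shape (n,) or (n, 1)
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    # slope = sum((x - x_mean)*(y - y_mean)) / sum((x - x_mean)**2)
    slope = np.dot(x_centered, y_centered) / np.dot(x_centered, x_centered)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Made-up data lying exactly on the line y = 2x + 1
x_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_demo = 2.0 * x_demo + 1.0
slope, intercept = fit_simple_ols(x_demo, y_demo)
print(slope, intercept)
```

Unlike the class, this sketch returns plain scalars rather than one-element arrays, because the inputs are flattened first.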

In [47]:
df = pd.read_csv('Datasets/placement.csv')
df.sample(5)

index,cgpa,package
130,6.68,2.49
83,8.44,3.49
10,5.32,1.86
173,6.75,2.56
194,7.89,3.67


In [48]:
X = df.iloc[:,0].values
y = df.iloc[:,1].values
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)
X_train = X_train.reshape(-1, 1)
X_test = X_test.reshape(-1, 1)

In [49]:
sklearn_LR = LinearRegression()
sklearn_LR.fit(X_train,y_train)
pred = sklearn_LR.predict(X_test)

In [50]:
print(sklearn_LR.intercept_)
print(sklearn_LR.coef_)

-1.0270069374542108
[0.57425647]


In [51]:
our_SLR = simple_LinearRegressor()
our_SLR.fit(X_train,y_train)
pred = our_SLR.predict(X_test)

In [52]:
print(our_SLR.b)
print(our_SLR.m)

[-1.02700694]
[0.57425647]


Comparing the results, our implementation and scikit-learn's give the same intercept and coefficient values, which is consistent with scikit-learn's `LinearRegression` also fitting by ordinary least squares. However, our class handles only a single feature, so we need to modify the code so that it can handle multiple linear regression like scikit-learn.

# Multiple Linear Regression

In [53]:
class multiple_LinearRegressor():
    def __init__(self):
        self.m = None   # coefficients (one per feature)
        self.b = None   # intercept
    def fit(self,X_train,y_train):
        # prepend a column of ones so that betas[0] becomes the intercept
        X_train = np.insert(X_train,0,1,axis=1)
        # normal equation: betas = (X^T X)^-1 X^T y
        betas = np.linalg.inv(np.dot(X_train.T,X_train)).dot(np.dot(X_train.T,y_train))
        self.b = betas[0]
        self.m = betas[1:]
    def predict(self,X_test):
        # linear combination of the features plus the intercept
        pred = np.dot(X_test,self.m) + self.b
        return pred
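
Explicitly inverting `X_train.T @ X_train` can become numerically fragile when features are nearly collinear. As a hedged alternative, the sketch below solves the same least-squares problem with `np.linalg.lstsq`. The helper name `fit_multiple_ols` and the synthetic data are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np

def fit_multiple_ols(X, y):
    """Return (intercept, coefficients) of the least-squares fit of y on X."""
    # Hypothetical helper; X is expected to have shape (n_samples, n_features)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first beta is the intercept
    X_design = np.column_stack([np.ones(X.shape[0]), X])
    # lstsq minimises ||X_design @ betas - y||^2 without forming an inverse
    betas, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return betas[0], betas[1:]

# Synthetic, noise-free data generated from known parameters
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_coefs = np.array([1.5, -2.0, 0.5])
y_demo = 4.0 + X_demo @ true_coefs
intercept, coefs = fit_multiple_ols(X_demo, y_demo)
print(intercept, coefs)
```

On well-conditioned data this should agree with the normal-equation result up to floating-point error.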

In [54]:
from sklearn.datasets import load_diabetes

X,y = load_diabetes(return_X_y=True)

X_train,X_test,y_train,y_test = train_test_split(X,y, test_size=0.2,random_state=42)

In [55]:
sklearn_LR = LinearRegression()
sklearn_LR.fit(X_train,y_train)
pred = sklearn_LR.predict(X_test)

In [56]:
print(sklearn_LR.intercept_)
print(sklearn_LR.coef_)

151.34560453985995
[  37.90402135 -241.96436231  542.42875852  347.70384391 -931.48884588
  518.06227698  163.41998299  275.31790158  736.1988589    48.67065743]


In [57]:
our_MLR = multiple_LinearRegressor()
our_MLR.fit(X_train,y_train)
pred = our_MLR.predict(X_test)

In [58]:
print(our_MLR.b)
print(our_MLR.m)

151.34560453985995
[  37.90402135 -241.96436231  542.42875852  347.70384391 -931.48884588
  518.06227698  163.41998299  275.31790158  736.1988589    48.67065743]
