In [1]:
import numpy as np
import pandas as pd
from numpy.linalg import pinv
from IPython.display import display, Math, Latex

## Linear Regression

In Statistics, Linear regression is a linear approach which models the relationship between a dependent variable (continuous) and one or more independent variables. Linear regression is also called as Least squares regression.

Let's plot the dependent variable, Y  and the independent variables, X1 and X2.

![multiple_linear.png](multiple_linear.png)

We can have multiple independent variables $x-{1}$, $x_{2}$,....., $x_{n}$.

Now, $f(x)$ = $b_{0}$ + $b_{1}x_{1}$ + $b_{2}x_{2}$ + ... + $b_{m}x_{m}$

$y_{0}$ = $b_{0}$ + $b_{1}x_{11}$ + $b_{2}x_{12}$ + ... + $b_{m}x_{1m}$

$y_{1}$ = $b_{0}$ + $b_{1}x_{21}$ + $b_{2}x_{22}$ + ... + $b_{m}x_{2m}$

$y_{2}$ = $b_{0}$ + $b_{1}x_{21}$ + $b_{2}x_{22}$ + ... + $b_{m}x_{2m}$

$y_{n+1}$ = $b_{0}$ + $b_{1}x_{m (n+1)}$ + $b_{2}x_{mn}$ + ... + $b_{m}x_{mn}$



We have many equations, for each of them we should find normal equations, instead we consider matrices which are used for computational efficiency. 

$ X = \left[ \begin{array}{cccc}
1 & x_{11} & x_{12} & \ldots & x_{1m} \\
1 & x_{21} & x_{22} & \ldots & x_{2m} \\
.  & .   &    .    &          &      . \\
.  & .   &    .   &           &     . \\
1 & x_{(n+1)m} & x_{(n+1)m} & \ldots & x_{(n+1)m} \\ \end{array} \right ] $

$Y = \left[ \begin{array}{cccc}
y_{0} \\
y_{1} \\
y_{2} \\
    . \\
    . \\
y_{(n+1)} \\ \end{array} \right]$

$B = \left[ \begin{array}{cccc}
b_{0} \\
b_{1} \\
b_{2} \\
    . \\
    . \\
b_{(n+1)} \\ \end{array} \right]$

$X$ = $ (n+1)$ x $m$  matrix , $Y$ = $(n+1)$ x $1$     matrix , $B$ = $(n+1)$ x$ 1 $   matrix 


$ Y = XB $

$ \hat{Y} = X \hat{B}$ ------------[1]

Our objective is to reduce the error function, which is as follows

$ e = (Y - \hat{Yi})^2 $

According to matrices, we know that $ A^2 = A^\top A $

$(Y - \hat{Yi})^2 = (Y - \hat{Yi})^\top (Y - \hat{Yi}) $

substitute [1] in above we get 

$ = (Y -  X \hat{B})^\top (Y - X \hat{B})$

$ = (Y^\top -   \hat{B}^\top X^\top ) (Y -  X \hat{B})) $

$ =  Y^\top Y - Y^\top X \hat{B} - \hat{B}^\top X^\top Y + \hat{B}^\top X^\top X \hat{B} $

Now we take the partial derivative of the above equation and equate it to 0

matrix differentiation, 

$ \frac{\partial}{\partial x} A = 0 $

$ \frac{\partial}{\partial x} Ax = A  $

$ \frac{\partial}{\partial x} xA = A^\top $

$  \frac{\partial}{\partial x} x^\top A x = 2 x^\top A $

We use the above differentiations in our equation

$ \frac{\partial}{\partial \hat{B}} ( Y^\top Y - Y^\top X \hat{B} - \hat{B}^\top X^\top Y + \hat{B}^\top X^\top X \hat{B})  = 0 $

$ 0 - Y^\top X - ( X^\top Y)^\top + 2 \hat{B}^\top X^\top X = 0 $

$ 2 \hat{B}^\top X^\top X =  Y^\top X + ( X^\top Y)^\top $ 

we know that $  ( X^\top Y)^\top =  Y^\top X  $

$ \hat{B}^\top X^\top X =  Y^\top X $

$ \hat{B}^\top = Y^\top X  (X^\top X)^ {-1}  $

$ \hat{B} = (X^\top X)^{ -1} X^\top Y $

We now got the $\hat{B} =  b_{0},b_{1},b_{2},....,b_{n+1} $


In [2]:
#Linear model function 
def lm(x,y):
    x = np.matrix(x)
    y = np.matrix(y)
    a =  pinv(np.transpose(x)*(x))*np.transpose(x)* np.transpose(y)
    return a # which gives us the slope values

In [3]:
#Predictions
def pred(model,test_data):
    test_data = np.matrix(test_data)
    z = test_data * lm(x,y)
    return z 

In [4]:
# Toy dataset
data = pd.read_csv('d.csv')

In [5]:
y = data.X7 # Target 
x = data.drop(['X7'],1) #Independent variables

#### Train and Test split :

In [6]:
from sklearn.model_selection import train_test_split

In [7]:
xtrain,xtest,ytrain,ytest = train_test_split(x,y,test_size = 0.2,random_state = 123)

In [8]:
a = lm(xtrain,ytrain) # Function call

In [9]:
z = pred(a,xtest) # prediction

#### Error Metric:

In [10]:
from sklearn import metrics

In [11]:
metrics.mean_squared_error(np.array(z),ytest)

8.143689135274693

In [12]:
from sklearn.linear_model import LinearRegression

#### Comparision with gradient descent function

In [13]:
model = LinearRegression()

In [14]:
model.fit(xtrain,ytrain) # training

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [15]:
a = model.predict(xtest) # prediction

This is because the independent variables are not standardized

In [16]:
metrics.mean_squared_error(a,ytest) # error

11.224834334203972