
### <font color='khaki'>Understanding Multiple Linear Regression by walking through its internal workings and computation</font>

This walkthrough follows the worked example at [stattrek.com](https://stattrek.com/multiple-regression/regression-coefficients.aspx?Tutorial=reg).

In [0]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Defining the scores table as on the website

In [0]:
student_scores = pd.DataFrame({'Score':[100, 90, 80, 70, 60], 'IQ':[110, 120, 100, 90, 80,], 'Study_hours':[40, 30, 20, 0, 10]})

student_scores.index.name='Student'

student_scores.head()

| Student | Score | IQ | Study_hours |
|---|---|---|---|
| 0 | 100 | 110 | 40 |
| 1 | 90 | 120 | 30 |
| 2 | 80 | 100 | 20 |
| 3 | 70 | 90 | 0 |
| 4 | 60 | 80 | 10 |


## Defining Matrix $X$

In [0]:
X = student_scores.loc[:, ['IQ',"Study_hours"]]

ones = np.ones(len(student_scores), dtype=int)

X.insert(0, 'ones', ones)

X.head()

| Student | ones | IQ | Study_hours |
|---|---|---|---|
| 0 | 1 | 110 | 40 |
| 1 | 1 | 120 | 30 |
| 2 | 1 | 100 | 20 |
| 3 | 1 | 90 | 0 |
| 4 | 1 | 80 | 10 |


In [0]:
# Next, convert X from a DataFrame to a NumPy array

X = X.values

X

array([[  1, 110,  40],
       [  1, 120,  30],
       [  1, 100,  20],
       [  1,  90,   0],
       [  1,  80,  10]])

## Defining $X'$ or X Transpose

In [0]:
X_tp = X.transpose()

X_tp

array([[  1,   1,   1,   1,   1],
       [110, 120, 100,  90,  80],
       [ 40,  30,  20,   0,  10]])

#### Print out the shape of X and X'.

In [0]:
print('X shape is', X.shape, '\nX-transpose shape is', X_tp.shape)

X shape is (5, 3) 
X-transpose shape is (3, 5)


## Premultiply $X$ by $X'$ using matrix multiplication to get $X'X$

In [0]:
X_tp_X = np.matmul(X_tp, X)

X_tp_X

array([[    5,   500,   100],
       [  500, 51000, 10800],
       [  100, 10800,  3000]])

Note that so far we get exactly the same result as the website [stattrek.com](https://stattrek.com/multiple-regression/regression-coefficients.aspx?Tutorial=reg).
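To see where the entries of $X'X$ come from, the sketch below rebuilds the matrix from plain sums and sums of products of the columns. The names `iq`, `hours`, and `manual` are introduced here only for illustration; the data is the same as in the table above.

```python
import numpy as np

# Same design matrix as above: a column of ones, IQ, and study hours.
X = np.array([[1, 110, 40],
              [1, 120, 30],
              [1, 100, 20],
              [1,  90,  0],
              [1,  80, 10]])

iq, hours = X[:, 1], X[:, 2]

# Each entry of X'X is the dot product of two columns of X:
# n, column sums, sums of squares, and the sum of cross products.
manual = np.array([
    [len(X),      iq.sum(),           hours.sum()],
    [iq.sum(),    (iq ** 2).sum(),    (iq * hours).sum()],
    [hours.sum(), (iq * hours).sum(), (hours ** 2).sum()],
])

print(manual)
print(np.array_equal(manual, X.T @ X))
```

The top-left entry is the number of observations, and the matrix is symmetric because the dot product of two columns does not depend on their order.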

## Next we compute the inverse of $X'X$, just as on the website

In [0]:
from numpy.linalg import inv

inverse = inv(X_tp_X)

inverse

array([[ 2.02000000e+01, -2.33333333e-01,  1.66666667e-01],
       [-2.33333333e-01,  2.77777778e-03, -2.22222222e-03],
       [ 1.66666667e-01, -2.22222222e-03,  2.77777778e-03]])

In [0]:
# Check that the values match those on the website

print(inverse[0,:])
print(inverse[1,:])
print(inverse[2,:])

[20.2        -0.23333333  0.16666667]
[-0.23333333  0.00277778 -0.00222222]
[ 0.16666667 -0.00222222  0.00277778]


Everything matches the website up to this point.
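Besides comparing against the website, the inverse can be checked directly: multiplying it by $X'X$ should give the identity matrix, up to floating-point rounding. The following is a minimal sketch of that check; the tolerance `atol=1e-6` and the name `product` are choices made here, not values from the tutorial.

```python
import numpy as np
from numpy.linalg import inv

# X'X as computed earlier in the notebook.
X_tp_X = np.array([[  5,   500,   100],
                   [500, 51000, 10800],
                   [100, 10800,  3000]])

inverse = inv(X_tp_X)

# (X'X)^{-1} (X'X) should equal the 3x3 identity, apart from rounding error.
product = inverse @ X_tp_X
print(np.round(product, 6))
print(np.allclose(product, np.eye(3), atol=1e-6))
```

`np.allclose` is used rather than exact equality because floating-point inversion rarely reproduces the identity bit for bit.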

In [0]:
X_tp.shape

(3, 5)

In [0]:
inverse.shape

(3, 3)

## Next we post-multiply the inverse of $X'X$ by $X'$ 

In [0]:
X_tp_inverse = np.matmul(inverse, X_tp)

X_tp_inverse

array([[ 1.20000000e+00, -2.80000000e+00,  2.00000000e-01,
        -8.00000000e-01,  3.20000000e+00],
       [-1.66666667e-02,  3.33333333e-02,  1.73472348e-17,
         1.66666667e-02, -3.33333333e-02],
       [ 3.33333333e-02, -1.66666667e-02, -1.73472348e-17,
        -3.33333333e-02,  1.66666667e-02]])

In [0]:
X_tp_inverse.shape

(3, 5)

## Next we define the $Y$ variable

In [0]:
Y = student_scores.Score.values

Y

array([100,  90,  80,  70,  60])

In [0]:
Y.shape

(5,)

## Calculating the intercept and slope coefficients

<h2><font color='khaki'>$b = (X'X)^{-1}X'Y$</font></h2>

## Finally we post-multiply $(X'X)^{-1}X'$ by $Y$ to get the coefficients of the MLR model

In [0]:
coefficients = np.matmul(X_tp_inverse, Y)

coefficients

array([20. ,  0.5,  0.5])
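As a cross-check that is not part of the original walkthrough, the same coefficients can be obtained without forming the inverse explicitly. The sketch below uses two alternatives from NumPy: `np.linalg.solve` applied to the normal equations $(X'X)b = X'Y$, and the least-squares solver `np.linalg.lstsq`. The names `b_solve` and `b_lstsq` are illustrative.

```python
import numpy as np

# Design matrix (ones, IQ, study hours) and response (scores) from above.
X = np.array([[1, 110, 40],
              [1, 120, 30],
              [1, 100, 20],
              [1,  90,  0],
              [1,  80, 10]])
Y = np.array([100, 90, 80, 70, 60])

# Alternative 1: solve the normal equations (X'X) b = X'Y directly.
b_solve = np.linalg.solve(X.T @ X, X.T @ Y)

# Alternative 2: let NumPy's least-squares routine minimise ||Xb - Y||.
b_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(b_solve)
print(b_lstsq)
print(np.allclose(b_solve, b_lstsq))
```

Both approaches should agree with the inverse-based result up to rounding. On larger or nearly collinear data sets they are generally preferred over an explicit inverse for numerical stability.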

## $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$

which, with the computed coefficients, becomes

## $\hat{y} = 20 + 0.5 x_1 + 0.5 x_2$

or, in terms of the predictors,

## $\hat{y} = 20 + 0.5(\text{IQ}) + 0.5(\text{Study hours})$
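To make the fitted equation concrete, the sketch below applies it to the five students in the table and compares the predictions with the observed scores. The variable names (`predicted`, `residuals`) are introduced here for illustration; the coefficients are the ones computed above.

```python
import numpy as np

# Coefficients from the regression: intercept, IQ slope, study-hours slope.
b0, b1, b2 = 20.0, 0.5, 0.5

# Data from the scores table.
iq = np.array([110, 120, 100, 90, 80])
hours = np.array([40, 30, 20, 0, 10])
scores = np.array([100, 90, 80, 70, 60])

# Predicted score for each student from the fitted equation.
predicted = b0 + b1 * iq + b2 * hours

# Residual = observed score minus predicted score.
residuals = scores - predicted

print(predicted)        # predicted scores: 95, 95, 80, 65, 65
print(residuals)        # residuals: 5, -5, 0, 5, -5
print(residuals.sum())  # sums to zero
```

The residuals sum to zero, which is expected for a least-squares fit that includes an intercept term.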