# Multiple linear regression 

Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of one variable from the values of two or more other variables. The variable we want to predict is called the dependent variable, and the variables used to predict it are called the independent variables (features).
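As a compact reference, the model can be written as below. The symbols are my own labelling, chosen to match the names used later in this notebook: theta_0 is the bias, theta_1 to theta_n are the weights, and x_1 to x_n are the features.

$$
\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n
$$

With a constant x_0 = 1 placed in front of every example, the same model in matrix form is Yhat = X.theta.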

In [19]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D #Used to plot in 3D space
%matplotlib inline

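np.matrix builds a 2-D matrix object from a nested list. It exposes the transpose as X.T and the inverse as X.I. These values are computed when the attribute is accessed, not pre-calculated and stored. Note that the NumPy documentation no longer recommends np.matrix for new code and suggests regular arrays instead.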
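A minimal sketch of the X.T and X.I attributes, using a made-up 2 x 2 matrix `A` that is not part of this notebook's data. It also shows the equivalent calls on a plain array, assuming NumPy is installed.

```python
import numpy as np

# Hypothetical 2x2 example, not part of the dataset used in this notebook
A = np.matrix([[2.0, 1.0],
               [1.0, 3.0]])

A_T = A.T  # transpose, evaluated when the attribute is accessed
A_I = A.I  # inverse, evaluated when the attribute is accessed

# The same operations on a regular ndarray
B = np.array(A)
B_T = B.T
B_I = np.linalg.inv(B)

print(np.allclose(A_I, B_I))             # both routes give the same inverse
print(np.allclose(A * A_I, np.eye(2)))   # for np.matrix, * is matrix multiplication
```

For np.matrix the `*` operator means matrix multiplication, while for a regular array it is element-wise, so array code uses `@` or np.dot instead.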

In [2]:
X = np.matrix([[1,6,2], #x0, study time, play time
               [1,7,4],
               [1,3,2],
               [1,1,2],
               [1,6,3]]) #multilinear data

Y = np.matrix([[70],
               [72],
               [50],
               [45],
               [73]]) #multilinear labels

## Minimise Cost Function: Specific Example
### X: m x (n + 1)
### m: number of training examples
### n: number of features
### X_transpose: (n + 1) x m
### X_transpose * X: (n + 1) x m * m x (n + 1) = (n + 1) x (n + 1)
### (X_transpose * X)^-1 * X_transpose: (n + 1) x (n + 1) * (n + 1) x m = (n + 1) x m
### theta = (n + 1) x m * m x 1 = (n + 1) x 1
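The shape bookkeeping above can be checked numerically. This is a small sketch with random placeholder features (only the shapes matter here), using m = 5 and n = 2 as in the example data. The names `rng`, `XTX`, and `P` are mine.

```python
import numpy as np

m, n = 5, 2                                # 5 training examples, 2 features
rng = np.random.default_rng(0)             # placeholder data; only shapes matter

# X: m x (n + 1), with a leading column of ones for the bias term
X = np.hstack([np.ones((m, 1)), rng.random((m, n))])
Y = rng.random((m, 1))                     # Y: m x 1

XT = X.T                                   # (n + 1) x m
XTX = XT @ X                               # (n + 1) x (n + 1)
P = np.linalg.inv(XTX) @ XT                # (n + 1) x m
theta = P @ Y                              # (n + 1) x 1

for name, arr in [("X", X), ("XT", XT), ("XTX", XTX), ("P", P), ("theta", theta)]:
    print(name, arr.shape)
```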

Transpose of X and Y

In [4]:
X.T,Y.T

(matrix([[1, 1, 1, 1, 1],
         [6, 7, 3, 1, 6],
         [2, 4, 2, 2, 3]]),
 matrix([[70, 72, 50, 45, 73]]))

In [7]:
XT = X.T
YT = Y.T

Matrix product of XTranspose and X

In [10]:
np.dot(XT,X)

matrix([[  5,  23,  13],
        [ 23, 131,  66],
        [ 13,  66,  37]])

Inverse of XTranspose.X

In [11]:
XTX_inv = np.dot(XT,X).I
XTX_inv

matrix([[ 2.32701422,  0.03317536, -0.87677725],
        [ 0.03317536,  0.07582938, -0.14691943],
        [-0.87677725, -0.14691943,  0.5971564 ]])

#### theta = Inverse(XTX).XT.Y; we derived this formula in class
#### It gives the bias and the weights, stored here in the matrix theta (Theta0, Theta1, ...)

In [12]:
theta = np.dot(XTX_inv , np.dot(XT,Y))

#### First value is the bias; the rest are the weights

In [14]:
theta

matrix([[39.16587678],
        [ 5.37914692],
        [-0.73459716]])
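As a cross-check, the same coefficients can be obtained from NumPy's least-squares solver. This is a sketch on regular arrays and not the method used in this notebook. np.linalg.lstsq is swapped in only for comparison, and the names `theta_normal` and `theta_lstsq` are mine.

```python
import numpy as np

# Same data as above, as regular float arrays
X = np.array([[1, 6, 2],
              [1, 7, 4],
              [1, 3, 2],
              [1, 1, 2],
              [1, 6, 3]], dtype=float)
Y = np.array([[70], [72], [50], [45], [73]], dtype=float)

# Normal equation: theta = Inverse(XTX).XT.Y
theta_normal = np.linalg.inv(X.T @ X) @ X.T @ Y

# Least-squares solver; avoids forming the inverse explicitly
theta_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(X, Y, rcond=None)

print(theta_normal.ravel())
print(theta_lstsq.ravel())
print(np.allclose(theta_normal, theta_lstsq))
```

Solving the least-squares problem directly is usually preferred over an explicit inverse when XTX is close to singular.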

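#### Yhat = X.theta is the formula for the predicted Y; our derivation started from this formula
#### Ideally the error would be 0, so we set Y = Yhat and solved for theta; on real data the fit only minimises the squared error, so Yhat is close to Y but not equal to it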

In [15]:
y_hat = np.dot(X,theta)
y_hat

matrix([[69.97156398],
        [73.88151659],
        [53.83412322],
        [43.07582938],
        [69.23696682]])
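To see how far the predictions are from the labels, the residuals and mean squared error can be computed as in this sketch. It uses regular arrays, and the names `residuals` and `mse` are mine.

```python
import numpy as np

X = np.array([[1, 6, 2],
              [1, 7, 4],
              [1, 3, 2],
              [1, 1, 2],
              [1, 6, 3]], dtype=float)
Y = np.array([[70], [72], [50], [45], [73]], dtype=float)

theta = np.linalg.inv(X.T @ X) @ X.T @ Y   # normal equation
y_hat = X @ theta                          # predictions

residuals = Y - y_hat                      # per-example error, not exactly zero
mse = float(np.mean(residuals ** 2))       # mean squared error

print(residuals.ravel())
print(mse)
# With a bias column in X, the residuals are orthogonal to every column of X
print(np.allclose(X.T @ residuals, 0))
```

The last check restates the normal equation XT.(Y - X.theta) = 0: the individual errors are not zero, but they cancel out against each column of X.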

### Plotting approach taken from the mpl_toolkits documentation
##### The fit is a plane because the regression uses 2 features; with 3 or more features it is a hyperplane, and with 1 feature it is a line


In [7]:
fig = plt.figure(figsize=(15,12))
plt3d = fig.add_subplot(projection = '3d') #gca(projection=...) is not accepted by newer Matplotlib
X1 = np.asarray(X[:,1]).ravel() #Array of X1 features
X2 = np.asarray(X[:,2]).ravel() #Array of X2 features
xx , yy = np.meshgrid(range(X1.min(),X1.max()+1),
                      range(X2.min(),X2.max()+1)) #Grid of (X1,X2) points, maxima included
plt3d.scatter(X1,X2,
              np.asarray(Y).ravel(),color = 'blue',s=100) #Scattering of points (X1,X2,Y)
plt3d.plot_surface(xx,yy,
                   theta[0,0]+theta[1,0]*xx
                   +theta[2,0]*yy,color = 'cyan') #Plots the fitted plane in 3D space
plt3d.view_init(-140,120)
plt3d.set_xlabel('X1')
plt3d.set_ylabel('X2')
plt3d.set_zlabel('Y')
plt.show()



### Multivariate Regression Algorithm as functions

We now create functions that turn any dataset into our matrix format so that we can apply our equations to it


In [4]:
def make_X_mat(dataset):
    return np.append(
        np.ones((dataset.shape[0],1)),
        dataset,
        axis=1)

Calculates weights and bias matrix for given input

In [5]:
def get_theta(X,Y):
    XT = X.T
    #np.linalg.inv works for np.matrix and regular arrays; .I exists only on np.matrix
    XTX_inv = np.linalg.inv(np.dot(XT,X))
    return np.dot(XTX_inv , np.dot(XT,Y))

Gives prediction for given feature matrix

In [6]:
def predict(X,theta):
    return np.dot(X,theta)
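A possible end-to-end use of the three functions, restated here so the example runs on its own. Two things are my assumptions: np.linalg.inv is substituted for .I so that regular arrays work, and `new_student` is a hypothetical input that is not part of the original data.

```python
import numpy as np

def make_X_mat(dataset):
    # Prepend a column of ones (x0) so that theta[0] acts as the bias
    return np.append(np.ones((dataset.shape[0], 1)), dataset, axis=1)

def get_theta(X, Y):
    XT = X.T
    XTX_inv = np.linalg.inv(np.dot(XT, X))  # assumption: inv instead of .I
    return np.dot(XTX_inv, np.dot(XT, Y))

def predict(X, theta):
    return np.dot(X, theta)

# Features only (study time, play time); the bias column is added by make_X_mat
features = np.array([[6, 2], [7, 4], [3, 2], [1, 2], [6, 3]], dtype=float)
labels = np.array([[70], [72], [50], [45], [73]], dtype=float)

X = make_X_mat(features)
theta = get_theta(X, labels)

# Hypothetical new example: 5 units of study time, 2 units of play time
new_student = np.array([[5.0, 2.0]])
pred = predict(make_X_mat(new_student), theta)
print(theta.ravel())
print(pred)
```

The same bias column must be added to new inputs before calling predict, otherwise the shapes of X and theta do not line up.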