Multivariate linear regression extends the simple linear regression model to several independent variables that together contribute to the dependent variable. Each independent variable gets its own coefficient, so there are more coefficients to determine and the computation is more involved.

Below is an example of housing price data with three independent variables ("area", "bedrooms" and "age") and one dependent variable ("price").

![title](img/2_Mutlivariate_Linear_Regression_Example.png)

The equation for multivariate linear regression is much like that of univariate linear regression, except that each independent variable needs its own coefficient (m1, m2, m3, etc.).

![title](img/2_Univariate_Linear_Regression_Equation.png)

Above is the equation for univariate linear regression. Y is the dependent variable, x1 is the independent variable, and m1 is the coefficient of x1.

Since there is a single independent variable, we need to calculate only a single coefficient.

In multivariate linear regression, we have multiple independent variables (inputs), and hence we need to calculate multiple coefficients. Below is the equation for multivariate linear regression with three independent variables.

![title](img/2_Multivariate_Linear_Regression_Equation.png)
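To make the equation concrete, here is a minimal sketch for the housing example: the prediction is the intercept plus each coefficient times its variable. The coefficient and intercept values, as well as the helper `predict_price`, are made up for illustration and are not fitted to any real data.

```python
# Hypothetical coefficients (m1, m2, m3) and intercept, chosen only for illustration.
coefficients = {"area": 120.0, "bedrooms": 15000.0, "age": -2000.0}
intercept = 50000.0


def predict_price(features, coefficients, intercept):
    """Return intercept + m1*x1 + m2*x2 + m3*x3 for one house."""
    return intercept + sum(coefficients[name] * value for name, value in features.items())


# One house described by the three independent variables.
house = {"area": 2000, "bedrooms": 3, "age": 10}
price = predict_price(house, coefficients, intercept)
print(price)  # 315000.0
```

Fitting a regression model amounts to finding the coefficient and intercept values that make such predictions match the observed prices as closely as possible.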

Let's jump into the practical!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model

In [2]:
df=pd.read_csv(r'H:\PythonWork\PracticeData\Fish.csv')

In [3]:
df.head()

Unnamed: 0,Species,Weight,Length1,Length2,Length3,Height,Width
0,Bream,242.0,23.2,25.4,30.0,11.52,4.02
1,Bream,290.0,24.0,26.3,31.2,12.48,4.3056
2,Bream,340.0,23.9,26.5,31.1,12.3778,4.6961
3,Bream,363.0,26.3,29.0,33.5,12.73,4.4555
4,Bream,430.0,26.5,29.0,34.0,12.444,5.134


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 159 entries, 0 to 158
Data columns (total 7 columns):
Species    159 non-null object
Weight     159 non-null float64
Length1    159 non-null float64
Length2    159 non-null float64
Length3    159 non-null float64
Height     159 non-null float64
Width      159 non-null float64
dtypes: float64(6), object(1)
memory usage: 8.8+ KB


Separating the dependent variable (target variable) from the input dataset.

In [5]:
X = df.drop('Weight',axis='columns')

In [6]:
X

Unnamed: 0,Species,Length1,Length2,Length3,Height,Width
0,Bream,23.2,25.4,30.0,11.5200,4.0200
1,Bream,24.0,26.3,31.2,12.4800,4.3056
2,Bream,23.9,26.5,31.1,12.3778,4.6961
3,Bream,26.3,29.0,33.5,12.7300,4.4555
4,Bream,26.5,29.0,34.0,12.4440,5.1340
...,...,...,...,...,...,...
154,Smelt,11.5,12.2,13.4,2.0904,1.3936
155,Smelt,11.7,12.4,13.5,2.4300,1.2690
156,Smelt,12.1,13.0,13.8,2.2770,1.2558
157,Smelt,13.2,14.3,15.2,2.8728,2.0672


In [7]:
y = df.Weight

In [8]:
y

0      242.0
1      290.0
2      340.0
3      363.0
4      430.0
       ...  
154     12.2
155     13.4
156     12.2
157     19.7
158     19.9
Name: Weight, Length: 159, dtype: float64

Transforming categorical variables to numerical ones.

In [11]:
Species = pd.get_dummies(X['Species'], drop_first=True)

In [12]:
Species

Unnamed: 0,Parkki,Perch,Pike,Roach,Smelt,Whitefish
0,0,0,0,0,0,0
1,0,0,0,0,0,0
2,0,0,0,0,0,0
3,0,0,0,0,0,0
4,0,0,0,0,0,0
...,...,...,...,...,...,...
154,0,0,0,0,1,0
155,0,0,0,0,1,0
156,0,0,0,0,1,0
157,0,0,0,0,1,0
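The table above has no "Bream" column because `drop_first=True` drops the first category (in sorted order) and lets an all-zero row stand for it. A minimal sketch on a toy frame, assuming made-up rows and passing `dtype=int` so the output uses 0/1 like the table above:

```python
import pandas as pd

# Toy data with three species; the rows are invented for illustration.
toy = pd.DataFrame({"Species": ["Bream", "Perch", "Smelt", "Bream"]})

# One indicator column per category, minus the first one ("Bream").
dummies = pd.get_dummies(toy["Species"], drop_first=True, dtype=int)

print(dummies.columns.tolist())  # ['Perch', 'Smelt']
print(dummies)
```

Dropping one category avoids a redundant column: with an intercept in the model, the dropped species is already represented by all indicators being zero.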


In [13]:
X = X.drop('Species', axis = 'columns')

In [14]:
X

Unnamed: 0,Length1,Length2,Length3,Height,Width
0,23.2,25.4,30.0,11.5200,4.0200
1,24.0,26.3,31.2,12.4800,4.3056
2,23.9,26.5,31.1,12.3778,4.6961
3,26.3,29.0,33.5,12.7300,4.4555
4,26.5,29.0,34.0,12.4440,5.1340
...,...,...,...,...,...
154,11.5,12.2,13.4,2.0904,1.3936
155,11.7,12.4,13.5,2.4300,1.2690
156,12.1,13.0,13.8,2.2770,1.2558
157,13.2,14.3,15.2,2.8728,2.0672


In [15]:
X = pd.concat([X,Species], axis = 1)

In [16]:
X

Unnamed: 0,Length1,Length2,Length3,Height,Width,Parkki,Perch,Pike,Roach,Smelt,Whitefish
0,23.2,25.4,30.0,11.5200,4.0200,0,0,0,0,0,0
1,24.0,26.3,31.2,12.4800,4.3056,0,0,0,0,0,0
2,23.9,26.5,31.1,12.3778,4.6961,0,0,0,0,0,0
3,26.3,29.0,33.5,12.7300,4.4555,0,0,0,0,0,0
4,26.5,29.0,34.0,12.4440,5.1340,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...
154,11.5,12.2,13.4,2.0904,1.3936,0,0,0,0,1,0
155,11.7,12.4,13.5,2.4300,1.2690,0,0,0,0,1,0
156,12.1,13.0,13.8,2.2770,1.2558,0,0,0,0,1,0
157,13.2,14.3,15.2,2.8728,2.0672,0,0,0,0,1,0


In [18]:
from sklearn.model_selection import train_test_split

In [19]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2, random_state = 0)

In [21]:
X_test.shape

(32, 11)
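The shape `(32, 11)` can be checked by hand: 11 is the number of feature columns, and 32 follows from `test_size = 0.2` on 159 rows. The sketch below assumes, as scikit-learn does to the best of my knowledge, that the test count is rounded up and the remainder goes to training.

```python
import math

n_samples = 159   # rows in the dataset, from df.info()
test_size = 0.2   # fraction passed to train_test_split

# Assumption: the test count is rounded up, the rest is used for training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_test, n_train)  # 32 127
```

These counts agree with `X_test.shape` above and with the `Length: 127` reported for `y_train` further down.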

In [22]:
reg=linear_model.LinearRegression()

In [23]:
reg.fit(X_train,y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [24]:
reg.coef_

array([-71.73958808,  76.67271467,  16.28970884,  32.13498485,
        13.20175457, 123.86016468, 229.69582121,  70.93074455,
       196.10211259, 532.28414804, 113.40418593])

In [25]:
reg.intercept_

-930.4897697914975
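The coefficients and intercept fully determine the model: a prediction is the dot product of a feature row with `coef_`, plus `intercept_`. As a sketch, the snippet below copies the values printed above and applies them to the first row of the data (a Bream), assuming the coefficient order matches the column order of `X` (Length1, Length2, Length3, Height, Width, then the six species indicators).

```python
import numpy as np

# Values copied from reg.coef_ and reg.intercept_ above.
coef = np.array([-71.73958808, 76.67271467, 16.28970884, 32.13498485,
                 13.20175457, 123.86016468, 229.69582121, 70.93074455,
                 196.10211259, 532.28414804, 113.40418593])
intercept = -930.4897697914975

# First row of X: a Bream, so all six species indicators are 0.
row = np.array([23.2, 25.4, 30.0, 11.52, 4.02, 0, 0, 0, 0, 0, 0])

# Manual prediction: m1*x1 + m2*x2 + ... + intercept.
predicted_weight = np.dot(row, coef) + intercept
print(round(predicted_weight, 1))  # 264.6
```

The actual weight of that fish is 242.0, so the manual calculation lands in the right range; `reg.predict` performs the same arithmetic for every row.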

In [26]:
y_train_predict = reg.predict(X_train)

In [27]:
y_train_predict

array([1155.53572308,  358.79005791,  590.03746782,  259.978418  ,
        365.18397948,  670.15161648,  674.08728847,  169.35139122,
        581.62581385, -125.08335337,  231.82508088,  767.88995671,
        118.46326131,  282.76923907,  238.4871875 ,  278.70677811,
        620.21704707,  427.50242506,  891.77005524, -167.89871432,
         45.03875912,  353.12813396,   69.0403215 ,  127.37246404,
        358.71915544,  524.3668012 ,  -20.80484382,  461.66383287,
          5.05906246,  -42.89315051,  113.21216709,  188.10651607,
        170.81098631,  -68.35586188, 1111.76705815,  960.55686327,
        842.23923537,  210.72277536,  262.20408955,  210.73756374,
         49.88724043,  833.06899073,  373.71542297,  237.06612744,
        186.57942634,  263.31583662,  528.38608241,  676.23281189,
        711.11642493,  730.91707014,  634.05821174,  589.63606564,
         52.09166101,  357.63379197,   -5.37236103,  419.87180999,
        144.80700582,  548.81747263,  511.4073296 ,  168.53022

In [31]:
y_train

143    1550.0
130     300.0
16      700.0
96      225.0
107     300.0
        ...  
9       500.0
103     260.0
67      170.0
117     650.0
47      160.0
Name: Weight, Length: 127, dtype: float64

In [29]:
from sklearn.metrics import r2_score

In [32]:
score = r2_score(y_train, y_train_predict)

In [33]:
score

0.9342848664059864

In [34]:
y_test_predict = reg.predict(X_test)

In [35]:
score = r2_score(y_test, y_test_predict)

In [36]:
score

0.9102350316202581
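For reference, the R² score compares the model's squared errors with those of a baseline that always predicts the mean: R² = 1 - SS_res / SS_tot. A minimal sketch of that formula on made-up numbers (the helper name `r2_manual` is hypothetical):

```python
import numpy as np


def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared errors of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared errors of the mean baseline
    return 1.0 - ss_res / ss_tot


# Toy values, invented for illustration.
y_true = [3, 5, 7, 9]
y_pred = [2.5, 5.5, 7, 8]
toy_score = r2_manual(y_true, y_pred)
print(toy_score)
```

A score of 1 means perfect predictions and 0 means no better than predicting the mean. Here the training score (about 0.934) and the test score (about 0.910) are close, which suggests the model generalizes reasonably well to unseen fish.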


In [None]:
# Predict the weight of a single fish; the input needs all 11 feature columns
reg.predict(X_test.iloc[[0]])