### Multiple regression

Multiple regression is like linear regression, but with more than one independent variable: we try to predict a value based on two or more variables.
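Written as a formula, a model with $k$ independent variables has the form sketched below. The symbols $b_0, \dots, b_k$ and $x_1, \dots, x_k$ are notation introduced here for illustration and do not come from the source example.

$$
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k
$$

Here $b_0$ is the intercept and each $b_i$ is the coefficient of the variable $x_i$.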

Example from: https://www.w3schools.com/python/python_ml_multiple_regression.asp


In [24]:
import pandas
from sklearn import linear_model 
import matplotlib.pyplot as plt

In [25]:
df = pandas.read_csv("dataExample.csv") 
df

Unnamed: 0,Car,Model,Volume,Weight,CO2
0,Toyoty,Aygo,1000,790,99
1,Mitsubishi,Space Star,1200,1160,95
2,Skoda,Citigo,1000,929,95
3,Fiat,500,900,865,90
4,Mini,Cooper,1500,1140,105
5,VW,Up!,1000,929,105
6,Skoda,Fabia,1400,1109,90
7,Mercedes,A-Class,1500,1365,92
8,Ford,Fiesta,1500,1112,98
9,Audi,A1,1600,1150,99


In [26]:
X = df[['Weight', 'Volume']]  # independent variables
y = df['CO2']                 # dependent variable

From the sklearn module we will use the LinearRegression class to create a linear regression object.

This object has a method called fit() that takes the independent and dependent values as parameters and fills the regression object with data that describes the relationship:

In [27]:
regr = linear_model.LinearRegression()
regr.fit(X, y) 
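As a self-contained sketch of the same fitting step, the cell below types in the ten rows displayed above by hand instead of reading dataExample.csv. That is an assumption made so the code runs on its own; if the CSV holds more rows, the numbers printed here need not match the notebook's outputs.

```python
import pandas as pd
from sklearn import linear_model

# Assumption: only the ten rows displayed above, entered by hand.
df = pd.DataFrame({
    "Volume": [1000, 1200, 1000, 900, 1500, 1000, 1400, 1500, 1500, 1600],
    "Weight": [790, 1160, 929, 865, 1140, 929, 1109, 1365, 1112, 1150],
    "CO2":    [99, 95, 95, 90, 105, 105, 90, 92, 98, 99],
})

X = df[["Weight", "Volume"]]  # independent variables
y = df["CO2"]                 # dependent variable

regr = linear_model.LinearRegression()
regr.fit(X, y)

# After fitting, the object holds one intercept and one coefficient per column of X.
r_squared = regr.score(X, y)  # R^2 on the training data
print("intercept:", regr.intercept_)
print("coefficients:", regr.coef_)
print("R^2:", r_squared)
```

The R^2 value is computed on the same rows used for fitting, so it only indicates how well the plane fits these rows, not how well it would predict new cars.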

In [28]:
# Predict the CO2 emission of a car with a weight of 2300 kg and an engine volume of 1300 cm3:
predictedCO2 = regr.predict([[2300, 1300]]) 

print(predictedCO2) 

[107.2087328]
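To make clear what predict() computes, the sketch below compares it with the by-hand formula (intercept plus each coefficient times its variable). It uses only the ten rows displayed above, typed in by hand, which is an assumption; the printed value may therefore differ from the notebook's output. The name `new_car` is introduced here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Assumption: only the ten rows displayed above, entered by hand.
df = pd.DataFrame({
    "Volume": [1000, 1200, 1000, 900, 1500, 1000, 1400, 1500, 1500, 1600],
    "Weight": [790, 1160, 929, 865, 1140, 929, 1109, 1365, 1112, 1150],
    "CO2":    [99, 95, 95, 90, 105, 105, 90, 92, 98, 99],
})
X = df[["Weight", "Volume"]]
y = df["CO2"]
regr = linear_model.LinearRegression().fit(X, y)

# Passing a DataFrame with the same column names and order as X keeps
# the feature names consistent between fit() and predict().
new_car = pd.DataFrame({"Weight": [2300], "Volume": [1300]})
predicted = regr.predict(new_car)[0]

# The same prediction computed by hand from the fitted parameters.
manual = regr.intercept_ + regr.coef_[0] * 2300 + regr.coef_[1] * 1300

print(predicted, manual)
```

A weight of 2300 kg lies well outside the weights in the rows displayed above, so this prediction is an extrapolation and should be read with caution.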




#### Coefficients 
A coefficient is a factor that describes how the predicted value changes with one independent variable.
The array in regr.coef_ holds the coefficient values of weight and volume, in that order.

Weight: 0.00755095
Volume: 0.00780526

These values tell us that if the weight increases by 1 kg while the volume stays the same, the CO2 emission increases by 0.00755095 g.

And if the engine size (Volume) increases by 1 cm3 while the weight stays the same, the CO2 emission increases by 0.00780526 g.

In [30]:
print(regr.coef_) 

[0.00755095 0.00780526]
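The sketch below labels each coefficient with its column name and checks the interpretation given above: raising the weight by 1 kg with the volume fixed changes the prediction by exactly the weight coefficient. It uses the ten displayed rows typed in by hand (an assumption), so the values differ from the ones printed above. The helper `predict_co2` is hypothetical and introduced only for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Assumption: only the ten rows displayed above, entered by hand.
df = pd.DataFrame({
    "Volume": [1000, 1200, 1000, 900, 1500, 1000, 1400, 1500, 1500, 1600],
    "Weight": [790, 1160, 929, 865, 1140, 929, 1109, 1365, 1112, 1150],
    "CO2":    [99, 95, 95, 90, 105, 105, 90, 92, 98, 99],
})
X = df[["Weight", "Volume"]]
y = df["CO2"]
regr = linear_model.LinearRegression().fit(X, y)

# coef_ follows the column order of X, so the names can be attached directly.
coefs = pd.Series(regr.coef_, index=X.columns)
print(coefs)

def predict_co2(weight, volume):
    """Hypothetical helper: predict CO2 for a single car."""
    row = pd.DataFrame({"Weight": [weight], "Volume": [volume]})
    return regr.predict(row)[0]

# One extra kilogram at a fixed volume shifts the prediction by the weight coefficient.
step = predict_co2(1001, 1300) - predict_co2(1000, 1300)
print(step, coefs["Weight"])
```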


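The model has one coefficient per variable. To also capture the combined effect of the two variables, we can add their product (Volume × Weight) as a third variable: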

In [31]:
df = pandas.read_csv("dataExample.csv") 
df['VW'] = df['Volume'] * df['Weight']  # product of the two variables
df

Unnamed: 0,Car,Model,Volume,Weight,CO2,VW
0,Toyoty,Aygo,1000,790,99,790000
1,Mitsubishi,Space Star,1200,1160,95,1392000
2,Skoda,Citigo,1000,929,95,929000
3,Fiat,500,900,865,90,778500
4,Mini,Cooper,1500,1140,105,1710000
5,VW,Up!,1000,929,105,929000
6,Skoda,Fabia,1400,1109,90,1552600
7,Mercedes,A-Class,1500,1365,92,2047500
8,Ford,Fiesta,1500,1112,98,1668000
9,Audi,A1,1600,1150,99,1840000


In [32]:
X = df[['Weight', 'Volume', 'VW']]  # independent variables
y = df['CO2']                       # dependent variable

In [33]:
regr = linear_model.LinearRegression()
regr.fit(X, y) 

In [34]:
print(regr.coef_) 

[-4.25717518e-02 -3.21914104e-02  3.23357216e-05]
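With a product term in the model, the effect of weight is no longer a single number: it depends on the volume. The sketch below, under the assumption that only the ten displayed rows are used (typed in by hand), checks that the change per extra kilogram equals the weight coefficient plus the product coefficient times the volume. The names `b_weight`, `b_volume`, `b_vw` and `predict_co2` are introduced here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Assumption: only the ten rows displayed above, entered by hand.
df = pd.DataFrame({
    "Volume": [1000, 1200, 1000, 900, 1500, 1000, 1400, 1500, 1500, 1600],
    "Weight": [790, 1160, 929, 865, 1140, 929, 1109, 1365, 1112, 1150],
    "CO2":    [99, 95, 95, 90, 105, 105, 90, 92, 98, 99],
})
df["VW"] = df["Volume"] * df["Weight"]

X = df[["Weight", "Volume", "VW"]]
y = df["CO2"]
regr = linear_model.LinearRegression().fit(X, y)
b_weight, b_volume, b_vw = regr.coef_

def predict_co2(weight, volume):
    """Hypothetical helper: predict CO2 for one car, building the product term."""
    row = pd.DataFrame({"Weight": [weight], "Volume": [volume],
                        "VW": [weight * volume]})
    return regr.predict(row)[0]

# For each volume, compare the formula for the slope with an actual 1 kg step.
results = {}
for volume in (1000, 1500):
    slope = b_weight + b_vw * volume
    step = predict_co2(1001, volume) - predict_co2(1000, volume)
    results[volume] = (slope, step)
    print(volume, slope, step)
```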


In [35]:
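# Evaluate the fitted function by hand from the coefficients.
# Note: regr.intercept_ is not added here, so these values differ
# from the model's predicted CO2 by the intercept.
df['Funct'] = (regr.coef_[0] * df['Weight']
               + regr.coef_[1] * df['Volume']
               + regr.coef_[2] * df['Volume'] * df['Weight'])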
df

Unnamed: 0,Car,Model,Volume,Weight,CO2,VW,Funct
0,Toyoty,Aygo,1000,790,99,790000,-42.457746
1,Mitsubishi,Space Star,1200,1160,95,1392000,-43.416814
2,Skoda,Citigo,1000,929,95,929000,-42.437687
3,Fiat,500,900,865,90,778500,-40.986787
4,Mini,Cooper,1500,1140,105,1710000,-45.261752
5,VW,Up!,1000,929,105,929000,-42.437687
6,Skoda,Fabia,1400,1109,90,1552600,-45.096285
7,Mercedes,A-Class,1500,1365,92,2047500,-41.591513
8,Ford,Fiesta,1500,1112,98,1668000,-45.718492
9,Audi,A1,1600,1150,99,1840000,-45.637197
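The Funct values are negative because LinearRegression also fits an intercept by default, which the by-hand sum does not include. The sketch below adds regr.intercept_ and checks that the result then agrees with regr.predict(). It assumes only the ten displayed rows, typed in by hand, so the numbers need not match the table above; the column name `Manual` is introduced here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Assumption: only the ten rows displayed above, entered by hand.
df = pd.DataFrame({
    "Volume": [1000, 1200, 1000, 900, 1500, 1000, 1400, 1500, 1500, 1600],
    "Weight": [790, 1160, 929, 865, 1140, 929, 1109, 1365, 1112, 1150],
    "CO2":    [99, 95, 95, 90, 105, 105, 90, 92, 98, 99],
})
df["VW"] = df["Volume"] * df["Weight"]

X = df[["Weight", "Volume", "VW"]]
y = df["CO2"]
regr = linear_model.LinearRegression().fit(X, y)

# By-hand evaluation including the intercept.
df["Manual"] = (regr.intercept_
                + regr.coef_[0] * df["Weight"]
                + regr.coef_[1] * df["Volume"]
                + regr.coef_[2] * df["VW"])

# The library's own predictions for the same rows.
df["Predicted"] = regr.predict(X)

print(df[["CO2", "Manual", "Predicted"]])
```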


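One possible way to inspect the fit visually is to plot measured against predicted CO2; points on the dashed diagonal would be predicted exactly. This is only a sketch of one plot among many, not necessarily the plot intended in the notebook, and it assumes the ten displayed rows typed in by hand and the two-variable model.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import linear_model

# Assumption: only the ten rows displayed above, entered by hand.
df = pd.DataFrame({
    "Volume": [1000, 1200, 1000, 900, 1500, 1000, 1400, 1500, 1500, 1600],
    "Weight": [790, 1160, 929, 865, 1140, 929, 1109, 1365, 1112, 1150],
    "CO2":    [99, 95, 95, 90, 105, 105, 90, 92, 98, 99],
})
X = df[["Weight", "Volume"]]
y = df["CO2"]
regr = linear_model.LinearRegression().fit(X, y)
predicted = regr.predict(X)

fig, ax = plt.subplots()
ax.scatter(y, predicted)  # one point per car

# Dashed diagonal: where predicted equals measured.
lims = [min(y.min(), predicted.min()), max(y.max(), predicted.max())]
ax.plot(lims, lims, linestyle="--")

ax.set_xlabel("Measured CO2")
ax.set_ylabel("Predicted CO2")
ax.set_title("Measured vs. predicted CO2")
```

In a notebook the figure is rendered at the end of the cell; in a plain script, call plt.show() afterwards.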