<a href="https://colab.research.google.com/github/11AJ/Machine-Learning/blob/main/LinearRegression_Heating_Cooling_Load.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Features/Column Description :-

1) X1 - Relative Compactness<br>
2) X2 - Surface Area<br>
3) X3 - Wall Area<br>
4) X4 - Roof Area<br>
5) X5 - Overall Height<br>
6) X6 - Orientation<br>
7) X7 - Glazing Area<br>
8) X8 - Glazing Area Distribution<br>
9) Y1 - Heating Load<br>
10) Y2 - Cooling Load


<b>Regression Task</b><br>
Use the given dataset and perform the following:-
<ol>
<li> Read the 'heat_load.xlsx' dataset.</li>
<li> Rename the columns as per the given features</li> 
<li> Remove/handle null values if any</li>    
<li> Treating all columns except Heating Load and Cooling Load as independent features, split the dataset into training and test sets with test_size = 25%</li>
<li> Predict the Heating load based on features from X1 to X8 and also calculate
the model score. Also find the intercept and the coefficients corresponding to
each of these features. Generate equation of Linear regression</li>
<li>Predict the Cooling load based on features from X1 to X8 and also calculate
the model score. Also find the intercept and the coefficients corresponding 
to each of these features. Generate equation of Linear regression</li>
<li> Compute MSE, MAE, and RMSE for both scenarios (5 and 6)</li> 
<li>Select appropriate independent features based on Correlation matrix</li>
<li>Repeat Q5 and Q7 for heating load after the original dataset has been split into training and testing datasets with test_size=25%, using the appropriate independent features selected from the correlation matrix</li>
<li>Repeat Q6 and Q7 for cooling load after the original dataset has been split into training and testing datasets with test_size=25%, using the appropriate independent features selected from the correlation matrix</li>   
</ol>

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Read the 'heat_load.xlsx' dataset.

In [74]:
df=pd.read_excel('/content/heat_load.xlsx')

In [75]:
df.head(5)

Unnamed: 0,X1,X2,X3,X4,X5,X6,X7,X8,Y1,Y2
0,0.98,514.5,294.0,110.25,7.0,2,0.0,0,15.55,21.33
1,0.98,514.5,294.0,110.25,7.0,3,0.0,0,15.55,21.33
2,0.98,514.5,294.0,110.25,7.0,4,0.0,0,15.55,21.33
3,0.98,514.5,294.0,110.25,7.0,5,0.0,0,15.55,21.33
4,0.9,563.5,318.5,122.5,7.0,2,0.0,0,20.84,28.28


Rename the columns as per the given features

In [76]:
# Rename all columns in one call; the mapping follows the feature description above
df.rename(columns={
    'X1': 'Relative Compactness',
    'X2': 'Surface Area',
    'X3': 'Wall Area',
    'X4': 'Roof Area',
    'X5': 'Overall Height',
    'X6': 'Orientation',
    'X7': 'Glazing Area',
    'X8': 'Glazing Area Distrbution',
    'Y1': 'Heating Load',
    'Y2': 'Cooling Load',
}, inplace=True)

In [77]:
df.head(5)

Unnamed: 0,Relative Compactness,Surface Area,Wall Area,Roof Area,Overall Height,Orientation,Glazing Area,Glazing Area Distrbution,Heating Load,Cooling Load
0,0.98,514.5,294.0,110.25,7.0,2,0.0,0,15.55,21.33
1,0.98,514.5,294.0,110.25,7.0,3,0.0,0,15.55,21.33
2,0.98,514.5,294.0,110.25,7.0,4,0.0,0,15.55,21.33
3,0.98,514.5,294.0,110.25,7.0,5,0.0,0,15.55,21.33
4,0.9,563.5,318.5,122.5,7.0,2,0.0,0,20.84,28.28


Remove/handle null values if any

In [78]:
df.isnull().sum()

Relative Compactness        0
Surface Area                0
Wall Area                   0
Roof Area                   0
Overall Height              0
Orientation                 0
Glazing Area                0
Glazing Area Distrbution    0
Heating Load                0
Cooling Load                0
dtype: int64
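
The check above reports zero missing values, so no cleaning is needed here. If a later copy of the data did contain gaps, the two usual options could be sketched as follows; the toy frame and its values are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values and two deliberate gaps (not the real dataset).
toy = pd.DataFrame({
    "Wall Area": [294.0, np.nan, 318.5, 343.0],
    "Heating Load": [15.55, 20.84, np.nan, 19.50],
})

# Option 1: drop every row that has at least one missing value.
dropped = toy.dropna()

# Option 2: fill each gap with the median of its column.
filled = toy.fillna(toy.median())

print(len(dropped))                 # rows that survive dropna
print(filled.isnull().sum().sum())  # missing values left after filling
```

Dropping is the simpler choice when only a few rows are affected; filling keeps the row count intact at the cost of inventing values.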

Treating all columns except Heating Load and Cooling Load as independent features, split the dataset into training and test sets with test_size = 25%

In [124]:
X=df.iloc[:,:-2]
Y=df['Heating Load']

In [103]:
from sklearn.model_selection import train_test_split

In [104]:
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=.25)
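
Because `train_test_split` shuffles the rows, the call above produces a different split, and therefore different scores and coefficients, on every run. A minimal sketch of pinning the split with `random_state` follows. The seed 42 and the synthetic 768-row frame are assumptions for illustration; 768 rows are used because a 25% test split then has 192 rows, the slice length used later in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature table (values are arbitrary).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(768, 3)), columns=["f1", "f2", "f3"])
y_demo = pd.Series(rng.normal(size=768), name="target")

# Same seed -> same shuffle -> same rows in the test set on every run.
split_a = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)

X_tr, X_te, y_tr, y_te = split_a
print(X_tr.shape, X_te.shape)
print(X_te.index.equals(split_b[1].index))
```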

Predict the Heating load based on features from X1 to X8 and also calculate the model score. Also find the intercept and the coefficients corresponding to each of these features. Generate equation of Linear regression

In [105]:
from sklearn.linear_model import LinearRegression

In [106]:
model=LinearRegression()

In [107]:
model.fit(X_train,Y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [108]:
# R^2 over all rows, training rows included; model.score(X_test,Y_test) gives the held-out score
model.score(X,Y)

0.9159345135268653
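
`model.score(X, Y)` returns R² over all rows, three quarters of which the model was fitted on. A sketch of reporting the training and held-out scores separately, using synthetic data and hypothetical names:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression problem: y depends linearly on three features plus noise.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(400, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
demo_model = LinearRegression().fit(X_tr, y_tr)

r2_train = demo_model.score(X_tr, y_tr)  # R^2 on the rows used for fitting
r2_test = demo_model.score(X_te, y_te)   # R^2 on rows the model has not seen
print(round(r2_train, 3), round(r2_test, 3))
```

The held-out figure is the one that estimates how the model behaves on new buildings.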

In [109]:
model.coef_

array([-6.56986983e+01,  6.81111446e+10, -6.81111446e+10, -1.36222289e+11,
        3.91584849e+00, -4.00738120e-02,  1.92858950e+01,  2.25587217e-01])
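
The coefficients of order 10^10 on Surface Area, Wall Area, and Roof Area are a symptom of exact collinearity. In the rows shown by `df.head(5)`, Surface Area equals Wall Area plus twice Roof Area, and the three large coefficients follow the matching pattern (c, -c, -2c), so their contributions cancel. The check below uses only values printed in this notebook; that the identity holds for every row of the dataset is an assumption, which `np.allclose` on the full columns would confirm.

```python
import numpy as np

# The two distinct (Surface Area, Wall Area, Roof Area) triples shown by df.head(5).
surface = np.array([514.5, 563.5])
wall = np.array([294.0, 318.5])
roof = np.array([110.25, 122.5])

# Surface Area is an exact linear combination of the other two columns.
identity_holds = bool(np.allclose(surface, wall + 2 * roof))

# Coefficients printed by model.coef_ for Surface Area, Wall Area, and Roof Area.
c_surface, c_wall, c_roof = 6.81111446e10, -6.81111446e10, -1.36222289e11

# They follow the pattern (c, -c, -2c), so c * (surface - wall - 2 * roof) vanishes.
pattern_holds = bool(np.isclose(c_wall, -c_surface) and np.isclose(c_roof, -2 * c_surface))

print(identity_holds, pattern_holds)
```

Individually these three coefficients carry no physical meaning; only their combined effect does. Keeping just one of the three columns, as the reduced feature set later in this notebook does, removes the redundancy.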

In [110]:
model.intercept_

88.30156537154565

In [111]:
df.columns

Index(['Relative Compactness', 'Surface Area', 'Wall Area', 'Roof Area',
       'Overall Height', 'Orientation', 'Glazing Area',
       'Glazing Area Distrbution', 'Heating Load', 'Cooling Load'],
      dtype='object')

In [112]:
# Evaluate the fitted linear equation on every row of df
Head_load_eqn = (
    df['Relative Compactness']*model.coef_[0]
    + df['Surface Area']*model.coef_[1]
    + df['Wall Area']*model.coef_[2]
    + df['Roof Area']*model.coef_[3]
    + df['Overall Height']*model.coef_[4]
    + df['Orientation']*model.coef_[5]
    + df['Glazing Area']*model.coef_[6]
    + df['Glazing Area Distrbution']*model.coef_[7]
    + model.intercept_
)

In [120]:
Heat_load_pred=model.predict(X_test)

In [121]:
Heat_load_pred[0:5]

array([37.38544324, 32.42093909, 17.75156487, 35.5389989 , 14.06302009])

In [122]:
Head_load_eqn[0:5]

0    22.700717
1    22.660643
2    22.620569
3    22.580495
4    25.022982
dtype: float64
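
The two outputs above are not expected to agree row by row: `Head_load_eqn` is evaluated on rows 0 to 4 of `df`, while `Heat_load_pred` holds predictions for the shuffled rows of `X_test`. Selecting the equation values by the test index makes them comparable. A sketch on synthetic data, with hypothetical names:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
demo = pd.DataFrame(rng.normal(size=(200, 2)), columns=["a", "b"])
demo["y"] = 3.0 * demo["a"] - 2.0 * demo["b"] + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(demo[["a", "b"]], demo["y"],
                                          test_size=0.25, random_state=0)
demo_model = LinearRegression().fit(X_tr, y_tr)

# Hand-built equation evaluated on every row of the frame.
eqn = demo["a"] * demo_model.coef_[0] + demo["b"] * demo_model.coef_[1] + demo_model.intercept_

# Pick the equation values for the test rows, in the same order as X_te.
eqn_on_test = eqn.loc[X_te.index]
pred = demo_model.predict(X_te)

print(np.allclose(eqn_on_test.to_numpy(), pred))
```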

Predict the Cooling load based on features from X1 to X8 and also calculate the model score. Also find the intercept and the coefficients corresponding to each of these features. Generate equation of Linear regression

In [126]:
X1=df.iloc[:,:-2]
Y1=df['Cooling Load']

In [127]:
from sklearn.model_selection import train_test_split

In [128]:
X1_train,X1_test,Y1_train,Y1_test=train_test_split(X1,Y1,test_size=0.25)

In [129]:
from sklearn.linear_model import LinearRegression

In [130]:
mod=LinearRegression()

In [132]:
mod.fit(X1_train,Y1_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [133]:
# R^2 over all rows, training rows included; mod.score(X1_test,Y1_test) gives the held-out score
mod.score(X1,Y1)

0.8878169186885901
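
Both targets are regressed on the same eight features, so the two fits could also be done in one call: scikit-learn's `LinearRegression` accepts a two-column target and returns one row of coefficients per target. A sketch on synthetic, noise-free data; the arrays and their values are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X_demo = rng.normal(size=(300, 3))

# Two targets built from the same features (stand-ins for heating and cooling load).
heating = X_demo @ np.array([1.0, 2.0, 3.0]) + 5.0
cooling = X_demo @ np.array([-1.0, 0.5, 2.0]) + 7.0
Y_demo = np.column_stack([heating, cooling])

both = LinearRegression().fit(X_demo, Y_demo)

print(both.coef_.shape)       # one row of coefficients per target
print(both.intercept_.shape)  # one intercept per target
```

Keeping the two models in one object also makes it harder to read the coefficients of the wrong model by accident.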

In [139]:
Cool_load_pred=mod.predict(X1_test)

In [134]:
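# `model` is the heating-load model, so this output repeats its coefficients;
# the cooling-load coefficients are in mod.coef_
model.coef_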

array([-6.56986983e+01,  6.81111446e+10, -6.81111446e+10, -1.36222289e+11,
        3.91584849e+00, -4.00738120e-02,  1.92858950e+01,  2.25587217e-01])

In [135]:
# `model` is the heating-load model; the cooling-load intercept is mod.intercept_
model.intercept_

88.30156537154565

In [136]:
df.columns

Index(['Relative Compactness', 'Surface Area', 'Wall Area', 'Roof Area',
       'Overall Height', 'Orientation', 'Glazing Area',
       'Glazing Area Distrbution', 'Heating Load', 'Cooling Load'],
      dtype='object')

In [138]:
# Evaluate the fitted cooling-load equation on every row of df
Cool_load_eqn = (
    df['Relative Compactness']*mod.coef_[0]
    + df['Surface Area']*mod.coef_[1]
    + df['Wall Area']*mod.coef_[2]
    + df['Roof Area']*mod.coef_[3]
    + df['Overall Height']*mod.coef_[4]
    + df['Orientation']*mod.coef_[5]
    + df['Glazing Area']*mod.coef_[6]
    + df['Glazing Area Distrbution']*mod.coef_[7]
    + mod.intercept_
)

In [140]:
Cool_load_pred[0:5]

array([33.24670815, 31.4748154 , 15.45596171, 17.02674236, 26.43372067])

In [141]:
Cool_load_eqn[0:5]

0    26.059643
1    26.184335
2    26.309028
3    26.433721
4    28.356518
dtype: float64

Compute MSE, MAE, and RMSE for both scenarios (5 and 6)
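
For reference, the three metrics reduce to a few lines of NumPy. The numbers below are a made-up example chosen so the results can be checked by hand: one prediction is off by 4 and the other three are exact.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 8.0])

errors = y_pred - y_true
mse = np.mean(errors ** 2)      # mean squared error: (0 + 0 + 0 + 16) / 4
mae = np.mean(np.abs(errors))   # mean absolute error: (0 + 0 + 0 + 4) / 4
rmse = np.sqrt(mse)             # root mean squared error, in the units of y

print(mse, mae, rmse)  # 4.0 1.0 2.0
```

RMSE (2.0) is twice the MAE (1.0) here because squaring weights the single large miss more heavily.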

In [142]:
from sklearn.metrics import mean_squared_error,mean_absolute_error

In [149]:
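# Caution: [0:192] selects the first 192 rows of df, not the shuffled rows behind Heat_load_pred.
# Y_test holds the true values for those rows, so the figures printed below overstate the error.
mse_heat=mean_squared_error(df['Heating Load'][0:192],Heat_load_pred)
mae_heat=mean_absolute_error(df['Heating Load'][0:192],Heat_load_pred)
rmse_heat=(mse_heat)**0.5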

print(mse_heat)
print(mae_heat)
print(rmse_heat)

182.2828092992842
10.830880160929903
13.501215104548338
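
The errors printed above are large for a model whose R² is about 0.92. The reason is that `df['Heating Load'][0:192]` takes the first 192 rows of `df`, while `Heat_load_pred` was computed for the shuffled rows in `X_test`, so each prediction is compared with the wrong building. Scoring against `Y_test` pairs each prediction with its own true value. A sketch of the difference on synthetic data (names and values are hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
demo = pd.DataFrame(rng.normal(size=(400, 2)), columns=["a", "b"])
demo["y"] = 4.0 * demo["a"] + 2.0 * demo["b"] + rng.normal(scale=0.3, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(demo[["a", "b"]], demo["y"],
                                          test_size=0.25, random_state=0)
pred = LinearRegression().fit(X_tr, y_tr).predict(X_te)

# Misaligned: first len(pred) rows of the frame, unrelated to the shuffled test rows.
mse_misaligned = mean_squared_error(demo["y"][0:len(pred)], pred)

# Aligned: the true values that belong to the predicted rows.
mse_aligned = mean_squared_error(y_te, pred)

print(mse_aligned < mse_misaligned)
```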


In [152]:
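# Caution: [0:192] selects the first 192 rows of df, not the shuffled rows behind Cool_load_pred.
# Y1_test holds the true values for those rows, so the figures printed below overstate the error.
mse_cool=mean_squared_error(df['Cooling Load'][0:192],Cool_load_pred)
mae_cool=mean_absolute_error(df['Cooling Load'][0:192],Cool_load_pred)
rmse_cool=(mse_cool)**0.5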

print(mse_cool)
print(mae_cool)
print(rmse_cool)

170.9470445472981
10.627961340727904
13.07467187149636


Select appropriate independent features based on Correlation matrix

In [153]:
import seaborn as sns

In [157]:
df.corr()

Unnamed: 0,Relative Compactness,Surface Area,Wall Area,Roof Area,Overall Height,Orientation,Glazing Area,Glazing Area Distrbution,Heating Load,Cooling Load
Relative Compactness,1.0,-0.9919015,-0.2037817,-0.8688234,0.8277473,0.0,1.2839860000000002e-17,1.76462e-17,0.622272,0.634339
Surface Area,-0.9919015,1.0,0.1955016,0.8807195,-0.8581477,0.0,1.318356e-16,-3.558613e-16,-0.65812,-0.672999
Wall Area,-0.2037817,0.1955016,1.0,-0.2923165,0.2809757,0.0,-7.9697259999999995e-19,0.0,0.455671,0.427117
Roof Area,-0.8688234,0.8807195,-0.2923165,1.0,-0.9725122,0.0,-1.381805e-16,-1.079129e-16,-0.861828,-0.862547
Overall Height,0.8277473,-0.8581477,0.2809757,-0.9725122,1.0,0.0,1.861418e-18,0.0,0.88943,0.895785
Orientation,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,-0.002587,0.01429
Glazing Area,1.2839860000000002e-17,1.318356e-16,-7.9697259999999995e-19,-1.381805e-16,1.861418e-18,0.0,1.0,0.2129642,0.269842,0.207505
Glazing Area Distrbution,1.76462e-17,-3.558613e-16,0.0,-1.079129e-16,0.0,0.0,0.2129642,1.0,0.087368,0.050525
Heating Load,0.6222719,-0.6581199,0.4556714,-0.8618281,0.8894305,-0.002587,0.2698417,0.08736846,1.0,0.975862
Cooling Load,0.6343391,-0.6729989,0.427117,-0.8625466,0.8957852,0.01429,0.207505,0.05052512,0.975862,1.0
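
One way to turn the matrix above into a feature list is a greedy filter: rank features by the absolute value of their correlation with the target, skip those that are too weakly correlated, and skip any feature that is nearly collinear with one already kept. The sketch below applies that rule to the Heating Load correlations printed above. The function and both thresholds (0.05 and 0.9) are illustrative assumptions, not the selection procedure used in this notebook.

```python
# Correlations with 'Heating Load', copied from the df.corr() output above.
target_corr = {
    "Relative Compactness": 0.6222719,
    "Surface Area": -0.6581199,
    "Wall Area": 0.4556714,
    "Roof Area": -0.8618281,
    "Overall Height": 0.8894305,
    "Orientation": -0.002587,
    "Glazing Area": 0.2698417,
    "Glazing Area Distrbution": 0.08736846,
}

# Non-negligible feature-to-feature correlations from the same output;
# pairs not listed here are treated as 0.
pair_corr = {
    frozenset({"Relative Compactness", "Surface Area"}): -0.9919015,
    frozenset({"Relative Compactness", "Wall Area"}): -0.2037817,
    frozenset({"Relative Compactness", "Roof Area"}): -0.8688234,
    frozenset({"Relative Compactness", "Overall Height"}): 0.8277473,
    frozenset({"Surface Area", "Wall Area"}): 0.1955016,
    frozenset({"Surface Area", "Roof Area"}): 0.8807195,
    frozenset({"Surface Area", "Overall Height"}): -0.8581477,
    frozenset({"Wall Area", "Roof Area"}): -0.2923165,
    frozenset({"Wall Area", "Overall Height"}): 0.2809757,
    frozenset({"Roof Area", "Overall Height"}): -0.9725122,
    frozenset({"Glazing Area", "Glazing Area Distrbution"}): 0.2129642,
}

def select_features(target_corr, pair_corr, min_target=0.05, max_pair=0.9):
    """Hypothetical greedy filter: strongest features first, skipping weak or redundant ones."""
    ranked = sorted(target_corr, key=lambda f: abs(target_corr[f]), reverse=True)
    kept = []
    for feat in ranked:
        if abs(target_corr[feat]) < min_target:
            continue  # too weakly related to the target
        redundant = any(abs(pair_corr.get(frozenset({feat, k}), 0.0)) > max_pair
                        for k in kept)
        if not redundant:
            kept.append(feat)
    return kept

selected = select_features(target_corr, pair_corr)
print(selected)
```

With these thresholds the filter keeps Overall Height, Surface Area, Wall Area, Glazing Area, and Glazing Area Distrbution. The hand-picked set used in the next step differs only in taking Relative Compactness instead of Surface Area; the two are almost interchangeable, since their correlation is -0.99.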


Repeat Q5 and Q7 for heating load after the original dataset has been split into training and testing dataset with test_size=25% with appropriate independent features selected from correlation matrix

In [160]:
XF=df[['Relative Compactness','Wall Area','Overall Height','Glazing Area','Glazing Area Distrbution']]
YF=df['Heating Load']

In [161]:
XF_train,XF_test,YF_train,YF_test=train_test_split(XF,YF,test_size=0.25)

In [162]:
FMOD=LinearRegression()

In [163]:
FMOD.fit(XF_train,YF_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [164]:
# R^2 over all rows, training rows included; FMOD.score(XF_test,YF_test) gives the held-out score
FMOD.score(XF,YF)

0.9132525608368104

In [168]:
Final_heat_pred=FMOD.predict(XF_test)

In [170]:
FMOD.coef_

array([-14.56730434,   0.03493175,   5.62684081,  20.48565739,
         0.18887735])

In [171]:
FMOD.intercept_

-12.554291881523682

In [173]:
XF.columns

Index(['Relative Compactness', 'Wall Area', 'Overall Height', 'Glazing Area',
       'Glazing Area Distrbution'],
      dtype='object')

In [175]:
# Evaluate the reduced heating-load equation on every row of df
Final_heat_eqn = (
    df['Relative Compactness']*FMOD.coef_[0]
    + df['Wall Area']*FMOD.coef_[1]
    + df['Overall Height']*FMOD.coef_[2]
    + df['Glazing Area']*FMOD.coef_[3]
    + df['Glazing Area Distrbution']*FMOD.coef_[4]
    + FMOD.intercept_
)

In [176]:
Final_heat_pred[0:5]

array([15.86396829, 24.57564725, 13.96128489, 10.60617144, 18.9368169 ])

In [177]:
Final_heat_eqn[0:5]

0    22.827571
1    22.827571
2    22.827571
3    22.827571
4    24.848783
dtype: float64
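
To write the fitted model out as an explicit equation, the coefficients and intercept printed above can be paired with the column names. The two helpers below are hypothetical conveniences, not part of scikit-learn; the numbers are copied from the `FMOD.coef_`, `FMOD.intercept_`, and `df.head(5)` outputs.

```python
# Values copied from the FMOD.coef_, FMOD.intercept_ and XF.columns outputs above.
names = ["Relative Compactness", "Wall Area", "Overall Height",
         "Glazing Area", "Glazing Area Distrbution"]
coefs = [-14.56730434, 0.03493175, 5.62684081, 20.48565739, 0.18887735]
intercept = -12.554291881523682

def format_equation(target, names, coefs, intercept):
    """Hypothetical helper: render a linear model as a readable equation string."""
    terms = [f"({c:.4f} * {n})" for c, n in zip(coefs, names)]
    return f"{target} = {intercept:.4f} + " + " + ".join(terms)

def evaluate(row, names, coefs, intercept):
    """Apply the equation to one row given as a dict of feature values."""
    return intercept + sum(c * row[n] for c, n in zip(coefs, names))

equation = format_equation("Heating Load", names, coefs, intercept)
print(equation)

# First row of the dataset, as shown by df.head(5).
first_row = {"Relative Compactness": 0.98, "Wall Area": 294.0, "Overall Height": 7.0,
             "Glazing Area": 0.0, "Glazing Area Distrbution": 0.0}
first_value = evaluate(first_row, names, coefs, intercept)
print(round(first_value, 4))
```

The value for the first row agrees with the first entry of `Final_heat_eqn` above (22.827571) up to the rounding of the printed coefficients.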

In [181]:
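# Caution: [0:192] selects the first 192 rows of df, not the shuffled rows behind Final_heat_pred.
# YF_test holds the true values for those rows, so the figures printed below overstate the error.
Final_mse_heat=mean_squared_error(df['Heating Load'][0:192],Final_heat_pred)
Final_mae_heat=mean_absolute_error(df['Heating Load'][0:192],Final_heat_pred)
Final_rmse_heat=(Final_mse_heat)**0.5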

print(Final_mse_heat)
print(Final_mae_heat)
print(Final_rmse_heat)

191.7228988200071
11.082888871611635
13.846403822653992


Repeat Q6 and Q7 for cooling load after the original dataset has been split into training and testing dataset with test_size=25% with appropriate independent features selected from correlation matrix

In [183]:
x5=df[['Relative Compactness','Wall Area','Overall Height','Glazing Area','Glazing Area Distrbution']]
y5=df['Cooling Load']

In [184]:
x5_train,x5_test,y5_train,y5_test=train_test_split(x5,y5,test_size=.25)

In [185]:
clr=LinearRegression()

In [186]:
clr.fit(x5_train,y5_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [187]:
# R^2 over all rows, training rows included; clr.score(x5_test,y5_test) gives the held-out score
clr.score(x5,y5)

0.8841055680372997

In [188]:
Final_cool_pred=clr.predict(x5_test)

In [190]:
clr.coef_

array([-2.05748414e+01,  1.56679362e-02,  5.81908551e+00,  1.43628611e+01,
        6.10660551e-02])

In [191]:
clr.intercept_

1.2668096195644019

In [189]:
x5.columns

Index(['Relative Compactness', 'Wall Area', 'Overall Height', 'Glazing Area',
       'Glazing Area Distrbution'],
      dtype='object')

In [193]:
# Evaluate the reduced cooling-load equation on every row of df
Final_cool_eqn = (
    df['Relative Compactness']*clr.coef_[0]
    + df['Wall Area']*clr.coef_[1]
    + df['Overall Height']*clr.coef_[2]
    + df['Glazing Area']*clr.coef_[3]
    + df['Glazing Area Distrbution']*clr.coef_[4]
    + clr.intercept_
)

In [194]:
Final_cool_pred[0:5]

array([20.4413843 , 36.5410055 , 16.17521325, 32.31071342, 36.72420366])

In [197]:
Final_cool_eqn[0:5]

0    26.443437
1    26.443437
2    26.443437
3    26.443437
4    28.473289
dtype: float64

In [198]:
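# Caution: [0:192] selects the first 192 rows of df, not the shuffled rows behind Final_cool_pred.
# y5_test holds the true values for those rows, so the figures printed below overstate the error.
Final_mse_cool=mean_squared_error(df['Cooling Load'][0:192],Final_cool_pred)
Final_mae_cool=mean_absolute_error(df['Cooling Load'][0:192],Final_cool_pred)
Final_rmse_cool=(Final_mse_cool)**0.5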

print(Final_mse_cool)
print(Final_mae_cool)
print(Final_rmse_cool)

156.80868105056044
10.099406920810173
12.522327301686394
