#  PREDICTING ENERGY EFFICIENCY OF BUILDINGS (Using Regression Models) - Project
### By: Ajibade Abdulquddus
<img src='energy2.jpg' />

The <a href='https://archive.ics.uci.edu/ml/machine-learning-databases/00374/'>dataset</a> for this project is the Appliances Energy Prediction data. It contains readings at 10-minute intervals over about 4.5 months. The house temperature and humidity conditions were monitored with a ZigBee wireless sensor network. Each wireless node transmitted the temperature and humidity conditions roughly every 3.3 minutes, and the wireless data was then averaged over 10-minute periods. The energy data was logged every 10 minutes with m-bus energy meters. Weather data from the nearest airport weather station (Chievres Airport, Belgium) was downloaded from a public data set from Reliable Prognosis (rp5.ru) and merged with the experimental data sets using the date and time column. Two random variables have been included in the data set for testing the regression models and for filtering out non-predictive attributes (parameters). The attribute information can be seen below.

Attribute Information:

date, time in year-month-day hour:minute:second format

Appliances, energy use in Wh

lights, energy use of light fixtures in the house in Wh

T1, Temperature in kitchen area, in Celsius

RH_1, Humidity in kitchen area, in %

T2, Temperature in living room area, in Celsius

RH_2, Humidity in living room area, in %

T3, Temperature in laundry room area, in Celsius

RH_3, Humidity in laundry room area, in %

T4, Temperature in office room, in Celsius

RH_4, Humidity in office room, in %

T5, Temperature in bathroom, in Celsius

RH_5, Humidity in bathroom, in %

T6, Temperature outside the building (north side), in Celsius

RH_6, Humidity outside the building (north side), in %

T7, Temperature in ironing room , in Celsius

RH_7, Humidity in ironing room, in %

T8, Temperature in teenager room 2, in Celsius

RH_8, Humidity in teenager room 2, in %

T9, Temperature in parents room, in Celsius

RH_9, Humidity in parents room, in %

T_out, Temperature outside (from Chievres weather station), in Celsius

Pressure (from Chievres weather station), in mm Hg

RH_out, Humidity outside (from Chievres weather station), in %

Wind speed (from Chievres weather station), in m/s

Visibility (from Chievres weather station), in km

Tdewpoint (from Chievres weather station), in °C

rv1, Random variable 1, nondimensional

rv2, Random variable 2, nondimensional
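
The description above says the sensor readings were averaged over 10-minute periods and then merged with the other data on the date and time column. Below is a minimal sketch of that kind of preprocessing in pandas. It uses small synthetic frames (`sensor`, `energy`) that are assumptions for illustration only; it is not the dataset authors' actual pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings, one every 200 seconds (about 3.3 minutes).
sensor_times = pd.date_range("2016-01-11 17:00:00", periods=18,
                             freq=pd.Timedelta(seconds=200))
sensor = pd.DataFrame({"T1": np.linspace(19.8, 20.2, 18)}, index=sensor_times)

# Average the readings that fall inside each 10-minute period.
sensor_10min = sensor.resample("10min").mean().rename_axis("date").reset_index()

# Hypothetical energy log that is already at 10-minute resolution.
energy = pd.DataFrame({
    "date": pd.date_range("2016-01-11 17:00:00", periods=6, freq="10min"),
    "Appliances": [60, 60, 50, 50, 60, 50],
})

# Merge the two sources on the shared date and time column.
merged = energy.merge(sensor_10min, on="date", how="inner")
print(merged)
```

Each 10-minute bin here averages three raw readings, which is roughly what a 3.3-minute transmission interval would give.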

________________________________________________________________

**Importing important libraries**

In [1]:
import numpy as np
import pandas as pd

**Reading the data as a DataFrame called 'df' and checking its head:**

In [2]:
df= pd.read_csv('energydata_complete.csv')
df.head()

Unnamed: 0,date,Appliances,lights,T1,RH_1,T2,RH_2,T3,RH_3,T4,...,T9,RH_9,T_out,Press_mm_hg,RH_out,Windspeed,Visibility,Tdewpoint,rv1,rv2
0,2016-01-11 17:00:00,60,30,19.89,47.596667,19.2,44.79,19.79,44.73,19.0,...,17.033333,45.53,6.6,733.5,92.0,7.0,63.0,5.3,13.275433,13.275433
1,2016-01-11 17:10:00,60,30,19.89,46.693333,19.2,44.7225,19.79,44.79,19.0,...,17.066667,45.56,6.483333,733.6,92.0,6.666667,59.166667,5.2,18.606195,18.606195
2,2016-01-11 17:20:00,50,30,19.89,46.3,19.2,44.626667,19.79,44.933333,18.926667,...,17.0,45.5,6.366667,733.7,92.0,6.333333,55.333333,5.1,28.642668,28.642668
3,2016-01-11 17:30:00,50,40,19.89,46.066667,19.2,44.59,19.79,45.0,18.89,...,17.0,45.4,6.25,733.8,92.0,6.0,51.5,5.0,45.410389,45.410389
4,2016-01-11 17:40:00,60,40,19.89,46.333333,19.2,44.53,19.79,45.0,18.89,...,17.0,45.4,6.133333,733.9,92.0,5.666667,47.666667,4.9,10.084097,10.084097


**Fitting a linear model on the relationship between the temperature in the living room in Celsius (x = T2) and the temperature outside the building (y = T6) and finding the R-Squared value:**

In [3]:
from sklearn.linear_model import LinearRegression

In [4]:
model = LinearRegression()

In [7]:
x=df[['T2']]
y= df[['T6']]

In [8]:
model.fit(x,y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

In [9]:
from sklearn.metrics import r2_score

In [10]:
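# Question 12

# R-squared compares the observed T6 values with the model's predictions.
# Passing x and y straight to r2_score compares T2 with T6 and bypasses the
# fitted model; that direct comparison evaluates to -35.39.
r2 = r2_score(y, model.predict(x))
round(r2, 2)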
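
R-squared measures how well a model's predictions match the target, so it should be computed from `y` and `model.predict(x)` rather than from `x` and `y` directly. The sketch below shows the difference on synthetic data; the slope, intercept and noise level are made-up assumptions and have nothing to do with the energy dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic predictor and a target that depends linearly on it, plus noise.
x = rng.uniform(15, 30, size=(200, 1))
y = 2.0 * x[:, 0] - 25.0 + rng.normal(0, 1.0, size=200)

model = LinearRegression().fit(x, y)

# Correct: score the predictions against the observed target.
r2_model = r2_score(y, model.predict(x))

# Misleading: this treats y as if it were a prediction of x.
r2_raw = r2_score(x[:, 0], y)

print(round(r2_model, 2), round(r2_raw, 2))
```

`model.score(x, y)` returns the same value as `r2_model`, so it is a convenient shortcut for a fitted regressor.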

**Now we'll normalize the dataset using the MinMaxScaler after removing the following columns: [“date”, “lights”]**

**We'll then split the Data into training and testing data and run a multiple linear regression using the training set then evaluate the model on the test set:**

In [12]:
df=df.drop(['date','lights'], axis=1)
df.head()

Unnamed: 0,Appliances,T1,RH_1,T2,RH_2,T3,RH_3,T4,RH_4,T5,...,T9,RH_9,T_out,Press_mm_hg,RH_out,Windspeed,Visibility,Tdewpoint,rv1,rv2
0,60,19.89,47.596667,19.2,44.79,19.79,44.73,19.0,45.566667,17.166667,...,17.033333,45.53,6.6,733.5,92.0,7.0,63.0,5.3,13.275433,13.275433
1,60,19.89,46.693333,19.2,44.7225,19.79,44.79,19.0,45.9925,17.166667,...,17.066667,45.56,6.483333,733.6,92.0,6.666667,59.166667,5.2,18.606195,18.606195
2,50,19.89,46.3,19.2,44.626667,19.79,44.933333,18.926667,45.89,17.166667,...,17.0,45.5,6.366667,733.7,92.0,6.333333,55.333333,5.1,28.642668,28.642668
3,50,19.89,46.066667,19.2,44.59,19.79,45.0,18.89,45.723333,17.166667,...,17.0,45.4,6.25,733.8,92.0,6.0,51.5,5.0,45.410389,45.410389
4,60,19.89,46.333333,19.2,44.53,19.79,45.0,18.89,45.53,17.2,...,17.0,45.4,6.133333,733.9,92.0,5.666667,47.666667,4.9,10.084097,10.084097


In [13]:
from sklearn.preprocessing import MinMaxScaler

In [14]:
scaler = MinMaxScaler()
normalised_df = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

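
With its default settings, `MinMaxScaler` rescales each column to the range 0 to 1 using (x - min) / (max - min). Here is a small sketch that checks this by hand on a toy frame; the values are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy data with two columns on different scales (made-up values).
toy = pd.DataFrame({"T1": [19.0, 20.0, 21.0, 23.0],
                    "RH_1": [40.0, 45.0, 50.0, 60.0]})

# Scale with scikit-learn.
scaled = pd.DataFrame(MinMaxScaler().fit_transform(toy), columns=toy.columns)

# Scale by hand, column by column.
manual = (toy - toy.min()) / (toy.max() - toy.min())

print(scaled)
print(np.allclose(scaled.values, manual.values))
```

Because `Appliances` is scaled together with the features in this notebook, the target and every error metric computed from it are in normalised units rather than Wh.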


In [15]:
features_df = normalised_df.drop(columns=['Appliances'])
heating_target = normalised_df['Appliances']


In [16]:
features_df.head()

Unnamed: 0,T1,RH_1,T2,RH_2,T3,RH_3,T4,RH_4,T5,RH_5,...,T9,RH_9,T_out,Press_mm_hg,RH_out,Windspeed,Visibility,Tdewpoint,rv1,rv2
0,0.32735,0.566187,0.225345,0.684038,0.215188,0.746066,0.351351,0.764262,0.175506,0.381691,...,0.223032,0.67729,0.37299,0.097674,0.894737,0.5,0.953846,0.538462,0.265449,0.265449
1,0.32735,0.541326,0.225345,0.68214,0.215188,0.748871,0.351351,0.782437,0.175506,0.381691,...,0.2265,0.678532,0.369239,0.1,0.894737,0.47619,0.894872,0.533937,0.372083,0.372083
2,0.32735,0.530502,0.225345,0.679445,0.215188,0.755569,0.344745,0.778062,0.175506,0.380037,...,0.219563,0.676049,0.365488,0.102326,0.894737,0.452381,0.835897,0.529412,0.572848,0.572848
3,0.32735,0.52408,0.225345,0.678414,0.215188,0.758685,0.341441,0.770949,0.175506,0.380037,...,0.219563,0.671909,0.361736,0.104651,0.894737,0.428571,0.776923,0.524887,0.908261,0.908261
4,0.32735,0.531419,0.225345,0.676727,0.215188,0.758685,0.341441,0.762697,0.178691,0.380037,...,0.219563,0.671909,0.357985,0.106977,0.894737,0.404762,0.717949,0.520362,0.201611,0.201611


In [17]:
from sklearn.model_selection import train_test_split

In [61]:
X= features_df
y= heating_target

In [62]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
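
As a quick illustration of what `test_size=0.3` and `random_state=42` do, the sketch below splits a toy array of 100 rows. The arrays are placeholders, not the energy data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy features and target with 100 rows.
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.arange(100)

# 30% of the rows go to the test set; random_state fixes the shuffle.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42)

# Repeating the call with the same random_state gives the same split.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42)

print(X_tr.shape, X_te.shape)
print(np.array_equal(y_te, y_te2))
```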

In [63]:
model.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

**Finding the Mean Absolute Error (to two decimal places):**

In [64]:
# Question 13

from sklearn.metrics import mean_absolute_error
predicted_values= model.predict(X_test)
mae = mean_absolute_error(y_test, predicted_values)
round(mae, 2)

0.05

**Finding the Residual Sum of Squares (to two decimal places):**

In [65]:
# Question 14

rss = np.sum(np.square(y_test - predicted_values))
round(rss, 2)

45.35

**Finding the Root Mean Squared Error (to three decimal places):**

In [66]:
# Question 15

from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test, predicted_values))
round(rmse, 3)

0.088

**Finding the Coefficient of Determination / R-Squared score (to two decimal places):**

In [67]:
# Question 16

from sklearn.metrics import r2_score
# Store the result under a new name so the r2_score function is not overwritten.
r2 = r2_score(y_test, predicted_values)
round(r2, 2)

0.15
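
The four metrics above are closely related: all of them are built from the residuals (observed minus predicted). The sketch below computes each one by hand with NumPy on a few made-up values and compares the results with the scikit-learn functions used in this notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up observed and predicted values on a 0-1 scale.
y_true = np.array([0.10, 0.20, 0.15, 0.40, 0.05])
y_pred = np.array([0.12, 0.18, 0.20, 0.30, 0.07])

residuals = y_true - y_pred

mae = np.mean(np.abs(residuals))                # mean absolute error
rss = np.sum(residuals ** 2)                    # residual sum of squares
rmse = np.sqrt(rss / len(y_true))               # root mean squared error
tss = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
r2 = 1 - rss / tss                              # coefficient of determination

print(np.isclose(mae, mean_absolute_error(y_true, y_pred)))
print(np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred))))
print(np.isclose(r2, r2_score(y_true, y_pred)))
```

Note that RMSE is just the square root of RSS divided by the number of test rows, so the RSS and RMSE reported above describe the same error at different scales.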

**Now, we'll train a Ridge regression model with an alpha value of 0.4 and find if there is any change to the root mean squared error (RMSE) when evaluated on the test set**

In [47]:
X= features_df
y= heating_target

In [48]:
from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=0.4)
ridge_reg.fit(X_train, y_train)

Ridge(alpha=0.4, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [49]:
# Question 18

predicted_values= ridge_reg.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, predicted_values))
round(rmse, 3)

0.088

To three decimal places, the RMSE on the test set is unchanged from the LinearRegression model: both are 0.088.
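
A small `alpha` changes the fit very little, which is one plausible reason the RMSE barely moves here. The sketch below, on synthetic data with made-up coefficients, shows how the size of the Ridge weight vector shrinks as `alpha` grows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

# Synthetic features and a target built from assumed true weights.
X_syn = rng.normal(size=(100, 5))
true_weights = np.array([1.5, -2.0, 0.0, 0.5, 0.0])
y_syn = X_syn @ true_weights + rng.normal(0, 0.1, size=100)

# Size (L2 norm) of the weight vector without any penalty.
ols_norm = np.linalg.norm(LinearRegression().fit(X_syn, y_syn).coef_)

# Size of the weight vector for increasingly strong Ridge penalties.
ridge_norms = {
    alpha: np.linalg.norm(Ridge(alpha=alpha).fit(X_syn, y_syn).coef_)
    for alpha in (0.4, 10.0, 1000.0)
}

print(ols_norm, ridge_norms)
```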
_______

**Now we'll train a lasso regression model with an alpha value of 0.001 and obtain the new feature weights with it.**

In [52]:
from sklearn.linear_model import Lasso
lasso_reg = Lasso(alpha=0.001)
lasso_reg.fit(X_train, y_train)

Lasso(alpha=0.001, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

**Finding the new RMSE with the lasso regression:**

In [53]:
# Question 20

predicted_values= lasso_reg.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, predicted_values))
round(rmse, 3)

0.094
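
Unlike Ridge, the Lasso penalty can push weights to exactly zero, which is why it is often used as a rough feature filter. Here is a minimal sketch on synthetic data where only the first feature matters; the data and the `alpha` value are assumptions chosen to make the effect obvious.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Five synthetic features; only the first one drives the target.
X_syn = rng.normal(size=(200, 5))
y_syn = 3.0 * X_syn[:, 0] + rng.normal(0, 0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X_syn, y_syn)

# Count the features that keep a non-zero weight.
n_kept = np.count_nonzero(lasso.coef_)
print(lasso.coef_, n_kept)
```

The surviving weight is also shrunk below its true value of 3.0, which shows the cost of a strong penalty: fewer features, but more bias.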

**Comparing the effects of regularisation:**

In [30]:
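def get_weights_df(model, feat, col_name):
    """Return a DataFrame of feature names and their weights, sorted by weight."""
    weights = pd.Series(model.coef_, feat.columns).sort_values()
    weights_df = pd.DataFrame(weights).reset_index()
    weights_df.columns = ['Features', col_name]
    # Weights are left unrounded: Series.round returns a new object, so calling
    # weights_df[col_name].round(3) without assigning the result has no effect.
    return weights_df
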
linear_model_weights = get_weights_df(model, X_train, 'Linear_Model_Weight')
ridge_weights_df = get_weights_df(ridge_reg, X_train, 'Ridge_Weight')
lasso_weights_df = get_weights_df(lasso_reg, X_train, 'Lasso_weight')
final_weights = pd.merge(linear_model_weights, ridge_weights_df, on='Features')
final_weights = pd.merge(final_weights, lasso_weights_df, on='Features')


In [36]:
final_weights

Unnamed: 0,Features,Linear_Model_Weight,Ridge_Weight,Lasso_weight
0,rv2,-23103530000.0,0.000743,-0.0
1,RH_2,-0.4567347,-0.401134,-0.0
2,T_out,-0.3218519,-0.250765,0.0
3,T2,-0.2362082,-0.19388,0.0
4,T9,-0.1899433,-0.188584,-0.0
5,RH_8,-0.1576113,-0.156596,-0.00011
6,RH_out,-0.07766027,-0.050541,-0.049557
7,RH_7,-0.04460062,-0.046291,-0.0
8,RH_9,-0.03980315,-0.041701,-0.0
9,T5,-0.01566011,-0.020727,-0.0


In [32]:
model.coef_

array([-3.28105119e-03,  5.53549215e-01, -2.36208162e-01, -4.56734747e-01,
        2.90639707e-01,  9.60588467e-02,  2.89883963e-02,  2.63956775e-02,
       -1.56601131e-02,  1.60097204e-02,  2.36427838e-01,  3.80482094e-02,
        1.03165251e-02, -4.46006227e-02,  1.02005044e-01, -1.57611297e-01,
       -1.89943266e-01, -3.98031547e-02, -3.21851878e-01,  6.84485399e-03,
       -7.76602744e-02,  2.91888047e-02,  1.23066006e-02,  1.17758418e-01,
        2.31035306e+10, -2.31035306e+10])
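
The last two coefficients above are huge and of opposite sign. That pattern is a typical signature of two identical (or nearly identical) columns, and `rv1` and `rv2` are identical in every row shown earlier. With duplicated columns, the individual linear-regression weights are not uniquely determined; only their sum is. The sketch below illustrates this on synthetic data (the column names are reused purely for illustration) and shows that Ridge splits the weight evenly between the duplicates.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200

# One useful feature and one random variable duplicated into two columns.
x0 = rng.normal(size=n)
rv = rng.normal(size=n)
X_dup = pd.DataFrame({"x0": x0, "rv1": rv, "rv2": rv})

# The target ignores the random variable entirely.
y_dup = 2.0 * x0 + rng.normal(0, 0.01, size=n)

# A quick check for exact duplicates before fitting.
print(X_dup["rv1"].equals(X_dup["rv2"]))

ols = LinearRegression().fit(X_dup, y_dup)
ridge = Ridge(alpha=0.4).fit(X_dup, y_dup)

# Only the combined weight of the duplicated pair is meaningful.
ols_pair_sum = ols.coef_[1] + ols.coef_[2]
print(ols_pair_sum, ridge.coef_)
```

Dropping one of the duplicated columns before fitting is a simple way to avoid the problem.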

### The End!


