# Predicting Energy Efficiency of Buildings

<img src="regg.jfif" alt="Appliances usage graph" style="width:700px;height:40px">

__Context:__

For the graded assessment of the Machine Learning: Regression (Predicting Energy Efficiency of Buildings) stage B quiz, you are expected to use the provided dataset. I downloaded the dataset [HERE](https://drive.google.com/file/d/1Eru_UHVc3WLHVveC9Q8K9QUxlzYeHt18/view).




__Dataset Description__ 

The dataset is the Appliances Energy Prediction data. It was sampled every 10 minutes over about 4.5 months. The temperature and humidity conditions in the house were monitored with a ZigBee wireless sensor network: each wireless node transmitted temperature and humidity readings approximately every 3.3 minutes, and these readings were then averaged over 10-minute periods. The energy data was logged every 10 minutes with m-bus energy meters. Weather data from the nearest airport weather station (Chievres Airport, Belgium) was downloaded from a public dataset from Reliable Prognosis (rp5.ru) and merged with the experimental data using the date and time column. Two random variables were included in the dataset for testing the regression models and for filtering out non-predictive attributes (parameters). The attribute information is listed below.
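The 10-minute averaging described above can be illustrated with pandas' `resample`. This is a minimal sketch on made-up readings spaced 200 seconds apart (roughly 3.3 minutes); the timestamps, the values and the `T1` label are hypothetical, and the original data collectors' preprocessing code is not part of this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings every 200 s (about 3.3 min) for one hour.
timestamps = pd.date_range("2016-01-11 17:00:00", periods=18, freq=pd.Timedelta(seconds=200))
readings = pd.Series(np.linspace(19.0, 20.7, 18), index=timestamps, name="T1")

# Average every reading that falls in each 10-minute window;
# each window is labelled by its start time.
ten_min = readings.resample("10min").mean()
print(ten_min)
```

Each 10-minute window here collects three readings, so the first averaged value is the mean of 19.0, 19.1 and 19.2.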

Attribute Information:

__date,__ time in year-month-day hour:minute:second format

__Appliances,__ energy use in Wh

__lights,__ energy use of light fixtures in the house in Wh

__T1,__ Temperature in kitchen area, in Celsius

__RH_1,__ Humidity in kitchen area, in %

__T2,__ Temperature in living room area, in Celsius

__RH_2,__ Humidity in living room area, in %

__T3,__ Temperature in laundry room area, in Celsius

__RH_3,__ Humidity in laundry room area, in %

__T4,__ Temperature in office room, in Celsius

__RH_4,__ Humidity in office room, in %

__T5,__ Temperature in bathroom, in Celsius

__RH_5,__ Humidity in bathroom, in %

__T6,__ Temperature outside the building (north side), in Celsius

__RH_6,__ Humidity outside the building (north side), in %

__T7,__ Temperature in ironing room, in Celsius

__RH_7,__ Humidity in ironing room, in %

__T8,__ Temperature in teenager room 2, in Celsius

__RH_8,__ Humidity in teenager room 2, in %

__T9,__ Temperature in parents room, in Celsius

__RH_9,__ Humidity in parents room, in %

__T_out,__ Temperature outside (from Chievres weather station), in Celsius

__Press_mm_hg,__ Pressure (from Chievres weather station), in mm Hg

__RH_out,__ Humidity outside (from Chievres weather station), in %

__Windspeed,__ Wind speed (from Chievres weather station), in m/s

__Visibility__ (from Chievres weather station), in km

__Tdewpoint__ (from Chievres weather station), in °C

__rv1,__ Random variable 1, nondimensional

__rv2,__ Random variable 2, nondimensional
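One way to see how random variables help filter out non-predictive attributes is to compare each column's correlation with the target. The sketch below uses synthetic data (the `signal`, `noise` and `target` columns are hypothetical): an informative feature correlates strongly with the target, while the random columns stay near zero. It also checks whether the two random columns are exact copies, since in the `head()` and `tail()` output of this notebook `rv1` and `rv2` hold the same values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Hypothetical data: the target depends on `signal` only.
signal = rng.normal(size=n)
noise = rng.uniform(0, 50, size=n)
demo = pd.DataFrame({
    "signal": signal,
    "rv1": noise,
    "rv2": noise.copy(),  # duplicate column, mirroring identical rv1/rv2
    "target": 3.0 * signal + rng.normal(scale=0.5, size=n),
})

# Pearson correlation of each candidate feature with the target
corr_with_target = demo.corr()["target"].drop("target")
print(corr_with_target.round(3))

# True when the two random columns are exact copies
print(demo["rv1"].equals(demo["rv2"]))
```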



__Objective__<br>
Evaluate linear regression models (ordinary least squares, Ridge and Lasso) on the dataset

## Import necessary libraries

In [36]:
#Import Libraries to read and manipulate data

import pandas as pd
import numpy as np

# Libraries for data visualization
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns



# Import library for preparing data
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

#linear models
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso

#regression model evaluation metric
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error

import warnings
warnings.filterwarnings("ignore")

## Read in the dataset and overview the data

In [37]:
df = pd.read_csv("energydata_complete.csv")

In [38]:
df.head(2)

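|   | date | Appliances | lights | T1 | RH_1 | T2 | RH_2 | T3 | RH_3 | T4 | ... | T9 | RH_9 | T_out | Press_mm_hg | RH_out | Windspeed | Visibility | Tdewpoint | rv1 | rv2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016-01-11 17:00:00 | 60 | 30 | 19.89 | 47.596667 | 19.2 | 44.79 | 19.79 | 44.73 | 19.0 | ... | 17.033333 | 45.53 | 6.6 | 733.5 | 92.0 | 7.0 | 63.0 | 5.3 | 13.275433 | 13.275433 |
| 1 | 2016-01-11 17:10:00 | 60 | 30 | 19.89 | 46.693333 | 19.2 | 44.7225 | 19.79 | 44.79 | 19.0 | ... | 17.066667 | 45.56 | 6.483333 | 733.6 | 92.0 | 6.666667 | 59.166667 | 5.2 | 18.606195 | 18.606195 |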


In [39]:
df.tail(2)

|   | date | Appliances | lights | T1 | RH_1 | T2 | RH_2 | T3 | RH_3 | T4 | ... | T9 | RH_9 | T_out | Press_mm_hg | RH_out | Windspeed | Visibility | Tdewpoint | rv1 | rv2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19733 | 2016-05-27 17:50:00 | 420 | 10 | 25.5 | 46.99 | 25.414 | 43.036 | 26.89 | 41.29 | 24.7 | ... | 23.2 | 46.8175 | 22.333333 | 755.2 | 56.666667 | 3.833333 | 26.166667 | 13.233333 | 6.322784 | 6.322784 |
| 19734 | 2016-05-27 18:00:00 | 430 | 10 | 25.5 | 46.6 | 25.264286 | 42.971429 | 26.823333 | 41.156667 | 24.7 | ... | 23.2 | 46.845 | 22.2 | 755.2 | 57.0 | 4.0 | 27.0 | 13.2 | 34.118851 | 34.118851 |


In [40]:
# copy the data so that later changes to df1 cannot affect df
df1 = df.copy()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19735 entries, 0 to 19734
Data columns (total 29 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   date         19735 non-null  object 
 1   Appliances   19735 non-null  int64  
 2   lights       19735 non-null  int64  
 3   T1           19735 non-null  float64
 4   RH_1         19735 non-null  float64
 5   T2           19735 non-null  float64
 6   RH_2         19735 non-null  float64
 7   T3           19735 non-null  float64
 8   RH_3         19735 non-null  float64
 9   T4           19735 non-null  float64
 10  RH_4         19735 non-null  float64
 11  T5           19735 non-null  float64
 12  RH_5         19735 non-null  float64
 13  T6           19735 non-null  float64
 14  RH_6         19735 non-null  float64
 15  T7           19735 non-null  float64
 16  RH_7         19735 non-null  float64
 17  T8           19735 non-null  float64
 18  RH_8         19735 non-null  float64
 19  T9  

In [41]:
#summary statistics
df.describe().T

|   | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Appliances | 19735.0 | 97.694958 | 102.524891 | 10.0 | 50.0 | 60.0 | 100.0 | 1080.0 |
| lights | 19735.0 | 3.801875 | 7.935988 | 0.0 | 0.0 | 0.0 | 0.0 | 70.0 |
| T1 | 19735.0 | 21.686571 | 1.606066 | 16.79 | 20.76 | 21.6 | 22.6 | 26.26 |
| RH_1 | 19735.0 | 40.259739 | 3.979299 | 27.023333 | 37.333333 | 39.656667 | 43.066667 | 63.36 |
| T2 | 19735.0 | 20.341219 | 2.192974 | 16.1 | 18.79 | 20.0 | 21.5 | 29.856667 |
| RH_2 | 19735.0 | 40.42042 | 4.069813 | 20.463333 | 37.9 | 40.5 | 43.26 | 56.026667 |
| T3 | 19735.0 | 22.267611 | 2.006111 | 17.2 | 20.79 | 22.1 | 23.29 | 29.236 |
| RH_3 | 19735.0 | 39.2425 | 3.254576 | 28.766667 | 36.9 | 38.53 | 41.76 | 50.163333 |
| T4 | 19735.0 | 20.855335 | 2.042884 | 15.1 | 19.53 | 20.666667 | 22.1 | 26.2 |
| RH_4 | 19735.0 | 39.026904 | 4.341321 | 27.66 | 35.53 | 38.4 | 42.156667 | 51.09 |


###### A univariate linear regression model

In [42]:
# predictor: living room temperature (T2); target: outside temperature, north side (T6)
x = df1['T2']
y = df1['T6']

In [43]:
# scikit-learn expects 2-D inputs, so turn each Series into a one-column DataFrame
x = pd.DataFrame(x)
y = pd.DataFrame(y)

In [44]:
xu_train, xu_test, yu_train, yu_test = train_test_split(x,y,test_size=0.3,random_state=42)

xu_train.shape,xu_test.shape,yu_train.shape,yu_test.shape

((13814, 1), (5921, 1), (13814, 1), (5921, 1))
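The split sizes printed above follow from how scikit-learn rounds a fractional `test_size`: the test set gets `ceil(0.3 * 19735) = ceil(5920.5) = 5921` rows and the training set gets the remaining 13,814. A quick check on a plain index array (no DataFrame is needed to reproduce the counts):

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 19735  # number of rows reported by df.info()
indices = np.arange(n_rows)

train_idx, test_idx = train_test_split(indices, test_size=0.3, random_state=42)

# test rows = ceil(0.3 * 19735) = 5921; train rows = 19735 - 5921 = 13814
print(len(train_idx), len(test_idx))  # 13814 5921
```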

In [45]:
linear_model = LinearRegression()

# fit the model on the training dataset
linear_model.fit(xu_train,yu_train)


LinearRegression()

In [46]:
# obtain predictions
lrpredicted_values = linear_model.predict(xu_test)

#evaluating the model
# store the score under a new name so the r2_score function is not overwritten
r2 = r2_score(yu_test,lrpredicted_values)
round(r2,2)

0.64
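For reference, R² compares the model's squared error with that of always predicting the mean of the test targets: R² = 1 - SS_res / SS_tot. Below is a minimal hand-written version, checked against `sklearn.metrics.r2_score` on toy numbers rather than values from this dataset; the helper name `r_squared` is hypothetical.

```python
import numpy as np
from sklearn.metrics import r2_score

def r_squared(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

# Toy values: mean is 6, SS_tot = 20, SS_res = 1, so R^2 = 1 - 1/20
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]
print(r_squared(y_true, y_pred), r2_score(y_true, y_pred))
```

A score of 0.64 therefore means the model's squared error is 36% of what predicting the mean T6 would give.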

###### A multivariate linear regression model

In [47]:
# Dropping lights and date columns
df1 = df1.drop(columns=['lights','date'])


In [48]:
scaler = MinMaxScaler()
df1_scaled = pd.DataFrame(scaler.fit_transform(df1),columns=df1.columns)
df1_scaled.sample(2,random_state = 42)

|   | Appliances | T1 | RH_1 | T2 | RH_2 | T3 | RH_3 | T4 | RH_4 | T5 | ... | T9 | RH_9 | T_out | Press_mm_hg | RH_out | Windspeed | Visibility | Tdewpoint | rv1 | rv2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8980 | 0.028037 | 0.432946 | 0.230529 | 0.120669 | 0.525822 | 0.25673 | 0.380122 | 0.41982 | 0.279129 | 0.31247 | ... | 0.457856 | 0.408251 | 0.217578 | 0.92093 | 0.846491 | 0.166667 | 0.953846 | 0.298643 | 0.512428 | 0.512428 |
| 2754 | 0.074766 | 0.538543 | 0.717641 | 0.377272 | 0.700066 | 0.368339 | 0.956224 | 0.489489 | 0.567933 | 0.224346 | ... | 0.145682 | 0.622241 | 0.247588 | 0.588372 | 0.868421 | 0.214286 | 0.4 | 0.352941 | 0.469466 | 0.469466 |


In [49]:
# separating the predictors and target variable
df1_features = df1_scaled.drop(columns=['Appliances'])
appliances_target = df1_scaled['Appliances']

In [50]:
#split into train test data

x_train, x_test, y_train, y_test = train_test_split(df1_features,
                                                    appliances_target, 
                                                    test_size=0.3,random_state=42)
x_train.shape,x_test.shape,y_train.shape,y_test.shape                                                   

((13814, 26), (5921, 26), (13814,), (5921,))
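In In [48] the `MinMaxScaler` is fitted on all rows before the train/test split, so the minimum and maximum of the test rows also shape the scaled training features, a mild form of data leakage. A common alternative, sketched here on hypothetical random data rather than the energy dataset, is to split first and fit the scaler on the training rows only; applying it to this notebook would likely shift the reported scores slightly. The last lines confirm that the transform is x' = (x - min) / (max - min), computed per column.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
X = rng.uniform(10, 30, size=(100, 3))  # hypothetical feature matrix
y = rng.uniform(0, 1, size=100)         # hypothetical target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Learn min and max from the training rows only...
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
# ...and reuse them on the test rows (these may fall slightly outside [0, 1]).
X_test_scaled = scaler.transform(X_test)

# The same transform written out by hand, column by column
col_min, col_max = X_train.min(axis=0), X_train.max(axis=0)
manual = (X_train - col_min) / (col_max - col_min)
print(np.allclose(manual, X_train_scaled))  # True
```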

### Linear Regression Model

In [51]:
linear_reg = LinearRegression()

# fit the model on the training dataset
linear_reg.fit(x_train,y_train)

# obtain predictions
predicted_values = linear_reg.predict(x_test)

In [52]:
#Mean Absolute Error

mae = mean_absolute_error(y_test,predicted_values)
round(mae,2)

0.05

In [53]:
#Residual Sum of Square

rss = np.sum(np.square(y_test - predicted_values))
round(rss,2)

45.35
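Since MSE is the RSS divided by the number of test rows, and RMSE is the square root of MSE, the values computed in the following cells can be cross-checked by hand. This uses the rounded RSS printed above, so agreement only holds to the displayed precision.

```python
n_test = 5921  # rows in the test split
rss = 45.35    # residual sum of squares reported above (rounded)

mse = rss / n_test  # MSE = RSS / n
rmse = mse ** 0.5   # RMSE = sqrt(MSE)
print(round(mse, 4), round(rmse, 3))  # 0.0077 0.088
```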

In [54]:
#Mean Square Error

mse = mean_squared_error(y_test,predicted_values)
round(mse,4)

0.0077

In [55]:
#Root Mean Square Error (RMSE)

rmse = np.sqrt(mean_squared_error(y_test,predicted_values))
round(rmse,3)


0.088
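Because `Appliances` was min-max scaled together with the features, the MAE and RMSE above are fractions of the target's range rather than Wh. Scaling maps y to (y - min) / (max - min), so errors shrink by the factor (max - min). Using the range from `df.describe()` (min 10 Wh, max 1080 Wh), the scaled errors can be converted back; the inputs are the rounded scores, so the Wh figures are approximate.

```python
# Appliances range from df.describe(): min 10 Wh, max 1080 Wh
appliances_min, appliances_max = 10.0, 1080.0
scale = appliances_max - appliances_min  # 1070 Wh

# Differences between scaled values are the Wh differences divided by `scale`,
# so MAE and RMSE convert back by multiplying by it (MSE would need scale**2).
mae_scaled, rmse_scaled = 0.05, 0.088  # rounded values reported above
mae_wh = mae_scaled * scale
rmse_wh = rmse_scaled * scale
print(round(mae_wh, 1), round(rmse_wh, 1))  # 53.5 94.2
```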

In [57]:
#R-Squared or Coefficient of Determination
from sklearn.metrics import r2_score

mr2_score = r2_score(y_test,predicted_values)
round(mr2_score,2)

0.15

In [58]:
def get_weights_df(model,feat,col_name):
    #returns the weight of every feature
    weights = pd.Series(model.coef_,feat.columns).sort_values(ascending = False)
    
    # .reset_index() renumbers the dataframe from 0
    weights_df = pd.DataFrame(weights).reset_index()
    
    # assign the column names 'Features' and col_name from the function
    weights_df.columns = ['Features', col_name]
    
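    # weights are left unrounded on purpose: rounding to 3 decimals could
    # turn tiny non-zero Lasso weights into 0.0 and change the count below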
    return weights_df

In [59]:
linear_model_weights = get_weights_df(linear_reg,x_train,"Linear_regression")

In [60]:
linear_model_weights

|   | Features | Linear_regression |
|---|---|---|
| 0 | RH_1 | 0.553547 |
| 1 | T3 | 0.290627 |
| 2 | T6 | 0.236425 |
| 3 | Tdewpoint | 0.117758 |
| 4 | T8 | 0.101995 |
| 5 | RH_3 | 0.096048 |
| 6 | RH_6 | 0.038049 |
| 7 | Windspeed | 0.029183 |
| 8 | T4 | 0.028981 |
| 9 | RH_4 | 0.026386 |


### Ridge regression model

In [61]:
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso

In [62]:
ridge_reg = Ridge(alpha=0.4)
ridge_reg.fit(x_train,y_train)

Ridge(alpha=0.4)

In [63]:
rgpredicted_values = ridge_reg.predict(x_test)

In [64]:
#RMSE

rmse = np.sqrt(mean_squared_error(y_test,rgpredicted_values))
round(rmse,3)


0.088
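Ridge adds an L2 penalty to least squares; scikit-learn's `Ridge` minimises ||y - Xw||² + alpha·||w||². Without an intercept this has the closed form w = (XᵀX + alpha·I)⁻¹Xᵀy, which the sketch below compares with scikit-learn on small synthetic data (the features and true coefficients are made up). With 13,814 training rows, XᵀX is large relative to alpha = 0.4, which is likely why the Ridge RMSE matches the plain linear regression here.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # hypothetical features
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
alpha = 0.4

# Closed-form ridge solution (no intercept): solve (X^T X + alpha*I) w = X^T y
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# scikit-learn fit of the same objective
ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
print(np.allclose(w_closed, ridge.coef_))  # True
```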

### Lasso Regression model

In [65]:
lasso_reg = Lasso(alpha=0.001)
lasso_reg.fit(x_train,y_train)

Lasso(alpha=0.001)

In [66]:
#get the weights of the lasso model coefficients

lasso_weights_df = get_weights_df(lasso_reg,x_train,"Lasso_Weights")

In [67]:
lasso_weights_df

|   | Features | Lasso_Weights |
|---|---|---|
| 0 | RH_1 | 0.01788 |
| 1 | Windspeed | 0.002912 |
| 2 | T1 | 0.0 |
| 3 | T7 | -0.0 |
| 4 | rv1 | -0.0 |
| 5 | Tdewpoint | 0.0 |
| 6 | Visibility | 0.0 |
| 7 | Press_mm_hg | -0.0 |
| 8 | T_out | 0.0 |
| 9 | RH_9 | -0.0 |


The model's weights are sorted from highest to lowest; only the first rows are reproduced above.

In [68]:
#To count the Lasso model weights that are not 0

lasso_weights_df[lasso_weights_df['Lasso_Weights'] != 0].count()

Features         4
Lasso_Weights    4
dtype: int64
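Lasso instead uses an L1 penalty: scikit-learn's `Lasso` minimises (1 / (2n))·||y - Xw||² + alpha·||w||₁, and the L1 term can set weights to exactly zero, which is why only 4 of the 26 weights survive at alpha = 0.001. The sketch below, on synthetic data with hypothetical coefficients, shows the number of non-zero weights falling as alpha grows. `np.count_nonzero(lasso_reg.coef_)` is a more direct way to get the count computed above, since `-0.0` compares equal to zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 5))  # hypothetical features, already in [0, 1]
# Only the first two features matter; the other three are pure noise.
y = 0.6 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(scale=0.05, size=200)

nonzero_counts = {}
for alpha in [0.0001, 0.001, 0.01]:
    model = Lasso(alpha=alpha).fit(X, y)
    nonzero_counts[alpha] = np.count_nonzero(model.coef_)
    print(f"alpha={alpha}: {nonzero_counts[alpha]} non-zero weights, coef={model.coef_.round(3)}")
```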

In [69]:
lasso_predicted_values = lasso_reg.predict(x_test)

In [70]:
#RMSE

rmse = np.sqrt(mean_squared_error(y_test,lasso_predicted_values))
round(rmse,3)

0.094