MACHINE LEARNING REGRESSION PREDICTING ENERGY EFFICIENCY OF BUILDINGS

Dataset Description

https://archive.ics.uci.edu/ml/machine-learning-databases/00374/energydata_complete.csv

The dataset used for the remainder of this quiz is the Appliances Energy Prediction data. It is sampled at 10-minute intervals over about 4.5 months. The house temperature and humidity conditions were monitored with a ZigBee wireless sensor network. Each wireless node transmitted the temperature and humidity conditions roughly every 3.3 minutes, and the wireless data was then averaged over 10-minute periods. The energy data was logged every 10 minutes with m-bus energy meters. Weather data from the nearest airport weather station (Chievres Airport, Belgium) was downloaded from a public data set from Reliable Prognosis (rp5.ru) and merged with the experimental data sets using the date and time column. Two random variables have been included in the data set for testing the regression models and for filtering out non-predictive attributes (parameters). The attribute information is listed below.

Attribute Information:

date, time in year-month-day hour:minute:second format

Appliances, energy use in Wh

lights, energy use of light fixtures in the house in Wh

T1, Temperature in kitchen area, in Celsius

RH_1, Humidity in kitchen area, in %

T2, Temperature in living room area, in Celsius

RH_2, Humidity in living room area, in %

T3, Temperature in laundry room area, in Celsius

RH_3, Humidity in laundry room area, in %

T4, Temperature in office room, in Celsius

RH_4, Humidity in office room, in %

T5, Temperature in bathroom, in Celsius

RH_5, Humidity in bathroom, in %

T6, Temperature outside the building (north side), in Celsius

RH_6, Humidity outside the building (north side), in %

T7, Temperature in ironing room, in Celsius

RH_7, Humidity in ironing room, in %

T8, Temperature in teenager room 2, in Celsius

RH_8, Humidity in teenager room 2, in %

T9, Temperature in parents room, in Celsius

RH_9, Humidity in parents room, in %

To, Temperature outside (from Chievres weather station), in Celsius

Pressure (from Chievres weather station), in mm Hg

RH_out, Humidity outside (from Chievres weather station), in %

Wind speed (from Chievres weather station), in m/s

Visibility (from Chievres weather station), in km

Tdewpoint (from Chievres weather station), in °C

rv1, Random variable 1, nondimensional

rv2, Random variable 2, nondimensional
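
The role of the two random variables can be illustrated with a small sketch on synthetic data. It is not run on the quiz dataset: the column names `signal` and `noise_col` and the true weight of 3.0 are assumptions made only for this illustration. A feature unrelated to the target should receive a weight close to zero, which is what makes rv1 and rv2 useful as a baseline for spotting non-predictive attributes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): one informative feature and one purely random column.
rng = np.random.default_rng(0)
n = 1000
signal = rng.normal(size=n)        # hypothetical informative feature
noise_col = rng.normal(size=n)     # plays the role of rv1 / rv2
y = 3.0 * signal + rng.normal(scale=0.1, size=n)

X = np.column_stack([signal, noise_col])
model = LinearRegression().fit(X, y)

signal_weight, noise_weight = model.coef_
print(f"signal weight: {signal_weight:.3f}")
print(f"random-column weight: {noise_weight:.3f}")
```

Any real feature whose weight is no larger than the weight of the random column is a candidate for removal.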


In [61]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

In [62]:
df = pd.read_csv('energydata_complete.csv')

In [64]:
#12. Linear model of the relationship between the temperature in the living room in Celsius (x = T2) and the temperature outside the building (y = T6).

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

x = df[['T2']]
y = df[['T6']]

linear_model = LinearRegression()
linear_model.fit(x, y)

predictions = linear_model.predict(x)
#r Squared value
round(r2_score(y, predictions), 2)

0.64
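
For reference, the R squared value reported by r2_score is 1 - SS_res / SS_tot. The sketch below computes it by hand on four made-up points; the helper name `r_squared` and the sample values are assumptions for illustration, not part of the quiz data.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Made-up values (assumption), chosen so the arithmetic is easy to follow.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

r2 = r_squared(y_true, y_pred)
print(round(r2, 2))  # SS_res = 0.10, SS_tot = 5.0, so R^2 = 0.98
```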

In [74]:
#13. Normalize the dataset using the MinMaxScaler 
#Calculating the Mean Absolute Error

from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error

X = df[df.columns.drop(["date", "lights", "Appliances"])]
y = df["Appliances"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

multiple_linear_model = LinearRegression()
multiple_linear_model.fit(X_train, y_train)

predicted_values = multiple_linear_model.predict(X_test)

#MAE (mean_absolute_error is already imported at the top of this cell)
mae = mean_absolute_error(y_test, predicted_values)
round(mae, 2)



53.64
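
The scaling step above fits the scaler on the training split only and reuses the training minimum and maximum on the test split. A minimal NumPy sketch of that idea follows; the helper names `fit_min_max` and `apply_min_max` and the toy arrays are assumptions, and the sketch assumes no column is constant (unlike MinMaxScaler, it does not guard against a zero range).

```python
import numpy as np

def fit_min_max(train):
    """Learn per-column minimum and maximum from the training data only."""
    return train.min(axis=0), train.max(axis=0)

def apply_min_max(data, col_min, col_max):
    """Scale with the training statistics: (x - min) / (max - min)."""
    return (data - col_min) / (col_max - col_min)

# Toy data (assumption): two features, three training rows, one test row.
train = np.array([[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]])
test = np.array([[15.0, 700.0]])

col_min, col_max = fit_min_max(train)
train_scaled = apply_min_max(train, col_min, col_max)
test_scaled = apply_min_max(test, col_min, col_max)

print(train_scaled)
print(test_scaled)  # the second value exceeds 1 because 700 lies above the training maximum
```

Test values can fall outside [0, 1]; that is expected when the test split contains values the training split never saw.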

In [83]:
#14. Residual Sum of Squares 

rss = np.sum(np.square(y_test - predicted_values))
round(rss, 2) 

51918501.21

In [110]:
#15. Calculating the Root Mean Squared Error
#this value was 93.64 earlier
rmse = np.sqrt(mean_squared_error(y_test, predicted_values))
round(rmse, 3) 

93.64
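
The RSS from #14 and the RMSE from #15 describe the same residuals: RMSE = sqrt(RSS / n), where n is the number of test rows. The check below assumes the CSV has 19735 rows (a figure not stated in this notebook) and relies on train_test_split rounding the test size up.

```python
import math

n_rows = 19735                            # assumed row count of energydata_complete.csv
test_size = 0.30
n_test = math.ceil(test_size * n_rows)    # train_test_split rounds the test size up

rss = 51918501.21                         # residual sum of squares reported in #14
mse = rss / n_test                        # mean squared error
rmse = math.sqrt(mse)                     # root mean squared error

print(n_test)
print(round(rmse, 2))                     # agrees with the 93.64 reported in #15
```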

In [111]:
#16. Coefficient of determination

from sklearn.metrics import r2_score
#Store the result under a new name so the imported r2_score function is not overwritten
r2 = r2_score(y_test, predicted_values)
round(r2, 2)

0.15

In [112]:
#17. Obtain the feature weights from your linear model above. Which features have the lowest and highest weights respectively?

weights_df = pd.DataFrame([X.columns, multiple_linear_model.coef_]).transpose()
weights_df.columns = ["features", "weights"]
weights_df.sort_values(by="weights", ascending=False)





     features     weights
1    RH_1         495.526
4    T3           310.971
10   T6           252.717
23   Tdewpoint    125.431
14   T8           109.135
5    RH_3         102.772
11   RH_6         40.7121
21   Windspeed    31.2259
6    T4           31.0097
7    RH_4         27.3118
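
Instead of reading the extremes off a sorted table, the highest and lowest weights can be pulled out directly with idxmax and idxmin. The sketch below uses only four of the rows displayed above, so its "lowest" is the lowest of that subset, not of the full model; with the full coefficient vector the same calls answer the question.

```python
import pandas as pd

# Four of the weights displayed above, keyed by feature name.
# With the fitted model this would be pd.Series(model.coef_, index=feature_names).
weights = pd.Series({"RH_1": 495.526, "T3": 310.971, "T6": 252.717, "RH_4": 27.3118})

highest = weights.idxmax()            # feature with the largest weight
lowest_of_subset = weights.idxmin()   # smallest weight among these four rows only

print(highest, lowest_of_subset)
```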


In [113]:
#18. Train a ridge regression model with an alpha value of 0.4. Is there any change to the root mean squared error (RMSE) when evaluated on the test set?
#Ans = No
from sklearn.linear_model import Ridge

ridge_model = Ridge(alpha=0.4)
ridge_model.fit(X_train, y_train)

test_predictions = ridge_model.predict(X_test)

#Root Mean Squared Error
#np.sqrt keeps this working on scikit-learn versions that no longer accept the squared argument
round(np.sqrt(mean_squared_error(y_test, test_predictions)), 3)

93.66
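
Why a small alpha barely moves the RMSE can be seen on synthetic data: ridge adds an L2 penalty on the weights, and with alpha = 0.4 that penalty is small next to the data term. The data, true weights, and noise level below are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 5 features already in [0, 1], as after min-max scaling.
rng = np.random.default_rng(42)
X = rng.uniform(size=(500, 5))
true_weights = np.array([2.0, -1.0, 0.5, 3.0, 0.0])
y = X @ true_weights + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

linear = LinearRegression().fit(X_train, y_train)
ridge = Ridge(alpha=0.4).fit(X_train, y_train)

rmse_linear = np.sqrt(mean_squared_error(y_test, linear.predict(X_test)))
rmse_ridge = np.sqrt(mean_squared_error(y_test, ridge.predict(X_test)))

print(f"linear RMSE: {rmse_linear:.3f}")
print(f"ridge RMSE:  {rmse_ridge:.3f}")
```

Comparing the two values at the quiz's rounding precision shows whether the change counts as a difference.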

In [114]:
#19. Train a lasso regression model with an alpha value of 0.001 
#How many of the features have non-zero feature weights? Ans = 26
from sklearn.linear_model import Lasso

lasso_model = Lasso(alpha=0.001)
lasso_model.fit(X_train, y_train)

test_predictions = lasso_model.predict(X_test)

tmptdf = pd.DataFrame([X.columns, lasso_model.coef_]).transpose()
tmptdf.columns = ["features", "weights"]
tmptdf.sort_values(by="weights", ascending=False)

     features     weights
1    RH_1         494.326
4    T3           310.2
10   T6           249.834
23   Tdewpoint    118.063
14   T8           108.88
5    RH_3         102.172
11   RH_6         40.091
21   Windspeed    31.388
6    T4           30.3442
7    RH_4         26.5606
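
The number of non-zero weights can be counted directly from coef_ rather than read off the table. The sketch below uses synthetic data and a deliberately large alpha of 0.5 (both assumptions) so that the L1 penalty visibly zeroes out the uninformative features; on the quiz data the same count would be taken from lasso_model.coef_.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumption): 6 features, only the first two influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# A large alpha (assumption) makes the sparsity effect easy to see.
lasso = Lasso(alpha=0.5).fit(X, y)

n_nonzero = int(np.count_nonzero(lasso.coef_))   # features the L1 penalty kept
print(lasso.coef_.round(3))
print(n_nonzero)
```

With a very small alpha such as 0.001 the penalty is weak, so few or no weights are driven exactly to zero.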


In [116]:
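#20. Root Mean Squared Error
#Note: predicted_values holds the linear model's test predictions from #13, so this reproduces the value from #15;
#pass test_predictions instead to score the lasso model.
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test, predicted_values))
round(rmse, 3)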


93.64