# Appliances Energy Prediction data

> 

- author: Victor Omondi
- toc: true
- comments: true
- categories: [energy, machine-learning]
- image: images/aepd-shield.png

# Libraries

In [33]:
import warnings
warnings.filterwarnings("ignore", message="numpy.ufunc size changed")

import numpy as np
import pandas as pd
pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", None)

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.style.use("ggplot")

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import (LinearRegression, 
                                  Ridge, 
                                  Lasso)
from sklearn.metrics import (r2_score, 
                             mean_absolute_error, 
                             mean_squared_error)

# The Dataset

The dataset is the [Appliances Energy Prediction data](https://archive.ics.uci.edu/ml/machine-learning-databases/00374/). It contains readings at 10-minute intervals over about 4.5 months. The house temperature and humidity conditions were monitored with a ZigBee wireless sensor network. Each wireless node transmitted the temperature and humidity conditions about every 3.3 minutes, and the wireless data was then averaged over 10-minute periods. The energy data was logged every 10 minutes with m-bus energy meters. Weather data from the nearest airport weather station (Chievres Airport, Belgium) was downloaded from a public data set from Reliable Prognosis (rp5.ru) and merged with the experimental data sets using the date and time column. Two random variables have been included in the data set for testing the regression models and for filtering out non-predictive attributes (parameters).
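The averaging and merging steps described above can be sketched with pandas. This is only an illustration under stated assumptions: the raw sensor readings and the weather table below are made-up stand-ins (the published CSV already contains the 10-minute averages), and the 200-second spacing approximates the "about every 3.3 minutes" transmission interval.

```python
import numpy as np
import pandas as pd

# Hypothetical raw readings: one temperature sample every 200 seconds (~3.3 min).
rng = np.random.default_rng(0)
raw_index = pd.date_range("2016-01-11 17:00:00", periods=18,
                          freq=pd.Timedelta(seconds=200))
raw = pd.DataFrame({"T1": 20 + rng.normal(0, 0.1, size=len(raw_index))},
                   index=raw_index)

# Average the raw samples into 10-minute periods.
sensor_10min = raw.resample("10min").mean()

# Hypothetical weather table that is already on the same 10-minute grid.
weather = pd.DataFrame(
    {"T_out": np.linspace(6.6, 6.1, num=len(sensor_10min))},
    index=sensor_10min.index,
)

# Merge both sources on the shared timestamp and expose it as a `date` column.
merged = sensor_10min.join(weather, how="inner").rename_axis("date").reset_index()
print(merged)
```

Each 10-minute bin here averages three raw samples, which mirrors how a 3.3-minute transmission rate maps onto the 10-minute rows of the dataset.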

In [3]:
energy = pd.read_csv("datasets/energydata_complete.csv")
energy.head()

Unnamed: 0,date,Appliances,lights,T1,RH_1,T2,RH_2,T3,RH_3,T4,RH_4,T5,RH_5,T6,RH_6,T7,RH_7,T8,RH_8,T9,RH_9,T_out,Press_mm_hg,RH_out,Windspeed,Visibility,Tdewpoint,rv1,rv2
0,2016-01-11 17:00:00,60,30,19.89,47.596667,19.2,44.79,19.79,44.73,19.0,45.566667,17.166667,55.2,7.026667,84.256667,17.2,41.626667,18.2,48.9,17.033333,45.53,6.6,733.5,92.0,7.0,63.0,5.3,13.275433,13.275433
1,2016-01-11 17:10:00,60,30,19.89,46.693333,19.2,44.7225,19.79,44.79,19.0,45.9925,17.166667,55.2,6.833333,84.063333,17.2,41.56,18.2,48.863333,17.066667,45.56,6.483333,733.6,92.0,6.666667,59.166667,5.2,18.606195,18.606195
2,2016-01-11 17:20:00,50,30,19.89,46.3,19.2,44.626667,19.79,44.933333,18.926667,45.89,17.166667,55.09,6.56,83.156667,17.2,41.433333,18.2,48.73,17.0,45.5,6.366667,733.7,92.0,6.333333,55.333333,5.1,28.642668,28.642668
3,2016-01-11 17:30:00,50,40,19.89,46.066667,19.2,44.59,19.79,45.0,18.89,45.723333,17.166667,55.09,6.433333,83.423333,17.133333,41.29,18.1,48.59,17.0,45.4,6.25,733.8,92.0,6.0,51.5,5.0,45.410389,45.410389
4,2016-01-11 17:40:00,60,40,19.89,46.333333,19.2,44.53,19.79,45.0,18.89,45.53,17.2,55.09,6.366667,84.893333,17.2,41.23,18.1,48.59,17.0,45.4,6.133333,733.9,92.0,5.666667,47.666667,4.9,10.084097,10.084097


## Dataset Description

The attribute information can be seen below.

### Attribute Information:

|Attribute|Description|Units|
|---|---|---|
|date| time| year-month-day hour:minute:second|
|Appliances| energy use| in Wh|
|lights| energy use of light fixtures in the house| in Wh|
|T1| Temperature in kitchen area| in Celsius|
|RH_1| Humidity in kitchen area| in %|
|T2| Temperature in living room area| in Celsius|
|RH_2| Humidity in living room area| in %|
|T3| Temperature in laundry room area| in Celsius|
|RH_3| Humidity in laundry room area| in %|
|T4| Temperature in office room| in Celsius|
|RH_4| Humidity in office room| in %|
|T5| Temperature in bathroom| in Celsius|
|RH_5| Humidity in bathroom| in %|
|T6| Temperature outside the building (north side)| in Celsius|
|RH_6| Humidity outside the building (north side)| in %|
|T7| Temperature in ironing room| in Celsius|
|RH_7| Humidity in ironing room| in %|
|T8| Temperature in teenager room 2| in Celsius|
|RH_8| Humidity in teenager room 2| in %|
|T9| Temperature in parents room| in Celsius|
|RH_9| Humidity in parents room| in %|
|T_out| Temperature outside (from Chievres weather station)| in Celsius|
|Press_mm_hg| Pressure (from Chievres weather station)| in mm Hg|
|RH_out| Humidity outside (from Chievres weather station)| in %|
|Windspeed| Wind speed (from Chievres weather station)| in m/s|
|Visibility| Visibility (from Chievres weather station)| in km|
|Tdewpoint| Dew point temperature (from Chievres weather station)| in °C|
|rv1| Random variable 1| nondimensional|
|rv2| Random variable 2| nondimensional|

In [4]:
energy.describe()

Unnamed: 0,Appliances,lights,T1,RH_1,T2,RH_2,T3,RH_3,T4,RH_4,T5,RH_5,T6,RH_6,T7,RH_7,T8,RH_8,T9,RH_9,T_out,Press_mm_hg,RH_out,Windspeed,Visibility,Tdewpoint,rv1,rv2
count,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0
mean,97.694958,3.801875,21.686571,40.259739,20.341219,40.42042,22.267611,39.2425,20.855335,39.026904,19.592106,50.949283,7.910939,54.609083,20.267106,35.3882,22.029107,42.936165,19.485828,41.552401,7.411665,755.522602,79.750418,4.039752,38.330834,3.760707,24.988033,24.988033
std,102.524891,7.935988,1.606066,3.979299,2.192974,4.069813,2.006111,3.254576,2.042884,4.341321,1.844623,9.022034,6.090347,31.149806,2.109993,5.114208,1.956162,5.224361,2.014712,4.151497,5.317409,7.399441,14.901088,2.451221,11.794719,4.194648,14.496634,14.496634
min,10.0,0.0,16.79,27.023333,16.1,20.463333,17.2,28.766667,15.1,27.66,15.33,29.815,-6.065,1.0,15.39,23.2,16.306667,29.6,14.89,29.166667,-5.0,729.3,24.0,0.0,1.0,-6.6,0.005322,0.005322
25%,50.0,0.0,20.76,37.333333,18.79,37.9,20.79,36.9,19.53,35.53,18.2775,45.4,3.626667,30.025,18.7,31.5,20.79,39.066667,18.0,38.5,3.666667,750.933333,70.333333,2.0,29.0,0.9,12.497889,12.497889
50%,60.0,0.0,21.6,39.656667,20.0,40.5,22.1,38.53,20.666667,38.4,19.39,49.09,7.3,55.29,20.033333,34.863333,22.1,42.375,19.39,40.9,6.916667,756.1,83.666667,3.666667,40.0,3.433333,24.897653,24.897653
75%,100.0,0.0,22.6,43.066667,21.5,43.26,23.29,41.76,22.1,42.156667,20.619643,53.663333,11.256,83.226667,21.6,39.0,23.39,46.536,20.6,44.338095,10.408333,760.933333,91.666667,5.5,40.0,6.566667,37.583769,37.583769
max,1080.0,70.0,26.26,63.36,29.856667,56.026667,29.236,50.163333,26.2,51.09,25.795,96.321667,28.29,99.9,26.0,51.4,27.23,58.78,24.5,53.326667,26.1,772.3,100.0,14.0,66.0,15.5,49.99653,49.99653


In [5]:
energy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19735 entries, 0 to 19734
Data columns (total 29 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   date         19735 non-null  object 
 1   Appliances   19735 non-null  int64  
 2   lights       19735 non-null  int64  
 3   T1           19735 non-null  float64
 4   RH_1         19735 non-null  float64
 5   T2           19735 non-null  float64
 6   RH_2         19735 non-null  float64
 7   T3           19735 non-null  float64
 8   RH_3         19735 non-null  float64
 9   T4           19735 non-null  float64
 10  RH_4         19735 non-null  float64
 11  T5           19735 non-null  float64
 12  RH_5         19735 non-null  float64
 13  T6           19735 non-null  float64
 14  RH_6         19735 non-null  float64
 15  T7           19735 non-null  float64
 16  RH_7         19735 non-null  float64
 17  T8           19735 non-null  float64
 18  RH_8         19735 non-null  float64
 19  T9  

There are no missing values in the dataset.
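The same conclusion can be checked directly instead of reading it off the non-null counts. The sketch below uses a small made-up frame (named `demo`, a hypothetical stand-in for `energy`) with one deliberately missing value so that the check has something to find.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `energy`; one value is deliberately missing.
demo = pd.DataFrame({"T1": [19.89, 19.89, np.nan],
                     "RH_1": [47.6, 46.7, 46.3]})

# Count missing values per column, then in total.
missing_per_column = demo.isna().sum()
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print(f"Total missing values: {total_missing}")
```

Applied to the real `energy` frame, a total of zero would confirm the statement above.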

In [6]:
scaler = MinMaxScaler()
normalised_df = pd.DataFrame(scaler.fit_transform(energy.drop(columns=['date', 'lights'])), 
                             columns=energy.drop(columns=['date', 'lights']).columns)
features_df = normalised_df.drop(columns=['Appliances'])
energy_target = normalised_df.Appliances
X_train, X_test, y_train, y_test = train_test_split(features_df, energy_target, test_size=.3, random_state=42)
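`MinMaxScaler` rescales each column to the range [0, 1] using `(x - min) / (max - min)`. The cell above fits the scaler on every row before splitting, so the test rows influence the scaling. A common alternative, which is not what this notebook does, is to split first and fit the scaler on the training rows only. The sketch below shows both ideas on made-up data; the `demo` frame and its value ranges are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for two columns of the energy data.
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "T2": rng.uniform(16, 30, size=200),
    "T6": rng.uniform(-6, 28, size=200),
})

# Min-max scaling by hand agrees with MinMaxScaler.
manual = (demo - demo.min()) / (demo.max() - demo.min())
scaled = pd.DataFrame(MinMaxScaler().fit_transform(demo), columns=demo.columns)
print(np.allclose(manual.to_numpy(), scaled.to_numpy()))

# Alternative ordering: split first, then fit the scaler on the training rows only.
train, test = train_test_split(demo, test_size=0.3, random_state=42)
scaler = MinMaxScaler().fit(train)
train_scaled = pd.DataFrame(scaler.transform(train),
                            columns=demo.columns, index=train.index)
test_scaled = pd.DataFrame(scaler.transform(test),
                           columns=demo.columns, index=test.index)
```

With this ordering the scaled test values can fall slightly outside [0, 1], which is expected because the test minimum and maximum were never seen by the scaler.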

Using the normalised data, fit a linear model of the relationship between the temperature in the living room (x = T2) and the temperature outside the building (y = T6), then evaluate it on the test set.

In [15]:
lin_reg = LinearRegression()
lin_reg.fit(X_train[['T2']], X_train.T6)
T6_pred = lin_reg.predict(X_test[['T2']])
print(f'r^2 score: {round(r2_score(X_test.T6, T6_pred), 2)}')

r^2 score: 0.64


In [17]:
print(f'MAE: {round(mean_absolute_error(X_test.T6, T6_pred), 2)}')

MAE: 0.08


In [20]:
print(f'Residual Sum of Squares: {round(np.sum(np.square(X_test.T6 - T6_pred)), 2)}')

Residual Sum of Squares: 274.9


In [23]:
print(f'Root Mean Squared Error: {round(np.sqrt(mean_squared_error(X_test.T6, T6_pred)), 3)}')

Root Mean Squared Error: 0.215
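The four metrics reported above are closely related: the residual sum of squares (RSS) divided by the number of samples gives the mean squared error, its square root is the RMSE, and R² compares the RSS with the total sum of squares around the mean. The sketch below checks these identities against scikit-learn on a handful of made-up values (`y_true` and `y_hat` are hypothetical, not taken from the dataset).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical targets and predictions.
y_true = np.array([0.10, 0.25, 0.40, 0.55, 0.80])
y_hat = np.array([0.15, 0.20, 0.45, 0.50, 0.70])

residuals = y_true - y_hat
rss = np.sum(residuals ** 2)                  # residual sum of squares
tss = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
mse = rss / len(y_true)                       # mean squared error
rmse = np.sqrt(mse)                           # root mean squared error
mae = np.mean(np.abs(residuals))              # mean absolute error
r2 = 1 - rss / tss                            # coefficient of determination

# Each hand-computed value should agree with scikit-learn.
print(np.isclose(mse, mean_squared_error(y_true, y_hat)))
print(np.isclose(mae, mean_absolute_error(y_true, y_hat)))
print(np.isclose(r2, r2_score(y_true, y_hat)))
```

Because the MAE can never exceed the RMSE, comparing the two is a quick sanity check on reported results.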


T7             1.0
T1             1.0
T2             1.0
T9             1.0
RH_8           1.0
RH_out         1.0
Tdewpoint      1.0
Visibility     1.0
Windspeed      1.0
Press_mm_hg    1.0
T_out          1.0
RH_9           1.0
T8             1.0
RH_7           1.0
Appliances     1.0
RH_6           1.0
T6             1.0
RH_5           1.0
RH_4           1.0
RH_3           1.0
T3             1.0
RH_2           1.0
RH_1           1.0
rv1            1.0
rv2            1.0
T5             1.0
T4             1.0
dtype: float64

In [30]:
energy.drop(columns=['date', 'lights']).max().sort_values()

Windspeed        14.000000
Tdewpoint        15.500000
T9               24.500000
T5               25.795000
T7               26.000000
T_out            26.100000
T4               26.200000
T1               26.260000
T8               27.230000
T6               28.290000
T3               29.236000
T2               29.856667
rv1              49.996530
rv2              49.996530
RH_3             50.163333
RH_4             51.090000
RH_7             51.400000
RH_9             53.326667
RH_2             56.026667
RH_8             58.780000
RH_1             63.360000
Visibility       66.000000
RH_5             96.321667
RH_6             99.900000
RH_out          100.000000
Press_mm_hg     772.300000
Appliances     1080.000000
dtype: float64

In [31]:
energy.drop(columns=['date', 'lights']).min().sort_values()

Tdewpoint       -6.600000
T6              -6.065000
T_out           -5.000000
Windspeed        0.000000
rv2              0.005322
rv1              0.005322
Visibility       1.000000
RH_6             1.000000
Appliances      10.000000
T9              14.890000
T4              15.100000
T5              15.330000
T7              15.390000
T2              16.100000
T8              16.306667
T1              16.790000
T3              17.200000
RH_2            20.463333
RH_7            23.200000
RH_out          24.000000
RH_1            27.023333
RH_4            27.660000
RH_3            28.766667
RH_9            29.166667
RH_8            29.600000
RH_5            29.815000
Press_mm_hg    729.300000
dtype: float64

In [32]:
def get_weights_df(model, feat, col_name):
    """Return the model's weight for every feature, sorted in ascending order."""
    # Pair each coefficient with its feature name
    weights = pd.Series(model.coef_, feat.columns).sort_values()
    weights_df = pd.DataFrame(weights).reset_index()
    weights_df.columns = ['Features', col_name]
    # Weights are returned unrounded
    return weights_df

In [41]:
ridge_reg = Ridge(alpha=0.4)
ridge_reg.fit(X_train, y_train)

Ridge(alpha=0.4)

In [42]:
lasso_reg = Lasso(alpha=0.001)
lasso_reg.fit(X_train, y_train)

Lasso(alpha=0.001)

In [43]:
model = LinearRegression()
model.fit(X_train, y_train)

LinearRegression()

In [44]:
linear_model_weights = get_weights_df(model, X_train, 'Linear_Model_Weight')
ridge_weights_df = get_weights_df(ridge_reg, X_train, 'Ridge_Weight')
lasso_weights_df = get_weights_df(lasso_reg, X_train, 'Lasso_weight')

final_weights = pd.merge(linear_model_weights, ridge_weights_df, on='Features')
final_weights = pd.merge(final_weights, lasso_weights_df, on='Features')

In [45]:
final_weights.sort_values("Linear_Model_Weight", ascending=False)

| |Features|Linear_Model_Weight|Ridge_Weight|Lasso_weight|
|---|---|---|---|---|
|25|RH_1|0.553547|0.519525|0.01788|
|24|T3|0.290627|0.288087|0.0|
|23|T6|0.236425|0.217292|0.0|
|22|Tdewpoint|0.117758|0.083128|0.0|
|21|T8|0.101995|0.101028|0.0|
|20|RH_3|0.096048|0.095135|0.0|
|19|RH_6|0.038049|0.035519|-0.0|
|18|Windspeed|0.029183|0.030268|0.002912|
|17|T4|0.028981|0.027384|-0.0|
|16|RH_4|0.026386|0.024579|0.0|
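The weights above show the typical difference between the two penalties: ridge (L2) shrinks coefficients slightly, while lasso (L1) drives many of them to exactly zero. The sketch below reproduces that behaviour on made-up data where only two of six features matter. The data, the noise level, and the true coefficients are assumptions for illustration; the `alpha` values are the ones used in this notebook.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Hypothetical data: six features in [0, 1], only the first two drive the target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 6))
y = 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.01, size=500)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=0.4),
    "lasso": Lasso(alpha=0.001),
}

# Fit each model and collect its coefficients.
coefs = {name: model.fit(X, y).coef_ for name, model in models.items()}

# Count how many coefficients each model sets to exactly zero.
n_zero = {name: int(np.sum(c == 0)) for name, c in coefs.items()}

for name in models:
    print(name, np.round(coefs[name], 3), "zeros:", n_zero[name])
```

A feature that lasso zeroes out contributes nothing to the prediction, which is why lasso is often used as a rough feature-selection step.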


In [46]:
y_pred_lg = model.predict(X_test)
y_pred_r = ridge_reg.predict(X_test)
y_pred_l = lasso_reg.predict(X_test)

In [48]:
# RMSE of the ridge model on the test set
print(f'Root Mean Squared Error: {round(np.sqrt(mean_squared_error(y_test, y_pred_r)), 3)}')

Root Mean Squared Error: 0.088


In [51]:
# RMSE of the lasso model on the test set
print(f'Root Mean Squared Error: {round(np.sqrt(mean_squared_error(y_test, y_pred_l)), 3)}')

Root Mean Squared Error: 0.094
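The ridge and lasso scores are printed separately above, and the predictions of the plain linear model (`y_pred_lg`) are computed but not scored. One way to compare all three side by side is a small loop that collects the RMSE per model. The sketch below runs on made-up data, and the `rmse` helper and feature names are hypothetical, so the numbers it prints say nothing about the energy dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical data: four features in [0, 1], two of them informative.
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.uniform(0, 1, size=(400, 4)),
                 columns=["f1", "f2", "f3", "f4"])
y = 0.6 * X["f1"] + 0.3 * X["f2"] + rng.normal(0, 0.05, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)


def rmse(y_true, y_pred):
    """Root mean squared error (hypothetical helper)."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=0.4),
    "lasso": Lasso(alpha=0.001),
}

# Fit every model on the training split and score it on the test split.
scores = pd.Series(
    {name: rmse(y_te, model.fit(X_tr, y_tr).predict(X_te))
     for name, model in models.items()}
).round(3)
print(scores)
```

Keeping the scores in one `Series` makes it easy to sort them or add further metrics later.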
