# Appliances energy prediction
The data set contains readings at 10-minute intervals over about 4.5 months. Temperature and humidity in the house were monitored with a ZigBee wireless sensor network. Each wireless node transmitted temperature and humidity readings roughly every 3.3 minutes, and the readings were then averaged over 10-minute periods. Energy data was logged every 10 minutes with m-bus energy meters. Weather data from the nearest airport weather station (Chievres Airport, Belgium) was downloaded from a public data set from Reliable Prognosis (rp5.ru) and merged with the experimental data using the date and time column. Two random variables were included in the data set for testing the regression models and for filtering out non-predictive attributes (parameters).
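
The averaging and merging steps described above can be sketched with pandas. This is only an illustration on synthetic values: the `raw` and `weather` tables and their numbers are hypothetical, and the original authors' preprocessing code is not part of this notebook.

```python
import pandas as pd

# Hypothetical raw sensor readings, one roughly every 3.3 minutes (synthetic values).
raw = pd.DataFrame(
    {"T1": [19.0, 20.0, 21.0, 22.0, 23.0, 24.0]},
    index=pd.to_datetime([
        "2016-01-11 17:00:00", "2016-01-11 17:03:20", "2016-01-11 17:06:40",
        "2016-01-11 17:10:00", "2016-01-11 17:13:20", "2016-01-11 17:16:40",
    ]),
)

# Average the readings over 10-minute periods and expose the timestamp as a "date" column.
sensors_10min = raw.resample("10min").mean().rename_axis("date").reset_index()

# Hypothetical weather table that is already on a 10-minute grid (synthetic values).
weather = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-11 17:00:00", "2016-01-11 17:10:00"]),
    "T_out": [6.6, 6.48],
})

# Merge the two tables on the shared date and time column.
merged = sensors_10min.merge(weather, on="date", how="inner")
print(merged)
```

An inner merge keeps only timestamps present in both tables; whether the original study used an inner join is an assumption here.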

In [225]:
#import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

### load dataset

In [226]:
data=pd.read_csv("energydata_complete.csv")

### Attribute Information:

* date, time in year-month-day hour:minute:second format<br>
Appliances, energy use in Wh<br>
lights, energy use of light fixtures in the house in Wh<br>
T1, Temperature in <b>kitchen area</b>, in Celsius<br>
RH_1, Humidity in kitchen area, in %<br>
T2, Temperature in <b>living room area</b>, in Celsius<br>
RH_2, Humidity in living room area, in %<br>
T3, Temperature in <b>laundry room area</b>, in Celsius<br>
RH_3, Humidity in laundry room area, in %<br>
T4, Temperature in <b>office room</b>, in Celsius<br>
RH_4, Humidity in office room, in %<br>
T5, Temperature in <b>bathroom</b>, in Celsius<br>
RH_5, Humidity in bathroom, in %<br>
T6, Temperature <b>outside the building (north side)</b>, in Celsius<br>
RH_6, Humidity outside the building (north side), in %<br>
T7, Temperature in <b>ironing room </b>, in Celsius<br>
RH_7, Humidity in ironing room, in %<br>
T8, Temperature in <b>teenager room 2</b>, in Celsius<br>
RH_8, Humidity in teenager room 2, in %<br>
T9, Temperature in <b>parents room</b>, in Celsius<br>
RH_9, Humidity in parents room, in %<br>
T_out, Temperature <b>outside (from Chievres weather station)</b>, in Celsius<br>
Pressure (from Chievres weather station), in mm Hg<br>
RH_out, Humidity outside (from Chievres weather station), in %<br>
Wind speed (from Chievres weather station), in m/s<br>
Visibility (from Chievres weather station), in km<br>
Tdewpoint (from Chievres weather station), in °C<br>
rv1, Random variable 1, nondimensional<br>
rv2, Random variable 2, nondimensional<br>

In [227]:
data.head(2)

Unnamed: 0,date,Appliances,lights,T1,RH_1,T2,RH_2,T3,RH_3,T4,...,T9,RH_9,T_out,Press_mm_hg,RH_out,Windspeed,Visibility,Tdewpoint,rv1,rv2
0,2016-01-11 17:00:00,60,30,19.89,47.596667,19.2,44.79,19.79,44.73,19.0,...,17.033333,45.53,6.6,733.5,92.0,7.0,63.0,5.3,13.275433,13.275433
1,2016-01-11 17:10:00,60,30,19.89,46.693333,19.2,44.7225,19.79,44.79,19.0,...,17.066667,45.56,6.483333,733.6,92.0,6.666667,59.166667,5.2,18.606195,18.606195


In [228]:
data.shape

(19735, 29)

In [229]:
# normalise dataset
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df=data.drop(columns=["date","lights"])
normalised_data = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
features = normalised_data.drop(columns=['Appliances'])
target = normalised_data['Appliances']
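
As a quick check on what the scaling step does, the sketch below reproduces `MinMaxScaler` by hand on a tiny, hypothetical data frame (`toy` and its values are made up for illustration). Each column is mapped to the range [0, 1] via `(x - min) / (max - min)`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data with the same kind of columns as the notebook (synthetic values).
toy = pd.DataFrame({"Appliances": [60.0, 50.0, 100.0], "T1": [19.0, 20.0, 21.0]})

# Scaling with scikit-learn, as in the cell above.
scaled = pd.DataFrame(MinMaxScaler().fit_transform(toy), columns=toy.columns)

# The same transformation written out column by column.
manual = (toy - toy.min()) / (toy.max() - toy.min())

assert np.allclose(scaled.to_numpy(), manual.to_numpy())
print(scaled)
```

Because `Appliances` is scaled together with the features, the error metrics computed later are in normalised units rather than in Wh.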

In [230]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
x_train, x_test, y_train, y_test = train_test_split(features,target,test_size=0.3, random_state=42)
linear_model = LinearRegression()

In [231]:
#fit the model to the training dataset
linear_model.fit(x_train, y_train)
#obtain predictions
predicted_values = linear_model.predict(x_test)

In [232]:
#MAE
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, predicted_values)
round(mae, 3)

0.05

In [233]:
# Residual Sum of Squares (RSS)
import numpy as np
rss = np.sum(np.square(y_test - predicted_values))
round(rss, 3)

45.348

In [234]:
# Root Mean Square Error (RMSE)
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test, predicted_values))
round(rmse, 3)

0.088

In [235]:
# R-squared
from sklearn.metrics import r2_score
# store the result under a different name so the r2_score function is not overwritten
r2 = r2_score(y_test, predicted_values)
round(r2, 3)

0.149
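
The four metrics used above are all functions of the residuals. The sketch below computes them by hand with NumPy on small, hypothetical arrays (`y_true` and `y_pred` are made-up values) and checks the results against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical targets and predictions (synthetic values).
y_true = np.array([0.1, 0.2, 0.3, 0.4])
y_pred = np.array([0.1, 0.3, 0.2, 0.4])

residuals = y_true - y_pred

mae_manual = np.mean(np.abs(residuals))          # mean absolute error
rss_manual = np.sum(residuals ** 2)              # residual sum of squares
rmse_manual = np.sqrt(rss_manual / len(y_true))  # root mean square error
tss = np.sum((y_true - y_true.mean()) ** 2)      # total sum of squares
r2_manual = 1 - rss_manual / tss                 # coefficient of determination

# The hand-written versions agree with scikit-learn.
assert np.isclose(mae_manual, mean_absolute_error(y_true, y_pred))
assert np.isclose(rmse_manual, np.sqrt(mean_squared_error(y_true, y_pred)))
assert np.isclose(r2_manual, r2_score(y_true, y_pred))
```

Note that RSS grows with the number of test samples, while MAE, RMSE and R-squared do not, so RSS values are only comparable on the same test set.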

In [236]:
predicted_values

array([0.03322207, 0.24411599, 0.03400024, ..., 0.06844707, 0.10032325,
       0.05722198])

In [237]:
from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=0.5)
ridge_reg.fit(x_train, y_train)

Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)

In [238]:
from sklearn.linear_model import Lasso
lasso_reg = Lasso(alpha=0.001)
lasso_reg.fit(x_train, y_train)

Lasso(alpha=0.001, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=0.0001, warm_start=False)

In [239]:
#comparing the effects of regularisation
def get_weights_df(model, feat, col_name):
    # this function returns the weight of every feature, sorted in ascending order
    weights = pd.Series(model.coef_, feat.columns).sort_values()
    weights_df = pd.DataFrame(weights).reset_index()
    weights_df.columns = ['Features', col_name]
    return weights_df

In [240]:
linear_model_weights = get_weights_df(linear_model, x_train, 'Linear_Model_Weight')
ridge_weights_df = get_weights_df(ridge_reg, x_train, 'Ridge_Weight')
lasso_weights_df = get_weights_df(lasso_reg, x_train, 'Lasso_weight')
final_weights = pd.merge(linear_model_weights, ridge_weights_df, on='Features')
final_weights = pd.merge(final_weights, lasso_weights_df, on='Features')

In [195]:
print(final_weights)

       Features  Linear_Model_Weight  Ridge_Weight  Lasso_weight
0          RH_2            -0.456698     -0.401134     -0.000000
1         T_out            -0.321860     -0.250765      0.000000
2            T2            -0.236178     -0.193880      0.000000
3            T9            -0.189941     -0.188584     -0.000000
4          RH_8            -0.157595     -0.156596     -0.000110
5        RH_out            -0.077671     -0.050541     -0.049557
6          RH_7            -0.044614     -0.046291     -0.000000
7          RH_9            -0.039800     -0.041701     -0.000000
8            T5            -0.015657     -0.020727     -0.000000
9            T1            -0.003281     -0.021549      0.000000
10          rv1             0.000770      0.000743     -0.000000
11          rv2             0.000770      0.000743     -0.000000
12  Press_mm_hg             0.006839      0.006516     -0.000000
13           T7             0.010319      0.010021     -0.000000
14   Visibility          

In [196]:
from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=0.4)
ridge_reg.fit(x_train, y_train)
predicted_values = ridge_reg.predict(x_test)

In [197]:
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test, predicted_values))
round(rmse, 3)

0.088

In [198]:
from sklearn.linear_model import Lasso
lasso_reg = Lasso(alpha=0.001)
lasso_reg.fit(x_train, y_train)
predicted_values_lasso_reg = lasso_reg.predict(x_test)
# Root Mean Square Error (RMSE)
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test, predicted_values_lasso_reg))
round(rmse, 3)

0.094

## Question 1

Gaussian prior

## Question 2

## Question 3

## Question 4

## Question 5

## Question 6

## Question 7

y=a+b*x
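
Assuming the formula above denotes simple linear regression with intercept `a` and slope `b`, the least-squares estimates have a closed form. The sketch below computes them on hypothetical data (`x` and `y` are made-up values) and compares them with `LinearRegression`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data that roughly follows a straight line (synthetic values).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Closed-form least-squares estimates for y = a + b * x.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# scikit-learn expects a 2D feature array, hence the reshape.
model = LinearRegression().fit(x.reshape(-1, 1), y)

assert np.isclose(model.coef_[0], b)
assert np.isclose(model.intercept_, a)
```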

## Question 8

## Question 9

## Question 10

## Question 11

## Question 12

In [199]:
data_new=df[["T2","T6"]]
data_new.head(1)

Unnamed: 0,T2,T6
0,19.2,7.026667


In [200]:
x=data_new["T2"]
y=data_new["T6"]
xtrain, xtest, ytrain, ytest = train_test_split(x,y,test_size=0.3, random_state=42)

In [201]:
xtrain=xtrain.to_frame()
ytrain=ytrain.to_frame()
xtest=xtest.to_frame()
#ytest=ytest.to_frame()


In [202]:
ytest=ytest.to_numpy()
ytest.shape

(5921,)

In [203]:
ytest = ytest.reshape(-1, 1)

In [204]:
# fit a simple linear regression of T6 on T2; a separate name keeps the
# multi-feature linear_model fitted earlier intact
simple_linear_model = LinearRegression()
simple_linear_model.fit(xtrain, ytrain)
#obtain predictions
predicted_values1 = simple_linear_model.predict(xtest)
predicted_values1.shape

(5921, 1)

In [205]:
ytest

array([[ 1.19857143],
       [ 2.53      ],
       [-0.26666667],
       ...,
       [ 9.6       ],
       [13.19571429],
       [12.5       ]])

In [206]:
predicted_values1

array([[ 2.15578912],
       [10.01116055],
       [ 1.87391554],
       ...,
       [ 4.24758774],
       [ 8.69822311],
       [ 4.9893603 ]])

In [207]:
# Root Mean Square Error (RMSE)
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(ytest, predicted_values1))
round(rmse, 3)

3.63

In [208]:
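# import r2_score here and store the result under a different name,
# so that the function is not overwritten by its own return value
from sklearn.metrics import r2_score
r2 = r2_score(ytest, predicted_values1)
round(r2, 3)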

## Question 13

In [209]:
#MAE
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, predicted_values)
round(mae, 3)

0.05

## Question 14

In [210]:
# Residual Sum of Squares (RSS)
import numpy as np
rss = np.sum(np.square(y_test - predicted_values))
round(rss, 3)

45.368

## Question 15

In [211]:
# Root Mean Square Error (RMSE)
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test, predicted_values))
round(rmse, 3)

0.088

## Question 16

In [212]:
# R-squared
from sklearn.metrics import r2_score
# store the result under a different name so the r2_score function is not overwritten
r2 = r2_score(y_test, predicted_values)
round(r2, 3)

0.149

## Question 17

In [241]:
#comparing the effects of regularisation
def get_weights_df(model, feat, col_name):
    # this function returns the weight of every feature, sorted in ascending order
    weights = pd.Series(model.coef_, feat.columns).sort_values()
    weights_df = pd.DataFrame(weights).reset_index()
    weights_df.columns = ['Features', col_name]
    return weights_df

In [242]:
linear_model_weights = get_weights_df(linear_model, x_train, 'Linear_Model_Weight')
print(linear_model_weights)

       Features  Linear_Model_Weight
0          RH_2            -0.456698
1         T_out            -0.321860
2            T2            -0.236178
3            T9            -0.189941
4          RH_8            -0.157595
5        RH_out            -0.077671
6          RH_7            -0.044614
7          RH_9            -0.039800
8            T5            -0.015657
9            T1            -0.003281
10          rv1             0.000770
11          rv2             0.000770
12  Press_mm_hg             0.006839
13           T7             0.010319
14   Visibility             0.012307
15         RH_5             0.016006
16         RH_4             0.026386
17           T4             0.028981
18    Windspeed             0.029183
19         RH_6             0.038049
20         RH_3             0.096048
21           T8             0.101995
22    Tdewpoint             0.117758
23           T6             0.236425
24           T3             0.290627
25         RH_1             0.553547


## Question 18

In [215]:
from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=0.5)
ridge_reg.fit(x_train, y_train)
predicted_values_ridge = ridge_reg.predict(x_test)

In [216]:
# ridge model
rmse = np.sqrt(mean_squared_error(y_test, predicted_values_ridge))
round(rmse, 3)

0.088

In [217]:
#linear model
rmse = np.sqrt(mean_squared_error(y_test, predicted_values))
round(rmse, 3)

0.088

## Question 19

In [218]:
from sklearn.linear_model import Lasso
lasso_reg = Lasso(alpha=0.001)
lasso_reg.fit(x_train, y_train)
lasso_predict= lasso_reg.predict(x_test)

In [219]:
lasso_weights = get_weights_df(lasso_reg, x_train, 'Lasso_Weight')
print(lasso_weights)

       Features  Lasso_Weight
0        RH_out     -0.049557
1          RH_8     -0.000110
2            T1      0.000000
3     Tdewpoint      0.000000
4    Visibility      0.000000
5   Press_mm_hg     -0.000000
6         T_out      0.000000
7          RH_9     -0.000000
8            T9     -0.000000
9            T8      0.000000
10         RH_7     -0.000000
11          rv1     -0.000000
12           T7     -0.000000
13           T6      0.000000
14         RH_5      0.000000
15           T5     -0.000000
16         RH_4      0.000000
17           T4     -0.000000
18         RH_3      0.000000
19           T3      0.000000
20         RH_2     -0.000000
21           T2      0.000000
22         RH_6     -0.000000
23          rv2     -0.000000
24    Windspeed      0.002912
25         RH_1      0.017880


The Lasso model has 4 non-zero coefficients: RH_out, RH_8, Windspeed and RH_1.
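
Counting non-zero weights by eye is error-prone, so the count can also be computed directly. The sketch below uses a hypothetical synthetic regression problem from `make_regression` (not the energy data) to show that Lasso typically sets some coefficients exactly to zero, while Ridge only shrinks them. The `alpha` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Hypothetical synthetic problem: 10 features, of which only 3 carry signal.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=42)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Number of coefficients that are not exactly zero in each model.
n_lasso = np.count_nonzero(lasso.coef_)
n_ridge = np.count_nonzero(ridge.coef_)

print("Lasso non-zero coefficients:", n_lasso)
print("Ridge non-zero coefficients:", n_ridge)
```

On the model fitted in this notebook, the same count would be `np.count_nonzero(lasso_reg.coef_)`.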

## Question 20

In [220]:
rmse = np.sqrt(mean_squared_error(y_test, lasso_predict))
round(rmse, 3)

0.094