# SmartHome AI Training


We set out to train a machine learning system using data from a house in *Sceaux, France*. The data covers 47 months of capture across the whole house, with specific readings for 3 rooms, which gave us a large training set to work with.

In [1]:
!pip install keras



In [2]:
import warnings

# Apply the filters globally; inside a `warnings.catch_warnings()` block
# they would be discarded again as soon as the block exits.
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=UserWarning)


For this machine learning project we used Keras together with the standard Python data science packages.

In [3]:
from math import sqrt
from numpy import concatenate
from matplotlib import pyplot
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

Using TensorFlow backend.


To create the dataset needed to train the model, the data was prepared using the following function. It builds a table in which each row holds the variables at the previous timestep(s) alongside the same variables at the current timestep. Trained on rows like this, the network is forced to learn how to predict the next timestep from the previous step's data.

In [4]:

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	n_vars = 1 if type(data) is list else data.shape[1]
	df = DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg
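
To make the shifting concrete, here is a minimal sketch of the same framing on a made-up series of 5 timesteps and 2 variables. It calls `shift` and `concat` directly rather than the function above, and the toy values and column names are only illustrative.

```python
import numpy as np
from pandas import DataFrame, concat

# toy series: 5 timesteps, 2 variables (values are made up)
toy = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]], dtype=float)
df = DataFrame(toy)

# place a copy lagged by one step (t-1) next to the current values (t)
framed = concat([df.shift(1), df], axis=1)
framed.columns = ['var1(t-1)', 'var2(t-1)', 'var1(t)', 'var2(t)']

# the first row has no predecessor, so it contains NaN and is dropped
framed = framed.dropna()
print(framed)
```

Each remaining row pairs the inputs at one step with the values at the following step, which is the layout the network is trained on.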

To load the dataset we used Pandas to create a dataframe. Pandas can load .csv files directly. The .csv file contained a few missing data points, marked by a '?'. These were replaced with 0.0 using `dataset = dataset.replace({'?':0.0} )`. All data was then converted to the same datatype to avoid issues later.

In [5]:
# load dataset
dataset = read_csv('new_house.csv', header=0, index_col=0, low_memory=False)
dataset = dataset.replace({'?':0.0} )
values = dataset.values


# ensure all data is float
values = values.astype('float32')
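
As a small illustration of what this cleaning step does, the sketch below applies the same `replace` and `astype` calls to a made-up two-column frame. The column names and readings are hypothetical, and the frame is built by hand instead of being read from `new_house.csv`.

```python
import numpy as np
from pandas import DataFrame

# hypothetical raw readings as text, with '?' marking missing values
raw = DataFrame(
    {'power': ['1.5', '?', '1.7'], 'voltage': ['240.1', '239.8', '?']},
    dtype=object,
)

# same two steps as the notebook: replace the marker, then cast to float32
cleaned = raw.replace({'?': 0.0})
toy_values = cleaned.values.astype('float32')
print(toy_values)
```

One consequence worth keeping in mind is that, after this step, a missing reading is indistinguishable from a genuine reading of zero.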





We used some of the *scikit-learn* tools to further prepare the data. A `MinMaxScaler` rescales every feature so that its values lie between 0 and 1. The fitted scaler also remembers each feature's minimum and range, so the same object can undo the scaling at the end through `inverse_transform`. We decided to save time on training and only predict overall energy usage.

This was not a simple function of the 3 rooms as there was a lot of hidden data that was not recorded in the study.

In [6]:
# normalize features
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)
# frame as supervised learning
reframed = series_to_supervised(scaled, 1, 1)
# drop the columns we don't use: var4(t-1)..var8(t-1) and var1(t)
reframed.drop(reframed.columns[[3,4,5,6,7,8]], axis=1, inplace=True)
print(reframed.head())

      var1(t-1)  var2(t-1)  var3(t-1)   var2(t)   var3(t)   var4(t)   var5(t)  \
1  0.000000e+00   0.379069   0.300719  0.481928  0.313669  0.919260  0.475207   
2  9.536743e-07   0.481928   0.313669  0.483186  0.358273  0.917922  0.475207   
3  9.536743e-07   0.483186   0.358273  0.484445  0.361151  0.919693  0.475207   
4  1.907349e-06   0.484445   0.361151  0.329617  0.379856  0.927326  0.326446   
5  1.907349e-06   0.329617   0.379856  0.316490  0.375540  0.924730  0.309917   

   var6(t)  var7(t)   var8(t)  
1      0.0   0.0125  0.516129  
2      0.0   0.0250  0.548387  
3      0.0   0.0125  0.548387  
4      0.0   0.0125  0.548387  
5      0.0   0.0250  0.548387  
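
The scaling round trip can be checked on a tiny example. This is a minimal sketch with made-up numbers; `toy` and `toy_scaler` are illustrative names, not part of the notebook.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# two made-up features on very different scales
toy = np.array([[1.0, 200.0], [3.0, 240.0], [5.0, 220.0]], dtype='float32')

toy_scaler = MinMaxScaler(feature_range=(0, 1))
toy_scaled = toy_scaler.fit_transform(toy)            # each column now spans 0..1
restored = toy_scaler.inverse_transform(toy_scaled)   # back to original units

print(toy_scaled)
print(restored)
```

Each column is scaled independently, so the smallest value in a column maps to 0 and the largest to 1 regardless of the other columns.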


## Separating Training and Testing
---
The overall dataset is huge, so we trained on the first 30% of the samples, roughly the first 14 of the 47 months. A test set was then taken from the period immediately following, sized at 20% of the training set (just under 3 months). The training data has to be reshaped into 3D `[samples, timesteps, features]` arrays so that it can be fed into the *Long Short-Term Memory Cells* or **LSTMs** that are used in the network.

In [7]:
values = reframed.values
n_train_hours = int(0.3*len(values))
train = values[:n_train_hours, :]
test = values[n_train_hours:int(1.2*n_train_hours), :]
# split into input and outputs
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

(614762, 1, 9) (614762,) (122952, 1, 9) (122952,)
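
The split and reshape can be followed on a small stand-in array. The sketch below assumes a made-up table of 100 rows with 3 inputs and 1 target per row; only the slicing and reshaping mirror the cell above.

```python
import numpy as np

# stand-in for reframed.values: 100 rows, 3 input columns + 1 target column
toy = np.arange(400, dtype='float32').reshape(100, 4)

n_train = int(0.3 * len(toy))              # first 30% of the rows
train = toy[:n_train, :]
test = toy[n_train:int(1.2 * n_train), :]  # next block, 20% of the training size

# last column is the target, everything before it is input
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]

# LSTM layers expect [samples, timesteps, features]; here timesteps = 1
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
```

Because the rows are never shuffled, the test block always lies later in time than the training block.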


## Training Model
---


Here we define the model that is going to be trained. This is a very simple model made up of a single LSTM layer with **50 units** that feeds into a single dense layer. This dense layer outputs a single value: the value we want for the next time step. The loss function used is the *Mean Absolute Error*, averaged over the $n$ samples it is evaluated on.

$mae = \frac{1}{n}\sum_{i=1}^{n} |PredictedValue_i-TrueValue_i|$
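
As a small worked example of this loss, the sketch below computes the mean absolute error by hand for four made-up predictions. The numbers are illustrative and not taken from the dataset.

```python
import numpy as np

# made-up scaled targets and predictions
y_true = np.array([0.5, 0.2, 0.9, 0.4])
y_pred = np.array([0.4, 0.3, 0.7, 0.4])

# absolute errors are 0.1, 0.1, 0.2 and 0.0, so their mean is 0.1
mae = np.mean(np.abs(y_pred - y_true))
print(mae)
```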

In [8]:
# design network
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
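
A quick sanity check on the size of this network is to count its weights by hand. The sketch below assumes the standard LSTM layout (four gates, each with input weights, recurrent weights and a bias) and the 9 input features reported by the shape printout earlier; the result should agree with what `model.summary()` reports.

```python
units = 50       # LSTM units, as in the model definition
n_features = 9   # from train_X.shape[2]

# each of the 4 gates has (n_features + units) weights per unit, plus a bias
lstm_params = 4 * (units * (n_features + units) + units)

# the dense layer maps 50 LSTM outputs to 1 value, plus one bias
dense_params = units * 1 + 1

total_params = lstm_params + dense_params
print(lstm_params, dense_params, total_params)  # 12000 51 12051
```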


## Callback Systems
---

We use some of the pre-built Keras callback classes to speed up our training processes and make sure that the model we used is the best that we have made so far.

The first callback used is an `EarlyStopping` class. This class halts the training if the validation loss has not improved for a set number of epochs known as the *patience*.


In [9]:
earlyStopping = EarlyStopping(monitor='val_loss', patience=10, verbose=0, mode='min')
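
The patience rule can be pictured with a few lines of plain Python. This is a simplified sketch of the idea, not Keras's own implementation; `stopped_epoch` and the loss values are hypothetical.

```python
def stopped_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would halt, or None."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss   # new best validation loss: reset the counter
            wait = 0
        else:
            wait += 1     # another epoch without improvement
            if wait >= patience:
                return epoch
    return None


# made-up validation losses: the best value is reached at epoch 4
losses = [0.50, 0.40, 0.41, 0.42, 0.39, 0.40, 0.41, 0.42]
print(stopped_epoch(losses, patience=3))  # 7
```

With `patience=3`, training halts three epochs after the last improvement, so the weights in memory at that point are three epochs past the best ones.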

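The next class used is a `ModelCheckpoint` class. With `save_best_only=True` it writes the model to `.mdl_wts.hdf5` each time the validation loss improves, so that we can always pull back to the model that performed the best so far. If `EarlyStopping` activates, the model left in memory is the one from the final epoch, 10 epochs past the best; the saved file still holds the best one and can be loaded back from disk. A `ReduceLROnPlateau` callback is also created here, which multiplies the learning rate by 0.1 whenever the validation loss has not improved for 7 epochs.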

In [12]:

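# keep the best model seen so far (lowest validation loss) on disk
mcp_save = ModelCheckpoint('.mdl_wts.hdf5', save_best_only=True, monitor='val_loss', mode='min')
# `min_delta` is the current name for the argument older Keras releases called `epsilon`
reduce_lr_loss = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=7, verbose=0, min_delta=1e-4, mode='min')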
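
The effect of the `ReduceLROnPlateau` settings in the cell above can be sketched in plain Python. This is a simplified illustration rather than Keras's implementation: it assumes no cooldown, a starting learning rate of 0.001 (the Adam default), and a made-up run in which the validation loss never improves. `lr_schedule` is a hypothetical helper.

```python
def lr_schedule(val_losses, lr=0.001, factor=0.1, patience=7, min_delta=1e-4):
    """Return the learning rate in effect after each epoch."""
    best = float('inf')
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:
            best = loss       # improvement larger than min_delta
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor  # plateau detected: shrink the learning rate
                wait = 0
        history.append(lr)
    return history


# made-up run: the loss is flat after the first epoch
rates = lr_schedule([0.10] * 8)
print(rates)
```

In this run the rate stays at 0.001 for the first seven epochs and drops by a factor of ten on the eighth, once seven epochs have passed without improvement.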

### Now we begin training!

In [None]:
# fit network
history = model.fit(train_X, train_y, epochs=50, batch_size=72, 
                    validation_data=(test_X, test_y), verbose=2, shuffle=False,
                   callbacks=[earlyStopping, mcp_save, reduce_lr_loss])


Train on 614762 samples, validate on 122952 samples
Epoch 1/50
 - 66s - loss: 0.1087 - val_loss: 0.1014
Epoch 2/50
 - 60s - loss: 0.1043 - val_loss: 0.0994
Epoch 3/50
 - 66s - loss: 0.1024 - val_loss: 0.0991
Epoch 4/50
 - 69s - loss: 0.1010 - val_loss: 0.0990
Epoch 5/50
 - 62s - loss: 0.0997 - val_loss: 0.0989
Epoch 6/50
 - 62s - loss: 0.0986 - val_loss: 0.0983
Epoch 7/50


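Here we plot the training loss and the validation loss over time. In the log above the validation loss sits below the training loss from the first epoch, with the gap narrowing as training continues. This is not a sign of overfitting, which would instead show the validation loss rising while the training loss keeps falling. A likely explanation is that the training loss is averaged over each epoch while the weights are still improving, whereas the validation loss is measured once at the end of the epoch; the test period may also simply be easier to predict.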

In [None]:
# plot history
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
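
The same comparison can be made numerically. The sketch below copies the six completed epochs from the training log above into plain lists (standing in for `history.history`) and reports the gap between the two losses and the best epoch so far.

```python
# losses copied from the six completed epochs in the log above
loss = [0.1087, 0.1043, 0.1024, 0.1010, 0.0997, 0.0986]
val_loss = [0.1014, 0.0994, 0.0991, 0.0990, 0.0989, 0.0983]

# positive gap means the validation loss is below the training loss
gaps = [round(t - v, 4) for t, v in zip(loss, val_loss)]

# 1-based epoch with the lowest validation loss so far
best_epoch = val_loss.index(min(val_loss)) + 1

print(gaps)
print(best_epoch)
```

The gap is positive in every epoch and shrinks steadily, and the validation loss was still improving at the last logged epoch.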

## Testing the Model
---

Here we test the accuracy of the model by giving it data that it has not been trained on from the rest of the time series. We can see how it continues trends and whether that matches the data that we have. We have to invert the min-max scaling first so that the *Root Mean Squared Error* is reported in the original units.

In [None]:
yhat = model.predict(test_X)
print(yhat.shape)
test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
# The target is the last column of `reframed`, var8(t), which is the last
# feature the scaler was fitted on. `test_X` has 9 columns while the scaler
# expects 8, so the scaling is undone for that one feature directly:
# x = (x_scaled - min_) / scale_
# invert scaling for forecast
inv_yhat = (yhat[:, 0] - scaler.min_[-1]) / scaler.scale_[-1]
# invert scaling for actual
inv_y = (test_y - scaler.min_[-1]) / scaler.scale_[-1]
# calculate RMSE
rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
print('Test RMSE: %.3f' % rmse)
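
To see why the scaling has to be inverted before computing the error, here is a minimal sketch on made-up numbers. It assumes a single feature whose original range is 0 to 10; `toy_scaler` and the values are illustrative only.

```python
from math import sqrt

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

# hypothetical scaler fitted on one feature ranging from 0 to 10
toy_scaler = MinMaxScaler().fit(np.array([[0.0], [10.0]]))

# made-up targets and predictions in scaled (0..1) units
y_true_scaled = np.array([[0.2], [0.4], [0.6]])
y_pred_scaled = np.array([[0.25], [0.35], [0.7]])

# map both back to the original units before measuring the error
y_true = toy_scaler.inverse_transform(y_true_scaled)[:, 0]
y_pred = toy_scaler.inverse_transform(y_pred_scaled)[:, 0]

rmse_scaled = sqrt(mean_squared_error(y_true_scaled[:, 0], y_pred_scaled[:, 0]))
rmse = sqrt(mean_squared_error(y_true, y_pred))
print(rmse_scaled, rmse)
```

The error in original units is the scaled error multiplied by the feature's range, so reporting the scaled figure alone would understate it by a factor of ten here.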

In [None]:
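# x axis: var1(t-1), the first (scaled) input column
pyplot.plot(test_X[:,0], inv_y, label='actual')
pyplot.plot(test_X[:,0], inv_yhat, label='predicted')
pyplot.legend()
pyplot.show()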