# Household Power Consumption: prediction of electric usage

#### We develop a model that predicts future household electric power consumption from previous usage. The model must infer the next twenty-four observations from the previous twenty-four. The provided baseline model, which we aim to beat, reaches a test MAE of approximately 0.055.

We load the dataset and explore some of its statistics.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

In [2]:
data = pd.read_csv('household_power_consumption.csv')
data.head()

Unnamed: 0,datetime,Global_active_power,Global_reactive_power,Voltage,Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3
0,2006-12-16 17:24:00,4.216,0.418,234.84,18.4,0.0,1.0,17.0
1,2006-12-16 17:25:00,5.36,0.436,233.63,23.0,0.0,1.0,16.0
2,2006-12-16 17:26:00,5.374,0.498,233.29,23.0,0.0,2.0,17.0
3,2006-12-16 17:27:00,5.388,0.502,233.74,23.0,0.0,1.0,17.0
4,2006-12-16 17:28:00,3.666,0.528,235.68,15.8,0.0,1.0,17.0


In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 86400 entries, 0 to 86399
Data columns (total 8 columns):
datetime                 86400 non-null object
Global_active_power      86400 non-null float64
Global_reactive_power    86400 non-null float64
Voltage                  86400 non-null float64
Global_intensity         86400 non-null float64
Sub_metering_1           86400 non-null float64
Sub_metering_2           86400 non-null float64
Sub_metering_3           86400 non-null float64
dtypes: float64(7), object(1)
memory usage: 5.3+ MB


In [4]:
data.describe()

Unnamed: 0,Global_active_power,Global_reactive_power,Voltage,Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3
count,86400.0,86400.0,86400.0,86400.0,86400.0,86400.0,86400.0
mean,1.644244,0.128601,240.964285,6.952023,1.305127,1.878669,7.514213
std,1.335542,0.117621,3.498536,5.629463,6.682567,7.567679,8.671909
min,0.194,0.0,224.68,0.8,0.0,0.0,0.0
25%,0.396,0.0,238.61,1.8,0.0,0.0,0.0
50%,1.416,0.116,241.22,5.8,0.0,0.0,0.0
75%,2.414,0.196,243.47,10.0,0.0,1.0,17.0
max,9.272,0.874,251.7,40.4,77.0,78.0,20.0


The dataset is quite clean (every column has 86400 non-null values), so we proceed to the remaining preparation steps before training a model.
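
The `data.info()` output above already shows that no values are missing. A slightly broader check could also look for duplicate timestamps and irregular sampling intervals. The following is a minimal sketch on toy data; the helper name `cleanliness_report` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def cleanliness_report(frame, time_col="datetime"):
    """Count missing values, duplicate timestamps, and irregular time steps.

    Hypothetical helper: assumes `time_col` holds parseable timestamp strings.
    """
    stamps = pd.to_datetime(frame[time_col])
    steps = stamps.diff().dropna()
    # The most common step is taken as the nominal sampling interval.
    nominal_step = steps.mode().iloc[0]
    return {
        "missing_values": int(frame.isna().sum().sum()),
        "duplicate_timestamps": int(stamps.duplicated().sum()),
        "irregular_steps": int((steps != nominal_step).sum()),
    }

# Toy frame mimicking the first rows of the dataset (one-minute sampling).
toy = pd.DataFrame({
    "datetime": pd.date_range("2006-12-16 17:24:00", periods=5, freq="min").astype(str),
    "Global_active_power": [4.216, 5.36, 5.374, 5.388, 3.666],
})
report = cleanliness_report(toy)
print(report)
```

On the real data this would be called as `cleanliness_report(data)`; any non-zero count would suggest a cleaning step before windowing.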

-----

Here we define the dataset for training and testing. The split time was provided to us and couldn't be changed.

In [5]:
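# Drop the datetime column and keep the seven numeric features.
data_ = data.values[:,1:]

# Per-feature minimum and maximum, computed over the full dataset (train and test rows).
data_min = np.min(data_, axis=0)
data_max = np.max(data_, axis=0)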

In [6]:
split_time = 69120

# Time indices for each split (0-based row positions, one per row).
t_train = np.arange(split_time)
# Scale each feature as (x - min) / max.
x_train = (data_[:split_time,:]-data_min)/data_max

t_test = np.arange(split_time, data.shape[0])
x_test = (data_[split_time:,:]-data_min)/data_max
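
Note that the cell above scales each feature as `(x - data_min) / data_max`, using statistics from the whole dataset. Conventional min-max scaling divides by the range `data_max - data_min` and fits the statistics on the training rows only. The sketch below shows that variant on toy data, purely for comparison; the function names are hypothetical, and the metrics reported in this notebook were obtained with the notebook's own scaling, so results under this variant would not be directly comparable.

```python
import numpy as np

def fit_min_max(train):
    """Return per-feature minimum and range computed from training rows only."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    # Guard against constant features, which would otherwise divide by zero.
    span = np.where(span == 0, 1.0, span)
    return lo, span

def apply_min_max(values, lo, span):
    """Map values to [0, 1] on the training range (test values may fall outside)."""
    return (values - lo) / span

def invert_min_max(scaled, lo, span):
    """Map scaled values back to the original units."""
    return scaled * span + lo

# Toy data standing in for the seven numeric features (assumption).
rng = np.random.default_rng(0)
toy = rng.uniform(0.0, 250.0, size=(100, 7))
lo, span = fit_min_max(toy[:80])
train_scaled = apply_min_max(toy[:80], lo, span)
test_scaled = apply_min_max(toy[80:], lo, span)
```

Keeping `lo` and `span` around makes it possible to convert predictions back to physical units with `invert_min_max`.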

In [7]:
n_past = 24  
n_future = 24 
window_size = n_past + n_future

X_train = []
Y_train = []
for i in range(x_train.shape[0]-window_size):
    X_train.append(x_train[i:i+n_past,:])
    Y_train.append(x_train[i+n_past:i+window_size,:])
X_train = np.array(X_train, dtype='float32')
Y_train = np.array(Y_train, dtype='float32')

X_test = []
Y_test = []
for i in range(x_test.shape[0]-window_size):
    X_test.append(x_test[i:i+n_past,:])
    Y_test.append(x_test[i+n_past:i+window_size,:])
X_test = np.array(X_test, dtype='float32')
Y_test = np.array(Y_test, dtype='float32')

In [8]:
X_train.shape, Y_train.shape, X_test.shape, Y_test.shape

((69072, 24, 7), (69072, 24, 7), (17232, 24, 7), (17232, 24, 7))
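
The two windowing loops above are identical apart from their input, so they could be folded into one helper. The sketch below is one possible vectorized version using NumPy's `sliding_window_view` (available from NumPy 1.20); the name `make_windows` is an assumption for illustration. It drops the final window so that the sample count matches the loops, which iterate over `range(T - window_size)`: 69120 - 48 = 69072 training windows and 17280 - 48 = 17232 test windows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(series, n_past, n_future):
    """Split a (T, F) array into past/future windows of shape (N, n_past, F) and (N, n_future, F)."""
    window_size = n_past + n_future
    # Shape (T - window_size + 1, F, window_size): the window axis is appended last.
    windows = sliding_window_view(series, window_shape=window_size, axis=0)
    # Move the window axis next to the sample axis: (N, window_size, F).
    windows = np.moveaxis(windows, -1, 1)
    # The notebook's loop stops one window early; drop the last one to match it.
    windows = windows[:-1]
    X = windows[:, :n_past, :].astype('float32')
    Y = windows[:, n_past:, :].astype('float32')
    return X, Y

# Toy series with 100 time steps and 7 features (assumption).
toy_series = np.arange(100 * 7, dtype='float64').reshape(100, 7)
X_toy, Y_toy = make_windows(toy_series, n_past=24, n_future=24)
print(X_toy.shape, Y_toy.shape)  # (52, 24, 7) (52, 24, 7)
```

With the notebook's arrays, the call would look like `make_windows(x_train.astype('float64'), n_past, n_future)`, since `x_train` is derived from an object-typed array.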

-----

Here we train our first model, a neural network built from a single LSTM layer (128 units), dropout, and a linear Dense output layer applied at each time step.

In [9]:
def Model_1():
    f1 = tf.keras.layers.LSTM(units=128,
                              activation='tanh',
                              recurrent_activation='sigmoid',
                              kernel_initializer='glorot_uniform',
                              bias_initializer='zeros',
                              recurrent_initializer='zeros',
                              return_sequences=True,
                              return_state = False)
    f2 = tf.keras.layers.Dropout(rate=0.5)
    f3 = tf.keras.layers.Dense(units=Y_train.shape[-1],
                              activation='linear',
                              kernel_initializer='glorot_uniform',
                              bias_initializer='zeros')
    x = tf.keras.Input(shape=X_train.shape[1:])
    a1 = f1(x)
    a2 = f2(a1)
    y = f3(a2)
    model = tf.keras.Model(x, y)   
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, 
                                         beta_1=0.9, 
                                         beta_2=0.999, 
                                         epsilon=1e-07)
    model.compile(loss='huber_loss', metrics=['mae'], optimizer=optimizer)
    model.summary()
    callback1 = tf.keras.callbacks.ReduceLROnPlateau(monitor='loss', 
                                                     patience=10,
                                                     min_delta=0.001,
                                                     factor=0.1, 
                                                     min_lr=0.0001)
    callback2 = tf.keras.callbacks.EarlyStopping(monitor='loss',
                                                 patience=20,
                                                 min_delta=0.001)
    model.fit(X_train, Y_train, epochs=100, batch_size=64, callbacks=[callback1, callback2], validation_data=(X_test, Y_test))
    return model

In [10]:
model_1 = Model_1()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 24, 7)]           0         
_________________________________________________________________
lstm (LSTM)                  (None, 24, 128)           69632     
_________________________________________________________________
dropout (Dropout)            (None, 24, 128)           0         
_________________________________________________________________
dense (Dense)                (None, 24, 7)             903       
Total params: 70,535
Trainable params: 70,535
Non-trainable params: 0
_________________________________________________________________
Train on 69072 samples, validate on 17232 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 

In [11]:
model_1.evaluate(X_train, Y_train)



[0.008524559389162663, 0.065180674]

In [12]:
model_1.evaluate(X_test, Y_test)



[0.008966798847627687, 0.068456165]

With a test MAE of about 0.068 against the baseline's 0.055, this first model performs worse than the baseline, so we discard it.
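
The two numbers returned by `evaluate` are the Huber loss followed by the MAE, both in scaled units and averaged over all samples, time steps, and features. The NumPy sketch below reproduces both quantities on toy arrays and also breaks the MAE down per feature. It assumes the Keras default `delta=1.0` for the Huber loss; the function names are illustrative.

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear beyond `delta`."""
    err = np.abs(y_true - y_pred)
    quadratic = 0.5 * err ** 2
    linear = delta * (err - 0.5 * delta)
    return np.mean(np.where(err <= delta, quadratic, linear))

def mae_per_feature(y_true, y_pred):
    """Mean absolute error for each feature, averaged over samples and time steps."""
    return np.mean(np.abs(y_true - y_pred), axis=(0, 1))

# Toy targets and predictions with the notebook's (N, 24, 7) layout (assumption).
rng = np.random.default_rng(0)
y_true = rng.uniform(0.0, 1.0, size=(32, 24, 7))
y_pred = y_true + 0.05  # constant error of 0.05 everywhere

print(huber(y_true, y_pred))
print(mae_per_feature(y_true, y_pred))
```

Because the notebook scales by dividing by `data_max`, multiplying a per-feature MAE by `data_max` would express the error in each feature's original units.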

-----

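Here we train our second model, which replaces the LSTM with a Bidirectional LSTM and adds two 128-unit ReLU Dense layers, each followed by dropout, before the linear output layer.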

In [13]:
def Model_2():
    f1 = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=128,
                                                            activation='tanh',
                                                            recurrent_activation='sigmoid',
                                                            kernel_initializer='glorot_uniform',
                                                            bias_initializer='zeros',
                                                            recurrent_initializer='zeros',
                                                            return_sequences=True,
                                                            return_state = False), 
                                       merge_mode='concat')
    f2 = tf.keras.layers.Dropout(rate=0.5)
    f3 = tf.keras.layers.Dense(units=128,
                               activation='relu',
                               kernel_initializer='glorot_uniform',
                               bias_initializer='zeros')
    f4 = tf.keras.layers.Dropout(rate=0.5)
    f5 = tf.keras.layers.Dense(units=128,
                               activation='relu',
                               kernel_initializer='glorot_uniform',
                               bias_initializer='zeros')
    f6 = tf.keras.layers.Dropout(rate=0.5)
    f7 = tf.keras.layers.Dense(units=Y_train.shape[-1],
                              activation='linear',
                              kernel_initializer='glorot_uniform',
                              bias_initializer='zeros')
    x = tf.keras.Input(shape=X_train.shape[1:])
    a1 = f1(x)
    a2 = f2(a1)
    a3 = f3(a2)
    a4 = f4(a3)
    a5 = f5(a4)
    a6 = f6(a5)
    y = f7(a6)
    model = tf.keras.Model(x, y)   
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, 
                                         beta_1=0.9, 
                                         beta_2=0.999, 
                                         epsilon=1e-07)
    model.compile(loss='huber_loss', metrics=['mae'], optimizer=optimizer)
    model.summary()
    callback1 = tf.keras.callbacks.ReduceLROnPlateau(monitor='loss', 
                                                     patience=10,
                                                     min_delta=0.001,
                                                     factor=0.1, 
                                                     min_lr=0.0001)
    callback2 = tf.keras.callbacks.EarlyStopping(monitor='loss',
                                                 patience=20,
                                                 min_delta=0.001)
    model.fit(X_train, Y_train, epochs=100, batch_size=64, callbacks=[callback1, callback2], validation_data=(X_test, Y_test))
    return model

In [14]:
model_2 = Model_2()

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         [(None, 24, 7)]           0         
_________________________________________________________________
bidirectional (Bidirectional (None, 24, 256)           139264    
_________________________________________________________________
dropout_1 (Dropout)          (None, 24, 256)           0         
_________________________________________________________________
dense_1 (Dense)              (None, 24, 128)           32896     
_________________________________________________________________
dropout_2 (Dropout)          (None, 24, 128)           0         
_________________________________________________________________
dense_2 (Dense)              (None, 24, 128)           16512     
_________________________________________________________________
dropout_3 (Dropout)          (None, 24, 128)           0   

In [15]:
model_2.evaluate(X_train, Y_train)



[0.005280584946508505, 0.04756076]

In [16]:
model_2.evaluate(X_test, Y_test)



[0.005532963803531334, 0.049253885]

With a test MAE of about 0.049, this second model beats the provided baseline (approximately 0.055), so we propose it as the new baseline to beat.

-----

Here we train our third and final model, which places a causal Conv1D layer (256 filters, kernel size 5) in front of the Bidirectional LSTM architecture of the second model.

In [17]:
def Model_3():
    f1 = tf.keras.layers.Conv1D(filters=256, 
                                kernel_size=5, 
                                strides=1,
                                padding='causal',
                                activation='relu',
                                kernel_initializer='glorot_uniform',
                                bias_initializer='zeros')
    f2 = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=128,
                                                            activation='tanh',
                                                            recurrent_activation='sigmoid',
                                                            kernel_initializer='glorot_uniform',
                                                            bias_initializer='zeros',
                                                            recurrent_initializer='zeros',
                                                            return_sequences=True,
                                                            return_state = False), 
                                       merge_mode='concat')
    f3 = tf.keras.layers.Dropout(rate=0.5)
    f4 = tf.keras.layers.Dense(units=128,
                               activation='relu',
                               kernel_initializer='glorot_uniform',
                               bias_initializer='zeros')
    f5 = tf.keras.layers.Dropout(rate=0.5)
    f6 = tf.keras.layers.Dense(units=128,
                               activation='relu',
                               kernel_initializer='glorot_uniform',
                               bias_initializer='zeros')
    f7 = tf.keras.layers.Dropout(rate=0.5)
    f8 = tf.keras.layers.Dense(units=Y_train.shape[-1],
                                activation='linear',
                                kernel_initializer='glorot_uniform',
                                bias_initializer='zeros')
    x = tf.keras.Input(shape=X_train.shape[1:])
    a1 = f1(x)
    a2 = f2(a1)
    a3 = f3(a2)
    a4 = f4(a3)
    a5 = f5(a4)
    a6 = f6(a5)
    a7 = f7(a6)
    y = f8(a7)
    model = tf.keras.Model(x, y)   
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, 
                                         beta_1=0.9, 
                                         beta_2=0.999, 
                                         epsilon=1e-07)
    model.compile(loss='huber_loss', metrics=['mae'], optimizer=optimizer)
    model.summary()
    callback1 = tf.keras.callbacks.ReduceLROnPlateau(monitor='loss', 
                                                     patience=5,
                                                     min_delta=0.001,
                                                     factor=0.1, 
                                                     min_lr=0.0001)
    callback2 = tf.keras.callbacks.EarlyStopping(monitor='loss',
                                                 patience=20,
                                                 min_delta=0.001)
    model.fit(X_train, Y_train, epochs=100, batch_size=64, callbacks=[callback1, callback2], validation_data=(X_test, Y_test))
    return model

In [18]:
model_3 = Model_3()

Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_3 (InputLayer)         [(None, 24, 7)]           0         
_________________________________________________________________
conv1d (Conv1D)              (None, 24, 256)           9216      
_________________________________________________________________
bidirectional_1 (Bidirection (None, 24, 256)           394240    
_________________________________________________________________
dropout_4 (Dropout)          (None, 24, 256)           0         
_________________________________________________________________
dense_4 (Dense)              (None, 24, 128)           32896     
_________________________________________________________________
dropout_5 (Dropout)          (None, 24, 128)           0         
_________________________________________________________________
dense_5 (Dense)              (None, 24, 128)           1651
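
The parameter counts in the three model summaries can be checked by hand from the standard formulas for LSTM, Dense, and Conv1D layers. The short sketch below reproduces the complete counts printed above; the helper names are illustrative, and the formulas assume the default Keras layers with bias terms.

```python
def lstm_params(input_dim, units):
    # Four gates, each with an input kernel, a recurrent kernel, and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per unit.
    return input_dim * units + units

def conv1d_params(in_channels, filters, kernel_size):
    # One kernel of size (kernel_size, in_channels) per filter, plus a bias.
    return kernel_size * in_channels * filters + filters

print(lstm_params(7, 128))          # 69632  (lstm)
print(dense_params(128, 7))         # 903    (dense)
print(2 * lstm_params(7, 128))      # 139264 (bidirectional)
print(dense_params(256, 128))       # 32896  (dense_1, dense_4)
print(dense_params(128, 128))       # 16512  (dense_2)
print(conv1d_params(7, 256, 5))     # 9216   (conv1d)
print(2 * lstm_params(256, 128))    # 394240 (bidirectional_1)
```

The Bidirectional wrapper doubles the LSTM count because it trains one LSTM per direction, and the jump to 394240 in the third model comes from the LSTM now receiving 256 Conv1D channels instead of 7 raw features.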

In [19]:
model_3.evaluate(X_train, Y_train)



[0.00494401850777542, 0.04561259]

In [20]:
model_3.evaluate(X_test, Y_test)



[0.005590735885979194, 0.0490793]

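#### As we can see, this last model reaches a test MAE of about 0.049 and so beats the provided baseline (approximately 0.055), which makes it a candidate solution for the original problem. Its margin over the second model is very small, however (test MAE 0.0491 versus 0.0493, with a slightly higher test Huber loss), so additional exploration of hyperparameters would be necessary to draw a final conclusion.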
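
One way to organize such an exploration is to enumerate a small grid of configurations and train one model per configuration. The sketch below only builds the grid; the search space values and the name `iter_configs` are hypothetical, and each configuration would still need to be passed to a parameterized version of the model-building functions above. Selecting among configurations would ideally rely on a validation split carved out of the training period, so that the test set is used only for the final comparison.

```python
from itertools import product

# Hypothetical search space; the values are illustrative, not tuned.
search_space = {
    "lstm_units": [64, 128, 256],
    "dropout_rate": [0.2, 0.5],
    "learning_rate": [1e-3, 3e-4],
}

def iter_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))

configs = list(iter_configs(search_space))
print(len(configs))  # 12
```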