# Welcome!
### This Jupyter notebook is meant to run on Google Colab :)

## This project analyses time series power consumption data (about 2 million rows) using deep learning
### The aim is simply to show how to build the simplest Long Short-Term Memory (LSTM) recurrent neural network for this data.

#### Let's go!

The description of data can be found here:
http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption

Attribute Information:
1. date: date in format dd/mm/yyyy

2. time: time in format hh:mm:ss

3. global_active_power (output label): household global minute-averaged active power (in kilowatt)

4. global_reactive_power: household global minute-averaged reactive power (in kilowatt)

5. voltage: minute-averaged voltage (in volt)

6. global_intensity: household global minute-averaged current intensity (in ampere)

7. sub_metering_1: energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).

8. sub_metering_2: energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.

9. sub_metering_3: energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.

In [0]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [0]:
# data = https://drive.google.com/open?id=1Qm1L8izAJ-8NAt2ZROmtAVSf1CNEPyGH
id = "1Qm1L8izAJ-8NAt2ZROmtAVSf1CNEPyGH"

In [0]:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials 

In [0]:
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

In [None]:
download = drive.CreateFile({'id':id})
download.GetContentFile('household_power_consumption.txt')
print("data has been downloaded to Google Colab")

In [0]:
df = pd.read_csv('household_power_consumption.txt', sep = ';', 
                 parse_dates={'datetime':['Date','Time']},
                 na_values=['nan','?'],
                 index_col = 'datetime'
                 )

In [None]:
df.head(100)

In [None]:
df.shape

In [None]:
df.describe(include='all')

In [None]:
df.info()

In [None]:
# count the null values in each column
df.isnull().sum()

In [None]:
# box plot: the outliers look tolerable
df.Global_active_power.plot(kind='box') 

In [0]:
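# Impute missing values with the column median.
# sklearn.preprocessing.Imputer was removed in scikit-learn 0.22;
# SimpleImputer from sklearn.impute is its replacement.
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

In [None]:
cat_pipe = Pipeline([
       ('imputer', SimpleImputer(strategy='median'))
])
# fit_transform returns a NumPy array, so index and column names are restored afterwards
cleaned_data = cat_pipe.fit_transform(df)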

In [None]:
clean_df = pd.DataFrame(cleaned_data,columns=df.columns)
clean_df.isnull().sum()

In [0]:
clean_df.set_index(df.index, inplace = True)
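The median imputation done by the pipeline above can also be written directly in pandas, which keeps the index and column names without rebuilding the frame. This is a minimal sketch on a made-up toy frame (the values and the `toy` name are assumptions for illustration), not the notebook's actual data.

```python
import numpy as np
import pandas as pd

# Toy frame with missing readings (values are made up for illustration)
toy = pd.DataFrame(
    {
        "Global_active_power": [1.0, np.nan, 3.0, 5.0],
        "Voltage": [240.0, 238.0, np.nan, 242.0],
    },
    index=pd.date_range("2007-01-01", periods=4, freq="min"),
)

# Replace each NaN with the median of its own column
toy_filled = toy.fillna(toy.median())
print(toy_filled)
```

Because `fillna` returns a DataFrame, the datetime index survives and no later `set_index` call is needed.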

In [None]:
# now explore the monthly global active power
monthly_resampled_data_mean = clean_df.Global_active_power.resample('M').mean()
monthly_resampled_data_sum = clean_df.Global_active_power.resample('M').sum()

monthly_resampled_data_mean.plot(title = 'Global_active_power resampled over month for mean')
plt.tight_layout()
plt.show() 

monthly_resampled_data_sum.plot(title = 'Global_active_power resampled over month for sum', color = 'red')
plt.tight_layout()
plt.show() 


In [None]:
r2 = clean_df.Global_reactive_power.resample('M').agg(['mean', 'std'])
r2.plot(subplots = True, title='Global_reactive_power resampled over month', color='red')
plt.show()

In [None]:
r2 = clean_df.Voltage.resample('M').agg(['mean', 'std'])
r2.plot(subplots = True, title='Voltage resampled over month', color='red')
plt.show()

In [None]:
# sns.pairplot(clean_df, kind = 'reg')
sns.pairplot(clean_df)
plt.show()
# pairplot draws pairwise scatter plots of all columns, with each column's distribution on the diagonal.
# A KDE (Kernel Density Estimate) plot visualizes the probability density of a continuous variable;
# several samples can be drawn on a single graph, which makes comparison easier.

In [None]:
# global active power and global intensity are roughly proportional to each other
clean_df.Global_reactive_power.resample('W').mean().plot(color='y', legend=True)
clean_df.Global_active_power.resample('W').mean().plot(color='r', legend=True)
clean_df.Sub_metering_1.resample('W').mean().plot(color='b', legend=True)
clean_df.Global_intensity.resample('W').mean().plot(color='g', legend=True)
plt.show()

In [None]:
clean_df.Global_reactive_power.resample('W').mean().plot(kind = 'hist', color='y', legend=True)
clean_df.Global_active_power.resample('W').mean().plot(kind = 'hist', color='r', legend=True)
clean_df.Sub_metering_1.resample('W').mean().plot(kind = 'hist', color='b', legend=True)
clean_df.Global_intensity.resample('W').mean().plot(kind = 'hist',color='g', legend=True)
plt.show()

In [None]:
# find the percentage change with the previous row 
data_returns = clean_df.pct_change()
data_returns
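As a small illustration of what `pct_change` computes, here is a sketch on a made-up three-value series: each row is `(current - previous) / previous`, and the first row has no predecessor, so it is NaN. Note that a previous value of zero (common in the sub-metering columns) leads to `inf` or NaN entries.

```python
import pandas as pd

# Toy series (values are made up for illustration)
s = pd.Series([2.0, 3.0, 1.5])

# row 0: no previous row -> NaN
# row 1: (3.0 - 2.0) / 2.0 = 0.5
# row 2: (1.5 - 3.0) / 3.0 = -0.5
changes = s.pct_change()
print(changes)
```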

In [None]:
sns.jointplot(x='Voltage', y='Global_active_power', data=data_returns)  
plt.show()

# Machine Learning: LSTM data preparation and feature engineering
### * I will apply a recurrent neural network (LSTM), which is well suited to time-series and other sequential problems. This approach works best when a large amount of data is available.  

### * ***It's time to convert the time series data into a supervised learning problem***

https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/

In [0]:


def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	n_vars = 1 if type(data) is list else data.shape[1]
	dff = pd.DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(dff.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(dff.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = pd.concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg
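To make the framing concrete, here is a minimal self-contained sketch of the same idea on a made-up one-column series: the value at `t-1` becomes the input and the value at `t` becomes the target. The column name `power` and the toy values are assumptions for illustration only.

```python
import pandas as pd

# Toy series (values are made up for illustration)
series = pd.DataFrame({"power": [10, 20, 30, 40]})

# Shift by one step so each row pairs the previous value with the current one
framed = pd.concat([series.shift(1), series], axis=1)
framed.columns = ["power(t-1)", "power(t)"]

# The first row has no predecessor, so it contains NaN and is dropped
framed = framed.dropna()
print(framed)
```

Four observations yield three supervised samples: (10, 20), (20, 30) and (30, 40).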
 

### * To reduce the computation time and get a quick result for testing the model, one can resample the data to hourly means (the original data are given per minute). This reduces the size of the data from 2075259 to 34589 rows but keeps the overall structure of the data shown above.  
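A minimal sketch of hourly resampling on synthetic minute-level data (the index and values below are made up): 180 one-minute rows collapse into 3 hourly means.

```python
import numpy as np
import pandas as pd

# 180 synthetic one-minute readings: 0.0, 1.0, ..., 179.0
idx = pd.date_range("2007-01-01 00:00", periods=180, freq="min")
minute = pd.DataFrame({"Global_active_power": np.arange(180, dtype=float)}, index=idx)

# Average all readings that fall within the same hour
hourly = minute.resample("h").mean()
print(hourly)
```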

In [0]:
resamble_data_hours = clean_df.resample('h').mean() 

In [None]:
resamble_data_hours.shape

In [0]:
# it's time to normalize the data
from sklearn.preprocessing import MinMaxScaler

In [0]:
scaler = MinMaxScaler(feature_range=(0,1))
scaled = scaler.fit_transform(resamble_data_hours)

In [None]:
scaled # normalized data
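For reference, the arithmetic behind min-max scaling to the range (0, 1) is `(x - min) / (max - min)`, applied per column. The NumPy sketch below reproduces it on made-up values; it is an illustration of the formula, not a replacement for the fitted `scaler` object used in this notebook.

```python
import numpy as np

# Toy matrix: two columns with very different ranges (values are made up)
X = np.array([[1.0, 230.0],
              [3.0, 240.0],
              [5.0, 250.0]])

col_min = X.min(axis=0)
col_range = X.max(axis=0) - col_min

# Forward transform: every column ends up in [0, 1]
X_scaled = (X - col_min) / col_range

# Inverse transform: recover the original units
X_back = X_scaled * col_range + col_min
print(X_scaled)
```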

In [0]:
# frame as supervised learning
reframed = series_to_supervised(scaled, 1, 1)

In [None]:
reframed

The reframed data has seven output columns: var1(t) (Global Active Power), var2(t), var3(t), var4(t), var5(t), var6(t) and var7(t).
We only need var1(t) as the output variable, so we delete the other output columns (var2(t) through var7(t)).

In [0]:
# We only need var1(t) (Global Active Power) as the output variable,
# so drop the other output columns var2(t) ... var7(t) (column positions 8 to 13).

reframed.drop(reframed.columns[[8,9,10,11,12,13]], inplace=True, axis=1)

In [None]:
reframed.head(10)

### Now it's time to split the data into training and validation sets
### This is time series data, so we can't split it randomly

### * First, I split the prepared dataset into train and test sets. To speed up training (for the sake of the demonstration), we train the model only on the first year of data and then evaluate it on the remaining roughly 3 years.
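The point of a chronological split is that every training timestamp precedes every test timestamp, so the model never sees the future. A minimal sketch on a synthetic hourly index (three made-up years, with a hypothetical column `y`):

```python
import numpy as np
import pandas as pd

# Three synthetic years of hourly rows (values are made up)
idx = pd.date_range("2007-01-01", periods=3 * 365 * 24, freq="h")
toy = pd.DataFrame({"y": np.arange(len(idx), dtype=float)}, index=idx)

# First year for training, everything after it for testing
n_train = 365 * 24
train, test = toy.iloc[:n_train], toy.iloc[n_train:]

print(train.index.max(), "<", test.index.min())
```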

In [0]:
# from sklearn.model_selection import train_test_split


#### We reshaped the input into the 3D format as expected by LSTMs, namely [samples, timesteps, features].

<ol> 
<li> Samples. One sequence is one sample. A batch is comprised of one or more samples. </li>

<li>Time Steps. One time step is one point of observation in the sample.</li>

<li>
Features. One feature is one observation at a time step.</li>
</ol>
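The sketch below illustrates the three axes on toy arrays. The first part mirrors the one-timestep case used in this notebook. The second part is a hypothetical variant, assuming rows laid out lag by lag (oldest lag first, all features of one lag next to each other), which is the column order `series_to_supervised` produces when `n_in` is greater than 1.

```python
import numpy as np

# One timestep per sample: (samples, 7) -> (samples, 1, 7)
X2d = np.arange(4 * 7, dtype=float).reshape(4, 7)
X3d = X2d.reshape((X2d.shape[0], 1, X2d.shape[1]))
print(X3d.shape)

# Hypothetical variant: 3 lagged timesteps of 2 features, with each row laid out as
# [f1(t-3), f2(t-3), f1(t-2), f2(t-2), f1(t-1), f2(t-1)]
lagged = np.array([[1, 10, 2, 20, 3, 30],
                   [2, 20, 3, 30, 4, 40]], dtype=float)
lagged3d = lagged.reshape((lagged.shape[0], 3, 2))

# Each sample is now a (timesteps, features) matrix
print(lagged3d[0])
```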

In [None]:
# split into train and test sets
values = reframed.values

n_train_time = 365*24
train = values[:n_train_time, :]
test = values[n_train_time:, :]
##test = values[n_train_time:n_test_time, :]
# split into input and outputs

train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]

# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape) 
# We reshaped the input into the 3D format as expected by LSTMs, namely [samples, timesteps, features].

# Model architecture

### 1) An LSTM layer with 100 units as the first layer
### 2) Dropout of 20%
### 3) 1 neuron in the output layer for predicting Global_active_power. 
### 4) The input shape is 1 time step with 7 features.

### 5) I use the Mean Squared Error (MSE) loss function and the Adam variant of stochastic gradient descent.
### 6) The model is fit for 50 training epochs with a batch size of 70.
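As a sanity check on the size of this architecture, the number of trainable parameters can be worked out by hand, assuming a standard LSTM layer (four gates, each with an input kernel, a recurrent kernel and a bias) followed by a dense layer with one output. Dropout adds no parameters.

```python
units, features = 100, 7

# Each of the 4 LSTM gates has an input kernel (features x units),
# a recurrent kernel (units x units) and a bias vector (units)
lstm_params = 4 * (features * units + units * units + units)

# Dense output layer: one weight per LSTM unit plus one bias
dense_params = units * 1 + 1

total_params = lstm_params + dense_params
print(lstm_params, dense_params, total_params)
```

These figures can be compared with the output of `model.summary()` once the model has been built.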


In [0]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Dropout,LSTM,Conv1D,MaxPool1D
from tensorflow.keras.optimizers import SGD,Adam

In [None]:
train_X.shape[2]  # number of features per time step

In [0]:
model = Sequential()
model.add(LSTM(100, input_shape = (train_X.shape[1],train_X.shape[2])))

model.add(Dropout(0.2))
# model.add(LSTM(80))
# model.add(Dropout(0.3))

model.add(Dense(1))
model.compile(optimizer='adam',loss='mse')

In [None]:
# now fit the model (shuffle=False keeps the samples in chronological order)
history = model.fit(x=train_X, y=train_y, batch_size=70, epochs=50, verbose=2,
                    validation_data=(test_X, test_y), shuffle=False)

In [None]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title("Loss of Training and Validation")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend(['Train','Test'], loc = 'upper right')
plt.show()

In [0]:
from sklearn.metrics import mean_squared_error

In [None]:
# make a prediction on the test set (still in the scaled [0, 1] space)
yhat = model.predict(test_X)

# flatten the test inputs back to 2D: (samples, 7)
test_X_2d = test_X.reshape((test_X.shape[0], 7))

# invert scaling for the forecast: the scaler was fit on 7 columns, so pad the
# prediction with the 6 remaining feature columns before calling inverse_transform
inv_yhat = np.concatenate((yhat, test_X_2d[:, -6:]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:, 0]  # output variable (Global_active_power)

# invert scaling for the actual values in the same way
test_y_col = test_y.reshape((len(test_y), 1))
inv_y = np.concatenate((test_y_col, test_X_2d[:, -6:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:, 0]

# calculate RMSE in the original units (kilowatts)
rmse = np.sqrt(mean_squared_error(inv_y, inv_yhat))
print('Test RMSE: %.3f' % rmse)
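The padding with six extra columns is needed only because the scaler was fit on seven columns and expects seven columns back. Since min-max scaling works column by column, the first column can also be inverted on its own from its minimum and range. The NumPy sketch below checks on random stand-in data that both routes agree; with a fitted scikit-learn `MinMaxScaler`, the corresponding per-column statistics are exposed as `data_min_` and `data_range_`.

```python
import numpy as np

# Stand-in for 7 hourly features (random values, for illustration only)
rng = np.random.default_rng(0)
raw = rng.uniform(0.0, 10.0, size=(5, 7))

# Min-max scale each column to [0, 1]
col_min = raw.min(axis=0)
col_range = raw.max(axis=0) - col_min
scaled = (raw - col_min) / col_range

# Pretend the scaled first column is the model's prediction, shape (5, 1)
yhat_scaled = scaled[:, [0]]

# Route 1: pad to 7 columns, invert everything, keep column 0
padded = np.concatenate((yhat_scaled, scaled[:, -6:]), axis=1)
inv_padded = (padded * col_range + col_min)[:, 0]

# Route 2: invert column 0 directly from its own minimum and range
inv_direct = yhat_scaled[:, 0] * col_range[0] + col_min[0]

print(np.allclose(inv_padded, inv_direct))
```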

In [None]:
# RMSE on the normalized (scaled) values, without inverting the scaling
y_predict = model.predict(test_X)
rmse_scaled = np.sqrt(mean_squared_error(test_y, y_predict))
print(f"The RMSE on normalized data is: {rmse_scaled}")
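To keep the two error figures apart: MSE is the mean of the squared errors, and RMSE is its square root. Because min-max scaling is linear, an RMSE computed on scaled values converts to original units by multiplying with the column's range (the offset cancels out). A toy sketch with made-up numbers, where `col_range` is a hypothetical range for the target column:

```python
import numpy as np

# Toy targets and predictions in scaled units (values are made up)
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 8.0])

# Errors are 0, 0, 0, -4, so the squared errors sum to 16
mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)

# Hypothetical range (max - min) of the target column in original units
col_range = 10.0
rmse_original_units = rmse * col_range
print(mse, rmse, rmse_original_units)
```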

In [None]:
sample = list(range(200))
plt.figure(figsize=(10,5))
plt.plot(sample,inv_y[:200],marker = '.', label = 'Actual')

plt.plot(sample,inv_yhat[:200],marker = '.', label = 'Prediction')
plt.ylabel('Global_active_power', size=15)
plt.xlabel('Time step', size=20)
plt.legend(fontsize=15)
plt.title("Actual vs. predicted Global_active_power (first 200 test hours)")
plt.show()

### * Here I have used an LSTM neural network, a widely used architecture for sequential problems. 

### * To reduce the computation time and get some results quickly, I took the first year of data (resampled to hourly means) to train the model and the rest of the data to test it.  

### * I put together a very simple LSTM neural network to show that one can obtain reasonable predictions. However, the number of rows is very high, so the computation is time-consuming (even the simple model above took a few minutes to run on a 2.8 GHz Intel Core i7). A better option would be to write the last part of the code with a distributed framework such as Spark (MLlib) or to run it on a GPU.  

### * Moreover, the neural network architecture that I have designed is a toy model. It can be improved by adding CNN layers and more dropout layers. A CNN layer is useful here because there are correlations in the data (a CNN layer is a good way to probe the local structure of the data).  

If you have any questions about RNNs, please feel free to ask at khizersultan007@gmail.com