The dataset is sourced from [Kaggle](https://www.kaggle.com/datasets/robikscube/hourly-energy-consumption?select=PJMW_hourly.csv) and made available for public use. Please refer to the [source](https://www.kaggle.com/datasets/robikscube/hourly-energy-consumption?select=PJMW_hourly.csv) for any specific terms of use or licensing information.


Import Libraries

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

Import Dataset

In [2]:
url = "https://raw.githubusercontent.com/dimsdika12/EnergyConsumption-TimeSeries-MLModel/main/dataset/PJMW_hourly.csv"
data_train = pd.read_csv(url)
data_train.head()

              Datetime  PJMW_MW
0  2002-12-31 01:00:00   5077.0
1  2002-12-31 02:00:00   4939.0
2  2002-12-31 03:00:00   4885.0
3  2002-12-31 04:00:00   4857.0
4  2002-12-31 05:00:00   4930.0


Checking and counting missing values in the 'data_train' dataset

In [3]:
data_train.isnull().sum()

Datetime    0
PJMW_MW     0
dtype: int64
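isnull() only catches empty cells. In an hourly series, a missing hour usually shows up as an absent row rather than a NaN, and repeated timestamps would not be flagged either. The helper below is a minimal sketch of that extra check; `find_missing_hours` is a hypothetical name, and the hourly frequency is an assumption taken from the dataset name.

```python
import pandas as pd

def find_missing_hours(datetimes):
    """Return (missing_hours, duplicated_timestamps) for an hourly timestamp column."""
    ts = pd.Series(pd.to_datetime(datetimes)).sort_values(ignore_index=True)
    # Every hour between the first and the last timestamp
    full_range = pd.date_range(ts.iloc[0], ts.iloc[-1], freq=pd.Timedelta(hours=1))
    # Hours that should exist but have no row
    missing = full_range.difference(pd.DatetimeIndex(ts))
    # Timestamps that occur more than once (every occurrence after the first)
    duplicated = ts[ts.duplicated()]
    return missing, duplicated
```

In the notebook it would be called as `find_missing_hours(data_train["Datetime"])`.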

Displaying the shape of the 'data_train' DataFrame

In [4]:
data_train.shape

(143206, 2)

Create a plot of data
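No plotting code appears under this heading in the export. A minimal sketch of such a plot with matplotlib could look like this; `plot_consumption` is a hypothetical helper, and sorting by timestamp is only a precaution in case the rows are not strictly chronological.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_consumption(df, time_col="Datetime", value_col="PJMW_MW"):
    """Plot hourly consumption over time and return the matplotlib Figure."""
    df = df.copy()  # leave the caller's DataFrame untouched
    df[time_col] = pd.to_datetime(df[time_col])
    df = df.sort_values(time_col)  # precaution: plot in chronological order
    fig, ax = plt.subplots(figsize=(14, 4))
    ax.plot(df[time_col], df[value_col], linewidth=0.5)
    ax.set_xlabel("Datetime")
    ax.set_ylabel("Consumption (MW)")
    ax.set_title("PJMW hourly energy consumption")
    fig.tight_layout()
    return fig
```

Calling `plot_consumption(data_train)` before the normalization cell plots the raw megawatt values; after that cell, the y-axis would show the scaled [0, 1] values instead.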

Normalize data

In [5]:
min_max_scaler = MinMaxScaler()
data_train[["PJMW_MW"]] = min_max_scaler.fit_transform(data_train[["PJMW_MW"]])
data_train.head()

              Datetime   PJMW_MW
0  2002-12-31 01:00:00  0.504008
1  2002-12-31 02:00:00  0.488855
2  2002-12-31 03:00:00  0.482925
3  2002-12-31 04:00:00  0.479851
4  2002-12-31 05:00:00  0.487866
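The scaler above is fit on the whole series, so the minimum and maximum it learns include the later validation period. A common alternative, sketched here under that assumption, is to split first and fit the scaler on the training part only; `scale_train_val` is a hypothetical helper that mirrors the chronological 80/20 split used in the next cells.

```python
import math

import numpy as np
from sklearn.preprocessing import MinMaxScaler

def scale_train_val(series, test_size=0.2):
    """Split chronologically, then fit MinMaxScaler on the training part only."""
    values = np.asarray(series, dtype=float).reshape(-1, 1)
    n_val = math.ceil(test_size * len(values))  # validation size rounded up
    n_train = len(values) - n_val
    train, val = values[:n_train], values[n_train:]
    scaler = MinMaxScaler()  # default feature_range=(0, 1)
    train_scaled = scaler.fit_transform(train).ravel()
    # Validation values use the training min/max, so they may fall outside [0, 1]
    val_scaled = scaler.transform(val).ravel()
    return train_scaled, val_scaled, scaler
```

With this variant, `threshold_mae` in the training cell would need to come from the scaled training values (10% of their range is 0.1), because `data_train['PJMW_MW']` would still hold raw megawatts.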


Split dataset into training and validation sets

In [6]:
energy_normalized = data_train['PJMW_MW'].values
# shuffle=False keeps chronological order: the last 20% of the hours become the validation set
train_energy, val_energy = train_test_split(energy_normalized, test_size=0.2, shuffle=False)

In [7]:
num_data_train = train_energy.shape[0]
num_data_val = val_energy.shape[0]

print(f"Number of data points in data_train: {num_data_train}")
print(f"Number of data points in data_val: {num_data_val}")

Number of data points in data_train: 114564
Number of data points in data_val: 28642
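The two counts follow from how scikit-learn rounds a fractional `test_size`: the test share is rounded up and the training set gets the remainder, and with `shuffle=False` the split is a plain slice. The sketch below reproduces that arithmetic; `split_sizes` and `chronological_split` are hypothetical helper names.

```python
import math

def split_sizes(n_samples, test_size):
    """Sizes produced by train_test_split(..., test_size=<float>) when train_size is None."""
    n_test = math.ceil(test_size * n_samples)  # 0.2 * 143206 = 28641.2 -> 28642
    n_train = n_samples - n_test               # remainder goes to training
    return n_train, n_test

def chronological_split(values, test_size):
    """Equivalent of train_test_split(values, test_size=..., shuffle=False)."""
    n_train, _ = split_sizes(len(values), test_size)
    return values[:n_train], values[n_train:]

print(split_sizes(143206, 0.2))  # (114564, 28642)
```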


Windowed dataset function

In [8]:
def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    series = tf.expand_dims(series, axis=-1)                       # (N,) -> (N, 1): one feature per time step
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)  # sliding windows of window_size + 1 steps
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))           # turn each window into a single tensor
    ds = ds.shuffle(shuffle_buffer)
    ds = ds.map(lambda w: (w[:-1], w[-1:]))                        # inputs: first window_size steps; label: the next step
    return ds.batch(batch_size).prefetch(1)
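To see what each (input, label) pair contains, the same sliding-window logic can be reproduced in NumPy on a short series. This is an illustrative sketch only: it skips shuffling and batching, and `make_windows` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(series, window_size):
    """NumPy version of the (inputs, label) pairs, before shuffling and batching."""
    series = np.asarray(series)
    # One row per window of window_size + 1 consecutive values, shift of 1
    windows = sliding_window_view(series, window_size + 1)
    x = windows[:, :-1, np.newaxis]  # (num_windows, window_size, 1): the inputs
    y = windows[:, -1:]              # (num_windows, 1): the value right after each window
    return x, y

x, y = make_windows(np.arange(10.0), window_size=3)
print(x.shape, y.shape)    # (7, 3, 1) (7, 1)
print(x[0].ravel(), y[0])  # [0. 1. 2.] [3.]
```

A series of N values yields N - window_size windows, because every window needs window_size inputs plus one label.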

Define windowed datasets

In [9]:
train_set = windowed_dataset(train_energy, window_size=60, batch_size=100, shuffle_buffer=1000)
val_set = windowed_dataset(val_energy, window_size=60, batch_size=100, shuffle_buffer=1000)
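With `window_size=60` and `shift=1`, each split loses 60 values to the first window, and the final batch is kept even when it holds fewer than 100 windows. The arithmetic below is a sketch of how many batches each epoch should contain; `steps_per_epoch` is a hypothetical helper.

```python
import math

def steps_per_epoch(n_points, window_size, batch_size):
    """Number of batches windowed_dataset yields for a series of n_points values."""
    n_windows = n_points - window_size        # windows of window_size + 1 values, shift 1
    return math.ceil(n_windows / batch_size)  # last batch may be partial

print(steps_per_epoch(114564, 60, 100))  # 1146 (114504 windows)
print(steps_per_epoch(28642, 60, 100))   # 286 (28582 windows)
```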

Model

In [10]:
model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(60, return_sequences=True),
    tf.keras.layers.LSTM(60),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1),
])
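The model has no explicit input shape, so it is built on the first batch. Assuming those batches have 60 time steps with 1 feature each, as produced by `windowed_dataset`, the trainable parameter count can be worked out by hand. The helper names below are hypothetical; the formulas are the standard ones for a Keras LSTM with biases (4 gates) and a Dense layer.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # weight matrix plus bias vector
    return input_dim * units + units

total = (
    lstm_params(60, 1)      # LSTM(60, return_sequences=True) on 1 input feature
    + lstm_params(60, 60)   # LSTM(60) fed by the 60-unit sequence
    + dense_params(30, 60)
    + dense_params(10, 30)
    + dense_params(1, 10)
)  # Dropout layers add no parameters
print(total)  # 46071
```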

Compile

In [11]:
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9)  # `lr` is a deprecated alias
model.compile(loss=tf.keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])
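`tf.keras.losses.Huber` is quadratic for small errors and linear for large ones, with a default threshold `delta=1.0`. The NumPy sketch below spells out that formula; `huber` is a hypothetical helper, not the Keras implementation. Because the targets here are scaled to [0, 1], errors will typically stay below `delta`, in which case the loss behaves like half the squared error.

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss; delta=1.0 matches the tf.keras.losses.Huber default."""
    error = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_err = np.abs(error)
    quadratic = 0.5 * error**2                 # used when |error| <= delta
    linear = delta * (abs_err - 0.5 * delta)   # used when |error| > delta
    return float(np.mean(np.where(abs_err <= delta, quadratic, linear)))

print(huber([0.0], [0.5]))  # 0.125 (quadratic branch)
print(huber([0.0], [2.0]))  # 1.5 (linear branch)
```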



Train

In [12]:
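# data_train['PJMW_MW'] was scaled to [0, 1] earlier, so this threshold is about 0.1 in scaled units
threshold_mae = (data_train['PJMW_MW'].max() - data_train['PJMW_MW'].min()) * 10/100

class AccuracyThresholdCallback(tf.keras.callbacks.Callback):
    """Stop training once both training and validation MAE fall below threshold_mae."""
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        mae, val_mae = logs.get('mae'), logs.get('val_mae')
        if mae is not None and val_mae is not None and mae < threshold_mae and val_mae < threshold_mae:
            print("\nthe model has an MAE value < 10% of the data scale!")
            self.model.stop_training = True

callback = AccuracyThresholdCallback()

history = model.fit(train_set, epochs=100, validation_data=val_set, callbacks=[callback])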

Epoch 1/100
Epoch 2/100
Epoch 3/100
the model has an MAE value < 10% of the data scale!
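The stopping threshold is expressed in scaled units. MinMaxScaler maps x to (x - min) / (max - min), so an absolute error in scaled units converts back to megawatts by multiplying by (max - min); the offset cancels. Judging by the first rows before and after scaling (5077.0 to 0.504008 and 4939.0 to 0.488855), that range is roughly 9,100 MW, which would put the 10% threshold at about 910 MW. A minimal sketch of the conversion, with `scaled_mae_to_mw` as a hypothetical helper:

```python
import numpy as np

def scaled_mae_to_mw(mae_scaled, raw_values):
    """Express an MAE measured on MinMaxScaler output (range [0, 1]) in megawatts."""
    raw = np.asarray(raw_values, dtype=float)
    # Absolute errors scale by (max - min); the min offset cancels out
    return mae_scaled * (raw.max() - raw.min())
```

Since the normalization cell overwrites `data_train['PJMW_MW']`, the raw values would have to be reloaded, for example with `pd.read_csv(url)['PJMW_MW']`; alternatively, `threshold_mae * min_max_scaler.data_range_[0]` gives the same figure from the fitted scaler.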
