# LSTM timeseries forecasting with Keras Tuner
> An example of using an LSTM network to forecast timeseries, using Keras Tuner for hyperparameter tuning.

- toc: true 
- badges: true
- comments: true
- categories: [lstm, keras, keras tuner, python, machine learning, timeseries]
- image: images/2021-05-31-LSTM timeseries forecasting with Keras Tuner-MAIN.jpg

# About

This notebook demonstrates some of the capabilities of [Keras Tuner](https://github.com/keras-team/keras-tuner). It uses an `LSTM`-based recurrent neural network (RNN) to forecast `timeseries` data.

## The required libraries

Import the `must-have` libraries:

In [1]:
#collapse-output
import numpy as np
import pandas as pd
import datetime as dt

Import the `keras` elements from the `tensorflow` library:

In [2]:
#collapse-output
import tensorflow as tf
from tensorflow import keras as k
from tensorflow.keras import backend as kb
from tensorflow.keras import callbacks as kc
from tensorflow.keras import models as km
from tensorflow.keras import layers as kl
from tensorflow.keras import regularizers as kr
from tensorflow.keras import optimizers as ko
from tensorflow.keras import utils as ku

Import the `keras-tuner` library as we'll use it to tune `hyperparameters`:

In [3]:
#collapse-output
import kerastuner as kt
from kerastuner import tuners as ktt

Import the `mlviz` library used to plot `time-series` visualizations:

In [5]:
#collapse-output
from mlviz.timeseries import visualizationhelpers as mwvh
from mlviz.utilities import graphichelpers as mwgh

## The timeseries data

The input data is available in a `csv` file named `2021-05-31-LSTM timeseries forecasting with Keras Tuner-DATA.csv` located in the `data` folder. It has two columns: `date`, containing the date of the event, and `value`, holding the observed value. For convenience, we'll use `date` as the dataframe index (also copied into a `ds` column) and rename `value` to `y`. Let's load the `csv` file using the `pandas` library and have a look at the data.

In [24]:
#collapse-hide
df = pd.read_csv(
    filepath_or_buffer='../assets/data/2021-05-31-LSTM timeseries forecasting with Keras Tuner-DATA.csv',
    sep=';')

df.rename(
    columns = {
        'date': 'index',
        'value': 'y'
    }, 
    inplace=True)

df['index'] = pd.to_datetime(
    arg=df['index'], 
    dayfirst=True)

df.sort_values(
    by='index', 
    ascending=True,
    inplace=True)

df.set_index(
    keys='index', 
    inplace=True)

df = df.asfreq(
    freq='W-SAT')

df['ds'] = df.index

print('df.shape = {0}'.format(df.shape))

df.tail(5)

df.shape = (625, 2)


| index | y | ds |
|---|---|---|
| 2019-09-28 | 5547 | 2019-09-28 |
| 2019-10-05 | 6459 | 2019-10-05 |
| 2019-10-12 | 5838 | 2019-10-12 |
| 2019-10-19 | 5894 | 2019-10-19 |
| 2019-10-26 | 7925 | 2019-10-26 |
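
The loading cell calls `asfreq(freq='W-SAT')`, which reindexes the dataframe onto a regular grid of Saturdays. The following is a minimal sketch of that behaviour on a small made-up series (the dates and values are hypothetical, not taken from the real file): any Saturday absent from the input appears in the result with a `NaN` value. Whether the real data has such gaps is not stated here; checking `df['y'].isna().sum()` after loading would tell.

```python
import pandas as pd

# Hypothetical weekly series with one Saturday (2021-01-16) missing.
dates = pd.to_datetime(["2021-01-02", "2021-01-09", "2021-01-23"])
gappy = pd.DataFrame({"y": [10, 12, 15]}, index=dates)

# Reindex onto a regular weekly grid anchored on Saturdays.
# Rows that did not exist in the input are created with NaN values.
regular = gappy.asfreq("W-SAT")

# Count the weeks that were filled in as NaN.
missing = int(regular["y"].isna().sum())

print(regular)
print("missing weeks:", missing)
```

If gaps do turn up, they need to be handled (for example by interpolation or forward-filling) before the series is fed to a network, since `NaN` values propagate through training.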


### Data preparation

Let's keep the first 80% of the data for `training` and the last 20% for `testing`. The cutoff date is as follows.

In [28]:
#hide_input
threshold_date = pd.to_datetime(df.index[int(df.shape[0] * .8)])
print('Cutoff date for training/testing split is {0}'.format(threshold_date.strftime('%d/%m/%Y')))

Cutoff date for training/testing split is 10/06/2017


We can now split the dataframe into two parts:

In [32]:
# Define filter to split the testing data.
test_mask = (df['ds'] >= threshold_date)

In [33]:
df_train = df[~test_mask]
print('df_train.shape = {0}'.format(df_train.shape))

df_train.shape = (500, 2)


In [34]:
df_test = df[test_mask]
print('df_test.shape = {0}'.format(df_test.shape))

df_test.shape = (125, 2)
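
As a sanity check, the split above can be reproduced on a synthetic stand-in for the data. This is only a sketch: the `demo` dataframe and its values are made up, and only its shape (625 weekly Saturdays ending on 2019-10-26) mirrors the real one. It shows that the mask-based split matches a plain position-based split with `iloc` when the index is sorted and unique, and that no test date precedes a training date.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 625 weekly (Saturday) observations.
index = pd.date_range(end="2019-10-26", periods=625, freq="W-SAT")
demo = pd.DataFrame({"y": np.arange(len(index))}, index=index)
demo["ds"] = demo.index

# Position of the first test row: 80% of the rows come before it.
cut = int(len(demo) * 0.8)
threshold_date = demo.index[cut]

# Mask-based split, following the same logic as the cells above.
test_mask = demo["ds"] >= threshold_date
train_by_mask, test_by_mask = demo[~test_mask], demo[test_mask]

# Position-based split; equivalent when the index is sorted and unique.
train_by_pos, test_by_pos = demo.iloc[:cut], demo.iloc[cut:]

assert train_by_mask.equals(train_by_pos)
assert test_by_mask.equals(test_by_pos)

# The split is chronological: every training date precedes every test date.
assert train_by_pos.index.max() < test_by_pos.index.min()

print(threshold_date.strftime("%d/%m/%Y"))
print(train_by_pos.shape, test_by_pos.shape)
```

Splitting by date rather than shuffling keeps the test period strictly after the training period, which is what a forecasting evaluation requires.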
