![logo](../../picture/license_header_logo.png)
> **Copyright &copy; 2020 - 2021 CertifAI Sdn. Bhd.**<br>
 <br>
This program and the accompanying materials are made available under the
terms of the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). <br>
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License. <br>
<br>**SPDX-License-Identifier: Apache-2.0**

# 03 - Recap Exercise 

## Requirement
You should complete all the exercises from Day 1 to Day 3 before proceeding with this recap exercise to optimize your learning experience. 


## Introduction 
This tutorial demonstrates the process of building time series forecast algorithms from scratch.<br>

>**Example of time series forecasting flow diagram:**
![image](https://user-images.githubusercontent.com/59526258/117595686-0f8aa800-b174-11eb-8996-b08559378fad.png)

This exercise will use five different types of time series forecast models to solve the problem:

1. Naive Forecast
2. Exponential Moving Average
3. ARIMA 
4. SARIMA
5. Multilayer perceptron (MLP)

## Problem Statement
You are given a dataset that records monthly milk production (pounds per cow) from January 1962 to December 1975. You are required to build a time series forecast algorithm from scratch. Your algorithm must include:
1. Basic analytics of the data.
2. Time series modeling with statistical method.
3. Time series modeling with deep learning method.
4. Compare and choose the best model based on the performance.

## What will we accomplish?
By the end of this tutorial, you will be able to:
1. Understand the time series forecasting flow.
2. Compare and select the optimal forecasting model based on model performance.


## Notebook Outline
Below is the outline for this tutorial:
1. [Basic Analytics](#BasicAnalytics)
    * [Data Preparation](#DataPreparation)
    * [Data Visualization](#DataVisualization)
    * [Data Splitting](#DataSplitting)
    * [ACF Plot](#ACFPlot)
    * [Time Series Decomposition](#TimeSeriesDecomposition)
    
2. [Time Series Modeling with Statistical Method](#TimeSeriesModelingwithStatisticalMethod)
    * [Naive Forecast](#NaiveForecast)
    * [Exponential Moving Average](#ExponentialMovingAverage)
        * [Simple Exponential Moving Average (SEMA)](#SimpleExponentialMovingAverage(SEMA))
        * [Holt-Winters Exponential Moving Average Method](#Holt-WintersMethod)
    * [ARIMA Forecast](#ARIMAForecast)
        * [Log Transform](#LogTransform)
        * [Seasonal Differencing (Deseasonalize)](#SeasonalDifferencing(Deseasonalize))
        * [ADF Test](#ADFTest)
        * [1st order differencing (Detrending)](#1storderdifferencing(Detrending))
        * [ACF and PACF plot](#ACFandPACFplot)
        * [ARIMA model configuration](#ARIMAmodelconfiguration)
        * [ARIMA model forecast](#ARIMAmodelforecast)
        * [Reverse Differencing](#ReverseDifferencing)
            * [Reverse 1st order differencing](#Reverse1storderdifferencing)
            * [Reverse seasonal differencing](#Reverseseasonaldifferencing)
        * [Inverse Log Transform](#InverseLogTransform)
    * [SARIMA Forecast](#SARIMAForecast)
    
3. [Time Series Modeling with Deep Learning Method (MLP)](#TimeSeriesModelingwithDeepLearningMethod(MLP))
    * [Hyperparameter](#Hyperparamter)
    * [Data Scaling](#DataScaling)
    * [Window Sliding](#WindowSliding)
    * [Data Iterator](#DataIterator)
    * [Multilayer perceptron (MLP) configuration](#Multilayerperceptron(MLP)configuration)
    * [Input Model](#InputModel)
    * [Model Summary](#ModelSummary)
    * [Training](#Training)
    * [Validation](#Validation)
4. [Summary](#Summary)
5. [Reference](#Reference)

First, let's import the packages needed.

In [None]:
# Import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
import seaborn as sns
import math
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from statsmodels.graphics.tsaplots import plot_acf , plot_pacf
from statsmodels.tsa.holtwinters import Holt, ExponentialSmoothing, SimpleExpSmoothing
from statsmodels.tsa.stattools import adfuller
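# Note: the legacy ARIMA class in statsmodels.tsa.arima_model was retired in statsmodels 0.13, so this import assumes an older release
from statsmodels.tsa.arima_model import ARIMA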
from matplotlib import pylab
from torch.utils.data import Dataset, DataLoader, TensorDataset
import torch
import torch.nn as nn
import torch.nn.functional as F
%matplotlib inline 
pylab.rcParams['figure.figsize'] = (10.0, 8.0)
import warnings
warnings.filterwarnings("ignore")

## <a name="BasicAnalytics">1. Basic Analytics</a>
### <a name="DataPreparation">1.1 - Data Preparation
In data preparation, you are required to read the data and set `Month` as the index.
>**Instruction:**<br>
Set the data frame index to `Month` and set the frequency to `MS` using `df.index.freq`<br>

>**Expected Result:**<br>
Example of the first 5 rows of data:

Month| Monthly milk production (pounds per cow)
---|---
1962-01-01|	589
1962-02-01|	561
1962-03-01|	640
1962-04-01|	656
1962-05-01|	727
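
The instruction above can be illustrated on a small stand-in frame. This is only a sketch: the `toy` frame, its `value` column, and its dates are made up, and it assumes the month column holds strings that `pd.to_datetime` can parse.

```python
import pandas as pd

# Toy frame standing in for the milk data (values are made up for illustration)
toy = pd.DataFrame({
    "Month": ["2000-01", "2000-02", "2000-03", "2000-04"],
    "value": [10, 12, 11, 13],
})

# Parse the month strings into timestamps, then use them as the index
toy["Month"] = pd.to_datetime(toy["Month"])
toy = toy.set_index("Month")

# Declare the index frequency as month start ("MS")
toy.index.freq = "MS"
```

Setting the frequency only succeeds when the dates really are consecutive month starts, so it also acts as a quick consistency check on the index.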

In [None]:
# Read the CSV data
milk_data = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/monthly-milk-production-pounds.csv")
milk_data.head()

In [None]:
# Set "Month" as data index
# YOUR CODE HERE

milk_data.head()

### <a name="DataVisualization">1.2 - Data Visualization

In [None]:
# Visualize the data
milk_data.plot()
plt.title("Monthly Milk Production")
plt.ylabel("Pounds per cow")

As you can see, the data shows an increasing trend and contains seasonality.

### <a name="DataSplitting">1.3 - Data Splitting
Split the data into train and test data using `train_test_split`.

In [None]:
# Data Splitting
split_ratio = 0.7
train_data, test_data = train_test_split(milk_data, train_size=split_ratio , shuffle = False)
train_time,test_time = train_data.index, test_data.index
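
With `shuffle = False` the split is purely chronological: the earliest observations become the training set and the latest become the test set. A minimal NumPy sketch of the same idea is shown here. The `series` array is a stand-in for the 168 monthly observations, and computing `n_train` by truncation is an assumption about rounding that may differ from scikit-learn by one sample in other settings.

```python
import numpy as np

series = np.arange(168)              # stand-in for 168 monthly observations
split_ratio = 0.7

# Number of training samples, truncated to an integer
n_train = int(len(series) * split_ratio)

# Earlier values go to train, later values go to test, order is preserved
train, test = series[:n_train], series[n_train:]
```

Keeping the order matters for time series because a shuffled split would let the model see values from the future of the test period.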

Before we build the model, we must analyze the time series data pattern. There are two ways to visualize the data seasonality:
1. ACF plot
2. Time Series Decomposition

### <a name="ACFPlot"> 1.4 - ACF Plot

In [None]:
# Create ACF Plot
plot_acf(train_data,lags = 60)
plt.show()
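
To make clear what an ACF plot draws, here is a hedged sketch that computes the sample autocorrelation by hand with NumPy. The function name `sample_acf` and the toy series are illustrative assumptions, and the estimator shown is the standard biased one, which may differ in detail from what `plot_acf` is configured to use.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation for lags 0..max_lag (standard biased estimator)."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()                      # deviations from the mean
    denom = np.sum(d * d)                 # lag-0 sum of squares
    return np.array([
        np.sum(d[: len(d) - k] * d[k:]) / denom
        for k in range(max_lag + 1)
    ])

# Toy series that repeats every 4 steps
toy = np.tile([1.0, 2.0, 3.0, 10.0], 6)
acf = sample_acf(toy, max_lag=8)
```

In this toy series the autocorrelation is highest at lags 4 and 8, the multiples of its period. For monthly data with yearly seasonality, one would expect similar peaks at multiples of 12.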

### <a name="TimeSeriesDecomposition">1.5 - Time Series Decomposition
Decomposition gives us more details about the time series data pattern by decomposing the data into trend, seasonality, and residual.
>**Instruction:**<br>
Choose the correct decomposition parameters and perform time series decomposition to the `train_data`. Save the output with a variable name `decomposition`.

>**Expected Result:**<br>
![image](https://user-images.githubusercontent.com/59526258/117621580-52647400-b1a4-11eb-9590-96ed7fc97198.png)

In [None]:
# Time Series Decomposition
# YOUR CODE HERE

decomposition.plot()
plt.show()
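
The idea behind an additive decomposition can be sketched by hand on synthetic data. This is an illustration of the classical moving-average scheme under stated assumptions (a made-up linear trend plus a sine pattern with period 12), not necessarily the exact procedure `seasonal_decompose` follows internally.

```python
import numpy as np

period = 12
n = 5 * period
t = np.arange(n)

# Synthetic series: linear trend plus a repeating pattern that sums to zero
true_trend = 100.0 + 2.0 * t
pattern = np.sin(2 * np.pi * np.arange(period) / period)
y = true_trend + np.tile(pattern, n // period)

# Centered 2x12 moving average: half weights on the two end points
weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
half = period // 2
trend_est = np.convolve(y, weights, mode="valid")   # aligned with y[half : n - half]

# Average the detrended values by position in the cycle to get the seasonal part
detrended = y[half : n - half] - trend_est
positions = t[half : n - half] % period
seasonal_est = np.array([detrended[positions == p].mean() for p in range(period)])
seasonal_est -= seasonal_est.mean()

# Whatever is left after removing trend and seasonality
residual = detrended - seasonal_est[positions]
```

Because the toy series has no noise, the estimated trend and seasonal pattern match the true ones and the residual is essentially zero. On real data the residual shows what the trend and seasonality do not explain.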

## <a name="TimeSeriesModelingwithStatisticalMethod">2. Time Series Modeling with Statistical Method
### <a name="NaiveForecast">2.1 - Naive Forecast
First, we will use the naive forecast method as our benchmark model. We only accept models that outperform the naive forecast model based on the results.


In [None]:
# Naive forecast method 
naive_forecast = test_data.shift(1)

# Function to plot the forecast data
def forecast_plot(forecast_data,forecast_label, test_label='Test Data',test_time = test_time,test_data= test_data):
    plt.figure(figsize=(15,6))
    plt.plot(test_time,forecast_data,'r',label = forecast_label)
    plt.plot(test_time,test_data,label = test_label)
    plt.legend()
    plt.title("Monthly Milk Production Forecast")
    plt.xlabel("Month")
    plt.ylabel("Pounds per cow")
    
# Plot the forecast data
forecast_plot(forecast_data = naive_forecast,forecast_label = 'Naive Forecast')    

In [None]:
# Calculate RMSE of naive model
testScore_naive = math.sqrt(mean_squared_error(test_data[1:], naive_forecast[1:]))
testScore_naive
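
The naive forecast and its RMSE can be traced on a tiny example. The values below are made up for illustration, and the arithmetic mirrors the shift-by-one and skip-the-first-entry logic used above.

```python
import math
import pandas as pd

actual = pd.Series([10.0, 12.0, 11.0, 15.0])

# Naive forecast: each prediction is the previous observed value
naive = actual.shift(1)              # first entry is NaN, nothing comes before it

# Skip the first entry when scoring, since it has no forecast
squared_errors = (actual.iloc[1:] - naive.iloc[1:]) ** 2
rmse = math.sqrt(squared_errors.mean())
```

Here the errors are 2, -1 and 4, so the mean squared error is 7 and the RMSE is its square root.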

In [None]:
# Save the result into Dataframe
result = pd.DataFrame({'Naive Forecast' :testScore_naive},index=["RMSE"])
result

### <a name="ExponentialMovingAverage">2.2 - Exponential Moving Average 
#### <a name="SimpleExponentialMovingAverage(SEMA)">2.2.1 - Simple Exponential Moving Average (SEMA)
Let's start with the Simple Exponential Moving Average (SEMA) to determine whether this method is the correct method to use as our forecasting model.
>**Instruction:**<br>
Use `SimpleExpSmoothing` as the forecast model and fit it with `train_data`. Your forecast data must be the same length as `test_data`. Save the result in a variable named `sema_forecast`.

>**Expected Result:**<br>
![image](https://user-images.githubusercontent.com/59526258/117624498-8f7e3580-b1a7-11eb-8f46-781b87644a0e.png)


In [None]:
# Simple Exponential Moving Average (SEMA)
# YOUR CODE HERE

sema_forecast.head()

In [None]:
# Forecast data plot
forecast_plot(forecast_data = sema_forecast,forecast_label = 'SEMA Forecast' )    

In [None]:
# Save the result into Dataframe
sema_forecast_result = math.sqrt(mean_squared_error(sema_forecast,test_data))
result['SEMA'] =  sema_forecast_result 
result

It seems that the Simple Exponential Moving Average is not a good model for data with seasonality. Let's try the Holt-Winters Method, which is designed to handle time series data with seasonality.
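
A hand-written version of simple exponential smoothing helps explain this result. The sketch below is an assumption-laden illustration: the function name is hypothetical, `alpha` is fixed at 0.5 rather than estimated, and the level starts from the first observation, which may differ from how `SimpleExpSmoothing` initializes and fits.

```python
def simple_exp_smoothing(values, alpha, horizon):
    """Smooth `values` with factor `alpha`, then forecast `horizon` steps ahead."""
    level = values[0]                      # assumption: start from the first observation
    for y in values[1:]:
        # New level is a weighted mix of the latest value and the old level
        level = alpha * y + (1 - alpha) * level
    # Every future step gets the same value: the last smoothed level
    return [level] * horizon

history = [10.0, 20.0, 10.0, 20.0]
forecast = simple_exp_smoothing(history, alpha=0.5, horizon=3)
```

The forecast is a flat line at the last smoothed level, so it cannot follow a repeating seasonal pattern however the smoothing factor is chosen.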

#### <a name="Holt-WintersMethod">2.2.2 - Holt-Winters Exponential Moving Average Method
>**Instruction:**<br>
Perform both additive and multiplicative Holt-Winters Method with the `train_data` and save the model result into `result`. The number of forecast data must be the same size as the `test_data`. 

>*Hints: You may use a for loop to iterate over parameters such as trend and seasonal.* <br>
Example:<br>
method = ['add','mul']<br>
&ensp; for trend in method:<br>
&ensp;&ensp;     for seasonal in method:<br>
&ensp;&ensp;&ensp;    ----<br>
&ensp;&ensp;&ensp;    ----<br>
&ensp;&ensp;&ensp;    ----<br>

>**Expected Result:**<br>

---|Naive Forecast|	SEMA|	add_add_forecast|	add_mul_forecast|	mul_add_forecast|	mul_mul_forecast
---|---|	---|	---|	---|	---|	---
RMSE|	47.390505|	85.317285|	51.227547|	43.414228|	65.745351|	56.389882
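
The iteration pattern from the hint can be sketched without fitting any model. The snippet only builds the parameter pairs and the column names used in the expected result table. Each pair would then be passed as the `trend` and `seasonal` arguments of `ExponentialSmoothing`. The variable names `combinations` and `column_names` are illustrative.

```python
from itertools import product

method = ["add", "mul"]

# One (trend, seasonal) pair per Holt-Winters variant
combinations = list(product(method, repeat=2))

# Column names in the same pattern as the expected result table
column_names = [f"{trend}_{seasonal}_forecast" for trend, seasonal in combinations]
```

`product` visits the pairs in the same order as two nested for loops over `method`.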


In [None]:
# Define Holt-Winters Exponential Moving Average model
# YOUR CODE HERE

result

The result shows that the Holt-Winters Method with an additive trend and multiplicative seasonality gives the lowest RMSE among the Holt-Winters variants. Let's visualize the forecast data.

In [None]:
# Use additive trend and multiplicative seasonal Holt-Winters method to forecast
exponential = ExponentialSmoothing(train_data, seasonal_periods=12, trend='add', seasonal='mul').fit()
exponential_forecast = exponential.forecast(len(test_data))

In [None]:
# Exponential forecast plot
forecast_plot(forecast_data = exponential_forecast,forecast_label = 'Holt-Winters Method Forecast')  

### <a name="ARIMAForecast">2.3 - ARIMA Forecast

#### <a name="LogTransform">2.3.1 - Log Transform

In [None]:
# Log Transform 
train_data_log = np.log(train_data)
train_data_log.plot()
plt.title("Logged Monthly Milk Production Forecast")
plt.ylabel("Pounds per cow")

#### <a name="SeasonalDifferencing(Deseasonalize)">2.3.2 -  Seasonal Differencing (Deseasonalize)
`ARIMA` only handles data without seasonality well. Let's remove the seasonality by differencing with `differencing_month = 12`. The value of `differencing_month` is determined by the seasonal period of the data.

**How does seasonal differencing work?**<br>
The table below shows the calculation process of seasonal differencing (seasonal period=2). 
> The seasonal period can be any value, depending on the seasonal period of the time series data.

After shifting the data down by two rows (because seasonal period=2), subtract the shifted data (`Shift(2)`) from the `Data`. The final result is the data with the seasonality removed.
![concept](../../picture/seasonal_difference.png)
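
The same calculation can be traced in pandas on a toy series with seasonal period 2. The numbers are made up for illustration: the series alternates between a low and a high value while rising by 2 each season.

```python
import pandas as pd

data = pd.Series([10, 20, 12, 22, 14, 24])
seasonal_period = 2

shifted = data.shift(seasonal_period)      # the `Shift(2)` column
differenced = data - shifted               # same values as data.diff(seasonal_period)

# The first `seasonal_period` rows have nothing to subtract, so drop them
deseasonalized = differenced.iloc[seasonal_period:]
```

The alternating pattern disappears and only the constant season-to-season increase of 2 remains.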


In [None]:
# Seasonal Differencing
differencing_month = 12 
remove_seasonal = train_data_log.diff(differencing_month)
deseasonal_data = remove_seasonal[differencing_month:]
deseasonal_data.plot()
plt.title("Deseasonalized Monthly Milk Production Forecast")
plt.ylabel("Pounds per cow")

After removing the seasonality, it is essential to perform an `ADF` test to check whether the data has become stationary. The data after seasonal differencing still has a trend, which makes it non-stationary.

#### <a name="ADFTest"> 2.3.3 - ADF Test

In [None]:
# ADF Test
def print_adf_result(adf_result):
    df_results = pd.Series(adf_result[0:4], index=['ADF Test Statistic','P-Value','# Lags Used','# Observations Used'])
    
    for key, value in adf_result[4].items():
        df_results['Critical Value (%s)'% key] = value
    print('Augmented Dickey-Fuller Test Results:')
    print(df_results)
    

adf_result = adfuller(deseasonal_data, maxlag=12)
print_adf_result(adf_result)

It seems that the data is not stationary yet. You are required to perform 1st order differencing to make the data stationary.

####  <a name="1storderdifferencing(Detrending)">2.3.4 - 1st order differencing (Detrending)
**How does 1st order differencing work?**<br>
The concept is similar to seasonal differencing. The only difference is that 1st order differencing always subtracts the data shifted by period=1.<br>
For Example:<br>
![detrending_concept](../../picture/TS_detrending_concept.png)
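
As a small numeric illustration of the same idea, the NumPy sketch below differences two made-up arrays. It is not the exercise solution, only a demonstration of the arithmetic.

```python
import numpy as np

data = np.array([3.0, 5.0, 8.0, 12.0])

# Subtract the previous value from each value (shift by one step)
first_diff = data[1:] - data[:-1]          # equivalent to np.diff(data)

# A purely linear trend becomes a constant after one round of differencing
linear = 2.0 * np.arange(5) + 1.0
linear_diff = np.diff(linear)
```

The differenced array is one element shorter than the input because the first value has no predecessor.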
>**Instruction:**<br>
Use `df.diff()` to perform differencing to remove the trend. Save the return with variable name `detrend_data`

>**Expected Result:**<br>
![image](https://user-images.githubusercontent.com/59526258/117633502-ae34fa00-b1b0-11eb-8ec4-0e3788335210.png)

In [None]:
# 1st order differencing (Detrending)
# YOUR CODE HERE

detrend_data.plot()
plt.title("Detrended Monthly Milk Production Forecast")
plt.ylabel("Pounds per cow")

In [None]:
# ADF Test
result_adf = adfuller(detrend_data, maxlag=12)
print_adf_result(result_adf)

The ADF test shows that the data is now stationary.

#### <a name="ACFandPACFplot">2.3.5 -  ACF and PACF plot
After the data is stationary, use `ACF` and `PACF` to find the `p` and `q` parameters for ARIMA model.<br>
Recall that:  <br>
`p` is determined by `PACF`<br>
`q` is determined by `ACF`<br>
>**Instruction:**<br>
Plot ACF and PACF to find the `p` and `q` parameters for ARIMA model.

>**Expected Result:**<br>
![image](https://user-images.githubusercontent.com/59526258/117634502-8c884280-b1b1-11eb-91c5-d79d91e5f983.png)

In [None]:
# ACF and PACF plot
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(16, 4))
# YOUR CODE HERE

ax1.set_title('ACF of differenced series')
ax2.set_title('PACF of differenced series')
plt.show()

#### <a name="ARIMAmodelconfiguration">2.3.6 - ARIMA model configuration

In [None]:
#  Define ARIMA model
arima = ARIMA(detrend_data.dropna(), order=(5,1,1)).fit()
arima.summary()

#### <a name="ARIMAmodelforecast">2.3.6 - ARIMA model forecast

In [None]:
#  ARIMA model forecast
def forecast_arima(test_data):
    # Legacy ARIMA.forecast returns (forecast, standard error, confidence interval)
    forecast_values,_,_ = arima.forecast(len(test_data))
    return pd.Series(forecast_values, index=test_data.index)

# Use a separate name for the function so the result does not overwrite it
arima_forecast = forecast_arima(test_data)
arima_forecast

#### <a name="ReverseDifferencing">2.3.7 - Reverse Differencing
After getting the forecast data, you must reverse the differencing because the forecast data is still on the differenced scale rather than the original scale.

**Data Differencing Roadmap:**<br>
Log Data -> Seasonal Differencing -> 1st Order Differencing 

**Reverse Differencing Roadmap:**<br>
Reverse 1st Order Differencing -> Reverse Seasonal Differencing -> Inverse Log Data

##### <a name="Reverse1storderdifferencing">2.3.7.1 - Reverse 1st order differencing

**Strategy to perform reverse 1st order differencing in time series data**<br>
1. Stack the forecast data and `detrend_data` together and save it as `series_merge`
2. Copy the first row of `deseasonal_data` and stack with the `series_merge`
3. Perform cumulative sum over the `series_merge`

**How to perform reverse 1st order differencing.**<br>
Reverse 1st order differencing uses addition instead of subtraction. The cache column temporarily stores the previous answer, obtained by adding the previous data value to the previous cache value, as shown in the figure below:
![detrending_concept](../../picture/reverse_differencing.png)

> `Cache` can be an empty list to which the latest answer is appended.

In Python, the same operation can be done with `df.cumsum()`, which performs a cumulative sum over the data.
For example:<br>
![detrending_concept](../../picture/cumulative_sum.png)

**How to perform reverse 1st order differencing after getting the forecast data**<br>
First, you are required to stack the forecast data then perform the cumulative sum over the list.<br>
For example:<br>
![detrending_concept_forecast](../../picture/reverse_differencing_forecast.png)
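
The three-step strategy above can be checked on toy numbers. The arrays below are made up, including the two "forecast" differences, so this is a sketch of the mechanics rather than the notebook's actual data.

```python
import numpy as np

original = np.array([3.0, 5.0, 8.0, 12.0])
differenced = np.diff(original)                 # known differences: 2, 3, 4
forecast_diff = np.array([1.0, 1.0])            # made-up forecast on the differenced scale

# Put the first original value in front, then the known and forecast differences
stacked = np.hstack([original[0], differenced, forecast_diff])

# The running total undoes the differencing
restored = stacked.cumsum()
```

The first four restored values equal the original series, and the last two are the forecasts brought back to the original scale.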

In [None]:
# Stack the forecast data and detrend_data together and save it as series_merge
series_merge = np.hstack([detrend_data['Monthly milk production (pounds per cow)'].values,arima_forecast])

# Copy the first row of deseasonal_data and stack with the series_merge
original_data = deseasonal_data.values[0]

# Perform cumulative sum over the series_merge
trend = np.hstack([original_data,series_merge]).cumsum()

Perform a sanity check to make sure the `Deseasonal Data` is identical to the `trend` data except for the forecast part.

In [None]:
plt.figure(figsize=(10,4))
plt.plot(milk_data.index[12:],trend,'r',label='Forecast Data')
plt.title("Reverse 1st order differencing")
plt.plot(deseasonal_data,label='Deseasonal Data')
plt.legend()
plt.ylabel("Pounds per cow")

##### <a name="Reverseseasonaldifferencing">2.3.7.2 - Reverse seasonal differencing
**Strategy to perform reverse seasonal differencing in time series data**<br>
1. Stack the first 12 values of the original data (the data before the seasonality was removed) together with the `trend` data obtained from reverse 1st order differencing.
2. Iterate and perform summation until the end of the list.

**How to perform reverse seasonal differencing?**<br>
The concept is similar to reverse 1st order differencing. The only difference is that you need to create an empty list to store the `cache`.<br> 
For example, the table below shows the calculation process of reverse seasonal differencing with seasonal period=2.<br>
![detrending_concept_forecast](../../picture/Deseasonal_Concept.png)
    
In Python, you can utilize the list `append()` method to store the original data and the result after performing addition with `cache`.
![detrending_concept_forecast](../../picture/python_reverse_seasonality.png)
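
A plain-Python sketch of this procedure with seasonal period 2 is shown below. The list values are made up, and the variable names are illustrative rather than taken from the notebook.

```python
original = [10, 20, 12, 22, 14, 24]
seasonal_period = 2

# Seasonal differences, as produced by data.diff(2) without the leading NaNs
seasonal_diff = [original[i] - original[i - seasonal_period]
                 for i in range(seasonal_period, len(original))]

# Start from the first `seasonal_period` original values
restored = list(original[:seasonal_period])

# Each new value = value one season earlier + the stored difference
for diff in seasonal_diff:
    restored.append(restored[len(restored) - seasonal_period] + diff)
```

Unlike the cumulative sum used for 1st order differencing, each step here adds to the value from one full season earlier, which is why the first `seasonal_period` original values are needed as a starting point.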

In [None]:
# Create the `cache`
inverse_seasonal = np.vstack([np.zeros((differencing_month,1)),trend.reshape(-1,1)])

In [None]:
# Store the first 12 original data to list
seasonal_data = train_data_log.values[:differencing_month].tolist()

# Iterate and perform summation until the end of the list
for i in range(differencing_month,len(trend)+differencing_month):
    seasonal_data.append(seasonal_data[i-differencing_month] + inverse_seasonal[i])

Perform a sanity check to make sure `train_data_log` is the same as `seasonal_data` except for the forecast part.

In [None]:
#  Plot the Reverse seasonal differencing for sanity check
plt.plot(milk_data.index,seasonal_data,'r',label='Forecast Data')
plt.plot(train_data_log,label='Logged Train Data')
plt.title("Reverse seasonal differencing")
plt.legend()
plt.ylabel("Pounds per cow")

The `train_data_log` is the same as the `seasonal_data` except for the forecast part, which means that the reverse transforms for 1st order differencing and seasonal differencing are correct. Note that the data is still on the log scale, so you need to transform it back to the original scale.

#### <a name="InverseLogTransform">2.3.8 - Inverse Log Transform
>**Instruction:**<br>
Use `np.exp` to exponentiate the logged values (`seasonal_data`) and transform the data back to the original scale. Save the result in a variable named `inverse_log`.

>**Expected Result:**<br>
array([[589.],<br>
       [561.],<br>
       [640.],<br>
       [656.],<br>
       [727.]])

In [None]:
# Inverse Log Transform
# YOUR CODE HERE

inverse_log[:5]

In [None]:
# Plot the Inverse log data
plt.plot(milk_data.index,inverse_log,label ='Inverse Logged Data')
plt.plot(milk_data.index[-len(test_data):],inverse_log[-len(test_data):],'r',label='Forecast Data')
plt.title("Inverse Logged Data")
plt.ylabel("Pounds per cow")
plt.legend()

In [None]:
# Plot the Forecast Result
arima_prediction = inverse_log[-len(test_data):]
forecast_plot(forecast_data = arima_prediction,forecast_label = 'ARIMA Forecast')  

In [None]:
# Save the result into Dataframe
arima_forecast_result = math.sqrt(mean_squared_error(arima_prediction,test_data))
result['ARIMA Forecast'] =  arima_forecast_result
result

### <a name="SARIMAForecast">2.4 - SARIMA Forecast
>**Instruction:**<br>
Use `pm.arima.auto_arima()` to perform the prediction by using SARIMA model with seasonal period, `m=12`

>**Expected Result:**<br>
Fit ARIMA: order=(2, 0, 2) seasonal_order=(1, 1, 1, 12); AIC=734.227, BIC=755.458, Fit time=0.754 seconds<br>
Fit ARIMA: order=(0, 0, 0) seasonal_order=(0, 1, 0, 12); AIC=848.245, BIC=853.553, Fit time=0.009 seconds<br>
Fit ARIMA: order=(1, 0, 0) seasonal_order=(1, 1, 0, 12); AIC=741.855, BIC=752.471, Fit time=0.154 seconds<br>
.<br>
.<br>
.<br>
Fit ARIMA: order=(1, 0, 0) seasonal_order=(0, 1, 1, 12); AIC=733.075, BIC=743.691, Fit time=0.120 seconds<br>

In [None]:
import six
import sys
sys.modules['sklearn.externals.six'] = six
import joblib
sys.modules['sklearn.externals.joblib'] = joblib
import pmdarima as pm

# YOUR CODE HERE


In [None]:
# SARIMA Summary
auto_arima.summary()

In [None]:
# SARIMA model prediction
auto_arima_forecast = auto_arima.predict(len(test_data))
auto_arima_forecast_series = pd.Series(auto_arima_forecast, index=test_data.index)

In [None]:
# Plot the forecast data
forecast_plot(forecast_data = auto_arima_forecast_series, forecast_label = 'SARIMA Forecast') 

In [None]:
# Save the result into Dataframe
sarima_result = math.sqrt(mean_squared_error(test_data,auto_arima_forecast_series))
result['SARIMA Forecast'] =  sarima_result
result

## <a name="TimeSeriesModelingwithDeepLearningMethod(MLP)">3. Time Series Modeling with Deep Learning Method (MLP)

### <a name="Hyperparamter">3.1 - Hyperparameter

In [None]:
# Hyperparameter
window_size = 3
n_epoch = 500
batch_size = 5 

### <a name="DataScaling">3.2 - Data Scaling


In [None]:
# Data Scaling
scaler = StandardScaler().fit(train_data)
train_data_scale = scaler.transform(train_data)
test_data_scale = scaler.transform(test_data)
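
What the scaler does can be reproduced with a few NumPy lines. The arrays are made up, and the sketch assumes the usual standardization formula with the population standard deviation (`ddof=0`), which is what `StandardScaler` is documented to use.

```python
import numpy as np

train = np.array([[2.0], [4.0], [6.0]])
test = np.array([[8.0], [10.0]])

# Statistics come from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)              # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std    # reuse the training statistics

# Inverse scaling brings values back to the original units
test_restored = test_scaled * std + mean
```

Fitting on the training data alone keeps information about the test period out of the preprocessing step.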

### <a name="WindowSliding">3.3 - Window Sliding 
#### Option 1 - Use the previous sliding window function

In [None]:
# Window Sliding Function
def sliding_window(univariate_data,window_size):
    x,y = list(),list()
    for i in range(len(univariate_data)):
        end_ix = i + window_size
        if end_ix > len(univariate_data)-1:
            break
        seq_x, seq_y = univariate_data[i:end_ix], univariate_data[end_ix]
        x.append(seq_x)
        y.append(seq_y)
    return np.array(x),np.array(y)

train_feature , train_label = sliding_window(train_data_scale,window_size)
test_feature , test_label = sliding_window(test_data_scale,window_size)
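
To see what the windows look like, here is a compact sketch on a toy array. `make_windows` is a hypothetical helper written for this illustration. It is intended to produce the same pairs as the function above, in a shorter form.

```python
import numpy as np

def make_windows(values, window_size):
    """Hypothetical helper: pair each run of `window_size` values with the next value."""
    n_samples = len(values) - window_size
    x = [values[i : i + window_size] for i in range(n_samples)]
    y = [values[i + window_size] for i in range(n_samples)]
    return np.array(x), np.array(y)

toy = np.arange(6)                   # 0, 1, 2, 3, 4, 5
features, labels = make_windows(toy, window_size=3)
```

Each row of `features` holds three consecutive values and the matching entry of `labels` is the value that follows them, so a series of length 6 yields 3 samples.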

#### Option 2 - Use helper function from data_module

In [None]:
# Helper Function from previous exercise
import data_module
train_feature , train_label = data_module.univariate_single_step(train_data_scale,window_size)
test_feature , test_label = data_module.univariate_single_step(test_data_scale,window_size)

### <a name="DataIterator">3.4 - Data Iterator

####  Option 1 - Using PyTorch Custom Dataset Method
>**Instruction:**<br>
Create Data Iterator using Pytorch Custom Dataset Method

In [None]:
class Custom_Dataset(Dataset):
    # YOUR CODE HERE
    

In [None]:
train_dataset = Custom_Dataset(train_feature,train_label)
test_dataset = Custom_Dataset(test_feature,test_label)
train_iterator = DataLoader(train_dataset,batch_size,shuffle = False)
test_iterator = DataLoader(test_dataset,batch_size,shuffle = False)

#### Option 2 - TensorDataset
>**Instruction:**<br>
Create Data Iterator using `TensorDataset`

In [None]:
trainX = torch.from_numpy(train_feature).type(torch.Tensor)
trainY = torch.from_numpy(train_label).type(torch.Tensor)
testX = torch.from_numpy(test_feature).type(torch.Tensor)
testY = torch.from_numpy(test_label).type(torch.Tensor)

# YOUR CODE HERE


train_iterator = DataLoader(train_dataset,batch_size,shuffle = False)
test_iterator = DataLoader(test_dataset,batch_size,shuffle = False)

### <a name="Multilayerperceptron(MLP)configuration">3.5 - Multilayer perceptron (MLP) configuration
>**Instruction:**<br>
Create the MLP configuration based on the **Expected Result:**<br>

>**Expected Result:**<br>
![image](https://user-images.githubusercontent.com/59526258/118422293-83d5c600-b6f5-11eb-842e-300b63950228.png)


In [None]:
class MLP(nn.Module):
    def __init__(self,input_size,output_size):
        super(MLP,self).__init__()
        self.input_size = input_size
        self.output_size = output_size
        self.input_layer = nn.Linear(input_size,10)
        self.hidden_layer = nn.Linear(10,5)
        self.output_layer = nn.Linear(5,output_size)
        
    def forward(self,x):
        x = x.view(-1,self.input_size)
        out = F.relu(self.input_layer(x))
        out = F.relu(self.hidden_layer(out))
        out = self.output_layer(out)
        return out

### <a name="InputModel">3.6 - Input Model

In [None]:
torch.manual_seed(123)
input_size = window_size
output_size = 1
model = MLP(input_size, output_size)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(),lr= 0.01)

In [None]:
# Xavier weight initialization
torch.manual_seed(123)
def weights_init(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight.data)

        
model.apply(weights_init)

### <a name="ModelSummary">3.7 - Model Summary

In [None]:
%%capture
# Install this package to view the model summary
# pip is used because the package has no conda version
# %%capture suppresses the torchsummaryX installation output
!pip install torchsummaryX
from torchsummaryX import summary

In [None]:
inputs = torch.zeros((batch_size,window_size),dtype=torch.float) # batch size,seq_dimension
print(summary(model,inputs))

### <a name="Training">3.8 - Training

In [None]:
def training(num_epochs,train_iter,test_iter,optimizer,loss_fn,model):
    # Create a list of zero value to store the averaged value
    train_loss = np.zeros(num_epochs)
    val_loss = np.zeros(num_epochs)
    
    # YOUR CODE HERE
    
    return train_loss,val_loss

In [None]:
train_loss,val_loss = training(n_epoch,train_iterator,test_iterator,optimizer,loss_fn,model)

In [None]:
for i in range(n_epoch):
    print(f"Epoch: {i}, train loss: {train_loss[i]} ,test loss: {val_loss[i]}")

### <a name="Validation">3.9 - Validation

In [None]:
with torch.no_grad():
    train_prediction = model(trainX)
    test_prediction = model(testX)

In [None]:
# Inverse Scaling
train_label_rescale = scaler.inverse_transform(train_label)
test_label_rescale = scaler.inverse_transform(test_label)
train_prediction_rescale = scaler.inverse_transform(train_prediction)
test_prediction_rescale = scaler.inverse_transform(test_prediction)

In [None]:
print("Test Data\t\t\tForecast Data")
for i in range(len(test_label_rescale )):
    print(f"{test_label_rescale[i]}\t\t{test_prediction_rescale[i]}")

In [None]:
# Plot the Forecast Data
forecast_plot(forecast_data = test_prediction_rescale, 
              forecast_label = 'MLP Forecast',
              test_time=test_time[window_size:],
              test_data=test_label_rescale) 

In [None]:
mlp_train_result = math.sqrt(mean_squared_error(train_label_rescale,train_prediction_rescale))
mlp_forecast_result = math.sqrt(mean_squared_error(test_label_rescale,test_prediction_rescale))
print('Train Score: %.2f RMSE' % (mlp_train_result))
print('Test Score: %.2f RMSE' % (mlp_forecast_result))

In [None]:
result['MLP Forecast'] =  mlp_forecast_result
result

In conclusion, the MLP model gives the lowest RMSE, which makes it an appropriate forecast model for future unseen data. The second choice is the SARIMA model, which has a slightly higher RMSE than the MLP model but a faster runtime.

## <a name="Summary">Summary
From this tutorial, you should have learned how to:

1. Understand the time series forecasting flow.
2. Compare and select the optimal forecasting model based on model performance.

This tutorial only covers MLP in the Deep Learning section. You may try LSTM and CNN deep learning models on your own and compare their performance.<br>
    
Congratulations, that concludes this lesson.

## <a name="Reference">Reference
1. [Deep Learning for Time Series Forecasting (Predict the Future with MLPs,CNNs and LSTMs in Python) , Jason Brownlee](https://machinelearningmastery.com/deep-learning-for-time-series-forecasting/)
2. [Time-series Forecasting Flow](https://towardsdatascience.com/time-series-forecasting-flow-2e49740664de)