# N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting

We will replicate the N-BEATS model for time series forecasting, based on the paper: https://arxiv.org/pdf/1905.10437
Since this is my first attempt, I will follow and imitate the code from Daniel Bourke's lecture on time series (TensorFlow for Deep Learning Bootcamp on Udemy).

References:
1. Code from Daniel Bourke: https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/10_time_series_forecasting_in_tensorflow.ipynb
2. Data from Kaggle: https://www.kaggle.com/datasets/vijayvvenkitesh/microsoft-stock-time-series-analysis/data

In [1]:
# Import the required libraries
import tensorflow as tf
import pandas as pd
import numpy as np

## Building the N-BEATS Block Layer

In [2]:
# Create the NBeatsBlock custom layer
class NBeatsBlock(tf.keras.layers.Layer):
    def __init__(self,
                 input_size: int,
                 theta_size: int,
                 horizon: int,
                 n_neurons: int,
                 n_layers: int,
                 **kwargs):
        super().__init__(**kwargs)
        self.input_size = input_size
        self.theta_size = theta_size
        self.horizon = horizon
        self.n_neurons = n_neurons
        self.n_layers = n_layers

        # Block contains a stack of n_layers fully connected layers (4 in the paper), each with ReLU activation
        self.hidden = [tf.keras.layers.Dense(n_neurons, activation="relu") for _ in range(n_layers)]
        # Output of the block is a theta layer with linear activation,
        # so backcast and forecast values are not clipped at zero
        self.theta_layer = tf.keras.layers.Dense(theta_size, activation="linear")

    def call(self, inputs):
        # Inputs to be passed through the FC stack
        x = inputs
        # FC Stack
        for layer in self.hidden:
            x = layer(x)
        # Theta layer for the Backcast and Forecast
        theta = self.theta_layer(x)
        # First input_size values form the backcast, last horizon values form the forecast
        backcast, forecast = theta[:, :self.input_size], theta[:, -self.horizon:]
        return backcast, forecast
        

To use the block, create an object of the class above, passing the input size, theta size, horizon, number of neurons, and number of layers. Calling the object like a function invokes `call`, which returns the backcast and the forecast, in that order.
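The slicing inside `call` can be illustrated without TensorFlow. The following is a minimal sketch in plain Python, not the layer itself: `split_theta` is a hypothetical helper, the sizes are made up for illustration, and `theta_size = input_size + horizon` is an assumption (with that choice the two slices do not overlap and together cover every theta value).

```python
# Hypothetical sizes chosen for illustration only (not taken from the notebook).
input_size = 7   # number of past time steps fed to the block
horizon = 1      # number of future time steps to predict
theta_size = input_size + horizon  # assumption: one theta value per output step


def split_theta(theta_rows, input_size, horizon):
    """Split each row of theta into (backcast, forecast), mirroring NBeatsBlock.call."""
    backcast = [row[:input_size] for row in theta_rows]  # first input_size values
    forecast = [row[-horizon:] for row in theta_rows]    # last horizon values
    return backcast, forecast


# A fake batch with one sample: the values 0..7 stand in for the theta layer output.
theta = [list(range(theta_size))]
backcast, forecast = split_theta(theta, input_size, horizon)
print(backcast)  # [[0, 1, 2, 3, 4, 5, 6]]
print(forecast)  # [[7]]
```

In the real layer the same slices are taken on a tensor of shape `(batch, theta_size)`, so the backcast has shape `(batch, input_size)` and the forecast has shape `(batch, horizon)`.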

## Prepare the Data

The data I am going to use is the Microsoft stock price, taken from a Kaggle dataset:
https://www.kaggle.com/datasets/vijayvvenkitesh/microsoft-stock-time-series-analysis/data

Disclaimer: this is not meant as stock advice :D

In [5]:
original_data = pd.read_csv("Microsoft_Stock.csv",
                           parse_dates = ["Date"],
                           index_col = ["Date"])
original_data.head()

Date,Open,High,Low,Close,Volume
2015-04-01 16:00:00,40.6,40.76,40.31,40.72,36865322
2015-04-02 16:00:00,40.66,40.74,40.12,40.29,37487476
2015-04-06 16:00:00,40.34,41.78,40.18,41.55,39223692
2015-04-07 16:00:00,41.61,41.91,41.31,41.53,28809375
2015-04-08 16:00:00,41.48,41.69,41.04,41.42,24753438


In [6]:
original_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1511 entries, 2015-04-01 16:00:00 to 2021-03-31 16:00:00
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Open    1511 non-null   float64
 1   High    1511 non-null   float64
 2   Low     1511 non-null   float64
 3   Close   1511 non-null   float64
 4   Volume  1511 non-null   int64  
dtypes: float64(4), int64(1)
memory usage: 70.8 KB


In [7]:
original_data.describe()

Statistic,Open,High,Low,Close,Volume
count,1511.0,1511.0,1511.0,1511.0,1511.0
mean,107.385976,108.437472,106.294533,107.422091,30198630.0
std,56.691333,57.382276,55.977155,56.702299,14252660.0
min,40.34,40.74,39.72,40.29,101612.0
25%,57.86,58.06,57.42,57.855,21362130.0
50%,93.99,95.1,92.92,93.86,26629620.0
75%,139.44,140.325,137.825,138.965,34319620.0
max,245.03,246.13,242.92,244.99,135227100.0


We will consider only the closing price. Hence, let's create a DataFrame that has the date as its index and `Close` as its only column.

In [9]:
all_data = pd.DataFrame(original_data["Close"])
all_data

Date,Close
2015-04-01 16:00:00,40.72
2015-04-02 16:00:00,40.29
2015-04-06 16:00:00,41.55
2015-04-07 16:00:00,41.53
2015-04-08 16:00:00,41.42
...,...
2021-03-25 16:00:00,232.34
2021-03-26 16:00:00,236.48
2021-03-29 16:00:00,235.24
2021-03-30 16:00:00,231.85


There seem to be missing dates in the data: for example, the index jumps from 2015-04-02 to 2015-04-06.
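One way to confirm the gaps is to walk the calendar between the first and last date and record every day that has no row. The sketch below uses only the standard library and the five dates visible in the `head()` output above; `find_missing_days` is a hypothetical helper, not part of the original notebook. It separates weekends from weekdays, since weekend gaps are expected for stock data.

```python
from datetime import date, timedelta


def find_missing_days(dates):
    """Return (missing_weekdays, missing_weekends) between the first and last date."""
    present = set(dates)
    start, end = min(present), max(present)
    missing_weekdays, missing_weekends = [], []
    day = start
    while day <= end:
        if day not in present:
            # date.weekday(): Monday is 0 ... Sunday is 6
            if day.weekday() < 5:
                missing_weekdays.append(day)
            else:
                missing_weekends.append(day)
        day += timedelta(days=1)
    return missing_weekdays, missing_weekends


# The first five dates shown in the head() output above.
sample_dates = [date(2015, 4, 1), date(2015, 4, 2), date(2015, 4, 6),
                date(2015, 4, 7), date(2015, 4, 8)]
weekdays, weekends = find_missing_days(sample_dates)
print(weekdays)  # [datetime.date(2015, 4, 3)]
print(weekends)  # [datetime.date(2015, 4, 4), datetime.date(2015, 4, 5)]
```

For the full dataset, passing `all_data.index.date` to the same function should work, assuming the index is a `DatetimeIndex` as shown by `info()`. Missing weekdays are most likely days on which the market was closed.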