This is the 4th notebook of the neural networks introduction series. In this notebook, we will implement a gated recurrent unit (GRU) in Pytorch and use it to train a model for stock price prediction. This tutorial covers:

- What is a GRU?
- Why was GRU introduced?
- When to choose GRU over LSTM?
- Applications of GRU
- How does a GRU work?
- Implementation of a time series forecaster using GRU

<center><img src="https://namra.ir/static/kaggle/gru-all.png" height=300></center>

# What is a GRU?

<div style="text-align:justify">Gated Recurrent Unit (GRU) is a type of neural network architecture that was introduced after the Long Short-Term Memory (LSTM) network. Both GRU and LSTM are variations of recurrent neural networks (RNNs) designed to address the vanishing gradient problem, which can occur when training traditional RNNs on long sequences of data.
</div>

# Why was GRU introduced?

<div style="text-align:justify">The GRU was proposed by Kyunghyun Cho, et al. in a paper titled "Learning to Forget: Continual Prediction with LSTM" in 2014. The authors introduced the GRU as a simplified version of the LSTM with comparable performance. GRUs have fewer parameters than LSTMs and lack certain components like the memory cell, making them computationally more efficient in some cases.<br><br>
LSTMs were introduced earlier by Sepp Hochreiter and Jürgen Schmidhuber in 1997, with the paper titled "Long Short-Term Memory." LSTMs and GRUs are both popular choices for modeling sequential data in deep learning, and the choice between them often depends on the specific task and dataset.
</div>



# When to choose GRU over LSTM?

<div style="text-align:justify">
Choosing between Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) networks often depends on the specific characteristics of the task at hand, as well as considerations related to computational resources and model complexity. Here are some guidelines on when you might choose GRU over LSTM:
<br>
<ul>

<li>Computational Efficiency:</li>

GRUs typically have fewer parameters compared to LSTMs, making them computationally more efficient. If you are working with limited computational resources, GRUs might be a more suitable choice.

<li>Simplicity:</li>

GRUs have a simpler architecture than LSTMs because they lack an explicit memory cell. If your task doesn't require complex memory management and you want a simpler model, GRUs might be preferable.

<li>Data Size:</li>

If you have a relatively small dataset, GRUs might be a better choice. LSTMs tend to perform better than GRUs on larger datasets where the ability to capture long-term dependencies becomes more crucial.

<li>Overfitting:</li>

LSTMs, with their more complex structure, might be more prone to overfitting, especially when dealing with smaller datasets. GRUs, being simpler, might be less prone to overfitting in such scenarios.

<li>Real-time Applications:</li>

GRUs are often considered more suitable for real-time applications due to their lower computational requirements. If your application has strict real-time constraints, a GRU might be a better fit.

<li>Task-specific Performance:</li>

In practice, the choice between GRU and LSTM may also depend on empirical performance on the specific task you're working on. It's a good idea to experiment with both architectures and evaluate their performance on your dataset.

<li>Interpretability:</li>

If interpretability is a significant concern, GRUs might be easier to interpret since they have a simpler structure. LSTMs, with their memory cell and gates, may introduce additional complexity in understanding how information is processed over time.
</ul>

Ultimately, the choice between GRU and LSTM should involve empirical experimentation and validation on your specific task and dataset. It's common to try both architectures and select the one that performs better for your particular use case.


</div>

# Applications of GRU

Gated Recurrent Units (GRUs) are widely used in various applications due to their ability to capture sequential dependencies in data. Here are some applications of GRU neural networks, along with references to relevant papers:

1. **Natural Language Processing (NLP):**
   - GRUs are frequently used for tasks in NLP, such as language modeling, machine translation, and sentiment analysis.
   - **Reference:** Cho, K., Van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014). On the properties of neural machine translation: Encoder-decoder approaches. *arXiv preprint arXiv:1409.1259*.
   

2. **Speech Recognition:**
   - GRUs are employed in speech recognition systems to model sequential patterns in audio signals.
   - **Reference:** Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. *arXiv preprint arXiv:1409.0473*.

3. **Time Series Prediction:**
   - GRUs are used for time series prediction tasks, where the goal is to forecast future values based on historical data.
   - **Reference:** Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. *arXiv preprint arXiv:1412.3555*.

4. **Healthcare:**
   - GRUs find applications in healthcare for tasks such as patient monitoring, disease prediction, and analysis of medical records.
   - **Reference:** Choi, E., Bahadori, M. T., Schuetz, A., Stewart, W. F., & Sun, J. (2016). Doctor AI: Predicting clinical events via recurrent neural networks. *arXiv preprint arXiv:1511.05942*.

5. **Gesture Recognition:**
   - GRUs are utilized in gesture recognition systems to model and understand sequential patterns in gesture data.
   - **Reference:** Fragkiadaki, K., Levine, S., Felsen, P., & Malik, J. (2015). Recurrent network models for human dynamics. In *Proceedings of the IEEE International Conference on Computer Vision (ICCV)*.

6. **Video Analysis:**
   - GRUs are applied in video analysis tasks, including action recognition and video captioning.
   - **Reference:** Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., & Darrell, T. (2015). Long-term recurrent convolutional networks for visual recognition and description. In *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)*.

7. **Financial Time Series Analysis:**
   - GRUs are employed in predicting financial market trends and analyzing time series data in finance.
   - **Reference:** Lim, Y. J., Na, J. C., Lee, W. S., Kim, Y. G., & Kim, K. H. (2019). Stock price prediction using LSTM, RNN and GRU neural network. *Sustainability, 11*(18), 4933.

These references provide insights into the applications and effectiveness of GRUs in various domains. Keep in mind that the field of deep learning is rapidly evolving, and new papers may emerge with advancements in GRU-based models.

# How does a GRU work?

<div style="text-align:justify">A Gated Recurrent Unit takes as input the previous hidden state and the current input, and outputs the current hidden state. The hidden state is also known as the memory of the network. The GRU has two gates: a reset gate and an update gate. The reset gate determines how to combine the new input with the previous memory, and the update gate defines how much of the previous memory to keep around. We will describe the process of calculating the output in the next cells.</div>




## Reset Gate

The reset gate $r_t$ determines which parts of the previous hidden state $h_{t-1}$ should be forgotten or reset. It takes a concatenated input of the previous hidden state $h_{t-1}$ and the current input $x_t$, and outputs a number between 0 and 1 for each number in the hidden state $h_{t-1}$. A value of 0 means the corresponding number in the hidden state should be reset to 0, while a value of 1 means the corresponding number in the hidden state should be left unchanged.

<center><img src="https://namra.ir/static/kaggle/reset-gate.png" height=300></center>

$$r_t = \sigma(W_r[h_{t-1};x_t]+b_r)$$

## Update Gate

The update gate $z_t$ decides how much of the new candidate hidden state should be blended with the previous hidden state. The input to the update gate is the same as the reset gate, a concatenated input of the previous hidden state $h_{t-1}$ and the current input $x_t$. The output of the update gate is a number between 0 and 1 for each number in the hidden state $h_{t-1}$. A value of 0 means the corresponding number in the hidden state should be completely forgotten, while a value of 1 means the corresponding number in the hidden state should be completely remembered.

<center><img src="https://namra.ir/static/kaggle/update-gate.png" height=300></center>

$$z_t = \sigma(W_z[h_{t-1};x_t]+b_z)$$

## The effect of the reset gate

The output of the reset gate $r_t$ is multiplied element-wise with the previous hidden state $h_{t-1}$.

<center><img src="https://namra.ir/static/kaggle/reset-gate-continue.png" height=300></center>

$$r_t \odot h_{t-1}$$

## Candidate Hidden State

The candidate hidden state $\tilde{h}_t$ represents the new information that could be added to the hidden state. It is a normalized version of the concatenatation of the previous hidden state $h_{t-1}$ and the current input $x_t$. This normalization is done using the $\tanh$ function, which squashes the values to be between -1 and 1.

<center><img src="https://namra.ir/static/kaggle/gru-candidate-hidden-state.png" height=300></center>

$$\tilde{h}_t = \mathrm{TH}(W_h[r_t \odot h_{t-1};x_t] + b_h)$$

## Hidden State

The hidden state at a timestep like $t$ (denoted by $h_t$) is actually the output of the network at that time. To calculate the new hidden state, we should combine the previous hidden state $h_{t-1}$ and the new candidate state using the update gate.

<center><img src="https://namra.ir/static/kaggle/gru-hidden-state.png" height=300></center>

$$h_t = (1-z_t)\odot h_{t-1}+z_t \odot \tilde{h}_t$$

This equation blends the previous hidden state with the new candidate hidden state based on the update gate. If $z_t$ is close to 1, the new information is retained; if $z_t$ is close to 0, more of the previous state is retained.

## All in one intuition

<center><img src="https://namra.ir/static/kaggle/gru-all.png" height=300></center>

- **Reset Gate ($r_t$):** Decides which parts of the previous hidden state to forget. If \(r_t\) is close to 1, it means that the model should consider more of the previous hidden state.

$$r_t = \sigma(W_r[h_{t-1};x_t]+b_r)$$

- **Update Gate ($z_t$):** Determines how much of the new candidate hidden state should be included in the final hidden state. If \(z_t\) is close to 1, the model gives more weight to the new candidate hidden state.

$$z_t = \sigma(W_z[h_{t-1};x_t]+b_z)$$

- **Candidate Hidden State ($\tilde{h}_t$):** Represents the new information that could be added to the hidden state.

$$\tilde{h}_t = \mathrm{TH}(W_h[r_t \odot h_{t-1};x_t] + b_h)$$

- **Final Hidden State ($h_t$):** The updated hidden state that considers both the previous state and the new information based on the reset and update gates.

$$h_t = (1-z_t)\odot h_{t-1}+z_t \odot \tilde{h}_t$$

These mechanisms allow GRUs to selectively remember or forget information, enabling them to capture long-term dependencies in sequential data.

# Configuration

In [1]:
# !pip install gensim
# !pip install spacy
# !python3.11 -m spacy download en_core_web_sm

In [2]:
import pandas as pd
from collections import Counter
import spacy
import numpy as np
import re
import string
import gensim
import torch
from torch import nn, optim, Tensor
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
import pickle as pkl



In [3]:
config = {
    'data_path': 'data.csv',
    'batch_size': 64,
    'device': 'cuda', # mps for mac m1, cuda for gpu-enabled devices
    'learning_rate': 0.01,
    'num_epochs': 100,
    'train_size': 0.8,
    'random_seed': 50
}

In [4]:
np.random.seed(config['random_seed'])
torch.random.manual_seed(config['random_seed'])

<torch._C.Generator at 0x7853d9326e50>

# Loading the Dataset

For this tutorial, we will use the Amazon's stock market data. Basically, we will use the previous days stock market data to predict the next day's closing price.

In [5]:
!wget -O 'data.csv' 'https://www.dropbox.com/scl/fi/5zgutd3y6sm5jwuak60rp/data.csv?rlkey=2mivltwxvmx3rtjfzhltp0e09&dl=1'

--2023-12-04 20:03:56--  https://www.dropbox.com/scl/fi/5zgutd3y6sm5jwuak60rp/data.csv?rlkey=2mivltwxvmx3rtjfzhltp0e09&dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.83.18, 2620:100:6033:18::a27d:5312
Connecting to www.dropbox.com (www.dropbox.com)|162.125.83.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://uc0e53467e8896953b63b2815fe2.dl.dropboxusercontent.com/cd/0/inline/CIzY-cWyevX2Gy_OIIVVyafcddjMx0WUi0zFkmBDKWTlJg6sRQGoEGE8xzZXsY2MnCtaVtapxQiVGzS3MNKYDwe37CSGln5xU5vcRbDwIEqfL5xnyNWge6noFO4Rv3w7Th3bZm5b58ZCYKO7Pd2CQCT4/file?dl=1# [following]
--2023-12-04 20:03:57--  https://uc0e53467e8896953b63b2815fe2.dl.dropboxusercontent.com/cd/0/inline/CIzY-cWyevX2Gy_OIIVVyafcddjMx0WUi0zFkmBDKWTlJg6sRQGoEGE8xzZXsY2MnCtaVtapxQiVGzS3MNKYDwe37CSGln5xU5vcRbDwIEqfL5xnyNWge6noFO4Rv3w7Th3bZm5b58ZCYKO7Pd2CQCT4/file?dl=1
Resolving uc0e53467e8896953b63b2815fe2.dl.dropboxusercontent.com (uc0e53467e8896953b63b2815fe2.dl.dropboxusercontent.com)... 

In [6]:
df = pd.read_csv(config['data_path'])
df

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,1997-05-15,0.121875,0.125000,0.096354,0.097917,0.097917,1443120000
1,1997-05-16,0.098438,0.098958,0.085417,0.086458,0.086458,294000000
2,1997-05-19,0.088021,0.088542,0.081250,0.085417,0.085417,122136000
3,1997-05-20,0.086458,0.087500,0.081771,0.081771,0.081771,109344000
4,1997-05-21,0.081771,0.082292,0.068750,0.071354,0.071354,377064000
...,...,...,...,...,...,...,...
6511,2023-03-30,101.550003,103.040001,101.010002,102.000000,102.000000,53633400
6512,2023-03-31,102.160004,103.489998,101.949997,103.290001,103.290001,56704300
6513,2023-04-03,102.300003,103.290001,101.430000,102.410004,102.410004,41135700
6514,2023-04-04,102.750000,104.199997,102.110001,103.949997,103.949997,48662500


# Data Preprocessing and Preparation

As you can see, the dataset contains the closing price of Amazon's stock market from 1997 to 2023. We will use the past week (7 days) stock market data to predict the next day's closing price. In order to this, we have to transform the dataset in a way that each row of it contains the past 7 days stock market data and the next day's closing price.

First of all, we drop the unnecessary columns (`Open`, `High`, `Low`, `Adj Close`, `Volume`) and then we create a new column called `Target` which contains the next day's closing price.

In [7]:
df.drop(columns=['Open', 'High', 'Low', 'Adj Close', 'Volume'], inplace=True)

In [8]:
df

Unnamed: 0,Date,Close
0,1997-05-15,0.097917
1,1997-05-16,0.086458
2,1997-05-19,0.085417
3,1997-05-20,0.081771
4,1997-05-21,0.071354
...,...,...
6511,2023-03-30,102.000000
6512,2023-03-31,103.290001
6513,2023-04-03,102.410004
6514,2023-04-04,103.949997


Now, we iterate over the dataset to make our set of features (the closing values corresponding to the past 7 days) and the `target` (the closing value of the next day).

In [9]:
dates = df['Date'].values
close_prices = df['Close'].values

In [10]:
data_dict = {}
for idx, date in enumerate(dates[7:], start=7):
    data_dict[date] = {
            'target': close_prices[idx],
            't-1': close_prices[idx-1],
            't-2': close_prices[idx-2],
            't-3': close_prices[idx-3],
            't-4': close_prices[idx-4],
            't-5': close_prices[idx-5],
            't-6': close_prices[idx-6],
            't-7': close_prices[idx-7],
        }
df = pd.DataFrame.from_dict(data_dict, orient='index')

In [11]:
df = df[['t-7', 't-6', 't-5', 't-4', 't-3', 't-2', 't-1', 'target']]
df

Unnamed: 0,t-7,t-6,t-5,t-4,t-3,t-2,t-1,target
1997-05-27,0.097917,0.086458,0.085417,0.081771,0.071354,0.069792,0.075000,0.079167
1997-05-28,0.086458,0.085417,0.081771,0.071354,0.069792,0.075000,0.079167,0.076563
1997-05-29,0.085417,0.081771,0.071354,0.069792,0.075000,0.079167,0.076563,0.075260
1997-05-30,0.081771,0.071354,0.069792,0.075000,0.079167,0.076563,0.075260,0.075000
1997-06-02,0.071354,0.069792,0.075000,0.079167,0.076563,0.075260,0.075000,0.075521
...,...,...,...,...,...,...,...,...
2023-03-30,100.610001,98.699997,98.709999,98.129997,98.040001,97.239998,100.250000,102.000000
2023-03-31,98.699997,98.709999,98.129997,98.040001,97.239998,100.250000,102.000000,103.290001
2023-04-03,98.709999,98.129997,98.040001,97.239998,100.250000,102.000000,103.290001,102.410004
2023-04-04,98.129997,98.040001,97.239998,100.250000,102.000000,103.290001,102.410004,103.949997


As you see, as we get closer to the recent years, the stock prices are higher. So, we have to normalize the dataset to make the values in the same range. We use the `MinMaxScaler` to normalize the dataset.

In [12]:
scaler = MinMaxScaler(feature_range=(-1, 1))
df[['t-7', 't-6', 't-5', 't-4', 't-3', 't-2', 't-1', 'target']] = scaler.fit_transform(df[['t-7', 't-6', 't-5', 't-4', 't-3', 't-2', 't-1', 'target']])
df


  if not hasattr(array, "sparse") and array.dtypes.apply(is_sparse).any():
  if is_sparse(pd_dtype):
  if is_sparse(pd_dtype) or not is_extension_array_dtype(pd_dtype):
  if not hasattr(array, "sparse") and array.dtypes.apply(is_sparse).any():
  if is_sparse(pd_dtype):
  if is_sparse(pd_dtype) or not is_extension_array_dtype(pd_dtype):


Unnamed: 0,t-7,t-6,t-5,t-4,t-3,t-2,t-1,target
1997-05-27,-0.999698,-0.999821,-0.999832,-0.999872,-0.999983,-1.000000,-0.999955,-0.999911
1997-05-28,-0.999821,-0.999832,-0.999872,-0.999983,-1.000000,-0.999944,-0.999911,-0.999939
1997-05-29,-0.999832,-0.999872,-0.999983,-1.000000,-0.999944,-0.999899,-0.999939,-0.999953
1997-05-30,-0.999872,-0.999983,-1.000000,-0.999944,-0.999899,-0.999927,-0.999953,-0.999955
1997-06-02,-0.999983,-1.000000,-0.999944,-0.999899,-0.999927,-0.999941,-0.999955,-0.999950
...,...,...,...,...,...,...,...,...
2023-03-30,0.078175,0.057693,0.057800,0.051580,0.050615,0.042036,0.074309,0.093076
2023-03-31,0.057693,0.057800,0.051580,0.050615,0.042036,0.074315,0.093076,0.106910
2023-04-03,0.057800,0.051580,0.050615,0.042036,0.074315,0.093081,0.106910,0.097473
2023-04-04,0.051580,0.050615,0.042036,0.074315,0.093081,0.106915,0.097473,0.113988


Now, let us consider the closing values of the past 7 days as our features (`x`) and the closing value of the next day as our target (`y`).

In [13]:
x = df[['t-7', 't-6', 't-5', 't-4', 't-3', 't-2', 't-1']].values
y = df['target'].values

# Splitting the Dataset

We need to split the dataset into training and testing sets. We will use the first 80% of the dataset as the training set, another 10% as the testing set and the rest as the validation set.

In [14]:
x_train, x_rem, y_train, y_rem = train_test_split(x, y, train_size=config['train_size'], shuffle=True)
x_test, x_val, y_test, y_val = train_test_split(x_rem, y_rem, train_size=0.5, shuffle=True)

Now is the time to make a Pytorch `Dataset` object out of our data. Since we want the GRU network to take into account the fact that these values have a temporal order, we have to make a sliding window of size 7 over the dataset.

In [15]:
class StockDataset(Dataset):
    def __init__(self, values, targets):
        self.values = values.reshape(-1, 7, 1)
        self.labels = np.array(targets).reshape(-1, 1)
    def __len__(self):
        return self.values.shape[0]
    def __getitem__(self, idx):        
        return Tensor(self.values[idx]).to(config['device']), Tensor(self.labels[idx]).to(config['device'])

In [16]:
train_dataset = StockDataset(x_train, y_train)
test_dataset = StockDataset(x_test, y_test)
val_dataset = StockDataset(x_val, y_val)

To iterate over the data while training the model, we need to create a `DataLoader` object. We will use the `DataLoader` class to do this.

In [17]:
train_loader = DataLoader(train_dataset, batch_size=config['batch_size'], shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=config['batch_size'], shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=config['batch_size'], shuffle=True)

# Defining the Model Architecture

## Side Note: The DropOut Layer

A Dropout layer is a regularization technique where we randomly set some of the dimensions of the input vector to 0. This helps in preventing overfitting. The `Dropout` class in Pytorch implements this functionality.

In [18]:
class Model(nn.Module):
    
    def __init__(self):
        super(Model, self).__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=4, batch_first=True)
        self.dropout = nn.Dropout(0.2)
        self.dense = nn.Linear(4, 1)
    
    def forward(self, x):
        output, _ = self.gru(x) # output shape: (batch_size, seq_len, hidden_size)
        output = self.dropout(output)
        output = self.dense(output[:, -1, :]) # output shape: (batch_size, 1)
        return output

In [19]:
model = Model().to(config['device'])

In [20]:
model(torch.randn(32, 7, 1).to(config['device'])).shape

torch.Size([32, 1])

# Optimization Rules

In [21]:
optimizer = optim.Adam(model.parameters(), lr=config['learning_rate'])

In [22]:
criterion = nn.MSELoss()

In [23]:
def train_loop(dataloader, model, loss_fn, optimizer, epoch_num):
    num_points = len(dataloader.dataset)
    for batch, (features, labels) in enumerate(dataloader):        
        # Compute prediction and loss
        pred = model(features)
        loss = loss_fn(pred, labels)
        
        # Backpropagation
        optimizer.zero_grad() # sets gradients of all model parameters to zero
        loss.backward() # calculate the gradients again
        optimizer.step() # w = w - learning_rate * grad(loss)_with_respect_to_w

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(features)
            print(f"\r Epoch {epoch_num} - loss: {loss:>7f}  [{current:>5d}/{num_points:>5d}]", end=" ")


def test_loop(dataloader, model, loss_fn, epoch_num, name):
    num_points = len(dataloader.dataset)
    sum_test_loss = 0

    with torch.no_grad():
        for batch, (features, labels) in enumerate(dataloader):
            pred = model(features)
            sum_test_loss += loss_fn(pred, labels).item() # add the current loss to the sum of the losses
            
    sum_test_loss /= num_points
    print(f"\r Epoch {epoch_num} - {name} Avg loss: {sum_test_loss:>8f}", end=" ")

# Training the Model and Evaluating the Results

In [24]:
for epoch_num in range(1, config['num_epochs']+1):
    train_loop(train_loader, model, criterion, optimizer, epoch_num)
    test_loop(val_loader, model, criterion, epoch_num, 'Development/Validation')

 Epoch 100 - Development/Validation Avg loss: 0.000235 

In [25]:
test_loop(test_loader, model, criterion, epoch_num, 'Test')

 Epoch 100 - Test Avg loss: 0.000211 

Calculating the coefficient of determination ($R^2$) is a good way to evaluate the performance of a regression model. The coefficient of determination is the proportion of the variance in the dependent variable that is predictable from the independent variable(s). It is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses, on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model.

In [26]:
predictions = model(Tensor(x_test.reshape(-1, 7, 1)).to(config['device'])).cpu().detach().numpy().squeeze()

In [27]:
true_values = y_test[:]

In [28]:
r2_score(true_values, predictions)

0.9368562897619723

# Future Inference

Note that we have calculated the normalized values. In real life, what we need is the actual values. So, we have to inverse the normalization process to get the actual values. In order to do this, we need to somehow store the `scaler` object. We can use the `pickle` module to do this.

In [29]:
pkl.dump(scaler, open('/kaggle/working/scaler', 'wb')) # saving the scaler

Now that we have saved the `scaler` object, we are assured that our work was not in vain. We can load the `scaler` object and use it to inverse the normalization process.

In [30]:
scaler = pkl.load(open('/kaggle/working/scaler', 'rb'))

There's one more thing we need to do when we inverse the normalization process. We have to concatenate the features and the target so that the shape of the input to the scaler becomes `(1,8)`.

In [31]:
concatenated_values = np.concatenate((x_test[0].reshape(1, -1), predictions[0].reshape(1, -1)), axis=1)

In [32]:
scaler.inverse_transform(concatenated_values) # the last value is the predicted value

array([[3.728     , 3.71      , 3.806     , 3.823     , 4.1145    ,
        4.036     , 3.915     , 0.21925712]])

What if we get a new value? How do we predict the next day's closing price?

First, we make an array of shape `(1,8)` by concatenating the last 7 values of the dataset and a zero. Then we normalize the array using the `scaler` object. Now, we can use the model to predict the next day's closing price. The prediction will consist of 8 normalized values in the range of -1 to 1. However, we need the actual value. So, we inverse the normalization process and get the actual value.

In [33]:
def predict_next_day(previous_days):
    raw_values = np.concatenate((previous_days, [0]), axis=0).reshape(1, -1)
    scaled_values = scaler.transform(raw_values)
    scaled_input = scaled_values.squeeze()[:-1].reshape(-1, 7, 1)
    output = model(Tensor(scaled_input).to(config['device'])).cpu().detach().numpy()
    scaled_values = np.concatenate((previous_days.reshape(1,-1), output), axis=1)
    raw_values = scaler.inverse_transform(scaled_values)
    prediction = raw_values.squeeze()[-1]
    return prediction
    

In [34]:
predict_next_day(x_test[0])



4.122449053059763