In this notebook, we guide you through building a Transformer encoder model to predict stock prices. The model is based on the research papers https://thesai.org/Downloads/Volume12No12/Paper_106-Predicting_Stock_Closing_Prices_in_Emerging_Markets.pdf and https://arxiv.org/abs/2208.08300, with some modifications.


### 1. Import the required libraries.

In [None]:
# data processing and visualization
import numpy as np
import pandas as pd
import os, datetime
import matplotlib.pyplot as plt
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler


# for building model
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models

import warnings
warnings.filterwarnings('ignore')

### 2. Get stock data using the Yahoo Finance API.

Find the ticker symbol of any stock and load around 10-15 years of its data.

In [None]:
# Define the ticker symbol
tickerSymbol = '....'

# Get data on this ticker
tickerData = yf.Ticker(tickerSymbol)

# Get the historical prices for this ticker
# interval='1d' requests daily bars; start and end define the date range
df = tickerData.history(interval='1d', start='....', end='....')

In [None]:
df = df.reset_index()
df.head()

### 3. Select only the columns: Open, High, Low, Volume, Close.
First, reset the index of the DataFrame, because the date index causes problems while plotting.

The position of the "Close" column matters, as it is our target variable.

Have a look at the data.
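As an illustration of this step, here is a minimal sketch on a small made-up DataFrame. The values, the `Date` index, and the names `toy`, `selected`, and `close_position` are invented for the example; the real `df` comes from yfinance. Selecting columns with a list keeps them in the listed order, so "Close" ends up as the last column.

```python
import pandas as pd

# Made-up prices standing in for the yfinance DataFrame.
toy = pd.DataFrame(
    {
        "Open": [10.0, 10.5, 11.0],
        "High": [10.8, 11.2, 11.5],
        "Low": [9.9, 10.3, 10.7],
        "Close": [10.5, 11.0, 11.2],
        "Volume": [1000, 1200, 900],
        "Dividends": [0.0, 0.0, 0.0],
    },
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)
toy.index.name = "Date"

# Move the date index into a regular column so rows are numbered 0, 1, 2, ...
toy = toy.reset_index()

# Keep only the five features, with the target "Close" as the last column.
selected = toy[["Open", "High", "Low", "Volume", "Close"]]
close_position = selected.columns.get_loc("Close")
print(selected.shape, close_position)
```

Knowing the position of "Close" makes it easy to pick the target column out of a NumPy array later on.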

### 4. Run the following cell to plot the Close price and the Volume of the stock against time.

In [None]:
fig = plt.figure(figsize=(15,8))
st = fig.suptitle("Close Price and Volume", fontsize=20)
st.set_y(0.92)

ax1 = fig.add_subplot(211)
ax1.plot(df['Close'], label='Close Price')
ax1.set_ylabel('Close Price', fontsize=18)
ax1.legend(loc="upper left", fontsize=12)

ax2 = fig.add_subplot(212)
ax2.plot(df['Volume'], label='Volume')
ax2.set_ylabel('Volume', fontsize=18)
ax2.legend(loc="upper left", fontsize=12)

### 5. Use MinMaxScaler to scale the data to the range (0, 1).

Remember not to overwrite the DataFrame. Instead, store the scaled data in a new variable. You will need the original stock data later.

Have a look at the data. Also check its shape.
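For intuition, the transformation that `MinMaxScaler` applies with its default range (0, 1) is, per column, (x - min) / (max - min). The NumPy sketch below reproduces that formula on made-up values; `raw`, `col_min`, `col_max`, and `scaled` are illustrative names, not part of the notebook.

```python
import numpy as np

# Made-up data: 4 days, 2 features (a price-like and a volume-like column).
raw = np.array([
    [10.0, 1000.0],
    [12.0, 1500.0],
    [11.0, 2000.0],
    [14.0, 1200.0],
])

# Per-column minimum and maximum, as a min-max scaler would learn them.
col_min = raw.min(axis=0)
col_max = raw.max(axis=0)

# Scale into a NEW array so the original values stay available.
scaled = (raw - col_min) / (col_max - col_min)

print(scaled.shape)
print(scaled.min(axis=0), scaled.max(axis=0))
```

Each column is scaled independently, so the very different magnitudes of price and volume no longer matter to the model.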

### 6. Creating sequences from the data.

We want to build a model that takes the stock data of 10 consecutive days and predicts the data for the 11th day. For that, we have to build input sequences and their corresponding targets from the data.
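One common way to build such windows is sketched below on toy data. The array `toy_data`, the name `window`, and the loop bounds are assumptions made for this illustration, so treat it as a hint rather than the required solution.

```python
import numpy as np

# Toy data: 15 "days" with 5 features each (values are just counters).
toy_data = np.arange(15 * 5, dtype=float).reshape(15, 5)
window = 10  # number of days in each input sequence

toy_inputs, toy_targets = [], []
for i in range(len(toy_data) - window):
    toy_inputs.append(toy_data[i : i + window])  # days i .. i+9
    toy_targets.append(toy_data[i + window])     # the following day

toy_inputs = np.array(toy_inputs)
toy_targets = np.array(toy_targets)
print(toy_inputs.shape, toy_targets.shape)
```

With 15 days and a window of 10, this yields 5 samples: inputs of shape (5, 10, 5) and targets of shape (5, 5).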

In [None]:
sequence_length = 10
input_sequences = []
targets = []
for i in range("......"):
    input_sequences.append(data["........"])
    targets.append(data["....."])

input_sequences = np.array(input_sequences)
targets = np.array(targets)

### 7. Split the data

Split the data into training and testing samples. Do not shuffle: the order of the samples matters!
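A chronological split can be as simple as slicing at an index. The sketch below assumes an 80/20 ratio, which the notebook does not specify; `samples`, `train_fraction`, and `split_index` are illustrative names.

```python
import numpy as np

# 20 samples in time order (the value is the time position).
samples = np.arange(20).reshape(20, 1)
train_fraction = 0.8  # assumed ratio, not prescribed by the notebook

# Everything before the split index is training data, the rest is test data.
split_index = int(len(samples) * train_fraction)
train_part = samples[:split_index]
test_part = samples[split_index:]
print(len(train_part), len(test_part))
```

Because the slices are contiguous, every test sample is later in time than every training sample, which avoids leaking future information into training.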

### 8. Create a class Time2Vec for time2vec encoding of our input sequences.

The Time2Vec paper is available here: https://arxiv.org/pdf/1907.05321v1.pdf

The layer has two weight matrices and two bias matrices.
1. Z1 = Weights w1 * inputs + bias b1
2. Z2 = Weights w2 * Z1 + bias b2
3. output = Z2


This will return output sequences of shape (10, x), where x = kernel_size, an input parameter of the class. In the encoder block, we will concatenate this encoding with our inputs.
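For reference, the Time2Vec paper linked above defines the encoding of a scalar time value tau as one linear term, w_0 * tau + b_0, followed by periodic terms sin(w_i * tau + b_i). The NumPy sketch below evaluates that definition for 10 time steps with random, untrained parameters. The function name `time2vec_reference`, the use of the step index as tau, and the choice to count the linear term as one of the `kernel_size` output columns are assumptions made for illustration; this is not the Keras layer you are asked to write.

```python
import numpy as np

def time2vec_reference(tau, omega, phi):
    """Encode time values tau of shape (T,) into an array of shape (T, k).

    Column 0 is the linear term; columns 1..k-1 are sine terms.
    """
    z = tau[:, None] * omega[None, :] + phi[None, :]  # shape (T, k)
    out = np.sin(z)
    out[:, 0] = z[:, 0]  # the first component stays linear
    return out

rng = np.random.default_rng(0)
kernel_size = 4  # plays the role of x in the text

# Random stand-ins for parameters that would normally be learned.
omega = rng.uniform(-1.0, 1.0, size=kernel_size)
phi = rng.uniform(-1.0, 1.0, size=kernel_size)

tau = np.arange(10, dtype=float)  # 10 time steps: 0, 1, ..., 9
encoding = time2vec_reference(tau, omega, phi)
print(encoding.shape)
```

The linear column can capture trends, while the sine columns can capture repeating patterns at learned frequencies.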

In [None]:
class Time2Vec(keras.layers.Layer):
    def __init__(self, kernel_size):
        super().__init__(trainable=True, name='Time2Vec')
        self.k = kernel_size

    def build(self, input_shape):
        """
        .....

        """
        super().build(input_shape)

    def call(self, inputs, **kwargs):
        """...."""
        return

### 9. Create a Transformer encoder model.

1. Start by creating an encoder layer. It should take the following parameters at initialization:
> a) Number of heads for the Multi Head Attention layer.<br/>
b) Embedding dimension : 5+x .<br/>
c) Dimensions of the feed forward layer.<br/>
d) Dropout rate.
>
Its architecture :

  1. Multi Head Attention
  2. Dropout layer for regularization.
  3. Addition and Layer Normalization.
  4. A feed-forward network whose output size equals the embedding dimension.
  5. Dropout.
  6. Addition and Layer Normalization.
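To make the data flow of these six steps concrete, here is a NumPy forward pass of one encoder block. It is a simplified sketch under several assumptions: a single attention head instead of multiple heads, dropout omitted (it is the identity at inference time), layer normalization without learned scale and shift, ReLU in the feed-forward network, and random untrained weights. All names (`encoder_block`, `params`, and so on) are illustrative.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each time step over its feature dimension.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def encoder_block(x, p):
    """x: (time_steps, embed_dim); p: dict of weight arrays."""
    # Step 1: self-attention (single head, scaled dot product).
    q, k, v = x @ p["wq"], x @ p["wk"], x @ p["wv"]
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    attn = (weights @ v) @ p["wo"]
    # Steps 2-3: dropout skipped, then residual addition and layer norm.
    h = layer_norm(x + attn)
    # Step 4: feed-forward network mapping back to embed_dim.
    ff = np.maximum(0.0, h @ p["w1"] + p["b1"]) @ p["w2"] + p["b2"]
    # Steps 5-6: dropout skipped, then residual addition and layer norm.
    return layer_norm(h + ff)

rng = np.random.default_rng(0)
time_steps, embed_dim, ff_dim = 10, 8, 16
params = {
    "wq": rng.normal(size=(embed_dim, embed_dim)) * 0.1,
    "wk": rng.normal(size=(embed_dim, embed_dim)) * 0.1,
    "wv": rng.normal(size=(embed_dim, embed_dim)) * 0.1,
    "wo": rng.normal(size=(embed_dim, embed_dim)) * 0.1,
    "w1": rng.normal(size=(embed_dim, ff_dim)) * 0.1,
    "b1": np.zeros(ff_dim),
    "w2": rng.normal(size=(ff_dim, embed_dim)) * 0.1,
    "b2": np.zeros(embed_dim),
}

x = rng.normal(size=(time_steps, embed_dim))
y = encoder_block(x, params)
print(y.shape)
```

The output has the same shape as the input, which is what allows several encoder layers to be stacked one after another.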


In [None]:
class TransformerEncoder(layers.Layer):
    def __init__(self, num_heads, embed_dim, feed_forward_dim, rate):
        super().__init__()
        """...."""

    def call(self, inputs, training):
        """...."""
        return

2. Then create the encoder model. It should take the following parameters at initialization:
> a) Number of heads for the Multi Head Attention layer.<br/>
b) Time_steps : 10 . Stock data of 10 days as input. <br/>
c) Features : 5 . 5 features for each day price. <br/>
d) Kernel size for the time2vec encoding : x .<br/>
e) Dimensions of the feed forward layer inside the encoder layer.<br/>
f) Number of encoder layers to be stacked.<br/>
g) Dropout rate.
>
Its architecture :

  1. Optional Input Layer
  2. Time2Vec encoding
  3. Sequential Model for the stack of encoder layers.
  >
  This will return output vectors of shape (10,5+x) but we need predictions of shape (1,5). So...
  4. Global Average Pooling Layer to make outputs of shape (1,5+x).
  5. Dropout.
  6. A Feed Forward Layer.
  7. Dropout
  8. Final Feed Forward Layer.
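The shape changes described above can be traced with plain NumPy arrays. In this sketch the Time2Vec output and the encoder stack are replaced by stand-ins, the intermediate feed-forward layer and the dropout layers are left out, and a batch dimension is added in front; `x_dim` plays the role of x, and all names are illustrative.

```python
import numpy as np

batch, time_steps, features, x_dim = 2, 10, 5, 3
rng = np.random.default_rng(0)

inputs = rng.normal(size=(batch, time_steps, features))
# Stand-in for the Time2Vec encoding of each sequence.
time_encoding = rng.normal(size=(batch, time_steps, x_dim))

# Concatenate along the feature axis: 5 features + x encoding columns.
embedded = np.concatenate([inputs, time_encoding], axis=-1)

# Stand-in for the encoder stack, which preserves the shape.
encoded = embedded

# Global average pooling: average over the 10 time steps.
pooled = encoded.mean(axis=1)

# Final dense layer (random weights here) mapping 5+x values to 5 outputs.
w_out = rng.normal(size=(features + x_dim, features))
predictions = pooled @ w_out

print(embedded.shape, pooled.shape, predictions.shape)
```

Per sample, the pooled vector has 5+x entries and the prediction has 5 entries, one per feature of the next day.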

In [None]:
class T2VTransformer(keras.Model):
    def __init__(self):

        super().__init__()


    def call(self, inputs):

        return

### 10. Create an instance of the model.

Use the Adam optimizer and the MSE loss function. Fit the model on the training data.

In [None]:
model = T2VTransformer("""...""")

opt = """..."""
model.compile("""...""")

### 11. Testing

Evaluate the model performance on the test data.

Store the model predictions on the test data.

Inverse transform the predictions and the test targets to bring them to their original scale. Then calculate MSE.
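The inverse of min-max scaling is x = scaled * (max - min) + min, which is what `scaler.inverse_transform` computes for a fitted `MinMaxScaler`. The NumPy sketch below applies it to made-up scaled predictions and targets and then computes the MSE; the column ranges and all names are invented for the example.

```python
import numpy as np

# Column minima and maxima as they would have been learned during scaling.
col_min = np.array([10.0, 1000.0])
col_max = np.array([14.0, 2000.0])

# Made-up scaled targets and predictions (2 samples, 2 features).
scaled_targets = np.array([[0.25, 0.5], [1.0, 0.2]])
scaled_preds = np.array([[0.5, 0.5], [0.75, 0.3]])

def inverse_min_max(scaled, lo, hi):
    # Undo (x - lo) / (hi - lo).
    return scaled * (hi - lo) + lo

true_values = inverse_min_max(scaled_targets, col_min, col_max)
pred_values = inverse_min_max(scaled_preds, col_min, col_max)

# MSE per column, and over all values at once.
mse_per_column = np.mean((pred_values - true_values) ** 2, axis=0)
mse_overall = np.mean((pred_values - true_values) ** 2)
print(mse_per_column, mse_overall)
```

On the original scale, columns with large magnitudes such as Volume dominate the overall MSE, so a per-column MSE is often easier to interpret.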

Plot the predictions and the test targets.

Alternatively, you can predict on the entire dataset and plot the result.

## Limitations of the model

The model can only predict stock data for the next day. You could feed that prediction back in as input to predict the day after, and so on. However, after a number of such steps the predictions collapse to a single, nearly constant value, so the model cannot be used for long-horizon forecasting.
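The feedback loop described above can be sketched as follows. The predictor here is a deliberately simple stand-in (the mean of the window), not the trained Transformer, so the sketch only illustrates the mechanics of recursive forecasting and how such a loop can settle on a constant value; `stand_in_model` and the other names are hypothetical.

```python
import numpy as np

def stand_in_model(window):
    """Hypothetical one-step predictor: the mean of the 10-day window."""
    return window.mean(axis=0)

rng = np.random.default_rng(0)
window = rng.uniform(0.0, 1.0, size=(10, 5))  # last 10 (scaled) days

forecasts = []
for _ in range(200):
    next_day = stand_in_model(window)
    forecasts.append(next_day)
    # Drop the oldest day and append the prediction as the newest day.
    window = np.vstack([window[1:], next_day])

forecasts = np.array(forecasts)
# Largest change between the last two forecasts.
late_change = np.abs(forecasts[-1] - forecasts[-2]).max()
print(forecasts.shape, late_change)
```

After enough steps the window consists only of the model's own predictions, and consecutive forecasts become practically identical.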

<b>How can we predict more days into the future?</b>

We have to train another model on targets that span the specific number of days to be predicted.

Let's try out a 3-days model.

### What are the challenges?

A feed-forward layer applied to the pooled vector cannot directly produce outputs of shape (m, n). Instead, we produce outputs of size m*n and then reshape them. tf.reduce_prod() and tf.reshape() are very helpful here.
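The flatten-then-reshape idea looks like this with NumPy, using `np.prod` and `reshape` as analogues of `tf.reduce_prod()` and `tf.reshape()`. The batch size and the stand-in for the dense layer's output are made up for the example.

```python
import numpy as np

batch, num_days, features = 4, 3, 5
target_shape = (num_days, features)  # the (m, n) shape we want per sample

# Number of units the final dense layer needs: m * n.
flat_size = int(np.prod(target_shape))

# Stand-in for the flat output of the final dense layer.
flat_output = np.arange(batch * flat_size, dtype=float).reshape(batch, flat_size)

# Reshape each sample from m*n values to (m, n); -1 keeps the batch size.
reshaped = flat_output.reshape(-1, *target_shape)
print(flat_size, reshaped.shape)
```

The first 5 values of each flat vector become day 1, the next 5 become day 2, and so on.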

Pass num_days as an extra parameter to the model.

Split the data accordingly.
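For a multi-day model, each target becomes a block of `num_days` consecutive days instead of a single day. One possible windowing is sketched below on toy data; `toy_data`, `window`, `horizon`, and the loop bounds are assumptions made for this illustration.

```python
import numpy as np

# Toy data: 20 "days" with 5 features each.
toy_data = np.arange(20 * 5, dtype=float).reshape(20, 5)
window, horizon = 10, 3  # 10 input days, 3 days to predict

multi_inputs, multi_targets = [], []
# Stop early enough that every target block has all `horizon` days.
for i in range(len(toy_data) - window - horizon + 1):
    multi_inputs.append(toy_data[i : i + window])
    multi_targets.append(toy_data[i + window : i + window + horizon])

multi_inputs = np.array(multi_inputs)
multi_targets = np.array(multi_targets)
print(multi_inputs.shape, multi_targets.shape)
```

With 20 days this gives 8 samples: inputs of shape (8, 10, 5) and targets of shape (8, 3, 5).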

In [None]:
num_days = 3

In [None]:
class T2VTransformer(keras.Model):
    def __init__(self,num_days,....):
        """...same as previous..."""

    def call(self, inputs):
        return

Load the data.

Plot 3 graphs (predictions for each day).

## Improvements to be made

In the upcoming weeks, we have to automate the process of building models for a given num_days. Moreover, we need to save the trained models so that they do not have to be retrained every time.