# 1. Imports and Folder

In [None]:
import datetime
import logging
from pathlib import Path

import numpy as np
import pandas as pd
import requests
import torch
import yfinance as yf
from bs4 import BeautifulSoup
from torchfitter.utils.convenience import get_logger

# Local modules shipped with this notebook
import Models
import Pipeline

logger = get_logger(name="DTML ")
level = logger.level
logging.basicConfig(level=level)

# 2. Model parameters

In [None]:
### parameters : 
#[0] beta (weight of global market correlation)
#[1] nglobal : number of lines describing market indexes 
#[2] h (length of context vector)
#[3] window-length
#[4] batch_size 
#[5] number of epochs for training
#[6] initial learning rate
#[7] n_layers in LSTM
#[8] dropout_rate

params = torch.tensor([0.3, 1, 2, 16, 32, 100, 1e-3, 2, 0.1], requires_grad=False).float()

### Index of the stock to predict. Warning: this is line (nglobal + stock_to_predict - 1) of the dataset, since we are
### not predicting the market indexes themselves
stock_to_predict = 1
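
The positional parameter vector above is easy to misread, so here is a minimal sketch of a named view over it. `DTMLParams`, `unpack_params` and `dataset_column` are hypothetical helpers, not part of `Pipeline.py`; the field order simply follows the comment list above. Whether the resulting line number is read as 0-based or 1-based depends on `Pipeline.py`, which is not shown here.

```python
from typing import NamedTuple


class DTMLParams(NamedTuple):
    """Hypothetical named view of the flat parameter vector (not part of Pipeline.py)."""
    beta: float           # [0] weight of the global market context
    nglobal: int          # [1] number of market-index series
    h: int                # [2] length of the context vector
    window_length: int    # [3]
    batch_size: int       # [4]
    n_epochs: int         # [5]
    learning_rate: float  # [6]
    n_layers: int         # [7] LSTM layers
    dropout_rate: float   # [8]


def unpack_params(values) -> DTMLParams:
    """Convert a flat sequence of 9 numbers into typed, named fields."""
    values = [float(v) for v in values]
    if len(values) != 9:
        raise ValueError(f"expected 9 parameters, got {len(values)}")
    beta, nglobal, h, window, batch, epochs, lr, layers, dropout = values
    return DTMLParams(beta, int(nglobal), int(h), int(window), int(batch),
                      int(epochs), lr, int(layers), dropout)


def dataset_column(nglobal: int, stock_to_predict: int) -> int:
    """Line of the dataset for a stock, using the rule stated in the comment above."""
    return nglobal + stock_to_predict - 1


named = unpack_params([0.3, 1, 2, 16, 32, 100, 1e-3, 2, 0.1])
print(named.window_length, named.batch_size)
print(dataset_column(named.nglobal, 1))
```

With the notebook's tensor, `unpack_params(params.tolist())` should yield the same fields, up to float32 rounding of `beta`, the learning rate and the dropout rate.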

# 3. Data pre-processing

### Download Dataset

In [None]:
logger.info("IMPORTING DATASET")
url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
html_data = requests.get(url).text
soup = BeautifulSoup(html_data, "html.parser")
table = soup.find("table", {"class": "wikitable sortable"})
tickers = []
for row in table.find_all("tr")[1:]:
    # Yahoo Finance writes share classes with "-" where Wikipedia uses "." (e.g. BRK.B -> BRK-B)
    ticker = row.find_all("td")[0].text.strip().replace(".", "-")
    tickers.append(ticker)

# Add the S&P 500 ETF ticker (SPY) at the beginning of the list
tickers.insert(0, "SPY")
# Download 10 years of weekly prices
data = yf.download(
    tickers=tickers,
    period="10y",
    interval="1wk",
    prepost=False,
    repair=True,
)

# Keep only the closing prices
data = data["Close"]

### Clean the dataset

In [None]:
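logger.info("CLEANING DATASET")
# Drop dates on which 500 or more tickers have no price
data = data.loc[data.isna().sum(axis=1) < 500, :]
# Forward-fill the remaining gaps
data = data.ffill()
# Drop tickers that still have missing values (no price at the start of the sample)
temp = data.isna().sum(axis=0)
data = data.loc[:, temp == 0]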


In [None]:
### Convert to float
data = data.astype(dtype="float32")

### Move the SPY market index to the first column
idx = data.columns.get_loc("SPY")
cols = [data.columns.values[idx]]
cols = cols + data.columns.values[0:idx].tolist()
cols = cols + data.columns.values[idx + 1:].tolist()
data = data[cols]
### Retrieve ticker - index vector 
tickers = data.columns.values

### get the stock price data into percent change format 
data = data.pct_change(1)
data = data.iloc[1:,:]

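### Scaling: a single global mean and standard deviation over all dates and stocks (computed on the full sample, before the split)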
s = np.std(data.values,axis=(0,1))
mu = np.mean(data.values, axis = (0,1))
data = (data - mu)/s


### Train Test Validation Split

In [None]:

split_idx = int(data.shape[0] * 0.8)
train_df = data.iloc[0:split_idx,:]
remaining_df = data.iloc[split_idx: , :]
split_idx_2 = int(remaining_df.shape[0] * 0.5)
test_df = remaining_df.iloc[0:split_idx_2,:]
validation_df = remaining_df.iloc[split_idx_2:, :]

### We add a new dimension at the end. The model takes a T x Nstocks x Nfeatures 3D tensor as input and data in this case
### was only 2D

train_df = torch.tensor(train_df.values).unsqueeze(2)
test_df = torch.tensor(test_df.values).unsqueeze(2)
validation_df = torch.tensor(validation_df.values).unsqueeze(2)
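
The split above is chronological (no shuffling): the first 80% of the weeks go to training, and the remaining 20% are cut in half, test first and validation last. A minimal sketch of the same index arithmetic follows; `chronological_split` is a hypothetical helper and the 520-row input is only an illustrative figure for about ten years of weekly data.

```python
def chronological_split(n_rows: int, train_frac: float = 0.8, test_frac_of_rest: float = 0.5):
    """Return (train, test, validation) index ranges in time order, without shuffling."""
    split_idx = int(n_rows * train_frac)
    remaining = n_rows - split_idx
    split_idx_2 = int(remaining * test_frac_of_rest)
    train = range(0, split_idx)
    test = range(split_idx, split_idx + split_idx_2)
    validation = range(split_idx + split_idx_2, n_rows)
    return train, test, validation


train_rows, test_rows, validation_rows = chronological_split(520)
print(len(train_rows), len(test_rows), len(validation_rows))
```

Note that with this ordering the validation block is the most recent one, so it lies after the test block in time.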

# 4. Create and train the model

In [None]:
logger.info("INITIALIZING MODEL")
### Initialize pipeline with dataset
pipe = Pipeline.DTMLPipeline()
pipe.input_data([train_df, train_df, validation_df, validation_df, test_df, test_df])

### Create DTML model within pipeline and make sure all parameters are float
pipe.create_model(params)
pipe.model = pipe.model.float()
print(" The model has " + str(Pipeline.count_parameters(pipe.model)) + " parameters")

In [None]:
### Train the model 
pipe.train_model(params)

In [None]:
### Plot loss history
pipe.plot_history()

# 5. Retrieve Model predictions

In [None]:
### Predict on the test set
y_pred, y_test = pipe.predict()
### Undo the standardization so that predictions and targets are percent changes again
pipe.preds = torch.mul(pipe.preds, s) + torch.tensor(mu)
pipe.tests = torch.mul(pipe.tests, s) + torch.tensor(mu)

In [None]:
### plot_predict(n) plots the forecast against the true data for the stock of index n
pipe.plot_predict(stock_to_predict)
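
The rescaling of `pipe.preds` and `pipe.tests` above is the inverse of the standardization applied in section 3: `x_scaled = (x - mu) / s`, hence `x = x_scaled * s + mu`. A small pure-Python sketch of that round trip is given below; the helper names and the sample returns are made up for illustration, and `statistics.pstdev` is used because it matches the population standard deviation that `np.std` computes by default.

```python
import statistics


def fit_scaler(values):
    """Global mean and population standard deviation of a flat list of returns."""
    mu = statistics.fmean(values)
    s = statistics.pstdev(values)
    return mu, s


def scale(values, mu, s):
    """Standardize: subtract the mean, divide by the standard deviation."""
    return [(v - mu) / s for v in values]


def unscale(values, mu, s):
    """Invert the standardization to recover the original units."""
    return [v * s + mu for v in values]


returns = [0.01, -0.02, 0.03, 0.00]  # illustrative weekly percent changes
mu, s = fit_scaler(returns)
scaled = scale(returns, mu, s)
restored = unscale(scaled, mu, s)
print(restored)
```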

# Discussion of the issues and tentative fixes

1. Description of "Models.py"

    The file "Models.py" contains the models that are combined into the DTML model. The four main classes to look at are:
    _ The "contextEncoder" class. It is a layer that encodes a panel of time series into a context matrix by running an LSTM over all lags of each series, then applies an attention layer and a custom layer normalization to the LSTM output. The custom layer normalization is implemented in the "ContextNorm" class in the same file.
    The structure is: Linear layer -> Activation -> LSTM over all stocks/lags -> Stacking hidden states into a context matrix -> Attention layer -> Context (layer) normalization

    _ The "contextAggregator" class. It combines the outputs of two contextEncoders, one applied to all stocks and one applied to a market index (S&P 500), into a single context matrix.

    _ The "DASATransformer" class is our decoder. It applies a multi-head attention layer to the context matrix and implements a residual connection.

    _ The "DTMLModel" class, which combines the three above into the complete model. The structure of DTMLModel is:
    Split data into stocks and market index -> Run a context encoder on each -> Combine the context matrices into one with the contextAggregator -> Apply the DASATransformer -> Apply a multi-layer perceptron -> Output prediction
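
For reference, the aggregation step can be written as a weighted sum. This assumes that contextAggregator follows the formulation of the original DTML paper; "Models.py" is not reproduced here, so the actual implementation may differ. Under that assumption, the multi-level context of stock $u$ is

$$
h_u^{m} = h_u^{c} + \beta \, h^{i}
$$

where $h_u^{c}$ is the normalized context of stock $u$, $h^{i}$ is the context of the market index, and $\beta$ is the first entry of `params` (0.3 in this notebook).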

2. Description of "Pipeline.py"

    The main class is DTMLPipeline. It has several methods that allow running the model end to end:
    _ __init__ initializes the class
    _ input_data inputs the dataset (in train / validation / test format)
    _ create_model initializes the model using our parameters
    _ train_model trains the model. It creates the data loading structures, a warm-up learning rate scheduler, and the training loop over the train and validation sets
    _ plot_history() plots the history of validation and training losses throughout the training
    _ predict() runs the model on the test set and outputs the results
    _ plot_predict(n) plots the forecast and true data for the stock of index n.
    

3. Description of issues and potential fixes

    The main issue we faced while implementing this model has to do with training. Initially, all gradients in the encoding part of the model were vanishing, which is a common problem when deep layers are followed by attention layers. This is why we implemented a residual connection as well as several layer normalizations in the decoder (DASATransformer class). While this fixed the vanishing gradients in the encoding part of the network, the gradients still vanished in the multi-head attention block of the decoder.
    We were then confronted with convergence issues: the model converged to very different outcomes on every run, or did not converge at all, producing swings in the losses during training. To remedy this, we relied on the paper "Understanding the Difficulty of Training Transformers" (Liu et al., 2020). We tried several of its ideas: pre-LN normalization did not fix any convergence issue, but changing the optimizer to Adam and implementing a warm-up learning rate scheduler did fix the instability. If you run the model, the losses should be smooth throughout the training.
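
As an illustration of the warm-up idea, here is a minimal sketch of a linear warm-up factor. It is not the schedule implemented in "Pipeline.py": the function name, the linear shape and the 10-step warm-up length are assumptions chosen for the example. A function of this form can be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the initial learning rate by the returned factor.

```python
def warmup_factor(step: int, warmup_steps: int = 10) -> float:
    """Multiplier applied to the initial learning rate at a given step.

    Ramps linearly from 1/warmup_steps up to 1.0, then stays constant.
    """
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, (step + 1) / warmup_steps)


initial_lr = 1e-3  # same value as params[6] in this notebook
schedule = [initial_lr * warmup_factor(step) for step in range(15)]
print(schedule[0], schedule[9], schedule[14])
```

Starting with a small learning rate limits the size of the first Adam updates, when the moment estimates are still noisy, which is the regime where the loss swings described above tend to appear.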
    We still could not find how to fix the vanishing gradient problem within the attention layer of the decoder, and hence the model produces many flat forecasts. Another issue is that this model contains a lot of parameters, so the weekly, price-only dataset may limit its performance significantly, as Transformer models need a lot of data to train properly.