# LSTM with Attention: Architectural Blueprint

## Overview
We adopt a **CNN-BiLSTM-Attention** architecture inspired by modern volatility forecasting research:

1. **CNN layer**: Extracts localized features across the time series (optional but beneficial).
2. **Bi-LSTM**: Processes sequences forward and backward to capture full context.
3. **Attention layer**: Learns to focus on the most informative time steps.
4. **Dense layer**: Outputs the predicted volatility.

---

## Why this architecture?
| Component | Role |
|-----------|------|
| CNN | Captures short-term, local patterns (e.g., sudden volatility jumps) |
| Bi-LSTM | Captures temporal dependencies in both directions |
| Attention | Dynamically weighs time steps for each forecast |
| Dense Output | Regression to continuous volatility value |
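
To make the attention row concrete, here is a minimal NumPy sketch of attention pooling over a sequence of Bi-LSTM hidden states. The function name `attention_pool`, the parameters `w` and `v`, and the choice of additive (tanh) scoring are assumptions made for illustration; the referenced papers may score time steps differently.

```python
import numpy as np

def attention_pool(hidden, w, v):
    """Collapse a sequence of hidden states into one context vector.

    hidden: (T, d) array, one d-dimensional state per time step
    w:      (d, k) projection matrix (hypothetical learned parameter)
    v:      (k,)   scoring vector (hypothetical learned parameter)
    """
    scores = np.tanh(hidden @ w) @ v      # (T,) one score per time step
    scores = scores - scores.max()        # stabilize the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum()     # (T,) non-negative, sums to 1
    context = weights @ hidden            # (d,) weighted average of the states
    return context, weights

rng = np.random.default_rng(0)
hidden = rng.normal(size=(60, 8))         # 60 time steps, 8 hidden units
context, weights = attention_pool(hidden, rng.normal(size=(8, 4)), rng.normal(size=4))
print(context.shape, weights.shape)
```

The weights form a probability distribution over the 60 time steps, so the context vector passed to the dense layer is a weighted average that can emphasize a few informative steps.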

---

## Research References
- A **CNN-BiLSTM-Attention** model for volatility forecasting demonstrated superior performance over traditional methods ([sciencedirect.com](https://www.sciencedirect.com/science/article/abs/pii/S0301420723010309), [MDPI](https://www.mdpi.com/2227-7390/13/11/1889), [arXiv](https://arxiv.org/abs/2204.02623), [ajbsr.net](https://ajbsr.net/data/uploads/6141.pdf), [xml.jips-k.org](https://xml.jips-k.org/full-text/view?doi=10.3745%2FJIPS.02.0121), [sciencedirect.com](https://www.sciencedirect.com/science/article/pii/S0952197624003816/pdf)).
- **Sparse Multi-Head Attention (SP-M-Attention)** offers computational efficiency and long-range focus ([ResearchGate](https://www.researchgate.net/publication/355245855_Financial_Volatility_Forecasting_A_Sparse_Multi-Head_Attention_Neural_Network)).
- The **Multi-Transformer** model showed promise for stock volatility forecasting using transformer encoders ([arXiv](https://arxiv.org/abs/2109.12621)).

## LSTM Data Preprocessing

The goal of this notebook is to preprocess the data for the LSTM model so that it is ready for training. Specifically, the notebook:

1. Loads SPY OHLCV data
2. Merges and aligns the data
3. Computes log returns and rolling features
4. Applies normalization
5. Transforms data into sequences suitable for LSTM
6. Splits into training and testing sets
7. Saves the processed data for model training
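
Step 3 can be sketched as follows. This is a minimal illustration, not the code that produced `spy_lstm.csv`: the function name `add_volatility_features` is hypothetical, and defining realized volatility as the rolling sample standard deviation of log returns is an assumption. The six closing prices are taken from the `spy.head(10)` output further down (2012-02-07 to 2012-02-14).

```python
import numpy as np
import pandas as pd

def add_volatility_features(close, windows=(5, 10, 20)):
    """Derive log returns and rolling volatilities from a closing-price Series."""
    out = pd.DataFrame({"Close": close})
    # Log return: ln(C_t / C_{t-1}); the first row has no predecessor and is NaN
    out["Log_Returns"] = np.log(close / close.shift(1))
    for w in windows:
        # Assumption: realized volatility = rolling sample std (ddof=1) of log returns
        out[f"Realized_Volatility_{w}"] = out["Log_Returns"].rolling(w).std()
    return out

# Six consecutive SPY closes from the notebook's sample output
close = pd.Series([105.881073, 106.195267, 106.328827,
                   105.543327, 106.328827, 106.195267])
features = add_volatility_features(close, windows=(5,))
print(features.round(6))
```

Under this assumption the last row's 5-day value comes out close to the `Realized_Volatility_5` shown for 2012-02-14 in the sample output, which suggests, but does not prove, that the stored features were built this way.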

In [1]:
# Getting all the necessary imports
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
import os

In [2]:
# Model Parameters
SEQUENCE_LENGTH = 60  # Time steps for LSTM input
TEST_RATIO = 0.2

In [12]:
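# Build sliding-window samples: each X holds the SEQUENCE_LENGTH rows before
# step i (all columns) and the matching y is target_col at step i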
def create_sequences(data, target_col):
    X, y = [], []
    for i in range(SEQUENCE_LENGTH, len(data)):
        X.append(data.iloc[i-SEQUENCE_LENGTH:i].values)
        y.append(data[target_col].iloc[i])
    return np.array(X), np.array(y)
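
For longer series the Python loop can become slow. The sketch below builds the same windows with NumPy's `sliding_window_view`; the function name `create_sequences_vectorized` and its explicit `target_idx` and `seq_len` parameters are assumptions for illustration, not part of the notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def create_sequences_vectorized(values, target_idx, seq_len):
    """Windowed inputs and next-step targets without an explicit Python loop."""
    values = np.asarray(values)
    # Shape (n - seq_len + 1, n_features, seq_len): the window axis is appended last
    windows = sliding_window_view(values, window_shape=seq_len, axis=0)
    # Drop the final window (no row follows it) and put time before features
    X = windows[:-1].transpose(0, 2, 1).copy()
    # The target for window k is the row immediately after it
    y = values[seq_len:, target_idx]
    return X, y

toy = np.arange(20, dtype=float).reshape(10, 2)   # 10 rows, 2 features
X_toy, y_toy = create_sequences_vectorized(toy, target_idx=1, seq_len=3)
print(X_toy.shape, y_toy.shape)
```

With a DataFrame such as `df_scaled`, the call would take `df_scaled.values` and the integer position of the target column.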

In [5]:
# Load the processed SPY data; parse_dates already converts Date to datetime64
spy = pd.read_csv("../data/processed/spy_lstm.csv", parse_dates=['Date'], index_col='Date').reset_index()

In [6]:
spy.head(10)

| | Date | Close | Log_Returns | Realized_Volatility_5 | Realized_Volatility_10 | Realized_Volatility_20 |
|---|------|-------|-------------|-----------------------|------------------------|------------------------|
| 0 | 2012-02-01 | 104.058632 | 0.008719 | 0.005348 | 0.004907 | 0.004734 |
| 1 | 2012-02-02 | 104.22361 | 0.001584 | 0.004562 | 0.004711 | 0.004734 |
| 2 | 2012-02-03 | 105.684662 | 0.013921 | 0.007084 | 0.00624 | 0.005444 |
| 3 | 2012-02-06 | 105.613991 | -0.000669 | 0.006432 | 0.006114 | 0.005366 |
| 4 | 2012-02-07 | 105.881073 | 0.002526 | 0.005981 | 0.006005 | 0.005366 |
| 5 | 2012-02-08 | 106.195267 | 0.002963 | 0.005685 | 0.005655 | 0.005176 |
| 6 | 2012-02-09 | 106.328827 | 0.001257 | 0.005723 | 0.005096 | 0.005166 |
| 7 | 2012-02-10 | 105.543327 | -0.007415 | 0.004237 | 0.005962 | 0.005606 |
| 8 | 2012-02-13 | 106.328827 | 0.007415 | 0.005422 | 0.00587 | 0.005482 |
| 9 | 2012-02-14 | 106.195267 | -0.001257 | 0.005481 | 0.005932 | 0.005532 |


In [7]:
spy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3405 entries, 0 to 3404
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   Date                    3405 non-null   datetime64[ns]
 1   Close                   3405 non-null   float64       
 2   Log_Returns             3405 non-null   float64       
 3   Realized_Volatility_5   3405 non-null   float64       
 4   Realized_Volatility_10  3405 non-null   float64       
 5   Realized_Volatility_20  3405 non-null   float64       
dtypes: datetime64[ns](1), float64(5)
memory usage: 159.7 KB


In [10]:
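# Normalizing the data (zero mean, unit variance per column).
# Note: the scaler is fit on all rows, so test-period statistics leak into the scaling.
scaler = StandardScaler()
scaled_features = scaler.fit_transform(spy.drop(columns=['Date', 'Close']))
feature_names = ['log_return', 'volatility_5d', 'volatility_10d', 'volatility_20d']
df_scaled = pd.DataFrame(scaled_features, index=spy.index, columns=feature_names)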
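
The cell above fits the scaler on every row, including the rows that later form the test set. One common way to avoid that leakage is to estimate the mean and standard deviation on the training rows only and apply them to the whole series. The sketch below uses plain NumPy on synthetic data; `train_row_count` and `standardize_on_train` are hypothetical helper names. With scikit-learn the equivalent is `scaler.fit(train_rows)` followed by `scaler.transform(all_rows)`.

```python
import numpy as np

def train_row_count(n_rows, seq_len, test_ratio):
    """Number of leading rows used by the training sequences and their targets."""
    n_sequences = n_rows - seq_len
    n_train_sequences = int(n_sequences * (1 - test_ratio))
    # Training sequence k covers rows k..k+seq_len-1; its target is row k+seq_len
    return n_train_sequences + seq_len

def standardize_on_train(values, n_train_rows):
    """Scale all rows with mean/std estimated from the first n_train_rows only."""
    values = np.asarray(values, dtype=float)
    mean = values[:n_train_rows].mean(axis=0)
    std = values[:n_train_rows].std(axis=0)   # ddof=0, the same estimator StandardScaler uses
    return (values - mean) / std, mean, std

rng = np.random.default_rng(0)
# Synthetic stand-in for the four feature columns (not real SPY data)
raw = rng.normal(loc=0.01, scale=0.005, size=(3405, 4))
n_train = train_row_count(len(raw), seq_len=60, test_ratio=0.2)
scaled, mean, std = standardize_on_train(raw, n_train)
print(n_train, scaled[:n_train].std(axis=0).round(6))
```

Scaling this way changes the numeric values slightly relative to the full-data fit, so outputs printed elsewhere in this notebook would not match exactly.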


In [13]:
df_scaled.head(10)

| | log_return | volatility_5d | volatility_10d | volatility_20d |
|---|------------|---------------|----------------|----------------|
| 0 | 0.77014 | -0.448171 | -0.611566 | -0.728602 |
| 1 | 0.09852 | -0.563719 | -0.643077 | -0.728613 |
| 2 | 1.259876 | -0.192997 | -0.398245 | -0.606294 |
| 3 | -0.11358 | -0.288793 | -0.418411 | -0.619597 |
| 4 | 0.187148 | -0.355092 | -0.435908 | -0.619614 |
| 5 | 0.22832 | -0.398618 | -0.491856 | -0.652419 |
| 6 | 0.06771 | -0.39311 | -0.581385 | -0.654187 |
| 7 | -0.748622 | -0.611534 | -0.442733 | -0.578329 |
| 8 | 0.647403 | -0.437305 | -0.45748 | -0.599751 |
| 9 | -0.16893 | -0.428724 | -0.447527 | -0.591036 |


In [15]:
X, y = create_sequences(df_scaled, target_col='volatility_5d')

In [16]:
# Split Dataset into training and testing sets
split_index = int(len(X) * (1 - TEST_RATIO))
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]

print(f"Training set size: {X_train.shape}, {y_train.shape}")
print(f"Testing set size: {X_test.shape}, {y_test.shape}")

Training set size: (2676, 60, 4), (2676,)
Testing set size: (669, 60, 4), (669,)
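
Because `y` is taken from the standardized `volatility_5d` column, model predictions will also be in standardized units. A minimal sketch of mapping them back to raw volatility follows; `unscale_target` is a hypothetical helper and the numbers are illustrative only. In this notebook the two statistics would be `scaler.mean_[1]` and `scaler.scale_[1]`, since `Realized_Volatility_5` is the second column passed to the scaler.

```python
import numpy as np

def unscale_target(y_scaled, target_mean, target_scale):
    """Map standardized values back to raw units: raw = scaled * scale + mean."""
    return np.asarray(y_scaled) * target_scale + target_mean

# Illustrative statistics, not the values fitted in the notebook
target_mean, target_scale = 0.008, 0.007
raw_vol = np.array([0.005, 0.008, 0.015])
scaled_vol = (raw_vol - target_mean) / target_scale
recovered = unscale_target(scaled_vol, target_mean, target_scale)
print(recovered)
```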


In [None]:
# Saving the processed data
os.makedirs('../data/model_input', exist_ok=True)
np.save('../data/model_input/X_train.npy', X_train)
np.save('../data/model_input/X_test.npy', X_test)
np.save('../data/model_input/y_train.npy', y_train)
np.save('../data/model_input/y_test.npy', y_test)
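
A training notebook would typically start by reloading these files. The sketch below shows one way to do that with a few alignment checks; `load_split` is a hypothetical helper, and the demo writes small synthetic arrays to a temporary directory rather than touching `../data/model_input`.

```python
import os
import tempfile
import numpy as np

def load_split(directory):
    """Load the four saved arrays and check that inputs and targets line up."""
    names = ("X_train", "X_test", "y_train", "y_test")
    arrays = {n: np.load(os.path.join(directory, f"{n}.npy")) for n in names}
    # Each input sample needs exactly one target
    assert arrays["X_train"].shape[0] == arrays["y_train"].shape[0]
    assert arrays["X_test"].shape[0] == arrays["y_test"].shape[0]
    # Train and test must share the (sequence length, feature count) layout
    assert arrays["X_train"].shape[1:] == arrays["X_test"].shape[1:]
    return arrays

# Demo with synthetic arrays shaped like the notebook's (samples, 60, 4) output
with tempfile.TemporaryDirectory() as tmp:
    np.save(os.path.join(tmp, "X_train.npy"), np.zeros((8, 60, 4)))
    np.save(os.path.join(tmp, "X_test.npy"), np.zeros((2, 60, 4)))
    np.save(os.path.join(tmp, "y_train.npy"), np.zeros(8))
    np.save(os.path.join(tmp, "y_test.npy"), np.zeros(2))
    data = load_split(tmp)

print({name: arr.shape for name, arr in data.items()})
```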

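## Notebook Final Step

This workflow prepares windowed time series data for training attention-enhanced LSTM models: each sample is a 60-step sequence paired with the value of the target at the following step.
The predictors are standardized SPY log returns and 5-, 10-, and 20-day rolling realized volatilities; the target is the standardized 5-day realized volatility.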