# New York Stock Exchange: Time-Series Forecasting

## Project Overview
This project develops a Deep Learning pipeline to forecast stock prices using data from the New York Stock Exchange (NYSE). Using historical daily price data (Open, High, Low, Close, and Volume), we train models to predict the next trading day's prices. The workflow covers data validation, exploratory analysis, preprocessing and sequence construction, model training and evaluation, and export of artifacts for deployment.

**Dataset**: The analysis uses the [Kaggle NYSE dataset](https://www.kaggle.com/datasets/dgawlik/nyse), specifically `prices-split-adjusted.csv`.

### Methodology: Why Long Short-Term Memory (LSTM)?
Financial time-series data is inherently noisy and non-stationary. Traditional feed-forward networks often fail to capture the temporal dependencies—the "story" of the price movement over time.

We selected **Long Short-Term Memory (LSTM)** networks over Vanilla RNNs and GRUs for the following reasons:
1.  **Long-Term Dependencies**: Vanilla RNNs suffer from the vanishing gradient problem. This makes it hard for them to learn patterns that span many time steps, such as the 30-day windows used here. LSTMs maintain an internal cell state regulated by gates (Input, Forget, Output), which lets them retain trend information over longer sequences.
2.  **Complexity Handling**: GRUs are computationally cheaper and perform comparably on many tasks. We chose LSTMs because their separate memory cell and hidden state give them more capacity, which may help on noisy, volatile series such as stock prices.

![RNN vs LSTM vs GRU Comparison](RNN%20vs%20LSTM%20vs%20GRU.png)
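To make the gating concrete, here is a minimal NumPy sketch of a single LSTM time step using the standard LSTM equations. It is not the Keras implementation. The weights are random, the sizes (8 units, 29 steps, 4 OHLC-like features) are arbitrary illustration values, and the gate ordering inside the stacked weight matrices is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step for a single sample.

    x_t: (input_dim,), h_prev/c_prev: (units,)
    W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)
    Gate blocks are assumed to be ordered [input, forget, candidate, output].
    """
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates in (0, 1)
    g = np.tanh(g)                                # candidate values in (-1, 1)
    c_t = f * c_prev + i * g                      # forget old memory, add new
    h_t = o * np.tanh(c_t)                        # expose part of the memory
    return h_t, c_t

# Run 29 steps of random 4-feature inputs through an 8-unit cell.
rng = np.random.default_rng(0)
units, input_dim = 8, 4
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
h, c = np.zeros(units), np.zeros(units)
for x_t in rng.random((29, input_dim)):
    h, c = lstm_cell_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

The forget gate `f` decides how much of the previous cell state survives each step. That multiplicative-but-additive update is what lets information persist across long windows.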

### Architecture: Stacked LSTM
To capture patterns at different levels of abstraction, we implement a **Stacked LSTM** architecture:
- **Layer 1**: Captures low-level temporal features and short-term volatility.
- **Layer 2**: Integrates these features to understand broader market trends.

### System Flow Diagram

```text
                  Global Trend Integration
                          │
       Target Output ──── (L2_30)
                          ▲  ▲
Recurrent Flow (Time) ────│──┘
                          │ 
    [Layer 2: Context]  (h2) ──► (h2) ──► ... ──► (h2)
                          ▲       ▲                ▲
                          │       │                │
    [Layer 1: Sensor]   (h1) ──► (h1) ──► ... ──► (h1)
                          ▲       ▲                ▲
                          │       │                │
    [Input Markets]     [X_1]   [X_2]            [X_30]
                       t=1     t=2              t=30
```


---

## 1. Environment Setup
We begin by importing the libraries needed for data manipulation (Pandas), numerical computing (NumPy), and deep learning (TensorFlow/Keras). We also configure Plotly for interactive visualizations and set random seeds to improve reproducibility. GPU kernels can still introduce small run-to-run differences.

In [1]:
import os
import json
import pickle
import random
from pathlib import Path

import numpy as np
import pandas as pd

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio

# ==============================
# Reproducibility
# ==============================
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

# ==============================
# Plotly Template (Optional)
# ==============================
PLOTLY_TEMPLATE = "plotly_dark"
px.defaults.template = PLOTLY_TEMPLATE



## 2. Project Configuration
Here we define the global hyperparameters and constants that control the pipeline. This includes the target ticker symbol, the lookback window size (sequence length), and model training parameters like batch size and epoch count. Centralizing these configurations ensures consistency and ease of experimentation.

In [2]:
DATA_DIR = Path("/kaggle/input/datasets/dgawlik/nyse")
DATA_FILE = DATA_DIR / "prices-split-adjusted.csv"

TICKER = "EQIX"
FEATURE_COLUMNS = ["open", "high", "low", "close", "volume"]
TARGET_COLUMN = "close"
SEQUENCE_LENGTH = 30

TRAIN_RATIO = 0.80
VALID_RATIO = 0.10
TEST_RATIO = 0.10

LSTM_UNITS = 96
LSTM_LAYERS = 2
DROPOUT_RATE = 0.15
BATCH_SIZE = 64
EPOCHS = 40

ARTIFACT_DIR = Path("/kaggle/working/nyse_lstm_artifacts")
ARTIFACT_DIR.mkdir(parents=True, exist_ok=True)

if not DATA_FILE.exists():
    raise FileNotFoundError(f"Missing dataset: {DATA_FILE}")

print("Dataset:", DATA_FILE)
print("Artifacts:", ARTIFACT_DIR)


Dataset: /kaggle/input/datasets/dgawlik/nyse/prices-split-adjusted.csv
Artifacts: /kaggle/working/nyse_lstm_artifacts


## 3. Data Ingestion and Validation
We load the raw CSV data and perform initial validation checks. This step is critical to ensure the integrity of the data before processing. We verify the existence of the specific ticker symbol and inspect the data structure to confirm it parses correctly.

In [3]:
market_df = pd.read_csv(DATA_FILE)
market_df["date"] = pd.to_datetime(market_df["date"])

print("Rows:", len(market_df))
print("Unique symbols:", market_df["symbol"].nunique())

available_symbols = sorted(market_df["symbol"].unique())
if TICKER not in available_symbols:
    raise ValueError(f"Ticker '{TICKER}' not found. Example symbols: {available_symbols[:20]}")

ticker_df = (
    market_df.loc[market_df["symbol"] == TICKER, ["date", "symbol"] + FEATURE_COLUMNS]
    .sort_values("date")
    .reset_index(drop=True)
)

ticker_df.head()


Rows: 851264
Unique symbols: 501


|   | date | symbol | open | high | low | close | volume |
|---|---|---|---|---|---|---|---|
| 0 | 2010-01-04 | EQIX | 106.519997 | 109.620003 | 106.510002 | 109.559998 | 576300.0 |
| 1 | 2010-01-05 | EQIX | 109.589996 | 109.589996 | 108.379997 | 108.540001 | 681900.0 |
| 2 | 2010-01-06 | EQIX | 108.949997 | 110.57 | 108.220001 | 109.529999 | 1397500.0 |
| 3 | 2010-01-07 | EQIX | 109.25 | 110.349998 | 106.639999 | 107.290001 | 797200.0 |
| 4 | 2010-01-08 | EQIX | 106.800003 | 107.279999 | 105.900002 | 106.769997 | 432400.0 |


## 4. Exploratory Data Analysis (EDA)
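Before modeling, it is essential to understand the underlying behavior of the asset. We produce three views:

- (4.1) A combined candlestick and volume chart showing price action and trading intensity over the historical period, which helps spot potential support and resistance levels.
- (4.2) The closing price with 20- and 60-day moving averages, alongside the distribution of daily returns.
- (4.3) A correlation heatmap of the OHLCV features.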
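Cell 4.2 derives two quantities from the close price: simple daily returns (pandas `pct_change`) and trailing moving averages (`rolling(...).mean()`). The sketch below applies the same arithmetic in NumPy to the five EQIX closes shown in Section 3. The 3-day window is a hypothetical choice, because five rows are too few for the 20- and 60-day averages.

```python
import numpy as np

# First five EQIX closes from the table in Section 3 (2010-01-04 .. 2010-01-08)
close = np.array([109.559998, 108.540001, 109.529999, 107.290001, 106.769997])

# Simple daily return, same definition as pandas' pct_change()
daily_return = close[1:] / close[:-1] - 1.0

# Trailing moving average over a hypothetical 3-day window
# (cell 4.2 uses 20 and 60 days, which need more rows than shown here)
window = 3
ma = np.convolve(close, np.ones(window) / window, mode="valid")

print(np.round(daily_return, 4))
print(np.round(ma, 2))
```

The first return is about -0.93%. Like `rolling(window).mean()`, the moving average has no value until a full window of prices is available, which is why the 20- and 60-day lines in cell 4.2 start later than the close line.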

In [4]:
# 4.1 Candlestick + volume overview
fig = make_subplots(
    rows=2,
    cols=1,
    shared_xaxes=True,
    vertical_spacing=0.06,
    subplot_titles=(f"{TICKER} Candlestick", f"{TICKER} Volume"),
    row_heights=[0.72, 0.28],
)

fig.add_trace(
    go.Candlestick(
        x=ticker_df["date"],
        open=ticker_df["open"],
        high=ticker_df["high"],
        low=ticker_df["low"],
        close=ticker_df["close"],
        name="OHLC",
    ),
    row=1,
    col=1,
)

fig.add_trace(
    go.Bar(x=ticker_df["date"], y=ticker_df["volume"], name="Volume", marker_color="#5DADE2"),
    row=2,
    col=1,
)

fig.update_layout(height=780, width=1200, template=PLOTLY_TEMPLATE, xaxis_rangeslider_visible=False)
fig.update_yaxes(title_text="Price", row=1, col=1)
fig.update_yaxes(title_text="Volume", row=2, col=1)
fig.show()


In [5]:
# 4.2 Trend and return structure
viz_df = ticker_df.copy()
viz_df["ma_20"] = viz_df["close"].rolling(20).mean()
viz_df["ma_60"] = viz_df["close"].rolling(60).mean()
viz_df["daily_return"] = viz_df["close"].pct_change()

fig = make_subplots(rows=1, cols=2, subplot_titles=("Close + Moving Averages", "Daily Return Distribution"))

fig.add_trace(go.Scatter(x=viz_df["date"], y=viz_df["close"], mode="lines", name="Close", line=dict(color="#58D68D")), row=1, col=1)
fig.add_trace(go.Scatter(x=viz_df["date"], y=viz_df["ma_20"], mode="lines", name="MA 20", line=dict(color="#F5B041")), row=1, col=1)
fig.add_trace(go.Scatter(x=viz_df["date"], y=viz_df["ma_60"], mode="lines", name="MA 60", line=dict(color="#EC7063")), row=1, col=1)

fig.add_trace(go.Histogram(x=viz_df["daily_return"].dropna(), nbinsx=70, name="Returns", marker_color="#A569BD"), row=1, col=2)
fig.add_vline(x=0, line_dash="dash", line_color="white", row=1, col=2)

fig.update_layout(height=460, width=1250, template=PLOTLY_TEMPLATE)
fig.update_xaxes(title_text="Date", row=1, col=1)
fig.update_yaxes(title_text="Price", row=1, col=1)
fig.update_xaxes(title_text="Return", row=1, col=2)
fig.update_yaxes(title_text="Count", row=1, col=2)
fig.show()


In [6]:
# 4.3 Correlation heatmap
corr = ticker_df[FEATURE_COLUMNS].corr()
fig = go.Figure(
    data=go.Heatmap(
        z=corr.values,
        x=corr.columns,
        y=corr.index,
        colorscale="Viridis",
        text=np.round(corr.values, 2),
        texttemplate="%{text}",
    )
)
fig.update_layout(title=f"{TICKER} Feature Correlation", height=520, width=700, template=PLOTLY_TEMPLATE)
fig.show()


## 5. Feature Engineering and Preprocessing
Neural networks require numerical inputs on a similar scale to converge effectively. In this section, we:
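1.  **Normalize** the OHLC columns with column-wise MinMax scaling to the [0, 1] range. Volume is dropped from this model's inputs. The scaler is fit on the full series, so the training data's scale also reflects the validation and test price ranges.
2.  **Sequence Construction**: Slide a 30-day window over the series and convert it into the 3-dimensional supervised format `(Samples, TimeSteps, Features)` required by the LSTM layers. The first 29 days of each window form the input, and the 30th day's OHLC values form the target. Windows are then split chronologically into 80/10/10 train/validation/test sets.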
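`normalize_ohlc` fits on the whole series, so the minimum and maximum of the validation and test periods shape the scaling the model trains on. A common leakage-free alternative computes the statistics from the training rows only and applies them to every split. The sketch below shows that alternative; it is not what the cells that follow do. With this approach, values in later periods can fall outside [0, 1]. `chronological_minmax` and `invert_minmax` are hypothetical helpers, and the random-walk prices are synthetic.

```python
import numpy as np

def chronological_minmax(values, train_rows):
    """Min-max scale each column using only the first `train_rows` rows.

    values: (n_rows, n_features) array in time order.
    Returns the scaled array plus (col_min, span) for inverting predictions.
    """
    train = values[:train_rows]
    col_min = train.min(axis=0)
    span = train.max(axis=0) - col_min
    span = np.where(span > 0, span, 1.0)  # guard against constant columns
    return (values - col_min) / span, col_min, span

def invert_minmax(scaled, col_min, span):
    """Map scaled values back to the original units."""
    return scaled * span + col_min

rng = np.random.default_rng(42)
prices = 100 + np.cumsum(rng.normal(0.5, 1.0, size=(200, 4)), axis=0)  # drifting upward
scaled, col_min, span = chronological_minmax(prices, train_rows=160)
print(scaled[:160].min(), scaled[:160].max())  # training rows span exactly [0, 1]
print(scaled[160:].max())                      # later rows may fall outside [0, 1]
```

Keeping `col_min` and `span` also makes it possible to convert predictions back into prices.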

In [7]:
# Column-wise min-max normalization of OHLC to [0, 1].
# Note: the scaler is fit on the full series (train + valid + test), so the
# validation/test price range leaks into the scaling. The fitted scaler is also
# discarded, so predictions cannot be mapped back to prices from here.
def normalize_ohlc(df_stock):
    scaler = MinMaxScaler()
    out = df_stock.copy()
    for col in ["open", "high", "low", "close"]:
        # fit_transform returns an (n, 1) array; ravel it to a 1-D column
        out[col] = scaler.fit_transform(df_stock[[col]]).ravel()
    return out


def create_sequences(stock_frame, seq_len, valid_pct, test_pct):
    data_raw = stock_frame.values.astype(np.float32)

    # Slide a window of `seq_len` consecutive days over the series (step = 1 day).
    # Note: range(len - seq_len) stops one short, so the last full window is skipped.
    seq = []
    for i in range(len(data_raw) - seq_len):
        seq.append(data_raw[i: i + seq_len])

    # Chronological split (no shuffling): train, then validation, then test.
    seq = np.array(seq, dtype=np.float32)
    valid_size = int(np.round(valid_pct / 100 * seq.shape[0]))
    test_size = int(np.round(test_pct / 100 * seq.shape[0]))
    train_size = seq.shape[0] - (valid_size + test_size)

    # Inputs: first seq_len - 1 days of each window; target: last day's OHLC.
    x_train = seq[:train_size, :-1, :]
    y_train = seq[:train_size, -1, :]
    x_valid = seq[train_size:train_size + valid_size, :-1, :]
    y_valid = seq[train_size:train_size + valid_size, -1, :]
    x_test = seq[train_size + valid_size:, :-1, :]
    y_test = seq[train_size + valid_size:, -1, :]

    return x_train, y_train, x_valid, y_valid, x_test, y_test


# Drop label columns (date, symbol) and volume: this model is trained on OHLC only
model_df = ticker_df[ticker_df["symbol"] == TICKER].copy()
model_df = model_df.drop(columns=["date", "symbol", "volume"])

# Keep deterministic feature order for indexing consistency
MODEL_FEATURES = ["open", "high", "low", "close"]
model_df = model_df[MODEL_FEATURES]

model_df_norm = normalize_ohlc(model_df)

x_train, y_train, x_valid, y_valid, x_test, y_test = create_sequences(
    model_df_norm,
    SEQUENCE_LENGTH,
    valid_pct=VALID_RATIO * 100,
    test_pct=TEST_RATIO * 100,
)

print("model_df dtypes:\n", model_df.dtypes)
print("x_train:", x_train.shape, "| y_train:", y_train.shape, "| dtype:", x_train.dtype)
print("x_valid:", x_valid.shape, "| y_valid:", y_valid.shape, "| dtype:", x_valid.dtype)
print("x_test:", x_test.shape, "| y_test:", y_test.shape, "| dtype:", x_test.dtype)



model_df dtypes:
 open     float64
high     float64
low      float64
close    float64
dtype: object
x_train: (1386, 29, 4) | y_train: (1386, 4) | dtype: float32
x_valid: (173, 29, 4) | y_valid: (173, 4) | dtype: float32
x_test: (173, 29, 4) | y_test: (173, 4) | dtype: float32
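The printed shapes can be checked by hand. With `range(len(data_raw) - seq_len)`, a series of n rows yields n − 30 windows. The split sizes 1386 + 173 + 173 = 1732 therefore imply 1,762 EQIX rows. Each input keeps the first 29 rows of its window. The helper name `split_sizes` is hypothetical.

```python
import numpy as np

def split_sizes(n_rows, seq_len=30, valid_pct=10.0, test_pct=10.0):
    """Mirror the window count and split arithmetic of create_sequences."""
    n_windows = n_rows - seq_len  # range(len(data_raw) - seq_len)
    valid = int(np.round(valid_pct / 100 * n_windows))
    test = int(np.round(test_pct / 100 * n_windows))
    train = n_windows - (valid + test)
    return train, valid, test

train, valid, test = split_sizes(1762)
print(train, valid, test)  # 1386 173 173
print(30 - 1)              # input steps per window, i.e. x_*.shape[1]
```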


In [8]:
# Preprocessing effect visualization: raw OHLC vs normalized OHLC
fig = make_subplots(
    rows=1,
    cols=2,
    subplot_titles=("Before Normalization (Raw OHLC)", "After Normalization (MinMax OHLC)")
)

for col, color in zip(MODEL_FEATURES, ["#E74C3C", "#F1C40F", "#3498DB", "#2ECC71"]):
    fig.add_trace(
        go.Scatter(
            x=np.arange(len(model_df)),
            y=model_df[col].values,
            mode="lines",
            name=f"raw_{col}",
            line=dict(color=color),
            legendgroup=col,
        ),
        row=1,
        col=1,
    )

for col, color in zip(MODEL_FEATURES, ["#E74C3C", "#F1C40F", "#3498DB", "#2ECC71"]):
    fig.add_trace(
        go.Scatter(
            x=np.arange(len(model_df_norm)),
            y=model_df_norm[col].values,
            mode="lines",
            name=f"norm_{col}",
            line=dict(color=color, dash="dot"),
            legendgroup=col,
            showlegend=False,
        ),
        row=1,
        col=2,
    )

fig.update_xaxes(title_text="Time [days]", row=1, col=1)
fig.update_yaxes(title_text="Raw price", row=1, col=1)
fig.update_xaxes(title_text="Time [days]", row=1, col=2)
fig.update_yaxes(title_text="Scaled value (0-1)", row=1, col=2)
fig.update_layout(height=460, width=1300, template=PLOTLY_TEMPLATE)
fig.show()



### Neural Network Structure
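The following diagram illustrates the tensor flow through the Stacked LSTM used in the per-ticker pipeline of Section 11. That pipeline uses five OHLCV features, a 30-step input window, and a `Dense(25)` → `Dense(1)` head that predicts the scaled close. The Section 6 model follows the same two-layer LSTM pattern, with two differences: its input is `(29, 4)` (OHLC only, without volume), and its head is a single `Dense(4)` layer that predicts the next step's OHLC values.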

```text
                     Stock Price Prediction (Stacked LSTM)
                                      │
              Predicted Close Price ──(Output: 1)
                                      ▲
              [Layer 4: Output]    Dense(1)
                                      ▲
              [Layer 3: Dense]     Dense(25)
                                      ▲
                                      │  ← Single context vector (96,)
  [Layer 2: LSTM]  (h2) ──► (h2) ──► ... ──► (h2) → 96 features
  return_seq=False   ▲       ▲                 ▲
                     │       │                 │
  [Layer 1: LSTM]  (h1) ──► (h1) ──► ... ──► (h1)   96-dim sequence
  return_seq=True    ▲       ▲                 ▲
                     │       │                 │
                  [Open ]  [Open ]           [Open ]
                  [High ]  [High ]           [High ]
                  [Low  ]  [Low  ]           [Low  ]
                  [Close]  [Close]           [Close]
                  [Vol  ]  [Vol  ]           [Vol  ]
                    t=1      t=2    . . .     t=30

                  Input Shape: (30, 5)
```


## 6. Model Architecture and Training
We construct the Deep Learning model using the Keras `Sequential` API. The model includes:
-   Two stacked **LSTM layers** (96 units each), each followed by Dropout regularization to reduce overfitting.
-   A **Dense output layer** with four units that maps the final LSTM state to the next step's normalized OHLC values.
-   **Early Stopping** (patience of 7 epochs, restoring the best weights) and **Model Checkpointing** callbacks. Together they save the best weights and halt training once validation loss stops improving.
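As a sanity check on the architecture, the trainable parameter counts can be derived by hand. Each LSTM layer has four gates, and each gate has an input kernel, a recurrent kernel, and a bias vector. The counts below assume Keras defaults (`use_bias=True`) and should match `lstm_model.summary()`. Dropout layers add no parameters.

```python
def lstm_params(units, input_dim):
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # weight matrix + bias
    return input_dim * units + units

layer1 = lstm_params(96, 4)   # reads the 4 OHLC features
layer2 = lstm_params(96, 96)  # reads layer 1's 96-dim hidden states
head = dense_params(4, 96)    # maps the final state to next-step OHLC
total = layer1 + layer2 + head
print(layer1, layer2, head, total)  # 38784 74112 388 113284
```

Roughly two thirds of the parameters sit in the second LSTM layer, because its input width (96) is much larger than the first layer's (4).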

In [9]:
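n_steps = SEQUENCE_LENGTH - 1      # 29 input days per window
n_inputs = len(MODEL_FEATURES)     # open, high, low, close
n_outputs = len(MODEL_FEATURES)    # predict the next day's OHLC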

lstm_model = Sequential(name="nyse_lstm_ohlc")
lstm_model.add(Input(shape=(n_steps, n_inputs), name="sequence_input"))

for idx in range(LSTM_LAYERS):
    ret_seq = idx < (LSTM_LAYERS - 1)
    lstm_model.add(LSTM(LSTM_UNITS, return_sequences=ret_seq))
    lstm_model.add(Dropout(DROPOUT_RATE))

lstm_model.add(Dense(n_outputs))
lstm_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")

best_path = ARTIFACT_DIR / "best_lstm.keras"
callbacks = [
    EarlyStopping(monitor="val_loss", patience=7, restore_best_weights=True),
    ModelCheckpoint(filepath=str(best_path), monitor="val_loss", save_best_only=True),
]

history = lstm_model.fit(
    x_train,
    y_train,
    validation_data=(x_valid, y_valid),
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    verbose=1,
    callbacks=callbacks,
)





Epoch 1/40
[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 101ms/step - loss: 0.0513 - val_loss: 0.0053
Epoch 2/40
[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 71ms/step - loss: 0.0035 - val_loss: 0.0027
Epoch 3/40
[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 72ms/step - loss: 0.0017 - val_loss: 0.0021
Epoch 4/40
[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 75ms/step - loss: 0.0015 - val_loss: 0.0021
Epoch 5/40
[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 71ms/step - loss: 0.0014 - val_loss: 0.0015
Epoch 6/40
[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 75ms/step - loss: 0.0014 - val_loss: 0.0015
Epoch 7/40
[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 82ms/step - loss: 0.0013 - val_loss: 0.0013
Epoch 8/40
[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 74ms/step - loss: 0.0012 - val_loss: 0.0017
Epoch 9/40
[1m22/22[0m [32m━━━━━━━━━━━━━━━━━

In [10]:
fig = go.Figure()
fig.add_trace(go.Scatter(y=history.history["loss"], mode="lines", name="train_loss", line=dict(color="#58D68D")))
fig.add_trace(go.Scatter(y=history.history["val_loss"], mode="lines", name="val_loss", line=dict(color="#EC7063")))
fig.update_layout(title="Training vs Validation Loss", xaxis_title="Epoch", yaxis_title="MSE", height=450, width=1000, template=PLOTLY_TEMPLATE)
fig.show()
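Training is governed by `EarlyStopping(monitor="val_loss", patience=7, restore_best_weights=True)`. This callback stops once validation loss has failed to improve for 7 consecutive epochs, then rolls back to the best epoch's weights. Below is a simplified pure-Python sketch of that patience rule, run on a hypothetical loss curve. It is not the Keras source, and `early_stopping_epoch` is a hypothetical helper.

```python
def early_stopping_epoch(val_losses, patience=7, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed, under a simple patience rule."""
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:   # improvement: remember it, reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:                         # no improvement: count towards patience
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch  # ran out of epochs without stopping

# Hypothetical validation losses: with patience=3, training stops at epoch 6
# and the weights from epoch 3 would be restored.
print(early_stopping_epoch([0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38], patience=3))
```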


## 7. Performance Evaluation
Once trained, we generate predictions on the Training, Validation, and Test sets. We evaluate the model using:
-   **MAE & RMSE**: The average magnitude of the test-set prediction error for the plotted feature (`ft`, the `open` column). Both are reported in normalized [0, 1] units, not dollars.
-   **Directional Accuracy**: The fraction of samples in which the predicted intraday direction of the target day (the sign of close − open) matches the actual one. This is a rough proxy for how useful the forecasts would be to a trading strategy.
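The cell below computes MAE and RMSE on column `ft` and the directional metric from the sign of close − open. Here is the same arithmetic on a tiny hand-made example. The arrays are invented, the column order follows `MODEL_FEATURES` (open = 0, close = 3), and the helper names are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE and RMSE for 1-D arrays."""
    err = y_true - y_pred
    return np.mean(np.abs(err)), np.sqrt(np.mean(err ** 2))

def intraday_direction_accuracy(y_true, y_pred, idx_open=0, idx_close=3):
    """Share of rows where sign(close - open) agrees between target and prediction."""
    true_dir = np.sign(y_true[:, idx_close] - y_true[:, idx_open])
    pred_dir = np.sign(y_pred[:, idx_close] - y_pred[:, idx_open])
    return np.mean(true_dir == pred_dir)

# Columns: open, high, low, close (normalized)
y_true = np.array([[0.10, 0.12, 0.09, 0.11],    # up day
                   [0.20, 0.21, 0.18, 0.19],    # down day
                   [0.30, 0.33, 0.29, 0.32]])   # up day
y_pred = np.array([[0.11, 0.13, 0.10, 0.12],    # predicted up (correct)
                   [0.19, 0.22, 0.18, 0.20],    # predicted up (wrong)
                   [0.31, 0.34, 0.30, 0.33]])   # predicted up (correct)

mae, rmse = regression_metrics(y_true[:, 0], y_pred[:, 0])  # open column
acc = intraday_direction_accuracy(y_true, y_pred)
print(round(mae, 4), round(rmse, 4), round(acc, 4))
```

Flat days (close equal to open) get a sign of 0 and only count as correct if the prediction is also exactly flat.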

In [11]:
y_train_pred = lstm_model.predict(x_train, verbose=0)
y_valid_pred = lstm_model.predict(x_valid, verbose=0)
y_test_pred = lstm_model.predict(x_test, verbose=0)

feature_index_map = {name: i for i, name in enumerate(MODEL_FEATURES)}
ft = feature_index_map["open"]  # plotted feature index
idx_open = feature_index_map["open"]
idx_close = feature_index_map["close"]

mae = mean_absolute_error(y_test[:, ft], y_test_pred[:, ft])
rmse = np.sqrt(mean_squared_error(y_test[:, ft], y_test_pred[:, ft]))

corr_price_development_train = np.mean(np.sign(y_train[:, idx_close]-y_train[:, idx_open]) == np.sign(y_train_pred[:, idx_close]-y_train_pred[:, idx_open]))
corr_price_development_valid = np.mean(np.sign(y_valid[:, idx_close]-y_valid[:, idx_open]) == np.sign(y_valid_pred[:, idx_close]-y_valid_pred[:, idx_open]))
corr_price_development_test = np.mean(np.sign(y_test[:, idx_close]-y_test[:, idx_open]) == np.sign(y_test_pred[:, idx_close]-y_test_pred[:, idx_open]))

print(f"MAE: {mae:.6f}")
print(f"RMSE: {rmse:.6f}")
print('correct sign prediction for close-open train/valid/test: %.2f/%.2f/%.2f' % (
    corr_price_development_train,
    corr_price_development_valid,
    corr_price_development_test,
))



MAE: 0.018641
RMSE: 0.023467
correct sign prediction for close-open train/valid/test: 0.53/0.36/0.39
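The test MAE of 0.018641 (normalized `open`) is hard to judge in isolation. The directional accuracy of 0.36 on validation and 0.39 on test is below 0.5, so it is roughly worse than a coin flip. A common reference point is a persistence forecast, which predicts that the next day simply repeats the last observed day. The sketch below assumes the `x_*`/`y_*` arrays from Section 5; `persistence_forecast` is a hypothetical helper, and the random walk is synthetic.

```python
import numpy as np

def persistence_forecast(x_windows):
    """Naive baseline: the forecast for the next step is the last observed step.

    x_windows: (samples, steps, features) -> (samples, features)
    """
    return x_windows[:, -1, :]

# Self-check on a synthetic random walk, windowed like create_sequences
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=(100, 4)), axis=0)
windows = np.stack([series[i:i + 30] for i in range(len(series) - 30)])
x, y = windows[:, :-1, :], windows[:, -1, :]
baseline = persistence_forecast(x)
print(baseline.shape, np.abs(y[:, 0] - baseline[:, 0]).mean())

# In the notebook, the comparable test-set number would be:
# mean_absolute_error(y_test[:, ft], persistence_forecast(x_test)[:, ft])
```

If the LSTM's MAE is not clearly below the persistence baseline's, the model is adding little beyond the most recent price.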


In [12]:
fig = make_subplots(rows=1, cols=2, subplot_titles=("Past and Future Stock Prices", "Future Stock Prices"))

fig.add_trace(go.Scatter(x=np.arange(y_train.shape[0]), y=y_train[:,ft], mode="lines", name="train target", line=dict(color="#5DADE2")), row=1, col=1)
fig.add_trace(go.Scatter(x=np.arange(y_train.shape[0], y_train.shape[0]+y_valid.shape[0]), y=y_valid[:,ft], mode="lines", name="valid target", line=dict(color="#AAB7B8")), row=1, col=1)
fig.add_trace(go.Scatter(x=np.arange(y_train.shape[0]+y_valid.shape[0], y_train.shape[0]+y_valid.shape[0]+y_test.shape[0]), y=y_test[:,ft], mode="lines", name="test target", line=dict(color="#F8C471")), row=1, col=1)

fig.add_trace(go.Scatter(x=np.arange(y_train_pred.shape[0]), y=y_train_pred[:,ft], mode="lines", name="train prediction", line=dict(color="#2ECC71")), row=1, col=1)
fig.add_trace(go.Scatter(x=np.arange(y_train_pred.shape[0], y_train_pred.shape[0]+y_valid_pred.shape[0]), y=y_valid_pred[:,ft], mode="lines", name="valid prediction", line=dict(color="#F39C12")), row=1, col=1)
fig.add_trace(go.Scatter(x=np.arange(y_train_pred.shape[0]+y_valid_pred.shape[0], y_train_pred.shape[0]+y_valid_pred.shape[0]+y_test_pred.shape[0]), y=y_test_pred[:,ft], mode="lines", name="test prediction", line=dict(color="#E74C3C")), row=1, col=1)

fig.add_trace(go.Scatter(x=np.arange(y_test.shape[0]), y=y_test[:,ft], mode="lines", name="test target (zoom)", line=dict(color="#F8C471")), row=1, col=2)
fig.add_trace(go.Scatter(x=np.arange(y_test_pred.shape[0]), y=y_test_pred[:,ft], mode="lines", name="test prediction (zoom)", line=dict(color="#E74C3C")), row=1, col=2)

fig.update_xaxes(title_text="Time [days]", row=1, col=1)
fig.update_yaxes(title_text="Normalized price", row=1, col=1)
fig.update_xaxes(title_text="Time [days]", row=1, col=2)
fig.update_yaxes(title_text="Normalized price", row=1, col=2)
fig.update_layout(height=470, width=1300, template=PLOTLY_TEMPLATE)
fig.show()


In [13]:
residuals = y_test[:, ft] - y_test_pred[:, ft]

fig = make_subplots(rows=1, cols=2, subplot_titles=("Residual Distribution", "Residuals vs Predicted"))
fig.add_trace(go.Histogram(x=residuals, nbinsx=40, marker_color="#AF7AC5", name="residuals"), row=1, col=1)
fig.add_vline(x=0, line_dash="dash", line_color="white", row=1, col=1)

fig.add_trace(
    go.Scatter(
        x=y_test_pred[:, ft],
        y=residuals,
        mode="markers",
        marker=dict(size=6, opacity=0.6, color="#5DADE2"),
        name="residual scatter",
    ),
    row=1,
    col=2,
)
fig.add_hline(y=0, line_dash="dash", line_color="white", row=1, col=2)

fig.update_xaxes(title_text="Residual", row=1, col=1)
fig.update_yaxes(title_text="Count", row=1, col=1)
fig.update_xaxes(title_text="Predicted", row=1, col=2)
fig.update_yaxes(title_text="Residual", row=1, col=2)
fig.update_layout(height=460, width=1250, template=PLOTLY_TEMPLATE)
fig.show()


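## 8. Export for Deployment

We save the trained model, a JSON configuration, and preprocessing metadata to `ARTIFACT_DIR` so the model can be reloaded for inference.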


In [14]:
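model_path = ARTIFACT_DIR / "nyse_lstm_ohlc.keras"
lstm_model.save(model_path)

# Describe the model that was actually trained in Section 6: OHLC in, OHLC out
config = {
    "ticker": TICKER,
    "feature_columns": MODEL_FEATURES,
    "output_columns": MODEL_FEATURES,
    "sequence_length": SEQUENCE_LENGTH,
    "input_steps": SEQUENCE_LENGTH - 1,
    "lstm_units": LSTM_UNITS,
    "lstm_layers": LSTM_LAYERS,
    "dropout_rate": DROPOUT_RATE,
}

(ARTIFACT_DIR / "config.json").write_text(json.dumps(config, indent=2), encoding="utf-8")

# Store the min/max used by normalize_ohlc so raw prices can be scaled
# (and predictions inverted) at inference time
preprocess_meta = {
    "normalization": "column-wise MinMax on OHLC (fit on full series)",
    "columns": MODEL_FEATURES,
    "data_min": model_df[MODEL_FEATURES].min().to_dict(),
    "data_max": model_df[MODEL_FEATURES].max().to_dict(),
}
with open(ARTIFACT_DIR / "preprocess_meta.pkl", "wb") as f:
    pickle.dump(preprocess_meta, f)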

print("Saved artifacts:")
for fpath in sorted(ARTIFACT_DIR.glob("*")):
    print("-", fpath)


Saved artifacts:
- /kaggle/working/nyse_lstm_artifacts/best_lstm.keras
- /kaggle/working/nyse_lstm_artifacts/config.json
- /kaggle/working/nyse_lstm_artifacts/nyse_lstm_ohlc.keras
- /kaggle/working/nyse_lstm_artifacts/preprocess_meta.pkl


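## 9. Inference Utility

`predict_next_ohlc` runs the trained model on one normalized window of shape `(SEQUENCE_LENGTH - 1, 4)` and returns the four predicted OHLC values, still in normalized units.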


In [15]:
def predict_next_ohlc(model, normalized_window, seq_len=SEQUENCE_LENGTH):
    # normalized_window expected shape: (seq_len - 1, 4)
    x = np.asarray(normalized_window, dtype=np.float32)
    expected = (seq_len - 1, 4)
    if x.shape != expected:
        raise ValueError(f"Expected {expected}, got {x.shape}")

    pred = model.predict(x.reshape(1, seq_len - 1, 4), verbose=0)
    return pred.reshape(-1)

# Example:
# next_pred = predict_next_ohlc(lstm_model, x_test[0])
# print(next_pred)
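
`predict_next_ohlc` expects a window that is already normalized and returns normalized values. For real use, raw prices need to go through the same min-max mapping, and the output needs to be mapped back to prices. The sketch below assumes per-column minima and maxima are available. For the notebook's full-series scaling, these equal `model_df.min()` and `model_df.max()`. `scale_window` and `unscale_prediction` are hypothetical helpers, and the statistics in the self-check are made up.

```python
import numpy as np

def scale_window(raw_window, col_min, col_max):
    """Min-max scale a raw (steps, features) window with stored per-column stats."""
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (np.asarray(raw_window, dtype=np.float64) - col_min) / span

def unscale_prediction(pred, col_min, col_max):
    """Map a normalized prediction back to price units."""
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return np.asarray(pred) * span + col_min

# Self-check with made-up statistics and a made-up raw window
col_min = np.array([100.0, 101.0, 99.0, 100.5])
col_max = np.array([300.0, 305.0, 295.0, 302.0])
raw = np.linspace(col_min + 50, col_min + 60, num=29)  # shape (29, 4)
scaled = scale_window(raw, col_min, col_max)
print(scaled.shape, np.allclose(unscale_prediction(scaled, col_min, col_max), raw))

# Hypothetical use in the notebook:
# col_min = model_df[MODEL_FEATURES].min().to_numpy()
# col_max = model_df[MODEL_FEATURES].max().to_numpy()
# window = scale_window(model_df[MODEL_FEATURES].to_numpy()[-(SEQUENCE_LENGTH - 1):], col_min, col_max)
# next_ohlc = unscale_prediction(predict_next_ohlc(lstm_model, window), col_min, col_max)
```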


## 10. Final Notes
- Change `TICKER` and rerun from top to retrain for another stock.
- Artifacts are written to `/kaggle/working/nyse_lstm_artifacts`.
- Keep sequence formatting (window length, feature order, and normalization) identical during training and inference.


## 11. Targeted Ticker Analysis (VZ & WAT)
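In this section, we apply a variant of the pipeline to two further tickers, **Verizon (VZ)** and **Waters Corporation (WAT)**, with independent data splitting, training, and visualization for each. This variant differs from Section 6 in three ways:

- It uses all five `FEATURE_COLUMNS`, including volume.
- It feeds a full 30-step input window to the model.
- It uses a `Dense(25)` → `Dense(1)` head that predicts only the scaled close.

Training runs for at most 20 epochs with an early-stopping patience of 3.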

In [16]:
def run_ticker_pipeline(symbol, df, seq_len=30, epochs=10):
    print(f"\nProcessing {symbol}...")
    
    # 1. Data Prep
    tdf = df[df['symbol'] == symbol].sort_values('date').reset_index(drop=True)
    # Ensure we have enough data
    if len(tdf) < seq_len + 100:
        print(f"Not enough data for {symbol}")
        return

    data = tdf[FEATURE_COLUMNS].values
    
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled_data = scaler.fit_transform(data)
    
    target_idx = FEATURE_COLUMNS.index(TARGET_COLUMN)
    
    X, y = [], []
    for i in range(seq_len, len(scaled_data)):
        X.append(scaled_data[i-seq_len:i])
        y.append(scaled_data[i, target_idx])
        
    X = np.array(X)
    y = np.array(y).reshape(-1, 1) # Ensure 2D shape (samples, 1)
    
    # Chronological 80/10/10 split (no shuffling), same proportions as Section 5
    N = len(X)
    train_size = int(N * 0.80)
    valid_size = int(N * 0.10)
    test_size = N - train_size - valid_size
    
    X_train, y_train = X[:train_size], y[:train_size]
    X_valid, y_valid = X[train_size:train_size+valid_size], y[train_size:train_size+valid_size]
    X_test, y_test = X[train_size+valid_size:], y[train_size+valid_size:]
    
    # 2. Build New Model (Fresh for each ticker)
    model = Sequential([
        Input(shape=(X_train.shape[1], X_train.shape[2])),
        LSTM(LSTM_UNITS, return_sequences=True),
        Dropout(DROPOUT_RATE),
        LSTM(LSTM_UNITS, return_sequences=False),
        Dropout(DROPOUT_RATE),
        Dense(25),
        Dense(1)
    ])
    model.compile(optimizer='adam', loss='mean_squared_error')
    
    # 3. Train
    print(f"Training model for {symbol}...")
    history = model.fit(
        X_train, y_train, 
        batch_size=BATCH_SIZE, 
        epochs=epochs, 
        validation_data=(X_valid, y_valid),
        verbose=0,
        callbacks=[EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)]
    )
    
    # 4. Predict
    y_train_pred = model.predict(X_train, verbose=0)
    y_valid_pred = model.predict(X_valid, verbose=0)
    y_test_pred = model.predict(X_test, verbose=0)
    
    # 5. Visualize with the same two-panel layout used in Section 7
    ft = 0 # Feature index for plotting (since y is shape (N,1), index 0 is the target)
    
    fig = make_subplots(rows=1, cols=2, subplot_titles=("Past and Future Stock Prices", "Future Stock Prices"))

    # --- Panel 1: Full Timeline ---
    # Train
    fig.add_trace(go.Scatter(x=np.arange(y_train.shape[0]), y=y_train[:,ft], mode="lines", name="train target", line=dict(color="#5DADE2")), row=1, col=1)
    fig.add_trace(go.Scatter(x=np.arange(y_train_pred.shape[0]), y=y_train_pred[:,ft], mode="lines", name="train prediction", line=dict(color="#2ECC71")), row=1, col=1)
    
    # Valid
    valid_start = y_train.shape[0]
    fig.add_trace(go.Scatter(x=np.arange(valid_start, valid_start+y_valid.shape[0]), y=y_valid[:,ft], mode="lines", name="valid target", line=dict(color="#AAB7B8")), row=1, col=1)
    fig.add_trace(go.Scatter(x=np.arange(valid_start, valid_start+y_valid_pred.shape[0]), y=y_valid_pred[:,ft], mode="lines", name="valid prediction", line=dict(color="#F39C12")), row=1, col=1)
    
    # Test
    test_start = valid_start + y_valid.shape[0]
    fig.add_trace(go.Scatter(x=np.arange(test_start, test_start+y_test.shape[0]), y=y_test[:,ft], mode="lines", name="test target", line=dict(color="#F8C471")), row=1, col=1)
    fig.add_trace(go.Scatter(x=np.arange(test_start, test_start+y_test_pred.shape[0]), y=y_test_pred[:,ft], mode="lines", name="test prediction", line=dict(color="#E74C3C")), row=1, col=1)

    # --- Panel 2: Zoom on Test ---
    fig.add_trace(go.Scatter(x=np.arange(y_test.shape[0]), y=y_test[:,ft], mode="lines", name="test target (zoom)", line=dict(color="#F8C471"), showlegend=False), row=1, col=2)
    fig.add_trace(go.Scatter(x=np.arange(y_test_pred.shape[0]), y=y_test_pred[:,ft], mode="lines", name="test prediction (zoom)", line=dict(color="#E74C3C"), showlegend=False), row=1, col=2)

    fig.update_xaxes(title_text="Time [days]", row=1, col=1)
    fig.update_yaxes(title_text="Normalized price", row=1, col=1)
    fig.update_xaxes(title_text="Time [days]", row=1, col=2)
    fig.update_yaxes(title_text="Normalized price", row=1, col=2)
    fig.update_layout(height=470, width=1300, template=PLOTLY_TEMPLATE, title_text=f"Results for {symbol}")
    fig.show()

In [17]:
# Define specific tickers for analysis
target_tickers = ['VZ', 'WAT']
print(f"Analyzing Specific Tickers: {target_tickers}")

for sym in target_tickers:
    try:
        run_ticker_pipeline(sym, market_df, seq_len=SEQUENCE_LENGTH, epochs=20)
    except Exception as e:
        print(f"Error processing {sym}: {e}")

Analyzing Specific Tickers: ['VZ', 'WAT']

Processing VZ...
Training model for VZ...



Processing WAT...
Training model for WAT...


## 12. Architectural Decisions & Technical Q&A

### Q1: Why choose a Stacked LSTM over a Vanilla LSTM or other architectures (like GRU/RNN)?
**Answer:**
The primary advantage is **Hierarchical Feature Learning**.
- **Vanilla LSTM:** Effective at learning temporal dependencies but limited to a single level of abstraction.
- **Stacked LSTM:** Adds depth, in the same way deep feed-forward networks stack layers.
  - **Layer 1 (Feature Extractor)**: Learns low-level temporal patterns from the raw inputs (e.g., volatility spikes, short-term momentum).
  - **Layer 2 (Pattern Integrator)**: Takes the *sequence of hidden states* from Layer 1 as input, rather than raw prices. It is intended to model higher-level, slower-moving behavior such as trend reversals or consolidation. This extra depth can help capture non-linear market regimes that a single layer may represent less well.

### Q2: Technically, why does Layer 1 use `return_sequences=True` while Layer 2 uses `return_sequences=False`?
**Answer:**
This parameter controls the tensor dimensionality flow required for stacking:
- **Layer 1 (`return_sequences=True`)**: Outputs the full sequence of hidden states $(h_1, h_2, \ldots, h_{30})$. This is a 3D tensor of shape `(Batch, 30, 96)` for the 30-step window of Section 11, or `(Batch, 29, 96)` for the 29-step input of Section 6. Keeping the **time dimension** means the next LSTM layer receives a sequence.
- **Layer 2 (`return_sequences=False`)**: Outputs only the final hidden state $h_{30}$, a 2D tensor of shape `(Batch, 96)`. This **encodes** the entire window into a single fixed-length context vector, which the subsequent Dense (fully connected) layers map to the regression output.
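The shape rule can be checked without TensorFlow. In this NumPy sketch, a plain tanh recurrence stands in for the LSTM cell as an intentional simplification. The real layer adds gates and a cell state but follows the same `return_sequences` rule. The batch size of 8 and the random weights are arbitrary, and `recurrent_layer` is a hypothetical helper.

```python
import numpy as np

def recurrent_layer(x, units, seed, return_sequences):
    """Apply a randomly initialised tanh recurrence to x of shape (batch, steps, features).

    A plain tanh cell stands in for the LSTM cell; the output shapes follow
    the same return_sequences rule.
    """
    rng = np.random.default_rng(seed)
    batch, steps, features = x.shape
    W = rng.normal(scale=0.1, size=(features, units))
    U = rng.normal(scale=0.1, size=(units, units))
    h = np.zeros((batch, units))
    states = []
    for t in range(steps):
        h = np.tanh(x[:, t, :] @ W + h @ U)
        states.append(h)
    return np.stack(states, axis=1) if return_sequences else h

x = np.random.default_rng(0).random((8, 30, 5))                 # (Batch, 30, 5) as in Section 11
seq = recurrent_layer(x, 96, seed=1, return_sequences=True)     # Layer 1 -> (8, 30, 96)
ctx = recurrent_layer(seq, 96, seed=2, return_sequences=False)  # Layer 2 -> (8, 96)
print(seq.shape, ctx.shape)

# return_sequences=False keeps only the last step of what True would return
full = recurrent_layer(seq, 96, seed=2, return_sequences=True)
print(np.allclose(full[:, -1, :], ctx))
```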

### Q3: How does the LSTM Cell State ($C_t$) technically mitigate the Vanishing Gradient problem during Backpropagation?
**Answer:**
In standard RNNs, the gradient is multiplied at every time step by the recurrent weight matrix and by the derivative of the activation function (tanh). When these factors are smaller than one, the gradient decays exponentially (vanishes) over long sequences.
- **The Technical Fix**: The LSTM Cell State provides a largely linear, residual-style path through time.
- **Mechanism**: During Backpropagation Through Time (BPTT), the gradient flows through the Cell State ($C_t$) mainly via an **additive** update, $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$. Along this path, each step scales the gradient only by the Forget Gate $f_t$, rather than by a weight matrix and a tanh derivative. When the gate stays open (near 1.0), the error signal is largely preserved across many steps. This lets the network learn dependencies that reach back to early days in the window (e.g., Day 1 of a 30-day window) far more easily than a vanilla RNN can.
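
For reference, here are the standard LSTM update equations behind this argument, with $\sigma$ the logistic sigmoid and $\odot$ element-wise multiplication. Along the direct cell-state path, and ignoring the indirect dependence of the gates on $h_{t-1}$, the Jacobian from one cell state to the previous one is just the forget gate. A vanilla RNN with $h_t = \tanh(W_h h_{t-1} + W_x x_t + b)$ instead multiplies by the recurrent matrix and the tanh derivative at every step:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \quad
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \quad
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{C}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \qquad h_t = o_t \odot \tanh(C_t) \\[4pt]
\frac{\partial C_t}{\partial C_{t-k}} &\approx \prod_{j=0}^{k-1} \operatorname{diag}(f_{t-j})
\qquad \text{vs.} \qquad
\frac{\partial h_t}{\partial h_{t-k}} = \prod_{j=0}^{k-1} \operatorname{diag}\!\left(1 - h_{t-j}^{2}\right) W_h
\end{aligned}
$$

For example, if every forget-gate entry were 0.95, the direct-path factor after 29 steps would be $0.95^{29} \approx 0.23$. By contrast, a per-step factor of 0.5 in a vanilla RNN would leave $0.5^{29} \approx 1.9 \times 10^{-9}$.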