# Task 2 — TSLA Time Series Forecasting (ARIMA/SARIMA vs Multivariate LSTM)

This is the **only** notebook for Task 2. All modeling and artifact generation is done by scripts; the notebook loads, checks, and displays their outputs.

## Required deliverables shown here
1. **Chronological split** with cutoff on the **last trading day of 2024** (loaded from `split_info.json`).
2. **ARIMA/SARIMA model specification** (loaded from `arima_params.json`).
3. **LSTM architecture + training configuration** (loaded from `lstm_architecture.json`).
4. **Forecast CSVs aligned to test dates** (`tsla_arima_forecast.csv`, `tsla_lstm_forecast.csv`, `tsla_forecasts_merged.csv`).
5. **Performance metrics table** with MAE/RMSE/MAPE (`model_comparison.csv`).
6. A brief **discussion**: which model performed better and why, grounded in the metrics and behavior.


In [None]:
# Standard imports
import os
import json
import warnings
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

warnings.filterwarnings("ignore")

def _find_repo_root(start: Path) -> Path:
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / "src").is_dir() and (candidate / "outputs").exists():
            return candidate
    return start

# Ensure REPO_ROOT is always a concrete Path (avoids Path | Unbound type issues in VS Code/Pylance)
_existing_repo_root = globals().get("REPO_ROOT", None)
if isinstance(_existing_repo_root, Path):
    REPO_ROOT = _existing_repo_root
else:
    REPO_ROOT = _find_repo_root(Path.cwd())
    globals()["REPO_ROOT"] = REPO_ROOT
    print("Repo root (auto-detected):", REPO_ROOT)

ERROR_DIAGNOSTICS_PATH = REPO_ROOT / "outputs" / "task2" / "metrics" / "error_diagnostics.csv"

# Display settings
pd.set_option("display.max_columns", 200)
pd.set_option("display.width", 120)

print("Notebook working directory:", os.getcwd())

## 0) Define paths (repo-relative)

These paths match the script outputs defined in `src/config.py`.


In [None]:
# Repo-relative paths (robust to running from notebooks/).
# REPO_ROOT and _find_repo_root are defined in the imports cell above; reuse them
# here so that a REPO_ROOT set before running the notebook is not overwritten.
print("Repo root:", REPO_ROOT)

SPLIT_INFO_PATH = REPO_ROOT / "outputs" / "task2" / "metrics" / "split_info.json"
ARIMA_PARAMS_PATH = REPO_ROOT / "outputs" / "task2" / "metrics" / "arima_params.json"
LSTM_ARCH_PATH = REPO_ROOT / "outputs" / "task2" / "metrics" / "lstm_architecture.json"
MODEL_COMPARISON_PATH = REPO_ROOT / "outputs" / "task2" / "metrics" / "model_comparison.csv"

ARIMA_FORECAST_PATH = REPO_ROOT / "outputs" / "task2" / "forecasts" / "tsla_arima_forecast.csv"
LSTM_FORECAST_PATH = REPO_ROOT / "outputs" / "task2" / "forecasts" / "tsla_lstm_forecast.csv"
MERGED_FORECASTS_PATH = REPO_ROOT / "outputs" / "task2" / "forecasts" / "tsla_forecasts_merged.csv"

FORECAST_PLOT_PATH = REPO_ROOT / "outputs" / "task2" / "figures" / "forecast_test_period.png"

required_files = [
    SPLIT_INFO_PATH,
    ARIMA_PARAMS_PATH,
    LSTM_ARCH_PATH,
    MODEL_COMPARISON_PATH,
    ARIMA_FORECAST_PATH,
    LSTM_FORECAST_PATH,
    MERGED_FORECASTS_PATH,
]

missing = [p for p in required_files if not p.exists()]
if missing:
    print("Missing required artifacts. Run scripts first:")
    for m in missing:
        print(" -", m)
else:
    print("All required artifacts exist.")

print("Optional plot exists:", FORECAST_PLOT_PATH.exists())

## 1) Load split information (proof of correct chronological cutoff)

Task requirement: chronological split where training ends on the **last trading day of 2024**.


In [None]:
with SPLIT_INFO_PATH.open("r", encoding="utf-8") as f:
    split_info = json.load(f)

split_info

### Split summary
- Asset: **TSLA**
- Split year: **2024**
- Cutoff date (train end): **last trading day in 2024**
- Test period begins the next trading day and continues through the dataset end
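
The cutoff rule above can be checked mechanically. The following is a minimal sketch, not the scripts' actual logic: the function names and the `train_end` / `test_start` arguments are hypothetical, and the toy calendar uses plain business days (exchange holidays are ignored).

```python
import pandas as pd

def last_trading_day(dates, year):
    """Return the latest date in `dates` that falls in `year`."""
    idx = pd.DatetimeIndex(pd.to_datetime(dates))
    in_year = idx[idx.year == year]
    if in_year.empty:
        raise ValueError(f"no dates found in {year}")
    return in_year.max()

def check_split(dates, train_end, test_start, year=2024):
    """Check that train_end is the last date of `year` and test_start is the next date."""
    idx = pd.DatetimeIndex(pd.to_datetime(dates)).sort_values()
    train_end = pd.Timestamp(train_end)
    test_start = pd.Timestamp(test_start)
    later = idx[idx > train_end]
    return {
        "cutoff_ok": train_end == last_trading_day(idx, year),
        "test_start_ok": (not later.empty) and later.min() == test_start,
    }

# Toy calendar: business days around the 2024/2025 boundary (holidays ignored)
toy_dates = pd.bdate_range("2024-12-27", "2025-01-03")
result = check_split(toy_dates, train_end="2024-12-31", test_start="2025-01-01")
print(result)
```

On the real data, `dates` would be the full date column of the dataset, and the two boundary values would come from whichever keys `split_info.json` actually uses.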


In [None]:
split_summary = pd.DataFrame([split_info])
split_summary

## 2) Load forecasts (aligned to test dates)

We load:
- ARIMA forecast CSV
- LSTM forecast CSV
- Merged forecast CSV (should align by test dates)


In [None]:
arima_fc = pd.read_csv(ARIMA_FORECAST_PATH)
lstm_fc = pd.read_csv(LSTM_FORECAST_PATH)
merged_fc = pd.read_csv(MERGED_FORECASTS_PATH)

for df in (arima_fc, lstm_fc, merged_fc):
    df["date"] = pd.to_datetime(df["date"])

print("ARIMA forecast shape:", arima_fc.shape)
print("LSTM forecast shape:", lstm_fc.shape)
print("Merged forecast shape:", merged_fc.shape)

arima_fc.head()

In [None]:
lstm_fc.head()

In [None]:
merged_fc.head()

### Forecast alignment checks
We verify that:
- test dates are sorted
- merged dates match ARIMA dates
- merged contains both model predictions
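
A stricter version of these checks compares every forecast frame against a reference frame and reports missing dates, extra dates, and whether the order is identical. This is a sketch under the assumption that each frame has a parsed `date` column; `alignment_report` is a hypothetical helper, not part of the scripts.

```python
import pandas as pd

def alignment_report(frames, date_col="date"):
    """Compare each frame's dates against those of the first frame in `frames`."""
    names = list(frames)
    ref = pd.DatetimeIndex(frames[names[0]][date_col])
    report = {}
    for name in names[1:]:
        other = pd.DatetimeIndex(frames[name][date_col])
        report[name] = {
            "missing_vs_ref": len(ref.difference(other)),  # dates only in the reference
            "extra_vs_ref": len(other.difference(ref)),    # dates only in this frame
            "same_order": ref.equals(other),               # identical dates, identical order
        }
    return report

# Toy frames: the second one is missing a date
a = pd.DataFrame({"date": pd.to_datetime(["2025-01-02", "2025-01-03", "2025-01-06"])})
b = pd.DataFrame({"date": pd.to_datetime(["2025-01-02", "2025-01-06"])})
report = alignment_report({"arima_fc": a, "lstm_fc": b, "merged_fc": a.copy()})
print(report)
```

Unlike the set-based comparison, `Index.equals` also catches duplicated or reordered dates.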


In [None]:
def assert_monotonic_dates(df, name="df"):
    if not df["date"].is_monotonic_increasing:
        raise ValueError(f"{name} dates are not sorted ascending")

assert_monotonic_dates(arima_fc, "arima_fc")
assert_monotonic_dates(lstm_fc, "lstm_fc")
assert_monotonic_dates(merged_fc, "merged_fc")

dates_arima = set(arima_fc["date"])
dates_merged = set(merged_fc["date"])

print("Dates in ARIMA but not merged:", len(dates_arima - dates_merged))
print("Dates in merged but not ARIMA:", len(dates_merged - dates_arima))

missing_cols = [c for c in ["y_true", "arima_pred", "lstm_pred"] if c not in merged_fc.columns]
if missing_cols:
    raise ValueError(f"Merged forecasts missing columns: {missing_cols}")

print("Merged columns OK.")

## 3) Load model specifications (ARIMA/SARIMA parameters + LSTM architecture)

These JSON files are the **documentation artifacts** required for the report.


In [None]:
with open(ARIMA_PARAMS_PATH, "r", encoding="utf-8") as f:
    arima_params = json.load(f)

with open(LSTM_ARCH_PATH, "r", encoding="utf-8") as f:
    lstm_info = json.load(f)

arima_params

### ARIMA/SARIMA summary
Key items to report:
- `order = (p,d,q)`
- `seasonal_order = (P,D,Q,m)` if seasonal
- selection criterion (AIC)
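
For the report text, the two tuples can be folded into the usual compact model label. The helper below is a hypothetical convenience, assuming `order` and `seasonal_order` are stored as 3- and 4-element sequences as listed above.

```python
def sarima_label(order, seasonal_order=None):
    """Format (p,d,q) and optional (P,D,Q,m) as 'ARIMA(p,d,q)' or 'SARIMA(p,d,q)(P,D,Q)[m]'."""
    p, d, q = order
    label = f"ARIMA({p},{d},{q})"
    if seasonal_order is not None:
        P, D, Q, m = seasonal_order
        # Treat an all-zero seasonal part (or m == 0) as non-seasonal
        if m and (P or D or Q):
            label = f"S{label}({P},{D},{Q})[{m}]"
    return label

print(sarima_label((1, 1, 1)))                # ARIMA(1,1,1)
print(sarima_label([2, 1, 0], [1, 0, 1, 5]))  # SARIMA(2,1,0)(1,0,1)[5]
```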


In [None]:
arima_summary = {
    "asset": arima_params.get("asset"),
    "target_col": arima_params.get("target_col"),
    "seasonal": arima_params.get("seasonal"),
    "m": arima_params.get("m"),
    "order": arima_params.get("order"),
    "seasonal_order": arima_params.get("seasonal_order"),
    "aic": arima_params.get("aic"),
    "bic": arima_params.get("bic"),
}
pd.DataFrame([arima_summary])

### LSTM architecture & training summary
We report:
- lookback window
- features used
- scaling strategy
- layers/units/dropout
- epochs/batch size/learning rate
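
To make the lookback and horizon settings concrete, here is a minimal sketch of how a multivariate series is typically cut into supervised windows for an LSTM. The exact windowing convention used by the scripts is not shown in this notebook, so `make_windows` and its indexing are assumptions.

```python
import numpy as np

def make_windows(features, target, lookback, horizon=1):
    """Build (samples, lookback, n_features) inputs and the target `horizon` steps after each window."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(features) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series too short for this lookback/horizon")
    # Window i covers rows i .. i+lookback-1; its label sits `horizon` rows after the window end
    X = np.stack([features[i : i + lookback] for i in range(n)])
    y = np.array([target[i + lookback + horizon - 1] for i in range(n)])
    return X, y

# Toy data: 10 time steps, 2 features
feats = np.arange(20, dtype=float).reshape(10, 2)
tgt = np.arange(10, dtype=float)
X, y = make_windows(feats, tgt, lookback=3, horizon=1)
print(X.shape, y.shape)  # (7, 3, 2) (7,)
```

With `lookback=3` and `horizon=1`, the first sample uses rows 0 to 2 to predict the target at row 3, so no window ever contains its own label.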


In [None]:
lstm_info

In [None]:
run_cfg = lstm_info.get("run", {})
lstm_summary = {
    "lookback": run_cfg.get("lookback"),
    "horizon": run_cfg.get("horizon"),
    "n_features": len(run_cfg.get("feature_cols", [])),
    "feature_cols": ", ".join(run_cfg.get("feature_cols", [])),
    "scaler_type": run_cfg.get("scaler_type"),
    "units_1": run_cfg.get("units_1"),
    "units_2": run_cfg.get("units_2"),
    "dropout": run_cfg.get("dropout"),
    "rec_dropout": run_cfg.get("rec_dropout"),
    "epochs": run_cfg.get("epochs"),
    "batch_size": run_cfg.get("batch_size"),
    "learning_rate": run_cfg.get("learning_rate"),
}
pd.DataFrame([lstm_summary]).T.rename(columns={0:"value"})

## 4) Required metrics table (MAE / RMSE / MAPE)

This is the required deliverable: **`model_comparison.csv`**.


In [None]:
comparison = pd.read_csv(MODEL_COMPARISON_PATH)
comparison

### Sanity check: recompute metrics from merged forecasts
We recompute MAE/RMSE/MAPE directly from `tsla_forecasts_merged.csv` to confirm consistency.
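
For reference, these are the definitions that the recomputation in the next cell implements, with $y_t$ the actual value, $\hat{y}_t$ the forecast, $n$ the number of test dates, and $\varepsilon = 10^{-8}$ the guard against division by zero used in `mape`:

$$
\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2},
\qquad
\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\frac{\left|y_t - \hat{y}_t\right|}{\max\left(\left|y_t\right|,\ \varepsilon\right)}
$$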


In [None]:
def mae(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred, eps=1e-8):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.maximum(np.abs(y_true), eps)
    return float(np.mean(np.abs((y_true - y_pred) / denom)) * 100.0)

y_true = merged_fc["y_true"].values
arima_pred = merged_fc["arima_pred"].values
lstm_pred = merged_fc["lstm_pred"].values

recalc = pd.DataFrame([
    {"model": "ARIMA_recalc", "MAE": mae(y_true, arima_pred), "RMSE": rmse(y_true, arima_pred), "MAPE_pct": mape(y_true, arima_pred)},
    {"model": "LSTM_recalc", "MAE": mae(y_true, lstm_pred), "RMSE": rmse(y_true, lstm_pred), "MAPE_pct": mape(y_true, lstm_pred)},
])
recalc

## 5) Visual comparison: Actual vs Forecasts on test period

We display the saved figure if it exists; otherwise we generate the plot inline.


In [None]:
from IPython.display import display  # explicit import; Jupyter also injects it

# Show saved plot if present
if FORECAST_PLOT_PATH.exists():
    from PIL import Image
    img = Image.open(FORECAST_PLOT_PATH)
    display(img)
else:
    print("Saved plot not found; generating inline plot...")
    df = merged_fc.sort_values("date")

    plt.figure(figsize=(12, 5))
    plt.plot(df["date"], df["y_true"], label="Actual (TSLA adj_close)", linewidth=2)
    plt.plot(df["date"], df["arima_pred"], label="ARIMA", linewidth=1.5)
    plt.plot(df["date"], df["lstm_pred"], label="LSTM (multivariate)", linewidth=1.5)
    plt.title("TSLA Forecasts on Test Period")
    plt.xlabel("Date")
    plt.ylabel("Price")
    plt.grid(True, alpha=0.3)
    plt.legend()
    plt.tight_layout()
    plt.show()

## 6) Extra section — deliverable proof (head/tail) + error distribution diagnostics

This section strengthens the report by:
- Showing **first/last rows** of forecast artifacts (alignment proof)
- Providing error distribution statistics (bias, dispersion, quantiles)
- Showing how often one model beats the other per-day (absolute error)
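
The statistics listed above can be sketched as follows. This is an illustration on toy numbers, not the script that produces `error_diagnostics.csv`; the helper names and output keys are hypothetical.

```python
import numpy as np
import pandas as pd

def error_summary(y_true, y_pred):
    """Bias, dispersion, and quantiles of the signed error (pred - actual)."""
    err = pd.Series(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    return {
        "bias": err.mean(),
        "std": err.std(ddof=1),
        "q05": err.quantile(0.05),
        "median": err.quantile(0.50),
        "q95": err.quantile(0.95),
    }

def win_rates(y_true, pred_a, pred_b):
    """Share of days (in %) on which each model has the strictly smaller absolute error."""
    y = np.asarray(y_true, dtype=float)
    ea = np.abs(np.asarray(pred_a, dtype=float) - y)
    eb = np.abs(np.asarray(pred_b, dtype=float) - y)
    return {
        "a_wins_pct": float((ea < eb).mean() * 100),
        "b_wins_pct": float((eb < ea).mean() * 100),
        "ties_pct": float((ea == eb).mean() * 100),
    }

y_true = [100.0, 102.0, 101.0, 105.0]
pred_a = [101.0, 101.0, 101.0, 103.0]
pred_b = [100.0, 104.0, 103.0, 107.0]

summary_a = error_summary(y_true, pred_a)
rates = win_rates(y_true, pred_a, pred_b)
print(summary_a["bias"])  # -0.5
print(rates)              # {'a_wins_pct': 50.0, 'b_wins_pct': 25.0, 'ties_pct': 25.0}
```

Counting ties separately keeps the two win rates from summing to 100% when both models make the same absolute error on a given day.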


### 6.1 Forecast artifact previews (head / tail)


In [None]:
def preview_df(df, name, n=5):
    print(f"\n{name} — head({n})")
    display(df.head(n))
    print(f"\n{name} — tail({n})")
    display(df.tail(n))

preview_df(arima_fc, "ARIMA forecast CSV")
preview_df(lstm_fc, "LSTM forecast CSV")
preview_df(merged_fc, "Merged forecasts CSV")

### 6.2 Additional error diagnostics (per model)

This block:

- loads the saved diagnostics (`error_diagnostics.csv`), which are the source of truth for the report
- recomputes the per-day win rate from `merged_fc` and checks that it matches the artifact
- shows the artifact and recomputed win rates side by side, with their absolute difference


In [None]:
# 6.2 Additional error diagnostics (per model) + win rate (from artifact, then verified)

# 1) Load script-generated diagnostics (source-of-truth for reporting)
diag_script = pd.read_csv(ERROR_DIAGNOSTICS_PATH)

# Keep only the real model rows for display (exclude the "TIES" summary row)
diag_models = diag_script[diag_script["model"].isin(["ARIMA", "LSTM_multivariate"])].copy()

display(diag_models)

# 2) Also compute win-rate from merged_fc to verify it matches the artifact
df = merged_fc.sort_values("date").copy()
comp = df[["date", "y_true", "arima_pred", "lstm_pred"]].dropna().copy()

comp["abs_err_arima"] = (comp["arima_pred"] - comp["y_true"]).abs()
comp["abs_err_lstm"]  = (comp["lstm_pred"]  - comp["y_true"]).abs()

n = len(comp)
lstm_win_rate_pct = (comp["abs_err_lstm"] < comp["abs_err_arima"]).mean() * 100
arima_win_rate_pct = (comp["abs_err_arima"] < comp["abs_err_lstm"]).mean() * 100
tie_rate_pct = (comp["abs_err_arima"] == comp["abs_err_lstm"]).mean() * 100

win_check = pd.DataFrame(
    {
        "win_rate_vs_other_pct_recalc": [arima_win_rate_pct, lstm_win_rate_pct, tie_rate_pct],
        "n_compared": [n, n, n],
    },
    index=["ARIMA", "LSTM_multivariate", "TIES"]
)

display(win_check)

# 3) Consistency check vs artifact
# (Artifact includes a TIES row, so we can compare all three.)
merged_check = diag_script[["model", "win_rate_vs_other_pct"]].merge(
    win_check.reset_index().rename(columns={"index": "model"}),
    on="model",
    how="inner"
)

merged_check["abs_diff_pct_points"] = (
    merged_check["win_rate_vs_other_pct"] - merged_check["win_rate_vs_other_pct_recalc"]
).abs()

display(merged_check)

max_diff = merged_check["abs_diff_pct_points"].max()
print(f"Max absolute difference (pct points) between artifact and notebook recalc: {max_diff:.6f}")

### 6.3 Error over time (optional diagnostic plot)
This plot helps identify whether one model drifts or breaks down during high-volatility periods.
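
Drift is often easier to see in a smoothed series than in raw daily errors. As a hedged complement, the sketch below computes a rolling mean absolute error; the 20-day default window and the `rolling_abs_error` helper are assumptions, not something the scripts produce.

```python
import pandas as pd

def rolling_abs_error(dates, y_true, y_pred, window=20):
    """Rolling mean of |pred - actual|, indexed by date; NaN until a full window is available."""
    abs_err = pd.Series(
        [abs(p - t) for p, t in zip(y_pred, y_true)],
        index=pd.DatetimeIndex(dates),
        dtype=float,
    )
    return abs_err.rolling(window=window, min_periods=window).mean()

# Toy example: the forecast drifts further from a flat actual series each day
dates = pd.bdate_range("2025-01-06", periods=5)
actual = [10.0, 10.0, 10.0, 10.0, 10.0]
pred = [11.0, 12.0, 13.0, 14.0, 15.0]
rolled = rolling_abs_error(dates, actual, pred, window=3)
print(rolled.tolist())  # [nan, nan, 2.0, 3.0, 4.0]
```

A steadily rising rolling error, as in the toy output, is the signature of drift; a short spike that decays points to a single volatility episode.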


In [None]:
# Error over time: signed forecast error (pred - actual) per test date
plot_df = merged_fc.sort_values("date").copy()

# Compute error columns if they aren't already present
if "err_arima" not in plot_df.columns or "err_lstm" not in plot_df.columns:
    y_col = "y_true" if "y_true" in plot_df.columns else ("actual" if "actual" in plot_df.columns else None)
    arima_col = "arima_pred" if "arima_pred" in plot_df.columns else ("arima" if "arima" in plot_df.columns else None)
    lstm_col = "lstm_pred" if "lstm_pred" in plot_df.columns else ("lstm" if "lstm" in plot_df.columns else None)
    if not (y_col and arima_col and lstm_col):
        raise KeyError(
            "Expected columns for error plotting not found. "
            f"Need y_true/actual, arima_pred/arima, lstm_pred/lstm. Got: {list(plot_df.columns)}"
        )
    plot_df["err_arima"] = plot_df[arima_col] - plot_df[y_col]
    plot_df["err_lstm"] = plot_df[lstm_col] - plot_df[y_col]

plt.figure(figsize=(12, 4))
plt.plot(plot_df["date"], plot_df["err_arima"], label="ARIMA error (pred-actual)", linewidth=1)
plt.plot(plot_df["date"], plot_df["err_lstm"], label="LSTM error (pred-actual)", linewidth=1)
plt.axhline(0, color="black", linewidth=1, alpha=0.7)
plt.title("Forecast Errors Over Test Period")
plt.xlabel("Date")
plt.ylabel("Error")
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()

## 7) Discussion (brief, metrics-grounded)

### Model behavior considerations
- **ARIMA/SARIMA** (univariate) models linear autocorrelation structure in `adj_close`. It tends to perform well when the series is relatively smooth and the next-step value is strongly related to recent values.
- **Multivariate LSTM** can exploit nonlinear relationships and additional predictors (OHLCV and engineered indicators), but it is sensitive to:
  - feature scaling,
  - lookback choice,
  - limited training data and nonstationarity,
  - regime shifts and volatility spikes.

### Which model is better here?
We rank the models by **RMSE** (primary criterion, lower is better) and use MAE and MAPE as supporting metrics.

Use the *Error diagnostics* section above to comment on:
- bias (mean error): systematic over/under prediction
- dispersion (std): stability of errors
- quantiles: whether extreme errors are worse for one model

### Notes on validity / leakage
- Chronological split is enforced by cutoff date.
- LSTM scalers are fit on **train only** (per scripts) to avoid leakage.
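
The train-only scaling rule can be illustrated with a small sketch. Min-max scaling is assumed here purely for illustration; the scaler actually used is recorded under `scaler_type` in `lstm_architecture.json`, and `fit_minmax` / `apply_minmax` are hypothetical helpers.

```python
import numpy as np

def fit_minmax(train):
    """Learn per-feature minimum and range from the training rows only."""
    train = np.asarray(train, dtype=float)
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)  # avoid division by zero for constant features
    return lo, span

def apply_minmax(x, lo, span):
    """Scale any split with statistics learned on the training split."""
    return (np.asarray(x, dtype=float) - lo) / span

train = np.array([[10.0, 1.0], [20.0, 3.0], [30.0, 5.0]])
test = np.array([[40.0, 2.0]])

lo, span = fit_minmax(train)
train_scaled = apply_minmax(train, lo, span)
test_scaled = apply_minmax(test, lo, span)
print(test_scaled)  # [[1.5  0.25]]
```

The test value 1.5 falls outside [0, 1] because the test price exceeds the training maximum. That is expected: refitting the scaler on the test rows would hide this and leak future information into the model inputs.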


In [None]:
# Auto-generate a short conclusion snippet based on RMSE
cmp = comparison.copy()
cmp = cmp.sort_values("RMSE")
best = cmp.iloc[0].to_dict()
runner_up = cmp.iloc[1].to_dict() if len(cmp) > 1 else None

print("Best model by RMSE:")
print(best)

if runner_up:
    print("\nRunner-up:")
    print(runner_up)

print("\nConclusion draft:")
print(
    f"Based on the test-period evaluation, the best-performing model is {best['model']} "
    f"with RMSE={best['RMSE']:.4f}, MAE={best['MAE']:.4f}, MAPE={best['MAPE_pct']:.2f}%. "
    "This suggests it tracks the short-horizon dynamics of TSLA adj_close more closely over the held-out period."
)

## 8) Deliverables checklist (for submission)

Confirm that these files exist in your repo after running the scripts:

- **Splits & features**
  - `data/task2/splits/tsla_train.parquet`
  - `data/task2/splits/tsla_test.parquet`
  - `data/task2/features/tsla_features_train.parquet`
  - `data/task2/features/tsla_features_test.parquet`

- **Model artifacts**
  - `outputs/task2/models/lstm_model.keras`
  - (optional) `outputs/task2/models/arima_model.pkl`

- **Forecasts (aligned to test dates)**
  - `outputs/task2/forecasts/tsla_arima_forecast.csv`
  - `outputs/task2/forecasts/tsla_lstm_forecast.csv`
  - `outputs/task2/forecasts/tsla_forecasts_merged.csv`

- **Metrics & documentation**
  - `outputs/task2/metrics/model_comparison.csv`  ✅ required table
  - `outputs/task2/metrics/arima_params.json`      ✅ ARIMA/SARIMA parameters
  - `outputs/task2/metrics/lstm_architecture.json` ✅ LSTM architecture
  - `outputs/task2/metrics/split_info.json`        ✅ cutoff proof

- **Figures**
  - `outputs/task2/figures/forecast_test_period.png`
