<a href="https://colab.research.google.com/github/YthanW/STATS507-Fall2025/blob/main/arima_finetune.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install --quiet --upgrade pip

# 先卸载可能冲突的版本
!pip uninstall -y torch torchvision torchaudio

# 重装一套互相兼容的官方版本（GPU，cu121）
!pip install --quiet "torch==2.4.0" "torchvision==0.19.0" "torchaudio==2.4.0" \
  --index-url https://download.pytorch.org/whl/cu121


Found existing installation: torch 2.7.1
Uninstalling torch-2.7.1:
  Successfully uninstalled torch-2.7.1
Found existing installation: torchvision 0.24.0+cu126
Uninstalling torchvision-0.24.0+cu126:
  Successfully uninstalled torchvision-0.24.0+cu126
Found existing installation: torchaudio 2.9.0+cu126
Uninstalling torchaudio-2.9.0+cu126:
  Successfully uninstalled torchaudio-2.9.0+cu126


In [2]:
!pip install --quiet "autogluon.timeseries[chronos]==0.9.5"


[31mERROR: Ignored the following versions that require a different python version: 0.5.0 Requires-Python >=3.7,<3.10; 0.5.0b20220623 Requires-Python >=3.7,<3.10; 0.5.0rc1 Requires-Python >=3.7,<3.10; 0.5.1 Requires-Python >=3.7,<3.10; 0.5.1b20220624 Requires-Python >=3.7,<3.10; 0.5.1b20220625 Requires-Python >=3.7,<3.10; 0.5.1b20220626 Requires-Python >=3.7,<3.10; 0.5.1b20220627 Requires-Python >=3.7,<3.10; 0.5.1b20220628 Requires-Python >=3.7,<3.10; 0.5.1b20220629 Requires-Python >=3.7,<3.10; 0.5.1b20220630 Requires-Python >=3.7,<3.10; 0.5.1b20220701 Requires-Python >=3.7,<3.10; 0.5.1b20220702 Requires-Python >=3.7,<3.10; 0.5.1b20220703 Requires-Python >=3.7,<3.10; 0.5.1b20220704 Requires-Python >=3.7,<3.10; 0.5.1b20220705 Requires-Python >=3.7,<3.10; 0.5.1b20220706 Requires-Python >=3.7,<3.10; 0.5.1b20220707 Requires-Python >=3.7,<3.10; 0.5.1b20220708 Requires-Python >=3.7,<3.10; 0.5.1b20220709 Requires-Python >=3.7,<3.10; 0.5.1b20220710 Requires-Python >=3.7,<3.10; 0.5.1b20220711 R

In [3]:
!pip install "chronos-forecasting>=1.3.0"



In [5]:
import autogluon.core as ag_core
import autogluon.timeseries as ag_ts
import chronos
import torch

print("AutoGluon Core version:", ag_core.__version__)
print("AutoGluon TimeSeries version:", ag_ts.__version__)
print("chronos-forecasting version:", chronos.__version__)
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())


AutoGluon Core version: 1.4.0
AutoGluon TimeSeries version: 1.4.0
chronos-forecasting version: 2.1.0
PyTorch version: 2.4.0+cu121
CUDA available: True


In [6]:
import warnings
warnings.filterwarnings("ignore")

import os
import pickle

import numpy as np
import pandas as pd
import yfinance as yf

from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_absolute_error, mean_squared_error
from IPython.display import display
# AutoGluon
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor


In [7]:
# ========= 50 US large-cap tickers =========
TICKERS = [
    "AAPL", "MSFT", "NVDA", "GOOGL", "AMZN",
    "META", "AVGO", "TSLA", "BRK-B", "LLY",
    "JPM", "JNJ", "XOM", "UNH", "V",
    "WMT", "PG", "MA", "HD", "COST",
    "PEP", "ABBV", "MRK", "CRM", "ORCL",
    "ADBE", "CSCO", "TMO", "ACN", "MCD",
    "DHR", "ABT", "KO", "AMD", "BAC",
    "PFE", "NFLX", "INTU", "LIN", "TXN",
    "CAT", "LRCX", "IBM", "QCOM", "CVS",
    "NKE", "UPS", "HON", "LOW", "AMAT",
]


def flatten_columns(df: pd.DataFrame) -> pd.DataFrame:
    """
    If columns are a MultiIndex (e.g. ('Adj Close', 'AAPL')), keep only level-0.
    """
    if isinstance(df.columns, pd.MultiIndex):
        df = df.copy()
        df.columns = [c[0] for c in df.columns]
    return df


def pick_price_column(df: pd.DataFrame) -> pd.DataFrame:
    """
    Pick a reasonable price column from a yfinance DataFrame.
    Preference order: 'Adj Close' -> 'Close' -> 'Open'.
    Returns a single-column DataFrame named 'price'.
    """
    for col in ["Adj Close", "Close", "Open"]:
        if col in df.columns:
            return df[[col]].rename(columns={col: "price"})
    raise KeyError(f"No usable price column found in columns: {df.columns.tolist()}")


def download_multi_stock_data(
    tickers,
    start: str = "2015-01-01",
    end: str | None = None,
) -> pd.DataFrame:
    """
    Download daily prices for multiple tickers using yfinance and compute log returns.

    Returns long-format DataFrame:
        ['date', 'ticker', 'price', 'log_return'].
    """
    all_list = []
    failed = []

    for t in tickers:
        print(f"Downloading {t} ...")
        df = yf.download(t, start=start, end=end, progress=False)
        if df.empty:
            print(f"[WARN] Empty data for {t}, skipped.")
            failed.append(t)
            continue

        df = flatten_columns(df)

        try:
            px = pick_price_column(df)
        except Exception as e:
            print(f"[WARN] {t}: {e} -> skipped.")
            failed.append(t)
            continue

        # Compute log returns of the selected price
        px["log_return"] = np.log(px["price"]).diff()
        px = px.dropna(subset=["log_return"])

        # Attach ticker and date columns
        px["ticker"] = t
        px = px.reset_index().rename(columns={"Date": "date"})
        all_list.append(px)

    if not all_list:
        raise ValueError("No valid tickers downloaded.")

    full = pd.concat(all_list, ignore_index=True)
    full = full.sort_values(["ticker", "date"]).reset_index(drop=True)

    print("Failed tickers:", failed)
    return full

In [8]:
# ---- dataset download ----
data_long = download_multi_stock_data(TICKERS, start="2015-01-01")
print("========== Data Overview ==========")
print("Data shape:", data_long.shape)
print(data_long.head())

Downloading AAPL ...
Downloading MSFT ...
Downloading NVDA ...
Downloading GOOGL ...
Downloading AMZN ...
Downloading META ...
Downloading AVGO ...
Downloading TSLA ...
Downloading BRK-B ...
Downloading LLY ...
Downloading JPM ...
Downloading JNJ ...
Downloading XOM ...
Downloading UNH ...
Downloading V ...
Downloading WMT ...
Downloading PG ...
Downloading MA ...
Downloading HD ...
Downloading COST ...
Downloading PEP ...
Downloading ABBV ...
Downloading MRK ...
Downloading CRM ...
Downloading ORCL ...
Downloading ADBE ...
Downloading CSCO ...
Downloading TMO ...
Downloading ACN ...
Downloading MCD ...
Downloading DHR ...
Downloading ABT ...
Downloading KO ...
Downloading AMD ...
Downloading BAC ...
Downloading PFE ...
Downloading NFLX ...
Downloading INTU ...
Downloading LIN ...
Downloading TXN ...
Downloading CAT ...
Downloading LRCX ...
Downloading IBM ...
Downloading QCOM ...
Downloading CVS ...
Downloading NKE ...
Downloading UPS ...
Downloading HON ...
Downloading LOW ...
Downlo

In [9]:

# ======================
# Train / Test
# ======================
def time_based_train_test_split(df_long: pd.DataFrame, test_frac: float = 0.2):
    """
    Split the long-format dataset into train/test sets by DATE.
    All tickers share the same cutoff_date.
    """
    unique_dates = np.sort(df_long["date"].unique())
    n_dates = len(unique_dates)

    cutoff_idx = int((1 - test_frac) * n_dates)
    cutoff_date = unique_dates[cutoff_idx]

    train = df_long[df_long["date"] <= cutoff_date].copy()
    test = df_long[df_long["date"] > cutoff_date].copy()

    return train, test, cutoff_date


train_df, test_df, cutoff_date = time_based_train_test_split(
    data_long, test_frac=0.2
)

print("\n========== Train/Test Split ==========")
print("Cutoff date:", cutoff_date)
print("Train shape:", train_df.shape)
print("Test shape:", test_df.shape)
print("Train date range:", train_df["date"].min(), "→", train_df["date"].max())
print("Test date range:", test_df["date"].min(), "→", test_df["date"].max())

print("\nTrain samples per ticker:")
print(train_df.groupby("ticker")["date"].count())

print("\nTest samples per ticker:")
print(test_df.groupby("ticker")["date"].count())


Cutoff date: 2023-09-28T00:00:00.000000000
Train shape: (109950, 4)
Test shape: (27450, 4)
Train date range: 2015-01-05 00:00:00 → 2023-09-28 00:00:00
Test date range: 2023-09-29 00:00:00 → 2025-12-05 00:00:00

Train samples per ticker:
ticker
AAPL     2199
ABBV     2199
ABT      2199
ACN      2199
ADBE     2199
AMAT     2199
AMD      2199
AMZN     2199
AVGO     2199
BAC      2199
BRK-B    2199
CAT      2199
COST     2199
CRM      2199
CSCO     2199
CVS      2199
DHR      2199
GOOGL    2199
HD       2199
HON      2199
IBM      2199
INTU     2199
JNJ      2199
JPM      2199
KO       2199
LIN      2199
LLY      2199
LOW      2199
LRCX     2199
MA       2199
MCD      2199
META     2199
MRK      2199
MSFT     2199
NFLX     2199
NKE      2199
NVDA     2199
ORCL     2199
PEP      2199
PFE      2199
PG       2199
QCOM     2199
TMO      2199
TSLA     2199
TXN      2199
UNH      2199
UPS      2199
V        2199
WMT      2199
XOM      2199
Name: date, dtype: int64

Test samples per ticker:
tick

In [None]:
# ======================
# Switch & save directory
# ======================
TRAIN_ARIMA_5   = True
TRAIN_CHRONOS_5 = True
TRAIN_CHRONOS_60 = True
BASE_DIR = "./saved_models"
os.makedirs(BASE_DIR, exist_ok=True)
arima_5_path = os.path.join(BASE_DIR, "arima_5day.pkl")

In [10]:
# ======================
# evaluation func（MAE / RMSE）
# ======================
def evaluate_forecast(y_true: pd.Series, y_pred: pd.Series) -> dict:
    """
    Compute MAE and RMSE between true and predicted values.
    Align indices and drop any NaNs before evaluation.
    """

    y_true, y_pred = y_true.align(y_pred, join="inner")
    mask = ~(y_true.isna() | y_pred.isna())
    y_true = y_true[mask]
    y_pred = y_pred[mask]

    if len(y_true) == 0:
        return {"MAE": np.nan, "RMSE": np.nan}

    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))

    return {"MAE": mae, "RMSE": rmse}


# ======================
# ARIMA CONFIG
# ======================
H_SHORT = 5               # short-horizon multi-step forecast length
H_LONG  = 60              # long-horizon multi-step forecast length
ARIMA_ORDER = (2, 0, 0)   # ARIMA(p,d,q)




def run_arima_baseline(
    train_df: pd.DataFrame,
    test_df: pd.DataFrame,
    arima_order=(2, 0, 0),
    h_short=5,
    h_long=60,
) -> pd.DataFrame:
    """
    Fit ARIMA per ticker and compute 5-day and 60-day multi-step forecast errors.
    Returns a DataFrame with per-ticker metrics.
    """
    per_ticker_results = []
    tickers = sorted(train_df["ticker"].unique())

    print(f"Total tickers: {len(tickers)}")

    for ticker in tickers:
        # Extract per-ticker training & test series
        train_series = (
            train_df[train_df["ticker"] == ticker]
            .sort_values("date")["log_return"]
            .dropna()
        )
        test_series = (
            test_df[test_df["ticker"] == ticker]
            .sort_values("date")["log_return"]
            .dropna()
        )

        # Ensure test set is long enough for long-horizon evaluation
        if len(test_series) < h_long:
            print(f"[WARN] {ticker}: test length {len(test_series)} < {h_long}, skipped.")
            continue

        # Avoid unstable ARIMA on extremely short training sets
        if len(train_series) < 10:
            print(f"[WARN] {ticker}: train length {len(train_series)} too short, skipped.")
            continue

        print(
            f"Fitting ARIMA{arima_order} for {ticker} "
            f"(train={len(train_series)}, test={len(test_series)}) ..."
        )

        # Fit ARIMA model with error handling
        try:
            model = ARIMA(train_series, order=arima_order)
            model_fit = model.fit()
        except Exception as e:
            print(f"[ERROR] ARIMA failed for {ticker}: {e}")
            continue

        # ======= 5-step forecast =======
        fc_short = model_fit.forecast(steps=h_short)
        true_short = test_series.iloc[:h_short]
        pred_short = pd.Series(fc_short, index=true_short.index)
        metrics_short = evaluate_forecast(true_short, pred_short)

        # ======= 60-step forecast =======
        fc_long = model_fit.forecast(steps=h_long)
        true_long = test_series.iloc[:h_long]
        pred_long = pd.Series(fc_long, index=true_long.index)
        metrics_long = evaluate_forecast(true_long, pred_long)

        # Save per-ticker metrics
        per_ticker_results.append({
            "ticker": ticker,
            "n_train": len(train_series),
            "n_test": len(test_series),
            "MAE_5d": metrics_short["MAE"],
            "RMSE_5d": metrics_short["RMSE"],
            "MAE_60d": metrics_long["MAE"],
            "RMSE_60d": metrics_long["RMSE"],
        })

    arima_results_df = pd.DataFrame(per_ticker_results)

    # Remove rows where metrics could not be computed
    arima_results_df = arima_results_df.dropna(
        subset=["MAE_5d", "RMSE_5d", "MAE_60d", "RMSE_60d"],
        how="any"
    )

    print("\n===== ARIMA Per-Ticker Results =====")
    display(arima_results_df)

    print("\n===== ARIMA Overall Mean Metrics =====")
    display(arima_results_df[["MAE_5d", "RMSE_5d", "MAE_60d", "RMSE_60d"]].mean())

    return arima_results_df


In [11]:
# ============================================
# Train-or-load switch for ARIMA 5-day baseline
# ============================================
if TRAIN_ARIMA_5:
    print("[ARIMA-5] Training and saving results...")

    # run baseline
    arima_results_df = run_arima_baseline(
        train_df=train_df,
        test_df=test_df,
        arima_order=ARIMA_ORDER,
        h_short=H_SHORT,
        h_long=H_LONG,
    )

    # aggregate metrics for 5-day horizon
    mae_arima_5 = arima_results_df["MAE_5d"].mean()
    rmse_arima_5 = arima_results_df["RMSE_5d"].mean()

    # save to disk
    with open(arima_5_path, "wb") as f:
        pickle.dump(
            {
                "results_df": arima_results_df,
                "mae_5d": mae_arima_5,
                "rmse_5d": rmse_arima_5,
            },
            f,
        )

    print(f"[ARIMA-5] Saved results to: {arima_5_path}")
    print(f"[ARIMA-5] MAE_5d={mae_arima_5:.6f}, RMSE_5d={rmse_arima_5:.6f}")

else:
    print("[ARIMA-5] Loading results from disk...")

    with open(arima_5_path, "rb") as f:
        arima_5_data = pickle.load(f)

    arima_results_df = arima_5_data["results_df"]
    mae_arima_5 = arima_5_data["mae_5d"]
    rmse_arima_5 = arima_5_data["rmse_5d"]

    print(f"[ARIMA-5] Loaded from: {arima_5_path}")
    print(f"[ARIMA-5] MAE_5d={mae_arima_5:.6f}, RMSE_5d={rmse_arima_5:.6f}")


[ARIMA-5] Training and saving results...
Total tickers: 50
Fitting ARIMA(2, 0, 0) for AAPL (train=2199, test=549) ...
Fitting ARIMA(2, 0, 0) for ABBV (train=2199, test=549) ...
Fitting ARIMA(2, 0, 0) for ABT (train=2199, test=549) ...


  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for ACN (train=2199, test=549) ...
Fitting ARIMA(2, 0, 0) for ADBE (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for AMAT (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for AMD (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for AMZN (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for AVGO (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for BAC (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for BRK-B (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for CAT (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for COST (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for CRM (train=2199, test=549) ...
Fitting ARIMA(2, 0, 0) for CSCO (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for CVS (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for DHR (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for GOOGL (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for HD (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for HON (train=2199, test=549) ...
Fitting ARIMA(2, 0, 0) for IBM (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for INTU (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for JNJ (train=2199, test=549) ...
Fitting ARIMA(2, 0, 0) for JPM (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for KO (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for LIN (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for LLY (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for LOW (train=2199, test=549) ...
Fitting ARIMA(2, 0, 0) for LRCX (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for MA (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for MCD (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for META (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for MRK (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for MSFT (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for NFLX (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for NKE (train=2199, test=549) ...
Fitting ARIMA(2, 0, 0) for NVDA (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(


Fitting ARIMA(2, 0, 0) for ORCL (train=2199, test=549) ...
Fitting ARIMA(2, 0, 0) for PEP (train=2199, test=549) ...


  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for PFE (train=2199, test=549) ...
Fitting ARIMA(2, 0, 0) for PG (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for QCOM (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for TMO (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for TSLA (train=2199, test=549) ...
Fitting ARIMA(2, 0, 0) for TXN (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(


Fitting ARIMA(2, 0, 0) for UNH (train=2199, test=549) ...
Fitting ARIMA(2, 0, 0) for UPS (train=2199, test=549) ...


  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for V (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for WMT (train=2199, test=549) ...


  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


Fitting ARIMA(2, 0, 0) for XOM (train=2199, test=549) ...

===== ARIMA Per-Ticker Results =====


  return get_prediction_index(
  return get_prediction_index(


Unnamed: 0,ticker,n_train,n_test,MAE_5d,RMSE_5d,MAE_60d,RMSE_60d
0,AAPL,2199,549,0.007464,0.008394,0.008171,0.009978



===== ARIMA Overall Mean Metrics =====


Unnamed: 0,0
MAE_5d,0.007464
RMSE_5d,0.008394
MAE_60d,0.008171
RMSE_60d,0.009978


[ARIMA-5] Saved results to: ./saved_models/arima_5day.pkl
[ARIMA-5] MAE_5d=0.007464, RMSE_5d=0.008394


In [12]:
# ===============================
# Convert data_long -> TimeSeriesDataFrame (AutoGluon)
# ===============================

# Keep only necessary columns and rename to AutoGluon's convention
df_ag = (
    data_long[["ticker", "date", "log_return"]]
    .rename(columns={
        "ticker": "item_id",      # series id (stock ticker)
        "date": "timestamp",      # time index
        "log_return": "target",   # value to forecast
    })
    .copy()
)

# Ensure timestamp column is datetime and sort by (item_id, timestamp)
df_ag["timestamp"] = pd.to_datetime(df_ag["timestamp"])
df_ag = df_ag.sort_values(["item_id", "timestamp"]).reset_index(drop=True)

# Build AutoGluon TimeSeriesDataFrame
tsdf = TimeSeriesDataFrame.from_data_frame(
    df_ag,
    id_column="item_id",
    timestamp_column="timestamp",
)

print("Raw TimeSeriesDataFrame head:")
print(tsdf.head())

# Optional but recommended: enforce a regular business-day frequency and fill missing values
tsdf = tsdf.convert_frequency(freq="B")   # 'B' = business day
tsdf = tsdf.fill_missing_values()

print("\nTimeSeriesDataFrame summary:")
print("Num series:", len(tsdf.item_ids))
lengths = tsdf.num_timesteps_per_item()
print("Length range per series:", lengths.min(), "→", lengths.max())


Raw TimeSeriesDataFrame head:
                      target
item_id timestamp           
AAPL    2015-01-05 -0.028576
        2015-01-06  0.000094
        2015-01-07  0.013925
        2015-01-08  0.037703
        2015-01-09  0.001072

TimeSeriesDataFrame summary:
Num series: 50
Length range per series: 2850 → 2850


In [13]:
def make_train_test(tsdf: TimeSeriesDataFrame, prediction_length: int):
    """
    Split each time series into train and test.
    The last `prediction_length` timesteps of each item go to the test set.
    """
    prediction_length = int(prediction_length)
    train_data, test_data = tsdf.train_test_split(prediction_length=prediction_length)
    return train_data, test_data


In [14]:
def fit_chronos_bolt_small(
    train_data: TimeSeriesDataFrame,
    prediction_length: int,
    time_limit: int = 600,
    eval_metric: str = "MASE",
):
    """
    Train Chronos-Bolt (Small) in two modes:
      (1) Zero-shot
      (2) Fine-tuned

    AutoGluon will select the best-performing model.
    If Chronos fails entirely, we fall back to default AutoGluon models.
    """
    prediction_length = int(prediction_length)

    # Two Chronos variants: ZeroShot + FineTuned
    hyperparams = {
        "Chronos": [
            {
                "model_path": "bolt_small",
                "ag_args": {"name_suffix": "ZeroShot"},
            },
            {
                "model_path": "bolt_small",
                "fine_tune": True,
                "ag_args": {"name_suffix": "FineTuned"},
            },
        ]
    }

    predictor = TimeSeriesPredictor(
        prediction_length=prediction_length,
        target="target",
        eval_metric=eval_metric,
        verbosity=2,
    )

    chronos_ok = True
    try:
        predictor.fit(
            train_data=train_data,
            hyperparameters=hyperparams,
            enable_ensemble=False,
            time_limit=time_limit,
        )
    except Exception as e:
        print("Chronos-Bolt (ZeroShot + FineTuned) failed:")
        print(e)
        chronos_ok = False

    # Check if any Chronos models actually trained
    model_names = []
    if chronos_ok:
        try:
            if hasattr(predictor, "_trainer"):
                model_names = predictor._trainer.get_model_names()
        except Exception as e:
            print("Warning when checking model names:", e)

    # If Chronos failed → fallback
    if (not chronos_ok) or (not model_names):
        print("\n[Fallback] Using default AutoGluon models.\n")
        predictor = TimeSeriesPredictor(
            prediction_length=prediction_length,
            target="target",
            eval_metric=eval_metric,
            verbosity=2,
        )
        predictor.fit(train_data=train_data, time_limit=time_limit)

    # Show leaderboard
    try:
        print(predictor.leaderboard(silent=True))
    except:
        print("Could not print leaderboard.")

    return predictor


In [15]:
def evaluate_point_forecast(
    predictor: TimeSeriesPredictor,
    train_data: TimeSeriesDataFrame,
    test_data: TimeSeriesDataFrame,
):
    """
    Generate multi-step point forecasts (AutoGluon uses 'mean'),
    align with ground truth, and compute MAE & RMSE.
    """
    preds = predictor.predict(train_data)

    df_pred = preds.to_data_frame()
    df_true = test_data.to_data_frame()

    y_pred = df_pred["mean"]
    y_true = df_true["target"]

    # Align before computing metrics
    y_true, y_pred = y_true.align(y_pred, join="inner")

    mae = mean_absolute_error(y_true, y_pred)

    mse = mean_squared_error(y_true, y_pred)
    rmse = mse ** 0.5

    return mae, rmse


In [18]:
from autogluon.timeseries import TimeSeriesPredictor

# =======================
# 5-DAY (SHORT HORIZON) - CHRONOS
# =======================

PRED_SHORT = 5

# 1) Train-test split
train_5, test_5 = make_train_test(tsdf, prediction_length=PRED_SHORT)
print("5-day train shape:", train_5.shape, "test shape:", test_5.shape)

chronos_5_path = os.path.join(BASE_DIR, "chronos_5day")

if TRAIN_CHRONOS_5:
    # ----- Train and save -----
    print("\n===== Training Chronos-Bolt Small (5-day) =====")

    predictor_5 = TimeSeriesPredictor(
        path=chronos_5_path,
        prediction_length=PRED_SHORT,
        eval_metric="MASE",
    )

    predictor_5.fit(
        train_data=train_5,
        time_limit=600,
        presets="fast_training",
    )

    try:
        best_model_5 = predictor_5.get_model_best()
        print("\nBest model selected (5-day):", best_model_5)
    except Exception as e:
        print("\nWarning: could not get best model name (5-day):", e)

    print("\nSaving 5-day Chronos predictor...")
    predictor_5.save()
    print("Saved to:", chronos_5_path)

else:
    # ----- Load from disk -----
    print("\n===== Loading Chronos-Bolt Small (5-day) from disk =====")
    predictor_5 = TimeSeriesPredictor.load(chronos_5_path)

# 2) Evaluation
mae_5, rmse_5 = evaluate_point_forecast(predictor_5, train_5, test_5)
print("\n===== 5-Day Forecast Performance (Chronos) =====")
print(f"MAE :  {mae_5:.6f}")
print(f"RMSE:  {rmse_5:.6f}")


Beginning AutoGluon training... Time limit = 600s
AutoGluon will save models to '/content/saved_models/chronos_5day'
AutoGluon Version:  1.4.0
Python Version:     3.12.12
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Thu Oct  2 10:42:05 UTC 2025
CPU Count:          12
GPU Count:          1
Memory Avail:       163.11 GB / 167.05 GB (97.6%)
Disk Space Avail:   186.99 GB / 235.68 GB (79.3%)
Setting presets to: fast_training

Fitting with arguments:
{'enable_ensemble': True,
 'eval_metric': MASE,
 'hyperparameters': 'very_light',
 'known_covariates_names': [],
 'num_val_windows': 1,
 'prediction_length': 5,
 'quantile_levels': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
 'random_seed': 123,
 'refit_every_n_windows': 1,
 'refit_full': False,
 'skip_model_selection': False,
 'target': 'target',
 'time_limit': 600,
 'verbosity': 2}

Inferred time series frequency: 'B'
Provided train_data has 142250 rows, 50 time series. Median time series length is 2845 (m

5-day train shape: (142250, 1) test shape: (142500, 1)

===== Training Chronos-Bolt Small (5-day) =====


Models that will be trained: ['Naive', 'SeasonalNaive', 'RecursiveTabular', 'DirectTabular', 'ETS', 'Theta']
Training timeseries model Naive. Training for up to 84.8s of the 593.5s of remaining time.
	-1.2784       = Validation score (-MASE)
	0.08    s     = Training runtime
	2.63    s     = Validation (prediction) runtime
Training timeseries model SeasonalNaive. Training for up to 98.5s of the 590.8s of remaining time.
	-1.1353       = Validation score (-MASE)
	0.07    s     = Training runtime
	0.12    s     = Validation (prediction) runtime
Training timeseries model RecursiveTabular. Training for up to 118.1s of the 590.6s of remaining time.
	-0.8370       = Validation score (-MASE)
	12.95   s     = Training runtime
	0.13    s     = Validation (prediction) runtime
Training timeseries model DirectTabular. Training for up to 144.4s of the 577.5s of remaining time.
	-0.7107       = Validation score (-MASE)
	1.79    s     = Training runtime
	0.25    s     = Validation (prediction) runtim



Saving 5-day Chronos predictor...
Saved to: ./saved_models/chronos_5day

===== 5-Day Forecast Performance (Chronos) =====
MAE :  0.011688
RMSE:  0.015377


In [20]:
import sys
print(sys.version)


3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0]


In [21]:
# =======================
# 60-DAY (LONG HORIZON) - CHRONOS
# =======================

PRED_LONG = 60

# 1) Train-test split
train_60, test_60 = make_train_test(tsdf, prediction_length=PRED_LONG)
print("60-day train shape:", train_60.shape, "test shape:", test_60.shape)

chronos_60_path = os.path.join(BASE_DIR, "chronos_60day")

if TRAIN_CHRONOS_60:
    print("\n===== Training Chronos-Bolt Small (60-day) =====")

    predictor_60 = TimeSeriesPredictor(
        path=chronos_60_path,
        prediction_length=PRED_LONG,
        eval_metric="MASE",
    )

    predictor_60.fit(
        train_data=train_60,
        time_limit=1200,
        presets="fast_training",
    )

    try:
        best_model_60 = predictor_60.get_model_best()
        print("\nBest model selected (60-day):", best_model_60)
    except Exception as e:
        print("\nWarning: could not get best model name (60-day):", e)

    print("\nSaving 60-day Chronos predictor...")
    predictor_60.save()
    print("Saved to:", chronos_60_path)

else:
    print("\n===== Loading Chronos-Bolt Small (60-day) from disk =====")
    predictor_60 = TimeSeriesPredictor.load(chronos_60_path)

# 2) Evaluation
mae_60, rmse_60 = evaluate_point_forecast(predictor_60, train_60, test_60)
print("\n===== 60-Day Forecast Performance (Chronos) =====")
print(f"MAE :  {mae_60:.6f}")
print(f"RMSE:  {rmse_60:.6f}")


Beginning AutoGluon training... Time limit = 1200s
AutoGluon will save models to '/content/saved_models/chronos_60day'
AutoGluon Version:  1.4.0
Python Version:     3.12.12
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Thu Oct  2 10:42:05 UTC 2025
CPU Count:          12
GPU Count:          1
Memory Avail:       162.28 GB / 167.05 GB (97.1%)
Disk Space Avail:   186.92 GB / 235.68 GB (79.3%)
Setting presets to: fast_training

Fitting with arguments:
{'enable_ensemble': True,
 'eval_metric': MASE,
 'hyperparameters': 'very_light',
 'known_covariates_names': [],
 'num_val_windows': 1,
 'prediction_length': 60,
 'quantile_levels': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
 'random_seed': 123,
 'refit_every_n_windows': 1,
 'refit_full': False,
 'skip_model_selection': False,
 'target': 'target',
 'time_limit': 1200,
 'verbosity': 2}

Inferred time series frequency: 'B'
Provided train_data has 139500 rows, 50 time series. Median time series length is 279

60-day train shape: (139500, 1) test shape: (142500, 1)

===== Training Chronos-Bolt Small (60-day) =====


	-0.9471       = Validation score (-MASE)
	0.07    s     = Training runtime
	2.13    s     = Validation (prediction) runtime
Training timeseries model SeasonalNaive. Training for up to 199.6s of the 1197.7s of remaining time.
	-0.9690       = Validation score (-MASE)
	0.07    s     = Training runtime
	0.12    s     = Validation (prediction) runtime
Training timeseries model RecursiveTabular. Training for up to 239.5s of the 1197.5s of remaining time.
	-0.7765       = Validation score (-MASE)
	31.61   s     = Training runtime
	0.78    s     = Validation (prediction) runtime
Training timeseries model DirectTabular. Training for up to 291.2s of the 1164.7s of remaining time.
	-0.7032       = Validation score (-MASE)
	32.11   s     = Training runtime
	0.71    s     = Validation (prediction) runtime
Training timeseries model ETS. Training for up to 377.1s of the 1131.4s of remaining time.
	-0.6699       = Validation score (-MASE)
	0.07    s     = Training runtime
	4.78    s     = Validation



Saving 60-day Chronos predictor...
Saved to: ./saved_models/chronos_60day

===== 60-Day Forecast Performance (Chronos) =====
MAE :  0.013182
RMSE:  0.019122
