ATOMS: Adaptive Tournament Model Selection

This repository implements ATOMS (Adaptive Tournament Model Selection) and the benchmark algorithms from the following paper:

Capponi, A., Huang, C., Sidaoui, J. A., Wang, K., and Zou, J. (2025). The Nonstationarity-Complexity Tradeoff in Return Prediction. Available at SSRN: https://ssrn.com/abstract=5980654

Algorithms

  • ATOMS: adaptive model selection via (i) adaptive validation-window selection and (ii) a tournament procedure.
  • Fixed-window baselines:
    • Fixed-val($\ell$): select the model with the lowest average validation loss over the last $\ell$ periods.
    • Fixed-CV: select a model using cross-validation on a fixed historical window.

This implementation covers model selection only (ATOMS and the baselines). It does not include the full large-scale training pipeline used in the paper.


Quickstart

1) Create an environment and install

python -m venv .venv
source .venv/bin/activate   # macOS/Linux
# .venv\Scripts\activate    # Windows PowerShell

python -m pip install -U pip
python -m pip install -e .

2) Run the demo

Notebook walkthrough: example/demo.ipynb.

The demo generates a synthetic nonstationary dataset, trains a small set of candidate models,
runs ATOMS and baselines, and reports performance summaries.


Core usage

Inputs

ATOMS operates on per-observation validation losses organized by time period.

The inputs include:

  • val_losses: list of length n_models.

    • Each entry is an array of validation losses concatenated by period in chronological order.
    • Single response: shape (n_obs,)
    • Multiple responses (e.g., 17 industry portfolios): shape (n_obs, n_responses) (selection is done separately for each response)
  • val_sizes: list/array (n_0, n_1, ..., n_{T-1}), where n_t is the number of validation observations
    in period t (same concatenation order as val_losses).
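As an illustration of the layout described above, the following sketch builds toy inputs with NumPy. The sizes, the random losses, and the `offsets` helper are hypothetical and only show how rows map to periods; they are not part of the package.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 candidate models, 4 periods with unequal sample sizes.
val_sizes = [5, 3, 4, 6]          # n_0, ..., n_{T-1}
n_obs = sum(val_sizes)            # total number of validation observations
n_models = 3

# One loss array per model; rows are ordered from period 0 to period T-1.
val_losses = [rng.random(n_obs) for _ in range(n_models)]

# Period t occupies rows offsets[t]:offsets[t + 1] of every loss array.
offsets = np.concatenate([[0], np.cumsum(val_sizes)])
last_period = val_losses[0][offsets[-2]:offsets[-1]]
```

For multiple responses, each entry would instead have shape `(n_obs, n_responses)`, with the same row ordering.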

ATOMS

from atoms import ATOMS

atoms = ATOMS(delta=0.1, M=1.0, seed=0)
best_idx = atoms.select(val_losses, val_sizes)  # int (single response) or (n_responses,) array

Here, best_idx is the 0-based index of the selected model in the candidate model list. In multi-response settings, select returns one index per response.


Fixed-window baselines

Fixed-val($\ell$)

from atoms import fixed_val_select

best_idx = fixed_val_select(val_losses, val_sizes, L=10)
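For intuition, the Fixed-val rule can be sketched in a few lines of NumPy. This is a minimal single-response sketch, not the package's implementation: `fixed_val_sketch` is a hypothetical name, and it assumes the average is pooled over all observations in the last `L` periods rather than taken over per-period means.

```python
import numpy as np

def fixed_val_sketch(val_losses, val_sizes, L):
    """Index of the model with the lowest mean loss over the last L periods."""
    # Number of observations in the last L periods (all periods if L > T).
    n_recent = int(np.sum(val_sizes[-L:]))
    # Pooled mean over those observations, one value per model (assumption).
    recent_means = [np.mean(np.asarray(loss)[-n_recent:]) for loss in val_losses]
    return int(np.argmin(recent_means))

# Two toy models over two periods of two observations each.
val_sizes = [2, 2]
val_losses = [
    np.array([0.1, 0.1, 0.9, 0.9]),   # good early, poor recently
    np.array([0.9, 0.9, 0.2, 0.2]),   # poor early, good recently
]

short_window = fixed_val_sketch(val_losses, val_sizes, L=1)  # uses period 1 only
long_window = fixed_val_sketch(val_losses, val_sizes, L=2)   # uses both periods
```

With these toy losses the two window lengths pick different models, which is the sensitivity to the window choice that adaptive validation-window selection is meant to address.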

Fixed-CV

from atoms import fixed_cv_select

best_idx = fixed_cv_select(
    specs,
    X_by_period,
    y_by_period,
    t=t,
    cv_window_periods=36,
    n_splits=5,
)

Here, specs is a list of CandidateSpec objects (see below), t is the testing period, and X_by_period and y_by_period store the data in period form: X_by_period[s] is the feature matrix with shape $(n_s,d)$, and y_by_period[s] is the corresponding response array with shape $(n_s,)$ for a single response, or $(n_s,R)$ for $R$ responses.
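The period-form layout can be illustrated with synthetic arrays. In this sketch the sizes and the `stack_window` helper are hypothetical, and pooling the `window` periods immediately before `t` is an assumption about how a historical window might be formed, not a description of what `fixed_cv_select` does internally.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3                             # number of features
sizes = [4, 5, 3, 6, 4, 5]        # n_s for each period s

# One feature matrix and one response array per period.
X_by_period = [rng.normal(size=(n, d)) for n in sizes]
y_by_period = [rng.normal(size=n) for n in sizes]

def stack_window(X_by_period, y_by_period, t, window):
    """Pool the `window` periods immediately before period t (hypothetical helper)."""
    start = max(0, t - window)
    X = np.vstack(X_by_period[start:t])
    y = np.concatenate(y_by_period[start:t])
    return X, y

# Periods 2, 3 and 4 are pooled: 3 + 6 + 4 = 13 observations.
X_hist, y_hist = stack_window(X_by_period, y_by_period, t=5, window=3)
```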


Candidate specifications

A candidate “model” is represented by a CandidateSpec:

from atoms import CandidateSpec
from sklearn.linear_model import Ridge

spec = CandidateSpec(
    name="Ridge (10 periods)",
    estimator_factory=lambda: Ridge(alpha=1.0),
    train_window=10,   # number of periods used for training
)

Computing out-of-sample $R^2$

The paper reports two $R^2$ metrics:

  • $R^2$ with zero benchmark:

$$1 - \frac{\sum_{i=1}^n (\hat{y}_i - y_i)^2}{\sum_{i=1}^n y_i^2}$$

  • $R^2$ with demeaned denominator:

$$1 - \frac{\sum_{i=1}^n (\hat{y}_i - y_i)^2}{\sum_{i=1}^n (y_i-\overline{y})^2}$$

where $\overline{y}$ is the mean of $y_1,...,y_n$.

They can be computed via

from atoms import oos_r2, oos_r2_over_periods

r2_zero = oos_r2(y_true, y_pred, demean=False)
r2      = oos_r2(y_true, y_pred, demean=True)
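As a cross-check, both formulas can be written directly in NumPy. The function below is a standalone sketch with a hypothetical name (`r2_sketch`); it follows the two definitions above for a single response and is not the package's `oos_r2`.

```python
import numpy as np

def r2_sketch(y_true, y_pred, demean=False):
    """Out-of-sample R^2 with a zero benchmark or a demeaned denominator."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_pred - y_true) ** 2)
    # Denominator: sum of y_i^2, or sum of (y_i - mean(y))^2 when demean=True.
    baseline = y_true - y_true.mean() if demean else y_true
    return 1.0 - sse / np.sum(baseline ** 2)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 2.0])

r2_zero_check = r2_sketch(y_true, y_pred, demean=False)  # 1 - 1/14
r2_demean_check = r2_sketch(y_true, y_pred, demean=True)  # 1 - 1/2
```

For the same predictions, the zero-benchmark value is at least as large as the demeaned one, because $\sum_i y_i^2 \ge \sum_i (y_i-\overline{y})^2$.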

Regime-switching model

The repository also includes the monthly refit Markov-switching forecast used for the new algorithm. It can run either on a dated dataframe or directly on the period-based synthetic data structure used in example/demo.ipynb:

from atoms import run_regime_switch_on_periods

pred_df = run_regime_switch_on_periods(
    X_by_period,
    y_by_period,
    start_month="2000-01",
    min_train_months=12,
    k_regimes=2,
)

This returns a dataframe with y_true, y_pred, forecast_month, and regime probability columns for the out-of-sample months.

$R^2$ over user-specified subperiods

You can compute $R^2$ over named windows specified in period indices (0-based, inclusive):

# period_sizes: sample size per period, [n_0, n_1, ..., n_{T-1}], in concatenation order
T = len(period_sizes)             # number of periods
windows = {
    "Full": (0, T - 1),
    "Late sample": (20, T - 1),
}

r2_by_window = oos_r2_over_periods(
    y_true,
    y_pred,
    period_sizes,
    windows,
    demean=False,
)
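The mapping from period windows to rows can be sketched as follows. This is a hypothetical standalone helper (`r2_by_window_sketch`), assuming the zero-benchmark $R^2$ and inclusive 0-based period indices as stated above; it is meant to show the slicing logic, not to reproduce `oos_r2_over_periods`.

```python
import numpy as np

def r2_by_window_sketch(y_true, y_pred, period_sizes, windows):
    """Zero-benchmark R^2 for each named window of inclusive period indices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Period t occupies rows offsets[t]:offsets[t + 1].
    offsets = np.concatenate([[0], np.cumsum(period_sizes)])
    out = {}
    for name, (first, last) in windows.items():
        rows = slice(offsets[first], offsets[last + 1])
        sse = np.sum((y_pred[rows] - y_true[rows]) ** 2)
        out[name] = 1.0 - sse / np.sum(y_true[rows] ** 2)
    return out

# Toy data: three periods with 2, 1 and 2 observations.
period_sizes = [2, 1, 2]
y_true = np.array([1.0, 1.0, 2.0, 1.0, 1.0])
y_pred = np.array([1.0, 0.0, 2.0, 1.0, 0.0])

result = r2_by_window_sketch(
    y_true, y_pred, period_sizes,
    {"Full": (0, 2), "Late sample": (1, 2)},
)
```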

Project structure

  • src/atoms/selection.py: ATOMS (adaptive window selection + tournament)
  • src/atoms/baselines.py: Fixed-val and Fixed-CV baselines
  • src/atoms/metrics.py: OOS $R^2$ metrics
  • src/atoms/specs.py: CandidateSpec definition
  • src/atoms/synthetic.py: Synthetic nonstationary data generators
  • examples/demo_synthetic.ipynb: step-by-step demo notebook

Citation

@article{CHS25,
  title={The Nonstationarity-Complexity Tradeoff in Return Prediction},
  author={Capponi, Agostino and Huang, Chengpiao and Sidaoui, J.~Antonio and Wang, Kaizheng and Zou, Jiacheng},
  journal={Available at SSRN 5980654},
  year={2025}
}
