Forgis-Labs/HEPA
HEPA: Horizon-conditioned Event Predictive Architecture

Self-supervised event prediction in multivariate time series via causal JEPA.

HEPA is a single, compact (~2.16M parameter) architecture that pretrains on multivariate time series with no labels, then finetunes a horizon-conditioned predictor for downstream event forecasting. The same encoder transfers across spacecraft telemetry, server metrics, ECG, water-distribution attacks, and turbofan-engine degradation.

The paper makes two contributions:

  1. Predictor-finetuning as the downstream abstraction. After self-supervised pretraining, freeze the encoder and finetune only the horizon-conditioned predictor + a shared event head. The resulting probability surface p(t, Delta_t) is monotone in Delta_t by construction (discrete-hazard CDF) and supports any event-prediction metric (AUPRC, h-AUROC, PA-F1, RMSE).
  2. One architecture across N datasets and M domains. The same 2.16M-param model attains competitive results on every benchmark in Table 1 - no per-dataset tuning, no architecture changes.

Paper: forthcoming - NeurIPS 2026 (link TBD).

Installation

pip install -e .

PyTorch >= 2.0 is required. CPU works for the unit tests; a single GPU is recommended for full pretraining.

Quickstart

# 1. Tell HEPA where your data lives (default: ~/.hepa/data)
export HEPA_DATA_DIR=/path/to/data

# 2. Get download instructions for any supported dataset
python scripts/download_data.py FD001

# 3. Pretrain + finetune + evaluate on C-MAPSS FD001
python scripts/train.py --dataset FD001 --seed 0

The train.py script:

  1. Loads the dataset bundle (pretrain stream, finetune entities).
  2. Pretrains HEPA with the JEPA objective (50 epochs by default).
  3. Freezes the encoder; finetunes the predictor + event head with positive-weighted BCE.
  4. Reports pooled AUPRC, pooled AUROC, h-AUROC (the paper's primary metric), and a per-horizon breakdown.
  5. Saves the probability surface (*.npz) and metrics (*.json).
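The positive-weighted BCE in step 3 is easy to sketch outside the repo. A minimal NumPy version (the function name and the pos_weight value are illustrative assumptions; the actual loss lives in hepa/training/):

```python
import numpy as np

def pos_weighted_bce(logits, targets, pos_weight=20.0):
    """Positive-weighted binary cross-entropy on per-horizon hazard logits.

    pos_weight upweights the rare positive (event) labels so the finetuned
    event head is not dominated by the negative class. The default of 20.0
    is illustrative, not the repo's setting.
    """
    p = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    t = np.asarray(targets, dtype=float)
    eps = 1e-12  # guard the logs at p == 0 or p == 1
    per_elem = -(pos_weight * t * np.log(p + eps)
                 + (1.0 - t) * np.log(1.0 - p + eps))
    return float(per_elem.mean())

# With pos_weight=1 and a zero logit on a positive label, the loss is ln 2.
print(pos_weighted_bce([0.0], [1.0], pos_weight=1.0))  # ~0.6931
```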

Architecture

| Component | Spec | Role |
|---|---|---|
| Patch embed | group P=16 timesteps, project to d=256 | reduce sequence length |
| Context encoder | causal Transformer, L=2, h=4, d=256, RevIN + sin PE | x[0:t] -> h_t |
| Target encoder | bidirectional Transformer + attention pool, periodic hard-sync from encoder (SIGReg) | x(t : t+Delta_t] -> h* |
| Predictor | MLP(d+1, 256, 256, d) on (h_t, Delta_t) | predicted future embedding |
| Event head | LayerNorm + Linear(d, 1) | per-horizon hazard logit |

Total: ~2.16M parameters. CDF parameterization: lambda_k = sigmoid(head(predictor(h_t, Delta_t_k))), p(t, Delta_t_k) = 1 - prod_{j<=k} (1 - lambda_j).
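A NumPy sketch of that parameterization shows why the probability surface is monotone in the horizon index (helper names here are illustrative, not the repo's API):

```python
import numpy as np

def hazard_to_cdf(hazard_logits):
    """Map per-horizon hazard logits to event probabilities.

    lambda_k = sigmoid(logit_k); p_k = 1 - prod_{j<=k} (1 - lambda_j).
    Each factor (1 - lambda_j) lies in (0, 1), so the running product
    shrinks and p_k can only increase with k: monotone by construction.
    """
    lam = 1.0 / (1.0 + np.exp(-np.asarray(hazard_logits, dtype=float)))
    return 1.0 - np.cumprod(1.0 - lam)

p = hazard_to_cdf([-2.0, 0.5, -1.0, 3.0])
assert np.all(np.diff(p) >= 0.0)          # never decreases across horizons
assert np.all((p >= 0.0) & (p <= 1.0))    # stays a valid probability
```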

All hyperparameters live in hepa/utils/config.py under the PROTOCOL dict.
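Based on the numbers quoted above, a PROTOCOL dict of roughly this shape is what to expect in hepa/utils/config.py (every key name below is a guess; check the file for the authoritative set):

```python
# Illustrative shape only -- the real keys live in hepa/utils/config.py.
# Values are taken from the README's architecture and quickstart numbers.
PROTOCOL = {
    "patch_len": 16,         # P: timesteps grouped per patch
    "d_model": 256,          # d: embedding width
    "n_layers": 2,           # L: context-encoder depth
    "n_heads": 4,            # h: attention heads
    "pretrain_epochs": 50,   # JEPA pretraining (default)
    "freeze_encoder": True,  # finetune only predictor + event head
}
```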

Datasets

| Dataset | Domain | Channels | Loader |
|---|---|---|---|
| FD001-004 | Turbofan engine degradation | 14 | hepa.data.cmapss |
| SMAP | Spacecraft telemetry | 25 | hepa.data.smap |
| PSM | Server metrics | 25 | hepa.data.psm |
| MBA | Cardiac ECG arrhythmia | 2 | hepa.data.mba |
| GECCO | Drinking water quality | 9 | hepa.data.gecco |
| BATADAL | Water-distribution attacks | ~43 | hepa.data.batadal |
| TEP | Chemical-plant faults | 52 | hepa.data.tep |
| ETTm1 | Electricity transformer load | 7 | hepa.data.ettm1 |
| Weather | Climate forecasting | 21 | hepa.data.weather |
| BeijingAQ | Air-quality / public health | 11 | hepa.data.beijing_aq |
| VIX | Financial volatility | 6 | hepa.data.vix |

HEPA does not redistribute data. Run python scripts/download_data.py <DATASET> for source URLs and expected file layouts.
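The loaders resolve the data root from the environment variable shown in the quickstart. A sketch of that resolution (the function name is illustrative; only the variable name and the ~/.hepa/data default come from the README):

```python
import os
from pathlib import Path

def resolve_data_dir() -> Path:
    """Return HEPA_DATA_DIR if set, else the documented default ~/.hepa/data."""
    return Path(os.environ.get("HEPA_DATA_DIR",
                               str(Path.home() / ".hepa" / "data")))

os.environ["HEPA_DATA_DIR"] = "/tmp/hepa-data"
assert resolve_data_dir() == Path("/tmp/hepa-data")
```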

Results

See Table 1 in the paper for the full per-dataset h-AUROC (mean ± std over 3 seeds) against the strongest published baselines.

Repository layout

hepa/
  model/        encoder, target_encoder, predictor, event_head, hepa wrapper
  data/         per-dataset loaders + central path config
  training/     pretrain loop, finetune loop, JEPA + BCE losses
  evaluation/   AUPRC / AUROC / h-AUROC / ECE / Brier, surface I/O
  utils/        PROTOCOL dict, seeding helper
scripts/
  train.py            single-entry pretrain + finetune + evaluate
  download_data.py    per-dataset download instructions
tests/
  test_model.py       forward-pass shape tests
  test_metrics.py     metric correctness tests

Citation

@article{hepa2026,
  title   = {HEPA: A Self-Supervised Horizon-Conditioned Event Predictive
             Architecture for Time Series},
  author  = {Anonymous Authors},
  journal = {arXiv preprint},
  year    = {2026}
}

License

MIT - see LICENSE.
