A unified, reproducible benchmark for evaluating synthetic time series generators in finance. All results, metrics, and experiment outputs are automatically saved and organized.
- Python: 3.11+ (recommended)
- Install all dependencies:
```bash
pip install -r requirements.txt
```
The easiest way to run the complete pipeline is using Docker Compose, which orchestrates all stages from data download to evaluation and plotting.
```bash
docker-compose build base
docker-compose up
```
This command runs all services in dependency order:
- `data-download`: downloads and preprocesses the SPXUSD time series data
- `generate-data`: generates synthetic data using both parametric and non-parametric models
- `eval`: evaluates all generated data using the unified evaluator
- `plot`: generates publication-ready figures from the evaluation results
```bash
# Run only data download
docker-compose up data-download

# Run data download and generation only
docker-compose up data-download generate-data

# Run through evaluation, skip plotting
docker-compose up data-download generate-data eval
```
Set environment variables via a `.env` file or export them in your shell:
```bash
# Set the CUDA device (if using CUDA)
export CUDA_VISIBLE_DEVICES=0
```
The following local directories are mapped into containers:
- `./data` → `/data` (raw and processed data)
- `./generated_data` → `/generated_data` (synthetic data outputs)
- `./results` → `/results` (evaluation results)
- `./evaluation_plots` → `/evaluation_plots` (plots and figures)
- `./configs` → `/app/configs` (read-only configuration files)
Fetch the required dataset:
```bash
python src/data_downloader.py --index spxusd --year 2023 2024
```
This saves raw data to `data/raw/` and processed data to `data/processed/`.
Generate synthetic data using the unified script (handles both parametric and non-parametric models):
```bash
python src/generation_scripts/generate_data.py \
    --generation_length 52 \
    --num_samples 1000 \
    --seed 42 \
    --output_dir generated_data
```
The script trains models on the training set at the ACF-inferred sequence length, then generates samples by stitching log returns to reach the target generation length. Artifacts are saved under `generated_data/<ModelName>/<ModelName>_seq_<L>.pt`.
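The stitching step can be pictured as follows. This is a simplified illustration, not the repository's exact implementation; `sample_segment` is a hypothetical stand-in for a trained generator, and the segment length 13 is only an example of an ACF-inferred value:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_segment(seq_len: int, n_features: int = 1) -> np.ndarray:
    """Hypothetical stand-in for a trained generator: one (seq_len, n_features)
    block of log returns."""
    return rng.normal(0.0, 0.01, size=(seq_len, n_features))

def stitch_log_returns(generation_length: int, seq_len: int = 13) -> np.ndarray:
    """Concatenate generated blocks of log returns until the target
    generation length is reached, then truncate."""
    blocks, total = [], 0
    while total < generation_length:
        blocks.append(sample_segment(seq_len))
        total += seq_len
    return np.concatenate(blocks, axis=0)[:generation_length]

log_returns = stitch_log_returns(52)
prices = 100.0 * np.exp(np.cumsum(log_returns, axis=0))  # back to a price path
print(log_returns.shape)  # (52, 1)
```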
Evaluate all generated artifacts:
```bash
python src/unified_evaluator.py \
    --generated_dir generated_data \
    --results_dir results \
    --seq_lengths 52 60 120 180 240 300
```
Outputs are saved to:
- `results/seq_<L>/<ModelName>/metrics.json`: evaluation metrics
- `results/seq_<L>/<ModelName>/visualizations/`: visualization outputs
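Results can be post-processed directly from this layout. The sketch below builds a miniature results tree and reads it back; the metric names in the JSON are illustrative, not the evaluator's exact schema:

```python
import json
import tempfile
from pathlib import Path

# Build a miniature results tree mirroring results/seq_<L>/<ModelName>/.
# The keys below are illustrative, not the evaluator's exact JSON schema.
root = Path(tempfile.mkdtemp())
metrics_file = root / "seq_52" / "GARCH" / "metrics.json"
metrics_file.parent.mkdir(parents=True)
metrics_file.write_text(json.dumps(
    {"fidelity": {"MDD": 0.031}, "diversity": {"ICD": 1.24}}))

# Collect every metrics.json, keyed by model name.
found = {p.parent.name: json.loads(p.read_text())
         for p in root.glob("seq_*/*/metrics.json")}
print(found["GARCH"]["fidelity"]["MDD"])  # 0.031
```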
Generate comprehensive, publication-ready plots for all evaluation metrics:
```bash
python src/plot_statistics/evaluation_plotter.py
```
This automatically finds the latest evaluation results, generates publication-quality plots (300 DPI), and saves them to the `evaluation_plots/` directory.
What happens:
- Data preprocessing:
  - Non-parametric models: the data is segmented into overlapping sub-sequences of shape `(R, l, N)`, where `R` is the number of sequences, `l` is the sequence length, and `N` is the number of features.
  - Parametric models: the original time series is used without segmentation, resulting in data of shape `(l, N)`.
- Models are trained on the training set at the ACF-inferred sequence length
- Generated samples are stitched to reach target generation lengths
- All taxonomy metrics (fidelity, diversity, efficiency, and stylized facts) are computed
- Results are printed in the console and saved to detailed JSON files in the results directory
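The non-parametric segmentation step above can be sketched with NumPy's sliding windows. This illustrates the `(R, l, N)` layout only; it is not the repository's actual preprocessing code, and the sequence length used here is an arbitrary example:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

T, N = 100, 2   # time steps, features
l = 52          # sequence length (illustrative; the pipeline infers it from the ACF)
series = np.random.default_rng(0).normal(size=(T, N))

# Overlapping windows over the time axis: R = T - l + 1 sub-sequences.
windows = sliding_window_view(series, window_shape=l, axis=0)  # (R, N, l)
windows = windows.transpose(0, 2, 1)                           # (R, l, N)
print(windows.shape)  # (49, 52, 2)
```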
- `configs/dataset_cfgs.yaml`: modify the preprocessing of the dataset for parametric/non-parametric models.
```bash
# Follow logs for a service
docker-compose logs -f generate-data

# Build the base image first
docker-compose build base
docker-compose up

# Run with a specific python command
docker-compose run --rm generate-data python src/generation_scripts/generate_data.py --generation_length 52

# Stop all containers
docker-compose down

# Remove volumes (WARNING: deletes data!)
docker-compose down -v

# Remove images
docker-compose down --rmi all
```

```
Unified-benchmark-for-SDGFTS-main/
├─ data/                  # Raw and preprocessed datasets
├─ notebooks/             # Validate functionality of parts of the pipeline
├─ results/               # Evaluation results (JSON files)
├─ evaluation_plots/      # Publication-ready plots (generated)
├─ src/
│  ├─ models/             # Generative model implementations
│  ├─ taxonomies/
│  │  ├─ diversity.py     # Diversity metrics (e.g., ICD, ED, DTW)
│  │  ├─ efficiency.py    # Efficiency metrics (runtime, memory)
│  │  ├─ fidelity.py      # Fidelity/feature metrics + visualization (MDD, MD, SDD, KD, ACD, t-SNE, distribution plots)
│  │  └─ stylized_facts.py  # Stylized facts metrics (tails, autocorr, volatility)
│  ├─ plot_statistics/    # Plotting functionality for evaluation results
│  │  └─ evaluation_plotter.py  # Main plotting script (executable)
│  ├─ utils/              # Configs, display, math, evaluation classes, preprocessing, etc.
│  │  └─ eval_plot_utils.py     # Utilities for evaluation plotting
│  └─ data_downloader.py  # Dataset download utility
├─ configs/               # Experiment and preprocessing config templates
├─ requirements.txt
└─ README.md
```
The benchmark supports a range of both traditional parametric models and modern deep learning approaches:
Parametric Models
- Geometric Brownian Motion (GBM)
- Ornstein-Uhlenbeck (OU) Process
- Merton Jump Diffusion (MJD)
- Double Exponential Jump Diffusion (DEJD)
- GARCH(1,1)
Non-parametric & Deep Learning Models
- TimeGAN
- QuantGAN
- TimeVAE
- Sig-WGAN
- Block Bootstrap
All models share a unified interface for training, sample generation, and comprehensive metric evaluation.
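The shared interface can be pictured roughly as follows. This is a minimal sketch assuming a `fit`/`generate` contract; the actual base classes live in `src/models/` and may differ in names and signatures, and `GaussianReturns` is a toy model invented for illustration:

```python
from abc import ABC, abstractmethod
import numpy as np

class SyntheticGenerator(ABC):
    """Hypothetical unified interface: train on an array of real data,
    then emit (num_samples, l, N) synthetic sequences."""

    @abstractmethod
    def fit(self, data: np.ndarray) -> None: ...

    @abstractmethod
    def generate(self, num_samples: int) -> np.ndarray: ...

class GaussianReturns(SyntheticGenerator):
    """Toy parametric model: i.i.d. normal log returns."""

    def fit(self, data: np.ndarray) -> None:
        self.mu, self.sigma = float(data.mean()), float(data.std())
        self.l, self.n = data.shape

    def generate(self, num_samples: int) -> np.ndarray:
        rng = np.random.default_rng(0)
        return rng.normal(self.mu, self.sigma, size=(num_samples, self.l, self.n))

model = GaussianReturns()
model.fit(np.random.default_rng(1).normal(size=(52, 1)))
samples = model.generate(100)
print(samples.shape)  # (100, 52, 1)
```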
- Feature-based Distances
- Marginal Distribution Difference (MDD)
- Mean Difference (MD)
- Standard Deviation Difference (SDD)
- Kurtosis Difference (KD)
- AutoCorrelation Difference (ACD)
- Visualization
- t-SNE Visualization
- Distribution Comparison Plots
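For intuition, the feature-based differences can be computed along these lines. This is a sketch of the definitions as plain moment differences between real and synthetic return distributions; the benchmark's exact estimators in `src/taxonomies/fidelity.py` may differ:

```python
import numpy as np

def excess_kurtosis(x: np.ndarray) -> float:
    z = (x - x.mean()) / x.std()
    return float((z ** 4).mean() - 3.0)

def feature_differences(real: np.ndarray, fake: np.ndarray) -> dict:
    """Absolute differences of simple distributional features."""
    return {
        "MD": abs(float(real.mean() - fake.mean())),              # mean difference
        "SDD": abs(float(real.std() - fake.std())),               # std-dev difference
        "KD": abs(excess_kurtosis(real) - excess_kurtosis(fake)), # kurtosis difference
    }

rng = np.random.default_rng(0)
real = rng.standard_t(df=5, size=10_000)  # heavy-tailed "real" returns
fake = rng.normal(size=10_000)            # light-tailed "synthetic" returns
diffs = feature_differences(real, fake)
```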
- Intra-Class Distance (ICD)
- Euclidean Distance (ED)
- Dynamic Time Warping (DTW)
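Intra-class distance can be sketched as the mean pairwise distance within a set of generated sequences. This is an illustrative definition only; see `src/taxonomies/diversity.py` for the one actually used:

```python
import numpy as np

def intra_class_distance(samples: np.ndarray) -> float:
    """Mean pairwise Euclidean distance between flattened sequences."""
    flat = samples.reshape(len(samples), -1)
    # Pairwise distance matrix (S, S) via broadcasting
    d = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    s = len(flat)
    return float(d.sum() / (s * (s - 1)))  # exclude the zero diagonal

rng = np.random.default_rng(0)
tight = rng.normal(0, 0.1, size=(50, 52, 1))   # low-diversity sample set
spread = rng.normal(0, 1.0, size=(50, 52, 1))  # high-diversity sample set
print(intra_class_distance(tight) < intra_class_distance(spread))  # True
```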
- Generation Time (seconds for generating 500 samples)
- Heavy Tails (Excess Kurtosis)
- Lag-1 Autocorrelation of Returns
- Volatility Clustering
- Long Memory in Volatility
- Non-Stationarity Detection
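A couple of these checks can be sketched directly with simple estimators. The implementations in `src/taxonomies/stylized_facts.py` may use different ones; here, i.i.d. normal returns serve as a baseline that exhibits neither heavy tails nor volatility clustering:

```python
import numpy as np

def lag1_autocorr(x: np.ndarray) -> float:
    """Sample lag-1 autocorrelation."""
    x = x - x.mean()
    return float((x[:-1] * x[1:]).sum() / (x * x).sum())

rng = np.random.default_rng(0)
r = rng.normal(size=10_000)  # i.i.d. normal "returns" baseline

# Heavy tails: excess kurtosis of standardized returns (≈ 0 for normal data)
z = (r - r.mean()) / r.std()
kurt = float((z ** 4).mean() - 3.0)

# Volatility clustering: autocorrelation of absolute returns (≈ 0 here)
vol_cluster = lag1_autocorr(np.abs(r))
```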
Refer to `src/taxonomies/` for implementation details and to `src/utils/` for utility functions.
- Implement your model in `src/models/` and ensure it inherits from the appropriate base class (`ParametricModel` or `DeepLearningModel`).
- Register your model in `notebooks/pipeline_validation.py` by specifying it under `run_complete_evaluation`.
- Rerun the pipeline and review your results in the `results/` directory!
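As a rough sketch of the first step, a new parametric model might look like the following. The `ParametricModel` stand-in below is hypothetical; the real base class and its method signatures live in `src/models/`:

```python
import numpy as np

class ParametricModel:  # stand-in for the repository's base class
    def fit(self, data): raise NotImplementedError
    def generate(self, num_samples, seq_len): raise NotImplementedError

class MyGBM(ParametricModel):
    """Geometric Brownian Motion fitted to log returns of a price series."""

    def fit(self, data):
        log_ret = np.diff(np.log(data))
        self.mu, self.sigma = float(log_ret.mean()), float(log_ret.std())

    def generate(self, num_samples, seq_len):
        rng = np.random.default_rng(42)
        ret = rng.normal(self.mu, self.sigma, size=(num_samples, seq_len))
        return np.exp(np.cumsum(ret, axis=1))  # positive price paths

model = MyGBM()
model.fit(np.linspace(100.0, 110.0, 253))  # one year of toy daily prices
paths = model.generate(10, 52)
print(paths.shape)  # (10, 52)
```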
All results are available in:
- The console (summary tables per model)
- The `results/` directory (created automatically; JSON files contain all metrics, parameters, and evaluation outputs)
| Name | Role | Email |
|---|---|---|
| Eddison Pham | Machine Learning Researcher & Engineer | eddison.pham@mail.utoronto.ca |
| Albert Lam Ho | Quantitative Researcher | uyenlam.ho@mail.utoronto.ca |
| Yiqing Irene Huang | Research Supervisor/Professor | iy.huang@mail.utoronto.ca |
- For detailed examples and model-by-model usage, see `notebooks/`.
- To report issues or contribute, see the Contributing section below.