Quant Factor Mining

A leakage-safe, walk-forward multi-factor research framework for equity signals. Built for rigorous, reproducible factor research with strict time-ordering, cost-aware backtesting, and holdout validation.

Overview

This framework implements a complete factor research pipeline:

OHLCV Data → Factor Computation → Walk-Forward Validation → Portfolio Backtest → Attribution & Reporting

Three classic equity factors are computed, cross-sectionally z-scored, and combined via information-coefficient (IC) weighting:

Factor	Formula	Default Params	Intuition
Momentum	`close(t−skip) / close(t−lookback−skip) − 1`	lookback=252d, skip=21d	Medium-term continuation
Mean Reversion	`−(close(t) / close(t−lookback) − 1)`	lookback=21d	Short-term reversal
Low Volatility	`−std(returns, window) × √252`	window=63d	Lower risk, higher score
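
The formulas in the table can be sketched with pandas as follows. This is an illustrative sketch, not the code in src/qfm/factors/: it assumes a wide close-price frame (dates as rows, tickers as columns) rather than the (date, ticker) MultiIndex used by the data contract, simple daily returns, and a population standard deviation for the cross-sectional z-score. The function names are hypothetical.

```python
import numpy as np
import pandas as pd


def momentum(close: pd.DataFrame, lookback: int = 252, skip: int = 21) -> pd.DataFrame:
    """close(t-skip) / close(t-lookback-skip) - 1, column by column."""
    return close.shift(skip) / close.shift(lookback + skip) - 1.0


def mean_reversion(close: pd.DataFrame, lookback: int = 21) -> pd.DataFrame:
    """Negated trailing return, so recent losers receive higher scores."""
    return -(close / close.shift(lookback) - 1.0)


def low_volatility(close: pd.DataFrame, window: int = 63) -> pd.DataFrame:
    """Negated annualized rolling volatility of simple daily returns."""
    returns = close / close.shift(1) - 1.0
    return -returns.rolling(window).std() * np.sqrt(252)


def cross_sectional_zscore(factor: pd.DataFrame) -> pd.DataFrame:
    """Standardize each date (row) across tickers; zero dispersion yields NaN."""
    mean = factor.mean(axis=1)
    std = factor.std(axis=1, ddof=0).replace(0.0, np.nan)
    return factor.sub(mean, axis=0).div(std, axis=0)
```

Every function reads only prices at or before t, so the value in row t is a signal known at the close of t.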

Key design principles:

Signal at t is applied from t+1 — no lookahead leakage
Walk-forward folds estimate weights on train data only
Holdout period is untouched during parameter selection
Deterministic synthetic data by default (seed=42) for full reproducibility
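
The first principle (a signal formed at t is only traded from t+1) comes down to a one-bar shift of the weights before they meet the returns. A minimal sketch, assuming wide weight and return frames with matching indexes; the function name is hypothetical and the framework's backtest engine may structure this differently.

```python
import pandas as pd


def lagged_portfolio_returns(weights: pd.DataFrame, asset_returns: pd.DataFrame) -> pd.Series:
    """Daily portfolio return using the weights decided on the previous bar."""
    held = weights.shift(1)  # weights chosen at t become positions on t+1
    # The first row has no prior weights; its NaNs are skipped and sum to 0.
    return (held * asset_returns).sum(axis=1)


dates = pd.date_range("2024-01-01", periods=3, freq="D")
weights = pd.DataFrame({"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 1.0]}, index=dates)
asset_returns = pd.DataFrame({"A": [0.10, 0.01, -0.03], "B": [0.20, 0.02, 0.05]}, index=dates)

pnl = lagged_portfolio_returns(weights, asset_returns)
```

In this toy example the day-one weights never earn the day-one returns; they are first applied to the second row.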

Project Structure

src/qfm/
├── data/           Data contracts, preprocessing, snapshot & live fetch
├── factors/        Momentum, mean-reversion, low-volatility implementations
├── labels/         Forward return label computation
├── modeling/       Walk-forward engine, feature builder, parameter search, stability
├── portfolio/      Risk model, mean-variance optimizer, constraints
├── backtest/       Backtest engine, transaction costs, performance metrics, benchmarks
└── reporting/      Tearsheet & table generation

configs/
├── base.yaml                 Data config (10 tickers, 2018–2024)
└── strategy/
    ├── default.yaml          Full strategy with nested parameter search
    └── stability.yaml        Stability-first selection with holdout gate

scripts/
├── refresh_data.py           Generate synthetic or fetch live market data
├── run_walkforward.py        Main walk-forward research pipeline
├── run_backtest.py           Single full-period backtest
├── run_parameter_search.py   Grid search over factor parameters
└── generate_report.py        Generate markdown report from run artifacts

tests/
├── unit/                     17 test files covering contracts, factors, sizing, costs
└── integration/              Full pipeline & holdout split tests

dashboard/                    Streamlit factor monitor (read-only)
docs/                         Methodology, equations, assumptions, limitations
artifacts/runs/<timestamp>/   Per-run output directory

Quick Start

Install

git clone https://github.com/Wrigggy/quant-factor-mining.git
cd quant-factor-mining
pip install -r requirements.txt

Run the Full Pipeline

# 1. Generate deterministic snapshot data
python3 scripts/refresh_data.py --config configs/base.yaml

# 2. Run walk-forward research with nested parameter search
python3 scripts/run_walkforward.py --config configs/strategy/default.yaml

# 3. Generate human-readable report
python3 scripts/generate_report.py

Results are saved to artifacts/runs/<timestamp>/ with full config snapshots for reproducibility.

Other Workflows

# Single full-period backtest
python3 scripts/run_backtest.py --config configs/strategy/default.yaml

# Standalone grid search
python3 scripts/run_parameter_search.py --config configs/strategy/default.yaml

# Stability-first selection with holdout gating
python3 scripts/run_walkforward.py --config configs/strategy/stability.yaml

# Use live market data from Yahoo Finance
python3 scripts/refresh_data.py --config configs/base.yaml --live
python3 scripts/run_walkforward.py --config configs/strategy/default.yaml --live

Pipeline Details

Data

Default: Synthetic OHLCV for 10 large-cap US equities (AAPL, MSFT, NVDA, AMZN, META, GOOGL, TSLA, JPM, XOM, JNJ) over 2018–2024
Format: MultiIndex (date, ticker) DataFrame with OHLCV columns, validated by a strict data contract
Preprocessing: Forward-fill gaps (max 3 days), remove invalid bars (high < low)
Live mode: Fetches real data via yfinance with --live flag
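
The preprocessing step above might look like the following sketch. It assumes gaps show up as NaN rows in a (date, ticker) MultiIndex frame with lowercase column names, and that filling happens before invalid bars are dropped; the README does not specify either detail, and the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def preprocess(ohlcv: pd.DataFrame, max_gap: int = 3) -> pd.DataFrame:
    """Forward-fill short gaps per ticker, then drop bars with high < low."""
    df = ohlcv.sort_index()
    # Fill at most `max_gap` consecutive missing rows within each ticker.
    df = df.groupby(level="ticker").ffill(limit=max_gap)
    # NaN comparisons evaluate to False, so unfilled rows are kept, not dropped.
    invalid = df["high"] < df["low"]
    return df[~invalid]


dates = pd.date_range("2024-01-01", periods=6, freq="D")
index = pd.MultiIndex.from_product([dates, ["A"]], names=["date", "ticker"])
nan = np.nan
raw = pd.DataFrame(
    {
        "high": [2.0, nan, nan, nan, nan, 1.0],
        "low": [1.0, nan, nan, nan, nan, 3.0],
        "close": [1.5, nan, nan, nan, nan, 2.0],
    },
    index=index,
)
clean = preprocess(raw)
```

Here the first three missing rows are filled, the fourth stays NaN because it exceeds the limit, and the final bar is removed because its high is below its low.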

Walk-Forward Validation

Rolling train/test splits with configurable window sizes:

Train window: 504 days (~2 years) — used to estimate IC-based factor weights
Test window: 126 days (~6 months) — out-of-sample evaluation
Step size: Configurable; defaults to the test size, or can be set to 21 days for more granular analysis
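
The rolling splits described above can be generated with a few lines of index arithmetic. This is a sketch under the stated defaults; the generator name is hypothetical, and it does not add any purge gap between train and test, which the README does not mention.

```python
from typing import Iterator, Optional, Tuple


def walk_forward_folds(
    n_obs: int,
    train_size: int = 504,
    test_size: int = 126,
    step: Optional[int] = None,
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) positional ranges; test always follows train."""
    step = test_size if step is None else step
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


folds = list(walk_forward_folds(n_obs=1000))
granular = list(walk_forward_folds(n_obs=1000, step=21))
```

With 1000 observations the default step yields three non-overlapping test windows, while a 21-day step yields 18 overlapping ones.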

Factor weights are the Spearman IC of each factor against forward returns, estimated on train data only. Composite score = IC-weighted sum of normalized factors.
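
One way to read the IC-weighting rule is sketched below. It assumes the IC is the mean of per-date cross-sectional Spearman correlations (computed as Pearson correlation on ranks), that factor and forward-return frames share the same non-missing cells, and that raw ICs are used as weights without further normalization. The framework may pool or normalize differently, and all function names here are hypothetical.

```python
from typing import Dict

import pandas as pd


def spearman_ic(factor: pd.DataFrame, fwd_ret: pd.DataFrame) -> float:
    """Mean per-date rank correlation between a factor and forward returns."""
    f_rank = factor.rank(axis=1)
    r_rank = fwd_ret.rank(axis=1)
    # After transposing, columns are dates, so corrwith pairs matching dates.
    daily_ic = f_rank.T.corrwith(r_rank.T)
    return float(daily_ic.mean())


def estimate_ic_weights(
    train_factors: Dict[str, pd.DataFrame], train_fwd_ret: pd.DataFrame
) -> Dict[str, float]:
    """Estimate one weight per factor using training-window data only."""
    return {name: spearman_ic(f, train_fwd_ret) for name, f in train_factors.items()}


def composite_score(
    factors: Dict[str, pd.DataFrame], weights: Dict[str, float]
) -> pd.DataFrame:
    """IC-weighted sum of already z-scored factor frames."""
    return sum(weights[name] * z for name, z in factors.items())
```

The weights come from the train window and are then applied unchanged to test-window factor values, so no test-period returns enter the estimate.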

Backtesting

Position sizing: Equal-weight or score-tilted (softmax with temperature + per-name weight cap)
Rebalancing: Every N days (default 21), with turnover tracking
Cost models: Linear bps or full liquidity model (commission + spread + slippage + market impact)
Benchmark: SPY when available; falls back to an equal-weight portfolio in offline mode
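
Score-tilted sizing and the linear cost model can be illustrated as below. This is a sketch with assumed details: excess weight above the cap is redistributed proportionally to the uncapped names, turnover is the sum of absolute weight changes, and the 10 bps default is a placeholder rather than a value taken from the configs. Function names are hypothetical.

```python
import numpy as np


def score_tilted_weights(scores, temperature: float = 1.0, cap: float = 0.4) -> np.ndarray:
    """Long-only softmax weights with a per-name cap, summing to 1."""
    scores = np.asarray(scores, dtype=float)
    if cap * scores.size < 1.0:
        raise ValueError("cap is too small to stay fully invested")
    w = np.exp((scores - scores.max()) / temperature)  # shift for stability
    w /= w.sum()
    for _ in range(scores.size):  # each pass caps at least one more name
        over = w > cap
        if not over.any():
            break
        excess = (w[over] - cap).sum()
        w[over] = cap
        free = w < cap
        if not free.any():
            break
        # Hand the clipped weight to uncapped names, pro rata.
        w[free] += excess * w[free] / w[free].sum()
    return w


def linear_cost(w_old, w_new, cost_bps: float = 10.0) -> float:
    """Cost as a fraction of portfolio value: turnover times bps."""
    turnover = np.abs(np.asarray(w_new, dtype=float) - np.asarray(w_old, dtype=float)).sum()
    return float(turnover * cost_bps / 1e4)


weights = score_tilted_weights([3.0, 1.0, 0.0, -1.0], temperature=1.0, cap=0.4)
cost = linear_cost([0.5, 0.5], [1.0, 0.0], cost_bps=10.0)
```

A lower temperature concentrates weight in the top-scored names, which makes the cap bind more often; a very high temperature approaches equal weighting.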

Evaluation Metrics

Metric	Description
Sharpe Ratio	Risk-adjusted return (annualized)
Alpha	Excess return vs. benchmark (annual)
Beta	Sensitivity to benchmark
Information Ratio	Active return / tracking error
Max Drawdown	Worst peak-to-trough decline
Bootstrap CIs	95% confidence intervals for alpha and IR
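
Several of these metrics reduce to short NumPy expressions. The sketch below assumes 252 trading days per year, a zero risk-free rate by default, and a plain i.i.d. percentile bootstrap over days; the framework may use a different resampling scheme, and the function names are hypothetical.

```python
import numpy as np

TRADING_DAYS = 252


def sharpe_ratio(daily_returns, rf_daily: float = 0.0) -> float:
    """Annualized mean excess return over its sample standard deviation."""
    excess = np.asarray(daily_returns, dtype=float) - rf_daily
    return float(excess.mean() / excess.std(ddof=1) * np.sqrt(TRADING_DAYS))


def information_ratio(port, bench) -> float:
    """Annualized active return divided by tracking error."""
    active = np.asarray(port, dtype=float) - np.asarray(bench, dtype=float)
    return float(active.mean() / active.std(ddof=1) * np.sqrt(TRADING_DAYS))


def max_drawdown(daily_returns) -> float:
    """Worst peak-to-trough decline of the equity curve (a value <= 0)."""
    growth = np.cumprod(1.0 + np.asarray(daily_returns, dtype=float))
    equity = np.concatenate([[1.0], growth])  # include the starting capital
    peak = np.maximum.accumulate(equity)
    return float((equity / peak - 1.0).min())


def bootstrap_ir_ci(port, bench, n_boot: int = 1000, level: float = 0.95, seed: int = 42):
    """Percentile confidence interval for the IR from resampled days."""
    port = np.asarray(port, dtype=float)
    bench = np.asarray(bench, dtype=float)
    rng = np.random.default_rng(seed)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, port.size, size=port.size)  # sample with replacement
        stats[b] = information_ratio(port[idx], bench[idx])
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(stats, [tail, 1.0 - tail])
    return float(lo), float(hi)
```

Resampling individual days ignores autocorrelation in returns, so intervals from this sketch are likely narrower than a block bootstrap would give.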

Holdout Gating

An untouched holdout period (default: last 126 days) is reserved for final validation. The holdout gate checks minimum Sharpe and excess return thresholds — parameter sets that fail are flagged, preventing overfitted configs from passing selection.
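
The gating logic can be pictured as a split followed by a threshold check. The sketch below is illustrative only: the metric keys, threshold defaults, and names (`split_holdout`, `HoldoutGate`) are assumptions, not the framework's actual configuration schema.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence, Tuple


def split_holdout(dates: Sequence, holdout_days: int = 126) -> Tuple[Sequence, Sequence]:
    """Reserve the last `holdout_days` observations; the rest is for research."""
    if not 0 < holdout_days < len(dates):
        raise ValueError("holdout_days must be between 1 and len(dates) - 1")
    return dates[:-holdout_days], dates[-holdout_days:]


@dataclass(frozen=True)
class HoldoutGate:
    """Minimum thresholds a parameter set must clear on the holdout period."""

    min_sharpe: float = 0.0
    min_excess_return: float = 0.0

    def passes(self, holdout_metrics: Mapping[str, float]) -> bool:
        return (
            holdout_metrics["sharpe"] >= self.min_sharpe
            and holdout_metrics["excess_return"] >= self.min_excess_return
        )


research, holdout = split_holdout(list(range(1000)))
gate = HoldoutGate(min_sharpe=0.5, min_excess_return=0.0)
ok = gate.passes({"sharpe": 0.8, "excess_return": 0.02})
flagged = not gate.passes({"sharpe": 0.3, "excess_return": 0.02})
```

Walk-forward folds and parameter search would then run on `research` only, with `holdout` evaluated once at the end.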

Run Artifacts

Each run produces a timestamped directory under artifacts/runs/ containing:

File	Description
`metrics.json`	Aggregate metrics across all folds
`fold_metrics.csv`	Per-fold performance breakdown
`equity_curve.parquet`	Daily portfolio value time series
`holdout_metrics.json`	Untouched holdout period evaluation
`nested_search_results.csv`	Grid search results ranked by selection metric
`selected_params.json`	Best parameter set from search
`stability_summary.json`	Fold dispersion diagnostics
`config_snapshot.yaml`	Full config used for the run
`run_snapshot.json`	Run metadata (data summary, timestamp)
`report.md`	Human-readable markdown report

Testing

All tests are offline and deterministic:

python3 -m pytest -q

Coverage includes data contract validation, no-lookahead verification, factor index hygiene, position sizing, benchmark attribution, liquidity costs, walk-forward fold generation, and stability selection.

Documentation

docs/methodology.md — Core workflow and leakage controls
docs/model_equations.md — Full mathematical specification
docs/assumptions.md — Key assumptions
docs/limitations.md — Known limitations

Limitations

This is a research framework, not production trading software
Default mode uses synthetic data — factor premia are not embedded in the generator, so performance reflects noise characteristics rather than real market anomalies
Cost model is a proxy (not LOB simulation), even with liquidity extensions
Factor universe is limited to three classic signals; adding new factors requires subclassing BaseFactor
