XAUUSD M1 → RL-ready feature + bracket-trading pipeline

This project is built for MT4/MT5-style XAUUSD minute data such as:

Time (EET),Open,High,Low,Close,Volume
2020.01.09 01:00:00,1557.152,1557.452,1555.202,1555.302,0.045

It gives you a complete research pipeline:

Load and validate raw M1 data.
Parse broker/EET time safely.
Resample M1 execution data into M5/M15 decision bars.
Build causal stationary features normalized by ATR/price.
Visualize candles, indicators, volatility, sessions, feature correlations, trades, equity, and drawdown.
Tune a simple trend-following baseline on train/validation splits while keeping the test split sealed.
Optionally train PPO with a MultiDiscrete([direction, SL bucket, TP/R bucket]) action space.
Validate with walk-forward cross-validation (rolling out-of-sample windows) rather than a single train/val split.
Reveal the test split only once via final_holdout_eval.py after the model is frozen.

Quick start

pip install -r requirements.txt

Put your full file here (the long Bid history is the primary dataset; the short Ask file is kept as a fast smoke-test fallback — switch between them in config.py):

data/XAUUSD_1 Min_Bid_2003.05.05_2026.05.31.csv   # primary (~23 years)
data/XAUUSD_1 Min_Ask_2020.01.09_2026.01.15.csv   # smoke-test fallback (~6 years)

Then run:

python run_pipeline.py

This keeps the test split sealed and writes validation-only artifacts, including a walk_forward_baseline.csv showing baseline stability across the rolling folds.

Train the RL agent with walk-forward validation (one model per fold, aggregate out-of-sample report; the final fold is saved as the production model):

python train_ppo.py            # == train_ppo.train_walk_forward()

When you are ready for the one-time holdout reveal, run:

python final_holdout_eval.py

Or open:

notebooks/XAUUSD_RL_Pipeline_Demo.ipynb

Important defaults

SOURCE_TZ = Europe/Helsinki, because many brokers label server time as EET/EEST.
MT4/MT5 M1 timestamps are treated as candle-open timestamps and shifted to candle-close timestamps before resampling/backtesting.
DECISION_TIMEFRAME = H1, while M1 is retained for intrabar TP/SL simulation.
Features are causal and mostly stationary: ATR-normalized distances, ratios, session flags, candle shape ratios.
The observation set is 25 features. Four redundant ones were dropped in the 2026-06-02 collinearity audit (rsi_centered, roc5_atr, body_atr, ema50_slope5_atr); each was |r| ≥ 0.96 with a retained feature (two were exact duplicates). The underlying ema20/50/200 columns are still computed for the baselines and charts.
Sliding-window walk-forward (train_ppo.train_sliding_walk_forward, the default python train_ppo.py entry point) is the realistic validator: each fold trains on sliding_train_years (5y), selects its checkpoint on the next sliding_val_months (6m), and is judged out-of-sample on the following sliding_test_months (6m); the window then slides sliding_step_months (6m) and repeats. This simulates retraining every 6 months and trading the next 6 months live. Every fold's test window is stitched into one continuous out-of-sample equity curve (models/sliding_oos_equity.csv); the gate runs on the test metrics. Because every train window is the same length, equal timesteps per fold = equal passes. ~35 folds over the 23y dataset (raise sliding_step_months to reduce).
Block walk-forward (train_ppo.train_walk_forward; config.py: n_walk_forward_folds, walk_forward_anchored, test_frac) is the alternative: it seals the last test_frac of bars and rolls n_folds train→val windows over the rest. test_frac == 1 - train_frac - val_frac, so the sealed holdout matches the single-split test exactly.
When TP and SL are both inside the same M1 candle, the simulator assumes SL first. This is deliberately pessimistic.
Position size is fixed-fractional risk-based; the RL agent controls direction and bracket shape, not size.
run_pipeline.py now performs a temporal train/validation/test split, tunes on train/val, and keeps test sealed by default.
training_diagnostics.py and train_ppo.py also keep the test split sealed by default.
RL rewards are normalized by risk budget rather than raw cash PnL, which is materially more stable for PPO.

Files

config.py: all main parameters.
data_loader.py: CSV parsing, timezone handling, validation, resampling.
features.py: causal indicators and stationary feature matrix.
leakage_checks.py: simple future-append stability test.
env_bracket.py: Gymnasium-compatible bracket trading environment.
baselines.py: random and EMA/ATR rule policies.
evaluate.py: metrics, trade-log summary, drawdown.
visualize.py: Plotly visualization functions.
train_ppo.py: optional PPO training scaffold.
run_pipeline.py: one-command pre-test pipeline with validation-only outputs.
final_holdout_eval.py: explicit one-time holdout evaluation entry point.
notebooks/XAUUSD_RL_Pipeline_Demo.ipynb: guided notebook.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

XAUUSD M1 → RL-ready feature + bracket-trading pipeline

Quick start

Important defaults

Files

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
notebooks		notebooks
README.md		README.md
baselines.py		baselines.py
config.py		config.py
data_loader.py		data_loader.py
env_bracket.py		env_bracket.py
evaluate.py		evaluate.py
features.py		features.py
final_holdout_eval.py		final_holdout_eval.py
leakage_checks.py		leakage_checks.py
model_artifacts.py		model_artifacts.py
requirements.txt		requirements.txt
run_pipeline.py		run_pipeline.py
train_ppo.py		train_ppo.py
training_diagnostics.py		training_diagnostics.py
view_results.py		view_results.py
visualize.py		visualize.py

Folders and files

Latest commit

History

Repository files navigation

XAUUSD M1 → RL-ready feature + bracket-trading pipeline

Quick start

Important defaults

Files

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages