This project is built for MT4/MT5-style XAUUSD minute data such as:
Time (EET),Open,High,Low,Close,Volume
2020.01.09 01:00:00,1557.152,1557.452,1555.202,1555.302,0.045
It gives you a complete research pipeline:
- Load and validate raw M1 data.
- Parse broker/EET time safely.
- Resample M1 execution data into higher-timeframe decision bars (DECISION_TIMEFRAME in config.py, e.g. M5, M15 or H1).
- Build causal stationary features normalized by ATR/price.
- Visualize candles, indicators, volatility, sessions, feature correlations, trades, equity, and drawdown.
- Tune a simple trend-following baseline on train/validation splits while keeping the test split sealed.
- Optionally train PPO with a MultiDiscrete([direction, SL bucket, TP/R bucket]) action space.
- Validate with walk-forward cross-validation (rolling out-of-sample windows) rather than a single train/val split.
- Reveal the test split only once, via final_holdout_eval.py, after the model is frozen.
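Loading, validating and time-shifting a raw row can be sketched with the standard library alone. This is a minimal illustration, not data_loader.py itself: parse_row is a hypothetical helper, and it assumes the comma-separated layout of the sample row above, SOURCE_TZ = Europe/Helsinki, and the candle-open to candle-close shift described in the notes.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: mirrors SOURCE_TZ in config.py (broker server time labelled EET/EEST).
SOURCE_TZ = ZoneInfo("Europe/Helsinki")
M1 = timedelta(minutes=1)


def parse_row(line: str) -> dict:
    """Hypothetical helper: parse one 'Time (EET),Open,High,Low,Close,Volume' row
    into a bar stamped with its UTC candle-close time."""
    ts, *fields = line.strip().split(",")
    open_, high, low, close, volume = (float(x) for x in fields)
    # Basic OHLC sanity check: low <= open/close <= high.
    if not (low <= min(open_, close) and max(open_, close) <= high):
        raise ValueError(f"inconsistent OHLC at {ts!r}")
    # MT4/MT5 export format, naive broker time, e.g. 2020.01.09 01:00:00.
    naive_open = datetime.strptime(ts, "%Y.%m.%d %H:%M:%S")
    # zoneinfo applies +02:00 (EET) or +03:00 (EEST) depending on the date.
    # Ambiguous/nonexistent wall-clock times around DST switches are not handled here.
    open_utc = naive_open.replace(tzinfo=SOURCE_TZ).astimezone(timezone.utc)
    # Shift candle-open -> candle-close, so a bar is only "known" once it has finished.
    return {"close_time": open_utc + M1, "open": open_, "high": high,
            "low": low, "close": close, "volume": volume}


bar = parse_row("2020.01.09 01:00:00,1557.152,1557.452,1555.202,1555.302,0.045")
print(bar["close_time"].isoformat())  # 2020-01-08T23:01:00+00:00
```

Using zoneinfo rather than a fixed +02:00 offset matters because the broker clock follows EEST in summer, so a July timestamp converts with +03:00. zoneinfo needs the system time zone database or the tzdata package to be available.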
pip install -r requirements.txt
Put your data files here (the long Bid history is the primary dataset; the short Ask file is kept as a fast smoke-test fallback; switch between them in config.py):
data/XAUUSD_1 Min_Bid_2003.05.05_2026.05.31.csv # primary (~23 years)
data/XAUUSD_1 Min_Ask_2020.01.09_2026.01.15.csv # smoke-test fallback (~6 years)
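One way to express the Bid/Ask switch in config.py is a single flag. The names below (DATA_DIR, PRIMARY_CSV, SMOKE_TEST_CSV, USE_SMOKE_TEST, DATA_PATH) are hypothetical; check config.py for the names it actually uses.

```python
from pathlib import Path

# Hypothetical variable names; config.py may use different ones.
DATA_DIR = Path("data")
PRIMARY_CSV = DATA_DIR / "XAUUSD_1 Min_Bid_2003.05.05_2026.05.31.csv"     # ~23 years, Bid
SMOKE_TEST_CSV = DATA_DIR / "XAUUSD_1 Min_Ask_2020.01.09_2026.01.15.csv"  # ~6 years, Ask

USE_SMOKE_TEST = False  # set True for a fast end-to-end check

DATA_PATH = SMOKE_TEST_CSV if USE_SMOKE_TEST else PRIMARY_CSV
```

Keep in mind that the two files are opposite sides of the quote: Ask prices sit above Bid by the spread, so smoke-test metrics are not directly comparable with a run on the Bid history.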
Then run:
python run_pipeline.py
This keeps the test split sealed and writes validation-only artifacts, including a walk_forward_baseline.csv showing baseline stability across the rolling folds.
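How rolling folds tile a long history can be sketched as a calendar schedule. This assumes calendar-month windows and the sliding_* defaults given in the notes (5y train, 6m val, 6m test, 6m step); add_months and sliding_folds are hypothetical helpers, and the real code (including the folds run_pipeline.py uses for the baseline) may slice by bar index rather than by date.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day to the target month's length."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) + months, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def sliding_folds(start: date, end: date, train_years: int = 5, val_months: int = 6,
                  test_months: int = 6, step_months: int = 6):
    """Hypothetical schedule: (train_start, val_start, test_start, test_end), half-open windows."""
    if step_months <= 0:
        raise ValueError("step_months must be positive")
    folds = []
    k = 0
    while True:
        # Offset each fold from `start` (not from the previous fold) so day clamping cannot drift.
        train_start = add_months(start, k * step_months)
        val_start = add_months(train_start, 12 * train_years)
        test_start = add_months(val_start, val_months)
        test_end = add_months(test_start, test_months)
        if test_end > end:
            break
        folds.append((train_start, val_start, test_start, test_end))
        k += 1
    return folds


folds = sliding_folds(date(2003, 5, 5), date(2026, 5, 31))
print(len(folds))  # 35
```

With the primary file's date range this gives 35 folds, in line with the ~35 quoted in the notes. Because the step equals the test length, each fold's test window starts exactly where the previous one ended, which is what allows the out-of-sample segments to be stitched into one continuous curve.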
Train the RL agent with walk-forward validation (one model per fold, aggregate out-of-sample report; the final fold is saved as the production model):
python train_ppo.py # default: train_ppo.train_sliding_walk_forward(); block variant: train_ppo.train_walk_forward()
When you are ready for the one-time holdout reveal, run:
python final_holdout_eval.py
Or open:
notebooks/XAUUSD_RL_Pipeline_Demo.ipynb
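The sealed-test convention behind final_holdout_eval.py can be sketched as a chronological split that withholds the test range unless it is explicitly requested. temporal_split and reveal_test are hypothetical names; the fractions mirror train_frac/val_frac in config.py, with test_frac implied as 1 - train_frac - val_frac.

```python
def temporal_split(n_bars: int, train_frac: float, val_frac: float, reveal_test: bool = False):
    """Hypothetical chronological train/val/test split over bar positions.

    test_frac is implied as 1 - train_frac - val_frac. The test range is only
    returned when reveal_test=True, mimicking a one-time holdout reveal.
    """
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("need train_frac > 0, val_frac > 0 and train_frac + val_frac < 1")
    train_end = int(n_bars * train_frac)
    val_end = int(n_bars * (train_frac + val_frac))
    train, val = range(0, train_end), range(train_end, val_end)
    test = range(val_end, n_bars) if reveal_test else None  # sealed by default
    return train, val, test


train, val, test = temporal_split(1_000, train_frac=0.6, val_frac=0.2)
print(test)  # None: the holdout stays sealed during research
```

Splitting by position rather than shuffling keeps every validation and test bar strictly later than the training bars, which is what makes the holdout a genuine out-of-sample check.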
- SOURCE_TZ = Europe/Helsinki, because many brokers label server time as EET/EEST.
- MT4/MT5 M1 timestamps are treated as candle-open timestamps and shifted to candle-close timestamps before resampling/backtesting.
- DECISION_TIMEFRAME = H1, while M1 is retained for intrabar TP/SL simulation.
- Features are causal and mostly stationary: ATR-normalized distances, ratios, session flags, and candle-shape ratios.
- The observation set has 25 features. Four redundant ones were dropped in the 2026-06-02 collinearity audit (rsi_centered, roc5_atr, body_atr, ema50_slope5_atr); each had |r| ≥ 0.96 with a retained feature (two were exact duplicates). The underlying ema20/50/200 columns are still computed for the baselines and charts.
- Sliding-window walk-forward (train_ppo.train_sliding_walk_forward, the default python train_ppo.py entry point) is the more realistic validator: each fold trains on sliding_train_years (5y), selects its checkpoint on the next sliding_val_months (6m), and is judged out-of-sample on the following sliding_test_months (6m); the window then slides forward by sliding_step_months (6m) and the process repeats. This simulates retraining every 6 months and trading the next 6 months live. The test windows of all folds are stitched into one continuous out-of-sample equity curve (models/sliding_oos_equity.csv), and the gate runs on these test metrics. Because every train window has the same length, giving each fold the same number of timesteps also gives it the same number of passes over its training data. The 23-year dataset yields about 35 folds (raise sliding_step_months to reduce this).
- Block walk-forward (train_ppo.train_walk_forward; config.py: n_walk_forward_folds, walk_forward_anchored, test_frac) is the alternative: it seals the last test_frac of bars and rolls n_folds train→val windows over the rest. Since test_frac == 1 - train_frac - val_frac, the sealed holdout matches the single-split test exactly.
- When TP and SL both fall inside the same M1 candle, the simulator assumes SL was hit first. This is deliberately pessimistic.
- Position size is fixed-fractional risk-based; the RL agent controls direction and bracket shape, not size.
- run_pipeline.py performs a temporal train/validation/test split, tunes on train/val, and keeps the test split sealed by default; training_diagnostics.py and train_ppo.py also keep it sealed by default.
- RL rewards are normalized by the risk budget rather than raw cash PnL, which makes them materially more stable for PPO.
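The intrabar rules above (SL-first tie-break, fixed-fractional sizing, rewards scaled by the risk budget) fit together as in this minimal sketch. It is not env_bracket.py: M1Bar, resolve_bracket, position_units and reward_r are hypothetical, and fills are assumed to happen exactly at the SL/TP level with no spread, slippage or gaps.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class M1Bar:
    high: float
    low: float
    close: float


def resolve_bracket(direction: int, entry: float, sl_dist: float, tp_dist: float,
                    bars: List[M1Bar]) -> Tuple[float, str]:
    """Walk M1 bars after entry and return (exit_price, reason). direction: +1 long, -1 short.

    Assumes fills exactly at the SL/TP level (no gaps, spread, or slippage modelled).
    """
    if not bars:
        raise ValueError("need at least one M1 bar after entry")
    sl = entry - direction * sl_dist
    tp = entry + direction * tp_dist
    for bar in bars:
        hit_sl = bar.low <= sl if direction > 0 else bar.high >= sl
        hit_tp = bar.high >= tp if direction > 0 else bar.low <= tp
        # SL is checked first: if both levels sit inside one candle, assume the loss (pessimistic).
        if hit_sl:
            return sl, "sl"
        if hit_tp:
            return tp, "tp"
    return bars[-1].close, "timeout"


def position_units(equity: float, risk_frac: float, sl_dist: float) -> float:
    """Fixed-fractional sizing: losing sl_dist per unit costs exactly risk_frac * equity."""
    return risk_frac * equity / sl_dist


def reward_r(direction: int, entry: float, exit_price: float, sl_dist: float) -> float:
    """Reward in R multiples = cash PnL / risk budget, independent of account size."""
    return direction * (exit_price - entry) / sl_dist


# Long at 2000 with a 5-point stop and a 2R (10-point) target; the second candle spans both levels.
bars = [M1Bar(high=2004.0, low=1998.0, close=2003.0), M1Bar(high=2011.0, low=1994.0, close=2000.0)]
exit_price, reason = resolve_bracket(+1, 2000.0, 5.0, 10.0, bars)
units = position_units(equity=10_000.0, risk_frac=0.01, sl_dist=5.0)
print(reason, reward_r(+1, 2000.0, exit_price, 5.0), units * (exit_price - 2000.0))  # sl -1.0 -100.0
```

Dividing PnL by the risk budget turns every stopped-out trade into -1R and a hit 2R target into +2R regardless of account equity, which is presumably why it is better conditioned for PPO than raw cash PnL.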
- config.py: all main parameters.
- data_loader.py: CSV parsing, timezone handling, validation, resampling.
- features.py: causal indicators and stationary feature matrix.
- leakage_checks.py: simple future-append stability test.
- env_bracket.py: Gymnasium-compatible bracket trading environment.
- baselines.py: random and EMA/ATR rule policies.
- evaluate.py: metrics, trade-log summary, drawdown.
- visualize.py: Plotly visualization functions.
- train_ppo.py: optional PPO training scaffold.
- run_pipeline.py: one-command pre-test pipeline with validation-only outputs.
- final_holdout_eval.py: explicit one-time holdout evaluation entry point.
- notebooks/XAUUSD_RL_Pipeline_Demo.ipynb: guided notebook.
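The idea behind a future-append stability test like the one in leakage_checks.py can be sketched as follows: compute features on a prefix of the data, recompute them after appending later bars, and require the prefix values to be unchanged. The features here are hypothetical stand-ins rather than the ones in features.py, and the ATR is a simple moving average of true range, not Wilder's smoothing.

```python
import math
from typing import Callable, List, Optional

Feature = List[Optional[float]]


def rolling_mean(xs: List[float], n: int) -> Feature:
    """Trailing mean of the last n values (None until n values exist)."""
    return [None if i + 1 < n else sum(xs[i + 1 - n:i + 1]) / n for i in range(len(xs))]


def atr_sma(highs, lows, closes, n=14) -> Feature:
    """Simple-moving-average ATR (not Wilder's smoothing); uses only bars up to t."""
    tr = [highs[0] - lows[0]] + [
        max(h - l, abs(h - pc), abs(l - pc))
        for h, l, pc in zip(highs[1:], lows[1:], closes[:-1])
    ]
    return rolling_mean(tr, n)


def dist_to_sma_atr(highs, lows, closes, n=14) -> Feature:
    """Hypothetical causal, roughly stationary feature: (close - SMA(close, n)) / ATR(n)."""
    sma, atr = rolling_mean(closes, n), atr_sma(highs, lows, closes, n)
    return [None if s is None or a is None or a == 0 else (c - s) / a
            for c, s, a in zip(closes, sma, atr)]


def next_move_atr(highs, lows, closes, n=14) -> Feature:
    """Deliberately leaky feature: peeks at close[t + 1]."""
    atr = atr_sma(highs, lows, closes, n)
    return [None if i + 1 >= len(closes) or atr[i] is None or atr[i] == 0
            else (closes[i + 1] - closes[i]) / atr[i] for i in range(len(closes))]


def future_append_stable(feature_fn: Callable[..., Feature], highs, lows, closes,
                         cut: int, tol: float = 1e-12) -> bool:
    """Features for the first `cut` bars must not change when later bars are appended."""
    prefix = feature_fn(highs[:cut], lows[:cut], closes[:cut])
    full = feature_fn(highs, lows, closes)[:cut]

    def same(a, b):
        if a is None or b is None:
            return a is None and b is None
        return abs(a - b) <= tol

    return all(same(a, b) for a, b in zip(prefix, full))


closes = [2000 + 3 * math.sin(i / 5) + 0.1 * i for i in range(200)]
highs = [c + 1.0 for c in closes]
lows = [c - 1.0 for c in closes]
print(future_append_stable(dist_to_sma_atr, highs, lows, closes, cut=150))  # True
print(future_append_stable(next_move_atr, highs, lows, closes, cut=150))    # False
```

The leaky feature fails because its value at the last prefix bar depends on a bar that only exists after the append; a causal feature cannot change in that way.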