# RL Trading Model Training & Evaluation

This notebook demonstrates how to train a reinforcement learning (RL) trading model using your full feature pipeline and visualize the results. It loads historical price data, extracts features, trains a PPO agent, and evaluates performance.

In [1]:
# ================================================
# 🔧 SETUP - Add src to Python Path
# ================================================

import sys
import os

# Add src directory to Python path so 'core' module can be found
project_root = os.getcwd()
src_path = os.path.join(project_root, 'src')

if src_path not in sys.path:
    sys.path.insert(0, src_path)
    print(f"✅ Added to Python path: {src_path}")
else:
    print(f"✅ Already in path: {src_path}")

# Verify
print(f"📂 Working directory: {project_root}")
print(f"🔍 Python will search for modules in: {src_path}")
print("=" * 50)

✅ Added to Python path: d:\Dev\trading-bot\src
📂 Working directory: d:\Dev\trading-bot
🔍 Python will search for modules in: d:\Dev\trading-bot\src
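
The setup cell relies on `os.getcwd()`, so it only finds `src` when the notebook is launched from the project root. A minimal sketch of a more tolerant variant follows, assuming the project root is the nearest ancestor directory that contains a `src` folder. The helper names `find_project_root` and `add_src_to_path` are hypothetical and not part of the project.

```python
import sys
from pathlib import Path


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Return the nearest directory at or above `start` that contains `marker`.

    Hypothetical helper: assumes the project root is identified by a `src` folder.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No '{marker}' directory found at or above {start}")


def add_src_to_path(start: Path) -> Path:
    """Insert <project_root>/src at the front of sys.path if it is missing."""
    src_path = find_project_root(start) / "src"
    if str(src_path) not in sys.path:
        sys.path.insert(0, str(src_path))
    return src_path
```

Calling `add_src_to_path(Path.cwd())` would then behave like the cell above when run from the project root, and would also work from a subfolder such as a `notebooks` directory.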


In [2]:
# Section 5: Train the RL Model - OPTIMIZED for Speed & Performance
from src.prediction.rl_predictor import RLPredictor
from src.training.data_loader import DataLoader

print("🚀 Starting OPTIMIZED RL Model Training...")
print("=" * 60)

symbol = 'BTCUSDT'

# Load the data frames for the symbol and select the 15-minute one
loader = DataLoader()
dfs = loader.load_data(symbol)
features_df = dfs['15m']

print(f"📊 Total data points: {len(features_df):,}")

print("\n🎯 OPTIMIZED Training Session")
print("-" * 40)

# Initialize RL Predictor with optimized settings
rl_predictor = RLPredictor(model_dir='models\\rl_optimized')

try:
    print("\n🏦 Starting OPTIMIZED Training...")
    
    # Fresh training run (continue_training=False); no overrides are passed in this call
    success = rl_predictor.train(
        features_df, 
        continue_training=False, 
        verbose=1,
    )
    
    if success:
        print("✅ Training completed successfully!")
        print(f"📁 Model saved to: {rl_predictor.model_dir}")
    else:
        print("⚠️ Training completed with issues")
    
except KeyboardInterrupt:
    print("🛑 Training interrupted by user")
except Exception as e:
    print(f"❌ Training failed: {e}")
    import traceback
    traceback.print_exc()

print("📊 Check training logs above for performance metrics")



🚀 Starting OPTIMIZED RL Model Training...
📥 Loading data for BTCUSDT...
🔧 Converting levels cache index to DatetimeIndex...
✅ Loaded levels cache: data\levels_cache\BTCUSDT-15m-levels.parquet
📊 Shape: 101,000 rows × 9 columns
📊 Total data points: 272,377

🎯 OPTIMIZED Training Session
----------------------------------------
✅ GPU Available: NVIDIA GeForce RTX 3080 (10.0GB)
🖥️ RL Training Device: cuda

🏦 Starting OPTIMIZED Training...
🚀 Initializing PPO model on cuda...
🆕 Creating new model...
🔧 Fitting normalizers for 31 features...
✅ Fitted 31 normalizers
💾 Saved normalizer to models\rl_optimized\normalizer.pkl
⚡ Pre-normalizing feature data...
✅ Pre-normalized 31 features
⚡ Pre-normalizing feature data...
✅ Pre-normalized 31 features
⚡ Pre-normalizing feature data...
✅ Pre-normalized 31 features
⚡ Pre-normalizing feature data...
✅ Pre-normalized 31 features
Using cuda device
⚡ Pre-normalizing feature data...
✅ Pre-normalized 31 features
⚡ Pre-normalizing feature data...
✅ Pre-normali

🚀 Starting PPO training for 200,000 timesteps on cuda...
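
The run is configured for 200,000 timesteps, but the tables that follow advance `total_timesteps` in steps of 8,192 and stop at 204,800. That is consistent with PPO collecting whole rollouts and checking the budget only between them. A small arithmetic sketch, assuming the rollout size of 8,192 read off the first table (how it splits into steps per environment and number of environments is not shown in the log):

```python
import math

requested_timesteps = 200_000  # from the "Starting PPO training" message
rollout_size = 8_192           # total_timesteps after iteration 1 in the log

# Training stops after the first rollout that reaches the requested budget.
iterations = math.ceil(requested_timesteps / rollout_size)
actual_timesteps = iterations * rollout_size

print(iterations)        # 25
print(actual_timesteps)  # 204800
```

Both numbers agree with the last table in the log, which reports iteration 25 at 204,800 timesteps.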


-----------------------------
| time/              |      |
|    fps             | 1036 |
|    iterations      | 1    |
|    time_elapsed    | 7    |
|    total_timesteps | 8192 |
-----------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 777         |
|    iterations           | 2           |
|    time_elapsed         | 21          |
|    total_timesteps      | 16384       |
| train/                  |             |
|    approx_kl            | 0.018683102 |
|    clip_fraction        | 0.225       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.84       |
|    explained_variance   | -2.83       |
|    learning_rate        | 0.000288    |
|    loss                 | -0.0496     |
|    n_updates            | 4           |
|    policy_gradient_loss | -0.0249     |
|    std                  | 1           |
|    value_loss           | 0.0862      |
-----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 828         |
|    iterations           | 3           |
|    time_elapsed         | 29          |
|    total_timesteps      | 24576       |
| train/                  |             |
|    approx_kl            | 0.019099157 |
|    clip_fraction        | 0.372       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.84       |
|    explained_variance   | -0.169      |
|    learning_rate        | 0.000275    |
|    loss                 | -0.0191     |
|    n_updates            | 8           |
|    policy_gradient_loss | -0.0398     |
|    std                  | 0.998       |
|    value_loss           | 0.0944      |
-----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 848         |
|    iterations           | 4           |
|    time_elapsed         | 38          |
|    total_timesteps      | 32768       |
| train/                  |             |
|    approx_kl            | 0.032203883 |
|    clip_fraction        | 0.431       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.83       |
|    explained_variance   | -0.675      |
|    learning_rate        | 0.000263    |
|    loss                 | -0.0507     |
|    n_updates            | 12          |
|    policy_gradient_loss | -0.0425     |
|    std                  | 0.993       |
|    value_loss           | 0.0557      |
-----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 855         |
|    iterations           | 5           |
|    time_elapsed         | 47          |
|    total_timesteps      | 40960       |
| train/                  |             |
|    approx_kl            | 0.028004833 |
|    clip_fraction        | 0.433       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.82       |
|    explained_variance   | -0.113      |
|    learning_rate        | 0.000251    |
|    loss                 | 0.105       |
|    n_updates            | 16          |
|    policy_gradient_loss | -0.043      |
|    std                  | 0.986       |
|    value_loss           | 0.19        |
-----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 865         |
|    iterations           | 6           |
|    time_elapsed         | 56          |
|    total_timesteps      | 49152       |
| train/                  |             |
|    approx_kl            | 0.029374462 |
|    clip_fraction        | 0.443       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.81       |
|    explained_variance   | -0.438      |
|    learning_rate        | 0.000239    |
|    loss                 | 0.148       |
|    n_updates            | 20          |
|    policy_gradient_loss | -0.045      |
|    std                  | 0.983       |
|    value_loss           | 0.216       |
-----------------------------------------


----------------------------------------
| time/                   |            |
|    fps                  | 877        |
|    iterations           | 7          |
|    time_elapsed         | 65         |
|    total_timesteps      | 57344      |
| train/                  |            |
|    approx_kl            | 0.03376174 |
|    clip_fraction        | 0.348      |
|    clip_range           | 0.1        |
|    entropy_loss         | -2.8       |
|    explained_variance   | -0.118     |
|    learning_rate        | 0.000226   |
|    loss                 | -0.0616    |
|    n_updates            | 24         |
|    policy_gradient_loss | -0.0297    |
|    std                  | 0.983      |
|    value_loss           | 0.266      |
----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 893         |
|    iterations           | 8           |
|    time_elapsed         | 73          |
|    total_timesteps      | 65536       |
| train/                  |             |
|    approx_kl            | 0.021969449 |
|    clip_fraction        | 0.243       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.8        |
|    explained_variance   | -0.0516     |
|    learning_rate        | 0.000214    |
|    loss                 | 0.662       |
|    n_updates            | 28          |
|    policy_gradient_loss | -0.0179     |
|    std                  | 0.982       |
|    value_loss           | 0.235       |
-----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 898         |
|    iterations           | 9           |
|    time_elapsed         | 82          |
|    total_timesteps      | 73728       |
| train/                  |             |
|    approx_kl            | 0.032698084 |
|    clip_fraction        | 0.399       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.8        |
|    explained_variance   | -0.0917     |
|    learning_rate        | 0.000202    |
|    loss                 | -0.0442     |
|    n_updates            | 32          |
|    policy_gradient_loss | -0.0354     |
|    std                  | 0.981       |
|    value_loss           | 0.179       |
-----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 907         |
|    iterations           | 10          |
|    time_elapsed         | 90          |
|    total_timesteps      | 81920       |
| train/                  |             |
|    approx_kl            | 0.039433617 |
|    clip_fraction        | 0.315       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.8        |
|    explained_variance   | -0.0415     |
|    learning_rate        | 0.000189    |
|    loss                 | -0.0818     |
|    n_updates            | 36          |
|    policy_gradient_loss | -0.0258     |
|    std                  | 0.981       |
|    value_loss           | 0.249       |
-----------------------------------------


---------------------------------------
| time/                   |           |
|    fps                  | 909       |
|    iterations           | 11        |
|    time_elapsed         | 99        |
|    total_timesteps      | 90112     |
| train/                  |           |
|    approx_kl            | 0.0366146 |
|    clip_fraction        | 0.398     |
|    clip_range           | 0.1       |
|    entropy_loss         | -2.8      |
|    explained_variance   | -0.0186   |
|    learning_rate        | 0.000177  |
|    loss                 | 0.0947    |
|    n_updates            | 40        |
|    policy_gradient_loss | -0.0354   |
|    std                  | 0.979     |
|    value_loss           | 0.201     |
---------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 911         |
|    iterations           | 12          |
|    time_elapsed         | 107         |
|    total_timesteps      | 98304       |
| train/                  |             |
|    approx_kl            | 0.033189964 |
|    clip_fraction        | 0.405       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.79       |
|    explained_variance   | 0.00334     |
|    learning_rate        | 0.000165    |
|    loss                 | -0.0685     |
|    n_updates            | 44          |
|    policy_gradient_loss | -0.037      |
|    std                  | 0.976       |
|    value_loss           | 0.288       |
-----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 917         |
|    iterations           | 13          |
|    time_elapsed         | 116         |
|    total_timesteps      | 106496      |
| train/                  |             |
|    approx_kl            | 0.043046884 |
|    clip_fraction        | 0.296       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.79       |
|    explained_variance   | 0.0229      |
|    learning_rate        | 0.000153    |
|    loss                 | -0.061      |
|    n_updates            | 48          |
|    policy_gradient_loss | -0.027      |
|    std                  | 0.975       |
|    value_loss           | 0.263       |
-----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 919         |
|    iterations           | 14          |
|    time_elapsed         | 124         |
|    total_timesteps      | 114688      |
| train/                  |             |
|    approx_kl            | 0.032841336 |
|    clip_fraction        | 0.352       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.79       |
|    explained_variance   | 0.128       |
|    learning_rate        | 0.00014     |
|    loss                 | -0.0527     |
|    n_updates            | 52          |
|    policy_gradient_loss | -0.0332     |
|    std                  | 0.974       |
|    value_loss           | 0.285       |
-----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 919         |
|    iterations           | 15          |
|    time_elapsed         | 133         |
|    total_timesteps      | 122880      |
| train/                  |             |
|    approx_kl            | 0.035627156 |
|    clip_fraction        | 0.384       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.78       |
|    explained_variance   | 0.348       |
|    learning_rate        | 0.000128    |
|    loss                 | 0.175       |
|    n_updates            | 56          |
|    policy_gradient_loss | -0.0372     |
|    std                  | 0.973       |
|    value_loss           | 0.266       |
-----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 917         |
|    iterations           | 16          |
|    time_elapsed         | 142         |
|    total_timesteps      | 131072      |
| train/                  |             |
|    approx_kl            | 0.031203624 |
|    clip_fraction        | 0.412       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.78       |
|    explained_variance   | 0.486       |
|    learning_rate        | 0.000116    |
|    loss                 | 0.808       |
|    n_updates            | 60          |
|    policy_gradient_loss | -0.0427     |
|    std                  | 0.971       |
|    value_loss           | 0.268       |
-----------------------------------------


----------------------------------------
| time/                   |            |
|    fps                  | 918        |
|    iterations           | 17         |
|    time_elapsed         | 151        |
|    total_timesteps      | 139264     |
| train/                  |            |
|    approx_kl            | 0.02938652 |
|    clip_fraction        | 0.414      |
|    clip_range           | 0.1        |
|    entropy_loss         | -2.78      |
|    explained_variance   | 0.161      |
|    learning_rate        | 0.000103   |
|    loss                 | -0.0113    |
|    n_updates            | 64         |
|    policy_gradient_loss | -0.0427    |
|    std                  | 0.968      |
|    value_loss           | 0.312      |
----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 920         |
|    iterations           | 18          |
|    time_elapsed         | 160         |
|    total_timesteps      | 147456      |
| train/                  |             |
|    approx_kl            | 0.024747938 |
|    clip_fraction        | 0.384       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.77       |
|    explained_variance   | 0.294       |
|    learning_rate        | 9.11e-05    |
|    loss                 | 0.215       |
|    n_updates            | 68          |
|    policy_gradient_loss | -0.042      |
|    std                  | 0.966       |
|    value_loss           | 0.367       |
-----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 921         |
|    iterations           | 19          |
|    time_elapsed         | 168         |
|    total_timesteps      | 155648      |
| train/                  |             |
|    approx_kl            | 0.025284268 |
|    clip_fraction        | 0.38        |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.77       |
|    explained_variance   | 0.473       |
|    learning_rate        | 7.88e-05    |
|    loss                 | 0.0186      |
|    n_updates            | 72          |
|    policy_gradient_loss | -0.0434     |
|    std                  | 0.965       |
|    value_loss           | 0.292       |
-----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 922         |
|    iterations           | 20          |
|    time_elapsed         | 177         |
|    total_timesteps      | 163840      |
| train/                  |             |
|    approx_kl            | 0.021705776 |
|    clip_fraction        | 0.35        |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.77       |
|    explained_variance   | 0.488       |
|    learning_rate        | 6.65e-05    |
|    loss                 | 0.272       |
|    n_updates            | 76          |
|    policy_gradient_loss | -0.0415     |
|    std                  | 0.964       |
|    value_loss           | 0.369       |
-----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 922         |
|    iterations           | 21          |
|    time_elapsed         | 186         |
|    total_timesteps      | 172032      |
| train/                  |             |
|    approx_kl            | 0.021014828 |
|    clip_fraction        | 0.333       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.76       |
|    explained_variance   | 0.62        |
|    learning_rate        | 5.42e-05    |
|    loss                 | 0.246       |
|    n_updates            | 80          |
|    policy_gradient_loss | -0.0412     |
|    std                  | 0.963       |
|    value_loss           | 0.34        |
-----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 923         |
|    iterations           | 22          |
|    time_elapsed         | 195         |
|    total_timesteps      | 180224      |
| train/                  |             |
|    approx_kl            | 0.015653726 |
|    clip_fraction        | 0.296       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.76       |
|    explained_variance   | 0.642       |
|    learning_rate        | 4.2e-05     |
|    loss                 | 0.408       |
|    n_updates            | 84          |
|    policy_gradient_loss | -0.0398     |
|    std                  | 0.962       |
|    value_loss           | 0.377       |
-----------------------------------------


-----------------------------------------
| time/                   |             |
|    fps                  | 924         |
|    iterations           | 23          |
|    time_elapsed         | 203         |
|    total_timesteps      | 188416      |
| train/                  |             |
|    approx_kl            | 0.013900823 |
|    clip_fraction        | 0.26        |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.76       |
|    explained_variance   | 0.637       |
|    learning_rate        | 2.97e-05    |
|    loss                 | 0.0609      |
|    n_updates            | 88          |
|    policy_gradient_loss | -0.0377     |
|    std                  | 0.961       |
|    value_loss           | 0.368       |
-----------------------------------------


------------------------------------------
| time/                   |              |
|    fps                  | 925          |
|    iterations           | 24           |
|    time_elapsed         | 212          |
|    total_timesteps      | 196608       |
| train/                  |              |
|    approx_kl            | 0.0076312153 |
|    clip_fraction        | 0.184        |
|    clip_range           | 0.1          |
|    entropy_loss         | -2.76        |
|    explained_variance   | 0.749        |
|    learning_rate        | 1.74e-05     |
|    loss                 | 0.293        |
|    n_updates            | 92           |
|    policy_gradient_loss | -0.0323      |
|    std                  | 0.961        |
|    value_loss           | 0.361        |
------------------------------------------


------------------------------------------
| time/                   |              |
|    fps                  | 925          |
|    iterations           | 25           |
|    time_elapsed         | 221          |
|    total_timesteps      | 204800       |
| train/                  |              |
|    approx_kl            | 0.0017051683 |
|    clip_fraction        | 0.0456       |
|    clip_range           | 0.1          |
|    entropy_loss         | -2.76        |
|    explained_variance   | 0.781        |
|    learning_rate        | 5.09e-06     |
|    loss                 | 0.211        |
|    n_updates            | 96           |
|    policy_gradient_loss | -0.0168      |
|    std                  | 0.961        |
|    value_loss           | 0.43         |
------------------------------------------
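
The `learning_rate` column in the tables above falls steadily from 0.000288 to 5.09e-06, which matches a linear decay from an initial value of 3e-4 over the 200,000 requested timesteps. Both the initial value and the linear form are inferences from the logged numbers, not confirmed settings of `RLPredictor`. A quick check under that assumption, further assuming that the table printed at iteration k reports the update made after k - 1 rollouts:

```python
INITIAL_LR = 3e-4           # assumed; inferred from the logged values
TOTAL_TIMESTEPS = 200_000   # from the "Starting PPO training" message
ROLLOUT_SIZE = 8_192        # total_timesteps after iteration 1


def linear_lr(timesteps_done: int) -> float:
    """Learning rate under a linear decay to zero at TOTAL_TIMESTEPS."""
    progress_remaining = 1.0 - timesteps_done / TOTAL_TIMESTEPS
    return INITIAL_LR * progress_remaining


# Compare the assumed schedule with three values copied from the log.
for iteration, logged in [(2, 0.000288), (13, 0.000153), (25, 5.09e-06)]:
    predicted = linear_lr((iteration - 1) * ROLLOUT_SIZE)
    print(f"iteration {iteration}: predicted {predicted:.3g}, logged {logged:.3g}")
```

If the schedule were constant or exponential instead, the three predictions would not line up with the logged values to three significant figures.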


✅ Training completed successfully
✅ Model saved to: models\rl_optimized\ppo_trading.zip
✅ Training completed successfully!
📁 Model saved to: models\rl_optimized
📊 Check training logs above for performance metrics
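
The final message points back to the training tables for performance metrics. Scanning two dozen tables by eye is error-prone, so here is a minimal sketch of turning them into records. It assumes the log text has been captured as a string (for example, copied from the cell output) and that every metric row has the `| key | value |` shape shown above. `parse_training_log` is a hypothetical helper, not part of the project.

```python
import re

# Matches rows such as "|    explained_variance   | -2.83       |".
ROW = re.compile(r"^\|\s+(\w+)\s+\|\s+(\S+)\s+\|$")


def parse_training_log(text: str) -> list[dict[str, float]]:
    """Return one dict of metric name -> value per logged table."""
    records: list[dict[str, float]] = []
    current: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if line and set(line) == {"-"}:   # a border line opens or closes a table
            if current:
                records.append(current)
            current = {}
            continue
        match = ROW.match(line)
        if match:                          # headers like "| time/ | |" do not match
            current[match.group(1)] = float(match.group(2))
    if current:
        records.append(current)
    return records
```

Passing the result to `pandas.DataFrame` gives one row per iteration, which makes trends such as `explained_variance` rising from -2.83 at iteration 2 to 0.781 at iteration 25 easy to plot.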
