# RL Trading Model Training & Evaluation

This notebook trains a reinforcement learning (RL) trading model on the full feature pipeline and visualizes the results. It loads historical price data, extracts features, trains a Proximal Policy Optimization (PPO) agent, and evaluates its performance.

In [1]:
# ================================================
# 🔧 SETUP - Add src to Python Path
# ================================================

import sys
import os

# Add src directory to Python path so 'core' module can be found
project_root = os.getcwd()
src_path = os.path.join(project_root, 'src')

if src_path not in sys.path:
    sys.path.insert(0, src_path)
    print(f"✅ Added to Python path: {src_path}")
else:
    print(f"✅ Already in path: {src_path}")

# Verify
print(f"📂 Working directory: {project_root}")
print(f"🔍 Python will search for modules in: {src_path}")
print("=" * 50)

✅ Added to Python path: d:\Dev\trading-bot\src
📂 Working directory: d:\Dev\trading-bot
🔍 Python will search for modules in: d:\Dev\trading-bot\src


In [None]:
# Section 5: Train the RL Model - OPTIMIZED for Speed & Performance
from src.prediction.rl_predictor import RLPredictor
from src.training.data_loader import DataLoader

print("🚀 Starting OPTIMIZED RL Model Training...")
print("=" * 60)

symbol = 'BTCUSDT'

# Load the data for the symbol and select the 15-minute timeframe
loader = DataLoader()
dfs = loader.load_data(symbol)
features_df = dfs['15m']

print(f"📊 Total data points: {len(features_df):,}")

print("\n🎯 OPTIMIZED Training Session")
print("-" * 40)

# Initialize the RL predictor; training artifacts are written to model_dir
rl_predictor = RLPredictor(model_dir='models\\rl_optimized')

try:
    print("\n🏦 Starting OPTIMIZED Training...")
    
    # Train a new model from scratch; no hyperparameter overrides are passed here
    success = rl_predictor.train(
        features_df,
        continue_training=False,
        verbose=1,
    )
    
    if success:
        print("✅ Training completed successfully!")
        print(f"📁 Model saved to: {rl_predictor.model_dir}")
    else:
        print("⚠️ Training completed with issues")
    
except KeyboardInterrupt:
    print("🛑 Training interrupted by user")
except Exception as e:
    print(f"❌ Training failed: {e}")
    import traceback
    traceback.print_exc()

print("📊 Check training logs above for performance metrics")



🚀 Starting OPTIMIZED RL Model Training...
📥 Loading data for BTCUSDT...
🔧 Converting levels cache index to DatetimeIndex...
✅ Loaded levels cache: data\levels_cache\BTCUSDT-15m-levels.parquet
📊 Shape: 101,000 rows × 9 columns
📊 Total data points: 272,377

🎯 OPTIMIZED Training Session
----------------------------------------
✅ GPU Available: NVIDIA GeForce RTX 3080 (10.0GB)
🖥️ RL Training Device: cuda

🏦 Starting OPTIMIZED Training...
🚀 Initializing PPO model on cuda...
🆕 Creating new model...
🔧 Fitting normalizers for 31 features...
✅ Fitted 31 normalizers
💾 Saved normalizer to models\rl_optimized\normalizer.pkl
⚡ Pre-normalizing feature data...
✅ Pre-normalized 31 features
⚡ Pre-normalizing feature data...
✅ Pre-normalized 31 features
⚡ Pre-normalizing feature data...
✅ Pre-normalized 31 features
⚡ Pre-normalizing feature data...
✅ Pre-normalized 31 features
Using cuda device
⚡ Pre-normalizing feature data...
✅ Pre-normalized 31 features
⚡ Pre-normalizing feature data...
✅ Pre-normali

🚀 Starting PPO training for 200,000 timesteps on cuda...


-----------------------------
| time/              |      |
|    fps             | 341  |
|    iterations      | 1    |
|    time_elapsed    | 23   |
|    total_timesteps | 8192 |
-----------------------------
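
The first log table allows a rough estimate of total run time. The sketch below assumes that one iteration collects 8,192 timesteps (as the table shows), that the trainer stops at the first rollout boundary at or beyond the 200,000-step target (the usual behavior of Stable-Baselines3-style PPO loops, not confirmed for this project), and that throughput stays near the reported 341 fps, which may not hold once gradient updates begin. `estimate_training` is a hypothetical helper.

```python
import math


def estimate_training(total_timesteps: int, rollout_size: int, fps: float) -> dict:
    """Estimate iteration count and wall-clock time from one log table.

    Assumptions: fixed rollout size, constant throughput, and training that
    ends on a whole rollout at or past `total_timesteps`.
    """
    iterations = math.ceil(total_timesteps / rollout_size)
    collected = iterations * rollout_size
    return {
        "iterations": iterations,
        "collected_timesteps": collected,
        "eta_seconds": collected / fps,
    }


# Values taken from the log output above
est = estimate_training(total_timesteps=200_000, rollout_size=8_192, fps=341)
print(
    f"{est['iterations']} iterations, "
    f"{est['collected_timesteps']:,} timesteps, "
    f"~{est['eta_seconds'] / 60:.1f} min"
)
# prints: 25 iterations, 204,800 timesteps, ~10.0 min
```

Treat the result as a ballpark figure only: the first iteration's fps is measured over very little work, so later iterations can run noticeably slower or faster.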
