# RL Trading Model Training & Evaluation

This notebook demonstrates how to train a reinforcement learning (RL) trading model using your full feature pipeline and visualize the results. It loads historical price data, extracts features, trains a PPO agent, and evaluates performance.

In [1]:
# ================================================
# 🔧 SETUP - Add src to Python Path
# ================================================

import sys
import os

# Add src directory to Python path so 'core' module can be found
project_root = os.getcwd()
src_path = os.path.join(project_root, 'src')

if src_path not in sys.path:
    sys.path.insert(0, src_path)
    print(f"✅ Added to Python path: {src_path}")
else:
    print(f"✅ Already in path: {src_path}")

# Verify
print(f"📂 Working directory: {project_root}")
print(f"🔍 Python will search for modules in: {src_path}")
print("=" * 50)

✅ Added to Python path: d:\Dev\trading-bot\src
📂 Working directory: d:\Dev\trading-bot
🔍 Python will search for modules in: d:\Dev\trading-bot\src


In [2]:
# Section 1: Import Required Libraries

from pathlib import Path

import gymnasium as gym
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from stable_baselines3 import PPO

# Project modules
from src.core.trading_types import ChartInterval
from src.training.data_loader import DataLoader

# Section 2: Load and Prepare Data
loader = DataLoader()


## Section 3: Train the RL Model

Train a PPO agent using the extracted features and visualize training progress.

**Key Improvements for Stability:**
- ✅ **Data Normalization**: All features properly normalized (fixes massive loss values)
- ✅ **Optimized Hyperparameters**: Tuned for 672-window observation space
- ✅ **GPU Acceleration**: Uses the RTX 3080 for faster training when a GPU is available
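
The notebook does not show how the pipeline normalizes features. One common approach, sketched here as an assumption rather than the project's actual method, is z-score scaling with statistics fitted on the training rows only. The helper names `fit_zscore` and `apply_zscore` are hypothetical, and the example is kept dependency-free; in the notebook the same arithmetic would normally be applied to DataFrame columns.

```python
from statistics import fmean, pstdev


def fit_zscore(train_rows):
    """Compute per-column mean and std from the training rows only."""
    columns = list(zip(*train_rows))
    means = [fmean(col) for col in columns]
    # Guard against constant columns, whose std would be zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_zscore(rows, means, stds):
    """Scale each value to (x - mean) / std using previously fitted stats."""
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two features on very different scales: a price-like and an RSI-like column.
train = [[60000.0, 30.0], [62000.0, 50.0], [64000.0, 70.0]]
means, stds = fit_zscore(train)
scaled = apply_zscore(train, means, stds)
```

After scaling, both columns have zero mean and unit variance, so a raw price in the tens of thousands no longer dominates an oscillator bounded by 100. Reusing `means` and `stds` on later data, instead of refitting, avoids leaking future information into the observations.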

In [3]:
# Section 3: Train the RL Model - Main Training Cell
from src.prediction.rl_predictor import RLPredictor
print("🚀 Starting RL Model Training...")
print("=" * 50)

symbol = 'BTCUSDT'
dfs = loader.load_data(symbol)  # DataFrames keyed by timeframe
features_df = dfs['15m']        # 15-minute features used for training
loop = 1                        # number of training sessions to run

print("Columns in features_df:", len(features_df.columns), features_df.columns.tolist())
for session in range(1, loop + 1):
    print(f"\n🎯 Training Session {session}/{loop}")
    print("-" * 30)
    # Initialize RL Predictor
    rl_predictor = RLPredictor(model_dir='models/rl_demo')

    # Train the RL model with default parameters
    print("\n🎯 Training PPO agent...")
    try:
        rl_predictor.train(features_df, generate_report=True)
    except Exception as e:
        print(f"❌ Training failed: {e}")
        raise

    print("\n🔥 RL Agent ready for prediction generation!")

🚀 Starting RL Model Training...
📥 Loading data for BTCUSDT...
🔧 Adding timeframe-specific technical indicators...
🔧 Converting levels cache index to DatetimeIndex...
✅ Loaded levels cache: data\levels_cache\BTCUSDT-15m-levels.parquet
📊 Shape: 101,000 rows × 9 columns
🔄 Recalculating higher timeframe indicators for 15m...
Columns in features_df: 50 ['time', 'open', 'high', 'low', 'close', 'volume', 'rsi', 'macd', 'macd_signal', 'macd_hist', 'bb_upper', 'bb_lower', 'bb_position', 'volume_ma20', 'volume_ratio', 'obv', 'volatility', 'atr', 'adx', 'ema5', 'ema9', 'ema13', 'ema20', 'ema21', 'ema50', 'ema200', 'ema9_ema21_cross', 'ema20_ema50_cross', 'stochastic_k', 'stochastic_d', 'vwap', 'levels_json', 'ema20_1h', 'ema50_1h', 'ema200_1h', 'rsi_1h', 'macd_1h', 'macd_hist_1h', 'ema20_D', 'ema50_D', 'macd_hist_D', 'rsi_D', 'ema20_W', 'ema50_W', 'macd_hist_W', 'rsi_W', 'ema20_M', 'ema50_M', 'macd_hist_M', 'rsi_M']

🎯 Training Session 1/1
------------------------------
⚠️ GPU not available, usin
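
The session log above reports that no GPU was available for this run. A minimal sketch of the usual availability check follows, assuming the PyTorch backend that Stable-Baselines3 is built on. `pick_device` is a hypothetical helper and not necessarily how `RLPredictor` selects its device.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so fall back to the CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Training device: {device}")
```

Stable-Baselines3's `PPO` constructor accepts a `device` argument (default `"auto"`), so the result could be passed through as `device=device` when the model is created.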

## 📊 Show Training Report

After training completes, you can display the comprehensive PPO performance analysis with interactive charts and AI-friendly summaries.

In [5]:
# Display Comprehensive Training Report
# This shows PPO performance metrics, interactive charts, and AI-friendly analysis

from src.reporting.model_training_report import ModelTrainingReport


print("📊 Generating Training Performance Report")
print("=" * 50)

report = ModelTrainingReport()
# Show the training report with charts and analysis
# This reads the training data captured by the callback during training
report.show_training_report()
 

📊 Generating Training Performance Report
🚀 PPO Training Performance Report
📊 Loaded enhanced training data with trading metrics
📊 SESSION OVERVIEW
Algorithm: PPO
Timesteps: 9,000 / 10,000
Evaluations: 0
Duration: 57:07
Speed: 3 steps/sec
Status: ✅ Completed

🎯 PPO PERFORMANCE METRICS

🔍 TRAINING STABILITY
Speed Stability: Unknown
Data Points: 0
Progression: Unknown

💡 RECOMMENDATIONS
  📈 Low clip fraction (<0.05) - could increase learning rate for faster learning
  📊 Moderate explained variance - value function making progress
  🎲 High entropy - model still exploring (good for early training)

📋 COPY-FRIENDLY SUMMARY FOR AI ANALYSIS
Training Configuration:
- Algorithm: PPO
- Timesteps: 9,000
- Duration: 3427.8s
- Speed: 3 steps/sec
- Evaluations: 0

Final Metrics:

Key Issues/Strengths:
- NOTE: Low clip fraction (<0.05) - could increase learning rate for faster learning
- MODERATE: Moderate explained variance - value function making progress
- INFO: High entropy - model still explorin

{'session_overview': {'algorithm': 'PPO',
  'total_timesteps': 10000,
  'actual_timesteps': 9000,
  'duration_seconds': 3427.8327283859253,
  'avg_speed': 2.6255656892096537,
  'start_time': '2025-10-12 18:05:11',
  'end_time': '2025-10-12 18:05:11',
  'completed': True,
  'total_evaluations': 0},
 'performance_metrics': {},
 'loss_trends': {},
 'training_stability': {'status': 'insufficient_data'},
 'trading_performance': {'status': 'no_data'},
 'evaluation_progress': {'status': 'no_data'},
 'portfolio_analysis': {'status': 'no_data'},
 'recommendations': ['📈 Low clip fraction (<0.05) - could increase learning rate for faster learning',
  '📊 Moderate explained variance - value function making progress',
  '🎲 High entropy - model still exploring (good for early training)']}
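
The figures in the returned dictionary can be cross-checked with a few lines of arithmetic. The sketch below copies a subset of the values shown above into a literal; `empty_sections` is a hypothetical helper, and the set of "no data" status strings is taken from this one output, not from the report's documentation.

```python
report = {
    "session_overview": {
        "total_timesteps": 10000,
        "actual_timesteps": 9000,
        "duration_seconds": 3427.8327283859253,
        "avg_speed": 2.6255656892096537,
    },
    "performance_metrics": {},
    "loss_trends": {},
    "training_stability": {"status": "insufficient_data"},
    "trading_performance": {"status": "no_data"},
}

overview = report["session_overview"]

# Throughput is completed steps divided by wall-clock seconds.
speed = overview["actual_timesteps"] / overview["duration_seconds"]

# Duration rendered as MM:SS, as in the session overview.
minutes, seconds = divmod(int(overview["duration_seconds"]), 60)
duration = f"{minutes}:{seconds:02d}"

# Fraction of the requested timesteps that actually ran.
progress = overview["actual_timesteps"] / overview["total_timesteps"]


def empty_sections(report: dict) -> list:
    """List sections that are empty or flagged as having no usable data."""
    missing = {"no_data", "insufficient_data"}
    return [
        name for name, section in report.items()
        if isinstance(section, dict)
        and (not section or section.get("status") in missing)
    ]


print(f"{speed:.2f} steps/sec, {duration}, {progress:.0%} of target")
print("Sections without data:", empty_sections(report))
```

The first line prints `2.63 steps/sec, 57:07, 90% of target`, which matches `avg_speed` and the `57:07` duration and explains the rounded `3 steps/sec` in the overview. The second line lists `performance_metrics` and `loss_trends` among the empty sections, so the recommendations about clip fraction, explained variance, and entropy are not backed by any metric captured in this report.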