# 🚀 Simple Model Debug - Train & Test

**Simple workflow:**
1. **Train** model on `/data/` (BTCUSDT)
2. **Test** model on `/data_test/` with Plotly visualization

In [1]:
# ================================================
# 🔧 SETUP - Add src to Python Path
# ================================================

import sys
import os

# Add src directory to Python path so 'core' module can be found
project_root = os.getcwd()
src_path = os.path.join(project_root, 'src')

if src_path not in sys.path:
    sys.path.insert(0, src_path)
    print(f"✅ Added to Python path: {src_path}")
else:
    print(f"✅ Already in path: {src_path}")

# Verify
print(f"📂 Working directory: {project_root}")
print(f"🔍 Python will search for modules in: {src_path}")
print("=" * 50)

✅ Added to Python path: d:\Dev\trading-bot\src
📂 Working directory: d:\Dev\trading-bot
🔍 Python will search for modules in: d:\Dev\trading-bot\src
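
The setup cell assumes the notebook is started from the project root, since it builds `src_path` from `os.getcwd()`. If the kernel starts in a subfolder, that path will not exist. A minimal sketch of a more tolerant variant is below; `find_project_root` and `add_src_to_path` are hypothetical helpers, not part of this project, and the only assumption is that the root is the nearest ancestor directory containing `src`.

```python
import sys
from pathlib import Path


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No '{marker}' directory found at or above {start}")


def add_src_to_path(start: Path) -> Path:
    """Prepend <project_root>/src to sys.path (once) and return that path."""
    src_path = find_project_root(start) / "src"
    if str(src_path) not in sys.path:
        sys.path.insert(0, str(src_path))
    return src_path
```

In a notebook this would be called as `add_src_to_path(Path.cwd())`; it raises `FileNotFoundError` instead of silently adding a path that does not exist.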


In [2]:
# ================================================
# 🎓 TRAIN MODEL (using /data/ BTCUSDT)
# ================================================


print("🎓 TRAINING MODEL WITH MEMORY & BOUNCE FEATURES")
print("=" * 50)

# Training files - using absolute paths to ensure they're found
import os

from src.training.model_trainer import SimpleModelTrainer

data_folder = os.path.join(os.getcwd(), 'data')
print(f"📁 Data folder: {data_folder}")

training_files = {
    '15m': os.path.join(data_folder, 'BTCUSDT-15m.json'),
    '1h': os.path.join(data_folder, 'BTCUSDT-1h.json'), 
    'M': os.path.join(data_folder, 'BTCUSDT-M.json'),
    'W': os.path.join(data_folder, 'BTCUSDT-W.json'),
    'D': os.path.join(data_folder, 'BTCUSDT-D.json')
}

# Check files exist
print("\n📋 Checking training files:")
for tf, path in training_files.items():
    exists = "✅" if os.path.exists(path) else "❌"
    print(f"   {tf}: {exists} {path}")
    if not os.path.exists(path):
        print(f"      🔍 File not found: {path}")

# Only proceed if we have the essential files
missing_files = [tf for tf, path in training_files.items() if not os.path.exists(path)]
if missing_files:
    print(f"\n❌ Missing required files: {missing_files}")
    print("   Please ensure all BTCUSDT data files are in the 'data' folder")
else:
    print("\n✅ All training files found!")
    
    # Initialize trainer
    trainer = SimpleModelTrainer()

    # 🧠 CONFIGURE MEMORY & BOUNCE DETECTION
    print(f"\n🧠 MEMORY & BOUNCE CONFIGURATION:")
    print(f"   📚 Trade Memory: {trainer.enable_memory_features} (tracks win/loss patterns)")
    print(f"   🎯 Bounce Detection: {trainer.enable_bounce_detection} (identifies support/resistance bounces)")
    print(f"   💾 Memory Size: {trainer.trade_memory.max_memory} trades max")
    print(f"   📊 Current Memory: {len(trainer.trade_memory.trades)} trades loaded")
    
    # Show current memory stats
    recent_perf = trainer.trade_memory.get_recent_performance()
    bounce_perf = trainer.trade_memory.get_bounce_performance()
    consecutive = trainer.trade_memory.get_consecutive_performance()
    
    print(f"\n📈 CURRENT MEMORY STATS:")
    print(f"   Win Rate: {recent_perf['win_rate']:.1%} (last 7 days)")
    print(f"   Avg PnL: {recent_perf['avg_pnl']:.2f}%")
    print(f"   Bounce Win Rate: {bounce_perf['bounce_win_rate']:.1%}")
    print(f"   Consecutive: {consecutive} {'wins' if consecutive > 0 else 'losses' if consecutive < 0 else 'neutral'}")

    # 🎯 BALANCED TRAINING CONFIGURATION - More Realistic Thresholds
    trainer.configure_training(
        profit_threshold=3.0,     # Reduced from 10.0% to 3.0% for more BUY signals
        loss_threshold=-2.0,      # More balanced risk/reward ratio
        lookforward_periods=[5, 10, 20]
    )

    print(f"\n📊 Training Configuration:")
    print(f"   Profit threshold: {trainer.profit_threshold}% (BALANCED - was 10%)")
    print(f"   Loss threshold: {trainer.loss_threshold}% (BALANCED - was -1%)")
    print(f"   Lookforward periods: {trainer.lookforward_periods}")
    print(f"   🚀 GPU Training: Enabled with 8 threads (optimized for your 8-core system)")
    print(f"   ⚡ Parallel Processing: Data prep + XGBoost optimization")
    print(f"   🧠 Memory Features: {10} additional features from trade history")
    print(f"   🎯 Expected: More balanced BUY/SELL/HOLD distribution")

    # Train the model
    success = trainer.train_model(
        training_files=training_files,
        level_timeframes=['M', 'W', 'D', '1h']
    )

    if success:
        # Get model info
        info = trainer.get_model_info()
        print(f"\n🎉 TRAINING SUCCESS!")
        print(f"   Model Type: {info['model_type']}")
        print(f"   Accuracy: {info['accuracy']:.1%}")
        print(f"   Features: {info['features']} (includes {10} memory features)")
        print(f"   Classes: {info['classes']}")
    else:
        print("❌ Training failed!")
        trainer = None

🎓 TRAINING MODEL WITH MEMORY & BOUNCE FEATURES
📁 Data folder: d:\Dev\trading-bot\data

📋 Checking training files:
   15m: ✅ d:\Dev\trading-bot\data\BTCUSDT-15m.json
   1h: ✅ d:\Dev\trading-bot\data\BTCUSDT-1h.json
   M: ✅ d:\Dev\trading-bot\data\BTCUSDT-M.json
   W: ✅ d:\Dev\trading-bot\data\BTCUSDT-W.json
   D: ✅ d:\Dev\trading-bot\data\BTCUSDT-D.json

✅ All training files found!

🧠 MEMORY & BOUNCE CONFIGURATION:
   📚 Trade Memory: True (tracks win/loss patterns)
   🎯 Bounce Detection: True (identifies support/resistance bounces)
   💾 Memory Size: 1000 trades max
   📊 Current Memory: 0 trades loaded

📈 CURRENT MEMORY STATS:
   Win Rate: 50.0% (last 7 days)
   Avg PnL: 0.00%
   Bounce Win Rate: 60.0%
   Consecutive: 0 neutral

📊 Training Configuration:
   Profit threshold: 3.0% (BALANCED - was 10%)
   Loss threshold: -2.0% (BALANCED - was -1%)
   Lookforward periods: [5, 10, 20]
   🚀 GPU Training: Enabled with 8 threads (optimized for your 8-core system)
   ⚡ Parallel Processing: Data 
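
The notebook does not show how `SimpleModelTrainer` turns `profit_threshold`, `loss_threshold`, and `lookforward_periods` into class labels. As an illustration only, the sketch below assumes one plausible rule: label a candle `buy` if the close rises by at least the profit threshold within a fixed number of candles, `sell` if it falls by at least the loss threshold, and `hold` otherwise. `label_candles` is a hypothetical function, and the real trainer may combine several lookforward periods differently.

```python
import pandas as pd


def label_candles(close: pd.Series,
                  profit_threshold: float = 3.0,
                  loss_threshold: float = -2.0,
                  lookforward: int = 10) -> pd.Series:
    """Label each candle from its forward return in percent (assumed rule)."""
    # Percent change between the current close and the close `lookforward` candles later
    future_return = (close.shift(-lookforward) / close - 1.0) * 100.0

    labels = pd.Series("hold", index=close.index)
    labels[future_return >= profit_threshold] = "buy"
    labels[future_return <= loss_threshold] = "sell"
    # The last `lookforward` candles have no future close (NaN) and stay "hold"
    return labels


close = pd.Series([100.0, 101.0, 104.0, 103.0, 100.0, 99.0])
labels = label_candles(close, lookforward=2)
print(labels.value_counts())
```

Under a rule like this, a 3% move within 5 to 20 fifteen-minute candles is uncommon, so most rows would be labeled `hold`. Checking the label counts before training is a cheap way to see how imbalanced the classes are.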

In [3]:
# ================================================
# 🔴 LIVE DATA SIMULATION - Process Candles One-by-One
# ================================================

print("🔴 LIVE DATA SIMULATION MODE")
print("=" * 55)

import pandas as pd

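from src.prediction.predictor import SimpleModelPredictor
from src.prediction.reporter import SimpleModelReporter
from src.training.model_trainer import SimpleModelTrainer  # imported here so this cell runs without Cell 2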

# Load model from file (independent from training cell)
print("📁 Loading model from file...")
simulation_trainer = SimpleModelTrainer()

try:
    success = simulation_trainer.load_model()
    if not success:
        print("❌ Failed to load model from file")
        print("   Make sure you've trained the model first (run Cell 2)")
        simulation_trainer = None
except Exception as e:
    print(f"❌ Error loading model: {e}")
    import traceback
    traceback.print_exc()
    simulation_trainer = None

if simulation_trainer is not None and simulation_trainer.is_trained:
    print("✅ Model loaded successfully!")
    
    # Display model info
    info = simulation_trainer.get_model_info()
    print(f"   Model: {info['model_type']} | Accuracy: {info['accuracy']:.1%} | Features: {info['features']}")
    
    # Configuration
    test_symbol = 'BTCUSDT'
    MAX_CANDLES = 250  # Number of candles to simulate
    test_data_folder = 'data_test'
    
    # Threshold settings
    buy_threshold = 0.10
    sell_threshold = 0.10
    
    print(f"\n🎯 Symbol: {test_symbol}")
    print(f"📊 Simulating: {MAX_CANDLES} candles (live mode)")
    print(f"🔄 Processing: One candle at a time with cumulative history")
    print(f"⚙️  Thresholds: BUY≥{buy_threshold:.0%}, SELL≥{sell_threshold:.0%}\n")
    
    # Create predictor instance
    predictor = SimpleModelPredictor(simulation_trainer)
    
    # Get symbol files using the predictor's method
    test_files = predictor._get_symbol_files(test_symbol, test_data_folder)
    
    if not test_files or '15m' not in test_files:
        print(f"❌ No 15m data file found for {test_symbol}")
    else:
        # Load 15m data using predictor's method
        test_data = predictor._load_json_data(test_files['15m'])
        
        if test_data is None:
            print("❌ Could not load test data")
        else:
            print(f"✅ Loaded {len(test_data)} total candles")
            print(f"📅 Range: {test_data['datetime'].min()} to {test_data['datetime'].max()}")
            
            # Get the last MAX_CANDLES for simulation
            if len(test_data) < MAX_CANDLES:
                print(f"⚠️  Only {len(test_data)} candles available, using all")
                simulation_candles = test_data
            else:
                simulation_candles = test_data.tail(MAX_CANDLES).reset_index(drop=True)
            
            print(f"\n🎬 Starting live simulation with {len(simulation_candles)} candles...")
            print(f"📍 Simulation period: {simulation_candles['datetime'].min()} to {simulation_candles['datetime'].max()}")
            
            # Load levels once (using predictor's trader)
            level_files = {tf: path for tf, path in test_files.items() if tf in ['M', 'W', 'D', '1h']}
            
            if level_files:
                print(f"📊 Loading levels from: {list(level_files.keys())}")
                success = predictor.trader.update_levels(level_files, force_update=True)
                if success:
                    total_levels = sum(len(levels) for levels in predictor.trader.current_levels.values())
                    print(f"✅ Loaded {total_levels} support/resistance levels\n")
                else:
                    print("⚠️  Failed to load levels\n")
            
            print("=" * 55)
            print("🔴 LIVE SIMULATION STARTING...")
            print("=" * 55)
            
            # Get model components
            model_data = simulation_trainer.model_data
            trained_model = model_data['model']
            label_encoder = model_data['label_encoder']
            feature_columns = model_data['feature_columns']
            
            # Initialize results storage
            live_predictions = []
            
            # Process each candle one by one
            for i in range(len(simulation_candles)):
                row = simulation_candles.iloc[i]
                
                # Cumulative history: all candles up to and including current
                historical_data = simulation_candles.iloc[:i+1]
                
                try:
                    current_price = float(row['close'])
                    current_volume = float(row['volume'])
                    
                    # Create features using predictor's trader feature engineer
                    features = predictor.trader.feature_engineer.create_level_features(
                        current_price, current_volume, predictor.trader.current_levels
                    )
                    
                    # Add memory features if enabled
                    if simulation_trainer.enable_memory_features:
                        recent_perf = simulation_trainer.trade_memory.get_recent_performance()
                        bounce_perf = simulation_trainer.trade_memory.get_bounce_performance()
                        consecutive = simulation_trainer.trade_memory.get_consecutive_performance()
                        
                        features.update({
                            'memory_win_rate': recent_perf['win_rate'],
                            'memory_avg_pnl': recent_perf['avg_pnl'],
                            'memory_total_trades': recent_perf['total_trades'],
                            'bounce_win_rate': bounce_perf['bounce_win_rate'],
                            'bounce_avg_pnl': bounce_perf['bounce_avg_pnl'],
                            'bounce_trade_count': bounce_perf['bounce_trades'],
                            'consecutive_wins': max(0, consecutive),
                            'consecutive_losses': max(0, -consecutive),
                            'market_volatility_regime': 0.5,
                            'trend_strength': 0.0,
                        })
                    
                    # Convert to DataFrame and align with training features
                    feature_df = pd.DataFrame([features])
                    
                    # Ensure exact feature match with training
                    for col in feature_columns:
                        if col not in feature_df.columns:
                            feature_df[col] = 0.0
                    feature_df = feature_df[feature_columns]
                    
                    # Make prediction
                    probabilities = trained_model.predict_proba(feature_df)[0]
                    classes = label_encoder.classes_
                    prob_dict = {classes[j]: probabilities[j] for j in range(len(classes))}
                    
                    buy_prob = prob_dict.get('buy', 0)
                    sell_prob = prob_dict.get('sell', 0)
                    hold_prob = prob_dict.get('hold', 0)
                    
                    # Apply thresholds
                    if buy_prob > buy_threshold and buy_prob > sell_prob:
                        final_prediction = 'BUY'
                        final_confidence = buy_prob
                    elif sell_prob > sell_threshold and sell_prob > buy_prob:
                        final_prediction = 'SELL'
                        final_confidence = sell_prob
                    else:
                        final_prediction = 'HOLD'
                        final_confidence = hold_prob
                    
                    # Store result
                    live_predictions.append({
                        'candle_index': i,
                        'datetime': row['datetime'],
                        'open': row['open'],
                        'high': row['high'],
                        'low': row['low'],
                        'close': current_price,
                        'volume': current_volume,
                        'prediction': final_prediction,
                        'confidence': final_confidence,
                        'buy_prob': buy_prob,
                        'sell_prob': sell_prob,
                        'hold_prob': hold_prob,
                        'historical_candles': len(historical_data)
                    })
                    
                    # Print live update
                    print(f"📊 Processed {i+1}/{len(simulation_candles)} candles | " +
                              f"Latest: {final_prediction} @ ${current_price:.2f} ({final_confidence:.1%})")
                
                except Exception as e:
                    print(f"❌ Error processing candle {i}: {e}")
                    import traceback
                    traceback.print_exc()
                    continue
            
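            # Convert results to DataFrame; explicit columns keep the summary below
            # from raising KeyError if every candle failed and the list is empty
            results_df = pd.DataFrame(live_predictions, columns=[
                'candle_index', 'datetime', 'open', 'high', 'low', 'close', 'volume',
                'prediction', 'confidence', 'buy_prob', 'sell_prob', 'hold_prob',
                'historical_candles',
            ])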
            
            print("\n" + "=" * 55)
            print("✅ LIVE SIMULATION COMPLETE")
            print("=" * 55)
            
            # Print summary statistics
            print(f"\n📊 SIMULATION RESULTS:")
            print(f"   Total Candles Processed: {len(results_df)}")
            
            signal_counts = results_df['prediction'].value_counts()
            print(f"\n🎯 SIGNAL DISTRIBUTION:")
            for signal in ['BUY', 'SELL', 'HOLD']:
                count = signal_counts.get(signal, 0)
                pct = (count / len(results_df) * 100) if len(results_df) > 0 else 0
                print(f"   {signal}: {count} ({pct:.1f}%)")
            
            # High confidence signals
            high_conf_buy = results_df[(results_df['prediction'] == 'BUY') & (results_df['confidence'] >= 0.6)]
            high_conf_sell = results_df[(results_df['prediction'] == 'SELL') & (results_df['confidence'] >= 0.6)]
            
            print(f"\n⭐ HIGH CONFIDENCE SIGNALS (≥60%):")
            print(f"   BUY: {len(high_conf_buy)}")
            print(f"   SELL: {len(high_conf_sell)}")
            
            # Show all trading signals (non-HOLD)
            trade_signals = results_df[results_df['prediction'] != 'HOLD']
            print(f"\n📋 ALL TRADING SIGNALS ({len(trade_signals)} total):")
            if len(trade_signals) > 0:
                for _, row in trade_signals.iterrows():
                    emoji = "🟢" if row['prediction'] == 'BUY' else "🔴"
                    print(f"   {emoji} {row['datetime'].strftime('%Y-%m-%d %H:%M')} | " +
                          f"{row['prediction']:4s} @ ${row['close']:8.2f} | " +
                          f"Conf: {row['confidence']:5.1%} | " +
                          f"[B:{row['buy_prob']:.1%} S:{row['sell_prob']:.1%} H:{row['hold_prob']:.1%}]")
            else:
                print("   No trading signals generated")
            
            # Average confidence by signal type
            print(f"\n📈 AVERAGE CONFIDENCE BY SIGNAL:")
            for signal in ['BUY', 'SELL', 'HOLD']:
                signal_data = results_df[results_df['prediction'] == signal]
                if len(signal_data) > 0:
                    avg_conf = signal_data['confidence'].mean()
                    print(f"   {signal}: {avg_conf:.1%}")
            
            print("\n✅ Simulation complete! Candles were processed one at a time in chronological order.")
            
            # ================================================
            # 📊 VISUALIZE RESULTS - Using SimpleModelReporter
            # ================================================

            if len(results_df) > 0:
                
                print("📊 GENERATING COMPREHENSIVE REPORT WITH PLOTLY CHART...")
                print("=" * 55)
                
                # Prepare data in format expected by SimpleModelReporter
                # The reporter expects columns: datetime, open, high, low, close, volume, action, confidence, buy_prob, sell_prob, hold_prob, reasoning
                
                # Convert prediction column from 'BUY'/'SELL'/'HOLD' to 'buy'/'sell'/'hold' (lowercase)
                report_df = results_df.copy()
                report_df['action'] = report_df['prediction'].str.lower()
                
                # Add reasoning column (SimpleModelReporter expects this)
                report_df['reasoning'] = report_df.apply(
                    lambda row: f"Live Simulation: {row['prediction']} ({row['confidence']:.1%}) " +
                                f"[B:{row['buy_prob']:.1%} S:{row['sell_prob']:.1%} H:{row['hold_prob']:.1%}]",
                    axis=1
                )
                
                # Create reporter instance
                reporter = SimpleModelReporter(simulation_trainer)
                
                # Generate full report with interactive Plotly chart and detailed analysis
                # This uses all the existing logic from simple_model_trainer.py
                reporter.generate_full_report(
                    signals_df=report_df,
                    symbol=test_symbol,
                    buy_threshold=buy_threshold,
                    sell_threshold=sell_threshold,
                    aggressive_threshold=buy_threshold  # Using same threshold
                )
    
else:
    print("❌ Model not loaded, so the simulation was skipped")
    print("   Please run Cell 2 first to train and save the model")




🔴 LIVE DATA SIMULATION MODE
📁 Loading model from file...
📁 Loading model from: src/models/simple_trading_model.joblib
✅ Model loaded: XGBoost-GPU (Accuracy: 98.2%)
✅ Model loaded successfully!
   Model: XGBoost-GPU | Accuracy: 98.2% | Features: 60

🎯 Symbol: BTCUSDT
📊 Simulating: 250 candles (live mode)
🔄 Processing: One candle at a time with cumulative history
⚙️  Thresholds: BUY≥10%, SELL≥10%

✅ Loaded 87891 total candles
📅 Range: 2023-04-03 13:45:00 to 2025-10-05 02:15:00

🎬 Starting live simulation with 250 candles...
📍 Simulation period: 2025-10-02 12:00:00 to 2025-10-05 02:15:00
📊 Loading levels from: ['1h', 'D', 'W', 'M']
🔄 Updating multi-timeframe levels...
Volume Profile: Generated 867 volume profile ranges
Volume Profile: Generated 867 volume profile ranges
✅ Extracted 953 levels from D timeframe
  Period 2020-03: SKIPPED - insufficient data
Volume Profile: Generated 198 volume profile ranges
✅ Extracted 238 levels from W timeframe
Volume Profile: Generated 18 volume profile 


🔍 DETAILED ANALYSIS:
🤖 Model: XGBoost-GPU (Accuracy: 98.2%)
📊 Features: 60 | Classes: ['buy', 'hold', 'sell']
🎯 Symbol: BTCUSDT | Predictions: 250
⚙️  Thresholds: BUY≥10%, SELL≥10%, Aggressive≥10%

🔧 Feature Consistency Check:
   Training features: 60
   Key features: ['distance_to_support', 'distance_to_resistance', 'support_strength', 'resistance_strength', 'levels_within_0.1pct']...

📊 Signal Distribution:
   BUY: 0 (0.0%)
   SELL: 0 (0.0%)
   HOLD: 250 (100.0%)

🚀 Trade Opportunities: 0 signals (0.0% of time)

💪 High Confidence Predictions (>60%): 250
   2025-10-02 12:00: HOLD @ $118805.8000 (100.0%)
   2025-10-02 12:15: HOLD @ $119033.0000 (99.9%)
   2025-10-02 12:30: HOLD @ $119157.0000 (99.6%)
   2025-10-02 12:45: HOLD @ $119374.0000 (99.6%)
   2025-10-02 13:00: HOLD @ $119066.8000 (99.9%)
   ... and 245 more

🎯 Average Probabilities:
   BUY: 0.1% | SELL: 0.1% | HOLD: 99.8%

📈 Confidence Metrics:
   Average: 99.8% | Max: 100.0% | Min: 99.0%

✅ TESTING COMPLETE!
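
The all-HOLD result follows directly from the threshold rule in the simulation cell. The sketch below isolates that rule as a standalone function so it can be checked by hand; `decide` is a hypothetical name, but the comparisons mirror the cell's `if`/`elif`/`else` exactly, including the strict `>` (the banner prints `≥`).

```python
def decide(buy_prob: float, sell_prob: float, hold_prob: float,
           buy_threshold: float = 0.10, sell_threshold: float = 0.10):
    """Return (signal, confidence) using the same rule as the simulation cell."""
    if buy_prob > buy_threshold and buy_prob > sell_prob:
        return "BUY", buy_prob
    if sell_prob > sell_threshold and sell_prob > buy_prob:
        return "SELL", sell_prob
    # Neither side clears its threshold, or buy and sell are tied
    return "HOLD", hold_prob


# Average probabilities from the report above: BUY 0.1%, SELL 0.1%, HOLD 99.8%
print(decide(0.001, 0.001, 0.998))  # ('HOLD', 0.998)
print(decide(0.15, 0.05, 0.80))     # ('BUY', 0.15)
print(decide(0.20, 0.20, 0.60))     # ('HOLD', 0.6)
```

With average BUY and SELL probabilities near 0.1%, no candle comes close to the 10% threshold, so lowering the thresholds further is unlikely to produce signals. The second call also shows that the rule can emit BUY while HOLD is still the most probable class, which is worth keeping in mind when reading the "confidence" column.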
