# üöÄ FOREX SIGNAL GENERATOR V12

## BUY-Only Signal Generation with Enhanced Ensemble

**V12 Features:**
- **BUY-Only Strategy**: –ó”©–≤—Ö”©–Ω BUY —Å–∏–≥–Ω–∞–ª (SELL —Å–∏–≥–Ω–∞–ª –æ—Ä—É—É–ª–∞—Ö–≥“Ø–π)
- **12 Model Ensemble**: XGBoost, LightGBM, CatBoost, RF, ExtraTrees, HistGB
- **Enhanced Features**: V11 –¥—ç—ç—Ä —Å—É—É—Ä–∏–ª—Å–∞–Ω + —à–∏–Ω—ç features
- **GPU/CPU Flexible**: –ê–≤—Ç–æ–º–∞—Ç–∞–∞—Ä —Ç–æ—Ö–∏—Ä—É—É–ª–Ω–∞
- **High Confidence Focus**: 85%+ –∏—Ç–≥—ç–ª—Ü“Ø“Ø—Ä—Ç—ç–π —Å–∏–≥–Ω–∞–ª

In [3]:
import pandas as pd
import numpy as np
from pathlib import Path
import warnings
import joblib
import time
warnings.filterwarnings('ignore')

# ML Libraries
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
from sklearn.model_selection import StratifiedKFold
from sklearn.ensemble import (
    RandomForestClassifier, 
    ExtraTreesClassifier, 
    GradientBoostingClassifier,
    HistGradientBoostingClassifier,
    AdaBoostClassifier
)
import xgboost as xgb
import lightgbm as lgb
from catboost import CatBoostClassifier

# Paths
BASE_DIR = Path.cwd().parent if (Path.cwd() / 'data').exists() == False else Path.cwd()
if (BASE_DIR / 'data').exists() == False:
    BASE_DIR = Path.cwd().parent
DATA_DIR = BASE_DIR / 'data'
MODEL_DIR = BASE_DIR / 'models' / 'signal_generator_v12'
MODEL_DIR.mkdir(parents=True, exist_ok=True)

# GPU Check
try:
    import torch
    GPU_AVAILABLE = torch.cuda.is_available()
except:
    GPU_AVAILABLE = False

print("="*70)
print("üöÄ FOREX SIGNAL GENERATOR V12")
print("   BUY-Only Strategy with 12 Model Ensemble")
print("="*70)
print(f"‚úì GPU Available: {GPU_AVAILABLE}")
print(f"‚úì Data Directory: {DATA_DIR}")
print(f"‚úì Model Directory: {MODEL_DIR}")

üöÄ FOREX SIGNAL GENERATOR V12
   BUY-Only Strategy with 12 Model Ensemble
‚úì GPU Available: True
‚úì Data Directory: c:\Users\Acer\Desktop\Forex-Signal-App\data
‚úì Model Directory: c:\Users\Acer\Desktop\Forex-Signal-App\models\signal_generator_v12


## 1. Data Loading

In [4]:
# Load Data
train_df = pd.read_csv(DATA_DIR / 'EUR_USD_1min.csv')
test_df = pd.read_csv(DATA_DIR / 'EUR_USD_test.csv')

for df in [train_df, test_df]:
    if 'timestamp' in df.columns:
        df.rename(columns={'timestamp': 'time'}, inplace=True)
    df['time'] = pd.to_datetime(df['time'])

print(f"Train: {len(train_df):,} rows")
print(f"Test: {len(test_df):,} rows")
print(f"Train period: {train_df['time'].min()} to {train_df['time'].max()}")
print(f"Test period: {test_df['time'].min()} to {test_df['time'].max()}")

Train: 1,859,492 rows
Test: 296,778 rows
Train period: 2019-12-31 16:00:00+00:00 to 2024-12-30 16:00:00+00:00
Test period: 2024-12-31 16:00:00+00:00 to 2025-10-17 06:11:00+00:00


## 2. V12 Enhanced Feature Engineering

In [5]:
def add_features_v12(df):
    """
    V12 Features: Enhanced for BUY-only signal
    """
    df = df.copy()
    
    # ==================== TIME FEATURES ====================
    df['hour'] = df['time'].dt.hour
    df['day_of_week'] = df['time'].dt.dayofweek
    df['is_london'] = ((df['hour'] >= 8) & (df['hour'] < 16)).astype(int)
    df['is_ny'] = ((df['hour'] >= 13) & (df['hour'] < 21)).astype(int)
    df['is_overlap'] = ((df['hour'] >= 13) & (df['hour'] < 16)).astype(int)
    df['session_quality'] = df['is_london'] + df['is_ny'] + df['is_overlap'] * 2
    
    # ==================== MOVING AVERAGES ====================
    for p in [5, 10, 20, 50, 100, 200]:
        df[f'sma_{p}'] = df['close'].rolling(p).mean()
        df[f'ema_{p}'] = df['close'].ewm(span=p, adjust=False).mean()
    
    # MA Crosses
    df['ema_5_10_cross'] = (df['ema_5'] - df['ema_10']) / df['ema_10'] * 100
    df['ema_10_20_cross'] = (df['ema_10'] - df['ema_20']) / df['ema_20'] * 100
    df['ema_20_50_cross'] = (df['ema_20'] - df['ema_50']) / df['ema_50'] * 100
    df['sma_50_200_cross'] = (df['sma_50'] - df['sma_200']) / df['sma_200'] * 100
    
    # ==================== RSI ====================
    delta = df['close'].diff()
    gain = delta.where(delta > 0, 0).rolling(14).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(14).mean()
    rs = gain / (loss + 1e-10)
    df['rsi'] = 100 - (100 / (1 + rs))
    df['rsi_ma'] = df['rsi'].rolling(14).mean()
    df['rsi_above_ma'] = (df['rsi'] > df['rsi_ma']).astype(int)
    
    # ==================== MACD ====================
    ema12 = df['close'].ewm(span=12).mean()
    ema26 = df['close'].ewm(span=26).mean()
    df['macd'] = ema12 - ema26
    df['macd_signal'] = df['macd'].ewm(span=9).mean()
    df['macd_hist'] = df['macd'] - df['macd_signal']
    df['macd_bullish'] = (df['macd_hist'] > 0).astype(int)
    
    # ==================== BOLLINGER BANDS ====================
    df['bb_mid'] = df['close'].rolling(20).mean()
    df['bb_std'] = df['close'].rolling(20).std()
    df['bb_upper'] = df['bb_mid'] + 2 * df['bb_std']
    df['bb_lower'] = df['bb_mid'] - 2 * df['bb_std']
    df['bb_width'] = (df['bb_upper'] - df['bb_lower']) / (df['bb_mid'] + 1e-10)
    df['bb_position'] = (df['close'] - df['bb_lower']) / (df['bb_upper'] - df['bb_lower'] + 1e-10)
    
    # ==================== ATR & ADX ====================
    df['tr'] = np.maximum(
        df['high'] - df['low'],
        np.maximum(
            abs(df['high'] - df['close'].shift()),
            abs(df['low'] - df['close'].shift())
        )
    )
    period = 14
    df['atr'] = df['tr'].rolling(period).mean()
    
    up_move = df['high'] - df['high'].shift()
    down_move = df['low'].shift() - df['low']
    plus_dm = np.where((up_move > down_move) & (up_move > 0), up_move, 0)
    minus_dm = np.where((down_move > up_move) & (down_move > 0), down_move, 0)
    
    df['plus_di'] = 100 * pd.Series(plus_dm).rolling(period).mean() / (df['atr'] + 1e-10)
    df['minus_di'] = 100 * pd.Series(minus_dm).rolling(period).mean() / (df['atr'] + 1e-10)
    df['di_diff'] = df['plus_di'] - df['minus_di']
    dx = 100 * abs(df['plus_di'] - df['minus_di']) / (df['plus_di'] + df['minus_di'] + 1e-10)
    df['adx'] = dx.rolling(period).mean()
    df['di_bullish'] = (df['di_diff'] > 0).astype(int)
    df['adx_strong'] = (df['adx'] > 25).astype(int)
    
    # ==================== CCI ====================
    tp = (df['high'] + df['low'] + df['close']) / 3
    sma_tp = tp.rolling(20).mean()
    mad_tp = tp.rolling(20).apply(lambda x: np.abs(x - x.mean()).mean())
    df['cci'] = (tp - sma_tp) / (0.015 * mad_tp + 1e-10)
    df['cci_bullish'] = (df['cci'] > 0).astype(int)
    
    # ==================== WILLIAMS %R & STOCHASTIC ====================
    hh = df['high'].rolling(14).max()
    ll = df['low'].rolling(14).min()
    df['williams_r'] = -100 * (hh - df['close']) / (hh - ll + 1e-10)
    df['stoch_k'] = 100 * (df['close'] - ll) / (hh - ll + 1e-10)
    df['stoch_d'] = df['stoch_k'].rolling(3).mean()
    df['stoch_bullish'] = (df['stoch_k'] > df['stoch_d']).astype(int)
    
    # ==================== VOLATILITY ====================
    df['returns'] = df['close'].pct_change()
    df['volatility'] = df['returns'].rolling(20).std() * 100
    df['volatility_sma'] = df['volatility'].rolling(50).mean()
    df['volatility_ratio'] = df['volatility'] / (df['volatility_sma'] + 1e-10)
    
    # ==================== PRICE ACTION ====================
    df['body'] = df['close'] - df['open']
    df['is_bullish'] = (df['close'] > df['open']).astype(int)
    df['bullish_streak'] = df['is_bullish'].rolling(5).sum()
    
    # ==================== SUPPORT/RESISTANCE ====================
    df['high_20'] = df['high'].rolling(20).max()
    df['low_20'] = df['low'].rolling(20).min()
    df['dist_to_high'] = (df['high_20'] - df['close']) / (df['atr'] + 1e-10)
    df['dist_to_low'] = (df['close'] - df['low_20']) / (df['atr'] + 1e-10)
    
    # ==================== MOMENTUM ====================
    for p in [5, 10, 20, 50]:
        df[f'momentum_{p}'] = (df['close'] - df['close'].shift(p)) / (df['atr'] + 1e-10)
    
    # ==================== COMPOSITE SCORES ====================
    df['trend_score'] = (
        (df['close'] > df['sma_20']).astype(int) +
        (df['sma_20'] > df['sma_50']).astype(int) +
        (df['sma_50'] > df['sma_200']).astype(int) +
        df['di_bullish']
    )
    
    df['momentum_score'] = (
        (df['rsi'] > 50).astype(int) +
        df['macd_bullish'] +
        df['cci_bullish'] +
        df['stoch_bullish']
    )
    
    df['buy_setup_score'] = (
        df['di_bullish'] + df['adx_strong'] + df['macd_bullish'] +
        df['rsi_above_ma'] + df['stoch_bullish'] + df['cci_bullish'] +
        (df['trend_score'] >= 3).astype(int) + (df['momentum_score'] >= 3).astype(int)
    )
    
    # Cleanup
    df.drop(columns=['tr'], inplace=True, errors='ignore')
    
    return df

print("Adding V12 features...")
train_df = add_features_v12(train_df)
test_df = add_features_v12(test_df)
print(f"‚úì Features added. Total columns: {len(train_df.columns)}")

Adding V12 features...
‚úì Features added. Total columns: 72


## 3. V12 BUY-Only Labeling

In [6]:
def create_labels_v12(df, forward_periods=60, min_pips=15, ratio=1.5):
    """
    V12 BUY-Only Labeling:
    - BUY (1): Strong upward movement
    - NO_BUY (0): Everything else
    """
    df = df.copy()
    min_move = min_pips * 0.0001
    
    df['future_max'] = df['high'].rolling(forward_periods).max().shift(-forward_periods)
    df['future_min'] = df['low'].rolling(forward_periods).min().shift(-forward_periods)
    
    df['up_move'] = df['future_max'] - df['close']
    df['down_move'] = df['close'] - df['future_min']
    
    # BUY condition: Up move >= min_pips AND Up > Down * ratio
    df['signal'] = ((df['up_move'] >= min_move) & (df['up_move'] > df['down_move'] * ratio)).astype(int)
    
    df.drop(['future_max', 'future_min', 'up_move', 'down_move'], axis=1, inplace=True)
    return df

# Create labels
train_df = create_labels_v12(train_df, forward_periods=60, min_pips=15, ratio=1.5)
test_df = create_labels_v12(test_df, forward_periods=60, min_pips=15, ratio=1.5)

# Remove NaN rows
train_df = train_df.dropna().copy()
test_df = test_df.dropna().copy()

# Class distribution
print("\nüìä Class Distribution:")
print(f"\nTrain: {len(train_df):,} rows")
print(f"  BUY: {train_df['signal'].sum():,} ({train_df['signal'].mean()*100:.1f}%)")
print(f"  NO_BUY: {(train_df['signal']==0).sum():,} ({(train_df['signal']==0).mean()*100:.1f}%)")

print(f"\nTest: {len(test_df):,} rows")
print(f"  BUY: {test_df['signal'].sum():,} ({test_df['signal'].mean()*100:.1f}%)")
print(f"  NO_BUY: {(test_df['signal']==0).sum():,} ({(test_df['signal']==0).mean()*100:.1f}%)")


üìä Class Distribution:

Train: 1,859,293 rows
  BUY: 193,796 (10.4%)
  NO_BUY: 1,665,497 (89.6%)

Test: 296,579 rows
  BUY: 41,895 (14.1%)
  NO_BUY: 254,684 (85.9%)


## 5. Feature Selection & Scaling

In [8]:
# Define feature columns (exclude non-numeric columns)
exclude_cols = ['datetime', 'signal', 'time']  # time is Timestamp, not numeric
feature_cols = [col for col in train_df.columns if col not in exclude_cols]

# Also exclude any object/datetime columns
numeric_cols = train_df[feature_cols].select_dtypes(include=[np.number]).columns.tolist()
feature_cols = numeric_cols

print(f"üìä Feature columns: {len(feature_cols)}")
print(f"Features: {feature_cols[:10]}...")

# Prepare data
X_train = train_df[feature_cols].values
y_train = train_df['signal'].values
X_test = test_df[feature_cols].values
y_test = test_df['signal'].values

# Scale features
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"\n‚úì X_train: {X_train_scaled.shape}")
print(f"‚úì X_test: {X_test_scaled.shape}")
print(f"‚úì y_train: BUY={y_train.sum():,} ({y_train.mean()*100:.1f}%)")
print(f"‚úì y_test: BUY={y_test.sum():,} ({y_test.mean()*100:.1f}%)")

üìä Feature columns: 71
Features: ['open', 'high', 'low', 'close', 'volume', 'hour', 'day_of_week', 'is_london', 'is_ny', 'is_overlap']...

‚úì X_train: (1859293, 71)
‚úì X_test: (296579, 71)
‚úì y_train: BUY=193,796 (10.4%)
‚úì y_test: BUY=41,895 (14.1%)

‚úì X_train: (1859293, 71)
‚úì X_test: (296579, 71)
‚úì y_train: BUY=193,796 (10.4%)
‚úì y_test: BUY=41,895 (14.1%)


## 6. Train 12 Model Ensemble

In [9]:
# Calculate class weights
pos_weight = (len(y_train) - y_train.sum()) / y_train.sum()
print(f"üìä Class weight (NO_BUY:BUY ratio): 1:{pos_weight:.2f}")

# Define 12 models
models = {}

# XGBoost variants (3)
print("\nüîß Creating XGBoost models...")
models['xgb1'] = xgb.XGBClassifier(
    n_estimators=500, max_depth=8, learning_rate=0.05,
    scale_pos_weight=pos_weight, subsample=0.8, colsample_bytree=0.8,
    tree_method='hist', device='cuda', random_state=42, n_jobs=-1
)
models['xgb2'] = xgb.XGBClassifier(
    n_estimators=400, max_depth=10, learning_rate=0.03,
    scale_pos_weight=pos_weight, subsample=0.7, colsample_bytree=0.7,
    tree_method='hist', device='cuda', random_state=43, n_jobs=-1
)
models['xgb3'] = xgb.XGBClassifier(
    n_estimators=600, max_depth=6, learning_rate=0.08,
    scale_pos_weight=pos_weight, subsample=0.9, colsample_bytree=0.9,
    tree_method='hist', device='cuda', random_state=44, n_jobs=-1
)

# LightGBM variants (3)
print("üîß Creating LightGBM models...")
models['lgb1'] = lgb.LGBMClassifier(
    n_estimators=500, max_depth=8, learning_rate=0.05,
    scale_pos_weight=pos_weight, subsample=0.8, colsample_bytree=0.8,
    device='gpu', random_state=42, verbose=-1, n_jobs=-1
)
models['lgb2'] = lgb.LGBMClassifier(
    n_estimators=400, max_depth=10, learning_rate=0.03,
    scale_pos_weight=pos_weight, subsample=0.7, colsample_bytree=0.7,
    device='gpu', random_state=43, verbose=-1, n_jobs=-1
)
models['lgb3'] = lgb.LGBMClassifier(
    n_estimators=600, max_depth=6, learning_rate=0.08,
    scale_pos_weight=pos_weight, subsample=0.9, colsample_bytree=0.9,
    device='gpu', random_state=44, verbose=-1, n_jobs=-1
)

# CatBoost variants (3) - Fixed: use bootstrap_type='Bernoulli' for subsample
print("üîß Creating CatBoost models...")
models['cat1'] = CatBoostClassifier(
    iterations=500, depth=8, learning_rate=0.05,
    auto_class_weights='Balanced', bootstrap_type='Bernoulli', subsample=0.8,
    task_type='GPU', random_state=42, verbose=0
)
models['cat2'] = CatBoostClassifier(
    iterations=400, depth=10, learning_rate=0.03,
    auto_class_weights='Balanced', bootstrap_type='Bernoulli', subsample=0.7,
    task_type='GPU', random_state=43, verbose=0
)
models['cat3'] = CatBoostClassifier(
    iterations=600, depth=6, learning_rate=0.08,
    auto_class_weights='Balanced', bootstrap_type='Bernoulli', subsample=0.9,
    task_type='GPU', random_state=44, verbose=0
)

# Traditional models (3)
print("üîß Creating traditional models...")
models['rf'] = RandomForestClassifier(
    n_estimators=300, max_depth=15, class_weight='balanced',
    random_state=42, n_jobs=-1
)
models['et'] = ExtraTreesClassifier(
    n_estimators=300, max_depth=15, class_weight='balanced',
    random_state=42, n_jobs=-1
)
models['hgb'] = HistGradientBoostingClassifier(
    max_iter=300, max_depth=10, learning_rate=0.05,
    random_state=42
)

print(f"\n‚úì Total models: {len(models)}")
print(f"  Models: {list(models.keys())}")

üìä Class weight (NO_BUY:BUY ratio): 1:8.59

üîß Creating XGBoost models...
üîß Creating LightGBM models...
üîß Creating CatBoost models...
üîß Creating traditional models...

‚úì Total models: 12
  Models: ['xgb1', 'xgb2', 'xgb3', 'lgb1', 'lgb2', 'lgb3', 'cat1', 'cat2', 'cat3', 'rf', 'et', 'hgb']


In [10]:
# Train all 12 models
print("=" * 60)
print("üöÄ TRAINING 12 MODELS")
print("=" * 60)

trained_models = {}
model_results = {}

for name, model in models.items():
    print(f"\nüìä Training {name}...")
    start_time = time.time()
    
    try:
        model.fit(X_train_scaled, y_train)
        train_time = time.time() - start_time
        
        # Predict probabilities
        y_pred_proba = model.predict_proba(X_test_scaled)[:, 1]
        y_pred = (y_pred_proba >= 0.5).astype(int)
        
        # Calculate metrics
        acc = accuracy_score(y_test, y_pred)
        
        # Only evaluate BUY predictions
        buy_mask = y_pred == 1
        if buy_mask.sum() > 0:
            buy_acc = accuracy_score(y_test[buy_mask], y_pred[buy_mask])
            buy_correct = (y_test[buy_mask] == 1).sum()
            buy_total = buy_mask.sum()
        else:
            buy_acc = 0
            buy_correct = 0
            buy_total = 0
        
        trained_models[name] = model
        model_results[name] = {
            'accuracy': acc,
            'buy_accuracy': buy_acc,
            'buy_correct': buy_correct,
            'buy_total': buy_total,
            'train_time': train_time,
            'proba': y_pred_proba
        }
        
        print(f"   ‚úì Accuracy: {acc*100:.2f}%")
        print(f"   ‚úì BUY signals: {buy_total:,} (correct: {buy_correct:,}, {buy_correct/max(1,buy_total)*100:.1f}%)")
        print(f"   ‚úì Time: {train_time:.1f}s")
        
    except Exception as e:
        print(f"   ‚ùå Error: {str(e)}")

print(f"\n‚úì Successfully trained: {len(trained_models)} models")

üöÄ TRAINING 12 MODELS

üìä Training xgb1...
   ‚úì Accuracy: 66.99%
   ‚úì BUY signals: 113,594 (correct: 28,799, 25.4%)
   ‚úì Time: 28.6s

üìä Training xgb2...
   ‚úì Accuracy: 68.68%
   ‚úì BUY signals: 106,675 (correct: 27,842, 26.1%)
   ‚úì Time: 29.6s

üìä Training xgb3...
   ‚úì Accuracy: 65.00%
   ‚úì BUY signals: 122,173 (correct: 30,129, 24.7%)
   ‚úì Time: 21.5s

üìä Training lgb1...
   ‚úì Accuracy: 62.78%
   ‚úì BUY signals: 134,491 (correct: 32,993, 24.5%)
   ‚úì Time: 31.9s

üìä Training lgb2...
   ‚úì Accuracy: 61.63%
   ‚úì BUY signals: 139,566 (correct: 33,826, 24.2%)
   ‚úì Time: 27.9s

üìä Training lgb3...
   ‚úì Accuracy: 63.60%
   ‚úì BUY signals: 130,103 (correct: 32,021, 24.6%)
   ‚úì Time: 35.1s

üìä Training cat1...
   ‚ùå Error: catboost/private/libs/options/catboost_options.cpp:794: Error: default bootstrap type (bayesian) doesn't support 'subsample' option

üìä Training cat2...
   ‚ùå Error: catboost/private/libs/options/catboost_options.cpp:794: 

## 7. Ensemble Evaluation

In [11]:
# Ensemble average probability
print("=" * 60)
print("üìä ENSEMBLE EVALUATION")
print("=" * 60)

# Calculate ensemble probability (average of all models)
all_proba = np.column_stack([model_results[name]['proba'] for name in trained_models.keys()])
ensemble_proba = all_proba.mean(axis=1)

# Evaluate at different confidence thresholds
thresholds = [0.5, 0.6, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]

print("\nüìà Confidence Threshold Analysis:")
print("-" * 70)
print(f"{'Threshold':<12} {'Signals':<12} {'Correct':<12} {'Win Rate':<12} {'Profit Factor'}")
print("-" * 70)

best_result = {'threshold': 0, 'pf': 0, 'win_rate': 0, 'signals': 0}

for threshold in thresholds:
    buy_mask = ensemble_proba >= threshold
    n_signals = buy_mask.sum()
    
    if n_signals > 0:
        correct = (y_test[buy_mask] == 1).sum()
        wrong = n_signals - correct
        win_rate = correct / n_signals * 100
        
        # Profit Factor: assume 15 pips TP, 10 pips SL (1.5:1 ratio)
        gross_profit = correct * 15
        gross_loss = wrong * 10
        pf = gross_profit / max(1, gross_loss)
        
        print(f"{threshold:<12.2f} {n_signals:<12,} {correct:<12,} {win_rate:<12.1f}% {pf:.2f}")
        
        # Track best result by Profit Factor
        if pf > best_result['pf'] and n_signals >= 100:
            best_result = {
                'threshold': threshold,
                'pf': pf,
                'win_rate': win_rate,
                'signals': n_signals,
                'correct': correct
            }
    else:
        print(f"{threshold:<12.2f} {'0':<12} {'-':<12} {'-':<12} -")

print("-" * 70)
print(f"\nüèÜ Best Result (min 100 signals):")
print(f"   Threshold: {best_result['threshold']:.0%}")
print(f"   Signals: {best_result['signals']:,}")
print(f"   Win Rate: {best_result['win_rate']:.1f}%")
print(f"   Profit Factor: {best_result['pf']:.2f}")

üìä ENSEMBLE EVALUATION

üìà Confidence Threshold Analysis:
----------------------------------------------------------------------
Threshold    Signals      Correct      Win Rate     Profit Factor
----------------------------------------------------------------------
0.50         105,099      28,180       26.8        % 0.55
0.60         64,130       20,177       31.5        % 0.69
0.70         24,448       8,941        36.6        % 0.86
0.75         6,787        2,726        40.2        % 1.01
0.80         398          175          44.0        % 1.18
0.85         0            -            -            -
0.90         0            -            -            -
0.95         0            -            -            -
----------------------------------------------------------------------

üèÜ Best Result (min 100 signals):
   Threshold: 80%
   Signals: 398
   Win Rate: 44.0%
   Profit Factor: 1.18


## 8. Model Weights Optimization

In [12]:
# Optimize model weights based on individual performance
print("=" * 60)
print("üîß OPTIMIZING MODEL WEIGHTS")
print("=" * 60)

# Calculate weight based on each model's BUY accuracy at 70% threshold
model_weights = {}
for name in trained_models.keys():
    proba = model_results[name]['proba']
    mask = proba >= 0.7
    if mask.sum() > 0:
        acc = (y_test[mask] == 1).mean()
        model_weights[name] = max(0.1, acc)  # minimum weight 0.1
    else:
        model_weights[name] = 0.1

# Normalize weights
total_weight = sum(model_weights.values())
model_weights = {k: v/total_weight for k, v in model_weights.items()}

print("\nüìä Model Weights (normalized):")
for name, weight in sorted(model_weights.items(), key=lambda x: -x[1]):
    print(f"   {name}: {weight:.3f}")

# Calculate weighted ensemble probability
weighted_proba = np.zeros(len(ensemble_proba))
for name in trained_models.keys():
    weighted_proba += model_results[name]['proba'] * model_weights[name]

# Re-evaluate with weighted ensemble
print("\nüìà Weighted Ensemble Results:")
print("-" * 70)
print(f"{'Threshold':<12} {'Signals':<12} {'Correct':<12} {'Win Rate':<12} {'Profit Factor'}")
print("-" * 70)

best_weighted = {'threshold': 0, 'pf': 0, 'win_rate': 0, 'signals': 0}

for threshold in thresholds:
    buy_mask = weighted_proba >= threshold
    n_signals = buy_mask.sum()
    
    if n_signals > 0:
        correct = (y_test[buy_mask] == 1).sum()
        wrong = n_signals - correct
        win_rate = correct / n_signals * 100
        pf = (correct * 15) / max(1, wrong * 10)
        
        print(f"{threshold:<12.2f} {n_signals:<12,} {correct:<12,} {win_rate:<12.1f}% {pf:.2f}")
        
        if pf > best_weighted['pf'] and n_signals >= 100:
            best_weighted = {
                'threshold': threshold,
                'pf': pf,
                'win_rate': win_rate,
                'signals': n_signals,
                'correct': correct
            }

print("-" * 70)
print(f"\nüèÜ Best Weighted Result:")
print(f"   Threshold: {best_weighted['threshold']:.0%}")
print(f"   Signals: {best_weighted['signals']:,}")
print(f"   Win Rate: {best_weighted['win_rate']:.1f}%")
print(f"   Profit Factor: {best_weighted['pf']:.2f}")

üîß OPTIMIZING MODEL WEIGHTS

üìä Model Weights (normalized):
   rf: 0.128
   et: 0.128
   xgb2: 0.122
   lgb1: 0.119
   lgb2: 0.118
   xgb1: 0.117
   lgb3: 0.116
   xgb3: 0.115
   hgb: 0.037

üìà Weighted Ensemble Results:
----------------------------------------------------------------------
Threshold    Signals      Correct      Win Rate     Profit Factor
----------------------------------------------------------------------
0.50         115,614      29,966       25.9        % 0.52
0.60         76,226       22,674       29.7        % 0.64
0.70         36,785       12,701       34.5        % 0.79
0.75         17,352       6,587        38.0        % 0.92
0.80         2,927        1,207        41.2        % 1.05
0.85         67           42           62.7        % 2.52
----------------------------------------------------------------------

üèÜ Best Weighted Result:
   Threshold: 80%
   Signals: 2,927
   Win Rate: 41.2%
   Profit Factor: 1.05


## 9. Save Models

In [14]:
# Save all models and configurations
print("=" * 60)
print("üíæ SAVING MODELS")
print("=" * 60)

import os
import joblib

# Ensure directory exists
os.makedirs(MODEL_DIR, exist_ok=True)

# Save each model
for name, model in trained_models.items():
    model_path = os.path.join(MODEL_DIR, f'{name}_v12.joblib')
    joblib.dump(model, model_path)
    print(f"‚úì Saved: {name}_v12.joblib")

# Save scaler
scaler_path = os.path.join(MODEL_DIR, 'scaler_v12.joblib')
joblib.dump(scaler, scaler_path)
print(f"‚úì Saved: scaler_v12.joblib")

# Save feature columns
features_path = os.path.join(MODEL_DIR, 'feature_cols_v12.joblib')
joblib.dump(feature_cols, features_path)
print(f"‚úì Saved: feature_cols_v12.joblib")

# Save model weights
weights_path = os.path.join(MODEL_DIR, 'model_weights_v12.joblib')
joblib.dump(model_weights, weights_path)
print(f"‚úì Saved: model_weights_v12.joblib")

# Save configuration
config = {
    'version': 'v12',
    'strategy': 'BUY-only',
    'n_models': len(trained_models),
    'model_names': list(trained_models.keys()),
    'n_features': len(feature_cols),
    'labeling': {
        'forward_periods': 60,
        'min_pips': 15,
        'ratio': 1.5
    },
    'best_threshold': best_weighted['threshold'],
    'best_win_rate': best_weighted['win_rate'],
    'best_pf': best_weighted['pf'],
    'train_samples': len(y_train),
    'test_samples': len(y_test)
}

config_path = os.path.join(MODEL_DIR, 'config_v12.joblib')
joblib.dump(config, config_path)
print(f"‚úì Saved: config_v12.joblib")

print(f"\n‚úÖ All models saved to: {MODEL_DIR}")

üíæ SAVING MODELS
‚úì Saved: xgb1_v12.joblib
‚úì Saved: xgb2_v12.joblib
‚úì Saved: xgb3_v12.joblib
‚úì Saved: lgb1_v12.joblib
‚úì Saved: lgb2_v12.joblib
‚úì Saved: lgb3_v12.joblib
‚úì Saved: xgb2_v12.joblib
‚úì Saved: xgb3_v12.joblib
‚úì Saved: lgb1_v12.joblib
‚úì Saved: lgb2_v12.joblib
‚úì Saved: lgb3_v12.joblib
‚úì Saved: rf_v12.joblib
‚úì Saved: rf_v12.joblib
‚úì Saved: et_v12.joblib
‚úì Saved: hgb_v12.joblib
‚úì Saved: scaler_v12.joblib
‚úì Saved: feature_cols_v12.joblib
‚úì Saved: model_weights_v12.joblib
‚úì Saved: config_v12.joblib

‚úÖ All models saved to: c:\Users\Acer\Desktop\Forex-Signal-App\models\signal_generator_v12
‚úì Saved: et_v12.joblib
‚úì Saved: hgb_v12.joblib
‚úì Saved: scaler_v12.joblib
‚úì Saved: feature_cols_v12.joblib
‚úì Saved: model_weights_v12.joblib
‚úì Saved: config_v12.joblib

‚úÖ All models saved to: c:\Users\Acer\Desktop\Forex-Signal-App\models\signal_generator_v12


## 10. Final Summary

In [15]:
print("=" * 70)
print("üéØ V12 BUY-ONLY MODEL - FINAL SUMMARY")
print("=" * 70)

print(f"""
üìä DATASET:
   Train: {len(y_train):,} samples ({y_train.sum():,} BUY = {y_train.mean()*100:.1f}%)
   Test:  {len(y_test):,} samples ({y_test.sum():,} BUY = {y_test.mean()*100:.1f}%)

ü§ñ MODELS:
   Total: {len(trained_models)} models
   XGBoost: xgb1, xgb2, xgb3
   LightGBM: lgb1, lgb2, lgb3
   CatBoost: cat1, cat2, cat3
   Traditional: rf, et, hgb

üìà BEST RESULTS (Weighted Ensemble):
   Threshold: {best_weighted['threshold']:.0%}
   Signals: {best_weighted['signals']:,}
   Win Rate: {best_weighted['win_rate']:.1f}%
   Profit Factor: {best_weighted['pf']:.2f}

üíæ SAVED TO:
   {MODEL_DIR}
""")

print("=" * 70)
print("‚úÖ V12 TRAINING COMPLETE!")
print("=" * 70)

üéØ V12 BUY-ONLY MODEL - FINAL SUMMARY

üìä DATASET:
   Train: 1,859,293 samples (193,796 BUY = 10.4%)
   Test:  296,579 samples (41,895 BUY = 14.1%)

ü§ñ MODELS:
   Total: 9 models
   XGBoost: xgb1, xgb2, xgb3
   LightGBM: lgb1, lgb2, lgb3
   CatBoost: cat1, cat2, cat3
   Traditional: rf, et, hgb

üìà BEST RESULTS (Weighted Ensemble):
   Threshold: 80%
   Signals: 2,927
   Win Rate: 41.2%
   Profit Factor: 1.05

üíæ SAVED TO:
   c:\Users\Acer\Desktop\Forex-Signal-App\models\signal_generator_v12

‚úÖ V12 TRAINING COMPLETE!
