# 💳 Fraud Detection - Model Training (RAPIDS GPU-Accelerated)

This notebook trains multiple machine learning models for fraud detection using **RAPIDS cuML** for GPU acceleration where available.

**Models Trained:**
1. **Logistic Regression** - GPU-accelerated baseline (cuML) or CPU fallback
2. **Decision Tree** - Interpretable tree-based model (CPU - sklearn)
3. **Random Forest** - GPU-accelerated ensemble (cuML) or CPU fallback
4. **XGBoost** - GPU gradient boosting (`tree_method='gpu_hist'`)
5. **LightGBM** - Fast gradient boosting (CPU in this notebook; GPU training requires a GPU-enabled LightGBM build)
6. **Gradient Boosting** - Scikit-learn ensemble method (CPU)

**⚡ GPU Acceleration:**
- **RAPIDS cuML** for Logistic Regression and Random Forest (speedups of 10-100x are possible on large datasets; actual gains depend on data size and hardware)
- **XGBoost GPU** for gradient boosting (`gpu_hist`)
- **CPU fallback** for models without GPU support (Decision Tree, LightGBM, Gradient Boosting)

**Class Imbalance Handling:**
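- SMOTE (Synthetic Minority Over-sampling Technique) with `sampling_strategy=0.3`, so fraud is over-sampled to 30% of the majority-class count
- Balanced class weights (`class_weight='balanced'`, or `scale_pos_weight` for XGBoost) where the estimator supports them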
- Stratified train-validation split

**Requirements:**
- NVIDIA GPU with CUDA support (optional - auto-fallback to CPU)
- RAPIDS cuML: `conda install -c rapidsai -c conda-forge -c nvidia rapids=23.10 python=3.10 cudatoolkit=11.8`

**Output:** Trained models saved as PKL files (XGBoost as JSON) for evaluation

## 1. Setup and Check GPU Availability
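The setup cell chooses between GPU and CPU by attempting the RAPIDS imports inside a `try` block. As a minimal sketch of the same fallback idea, the snippet below probes for packages without importing them. `pick_backend` and its `required` argument are hypothetical names used only for illustration; they are not part of this notebook or of RAPIDS.

```python
import importlib.util

def pick_backend(required=("cuml", "cudf", "cupy")):
    """Return 'gpu' if every top-level package in `required` is installed, else 'cpu'.

    Hypothetical helper: it only checks that the packages can be found,
    not that a CUDA device is actually usable.
    """
    for name in required:
        if importlib.util.find_spec(name) is None:
            return "cpu"
    return "gpu"

backend = pick_backend()
print(f"Selected backend: {backend}")
```

A found package does not guarantee a working GPU, which is why the notebook also queries the device count after importing.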

In [None]:
import os
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
from datetime import datetime
import time

warnings.filterwarnings('ignore')

# Check GPU availability
print("="*80)
print("GPU AVAILABILITY CHECK")
print("="*80)

try:
    import cudf
    import cuml
    import cupy as cp
    from cuml.linear_model import LogisticRegression as cuLogisticRegression
    from cuml.ensemble import RandomForestClassifier as cuRandomForestClassifier
    
    rapids_available = True
    print("✓ RAPIDS cuML available")
    print(f"✓ cuDF version: {cudf.__version__}")
    print(f"✓ cuML version: {cuml.__version__}")
    
    # Check GPU
    gpu_count = cp.cuda.runtime.getDeviceCount()
    print(f"✓ GPUs available: {gpu_count}")
    
    if gpu_count > 0:
        gpu_name = cp.cuda.runtime.getDeviceProperties(0)['name'].decode()
        gpu_mem = cp.cuda.runtime.getDeviceProperties(0)['totalGlobalMem'] / 1e9
        print(f"✓ GPU 0: {gpu_name}")
        print(f"✓ GPU Memory: {gpu_mem:.1f} GB")
        
except Exception as e:  # ImportError, or a CUDA runtime error when no usable GPU is found
    rapids_available = False
    print(f"❌ RAPIDS not available: {e}")
    print("\n📦 Installation required:")
    print("conda create -n rapids-env -c rapidsai -c conda-forge -c nvidia rapids=23.10 python=3.10 cudatoolkit=11.8")
    print("\nFalling back to CPU training...")

# Standard ML libraries (fallback and non-GPU models)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# XGBoost with GPU support
try:
    import xgboost as xgb
    xgboost_available = True
    print("✓ XGBoost available")
except ImportError:
    xgboost_available = False
    print("⚠ XGBoost not available. Install with: pip install xgboost")

# LightGBM
try:
    import lightgbm as lgb
    lightgbm_available = True
    print("✓ LightGBM available")
except ImportError:
    lightgbm_available = False
    print("⚠ LightGBM not available. Install with: pip install lightgbm")

# SMOTE for imbalance
from imblearn.over_sampling import SMOTE

# Metrics
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, confusion_matrix, classification_report
)

# Settings
pd.set_option('display.max_columns', None)
plt.style.use('seaborn-v0_8-darkgrid')

print("\n✓ All libraries imported successfully")
print(f"\n🚀 GPU Acceleration: {'ENABLED' if rapids_available else 'DISABLED (CPU mode)'}")
print("="*80)

## 2. Load Data (GPU-Accelerated with cuDF)

In [None]:
print("Loading preprocessed data from EDA...\n")

if rapids_available:
    # Load with cuDF for GPU acceleration
    print("📊 Loading data on GPU with cuDF...")
    train_data_gpu = cudf.read_csv('train_transaction_scaled.csv')
    
    print(f"✓ Data loaded on GPU: {train_data_gpu.shape}")
    print(f"✓ GPU memory usage: {train_data_gpu.memory_usage(deep=True).sum() / 1e6:.2f} MB")
    
    # Convert to pandas for compatibility with some operations
    train_data = train_data_gpu.to_pandas()
    
else:
    # CPU fallback
    print("📊 Loading data with pandas (CPU)...")
    train_data = pd.read_csv('train_transaction_scaled.csv')

print(f"\n✓ Dataset shape: {train_data.shape}")
print(f"  Columns: {train_data.shape[1]}")
print(f"  Rows: {train_data.shape[0]:,}")

## 3. Prepare Features and Target
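The split in this section passes `stratify=y` so that the train and validation sets keep the same fraud rate. The following is a pure-Python sketch of that idea, not a description of how `train_test_split` works internally; `stratified_split` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Split indices so each class keeps roughly the same share in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # shuffle within each class
        n_val = round(len(indices) * test_size)   # per-class validation quota
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)

# Toy labels: 950 legitimate (0) and 50 fraud (1), i.e. a 5% fraud rate.
y_toy = [0] * 950 + [1] * 50
train_idx, val_idx = stratified_split(y_toy)

train_rate = sum(y_toy[i] for i in train_idx) / len(train_idx)
val_rate = sum(y_toy[i] for i in val_idx) / len(val_idx)
print(f"train fraud rate: {train_rate:.3f}, validation fraud rate: {val_rate:.3f}")
```

Without stratification, a random 20% sample of a rare class can drift noticeably from the overall rate, which makes validation metrics harder to compare across runs.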

In [None]:
# Separate features and target
if 'isFraud' in train_data.columns:
    X = train_data.drop(columns=['isFraud'])
    y = train_data['isFraud']
else:
    raise ValueError("Target variable 'isFraud' not found in dataset")

# Remove ID columns if present
id_cols = ['TransactionID', 'TransactionDT']
X = X.drop(columns=[col for col in id_cols if col in X.columns], errors='ignore')

print(f"\n📊 Dataset Overview:")
print(f"   Features: {X.shape[1]}")
print(f"   Samples: {X.shape[0]:,}")
print(f"\n🎯 Target Distribution:")
print(f"   Not Fraud: {(y == 0).sum():,} ({(y == 0).sum()/len(y)*100:.2f}%)")
print(f"   Fraud: {(y == 1).sum():,} ({(y == 1).sum()/len(y)*100:.2f}%)")
print(f"   Imbalance Ratio: 1:{(y == 0).sum() // (y == 1).sum()}")

# Stratified train-validation split (80-20)
print(f"\n📂 Creating stratified train-validation split (80-20)...")
X_train, X_val, y_train, y_val = train_test_split(
    X, y, 
    test_size=0.2, 
    random_state=42, 
    stratify=y
)

print(f"✓ Train set: {X_train.shape[0]:,} samples")
print(f"✓ Validation set: {X_val.shape[0]:,} samples")
print(f"\nTrain fraud rate: {y_train.mean()*100:.2f}%")
print(f"Validation fraud rate: {y_val.mean()*100:.2f}%")

## 4. Handle Class Imbalance with SMOTE
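A `sampling_strategy` of 0.3 sets the fraud count to 30% of the majority count, which works out to about 23% of the resampled training set (0.3 / 1.3), not 30%. The sketch below shows only this arithmetic with hypothetical class counts. `smote_counts` is an illustrative helper and does not generate synthetic samples.

```python
def smote_counts(n_majority, n_minority, sampling_strategy=0.3):
    """Class sizes after over-sampling the minority to `sampling_strategy` x majority."""
    target_minority = int(n_majority * sampling_strategy)
    n_synthetic = max(target_minority - n_minority, 0)  # samples SMOTE would add
    minority_after = n_minority + n_synthetic
    minority_share = minority_after / (n_majority + minority_after)
    return minority_after, n_synthetic, minority_share

# Hypothetical class counts, chosen only for illustration.
minority_after, n_synthetic, minority_share = smote_counts(100_000, 3_500)
print(minority_after, n_synthetic, f"{minority_share:.1%}")
```

Only the training split is resampled in this notebook; the validation set keeps its original fraud rate so that the reported metrics reflect real-world imbalance.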

In [None]:
print(f"{'='*80}")
print("APPLYING SMOTE TO BALANCE CLASSES")
print(f"{'='*80}\n")

print("Before SMOTE:")
print(f"Training samples: {len(X_train):,}")
print(f"Fraud cases: {y_train.sum():,} ({(y_train.sum()/len(y_train))*100:.2f}%)")
print(f"Not Fraud cases: {(y_train == 0).sum():,} ({((y_train == 0).sum()/len(y_train))*100:.2f}%)")

# Apply SMOTE with optimal sampling strategy for fraud detection
# Using 0.3 ratio - fraud becomes 30% of majority (better than 50% for high imbalance)
smote = SMOTE(random_state=42, sampling_strategy=0.3, k_neighbors=5)
X_train_smote, y_train_smote = smote.fit_resample(X_train, y_train)

print("\nAfter SMOTE:")
print(f"Training samples: {len(X_train_smote):,}")
print(f"Fraud cases: {y_train_smote.sum():,} ({(y_train_smote.sum()/len(y_train_smote))*100:.2f}%)")
print(f"Not Fraud cases: {(y_train_smote == 0).sum():,} ({((y_train_smote == 0).sum()/len(y_train_smote))*100:.2f}%)")

print("\n✓ SMOTE applied successfully")
print("Note: Using 0.3 ratio for better generalization on highly imbalanced data")

# Visualize class distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Before SMOTE
axes[0].bar(['Not Fraud', 'Fraud'], [(y_train == 0).sum(), y_train.sum()], 
            color=['green', 'red'], edgecolor='black', alpha=0.7)
axes[0].set_title('Before SMOTE', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Count')
axes[0].grid(True, alpha=0.3, axis='y')
for i, v in enumerate([(y_train == 0).sum(), y_train.sum()]):
    axes[0].text(i, v, f'{v:,}', ha='center', va='bottom', fontweight='bold')

# After SMOTE
axes[1].bar(['Not Fraud', 'Fraud'], [(y_train_smote == 0).sum(), y_train_smote.sum()], 
            color=['green', 'red'], edgecolor='black', alpha=0.7)
axes[1].set_title('After SMOTE (30% Ratio)', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Count')
axes[1].grid(True, alpha=0.3, axis='y')
for i, v in enumerate([(y_train_smote == 0).sum(), y_train_smote.sum()]):
    axes[1].text(i, v, f'{v:,}', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

## 5. Model Training Helper Function
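The helper in this section reports accuracy, precision, recall and F1 alongside the confusion matrix. As a small worked example of how those four numbers follow from the matrix counts, the sketch below uses hypothetical validation counts; `metrics_from_counts` is an illustrative name, not a scikit-learn function.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Fall back to 0.0 when a denominator is zero, like zero_division=0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1

# Hypothetical validation counts with a 3.5% fraud rate (350 frauds in 10,000 rows).
accuracy, precision, recall, f1 = metrics_from_counts(tn=9_500, fp=150, fn=100, tp=250)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

With these counts accuracy is 97.5% while precision is 62.5% and recall about 71%, which is why the notebook ranks models by F1 rather than accuracy.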

In [None]:
def train_and_evaluate_model(model, model_name, X_train, y_train, X_val, y_val, use_gpu=False):
    """
    Train a model and return comprehensive evaluation metrics including confusion matrix
    
    Parameters:
    - model: The model to train
    - model_name: Name for display
    - X_train, y_train: Training data
    - X_val, y_val: Validation data
    - use_gpu: Whether GPU is being used
    
    Returns:
    - Dictionary with model and metrics
    """
    print(f"\n{'='*80}")
    print(f"TRAINING: {model_name}")
    if use_gpu:
        print(f"🚀 GPU ACCELERATION ENABLED")
    print(f"{'='*80}\n")
    
    # Train
    start_time = time.time()
    model.fit(X_train, y_train)
    training_time = time.time() - start_time
    
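    print(f"✓ Model trained in {training_time:.2f} seconds")
    
    def to_host(arr):
        # cuML can return cuDF/CuPy objects; sklearn metrics need host NumPy arrays
        if hasattr(arr, 'to_numpy'):
            return arr.to_numpy()
        if hasattr(arr, 'get'):
            return arr.get()
        return np.asarray(arr)
    
    # Predict
    y_pred = to_host(model.predict(X_val))
    
    # Get probabilities (or decision scores) if available
    try:
        if hasattr(model, 'predict_proba'):
            y_pred_proba = to_host(model.predict_proba(X_val))[:, 1]
        elif hasattr(model, 'decision_function'):
            y_pred_proba = to_host(model.decision_function(X_val))
        else:
            y_pred_proba = None
    except Exception:
        y_pred_proba = None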
    
    # Calculate metrics
    accuracy = accuracy_score(y_val, y_pred)
    precision = precision_score(y_val, y_pred, zero_division=0)
    recall = recall_score(y_val, y_pred, zero_division=0)
    f1 = f1_score(y_val, y_pred, zero_division=0)
    cm = confusion_matrix(y_val, y_pred)
    
    print(f"\n📊 Validation Results:")
    print(f"   Accuracy:  {accuracy:.4f}")
    print(f"   Precision: {precision:.4f} (of predicted frauds, how many are correct)")
    print(f"   Recall:    {recall:.4f} (of actual frauds, how many we caught)")
    print(f"   F1-Score:  {f1:.4f} (harmonic mean of precision & recall)")
    
    if y_pred_proba is not None:
        try:
            roc_auc = roc_auc_score(y_val, y_pred_proba)
            print(f"   ROC-AUC:   {roc_auc:.4f}")
        except Exception:
            roc_auc = None
    else:
        roc_auc = None
    
    print(f"\n📈 Confusion Matrix:")
    print(f"   TN: {cm[0,0]:6,}  |  FP: {cm[0,1]:6,}")
    print(f"   FN: {cm[1,0]:6,}  |  TP: {cm[1,1]:6,}")
    
    return {
        'model': model,
        'model_name': model_name,
        'training_time': training_time,
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1': f1,
        'roc_auc': roc_auc,
        'confusion_matrix': cm,
        'y_pred': y_pred,
        'y_pred_proba': y_pred_proba,
        'gpu_accelerated': use_gpu
    }

print("✓ Training function defined")

## 6. Model 1: Logistic Regression (GPU-Accelerated or CPU)

In [None]:
if rapids_available:
    # cuML GPU-accelerated Logistic Regression
    lr_model = cuLogisticRegression(
        max_iter=1000,
        solver='qn',  # Quasi-Newton (GPU optimized)
        verbose=0
    )
    
    # Convert to cuDF for GPU training
    X_train_gpu = cudf.DataFrame(X_train_smote)
    y_train_gpu = cudf.Series(y_train_smote)
    X_val_gpu = cudf.DataFrame(X_val)
    y_val_gpu = cudf.Series(y_val)
    
    lr_results = train_and_evaluate_model(
        lr_model, 
        'Logistic Regression (cuML GPU)', 
        X_train_gpu, y_train_gpu, 
        X_val_gpu, y_val_gpu.to_numpy(),
        use_gpu=True
    )
    
    # Save model (pickled cuML estimator; loading it requires cuML to be installed)
    with open('model_logistic_regression_rapids.pkl', 'wb') as f:
        pickle.dump(lr_results['model'], f)
    print("\n✓ Model saved: model_logistic_regression_rapids.pkl")
    
else:
    # CPU fallback
    lr_model = LogisticRegression(
        max_iter=1000,
        class_weight='balanced',
        random_state=42,
        n_jobs=-1
    )
    
    lr_results = train_and_evaluate_model(
        lr_model, 
        'Logistic Regression (CPU)', 
        X_train_smote, y_train_smote, 
        X_val, y_val
    )
    
    with open('model_logistic_regression_cpu.pkl', 'wb') as f:
        pickle.dump(lr_results['model'], f)
    print("\n✓ Model saved: model_logistic_regression_cpu.pkl")

## 7. Model 2: Decision Tree (Interpretable - CPU Only)

In [None]:
# Decision Tree - Interpretable model with pruning to prevent overfitting
# Note: cuML doesn't have DecisionTree, using sklearn
dt_model = DecisionTreeClassifier(
    random_state=42,
    max_depth=15,  # Increased from 10 for better performance
    min_samples_split=50,  # Reduced for more splits
    min_samples_leaf=25,  # Minimum samples per leaf
    class_weight='balanced',
    criterion='gini'  # or 'entropy'
)

dt_results = train_and_evaluate_model(
    dt_model, "Decision Tree",
    X_train_smote, y_train_smote, X_val, y_val,
    use_gpu=False  # CPU only
)

print(f"\nTree Statistics:")
print(f"   Tree depth: {dt_model.get_depth()}")
print(f"   Number of leaves: {dt_model.get_n_leaves()}")

# Save model
with open('model_decision_tree_rapids.pkl', 'wb') as f:
    pickle.dump(dt_model, f)
print("\n✓ Model saved: model_decision_tree_rapids.pkl")

## 8. Model 3: Random Forest (GPU-Accelerated or CPU)

In [None]:
if rapids_available:
    # cuML GPU-accelerated Random Forest
    rf_model = cuRandomForestClassifier(
        n_estimators=200,
        max_depth=20,
        max_features='sqrt',
        min_samples_split=10,
        random_state=42,
        n_streams=4  # GPU parallelism; results may vary between runs unless n_streams=1
    )
    
    # Use GPU data
    rf_results = train_and_evaluate_model(
        rf_model, 
        'Random Forest (cuML GPU)', 
        X_train_gpu, y_train_gpu, 
        X_val_gpu, y_val_gpu.to_numpy(),
        use_gpu=True
    )
    
    print(f"\nModel Statistics:")
    print(f"   Number of trees: {rf_model.n_estimators}")
    print(f"   Max features per split: sqrt({X_train.shape[1]}) = {int(np.sqrt(X_train.shape[1]))}")
    
    with open('model_random_forest_rapids.pkl', 'wb') as f:
        pickle.dump(rf_results['model'], f)
    print("\n✓ Model saved: model_random_forest_rapids.pkl")
    
else:
    # CPU fallback - Best balance of performance and interpretability
    rf_model = RandomForestClassifier(
        n_estimators=200,  # More trees for better performance
        random_state=42,
        max_depth=20,  # Deeper trees
        min_samples_split=20,  # Require at least 20 samples to split a node (limits overfitting)
        min_samples_leaf=10,
        max_features='sqrt',  # Good default for classification
        class_weight='balanced',
        n_jobs=-1,  # Use all CPU cores
        bootstrap=True
    )
    
    rf_results = train_and_evaluate_model(
        rf_model, 
        'Random Forest (CPU)', 
        X_train_smote, y_train_smote, 
        X_val, y_val,
        use_gpu=False
    )
    
    print(f"\nModel Statistics:")
    print(f"   Number of trees: {rf_model.n_estimators}")
    print(f"   Max features per split: sqrt({X_train.shape[1]}) = {int(np.sqrt(X_train.shape[1]))}")
    
    with open('model_random_forest_rapids.pkl', 'wb') as f:
        pickle.dump(rf_results['model'], f)
    print("\n✓ Model saved: model_random_forest_rapids.pkl")

## 9. Model 4: XGBoost (GPU-Accelerated or CPU)
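This section sets `scale_pos_weight` to the ratio of negative to positive training labels. Because the notebook computes it on the SMOTE-resampled labels, the value comes out near 1 / 0.3 ≈ 3.33 regardless of the raw imbalance. The sketch below illustrates the calculation on hypothetical label lists; the helper function is named after the parameter for readability and is not an XGBoost API.

```python
def scale_pos_weight(labels):
    """Ratio of negative to positive samples, a common starting value for XGBoost."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0:
        raise ValueError("no positive samples; scale_pos_weight is undefined")
    return n_neg / n_pos

raw_labels = [0] * 965 + [1] * 35          # hypothetical 3.5% fraud rate before SMOTE
resampled_labels = [0] * 1000 + [1] * 300  # hypothetical labels after SMOTE at a 0.3 ratio

print(f"before SMOTE: {scale_pos_weight(raw_labels):.2f}")
print(f"after SMOTE:  {scale_pos_weight(resampled_labels):.2f}")
```

Combining SMOTE with a positive-class weight corrects for imbalance twice, so it is worth checking on validation data whether both are needed.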

In [None]:
if xgboost_available:
    # XGBoost - Excellent for imbalanced classification
    # Calculate scale_pos_weight for imbalance
    scale_pos_weight = (y_train_smote == 0).sum() / (y_train_smote == 1).sum()
    
    if rapids_available:
        # XGBoost with GPU acceleration
        xgb_model = xgb.XGBClassifier(
            n_estimators=200,
            max_depth=10,
            learning_rate=0.1,
            subsample=0.8,  # Row sampling
            colsample_bytree=0.8,  # Column sampling
            gamma=0,  # Minimum loss reduction
            min_child_weight=3,
            scale_pos_weight=scale_pos_weight,  # Handle imbalance
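            # These GPU arguments target XGBoost 1.x; XGBoost 2.0+ uses
            # device='cuda' together with tree_method='hist' instead.
            tree_method='gpu_hist',  # GPU acceleration
            predictor='gpu_predictor',  # GPU prediction
            gpu_id=0,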
            random_state=42,
            eval_metric='logloss',
            use_label_encoder=False
        )
        
        xgb_results = train_and_evaluate_model(
            xgb_model, 
            'XGBoost (GPU)', 
            X_train_smote, y_train_smote, 
            X_val, y_val,
            use_gpu=True
        )
        
        print(f"\nModel Statistics:")
        print(f"   Number of boosting rounds: {xgb_model.n_estimators}")
        print(f"   Scale pos weight: {scale_pos_weight:.2f}")
        
        xgb_model.save_model('model_xgboost_rapids.json')
        print("\n✓ Model saved: model_xgboost_rapids.json")
        
    else:
        # CPU fallback
        xgb_model = xgb.XGBClassifier(
            n_estimators=200,
            max_depth=10,
            learning_rate=0.1,
            subsample=0.8,  # Row sampling
            colsample_bytree=0.8,  # Column sampling
            gamma=0,  # Minimum loss reduction
            min_child_weight=3,
            scale_pos_weight=scale_pos_weight,  # Handle imbalance
            tree_method='hist',
            random_state=42,
            n_jobs=-1,
            eval_metric='logloss',
            use_label_encoder=False
        )
        
        xgb_results = train_and_evaluate_model(
            xgb_model, 
            'XGBoost (CPU)', 
            X_train_smote, y_train_smote, 
            X_val, y_val,
            use_gpu=False
        )
        
        print(f"\nModel Statistics:")
        print(f"   Number of boosting rounds: {xgb_model.n_estimators}")
        print(f"   Scale pos weight: {scale_pos_weight:.2f}")
        
        xgb_model.save_model('model_xgboost_rapids.json')
        print("\n✓ Model saved: model_xgboost_rapids.json")
else:
    print("⚠ XGBoost not available. Skipping...")
    xgb_results = None

## 10. Model 5: LightGBM (Fast Gradient Boosting - CPU Only)

In [None]:
if lightgbm_available:
    # LightGBM - Fast and efficient gradient boosting
    # Note: trained on CPU here; GPU training requires a GPU-enabled LightGBM build
    lgb_model = lgb.LGBMClassifier(
        n_estimators=200,
        max_depth=15,
        learning_rate=0.1,
        num_leaves=31,  # Should be < 2^max_depth
        subsample=0.8,
        colsample_bytree=0.8,
        min_child_samples=20,
        class_weight='balanced',
        random_state=42,
        n_jobs=-1,
        verbose=-1
    )
    
    lgb_results = train_and_evaluate_model(
        lgb_model, "LightGBM",
        X_train_smote, y_train_smote, X_val, y_val,
        use_gpu=False  # CPU training in this notebook
    )
    
    print(f"\nModel Statistics:")
    print(f"   Number of boosting rounds: {lgb_model.n_estimators}")
    print(f"   Number of leaves: {lgb_model.num_leaves}")
    
    # Save model
    with open('model_lightgbm_rapids.pkl', 'wb') as f:
        pickle.dump(lgb_model, f)
    print("\n✓ Model saved: model_lightgbm_rapids.pkl")
else:
    print("⚠ LightGBM not available. Skipping...")
    lgb_results = None

## 11. Model 6: Gradient Boosting (Scikit-learn - CPU Only)
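The model in this section sets `n_iter_no_change=10` with a 10% validation fraction, so boosting can stop before all 200 stages are fit. The sketch below shows a simplified patience rule of that kind on a hypothetical loss curve. It is not scikit-learn's implementation, whose exact stopping criterion may differ in detail, and `stopping_round` is an illustrative name.

```python
def stopping_round(val_losses, patience=10, tol=1e-4):
    """Return the 1-based round at which training stops, or None if it runs to the end.

    Simplified patience rule: stop once the validation loss has failed to beat
    the best loss seen so far by more than `tol` for `patience` rounds in a row.
    """
    best = float("inf")
    rounds_without_gain = 0
    for round_no, loss in enumerate(val_losses, start=1):
        if loss < best - tol:
            best = loss
            rounds_without_gain = 0
        else:
            rounds_without_gain += 1
            if rounds_without_gain >= patience:
                return round_no
    return None

# Hypothetical loss curve: improves for 5 rounds, then plateaus.
losses = [0.60, 0.50, 0.42, 0.38, 0.36] + [0.36] * 20
print(stopping_round(losses, patience=10))
```

This is why the cell prints both `n_estimators` (the configured maximum) and `n_estimators_` (the number of stages actually fit).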

In [None]:
# Gradient Boosting - Powerful ensemble method
# Note: scikit-learn doesn't have GPU support
gb_model = GradientBoostingClassifier(
    n_estimators=200,
    max_depth=10,
    learning_rate=0.1,
    subsample=0.8,  # Stochastic gradient boosting
    min_samples_split=20,
    min_samples_leaf=10,
    max_features='sqrt',
    random_state=42,
    validation_fraction=0.1,  # For early stopping monitoring
    n_iter_no_change=10  # Early stopping
)

gb_results = train_and_evaluate_model(
    gb_model, "Gradient Boosting",
    X_train_smote, y_train_smote, X_val, y_val,
    use_gpu=False  # CPU only
)

print(f"\nModel Statistics:")
print(f"   Number of boosting stages: {gb_model.n_estimators}")
print(f"   Effective estimators used: {gb_model.n_estimators_}")

# Save model
with open('model_gradient_boosting_rapids.pkl', 'wb') as f:
    pickle.dump(gb_model, f)
print("\n✓ Model saved: model_gradient_boosting_rapids.pkl")

## 12. Save All Models and Results
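The results list is pickled so that a later notebook can reload it. The following sketch shows the round trip with a stand-in dictionary and a temporary, hypothetical file name; the real entries also hold the fitted estimator and predictions.

```python
import os
import pickle
import tempfile

# Stand-in for one entry of the results list (values are made up for illustration).
results = [{"model_name": "Decision Tree", "f1": 0.61, "roc_auc": None}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "all_model_results_demo.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(results, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

best = max(restored, key=lambda r: r["f1"])
print(best["model_name"], best["f1"])
```

Unpickling a fitted estimator needs the library that produced it (cuML for the GPU models), ideally at the same version, and pickle files should only be loaded from trusted sources.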

In [None]:
# Compile all results
all_results = [lr_results, dt_results, rf_results]

if xgboost_available and xgb_results:
    all_results.append(xgb_results)

if lightgbm_available and lgb_results:
    all_results.append(lgb_results)

all_results.append(gb_results)

# Create summary DataFrame
results_summary = pd.DataFrame([{
    'Model': r['model_name'],
    'GPU Accelerated': '🚀 Yes' if r['gpu_accelerated'] else 'No',
    'Training Time (s)': r['training_time'],
    'Accuracy': r['accuracy'],
    'Precision': r['precision'],
    'Recall': r['recall'],
    'F1-Score': r['f1'],
    'ROC-AUC': r['roc_auc'] if r['roc_auc'] is not None else 'N/A'
} for r in all_results])

print(f"\n{'='*100}")
print("TRAINING SUMMARY - ALL MODELS")
print(f"{'='*100}\n")
print(results_summary.to_string(index=False))

# Save results
results_summary.to_csv('training_results_summary_rapids.csv', index=False)
print(f"\n✓ Training summary saved: training_results_summary_rapids.csv")

# Save all results for evaluation notebook
with open('all_model_results_rapids.pkl', 'wb') as f:
    pickle.dump(all_results, f)
print(f"✓ All model results saved: all_model_results_rapids.pkl")

# Save train/val split for evaluation
with open('train_val_split_rapids.pkl', 'wb') as f:
    pickle.dump({
        'X_train': X_train,
        'X_val': X_val,
        'y_train': y_train,
        'y_val': y_val,
        'X_train_smote': X_train_smote,
        'y_train_smote': y_train_smote
    }, f)
print(f"✓ Train/val split saved: train_val_split_rapids.pkl")

## 13. Training Complete - Next Steps

In [None]:
print(f"\n{'='*100}")
print("✅ RAPIDS GPU-ACCELERATED TRAINING COMPLETE!" if rapids_available else "✅ MODEL TRAINING COMPLETE!")
print(f"{'='*100}\n")

print(f"📊 Trained {len(all_results)} models:")
for i, r in enumerate(all_results, 1):
    gpu_badge = '🚀 ' if r['gpu_accelerated'] else ''
    print(f"   {i}. {gpu_badge}{r['model_name']}")
    print(f"      • F1-Score: {r['f1']:.4f}")
    print(f"      • Recall: {r['recall']:.4f}")
    print(f"      • Training Time: {r['training_time']:.2f}s")

# Find best model by F1-score
best_model = max(all_results, key=lambda x: x['f1'])
print(f"\n🏆 Best Model (by F1-Score): {best_model['model_name']}")
print(f"   F1-Score: {best_model['f1']:.4f}")
if best_model['gpu_accelerated']:
    print(f"   🚀 GPU-Accelerated")

print(f"\n📁 Generated Files:")
print(f"   Models:")
print(f"   • model_logistic_regression_{'rapids' if rapids_available else 'cpu'}.pkl")
print(f"   • model_decision_tree_rapids.pkl")
print(f"   • model_random_forest_rapids.pkl")
if xgboost_available:
    print(f"   • model_xgboost_rapids.json")
if lightgbm_available:
    print(f"   • model_lightgbm_rapids.pkl")
print(f"   • model_gradient_boosting_rapids.pkl")

print(f"\n   Results:")
print(f"   • training_results_summary_rapids.csv")
print(f"   • all_model_results_rapids.pkl")
print(f"   • train_val_split_rapids.pkl")

if rapids_available:
    gpu_count = sum(1 for r in all_results if r['gpu_accelerated'])
    print(f"\n⚡ GPU ACCELERATION SUMMARY:")
    print(f"   • Models trained on GPU: {gpu_count}/{len(all_results)}")
    print(f"   • GPU-accelerated models: {', '.join([r['model_name'] for r in all_results if r['gpu_accelerated']])}")
    print(f"   • Expected speedup: up to 10-100x on large datasets (varies with data size and hardware)")
    print(f"   • Memory efficiency: Improved with cuDF")

print(f"\n💡 Next Steps:")
print(f"   1. Open Model_Evaluation.ipynb for detailed model comparison")
print(f"   2. Analyze confusion matrices, ROC curves, and feature importance")
print(f"   3. Select the best model based on business requirements")
print(f"   4. Apply the best model to test_transaction.csv")

print(f"\n{'='*100}")
print(f"Proceed to Model_Evaluation.ipynb for comprehensive evaluation!")
print(f"{'='*100}")