# 🤖 Model Training: Building Our Machine Learning Models

Welcome to the **Model Training** phase! 🚀 This is where we transform our prepared data into intelligent predictive models. Think of this as training a team of specialists, each with different strengths and approaches.

## 🎯 What is Model Training?

Model training is like teaching different students the same subject using their preferred learning styles:
- **📊 Logistic Regression**: The mathematician who finds linear relationships
- **🌳 Random Forest**: The committee that votes on decisions
- **⚡ XGBoost**: The iterative learner who learns from mistakes
- **🧠 Support Vector Machine**: The boundary drawer who looks for the widest margin between classes

### 🏆 **Our Model Training Strategy:**

#### 1️⃣ **Multiple Algorithm Testing**
- Train 4-6 different algorithms
- Compare their strengths and weaknesses
- Find the best performer for our specific data

#### 2️⃣ **Proper Validation**
- Use cross-validation for robust evaluation
- Separate train/validation/test sets
- Ensure models generalize well

#### 3️⃣ **Hyperparameter Tuning**
- Optimize model settings for best performance
- Use systematic search methods
- Balance performance vs. overfitting

#### 4️⃣ **Imbalanced Data Handling**
- Test with balanced and original datasets
- Use appropriate evaluation metrics
- Apply class weights where beneficial
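
As a rough illustration of what "balanced" class weights mean: scikit-learn's `compute_class_weight('balanced', ...)` assigns each class the weight `n_samples / (n_classes * class_count)`. The sketch below recomputes that by hand on a small made-up label vector. The 80/20 split and the names `y_demo`, `manual_weights`, and `pos_weight_ratio` are assumptions for illustration, not this notebook's data.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 80 negatives, 20 positives (illustrative only)
y_demo = np.array([0] * 80 + [1] * 20)
classes = np.unique(y_demo)

# scikit-learn's "balanced" heuristic
sk_weights = compute_class_weight("balanced", classes=classes, y=y_demo)

# Same formula by hand: n_samples / (n_classes * count_per_class)
counts = np.bincount(y_demo)
manual_weights = len(y_demo) / (len(classes) * counts)

# The ratio of the two weights equals negatives / positives, the value
# commonly passed to XGBoost's scale_pos_weight parameter
pos_weight_ratio = manual_weights[1] / manual_weights[0]

print(dict(zip(classes.tolist(), sk_weights.round(3).tolist())))
print(f"negatives / positives = {pos_weight_ratio:.1f}")
```

The rarer class gets the larger weight, so each misclassified minority sample costs the model more during training.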

### 📚 **What We'll Accomplish:**

#### 🔧 **Model Implementation**
- Logistic Regression (baseline linear model)
- Random Forest (ensemble tree method)
- Gradient Boosting (scikit-learn `GradientBoostingClassifier` and XGBoost)
- Support Vector Machine (SVM)

#### 📊 **Performance Analysis**
- Accuracy, Precision, Recall, F1-Score
- ROC-AUC and Precision-Recall AUC
- Confusion matrices and classification reports
- Feature importance analysis
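
To see why accuracy alone can mislead on imbalanced data, here is a small worked sketch. The labels and predictions are invented for illustration (83 negatives, 17 positives); none of the numbers come from this notebook's models.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical validation labels: 83 negatives (0) and 17 positives (1)
y_true = np.array([0] * 83 + [1] * 17)

# A "model" that always predicts the majority class
y_majority = np.zeros_like(y_true)

baseline_accuracy = accuracy_score(y_true, y_majority)
baseline_recall = recall_score(y_true, y_majority, zero_division=0)
baseline_f1 = f1_score(y_true, y_majority, zero_division=0)

# A classifier that finds 12 of the 17 positives at the cost of 8 false alarms
y_model = y_majority.copy()
y_model[83:95] = 1  # 12 true positives
y_model[:8] = 1     # 8 false positives

model_accuracy = accuracy_score(y_true, y_model)
model_precision = precision_score(y_true, y_model)
model_recall = recall_score(y_true, y_model)
model_f1 = f1_score(y_true, y_model)

print(f"Majority baseline: accuracy={baseline_accuracy:.2f}, recall={baseline_recall:.2f}, F1={baseline_f1:.2f}")
print(f"Toy classifier:    accuracy={model_accuracy:.2f}, precision={model_precision:.2f}, "
      f"recall={model_recall:.2f}, F1={model_f1:.2f}")
```

The baseline looks respectable on accuracy while catching no positives at all, which is why precision, recall, and F1 are tracked alongside it.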

#### 🎛️ **Optimization**
- Grid search for hyperparameters
- Cross-validation for robust evaluation
- Model comparison and selection
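
A minimal sketch of grid search combined with stratified cross-validation, assuming synthetic data from `make_classification` and a deliberately tiny search space over `C`. The grid values and the choice of F1 as the selection metric are illustrative assumptions, not settings taken from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in data (roughly 80/20); not the notebook's dataset
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=42
)

# Hypothetical search space: regularization strength only
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    estimator=LogisticRegression(class_weight="balanced", max_iter=1000),
    param_grid=param_grid,
    scoring="f1",  # select on F1 because the classes are imbalanced
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)
search.fit(X_demo, y_demo)

print("Best C:", search.best_params_["C"])
print(f"Best mean CV F1: {search.best_score_:.3f}")
```

Each candidate is scored by its mean F1 across the five folds, and `best_estimator_` is refit on all of the data passed to `fit`.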

---

## 🚀 Ready to Train?

By the end of this notebook, you'll have:
- ✅ **Trained multiple ML models** with proper validation
- ✅ **Optimized hyperparameters** for best performance
- ✅ **Compared model performance** comprehensively
- ✅ **Selected the best model** for your use case
- ✅ **Understood feature importance** and model interpretability

Let's build some intelligent models! 🏗️🤖

In [15]:
# 📦 Step 1: Import All Machine Learning Libraries
print("📦 IMPORTING MACHINE LEARNING LIBRARIES...")
print("="*42)

# Core data science libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Set up beautiful visualizations
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 10

# Machine Learning Models
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# XGBoost - Advanced gradient boosting
try:
    import xgboost as xgb
    from xgboost import XGBClassifier
    print("✅ XGBoost available")
except ImportError:
    print("⚠️ XGBoost not available - will skip XGBoost models")
    xgb = None

# Model Selection and Validation
from sklearn.model_selection import (
    train_test_split,
    cross_val_score,
    GridSearchCV,
    RandomizedSearchCV,
    StratifiedKFold
)

# Evaluation Metrics
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
    classification_report,
    confusion_matrix,
    roc_curve,
    precision_recall_curve,
    auc
)

# Data Preprocessing
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.utils.class_weight import compute_class_weight

# Utilities
from pathlib import Path
import sys
import json
import pickle
from datetime import datetime
import time

# 📁 Step 2: Set up project structure
print("\n📁 SETTING UP PROJECT STRUCTURE...")
print("="*35)

current_dir = Path.cwd()
project_root = current_dir.parent
src_path = project_root / 'src'
sys.path.append(str(src_path))

# Create directories for saving models
models_dir = project_root / 'models'
models_dir.mkdir(exist_ok=True)

results_dir = project_root / 'results'
results_dir.mkdir(exist_ok=True)

print(f"📂 Project root: {project_root}")
print(f"📂 Models directory: {models_dir}")
print(f"📂 Results directory: {results_dir}")

# 🎯 Step 3: Set training parameters
print("\n🎯 SETTING TRAINING PARAMETERS...")
print("="*30)

# Global parameters
RANDOM_STATE = 42
TEST_SIZE = 0.2
VALIDATION_SIZE = 0.2
CV_FOLDS = 5

# Performance tracking
model_performance = {}
training_times = {}

print(f"🎲 Random state: {RANDOM_STATE}")
print(f"📊 Test set size: {TEST_SIZE*100:.0f}%")
print(f"📊 Validation size: {VALIDATION_SIZE*100:.0f}%")
print(f"🔄 Cross-validation folds: {CV_FOLDS}")

print("\n✅ Setup complete! Ready to train models.")

📦 IMPORTING MACHINE LEARNING LIBRARIES...
✅ XGBoost available

📁 SETTING UP PROJECT STRUCTURE...
📂 Project root: c:\Users\DELL\Desktop\AI-Project\AI-Project
📂 Models directory: c:\Users\DELL\Desktop\AI-Project\AI-Project\models
📂 Results directory: c:\Users\DELL\Desktop\AI-Project\AI-Project\results

🎯 SETTING TRAINING PARAMETERS...
🎲 Random state: 42
📊 Test set size: 20%
📊 Validation size: 20%
🔄 Cross-validation folds: 5

✅ Setup complete! Ready to train models.


## 📊 Step 1: Data Preparation for Training

Before training our models, we need to properly prepare our data. This includes loading our processed features, splitting data, and ensuring everything is ready for machine learning algorithms.

### 🔧 **Data Preparation Steps:**
- **📁 Load processed data** from feature engineering
- **🎯 Identify features and target** variables
- **✂️ Split data** into train/validation/test sets
- **📏 Scale features** where necessary
- **⚖️ Handle class imbalance** appropriately

### 📋 **Data Splitting Strategy:**
- **60% Training**: To teach the models
- **20% Validation**: To tune hyperparameters  
- **20% Testing**: To evaluate final performance
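
The 60/20/20 split takes two calls to `train_test_split`, and the second call needs an adjusted fraction: once 20% is held out for testing, the validation set must be 0.2 / 0.8 = 0.25 of what remains. A minimal sketch on toy data (the 1,000-row array and its 20% positive rate are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

TEST_SIZE = 0.2
VALIDATION_SIZE = 0.2

# After removing the test set, 80% of the rows remain, so the validation
# fraction of that remainder is 0.2 / 0.8 = 0.25
val_fraction_of_remainder = VALIDATION_SIZE / (1 - TEST_SIZE)

# Toy data: 1,000 rows with a 20% positive class (illustrative only)
X_demo = np.arange(1000).reshape(-1, 1)
y_demo = np.array([0] * 800 + [1] * 200)

# First split: hold out the test set
X_temp, X_test, y_temp, y_test = train_test_split(
    X_demo, y_demo, test_size=TEST_SIZE, stratify=y_demo, random_state=42
)

# Second split: carve the validation set out of the remainder
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=val_fraction_of_remainder,
    stratify=y_temp, random_state=42
)

print(len(X_train), len(X_val), len(X_test))
print(f"Positive rate per split: {y_train.mean():.2f} / {y_val.mean():.2f} / {y_test.mean():.2f}")
```

Passing `stratify` on both calls keeps the class ratio the same in all three sets.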

### 🎲 **Why Proper Splitting Matters:**
- **Prevents data leakage** between sets
- **Ensures fair evaluation** of model performance
- **Enables reliable hyperparameter tuning**
- **Provides unbiased final assessment**
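
One subtle source of leakage is preprocessing: if a scaler is fit on all training rows and cross-validation then runs on the already-scaled data, each held-out fold has influenced the scaling statistics. A hedged sketch of one way to avoid this is to put the scaler inside a scikit-learn `Pipeline`, so it is refit on each fold's training portion. The synthetic data and the name `leak_free_model` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data; not the notebook's dataset
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, weights=[0.8, 0.2], random_state=42
)

# The scaler lives inside the pipeline, so each CV fold fits it on that
# fold's training portion only and never sees the held-out rows
leak_free_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_f1 = cross_val_score(leak_free_model, X_demo, y_demo, cv=cv, scoring="f1")

print(f"F1 per fold: {fold_f1.round(3)}")
print(f"Mean F1: {fold_f1.mean():.3f}")
```

For standardization the effect of this kind of leakage is often small, but the pipeline pattern also covers heavier steps such as feature selection or resampling, where it matters more.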

In [16]:
# 📊 Step 1: Load and prepare data for training
print("📊 LOADING AND PREPARING DATA...")
print("="*32)

# Load processed data
processed_dir = project_root / 'data' / 'processed'
data_loaded = False
feature_info = None  # Defined up front so later checks never hit a NameError

# Try to load feature engineered data
try:
    df_path = processed_dir / 'feature_engineered_data.csv'
    if df_path.exists():
        df = pd.read_csv(df_path)
        print(f"✅ Loaded feature engineered data: {df.shape}")
        data_loaded = True
        
        # Try to load feature info
        feature_info_path = processed_dir / 'feature_info.json'
        if feature_info_path.exists():
            with open(feature_info_path, 'r') as f:
                feature_info = json.load(f)
            print("✅ Loaded feature information")
        else:
            feature_info = None
            
except Exception as e:
    print(f"❌ Error loading processed data: {e}")

# Fallback to raw data if needed
if not data_loaded:
    print("🔄 Falling back to raw data loading...")
    # Add raw data loading logic here if needed

if data_loaded:
    # 🎯 Step 2: Identify features and target
    print("\n🎯 IDENTIFYING FEATURES AND TARGET...")
    print("="*35)
    
    # Get target column
    if feature_info and 'target_column' in feature_info:
        target_col = feature_info['target_column']
    else:
        # Auto-detect target
        target_candidates = ['Attrition', 'attrition', 'Left', 'left']
        target_col = None
        for col in df.columns:
            if col in target_candidates or any(candidate.lower() in col.lower() for candidate in target_candidates):
                target_col = col
                break
        
        if target_col is None:
            binary_cols = [col for col in df.columns if df[col].nunique() == 2]
            if binary_cols:
                target_col = binary_cols[0]
    
    if target_col:
        print(f"🎯 Target variable: {target_col}")
        
        # Get features for ML
        if feature_info and 'ml_features' in feature_info:
            feature_columns = feature_info['ml_features']
        else:
            # Use all numerical columns except target
            feature_columns = df.select_dtypes(include=[np.number]).columns.tolist()
            if target_col in feature_columns:
                feature_columns.remove(target_col)
        
        print(f"📊 Features for ML: {len(feature_columns)}")
        
        # Prepare X and y
        X = df[feature_columns]
        y = df[target_col]
        
        print(f"✅ Data prepared: X{X.shape}, y{y.shape}")

        # Check class distribution
        class_distribution = y.value_counts()
        print("\n📊 Class distribution:")
        for class_val, count in class_distribution.items():
            percentage = (count / len(y)) * 100
            print(f"  • {class_val}: {count:,} ({percentage:.1f}%)")

        # ✂️ Step 3: Split data into train/validation/test
        print("\n✂️ SPLITTING DATA...")
        print("="*18)
        
        # First split: separate test set
        X_temp, X_test, y_temp, y_test = train_test_split(
            X, y, test_size=TEST_SIZE, 
            random_state=RANDOM_STATE, 
            stratify=y
        )
        
        # Second split: separate train and validation
        X_train, X_val, y_train, y_val = train_test_split(
            X_temp, y_temp, 
            test_size=VALIDATION_SIZE/(1-TEST_SIZE),  # Adjust for remaining data
            random_state=RANDOM_STATE, 
            stratify=y_temp
        )
        
        print("📊 Data splits:")
        print(f"  • Training: {X_train.shape[0]:,} samples ({X_train.shape[0]/len(X)*100:.1f}%)")
        print(f"  • Validation: {X_val.shape[0]:,} samples ({X_val.shape[0]/len(X)*100:.1f}%)")
        print(f"  • Test: {X_test.shape[0]:,} samples ({X_test.shape[0]/len(X)*100:.1f}%)")

        # Verify stratification worked
        print("\n✅ Class distribution maintained:")
        for dataset_name, y_split in [("Train", y_train), ("Val", y_val), ("Test", y_test)]:
            dist = y_split.value_counts(normalize=True) * 100
            print(f"  • {dataset_name}: {dist.iloc[0]:.1f}% / {dist.iloc[1]:.1f}%")

        # 📏 Step 4: Scale features if needed
        print("\n📏 FEATURE SCALING...")
        print("="*18)
        
        # Check if we need scaling (for algorithms that require it)
        # Only calculate ranges for numerical columns
        numerical_cols = X_train.select_dtypes(include=[np.number]).columns
        if len(numerical_cols) > 0:
            feature_ranges = X_train[numerical_cols].max() - X_train[numerical_cols].min()
            max_range = feature_ranges.max()
            min_range = feature_ranges.min()
            range_ratio = max_range / min_range if min_range > 0 else float('inf')
        else:
            range_ratio = 1.0
            max_range = 0.0
            min_range = 0.0
        
        print("📊 Feature scale analysis:")
        print(f"  • Max range: {max_range:.2f}")
        print(f"  • Min range: {min_range:.2f}")
        print(f"  • Range ratio: {range_ratio:.1f}:1")

        if range_ratio > 10:
            print("🔧 Applying StandardScaler...")
            scaler = StandardScaler()
            X_train_scaled = scaler.fit_transform(X_train)
            X_val_scaled = scaler.transform(X_val)
            X_test_scaled = scaler.transform(X_test)
            
            # Convert back to DataFrames
            X_train_scaled = pd.DataFrame(X_train_scaled, columns=X_train.columns, index=X_train.index)
            X_val_scaled = pd.DataFrame(X_val_scaled, columns=X_val.columns, index=X_val.index)
            X_test_scaled = pd.DataFrame(X_test_scaled, columns=X_test.columns, index=X_test.index)
            
            print("✅ Features scaled successfully")
            scaling_applied = True
        else:
            print("ℹ️ Scaling not necessary - features are similarly scaled")
            X_train_scaled = X_train.copy()
            X_val_scaled = X_val.copy()
            X_test_scaled = X_test.copy()
            scaler = None
            scaling_applied = False
        
        print("\n🎯 READY FOR MODEL TRAINING!")
        print("="*30)
        print(f"✅ Features prepared: {X_train.shape[1]} features")
        print("✅ Data split: Train/Val/Test ready")
        print(f"✅ Scaling: {'Applied' if scaling_applied else 'Not needed'}")
        print(f"✅ Target: {target_col} identified")

    else:
        print("❌ Could not identify target variable")

else:
    print("❌ No data available for training")

📊 LOADING AND PREPARING DATA...
✅ Loaded feature engineered data: (11413, 28)
✅ Loaded feature information

🎯 IDENTIFYING FEATURES AND TARGET...
🎯 Target variable: quit
📊 Features for ML: 20
✅ Data prepared: X(11413, 20), y(11413,)

📊 Class distribution:
  • 0: 9,430 (82.6%)
  • 1: 1,983 (17.4%)

✂️ SPLITTING DATA...
📊 Data splits:
  • Training: 6,847 samples (60.0%)
  • Validation: 2,283 samples (20.0%)
  • Test: 2,283 samples (20.0%)

✅ Class distribution maintained:
  • Train: 82.6% / 17.4%
  • Val: 82.6% / 17.4%
  • Test: 82.6% / 17.4%

📏 FEATURE SCALING...
📊 Feature scale analysis:
  • Max range: 4.38
  • Min range: 0.00
  • Range ratio: inf:1
🔧 Applying StandardScaler...
✅ Features scaled successfully

🎯 READY FOR MODEL TRAINING!
✅ Features prepared: 20 features
✅ Data split: Train/Val/Test ready
✅ Scaling: Applied
✅ Target: quit identified


## 🤖 Step 2: Model Definition and Training

Now let's define our ensemble of machine learning models! We'll train multiple algorithms to find the best performer for our specific dataset and problem.

### 🎯 **Our Model Arsenal:**

#### 📊 **Logistic Regression**
- **Type**: Linear classifier
- **Strengths**: Fast, interpretable, probabilistic outputs
- **Best for**: Linear relationships, baseline model
- **Scaling needed**: Yes

#### 🌳 **Random Forest**
- **Type**: Ensemble of decision trees
- **Strengths**: Handles non-linear patterns, feature importance
- **Best for**: Robust performance, mixed data types
- **Scaling needed**: No

#### ⚡ **XGBoost**
- **Type**: Gradient boosting ensemble
- **Strengths**: High performance, handles missing values
- **Best for**: Competitions, complex patterns
- **Scaling needed**: No

#### 🧠 **Support Vector Machine**
- **Type**: Margin-based classifier
- **Strengths**: Good with high dimensions, kernel tricks
- **Best for**: Text data, complex boundaries
- **Scaling needed**: Yes

### 🔧 **Training Strategy:**
- **Cross-validation** for robust performance estimation
- **Hyperparameter tuning** for optimal settings
- **Class weight balancing** for imbalanced data
- **Performance tracking** for comprehensive comparison
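
When several metrics are tracked per model, scikit-learn's `cross_validate` can score them all in a single pass over the folds, rather than refitting once per metric with `cross_val_score`. A minimal sketch on synthetic data (the dataset, the 50-tree forest, and the `summary` dictionary are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic, imbalanced stand-in data; not the notebook's dataset
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, weights=[0.8, 0.2], random_state=42
)

scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One fit per fold; every metric is computed from that same fitted model
results = cross_validate(
    RandomForestClassifier(n_estimators=50, class_weight="balanced", random_state=42),
    X_demo, y_demo, cv=cv, scoring=scoring,
)

# Collapse per-fold scores into mean and standard deviation per metric
summary = {
    metric: (results[f"test_{metric}"].mean(), results[f"test_{metric}"].std())
    for metric in scoring
}
for metric, (mean, std) in summary.items():
    print(f"{metric}: {mean:.4f} (±{std:.4f})")
```

With five metrics and five folds this performs 5 fits instead of 25, which adds up for slower models such as an SVM with probability estimates enabled.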

In [17]:
# Import required libraries for model saving
import joblib

# 🤖 Step 1: Define our model ensemble
if 'X_train' in locals() and X_train is not None:
    print("🤖 DEFINING MODEL ENSEMBLE...")
    print("="*27)
    
    # Calculate class weights for imbalanced data
    classes = np.unique(y_train)
    class_weights = compute_class_weight('balanced', classes=classes, y=y_train)
    class_weight_dict = dict(zip(classes, class_weights))
    
    print(f"⚖️ Class weights calculated: {class_weight_dict}")
    
    # Define models with their configurations
    models = {
        'Logistic Regression': {
            'model': LogisticRegression(
                random_state=RANDOM_STATE,
                class_weight='balanced',
                max_iter=1000
            ),
            'needs_scaling': True,
            'description': 'Linear classifier with balanced class weights'
        },
        
        'Random Forest': {
            'model': RandomForestClassifier(
                n_estimators=100,
                random_state=RANDOM_STATE,
                class_weight='balanced',
                n_jobs=-1
            ),
            'needs_scaling': False,
            'description': 'Ensemble of 100 decision trees'
        },
        
        'Gradient Boosting': {
            'model': GradientBoostingClassifier(
                n_estimators=100,
                random_state=RANDOM_STATE,
                learning_rate=0.1
            ),
            'needs_scaling': False,
            'description': 'Sequential boosting algorithm'
        },
        
        'Support Vector Machine': {
            'model': SVC(
                random_state=RANDOM_STATE,
                class_weight='balanced',
                probability=True  # Enable probability estimates
            ),
            'needs_scaling': True,
            'description': 'Margin-based classifier with RBF kernel'
        }
    }
    
    # Add XGBoost if available
    if xgb is not None:
        models['XGBoost'] = {
            'model': XGBClassifier(
                n_estimators=100,
                random_state=RANDOM_STATE,
                eval_metric='logloss',
                scale_pos_weight=class_weights[1]/class_weights[0]  # Handle imbalance
            ),
            'needs_scaling': False,
            'description': 'Advanced gradient boosting'
        }
    
    print(f"📊 Models defined: {len(models)} algorithms")
    for name, config in models.items():
        scaling = "Requires scaling" if config['needs_scaling'] else "No scaling needed"
        print(f"  • {name}: {config['description']} ({scaling})")

    # 🚀 Step 2: Train all models
    print("\n🚀 TRAINING ALL MODELS...")
    print("="*24)
    
    trained_models = {}
    training_results = {}
    
    for model_name, config in models.items():
        print(f"\n🔧 Training {model_name}...")
        
        start_time = time.time()
        
        try:
            # Choose appropriate dataset (scaled or original)
            if config['needs_scaling']:
                X_train_use = X_train_scaled
                X_val_use = X_val_scaled
                X_test_use = X_test_scaled
                print("  📏 Using scaled features")
            else:
                X_train_use = X_train
                X_val_use = X_val
                X_test_use = X_test
                print("  📊 Using original features")
            
            # Train the model
            model = config['model']
            model.fit(X_train_use, y_train)
            
            # Record training time
            training_time = time.time() - start_time
            
            # Make predictions on validation set
            y_val_pred = model.predict(X_val_use)
            y_val_proba = model.predict_proba(X_val_use)[:, 1] if hasattr(model, 'predict_proba') else None
            
            # Calculate validation metrics
            val_accuracy = accuracy_score(y_val, y_val_pred)
            val_precision = precision_score(y_val, y_val_pred, average='binary')
            val_recall = recall_score(y_val, y_val_pred, average='binary')
            val_f1 = f1_score(y_val, y_val_pred, average='binary')
            
            if y_val_proba is not None:
                val_roc_auc = roc_auc_score(y_val, y_val_proba)
            else:
                val_roc_auc = None
            
            # Store results
            trained_models[model_name] = {
                'model': model,
                'needs_scaling': config['needs_scaling'],
                'scaler': scaler if config['needs_scaling'] else None
            }
            
            training_results[model_name] = {
                'training_time': training_time,
                'val_accuracy': val_accuracy,
                'val_precision': val_precision,
                'val_recall': val_recall,
                'val_f1': val_f1,
                'val_roc_auc': val_roc_auc,
                'predictions': y_val_pred,
                'probabilities': y_val_proba
            }

            print(f"  ✅ Success! Training time: {training_time:.2f}s")
            roc_auc_display = f"{val_roc_auc:.4f}" if val_roc_auc is not None else 'N/A'
            print(f"     Validation F1: {val_f1:.4f}, ROC-AUC: {roc_auc_display}")

        except Exception as e:
            print(f"  ❌ Failed: {str(e)}")
            training_results[model_name] = None
    
    # 📊 Step 3: Cross-validation for robust evaluation
    print("\n📊 CROSS-VALIDATION EVALUATION...")
    print("="*32)
    
    cv_results = {}
    cv_scores = ['accuracy', 'precision', 'recall', 'f1', 'roc_auc']
    
    for model_name, model_info in trained_models.items():
        if model_info is not None:
            print(f"\n🔄 CV for {model_name}...")
            
            model = model_info['model']
            X_use = X_train_scaled if model_info['needs_scaling'] else X_train
            
            cv_results[model_name] = {}
            
            for score in cv_scores:
                try:
                    scores = cross_val_score(
                        model, X_use, y_train, 
                        cv=StratifiedKFold(n_splits=CV_FOLDS, shuffle=True, random_state=RANDOM_STATE),
                        scoring=score,
                        n_jobs=-1
                    )
                    
                    cv_results[model_name][score] = {
                        'mean': scores.mean(),
                        'std': scores.std(),
                        'scores': scores
                    }
                    
                    print(f"  • {score}: {scores.mean():.4f} (±{scores.std():.4f})")

                except Exception as e:
                    print(f"  ⚠️ {score}: Failed ({str(e)})")
                    cv_results[model_name][score] = None
    
    # 📋 Step 4: Performance summary
    print("\n📋 TRAINING SUMMARY:")
    print("="*20)

    print(f"✅ Models successfully trained: {len([r for r in training_results.values() if r is not None])}")
    print(f"❌ Models failed: {len([r for r in training_results.values() if r is None])}")
    
    # Find best performing model (by F1 score on validation)
    best_f1 = 0
    best_model_name = None
    
    for model_name, results in training_results.items():
        if results is not None and results['val_f1'] > best_f1:
            best_f1 = results['val_f1']
            best_model_name = model_name
    
    if best_model_name:
        print(f"🏆 Best model (by validation F1): {best_model_name} (F1: {best_f1:.4f})")

    print("\n🎯 Ready for detailed evaluation and model selection!")

else:
    print("⚠️ No training data available. Please run data preparation first.")

# 📁 Step 5: Save trained models and data for evaluation
# Only save when training ran and produced at least one model; otherwise the
# names used below would be undefined and this cell would raise a NameError
if 'trained_models' in locals() and trained_models:
    print("\n💾 SAVING MODELS AND DATA FOR EVALUATION...")
    print("="*40)

    # Save trained models (each entry bundles the model, its scaling flag, and the scaler)
    for model_name, model_info in trained_models.items():
        try:
            model_path = models_dir / f"{model_name.replace(' ', '_').lower()}_model.pkl"
            joblib.dump(model_info, model_path)
            print(f"✅ Saved {model_name} to {model_path}")
        except Exception as e:
            print(f"❌ Failed to save {model_name}: {e}")

    # Save test data and other variables
    data_to_save = {
        'X_test': X_test,
        'y_test': y_test,
        'X_train': X_train,
        'y_train': y_train,
        'X_test_scaled': X_test_scaled,
        'feature_columns': feature_columns,
        'target_col': target_col
    }

    try:
        data_path = models_dir / 'evaluation_data.pkl'
        joblib.dump(data_to_save, data_path)
        print(f"✅ Saved evaluation data to {data_path}")
    except Exception as e:
        print(f"❌ Failed to save evaluation data: {e}")

    print("\n🎉 All models and data saved! Ready for evaluation.")
else:
    print("⚠️ Nothing to save. Train the models first.")

🤖 DEFINING MODEL ENSEMBLE...
⚖️ Class weights calculated: {0: 0.605072463768116, 1: 2.8793103448275863}
📊 Models defined: 5 algorithms
  • Logistic Regression: Linear classifier with balanced class weights (Requires scaling)
  • Random Forest: Ensemble of 100 decision trees (No scaling needed)
  • Gradient Boosting: Sequential boosting algorithm (No scaling needed)
  • Support Vector Machine: Margin-based classifier with RBF kernel (Requires scaling)
  • XGBoost: Advanced gradient boosting (No scaling needed)

🚀 TRAINING ALL MODELS...

🔧 Training Logistic Regression...
  📏 Using scaled features
  ✅ Success! Training time: 0.01s
     Validation F1: 0.6043, ROC-AUC: 0.8508

🔧 Training Random Forest...
  📊 Using original features
  ✅ Success! Training time: 0.24s
     Validation F1: 0.9407, ROC-AUC: 0.9834

🔧 Training Gradient Boosting...
  📊 Using original features
  ✅ Success! Training time: 0.63s
     Validation F1: 0.9354, ROC-AUC: 0.9865