# DistilBERT Underperformance Diagnostic

**Problem**: DistilBERT experiment (exp_004) achieved only 0.6312 AUC vs expected 0.65-0.68
**Expected gain**: +0.03-0.05 over TF-IDF baseline (0.6253)
**Actual result**: -0.0022 (slightly worse)

**Key observations from execution logs:**
- Early stopping triggered at very low iterations: 5, 5, 9, 118, 3
- Fold 4 performed well (0.6578, 118 iterations)
- Fold 5 performed poorly (0.5980, only 3 iterations)
- Iteration counts range from 3 to 118 across folds, which suggests the validation signal is too noisy for early stopping to find a stable stopping point

**Hypotheses to investigate:**
1. Feature scaling/normalization issues with DistilBERT embeddings
2. LightGBM hyperparameters not optimal for dense neural features
3. Data leakage or validation strategy problems
4. DistilBERT embedding quality/distribution issues
5. Class imbalance handling problems
6. Early stopping too aggressive for this feature type
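
Hypothesis 6 is easier to reason about with a small model of how patience-based early stopping picks its best iteration. The following is a simplified sketch, not LightGBM's implementation; the function name `best_iteration_with_patience` and the toy validation curve are made up for illustration.

```python
def best_iteration_with_patience(val_scores, patience):
    """Simplified patience rule: stop once `patience` consecutive rounds
    fail to beat the best validation score seen so far.

    Returns (best_iteration, stopped_at), both 1-based.
    """
    best_score = float("-inf")
    best_iter = 0
    for i, score in enumerate(val_scores, start=1):
        if score > best_score:
            best_score, best_iter = score, i
        elif i - best_iter >= patience:
            return best_iter, i
    # Ran out of rounds without triggering the rule
    return best_iter, len(val_scores)


# Hypothetical noisy validation AUC curve with an early lucky peak
toy_curve = [0.60, 0.62, 0.61, 0.615, 0.61, 0.60, 0.63, 0.64]
best_iter, stopped_at = best_iteration_with_patience(toy_curve, patience=3)
print(best_iter, stopped_at)
```

On this toy curve the rule stops at round 5 and keeps round 2, even though later rounds score higher. On a small validation fold an early noisy peak can have the same effect, which is one possible reading of the 3 to 9 iteration folds.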

In [3]:
import pandas as pd
import numpy as np
import json
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score
import lightgbm as lgb
import torch
from transformers import DistilBertTokenizer, DistilBertModel
import warnings
warnings.filterwarnings('ignore')

# Set seed
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)

print("Libraries loaded successfully")

Libraries loaded successfully


## 1. Load Data and Reproduce DistilBERT Features

In [4]:
# Load data
train_path = '/home/data/train.json'
with open(train_path, 'r') as f:
    train_data = json.load(f)
train_df = pd.DataFrame(train_data)

test_path = '/home/data/test.json'
with open(test_path, 'r') as f:
    test_data = json.load(f)
test_df = pd.DataFrame(test_data)

print(f"Training samples: {len(train_df)}")
print(f"Test samples: {len(test_df)}")
print(f"Target distribution: {train_df['requester_received_pizza'].mean():.4f}")

# Combine text fields
train_df['combined_text'] = train_df['request_title'].fillna('') + ' ' + train_df['request_text_edit_aware'].fillna('')
test_df['combined_text'] = test_df['request_title'].fillna('') + ' ' + test_df['request_text_edit_aware'].fillna('')

# Load meta-features from the experiment
meta_features = [
    'total_text_length', 'title_word_count', 'total_word_count', 'word_count', 'text_length',
    'requester_number_of_posts_at_request', 'requester_number_of_comments_at_request',
    'requester_upvotes_minus_downvotes_at_request', 'requester_upvotes_plus_downvotes_at_request',
    'requester_account_age_in_days_at_request', 'requester_days_since_first_post_on_raop_at_request',
    'request_hour', 'request_minute', 'request_day_of_week', 'request_day_of_month',
    'request_month', 'request_year', 'requester_account_age_in_days_at_request_bin',
    'requester_upvotes_plus_downvotes_at_request_bin'
]

print(f"Meta-features to use: {len(meta_features)}")

Training samples: 2878
Test samples: 1162
Target distribution: 0.2484
Meta-features to use: 19


## 2. Extract DistilBERT Embeddings

In [5]:
# Load DistilBERT model and tokenizer
print("Loading DistilBERT...")
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertModel.from_pretrained('distilbert-base-uncased')
model.eval()

if torch.cuda.is_available():
    model = model.cuda()
    print("Using GPU for DistilBERT")
else:
    print("Using CPU for DistilBERT")

# Extract embeddings
def extract_distilbert_features(texts, max_length=256, batch_size=16):
    """Return the final-layer first-token ([CLS]) embedding for each text.

    Inputs longer than max_length tokens are truncated.
    """
    # Send inputs to whichever device the DistilBERT model lives on
    device = next(model.parameters()).device
    all_features = []

    for i in range(0, len(texts), batch_size):
        batch_texts = texts[i:i + batch_size]

        # Tokenize and pad to the longest sequence in the batch
        inputs = tokenizer(
            batch_texts,
            padding=True,
            truncation=True,
            max_length=max_length,
            return_tensors='pt'
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Forward pass without gradient tracking
        with torch.no_grad():
            outputs = model(**inputs)
            # Position 0 of the last hidden state is the [CLS] token
            cls_embeddings = outputs.last_hidden_state[:, 0, :].cpu().numpy()

        all_features.append(cls_embeddings)

        # Log progress every 10 batches
        if i % (batch_size * 10) == 0:
            print(f"Processed {i}/{len(texts)} texts")

    return np.vstack(all_features)

print("Extracting DistilBERT features from training data...")
train_distilbert = extract_distilbert_features(train_df['combined_text'].tolist())

print("\nExtracting DistilBERT features from test data...")
test_distilbert = extract_distilbert_features(test_df['combined_text'].tolist())

print(f"\nDistilBERT feature shape: {train_distilbert.shape}")
print(f"Sample values (first 10 dims): {train_distilbert[0, :10]}")

Loading DistilBERT...
Using GPU for DistilBERT
Extracting DistilBERT features from training data...
Processed 0/2878 texts
Processed 160/2878 texts
Processed 320/2878 texts
Processed 480/2878 texts
Processed 640/2878 texts
Processed 800/2878 texts
Processed 960/2878 texts
Processed 1120/2878 texts
Processed 1280/2878 texts
Processed 1440/2878 texts
Processed 1600/2878 texts
Processed 1760/2878 texts
Processed 1920/2878 texts
Processed 2080/2878 texts
Processed 2240/2878 texts
Processed 2400/2878 texts
Processed 2560/2878 texts
Processed 2720/2878 texts

Extracting DistilBERT features from test data...
Processed 0/1162 texts
Processed 160/1162 texts
Processed 320/1162 texts
Processed 480/1162 texts
Processed 640/1162 texts
Processed 800/1162 texts
Processed 960/1162 texts
Processed 1120/1162 texts

DistilBERT feature shape: (2878, 768)
Sample values (first 10 dims): [ 0.21099712 -0.01889989  0.02943027 -0.1861856   0.01320272 -0.2866645
  0.12797116  0.503011   -0.06609333 -0.5012868 ]
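
Hypothesis 4 (embedding quality) can be probed by swapping the first-token embedding for a mask-aware mean over all tokens, a common alternative for sentence-level features. The sketch below is an assumption about what such a variant could look like, not part of exp_004; `masked_mean_pool` is a hypothetical helper that operates on NumPy arrays so it can be checked without loading the model.

```python
import numpy as np


def masked_mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings per sequence, ignoring padded positions.

    last_hidden_state: array of shape (batch, seq_len, hidden)
    attention_mask:    array of shape (batch, seq_len), 1 = real token, 0 = padding
    """
    hidden = np.asarray(last_hidden_state, dtype=np.float64)
    mask = np.asarray(attention_mask, dtype=np.float64)[:, :, None]
    summed = (hidden * mask).sum(axis=1)
    # Guard against division by zero for an all-padding row
    counts = np.clip(mask.sum(axis=1), 1.0, None)
    return summed / counts


# Toy batch: 2 sequences, 3 token positions, hidden size 2
toy_hidden = np.array([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # last position is padding
    [[0.0, 0.0], [3.0, 3.0], [6.0, 6.0]],
])
toy_mask = np.array([[1, 1, 0], [1, 1, 1]])
pooled = masked_mean_pool(toy_hidden, toy_mask)
print(pooled)
```

Inside `extract_distilbert_features`, the inputs would be `outputs.last_hidden_state.cpu().numpy()` and `inputs['attention_mask'].cpu().numpy()`. Whether mean pooling helps on this dataset is untested here.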


## 3. Analyze DistilBERT Embedding Distribution

In [8]:
# Analyze embedding statistics
print("=== DistilBERT Embedding Distribution Analysis ===\n")

# Basic statistics
print("Embedding statistics:")
print(f"Mean: {train_distilbert.mean():.4f}")
print(f"Std: {train_distilbert.std():.4f}")
print(f"Min: {train_distilbert.min():.4f}")
print(f"Max: {train_distilbert.max():.4f}")
print(f"Range: {train_distilbert.max() - train_distilbert.min():.4f}")

# Per-dimension statistics
print(f"\nPer-dimension statistics:")
print(f"Mean of means: {train_distilbert.mean(axis=0).mean():.4f}")
print(f"Std of means: {train_distilbert.mean(axis=0).std():.4f}")
print(f"Mean of stds: {train_distilbert.std(axis=0).mean():.4f}")
print(f"Std of stds: {train_distilbert.std(axis=0).std():.4f}")

# Check for constant or near-constant dimensions
constant_dims = np.where(train_distilbert.std(axis=0) < 0.01)[0]
print(f"\nNear-constant dimensions (std < 0.01): {len(constant_dims)}")
if len(constant_dims) > 0:
    print(f"First 10 constant dims: {constant_dims[:10]}")

# Check distribution shape
print(f"\nDistribution shape analysis:")
print(f"Skewness: {pd.Series(train_distilbert.flatten()).skew():.4f}")
print(f"Kurtosis: {pd.Series(train_distilbert.flatten()).kurtosis():.4f}")

# Check for outliers using IQR method
Q1 = np.percentile(train_distilbert, 25)
Q3 = np.percentile(train_distilbert, 75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers = np.sum((train_distilbert < lower_bound) | (train_distilbert > upper_bound))
print(f"Outliers (IQR method): {outliers} ({outliers/train_distilbert.size*100:.2f}%)")

=== DistilBERT Embedding Distribution Analysis ===

Embedding statistics:
Mean: -0.0086
Std: 0.4326
Min: -7.3812
Max: 3.9338
Range: 11.3150

Per-dimension statistics:
Mean of means: -0.0086
Std of means: 0.4197
Mean of stds: 0.1025
Std of stds: 0.0227

Near-constant dimensions (std < 0.01): 0

Distribution shape analysis:
Skewness: -4.2466
Kurtosis: 59.2398
Outliers (IQR method): 56654 (2.56%)
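
The per-dimension numbers above account for the pooled standard deviation almost exactly. With population statistics (NumPy's default `ddof=0`) and the same number of samples in every dimension, the pooled variance equals the mean within-dimension variance plus the variance of the dimension means. A small check using only the printed, rounded values:

```python
import math

# Values copied from the printed output above (rounded to 4 decimals)
mean_of_stds = 0.1025   # average per-dimension std
std_of_stds = 0.0227    # spread of per-dimension stds
std_of_means = 0.4197   # spread of per-dimension means

# E[s^2] = (E[s])^2 + Var(s) gives the mean within-dimension variance
mean_within_var = mean_of_stds ** 2 + std_of_stds ** 2
between_var = std_of_means ** 2

total_std = math.sqrt(mean_within_var + between_var)
between_share = between_var / (mean_within_var + between_var)

print(f"Reconstructed pooled std: {total_std:.4f}")
print(f"Share of variance from dimension offsets: {between_share:.1%}")
```

The reconstruction lands on the reported pooled std of 0.4326, and about 94% of the pooled variance comes from differences between dimension means. This suggests the flattened skewness, kurtosis, and IQR outlier count mostly describe per-dimension offsets and not unusual samples, so per-dimension statistics are the more informative diagnostic.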


## 4. Compare Meta-Features vs DistilBERT Features

In [11]:
# Check available columns and use basic meta-features
print("Available columns in train_df:")
available_cols = train_df.columns.tolist()
print(f"Train columns: {len(available_cols)}")

# Use every int64/float64 column except identifiers, text, and the target.
# Caution: this filter also admits the `*_at_retrieval` columns, which are
# measured after the request was posted and can leak the outcome.
basic_meta_features = []
for col in available_cols:
    if col not in ['request_id', 'request_title', 'request_text_edit_aware', 'combined_text', 'requester_received_pizza']:
        if train_df[col].dtype in ['int64', 'float64']:
            basic_meta_features.append(col)

print(f"\nUsing {len(basic_meta_features)} numeric meta-features:")
print(basic_meta_features)

# Prepare meta-features
meta_features_df = train_df[basic_meta_features].copy()

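# For test, keep only columns present in both frames. meta_features_df above
# still uses the full basic_meta_features list, so the train and test
# matrices can end up with different columns.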
test_available_cols = test_df.columns.tolist()
common_features = [col for col in basic_meta_features if col in test_available_cols]
print(f"\nCommon features in both train and test: {len(common_features)}")

meta_features_test = test_df[common_features].copy()

# Handle missing values
meta_features_df = meta_features_df.fillna(0)
meta_features_test = meta_features_test.fillna(0)

print("\n=== Feature Scale Comparison ===\n")

# Meta-features statistics
meta_means = meta_features_df.mean()
meta_stds = meta_features_df.std()
meta_ranges = meta_features_df.max() - meta_features_df.min()

print("Meta-features statistics:")
print(f"Mean range: {meta_ranges.mean():.2f}")
print(f"Mean std: {meta_stds.mean():.2f}")
print(f"Max value: {meta_features_df.max().max():.2f}")
print(f"Min value: {meta_features_df.min().min():.2f}")

# DistilBERT statistics
print(f"\nDistilBERT embedding statistics:")
print(f"Mean range: {(train_distilbert.max(axis=0) - train_distilbert.min(axis=0)).mean():.4f}")
print(f"Mean std: {train_distilbert.std(axis=0).mean():.4f}")
print(f"Max value: {train_distilbert.max():.4f}")
print(f"Min value: {train_distilbert.min():.4f}")

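# Scale comparison (informational: tree models split on thresholds, so a
# difference in raw scale does not by itself hurt LightGBM)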
meta_scale = meta_features_df.max().max() - meta_features_df.min().min()
distilbert_scale = train_distilbert.max() - train_distilbert.min()
print(f"\n=== SCALE MISMATCH ===")
print(f"Meta-features scale: {meta_scale:.2f}")
print(f"DistilBERT scale: {distilbert_scale:.4f}")
print(f"Ratio (meta/distilbert): {meta_scale/distilbert_scale:.2f}x")
print(f"This means meta-features are {meta_scale/distilbert_scale:.0f}x larger in scale!")

Available columns in train_df:
Train columns: 33

Using 22 numeric meta-features:
['number_of_downvotes_of_request_at_retrieval', 'number_of_upvotes_of_request_at_retrieval', 'request_number_of_comments_at_retrieval', 'requester_account_age_in_days_at_request', 'requester_account_age_in_days_at_retrieval', 'requester_days_since_first_post_on_raop_at_request', 'requester_days_since_first_post_on_raop_at_retrieval', 'requester_number_of_comments_at_request', 'requester_number_of_comments_at_retrieval', 'requester_number_of_comments_in_raop_at_request', 'requester_number_of_comments_in_raop_at_retrieval', 'requester_number_of_posts_at_request', 'requester_number_of_posts_at_retrieval', 'requester_number_of_posts_on_raop_at_request', 'requester_number_of_posts_on_raop_at_retrieval', 'requester_number_of_subreddits_at_request', 'requester_upvotes_minus_downvotes_at_request', 'requester_upvotes_minus_downvotes_at_retrieval', 'requester_upvotes_plus_downvotes_at_request', 'requester_upvotes_p
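
The feature list printed above includes several `_at_retrieval` columns. Assuming those fields are recorded when the data was collected, after the request was posted, they can encode the outcome and should be separated from request-time features before modeling. A minimal sketch of that split follows; `split_by_timing` is a hypothetical helper, and the column names are a subset of those printed above.

```python
def split_by_timing(columns, leak_suffix="_at_retrieval"):
    """Partition column names into request-time and retrieval-time groups."""
    request_time = [c for c in columns if not c.endswith(leak_suffix)]
    retrieval_time = [c for c in columns if c.endswith(leak_suffix)]
    return request_time, retrieval_time


# Subset of the numeric columns shown in the output above
example_columns = [
    'number_of_upvotes_of_request_at_retrieval',
    'request_number_of_comments_at_retrieval',
    'requester_account_age_in_days_at_request',
    'requester_account_age_in_days_at_retrieval',
    'requester_number_of_posts_at_request',
    'requester_number_of_subreddits_at_request',
]

safe_cols, leaky_cols = split_by_timing(example_columns)
print("Request-time features:", safe_cols)
print("Retrieval-time features (candidate leaks):", leaky_cols)
```

Applied to `basic_meta_features`, this would keep only the request-time columns. The jump from roughly 0.63 AUC in exp_004 to roughly 0.87 in the scaling comparison is consistent with this kind of leakage, although this notebook does not isolate it.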

## 5. Test Different Scaling Approaches with LightGBM

In [12]:
# Prepare combined features with PROPER SCALING
print("=== Testing Different Scaling Approaches ===\n")

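from sklearn.preprocessing import RobustScaler  # StandardScaler is already imported above

# NOTE: every scaler below is fit on all training rows before the CV split,
# so validation rows influence the scaling statistics.

# Approach 1: No scaling (as in the original experiment)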
print("Approach 1: No scaling (original experiment)")
X_combined_raw = np.hstack([meta_features_df.values, train_distilbert])
print(f"Shape: {X_combined_raw.shape}")
print(f"Meta-feature scale: {meta_features_df.max().max():.2f}")
print(f"DistilBERT scale: {train_distilbert.max():.4f}")
print(f"Scale ratio: {meta_features_df.max().max() / train_distilbert.max():.0f}x\n")

# Approach 2: Standardize meta-features only
print("Approach 2: Standardize meta-features only")
scaler_meta = StandardScaler()
meta_scaled = scaler_meta.fit_transform(meta_features_df.values)
X_combined_meta_scaled = np.hstack([meta_scaled, train_distilbert])
print(f"Meta-features after scaling - Mean: {meta_scaled.mean():.4f}, Std: {meta_scaled.std():.4f}")
print(f"DistilBERT unchanged - Mean: {train_distilbert.mean():.4f}, Std: {train_distilbert.std():.4f}\n")

# Approach 3: Standardize both
print("Approach 3: Standardize both meta-features AND DistilBERT")
scaler_both = StandardScaler()
# Combine and scale together
X_combined_both_scaled = scaler_both.fit_transform(np.hstack([meta_features_df.values, train_distilbert]))
print(f"Combined features after scaling - Mean: {X_combined_both_scaled.mean():.4f}, Std: {X_combined_both_scaled.std():.4f}\n")

# Approach 4: RobustScaler (handles outliers better)
print("Approach 4: RobustScaler for meta-features (handles outliers)")
scaler_robust = RobustScaler()
meta_robust = scaler_robust.fit_transform(meta_features_df.values)
X_combined_robust = np.hstack([meta_robust, train_distilbert])
print(f"Meta-features after RobustScaler - Median: {np.median(meta_robust):.4f}, IQR: {np.percentile(meta_robust, 75) - np.percentile(meta_robust, 25):.4f}\n")

# Prepare target
y = train_df['requester_received_pizza'].values

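# Compare the scaling approaches under 5-fold stratified CV
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=RANDOM_SEED)

print("=== Testing LightGBM with Different Configurations ===\n")

# NOTE: Configs 1-4 below are printed for reference only. The loop further
# down keeps one set of hyperparameters (learning_rate=0.1, num_leaves=31,
# early stopping after 30 rounds) and varies only the feature scaling.
# Configuration 1: Original parameters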
print("Config 1: Original parameters (from experiment)")
print("- num_leaves: 31")
print("- learning_rate: 0.1") 
print("- n_estimators: 1000")
print("- scale_pos_weight: auto")
print("- early_stopping: 50 rounds")
print("- NO SCALING (scale mismatch problem)\n")

# Configuration 2: Fixed scaling + adjusted early stopping
print("Config 2: Standardize both + reduced early stopping")
print("- StandardScaler on ALL features")
print("- early_stopping_rounds: 10 (instead of 50)")
print("- Other params same\n")

# Configuration 3: Optimized for dense features
print("Config 3: Optimized for dense neural features")
print("- StandardScaler on ALL features") 
print("- num_leaves: 63 (more capacity for dense features)")
print("- learning_rate: 0.05 (slower learning)")
print("- early_stopping: 30 rounds")
print("- feature_fraction: 0.8 (feature sampling)\n")

# Configuration 4: Separate scaling
print("Config 4: Separate scaling strategies")
print("- RobustScaler for meta-features (handles outliers)")
print("- DistilBERT embeddings unchanged (already normalized)")
print("- num_leaves: 31")
print("- learning_rate: 0.1")
print("- early_stopping: 30 rounds\n")

# Run the comparison: fixed hyperparameters, four differently scaled feature matrices
results = []

for config_name, X_features in [
    ("No Scaling (Original)", X_combined_raw),
    ("Meta-Only Scaling", X_combined_meta_scaled), 
    ("Both Scaled", X_combined_both_scaled),
    ("RobustScaler", X_combined_robust)
]:
    print(f"=== Testing: {config_name} ===")
    
    fold_scores = []
    fold_iterations = []
    
    for fold, (train_idx, val_idx) in enumerate(cv.split(X_features, y)):
        X_train, X_val = X_features[train_idx], X_features[val_idx]
        y_train, y_val = y[train_idx], y[val_idx]
        
        # Weight positives by the negative/positive ratio of the training fold
        scale_pos_weight = (len(y_train) - sum(y_train)) / sum(y_train)
        
        # Named clf so it does not overwrite the DistilBERT `model` loaded earlier
        clf = lgb.LGBMClassifier(
            n_estimators=1000,
            learning_rate=0.1,
            num_leaves=31,
            scale_pos_weight=scale_pos_weight,
            random_state=RANDOM_SEED,
            n_jobs=-1
        )
        
        # Early stopping monitors the same fold that is scored below,
        # so the reported AUC is optimistically biased
        clf.fit(
            X_train, y_train,
            eval_set=[(X_val, y_val)],
            eval_metric='auc',
            callbacks=[
                lgb.early_stopping(30, verbose=False),
                lgb.log_evaluation(0)
            ]
        )
        
        # Predict and evaluate
        val_pred = clf.predict_proba(X_val)[:, 1]
        score = roc_auc_score(y_val, val_pred)
        
        fold_scores.append(score)
        fold_iterations.append(clf.best_iteration_)
        
        print(f"  Fold {fold + 1}: AUC = {score:.4f}, Iterations = {clf.best_iteration_}")
    
    mean_score = np.mean(fold_scores)
    std_score = np.std(fold_scores)
    mean_iterations = np.mean(fold_iterations)
    
    print(f"  Mean AUC: {mean_score:.4f} ± {std_score:.4f}")
    print(f"  Mean iterations: {mean_iterations:.1f}")
    print()
    
    results.append({
        'config': config_name,
        'mean_auc': mean_score,
        'std_auc': std_score,
        'mean_iterations': mean_iterations
    })

# Summary
print("=== SUMMARY OF RESULTS ===")
results_df = pd.DataFrame(results).sort_values('mean_auc', ascending=False)
print(results_df.to_string(index=False))

=== Testing Different Scaling Approaches ===

Approach 1: No scaling (original experiment)
Shape: (2878, 790)
Meta-feature scale: 1381373472.00
DistilBERT scale: 3.9338
Scale ratio: 351153431x

Approach 2: Standardize meta-features only
Meta-features after scaling - Mean: -0.0000, Std: 1.0000
DistilBERT unchanged - Mean: -0.0086, Std: 0.4326

Approach 3: Standardize both meta-features AND DistilBERT
Combined features after scaling - Mean: 0.0000, Std: 1.0000

Approach 4: RobustScaler for meta-features (handles outliers)
Meta-features after RobustScaler - Median: 0.0000, IQR: 0.6979

=== Testing LightGBM with Different Configurations ===

Config 1: Original parameters (from experiment)
- num_leaves: 31
- learning_rate: 0.1
- n_estimators: 1000
- scale_pos_weight: auto
- early_stopping: 50 rounds
- NO SCALING (scale mismatch problem)

Config 2: Standardize both + reduced early stopping
- StandardScaler on ALL features
- early_stopping_rounds: 10 (instead of 50)
- Other params same

Confi

[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.036024 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199296
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 1: AUC = 0.8857, Iterations = 48
[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.038739 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199294
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 2: AUC = 0.8559, Iterations = 22
[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.040076 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199368
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 3: AUC = 0.8781, Iterations = 43
[LightGBM] [Info] Number of positive: 572, number of negative: 1731
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.044239 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199289
[LightGBM] [Info] Number of data points in the train set: 2303, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248372 -> initscore=-1.107316
[LightGBM] [Info] Start training from score -1.107316


  Fold 4: AUC = 0.8526, Iterations = 24
[LightGBM] [Info] Number of positive: 572, number of negative: 1731
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.039439 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199293
[LightGBM] [Info] Number of data points in the train set: 2303, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248372 -> initscore=-1.107316
[LightGBM] [Info] Start training from score -1.107316


  Fold 5: AUC = 0.8861, Iterations = 32
  Mean AUC: 0.8717 ± 0.0146
  Mean iterations: 33.8

=== Testing: Meta-Only Scaling ===
[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.040146 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199307
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 1: AUC = 0.8907, Iterations = 43
[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.036544 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199307
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 2: AUC = 0.8585, Iterations = 19
[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.036701 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199390
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 3: AUC = 0.8805, Iterations = 37
[LightGBM] [Info] Number of positive: 572, number of negative: 1731
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.036822 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199302
[LightGBM] [Info] Number of data points in the train set: 2303, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248372 -> initscore=-1.107316
[LightGBM] [Info] Start training from score -1.107316


  Fold 4: AUC = 0.8495, Iterations = 22
[LightGBM] [Info] Number of positive: 572, number of negative: 1731
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.036776 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199306
[LightGBM] [Info] Number of data points in the train set: 2303, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248372 -> initscore=-1.107316
[LightGBM] [Info] Start training from score -1.107316


  Fold 5: AUC = 0.8877, Iterations = 21
  Mean AUC: 0.8734 ± 0.0164
  Mean iterations: 28.4

=== Testing: Both Scaled ===
[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.036731 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199307
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 1: AUC = 0.8868, Iterations = 50
[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.049670 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199307
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 2: AUC = 0.8648, Iterations = 31
[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.036209 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199390
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 3: AUC = 0.8750, Iterations = 36
[LightGBM] [Info] Number of positive: 572, number of negative: 1731
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.036600 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199302
[LightGBM] [Info] Number of data points in the train set: 2303, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248372 -> initscore=-1.107316
[LightGBM] [Info] Start training from score -1.107316


  Fold 4: AUC = 0.8479, Iterations = 24
[LightGBM] [Info] Number of positive: 572, number of negative: 1731
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.035854 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199306
[LightGBM] [Info] Number of data points in the train set: 2303, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248372 -> initscore=-1.107316
[LightGBM] [Info] Start training from score -1.107316


  Fold 5: AUC = 0.8819, Iterations = 41
  Mean AUC: 0.8713 ± 0.0138
  Mean iterations: 36.4

=== Testing: RobustScaler ===
[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.035943 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199295
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 1: AUC = 0.8850, Iterations = 50
[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.035815 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199295
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 2: AUC = 0.8528, Iterations = 25
[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.035954 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199372
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 3: AUC = 0.8809, Iterations = 41
[LightGBM] [Info] Number of positive: 572, number of negative: 1731
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.035928 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199290
[LightGBM] [Info] Number of data points in the train set: 2303, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248372 -> initscore=-1.107316
[LightGBM] [Info] Start training from score -1.107316


  Fold 4: AUC = 0.8548, Iterations = 29
[LightGBM] [Info] Number of positive: 572, number of negative: 1731
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.053841 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 199293
[LightGBM] [Info] Number of data points in the train set: 2303, number of used features: 790
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248372 -> initscore=-1.107316
[LightGBM] [Info] Start training from score -1.107316


  Fold 5: AUC = 0.8805, Iterations = 25
  Mean AUC: 0.8708 ± 0.0140
  Mean iterations: 34.0

=== SUMMARY OF RESULTS ===
               config  mean_auc  std_auc  mean_iterations
    Meta-Only Scaling  0.873371 0.016430             28.4
No Scaling (Original)  0.871676 0.014564             33.8
          Both Scaled  0.871275 0.013829             36.4
         RobustScaler  0.870791 0.013971             34.0
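
The four scaling variants differ by about 0.003 in mean AUC, well inside the fold-to-fold standard deviation of roughly 0.014 to 0.016, so the table gives no evidence that scaling matters here. That is what one would expect from a tree model: a threshold split depends only on the ordering of a feature's values, and any strictly increasing rescaling preserves that ordering. The sketch below illustrates the idea with a hand-rolled decision stump on made-up values. It is not LightGBM, whose histogram binning can introduce small numerical differences of the kind seen in the table.

```python
import math


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_stump_partition(xs, ys):
    """Return the row indices sent left by the Gini-optimal threshold split."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_impurity, best_left = float("inf"), frozenset()
    for cut in range(1, len(order)):
        if xs[order[cut - 1]] == xs[order[cut]]:
            continue  # no threshold separates equal values
        left, right = order[:cut], order[cut:]
        impurity = (
            len(left) * gini([ys[i] for i in left])
            + len(right) * gini([ys[i] for i in right])
        ) / len(order)
        if impurity < best_impurity:
            best_impurity, best_left = impurity, frozenset(left)
    return best_left


# Made-up feature with a very wide raw range, plus made-up labels
raw = [1.4e9, 12.0, 5400.0, 87.0, 260000.0, 3.0, 950.0, 41.0]
labels = [1, 0, 1, 0, 1, 0, 0, 0]

# Two strictly increasing rescalings of the same feature
mean = sum(raw) / len(raw)
std = math.sqrt(sum((x - mean) ** 2 for x in raw) / len(raw))
standardized = [(x - mean) / std for x in raw]
log_scaled = [math.log1p(x) for x in raw]

partitions = [best_stump_partition(f, labels) for f in (raw, standardized, log_scaled)]
print(partitions[0] == partitions[1] == partitions[2])
```

All three versions of the feature send the same rows left, so the stump's predictions are unchanged by the rescaling.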
