# Evolver Loop 3: Analysis of exp_003 Results

## Goal
Analyze the successful TF-IDF fixes from exp_003 and identify next steps to reach the gold threshold (0.979080).

Current best: 0.6555 AUC (exp_003)
Gold threshold: 0.979080
Gap: 0.3236 points

## Key Findings from exp_003
- Character n-grams account for 4 of the top 10 features by importance
- Feature selection helped: 3,000 features scored better than 10,000
- English stop words were removed, focusing the vocabulary on domain terms
- CV AUC improved from 0.6217 → 0.6555 (+0.0338)
- Low fold-to-fold standard deviation (±0.0104) suggests a stable model
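The headline numbers can be checked directly. Dividing the remaining gap by the exp_003 gain gives a rough count of how many improvements of that size would be needed. This is an optimistic reading, since it assumes gains of that size keep coming:

```python
# Numbers copied from the Goal and Key Findings sections above
current_best = 0.6555   # exp_003 CV AUC
previous_cv = 0.6217    # CV AUC before the exp_003 fixes
gold = 0.979080         # gold threshold

gap = gold - current_best
gain = current_best - previous_cv

print(f"Gap to gold: {gap:.4f} AUC")
print(f"exp_003 gain: {gain:.4f} AUC ({gain / previous_cv:.1%} relative)")
print(f"exp_003-sized gains needed: {gap / gain:.1f}")
```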

In [2]:
import pandas as pd
import numpy as np
import json
import re
from collections import Counter
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from scipy.sparse import csr_matrix, hstack
import warnings
warnings.filterwarnings('ignore')

# Set seed
np.random.seed(42)

# Load data
print("Loading data...")
with open('/home/data/train.json', 'r') as f:
    train_data = json.load(f)
with open('/home/data/test.json', 'r') as f:
    test_data = json.load(f)

train = pd.DataFrame(train_data)
test = pd.DataFrame(test_data)

print(f"Train: {len(train)} samples, {sum(train['requester_received_pizza'])} positive ({sum(train['requester_received_pizza'])/len(train):.3f})")
print(f"Test: {len(test)} samples")

# Extract text
y = train['requester_received_pizza'].values
text_train = train['request_text_edit_aware'].fillna('').str.lower()
text_test = test['request_text_edit_aware'].fillna('').str.lower()

Loading data...
Train: 2878 samples, 715 positive (0.248)
Test: 1162 samples
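The positive rate printed here reappears in the LightGBM logs of cell [7] as `pavg`. The logged `initscore` is consistent with binary boosting starting from the log-odds of that rate. The following quick check uses the training-fold counts from those log lines (`log_odds` is an illustrative helper):

```python
import math

def log_odds(p):
    """Natural log of the odds p / (1 - p), for 0 < p < 1."""
    return math.log(p / (1 - p))

print(f"Overall positive rate: {715 / 2878:.3f}")  # 0.248 in the output above

# Training-fold class counts taken from the LightGBM [Info] lines in cell [7]
for n_pos, n_neg in [(572, 1730), (572, 1731)]:
    pavg = n_pos / (n_pos + n_neg)
    print(f"pavg={pavg:.6f} -> log-odds={log_odds(pavg):.6f}")
```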


In [3]:
# Analyze character n-gram patterns from exp_003
# We need to understand what the top character n-grams represent

# Recreate the char n-gram vectorizer used in exp_003
char_vectorizer = TfidfVectorizer(
    analyzer='char',
    ngram_range=(3, 5),
    max_features=2000,
    min_df=2,
    max_df=0.9,
    sublinear_tf=True
)

print("Fitting character n-gram vectorizer...")
X_char_train = char_vectorizer.fit_transform(text_train)
X_char_test = char_vectorizer.transform(text_test)

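char_feature_names = char_vectorizer.get_feature_names_out()
print(f"Character n-grams shape: {X_char_train.shape}")
# NOTE: get_feature_names_out() returns the vocabulary in alphabetical order,
# so the "top 20" printed here are the first 20 n-grams alphabetically,
# not the 20 most important ones
print("Top 20 character n-grams:")
for i, name in enumerate(char_feature_names[:20]):
    print(f"  {i+1:2d}. {name}")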

# Let's see what these n-grams actually correspond to in the text
# by finding examples where they appear

def find_ngram_examples(text_series, ngram, n_examples=3):
    """Find examples of texts containing a specific n-gram"""
    examples = []
    for idx, text in enumerate(text_series):
        if ngram in text:
            # Find the context around the n-gram
            pos = text.find(ngram)
            start = max(0, pos - 50)
            end = min(len(text), pos + len(ngram) + 50)
            context = text[start:end].replace('\n', ' ')
            examples.append(context)
            if len(examples) >= n_examples:
                break
    return examples

print("\n" + "="*80)
print("ANALYZING TOP CHARACTER N-GRAMS")
print("="*80)

# Inspect the first 10 n-grams of the vocabulary (alphabetical order, not importance)
for i in range(min(10, len(char_feature_names))):
    ngram = char_feature_names[i]
    print(f"\n{i+1:2d}. '{ngram}'")
    
    # Find examples in successful and failed requests
    success_examples = find_ngram_examples(
        text_train[y == 1], ngram, n_examples=2
    )
    fail_examples = find_ngram_examples(
        text_train[y == 0], ngram, n_examples=2
    )
    
    if success_examples:
        print(f"   In SUCCESSFUL requests:")
        for ex in success_examples:
            print(f"      - ...{ex}...")
    
    if fail_examples:
        print(f"   In FAILED requests:")
        for ex in fail_examples:
            print(f"      - ...{ex}...")
    
    # Count frequency (regex=False: n-grams may contain regex metacharacters such as '.' or '?')
    success_count = sum(text_train[y == 1].str.contains(ngram, regex=False, na=False))
    fail_count = sum(text_train[y == 0].str.contains(ngram, regex=False, na=False))
    success_rate = success_count / len(text_train[y == 1]) if len(text_train[y == 1]) > 0 else 0
    fail_rate = fail_count / len(text_train[y == 0]) if len(text_train[y == 0]) > 0 else 0
    
    print(f"   Frequency: {success_count}/{len(text_train[y == 1])} ({success_rate:.3f}) in successes")
    print(f"   Frequency: {fail_count}/{len(text_train[y == 0])} ({fail_rate:.3f}) in failures")

Fitting character n-gram vectorizer...


Character n-grams shape: (2878, 2000)
Top 20 character n-grams:
   1.  a 
   2.  a b
   3.  a c
   4.  a f
   5.  a l
   6.  a n
   7.  a p
   8.  a pi
   9.  a r
  10.  a s
  11.  a w
  12.  ab
  13.  abo
  14.  abou
  15.  ac
  16.  acc
  17.  af
  18.  aft
  19.  afte
  20.  ag

ANALYZING TOP CHARACTER N-GRAMS

 1. ' a '
   In SUCCESSFUL requests:
      - ...i will go ahead and say that i got a pizza meal from here before as to not seem like i'...
      - ...zalodad and myself would love to have a pizza with our kids tonight! my husband lost his j...
   In FAILED requests:
      - ...i will soon be going on a long deployment which i'm not aloud to discuss but...
      - ...ould all really appreciate it, and would even send a picture of the three of us enjoying the said pizza...
   Frequency: 616/715 (0.862) in successes
   Frequency: 1693/2163 (0.783) in failures

 2. ' a b'
   In SUCCESSFUL requests:
      - ...ing until they call me in for that expendable job, a background check f

In [4]:
# Now let's analyze what these character n-grams actually represent
# The top features in exp_003 were: char_ss, char_f a, char_e a, char_thi, char_ere

# Let's look for these specific patterns
key_patterns = ['ss', 'f a', 'e a', 'thi', 'ere', ' a ', ' a p', ' a pi']

print("="*80)
print("DEEP DIVE: KEY CHARACTER PATTERNS FROM exp_003")
print("="*80)

for pattern in key_patterns:
    print(f"\nPattern: '{pattern}'")
    
    # Find examples
    success_examples = find_ngram_examples(text_train[y == 1], pattern, n_examples=3)
    fail_examples = find_ngram_examples(text_train[y == 0], pattern, n_examples=3)
    
    # Count frequency (regex=False: match the pattern literally, not as a regex)
    success_count = sum(text_train[y == 1].str.contains(pattern, regex=False, na=False))
    fail_count = sum(text_train[y == 0].str.contains(pattern, regex=False, na=False))
    total_success = len(text_train[y == 1])
    total_fail = len(text_train[y == 0])
    
    success_rate = success_count / total_success
    fail_rate = fail_count / total_fail
    
    print(f"  Frequency in successes: {success_count}/{total_success} ({success_rate:.3f})")
    print(f"  Frequency in failures: {fail_count}/{total_fail} ({fail_rate:.3f})")
    print(f"  Ratio (success/fail): {success_rate/fail_rate:.3f}")
    
    # Show examples
    if success_examples:
        print(f"  Examples in SUCCESSFUL requests:")
        for ex in success_examples[:2]:
            print(f"    - ...{ex}...")
    
    if fail_examples:
        print(f"  Examples in FAILED requests:")
        for ex in fail_examples[:2]:
            print(f"    - ...{ex}...")
    
    # What words contain this pattern? Collect whitespace-separated tokens from
    # successful requests; patterns containing a space (e.g. 'f a') never match a token
    success_words = Counter()
    for text in text_train[y == 1]:
        words = text.split()
        for word in words:
            if pattern in word:
                success_words[word] += 1
    
    if success_words:
        print(f"  Common words with '{pattern}' in successes:")
        for word, count in success_words.most_common(5):
            print(f"    - '{word}' (x{count})")

DEEP DIVE: KEY CHARACTER PATTERNS FROM exp_003

Pattern: 'ss'
  Frequency in successes: 248/715 (0.347)
  Frequency in failures: 675/2163 (0.312)
  Ratio (success/fail): 1.111
  Examples in SUCCESSFUL requests:
    - ... all they do is convince people to not help the less-fortunate. if anyone wants to roll the dice and b...
    - ...nt to whine or drag on. basically, i've been jobless for a few months now. my sister is my roommate; i...
  Examples in FAILED requests:
    - ... on a long deployment which i'm not aloud to discuss but willing to give some info if you ask. just wa...
    - ...ever had to do this before and it's pretty embarrassing.  i literally have just change in my bank acco...
  Common words with 'ss' in successes:
    - 'less' (x17)
    - 'kindness' (x14)
    - 'guess' (x13)
    - 'pass' (x10)
    - 'boss' (x9)

Pattern: 'f a'
  Frequency in successes: 193/715 (0.270)
  Frequency in failures: 357/2163 (0.165)
  Ratio (success/fail): 1.635
  Examples in SUCCESSFUL reque

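The frequency loops in cells [3] and [4] can be folded into one helper that also guards against two edge cases. First, `str.contains` treats its pattern as a regular expression by default. Second, the success/fail ratio divides by zero when a pattern never appears in failures. Below is a minimal sketch; the helper name and the add-one smoothing are additions, not part of exp_003:

```python
import pandas as pd

def pattern_rates(texts, labels, patterns):
    """Per-class document frequency of literal substrings, plus a smoothed ratio."""
    texts = pd.Series(texts).reset_index(drop=True)
    is_pos = pd.Series(labels).reset_index(drop=True).astype(bool)
    n_pos, n_neg = int(is_pos.sum()), int((~is_pos).sum())
    rows = []
    for p in patterns:
        # regex=False: '.', '?' and '(' are matched literally
        hit = texts.str.contains(p, regex=False, na=False)
        pos, neg = int((hit & is_pos).sum()), int((hit & ~is_pos).sum())
        rows.append({
            "pattern": p,
            "success_rate": pos / n_pos,
            "fail_rate": neg / n_neg,
            # Add-one (Laplace) smoothing keeps the ratio finite when neg == 0
            "ratio": ((pos + 1) / (n_pos + 2)) / ((neg + 1) / (n_neg + 2)),
        })
    return pd.DataFrame(rows).sort_values("ratio", ascending=False, ignore_index=True)

result = pattern_rates(
    ["a pizza please", "no pizza", "please help", "hello."],
    [1, 0, 1, 0],
    ["please", "pizza", "."],
)
print(result)
```

With regex matching, `'.'` would match every request; here it matches only the one text that contains a literal dot.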
In [5]:
# Let's analyze the memory usage patterns from exp_003
# This will help us understand the OOM warning and plan future experiments

import sys
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
import lightgbm as lgb

# Recreate the feature engineering from exp_003
print("="*80)
print("MEMORY ANALYSIS: Recreating exp_003 feature engineering")
print("="*80)

# 1. TF-IDF word features (3000 features)
print("\n1. Creating TF-IDF word features...")
word_vectorizer = TfidfVectorizer(
    stop_words='english',
    max_features=3000,
    min_df=2,
    max_df=0.9,
    sublinear_tf=True
)
X_word_train = word_vectorizer.fit_transform(text_train)
X_word_test = word_vectorizer.transform(text_test)
print(f"   Word TF-IDF shape: {X_word_train.shape}")
print(f"   Memory usage: {X_word_train.data.nbytes / 1024**2:.2f} MB")

# 2. Character n-grams (2000 features)
print("\n2. Creating character n-gram features...")
char_vectorizer = TfidfVectorizer(
    analyzer='char',
    ngram_range=(3, 5),
    max_features=2000,
    min_df=2,
    max_df=0.9,
    sublinear_tf=True
)
X_char_train = char_vectorizer.fit_transform(text_train)
X_char_test = char_vectorizer.transform(text_test)
print(f"   Char n-gram shape: {X_char_train.shape}")
print(f"   Memory usage: {X_char_train.data.nbytes / 1024**2:.2f} MB")

# 3. Psycholinguistic features
print("\n3. Creating psycholinguistic features...")

FAMILY_WORDS = ['family', 'kids', 'child', 'children', 'mom', 'dad']
HARDSHIP_WORDS = ['broke', 'poor', 'unemployed', 'jobless', 'struggling']

def extract_psych_features(text):
    """Hand-crafted counts from exp_003; one function keeps train and test identical."""
    return {
        # Basic patterns
        'exclamation_count': text.count('!'),
        'question_count': text.count('?'),
        # NOTE: text_train/text_test are lowercased in cell [2], so this is always 0;
        # count capitals on the raw request text to make it informative
        'caps_count': sum(1 for c in text if c.isupper()),
        'length': len(text),
        'word_count': len(text.split()),
        # Enhanced patterns from exp_003
        'imgur_link': 1 if 'imgur.com' in text else 0,
        'please_count': text.count('please'),
        'thanks_count': text.count('thank'),
        # Substring counts: 'children' also matches 'child', 'moment' matches 'mom'
        'family_words': sum(text.count(w) for w in FAMILY_WORDS),
        'hardship_words': sum(text.count(w) for w in HARDSHIP_WORDS),
        # Phrase patterns from exp_003
        'would_be': text.count('would be'),
        'could_be': text.count('could be'),
        'appreciate': text.count('appreciate'),
        'any_kind': text.count('any kind'),
        'any_place': text.count('any place'),
    }

X_psych_train = pd.DataFrame([extract_psych_features(t) for t in text_train])
X_psych_test = pd.DataFrame([extract_psych_features(t) for t in text_test])

print(f"   Psych features shape: {X_psych_train.shape}")
# NOTE: .values can return a view, and sys.getsizeof() of a view counts only the
# array header (consistent with the 0.00 MB below); X_psych_train.values.nbytes,
# about 0.33 MB for 2,878 x 15 int64 values, measures the data itself
print(f"   Memory usage: {sys.getsizeof(X_psych_train.values) / 1024**2:.2f} MB")

# Combine all features
print("\n4. Combining all features...")
X_combined_train = hstack([X_word_train, X_char_train, X_psych_train.values])
X_combined_test = hstack([X_word_test, X_char_test, X_psych_test.values])
print(f"   Combined shape: {X_combined_train.shape}")
# .data.nbytes counts only the stored non-zero values; the sparse index arrays
# are not included, so these MB figures understate the true footprint
print(f"   Combined memory usage: {X_combined_train.data.nbytes / 1024**2:.2f} MB")

# Check sparsity. NOTE: nnz / (rows * cols) is the *density* (share of non-zero
# cells): 8.44% non-zero means the matrix is about 91.6% sparse
sparsity = (X_combined_train.nnz / (X_combined_train.shape[0] * X_combined_train.shape[1])) * 100
print(f"   Sparsity: {sparsity:.2f}%")

print("\n" + "="*80)
print("MEMORY SUMMARY")
print("="*80)
print(f"Total training features: {X_combined_train.shape[1]:,}")
print(f"Total memory usage: {X_combined_train.data.nbytes / 1024**2:.2f} MB")
print(f"Samples: {X_combined_train.shape[0]:,}")
print(f"Sparsity: {sparsity:.2f}% (lower is sparser)")

MEMORY ANALYSIS: Recreating exp_003 feature engineering

1. Creating TF-IDF word features...
   Word TF-IDF shape: (2878, 3000)
   Memory usage: 0.58 MB

2. Creating character n-gram features...


   Char n-gram shape: (2878, 2000)
   Memory usage: 8.63 MB

3. Creating psycholinguistic features...
   Psych features shape: (2878, 15)
   Memory usage: 0.00 MB

4. Combining all features...
   Combined shape: (2878, 5015)
   Combined memory usage: 9.29 MB
   Sparsity: 8.44%

MEMORY SUMMARY
Total training features: 5,015
Total memory usage: 9.29 MB
Samples: 2,878
Sparsity: 8.44% (lower is sparser)
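Two numbers in the memory summary need care. First, `.data.nbytes` covers only the stored values of a sparse matrix, not its index arrays. Second, `nnz / (rows * cols)` is density rather than sparsity. The sketch below measures both (helper names are illustrative):

```python
import numpy as np
import scipy.sparse as sp

def sparse_nbytes(m):
    """Bytes held by all arrays backing a scipy sparse matrix (values + indices)."""
    if m.format in ("csr", "csc"):
        return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes
    if m.format == "coo":
        return m.data.nbytes + m.row.nbytes + m.col.nbytes
    raise ValueError(f"unsupported sparse format: {m.format}")

def density(m):
    """Fraction of cells that are explicitly stored: nnz / (rows * cols)."""
    return m.nnz / (m.shape[0] * m.shape[1])

m = sp.csr_matrix(np.eye(4))
print(m.data.nbytes, sparse_nbytes(m), density(m))
print(m.tocoo().data.nbytes, sparse_nbytes(m.tocoo()))
```

Applied to this notebook, `sparse_nbytes(X_combined_train) / 1024**2` should give a fuller figure than the 9.29 MB reported above.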

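Two details of the psycholinguistic features in cell [5] are worth checking. `caps_count` is computed on text that cell [2] already lowercased, so it is always 0. And `str.count` matches substrings, so 'children' also counts as 'child' and 'moment' as 'mom'. The sketch below shows word-boundary counting and a caps ratio computed on raw text. The helper names and the ratio form are assumptions, not exp_003's definitions:

```python
import re

FAMILY_WORDS = ['family', 'kids', 'child', 'children', 'mom', 'dad']
# Whole-word alternative to the substring counting in cell [5]
FAMILY_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, FAMILY_WORDS)) + r")\b")

def substring_count(text, words):
    """Cell [5] behaviour: str.count per word, so substrings inside longer words count."""
    return sum(text.count(w) for w in words)

def whole_word_count(text, pattern=FAMILY_RE):
    """Count only whole-word matches."""
    return len(pattern.findall(text))

def caps_ratio(raw_text):
    """Share of letters that are uppercase; must run BEFORE lowercasing."""
    letters = [c for c in raw_text if c.isalpha()]
    return sum(c.isupper() for c in letters) / len(letters) if letters else 0.0

sample = "my children and my mom at the moment"
print(substring_count(sample, FAMILY_WORDS), whole_word_count(sample))
print(caps_ratio("HELP please"), caps_ratio("HELP please".lower()))
```

In the notebook, the raw text would be `train['request_text_edit_aware'].fillna('')`, taken before `.str.lower()` is applied.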

In [7]:
# Now let's test different model configurations to understand performance vs memory tradeoffs
# This will help us plan the next experiments

print("="*80)
print("MODEL PERFORMANCE ANALYSIS")
print("="*80)

# Test different LGBM configurations
configs = [
    {"name": "exp_003_config", "n_estimators": 500, "learning_rate": 0.05, "num_leaves": 31, "max_depth": -1},
    {"name": "light_config", "n_estimators": 300, "learning_rate": 0.1, "num_leaves": 20, "max_depth": 5},
    {"name": "deep_config", "n_estimators": 800, "learning_rate": 0.03, "num_leaves": 50, "max_depth": 8},
    {"name": "balanced_config", "n_estimators": 600, "learning_rate": 0.05, "num_leaves": 31, "max_depth": 6},
]

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# hstack returned a COO matrix here, which does not support row indexing;
# convert to CSR once instead of once per fold
X_train_csr = X_combined_train.tocsr()

for config in configs:
    print(f"\nTesting: {config['name']}")
    print("-" * 40)
    
    fold_scores = []
    fold_times = []
    
    for fold, (train_idx, val_idx) in enumerate(cv.split(X_train_csr, y)):
        start_time = pd.Timestamp.now()
        
        X_tr, X_val = X_train_csr[train_idx], X_train_csr[val_idx]
        y_tr, y_val = y[train_idx], y[val_idx]
        
        model = lgb.LGBMClassifier(
            n_estimators=config['n_estimators'],
            learning_rate=config['learning_rate'],
            num_leaves=config['num_leaves'],
            max_depth=config['max_depth'],
            random_state=42,
            n_jobs=4
        )
        
        model.fit(X_tr, y_tr)
        preds = model.predict_proba(X_val)[:, 1]
        score = roc_auc_score(y_val, preds)
        
        end_time = pd.Timestamp.now()
        fold_time = (end_time - start_time).total_seconds()
        
        fold_scores.append(score)
        fold_times.append(fold_time)
        
        print(f"  Fold {fold+1}: AUC = {score:.4f}, Time = {fold_time:.1f}s")
    
    mean_score = np.mean(fold_scores)
    std_score = np.std(fold_scores)
    mean_time = np.mean(fold_times)
    
    print(f"  Mean: AUC = {mean_score:.4f} ± {std_score:.4f}")
    print(f"  Avg Time: {mean_time:.1f}s per fold")
    
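    # Rough memory estimate: the 2.5x multiplier is an unmeasured rule of thumb,
    # and it depends only on the feature matrix, so it is identical for every config.
    # LightGBM's real footprint depends on its binned dataset (~290k bins, per the logs)
    est_memory_mb = (X_combined_train.data.nbytes / 1024**2) * 2.5
    print(f"  Est. Memory: {est_memory_mb:.1f} MB")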

print("\n" + "="*80)
print("KEY INSIGHTS")
print("="*80)
print("1. Current config (exp_003): ~500 trees, 0.05 LR, 31 leaves")
print("2. Memory usage is manageable (~9MB features, ~23MB during training)")
print("3. We can safely increase model complexity or feature count")
print("4. Character n-grams are most memory-intensive (8.6MB vs 0.6MB for words)")
print("5. Room to add more features without OOM risk")

MODEL PERFORMANCE ANALYSIS

Testing: exp_003_config
----------------------------------------


[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.066121 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 291224
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 2621
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 1: AUC = 0.6177, Time = 24.7s


[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.063838 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 289135
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 2614
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 2: AUC = 0.6031, Time = 24.8s


[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.064597 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 291181
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 2614
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 3: AUC = 0.5679, Time = 23.6s


[LightGBM] [Info] Number of positive: 572, number of negative: 1731
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.066524 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 291739
[LightGBM] [Info] Number of data points in the train set: 2303, number of used features: 2620
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248372 -> initscore=-1.107316
[LightGBM] [Info] Start training from score -1.107316


  Fold 4: AUC = 0.5915, Time = 26.2s


[LightGBM] [Info] Number of positive: 572, number of negative: 1731
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.064673 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 290361
[LightGBM] [Info] Number of data points in the train set: 2303, number of used features: 2616
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248372 -> initscore=-1.107316
[LightGBM] [Info] Start training from score -1.107316


  Fold 5: AUC = 0.5655, Time = 26.6s
  Mean: AUC = 0.5891 ± 0.0202
  Avg Time: 25.2s per fold
  Est. Memory: 23.2 MB

Testing: light_config
----------------------------------------


[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.073837 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 291224
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 2621
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 1: AUC = 0.6230, Time = 4.9s


[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.056250 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 289135
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 2614
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 2: AUC = 0.5909, Time = 4.8s


[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.049297 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 291181
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 2614
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 3: AUC = 0.5788, Time = 4.7s


[LightGBM] [Info] Number of positive: 572, number of negative: 1731
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.066195 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 291739
[LightGBM] [Info] Number of data points in the train set: 2303, number of used features: 2620
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248372 -> initscore=-1.107316
[LightGBM] [Info] Start training from score -1.107316


  Fold 4: AUC = 0.5774, Time = 4.9s


[LightGBM] [Info] Number of positive: 572, number of negative: 1731
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.048386 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 290361
[LightGBM] [Info] Number of data points in the train set: 2303, number of used features: 2616
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248372 -> initscore=-1.107316
[LightGBM] [Info] Start training from score -1.107316


  Fold 5: AUC = 0.5728, Time = 4.5s
  Mean: AUC = 0.5886 ± 0.0182
  Avg Time: 4.7s per fold
  Est. Memory: 23.2 MB

Testing: deep_config
----------------------------------------


[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.048195 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 291224
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 2621
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 1: AUC = 0.5930, Time = 21.1s


[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.071015 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 289135
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 2614
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 2: AUC = 0.6076, Time = 21.2s


[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.065509 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 291181
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 2614
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 3: AUC = 0.5723, Time = 20.5s


[LightGBM] [Info] Number of positive: 572, number of negative: 1731
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.064748 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 291739
[LightGBM] [Info] Number of data points in the train set: 2303, number of used features: 2620
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248372 -> initscore=-1.107316
[LightGBM] [Info] Start training from score -1.107316


  Fold 4: AUC = 0.5851, Time = 22.4s


[LightGBM] [Info] Number of positive: 572, number of negative: 1731
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.070699 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 290361
[LightGBM] [Info] Number of data points in the train set: 2303, number of used features: 2616
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248372 -> initscore=-1.107316
[LightGBM] [Info] Start training from score -1.107316


  Fold 5: AUC = 0.5680, Time = 23.4s
  Mean: AUC = 0.5852 ± 0.0143
  Avg Time: 21.7s per fold
  Est. Memory: 23.2 MB

Testing: balanced_config
----------------------------------------


[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.070970 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 291224
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 2621
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 1: AUC = 0.6116, Time = 11.6s


[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.068933 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 289135
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 2614
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 2: AUC = 0.6031, Time = 11.3s


[LightGBM] [Info] Number of positive: 572, number of negative: 1730
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.070642 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 291181
[LightGBM] [Info] Number of data points in the train set: 2302, number of used features: 2614
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248480 -> initscore=-1.106738
[LightGBM] [Info] Start training from score -1.106738


  Fold 3: AUC = 0.5661, Time = 11.1s


[LightGBM] [Info] Number of positive: 572, number of negative: 1731
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.055213 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 291739
[LightGBM] [Info] Number of data points in the train set: 2303, number of used features: 2620
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248372 -> initscore=-1.107316
[LightGBM] [Info] Start training from score -1.107316


  Fold 4: AUC = 0.5863, Time = 12.4s


[LightGBM] [Info] Number of positive: 572, number of negative: 1731
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.069796 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 290361
[LightGBM] [Info] Number of data points in the train set: 2303, number of used features: 2616
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.248372 -> initscore=-1.107316
[LightGBM] [Info] Start training from score -1.107316


  Fold 5: AUC = 0.5763, Time = 12.3s
  Mean: AUC = 0.5887 ± 0.0168
  Avg Time: 11.7s per fold
  Est. Memory: 23.2 MB

KEY INSIGHTS
1. Current config (exp_003): ~500 trees, 0.05 LR, 31 leaves
2. Memory usage is manageable (~9MB features, ~23MB during training)
3. We can safely increase model complexity or feature count
4. Character n-grams are most memory-intensive (8.6MB vs 0.6MB for words)
5. Room to add more features without OOM risk
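The KEY INSIGHTS above concern memory. Tabulating the recorded fold AUCs shows two further points. First, the four configurations land within about 0.004 AUC of one another (0.5852 to 0.5891). Second, all of them sit roughly 0.066 below the 0.6555 CV reported for exp_003. So this recreation does not appear to reproduce exp_003, and tree hyperparameters alone look unlikely to close the gap. The sketch below uses fold scores copied by hand from the output above; the dictionary and helper names are illustrative:

```python
import numpy as np

# Fold AUCs copied from the cell [7] output above
FOLD_AUCS = {
    "exp_003_config": [0.6177, 0.6031, 0.5679, 0.5915, 0.5655],
    "light_config": [0.6230, 0.5909, 0.5788, 0.5774, 0.5728],
    "deep_config": [0.5930, 0.6076, 0.5723, 0.5851, 0.5680],
    "balanced_config": [0.6116, 0.6031, 0.5661, 0.5863, 0.5763],
}
EXP_003_REPORTED_CV = 0.6555  # from the Goal section

def summarize(fold_aucs, reference):
    """Return (name, mean, std, gap_to_reference) rows sorted by mean AUC."""
    rows = []
    for name, scores in fold_aucs.items():
        s = np.asarray(scores)
        # np.std defaults to the population std (ddof=0), as in cell [7]
        rows.append((name, s.mean(), s.std(), reference - s.mean()))
    return sorted(rows, key=lambda r: r[1], reverse=True)

summary = summarize(FOLD_AUCS, EXP_003_REPORTED_CV)
for name, mean, std, gap in summary:
    print(f"{name:16s} AUC {mean:.4f} ± {std:.4f}  gap to exp_003: {gap:.4f}")
```

The standard deviations may differ from the printed ones in the last digit, because the copied fold scores are rounded.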

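In cell [5] both vectorizers are fit on all 2,878 training rows before cell [7] splits them. As a result, the IDF weights and the `max_features` vocabulary are chosen with each validation fold in view. Refitting them inside each fold through a scikit-learn `Pipeline` avoids that leakage. Below is a minimal sketch using the exp_003 vectorizer settings. `LogisticRegression` stands in for LightGBM only to keep the example light, and the toy corpus is illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import FeatureUnion, Pipeline

def build_text_pipeline():
    """Word + char TF-IDF with the exp_003 settings, refit on each training fold."""
    features = FeatureUnion([
        ("word", TfidfVectorizer(stop_words="english", max_features=3000,
                                 min_df=2, max_df=0.9, sublinear_tf=True)),
        ("char", TfidfVectorizer(analyzer="char", ngram_range=(3, 5), max_features=2000,
                                 min_df=2, max_df=0.9, sublinear_tf=True)),
    ])
    # Stand-in classifier (assumption): swap in lgb.LGBMClassifier for the real run
    return Pipeline([("features", features), ("clf", LogisticRegression(max_iter=1000))])

# Illustrative toy corpus, duplicated so min_df=2 leaves a vocabulary in each fold
texts = [
    "please help my kids tonight", "my kids need pizza please",
    "lost my job please help", "family struggling please help",
    "just want free pizza", "pizza would be cool",
    "give me pizza now", "bored want pizza",
] * 2
labels = [1, 1, 1, 1, 0, 0, 0, 0] * 2

cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)
scores = cross_val_score(build_text_pipeline(), texts, labels, cv=cv, scoring="roc_auc")
print(scores)
```

On the real data the call would become `cross_val_score(build_text_pipeline(), text_train, y, cv=cv, scoring="roc_auc")`. The psycholinguistic features would need their own transformer in the union.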