# Loop 4 LB Feedback Analysis

## Submission Results
- exp_003 (Tuned CatBoost): CV 0.8195 → LB 0.8045 (gap: +0.0150)
- exp_000 (XGBoost Baseline): CV 0.8067 → LB 0.7971 (gap: +0.0097)

## Key Questions
1. Why did the CV-LB gap increase from 0.0097 to 0.0150 (0.97 to 1.50 percentage points)?
2. What approaches can reduce this gap?
3. What's the best path to beat top LB (~0.8066)?

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# CV-LB Gap Analysis
submissions = [
    {'exp': 'exp_000', 'model': 'XGBoost Baseline', 'cv': 0.80674, 'lb': 0.79705},
    {'exp': 'exp_003', 'model': 'Tuned CatBoost', 'cv': 0.81951, 'lb': 0.80453}
]

df = pd.DataFrame(submissions)
df['gap'] = df['cv'] - df['lb']
df['gap_pct'] = df['gap'] / df['cv'] * 100

print("CV-LB Gap Analysis:")
print(df.to_string(index=False))
print(f"\nAverage gap: {df['gap'].mean():.5f} ({df['gap_pct'].mean():.2f}%)")
print(f"\nObservation: Gap increased from {df.iloc[0]['gap']:.5f} to {df.iloc[1]['gap']:.5f}")
print("This suggests the tuned model is slightly overfitting to CV folds.")

CV-LB Gap Analysis:
    exp            model      cv      lb     gap  gap_pct
exp_000 XGBoost Baseline 0.80674 0.79705 0.00969 1.201130
exp_003   Tuned CatBoost 0.81951 0.80453 0.01498 1.827922

Average gap: 0.01233 (1.51%)

Observation: Gap increased from 0.00969 to 0.01498
This suggests the tuned model is slightly overfitting to CV folds.
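
Two submissions give only two gap estimates, so it may help to compare the change in gap with the sampling noise of a leaderboard score before reading it as overfitting. The following is a minimal sketch, assuming the LB score behaves like a binomial accuracy estimate. `n_lb` is a hypothetical row count (the number of rows behind the public LB is not stated in this notebook), so substitute the real value. Both LB scores come from the same rows, so the single-score standard error is only a rough yardstick.

```python
import math

def accuracy_standard_error(acc: float, n: int) -> float:
    """Binomial standard error of an accuracy estimate over n independent rows."""
    return math.sqrt(acc * (1.0 - acc) / n)

# CV-LB gaps of the two submissions, from the table above.
gap_xgb = 0.80674 - 0.79705
gap_cat = 0.81951 - 0.80453
gap_change = gap_cat - gap_xgb

# ASSUMPTION: hypothetical number of rows scored on the public leaderboard.
n_lb = 2000
se_lb = accuracy_standard_error(0.80453, n_lb)

print(f"Change in gap:            {gap_change:.5f}")
print(f"LB standard error (n={n_lb}): {se_lb:.5f}")
print(f"Change in gap / SE:       {gap_change / se_lb:.2f}")
```

If the ratio is well below 2, the wider gap is hard to distinguish from noise, and the overfitting reading should be treated as tentative.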


In [2]:
# What would different CV scores predict for LB?
print("\nLB Prediction Calibration:")
print("="*50)

# Using average gap
avg_gap = df['gap'].mean()
print(f"Using average gap of {avg_gap:.5f}:")
for cv in [0.82, 0.825, 0.83]:
    predicted_lb = cv - avg_gap
    print(f"  CV {cv:.3f} → Predicted LB {predicted_lb:.4f}")

# Using conservative gap (from tuned model)
conservative_gap = df.iloc[1]['gap']
print(f"\nUsing conservative gap of {conservative_gap:.5f} (from tuned model):")
for cv in [0.82, 0.825, 0.83]:
    predicted_lb = cv - conservative_gap
    print(f"  CV {cv:.3f} → Predicted LB {predicted_lb:.4f}")

print(f"\nTo beat top LB of ~0.8066, we need:")
print(f"  - CV of {0.8066 + avg_gap:.4f} (using avg gap)")
print(f"  - CV of {0.8066 + conservative_gap:.4f} (using conservative gap)")


LB Prediction Calibration:
Using average gap of 0.01233:
  CV 0.820 → Predicted LB 0.8077
  CV 0.825 → Predicted LB 0.8127
  CV 0.830 → Predicted LB 0.8177

Using conservative gap of 0.01498 (from tuned model):
  CV 0.820 → Predicted LB 0.8050
  CV 0.825 → Predicted LB 0.8100
  CV 0.830 → Predicted LB 0.8150

To beat top LB of ~0.8066, we need:
  - CV of 0.8189 (using avg gap)
  - CV of 0.8216 (using conservative gap)


In [3]:
# Analyze what's working and what's not
print("\nExperiment Trajectory Analysis:")
print("="*50)

experiments = [
    {'exp': 'exp_000', 'model': 'XGBoost Baseline', 'cv': 0.80674, 'lb': 0.79705, 'features': 35},
    {'exp': 'exp_001', 'model': 'XGBoost + Features', 'cv': 0.80927, 'lb': None, 'features': 56},
    {'exp': 'exp_002', 'model': '3-Model Ensemble', 'cv': 0.81353, 'lb': None, 'features': 56},
    {'exp': 'exp_003', 'model': 'Tuned CatBoost', 'cv': 0.81951, 'lb': 0.80453, 'features': 56}
]

print("\nCV Improvements:")
for i in range(1, len(experiments)):
    prev = experiments[i-1]
    curr = experiments[i]
    cv_delta = curr['cv'] - prev['cv']
    print(f"  {prev['exp']} → {curr['exp']}: {cv_delta:+.5f} ({cv_delta/prev['cv']*100:+.2f}%)")

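print("\nLB Improvement:")
# Compare the two submitted experiments (first and last) instead of hard-coding scores
first, last = experiments[0], experiments[-1]
lb_delta = last['lb'] - first['lb']
cv_delta_total = last['cv'] - first['cv']
print(f"  {first['exp']} → {last['exp']}: {lb_delta:+.5f} ({lb_delta/first['lb']*100:+.2f}%)")
print(f"\nKey insight: LB improved by {lb_delta:.4f} while CV improved by {cv_delta_total:.4f}")
print(f"LB improvement rate: {lb_delta/cv_delta_total*100:.1f}% of CV improvement")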


Experiment Trajectory Analysis:

CV Improvements:
  exp_000 → exp_001: +0.00253 (+0.31%)
  exp_001 → exp_002: +0.00426 (+0.53%)
  exp_002 → exp_003: +0.00598 (+0.74%)

LB Improvement:
  exp_000 → exp_003: +0.00748 (+0.94%)

Key insight: LB improved by 0.0075 while CV improved by 0.0128
LB improvement rate: 58.6% of CV improvement
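
The improvement rate above can also be used as a calibration, as an alternative to subtracting a constant gap. The sketch below assumes LB keeps moving at the same rate relative to CV. With only two submitted points the line passes through both exactly, so this is an interpolation rather than a fitted estimate. The helper names `predict_lb` and `required_cv` are illustrative.

```python
# Scores of the two submitted experiments, copied from the trajectory above.
cv_first, lb_first = 0.80674, 0.79705   # exp_000
cv_last, lb_last = 0.81951, 0.80453     # exp_003
target_lb = 0.8066

# Slope of the line through the two (CV, LB) points.
slope = (lb_last - lb_first) / (cv_last - cv_first)

def predict_lb(cv: float) -> float:
    """Extrapolate LB from CV along the two-point line, anchored at exp_003."""
    return lb_last + slope * (cv - cv_last)

def required_cv(lb: float) -> float:
    """Invert the line: CV needed to reach a given LB score."""
    return cv_last + (lb - lb_last) / slope

print(f"Slope (LB gain per unit of CV gain): {slope:.3f}")
for cv in [0.82, 0.825, 0.83]:
    print(f"  CV {cv:.3f} → Predicted LB {predict_lb(cv):.4f}")
print(f"CV needed for LB {target_lb}: {required_cv(target_lb):.4f}")
```

Under this assumption the CV needed to reach 0.8066 is roughly 0.823, higher than either constant-gap estimate, because each further unit of CV gain is expected to yield less than a full unit of LB gain.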


In [4]:
# Unexplored approaches analysis
print("\nUnexplored Approaches:")
print("="*50)

approaches = [
    ('Threshold tuning', 'Quick win, default 0.5 may not be optimal', 'High'),
    ('CatBoost native categoricals', 'Use cat_features instead of label encoding', 'Medium'),
    ('Feature selection', 'Remove low-importance features (56 may have noise)', 'Medium'),
    ('Stacking with meta-learner', 'Use OOF predictions as features for LR', 'Medium'),
    ('Blend baseline + tuned', 'Average to reduce variance', 'Low'),
    ('Pseudo-labeling', 'Use high-confidence test predictions', 'Low'),
    ('Different CV strategy', 'Use different seeds to reduce fold overfitting', 'Medium')
]

print(f"{'Approach':<30} {'Rationale':<50} {'Priority'}")
print("-"*90)
for approach, rationale, priority in approaches:
    print(f"{approach:<30} {rationale:<50} {priority}")


Unexplored Approaches:
Approach                       Rationale                                          Priority
------------------------------------------------------------------------------------------
Threshold tuning               Quick win, default 0.5 may not be optimal          High
CatBoost native categoricals   Use cat_features instead of label encoding         Medium
Feature selection              Remove low-importance features (56 may have noise) Medium
Stacking with meta-learner     Use OOF predictions as features for LR             Medium
Blend baseline + tuned         Average to reduce variance                         Low
Pseudo-labeling                Use high-confidence test predictions               Low
Different CV strategy          Use different seeds to reduce fold overfitting     Medium
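
Threshold tuning, the top-priority item in the table, can be sketched as a grid search over out-of-fold (OOF) probabilities. This is a minimal illustration, not the notebook's pipeline: `y_true` and `oof_proba` below are synthetic stand-ins for the real labels and the tuned CatBoost OOF predictions, and the grid bounds are an assumption.

```python
import random

def accuracy_at(y_true, proba, threshold):
    """Accuracy when predicting the positive class for proba >= threshold."""
    correct = sum((p >= threshold) == bool(y) for y, p in zip(y_true, proba))
    return correct / len(y_true)

def best_threshold(y_true, proba, grid):
    """Grid threshold with the highest accuracy; ties go to the value nearest 0.5."""
    return max(grid, key=lambda t: (accuracy_at(y_true, proba, t), -abs(t - 0.5)))

# SYNTHETIC DATA (assumption): replace with real labels and OOF probabilities.
rng = random.Random(0)
y_true = [rng.random() < 0.5 for _ in range(2000)]
oof_proba = [min(1.0, max(0.0, (0.65 if y else 0.35) + rng.gauss(0, 0.2)))
             for y in y_true]

# Candidate thresholds from 0.30 to 0.70 in steps of 0.01.
grid = [round(0.30 + 0.01 * i, 2) for i in range(41)]
t_star = best_threshold(y_true, oof_proba, grid)

print(f"Accuracy at 0.50: {accuracy_at(y_true, oof_proba, 0.5):.4f}")
print(f"Best threshold:   {t_star:.2f}")
print(f"Accuracy at best: {accuracy_at(y_true, oof_proba, t_star):.4f}")
```

A threshold chosen and scored on the same OOF predictions is optimistic. Selecting it inside each fold, or confirming the gain across several seeds, gives a fairer estimate of what may carry over to the LB.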


In [5]:
# Priority recommendations
print("\n" + "="*60)
print("PRIORITY RECOMMENDATIONS FOR NEXT LOOP")
print("="*60)

print("""
1. THRESHOLD TUNING (Immediate)
   - We have OOF predictions from tuned CatBoost
   - Default threshold of 0.5 may not be optimal
   - Target distribution is 50.36% transported
   - Potential gain: 0.1-0.3% on CV, may translate to LB

2. CATBOOST NATIVE CATEGORICAL HANDLING (High Priority)
   - Currently using label encoding for all categoricals
   - CatBoost's cat_features parameter can improve performance
   - This is a known best practice we haven't tried
   - Potential gain: 0.1-0.5%

3. BLEND BASELINE + TUNED CATBOOST (Medium Priority)
   - Baseline CatBoost: CV 0.81836, std 0.00431 (lower variance)
   - Tuned CatBoost: CV 0.81951, std 0.00685 (higher variance)
   - Blending may reduce the CV-LB gap
   - Potential gain: Better LB generalization

4. STACKING WITH META-LEARNER (Medium Priority)
   - Use OOF predictions from XGBoost, LightGBM, CatBoost
   - Train logistic regression on OOF predictions
   - May capture complementary patterns
   - Potential gain: 0.2-0.5%

5. FEATURE SELECTION (Lower Priority)
   - 56 features may include noise
   - Try removing bottom 10-20% by importance
   - May reduce overfitting and improve LB
""")

print("\nTarget: Beat LB 0.8066 (top solutions)")
print(f"Current best LB: 0.8045 (gap to target: {0.8066-0.8045:.4f})")
print("Remaining submissions: 8")


PRIORITY RECOMMENDATIONS FOR NEXT LOOP

1. THRESHOLD TUNING (Immediate)
   - We have OOF predictions from tuned CatBoost
   - Default threshold of 0.5 may not be optimal
   - Target distribution is 50.36% transported
   - Potential gain: 0.1-0.3% on CV, may translate to LB

2. CATBOOST NATIVE CATEGORICAL HANDLING (High Priority)
   - Currently using label encoding for all categoricals
   - CatBoost's cat_features parameter can improve performance
   - This is a known best practice we haven't tried
   - Potential gain: 0.1-0.5%

3. BLEND BASELINE + TUNED CATBOOST (Medium Priority)
   - Baseline CatBoost: CV 0.81836, std 0.00431 (lower variance)
   - Tuned CatBoost: CV 0.81951, std 0.00685 (higher variance)
   - Blending may reduce the CV-LB gap
   - Potential gain: Better LB generalization

4. STACKING WITH META-LEARNER (Medium Priority)
   - Use OOF predictions from XGBoost, LightGBM, CatBoost
   - Train logistic regression on OOF predictions
   - May capture complementary patterns
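
The blending idea in recommendation 3 can be sketched as a weighted average of per-model probabilities. This is a minimal illustration under stated assumptions: `baseline_oof` and `tuned_oof` are synthetic stand-ins for the two CatBoost models' OOF predictions, and `blend` is a hypothetical helper, not an existing function in this project.

```python
import random

def blend(probas, weights=None):
    """Weighted average of several models' probability lists (equal weights by default)."""
    n_models = len(probas)
    weights = weights or [1.0] * n_models
    total = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, probas)) / total
            for i in range(len(probas[0]))]

def accuracy(y_true, proba, threshold=0.5):
    """Accuracy when predicting the positive class for proba >= threshold."""
    return sum((p >= threshold) == bool(y) for y, p in zip(y_true, proba)) / len(y_true)

# SYNTHETIC DATA (assumption): two models with independent noise around the same signal.
rng = random.Random(42)
y_true = [rng.random() < 0.5 for _ in range(3000)]

def noisy_model(noise):
    return [min(1.0, max(0.0, (0.6 if y else 0.4) + rng.gauss(0, noise)))
            for y in y_true]

baseline_oof = noisy_model(0.25)
tuned_oof = noisy_model(0.25)
blended = blend([baseline_oof, tuned_oof])

print(f"Baseline accuracy: {accuracy(y_true, baseline_oof):.4f}")
print(f"Tuned accuracy:    {accuracy(y_true, tuned_oof):.4f}")
print(f"Blend accuracy:    {accuracy(y_true, blended):.4f}")
```

Averaging helps most when the models' errors are weakly correlated. Two CatBoost variants trained on the same features are likely to be more correlated than this synthetic example, so the real gain may be smaller.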
  