# Loop 11 Submission Decision Analysis

This notebook analyzes whether to submit exp_011, based on the relationship between cross-validation (CV) and leaderboard (LB) scores across previous submissions.

In [1]:
import numpy as np
import pandas as pd
from scipy import stats

# Historical CV-LB data
submissions = [
    {'exp': 'exp_000', 'cv': 0.8067, 'lb': 0.7971},
    {'exp': 'exp_003', 'cv': 0.8195, 'lb': 0.8045},
    {'exp': 'exp_004', 'cv': 0.8193, 'lb': 0.8041},
    {'exp': 'exp_006', 'cv': 0.8171, 'lb': 0.8010},
]

df = pd.DataFrame(submissions)
print("Historical Submissions:")
print(df)
print(f"\nCV-LB correlation: {df['cv'].corr(df['lb']):.4f}")

Historical Submissions:
       exp      cv      lb
0  exp_000  0.8067  0.7971
1  exp_003  0.8195  0.8045
2  exp_004  0.8193  0.8041
3  exp_006  0.8171  0.8010

CV-LB correlation: 0.9572
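
A high correlation does not by itself say how large the CV-LB gap is or whether it is stable. The sketch below computes the per-submission gap from the same four score pairs; the `gap` column and the summary variables are names introduced here for illustration.

```python
import pandas as pd

# Same historical scores as in the cell above.
df = pd.DataFrame({
    "exp": ["exp_000", "exp_003", "exp_004", "exp_006"],
    "cv": [0.8067, 0.8195, 0.8193, 0.8171],
    "lb": [0.7971, 0.8045, 0.8041, 0.8010],
})

# Gap between local CV and public LB for each submission.
df["gap"] = df["cv"] - df["lb"]
mean_gap = df["gap"].mean()
gap_range = df["gap"].max() - df["gap"].min()

print(df[["exp", "gap"]])
print(f"Mean gap: {mean_gap:.4f}, range: {gap_range:.4f}")
```

If the gap varies noticeably between submissions, a single linear fit on four points should be read as a rough guide only.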


In [2]:
# Linear regression to predict LB from CV
from scipy.stats import linregress

slope, intercept, r_value, p_value, std_err = linregress(df['cv'], df['lb'])
print(f"Linear model: LB = {slope:.4f} * CV + {intercept:.4f}")
print(f"R-squared: {r_value**2:.4f}")
# Note: std_err is the standard error of the slope estimate, not of a prediction
print(f"Standard error: {std_err:.4f}")

# Predict LB for exp_011
exp_011_cv = 0.82032
best_lb = df['lb'].max()  # best LB among the historical submissions (exp_003)
predicted_lb = slope * exp_011_cv + intercept
print(f"\nexp_011 CV: {exp_011_cv:.5f}")
print(f"Predicted LB: {predicted_lb:.5f}")
print(f"Best LB so far: {best_lb:.4f}")
print(f"Predicted improvement: {predicted_lb - best_lb:.5f}")

Linear model: LB = 0.5410 * CV + 0.3604
R-squared: 0.9162
Standard error: 0.1157

exp_011 CV: 0.82032
Predicted LB: 0.80420
Best LB so far: 0.8045
Predicted improvement: -0.00030
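
With only four points, the fitted line can depend heavily on any single submission. As a rough sensitivity check, the sketch below refits the line with each submission left out in turn and recomputes the exp_011 prediction. The helper `fit_line` is a hypothetical name introduced here, and the closed-form least-squares formula stands in for `linregress`.

```python
import numpy as np

cv = np.array([0.8067, 0.8195, 0.8193, 0.8171])
lb = np.array([0.7971, 0.8045, 0.8041, 0.8010])
names = ["exp_000", "exp_003", "exp_004", "exp_006"]
exp_011_cv = 0.82032


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    dx = x - x.mean()
    slope = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept


# Reference prediction using all four submissions.
slope, intercept = fit_line(cv, lb)
full_prediction = slope * exp_011_cv + intercept
print(f"All points: predicted LB = {full_prediction:.5f}")

# Leave each submission out in turn and refit.
loo_predictions = []
for i, name in enumerate(names):
    keep = np.arange(len(cv)) != i
    s, b = fit_line(cv[keep], lb[keep])
    pred = s * exp_011_cv + b
    loo_predictions.append(pred)
    print(f"Without {name}: slope = {s:.4f}, predicted LB = {pred:.5f}")

print(f"Spread of leave-one-out predictions: {max(loo_predictions) - min(loo_predictions):.5f}")
```

A spread comparable to the predicted change versus the best LB would suggest the point estimate alone cannot settle the submission decision.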


In [3]:
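# Rough uncertainty band for the prediction.
# Note: this is NOT a true 95% interval. It propagates only the slope's standard
# error, scaled by the distance of exp_011's CV from the mean CV, and ignores
# residual variance and the small sample size (n=4), so it is too narrow.
half_width = 2 * std_err * abs(exp_011_cv - df['cv'].mean())
ci_low = predicted_lb - half_width
ci_high = predicted_lb + half_width

print(f"Predicted LB: {predicted_lb:.5f}")
print(f"Approx. band (slope uncertainty only): [{ci_low:.5f}, {ci_high:.5f}]")
print(f"\nBest LB: 0.8045")
# Coarse flag based on the point estimate only; no probability is actually computed
print(f"Probability of beating best LB: {'HIGH' if predicted_lb > 0.8045 else 'LOW'}")

Predicted LB: 0.80420
Approx. band (slope uncertainty only): [0.80312, 0.80528]

Best LB: 0.8045
Probability of beating best LB: LOW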
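
The interval above propagates only the slope's standard error. A more conventional alternative is the textbook prediction interval for simple linear regression, which also includes residual variance and uses a t critical value with n - 2 degrees of freedom. The sketch below assumes independent Gaussian residuals, which is a strong assumption for four leaderboard scores, so treat the numbers as indicative only.

```python
import numpy as np
from scipy import stats

cv = np.array([0.8067, 0.8195, 0.8193, 0.8171])
lb = np.array([0.7971, 0.8045, 0.8041, 0.8010])
x0 = 0.82032          # exp_011 CV
best_lb = lb.max()    # best historical LB

n = len(cv)
dof = n - 2
fit = stats.linregress(cv, lb)
pred = fit.slope * x0 + fit.intercept

# Residual standard deviation of the fit.
residuals = lb - (fit.slope * cv + fit.intercept)
s = np.sqrt(np.sum(residuals ** 2) / dof)

# Standard error of a new observation at x0 (prediction, not mean response).
sxx = np.sum((cv - cv.mean()) ** 2)
se_pred = s * np.sqrt(1 + 1 / n + (x0 - cv.mean()) ** 2 / sxx)

# Two-sided 95% prediction interval.
t_crit = stats.t.ppf(0.975, dof)
pi_low = pred - t_crit * se_pred
pi_high = pred + t_crit * se_pred

# Probability that the new LB exceeds the best LB under the same assumptions.
p_beat = stats.t.sf((best_lb - pred) / se_pred, dof)

print(f"Predicted LB: {pred:.5f}")
print(f"95% prediction interval: [{pi_low:.5f}, {pi_high:.5f}]")
print(f"P(LB > {best_lb:.4f}): {p_beat:.2f}")
```

With two degrees of freedom the t critical value is large, so this interval is considerably wider than the slope-only band.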


In [4]:
# Key decision factors
print("=" * 50)
print("SUBMISSION DECISION ANALYSIS")
print("=" * 50)

print("\n1. CV IMPROVEMENT:")
print(f"   exp_011 CV: {exp_011_cv:.5f}")
print(f"   exp_003 CV: 0.81951")
print(f"   Improvement: +{exp_011_cv - 0.81951:.5f} (+{(exp_011_cv - 0.81951)/0.81951*100:.2f}%)")

print("\n2. PREDICTED LB:")
print(f"   Predicted: {predicted_lb:.5f}")
print(f"   Best LB: 0.8045")
print(f"   Expected change: {predicted_lb - 0.8045:+.5f}")

print("\n3. REGULARIZATION EFFECT:")
print("   - exp_011 uses stronger regularization (depth=6, l2_leaf_reg=7.0)")
print("   - Regularization IMPROVED CV (not decreased)")
print("   - This suggests earlier models were overfitting, not underfitting")
print("   - If so, regularization may also help generalization (reduce CV-LB gap)")

print("\n4. SUBMISSIONS REMAINING: 6")
print("   - Can afford to test this hypothesis")
print("   - If LB improves, regularization is helping")
print("   - If LB stays same/worsens, high CV may be due to fold variance")

print("\n5. RISK ASSESSMENT:")
print("   - Worst case: LB ~0.803 (similar to exp_006)")
print("   - Best case: LB ~0.806 (beat our best)")
print("   - Expected: LB ~0.804 (similar to best)")

SUBMISSION DECISION ANALYSIS

1. CV IMPROVEMENT:
   exp_011 CV: 0.82032
   exp_003 CV: 0.81951
   Improvement: +0.00081 (+0.10%)

2. PREDICTED LB:
   Predicted: 0.80420
   Best LB: 0.8045
   Expected change: -0.00030

3. REGULARIZATION EFFECT:
   - exp_011 uses stronger regularization (depth=6, l2_leaf_reg=7.0)
   - Regularization IMPROVED CV (not decreased)
   - This suggests earlier models were overfitting, not underfitting
   - If so, regularization may also help generalization (reduce CV-LB gap)

4. SUBMISSIONS REMAINING: 6
   - Can afford to test this hypothesis
   - If LB improves, regularization is helping
   - If LB stays same/worsens, high CV may be due to fold variance

5. RISK ASSESSMENT:
   - Worst case: LB ~0.803 (similar to exp_006)
   - Best case: LB ~0.806 (beat our best)
   - Expected: LB ~0.804 (similar to best)


In [5]:
# Final recommendation
print("\n" + "=" * 50)
print("RECOMMENDATION: SUBMIT exp_011")
print("=" * 50)
print("""
Rationale:
1. Best CV achieved (0.82032) - exceeds all previous experiments
2. Regularization might help with CV-LB gap
3. We have 6 submissions remaining - can afford to test
4. Even if LB doesn't improve, we learn about regularization effect
5. Evaluator strongly recommends submission

Expected outcome:
- LB ~0.8042 (slightly below best 0.8045)
- But regularization might surprise us with better generalization

Next steps after submission:
1. If LB improves: Continue with regularization approach
2. If LB stays same: Try GroupKFold to reduce variance
3. If LB worsens: Regularization is hurting, revert to exp_003 params
""")


RECOMMENDATION: SUBMIT exp_011

Rationale:
1. Best CV achieved (0.82032) - exceeds all previous experiments
2. Regularization might help with CV-LB gap
3. We have 6 submissions remaining - can afford to test
4. Even if LB doesn't improve, we learn about regularization effect
5. Evaluator strongly recommends submission

Expected outcome:
- LB ~0.8042 (slightly below best 0.8045)
- But regularization might surprise us with better generalization

Next steps after submission:
1. If LB improves: Continue with regularization approach
2. If LB stays same: Try GroupKFold to reduce variance
3. If LB worsens: Regularization is hurting, revert to exp_003 params



In [6]:
# Alternative: What if we DON'T submit and try GroupKFold first?
print("\n" + "=" * 50)
print("ALTERNATIVE: Try GroupKFold first, then submit")
print("=" * 50)
print("""
Pros:
- GroupKFold might give more stable CV estimates
- Could reduce fold variance (currently 4.4% range)
- Might reveal if exp_011's high CV is due to lucky folds

Cons:
- Delays LB feedback on regularization effect
- 77.3% are solo travelers - GroupKFold may not help much
- We already have a promising result to test

Verdict: Submit first, then try GroupKFold
- LB feedback is more valuable than GroupKFold CV
- We can try GroupKFold in parallel after submission
""")


ALTERNATIVE: Try GroupKFold first, then submit

Pros:
- GroupKFold might give more stable CV estimates
- Could reduce fold variance (currently 4.4% range)
- Might reveal if exp_011's high CV is due to lucky folds

Cons:
- Delays LB feedback on regularization effect
- 77.3% are solo travelers - GroupKFold may not help much
- We already have a promising result to test

Verdict: Submit first, then try GroupKFold
- LB feedback is more valuable than GroupKFold CV
- We can try GroupKFold in parallel after submission
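
For reference, a minimal sketch of what the GroupKFold follow-up could look like with scikit-learn. The data here is synthetic, and the `groups` array is a hypothetical stand-in for whatever travel-group identifier the real dataset provides. Solo travelers appear as groups of size one.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical group ids: a mix of multi-person groups and solo travelers.
groups = np.array([0, 0, 0, 1, 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 9, 9])
n_rows = len(groups)

# Synthetic features and binary target, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(n_rows, 3))
y = rng.integers(0, 2, size=n_rows)

gkf = GroupKFold(n_splits=4)
overlaps = []
for fold, (train_idx, valid_idx) in enumerate(gkf.split(X, y, groups)):
    # No group id should appear on both sides of a split.
    overlap = set(groups[train_idx]) & set(groups[valid_idx])
    overlaps.append(overlap)
    print(f"Fold {fold}: train={len(train_idx)}, valid={len(valid_idx)}, "
          f"shared groups={len(overlap)}")
```

Because every member of a group lands in the same fold, any leakage between related rows is removed from the CV estimate. When most rows are singleton groups, the resulting splits may differ little from ordinary K-fold splits.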

