# Loop 14 Analysis: Strategic Assessment

## Current Situation
- **Best CV**: exp_011 (Simple Ensemble) = 0.008785
- **Best LB**: exp_007 ([32,16] MLP) = 0.0932
- **Target**: 0.0333
- **Submissions remaining**: 5

## Key Question
The compliant ensemble (exp_012, CV 0.009004) has NOT been submitted to LB yet.
We need to decide: submit exp_012, or try something different?

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Submission history
submissions = [
    {'exp': 'exp_000', 'cv': 0.011081, 'lb': 0.09816},
    {'exp': 'exp_001', 'cv': 0.012297, 'lb': 0.10649},
    {'exp': 'exp_003', 'cv': 0.010501, 'lb': 0.09719},
    {'exp': 'exp_005', 'cv': 0.010430, 'lb': 0.09691},
    {'exp': 'exp_006', 'cv': 0.009749, 'lb': 0.09457},
    {'exp': 'exp_007', 'cv': 0.009262, 'lb': 0.09316},
    {'exp': 'exp_009', 'cv': 0.009192, 'lb': 0.09364},
]

df = pd.DataFrame(submissions)
df['ratio'] = df['lb'] / df['cv']
print('=== SUBMISSION HISTORY ===')
print(df.to_string(index=False))
print(f'\nCV-LB Correlation: {df["cv"].corr(df["lb"]):.4f}')
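
Pearson correlation can be dominated by a single extreme point (exp_001 sits far from the other submissions). As a complementary check, the sketch below computes the Spearman rank correlation on the same seven submissions, which only asks whether CV and LB rank the experiments in the same order. The variable names are illustrative and not part of the original notebook.

```python
from scipy import stats

# CV and LB scores copied from the submission history (exp_000 ... exp_009).
cv = [0.011081, 0.012297, 0.010501, 0.010430, 0.009749, 0.009262, 0.009192]
lb = [0.09816, 0.10649, 0.09719, 0.09691, 0.09457, 0.09316, 0.09364]

# Spearman works on ranks, so the size of each gap does not matter,
# only whether the two scores order the experiments the same way.
rho, p_value = stats.spearmanr(cv, lb)
print(f'Spearman rank correlation: {rho:.4f} (p = {p_value:.4f})')
```

A value below 1.0 here reflects the single rank swap between exp_007 and exp_009.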

In [None]:
# Key insight: CV-LB ratio has been INCREASING
print('=== CV-LB RATIO TREND ===')
for _, row in df.iterrows():
    print(f"{row['exp']}: CV {row['cv']:.6f} -> LB {row['lb']:.5f} (ratio: {row['ratio']:.2f}x)")

print(f'\nBest LB: exp_007 with LB 0.0932 (CV 0.009262)')
print(f'Best submitted CV: exp_009 with CV 0.009192 (LB 0.09364 - WORSE than exp_007!)')
print(f'\nCRITICAL: Better CV does NOT guarantee better LB at this point!')
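
The loop above lists ratios in submission order, which is not the same as CV order (exp_001 has a worse CV than exp_000). The sketch below re-sorts the same history by CV to show how the ratio moves as CV improves. One hedged reading: if LB is roughly `a * CV + b` with `b > 0`, then `LB / CV = a + b / CV` grows as CV shrinks, so a rising ratio on its own may simply reflect a positive intercept. The name `history` is illustrative.

```python
import pandas as pd

# Submission history copied from the cell above.
history = pd.DataFrame({
    'exp': ['exp_000', 'exp_001', 'exp_003', 'exp_005',
            'exp_006', 'exp_007', 'exp_009'],
    'cv': [0.011081, 0.012297, 0.010501, 0.010430,
           0.009749, 0.009262, 0.009192],
    'lb': [0.09816, 0.10649, 0.09719, 0.09691,
           0.09457, 0.09316, 0.09364],
})
history['ratio'] = history['lb'] / history['cv']

# Order from worst (highest) CV to best (lowest) CV.
by_cv = history.sort_values('cv', ascending=False).reset_index(drop=True)
print(by_cv.to_string(index=False))

# True if the LB/CV ratio never decreases as CV improves.
ratio_rises_as_cv_falls = bool(by_cv['ratio'].is_monotonic_increasing)
print(f'\nRatio rises as CV falls: {ratio_rises_as_cv_falls}')
```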

In [None]:
# Experiments not yet submitted
print('=== EXPERIMENTS NOT SUBMITTED ===')
print('exp_010: Diverse Ensemble (MLP+LGBM+MLP) - CV 0.008829')
print('exp_011: Simple Ensemble (MLP+LGBM) - CV 0.008785 (BEST CV)')
print('exp_012: Compliant Ensemble - CV 0.009004')
print('exp_013: Weight Test (0.7/0.3) - CV 0.009012')
print('\nexp_012 is the compliant version of exp_011 with proper template structure.')

In [None]:
# Predict LB for exp_012 using linear fit
from scipy import stats

cv_vals = df['cv'].values
lb_vals = df['lb'].values

slope, intercept, r_value, p_value, std_err = stats.linregress(cv_vals, lb_vals)

print('=== LINEAR FIT: LB = slope * CV + intercept ===')
print(f'Slope: {slope:.4f}')
print(f'Intercept: {intercept:.6f}')
print(f'R²: {r_value**2:.4f}')

# Predict for exp_012 (CV 0.009004)
exp_012_cv = 0.009004
predicted_lb = slope * exp_012_cv + intercept
print(f'\nexp_012 (CV {exp_012_cv}): Predicted LB = {predicted_lb:.5f}')
print(f'Best current LB: 0.09316 (exp_007)')
print(f'Predicted improvement: {(0.09316 - predicted_lb) / 0.09316 * 100:.2f}%')
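
With only seven points, the predicted improvement is easier to interpret next to the scatter of the fit itself. The sketch below refits the same least-squares line with `np.polyfit` (used here only to keep the example self-contained) and compares the predicted gain for exp_012 against the residual standard deviation. This is a rough sanity check, not a formal prediction interval.

```python
import numpy as np

# CV and LB scores copied from the submission history.
cv = np.array([0.011081, 0.012297, 0.010501, 0.010430,
               0.009749, 0.009262, 0.009192])
lb = np.array([0.09816, 0.10649, 0.09719, 0.09691,
               0.09457, 0.09316, 0.09364])

# Ordinary least-squares line: lb ~ slope * cv + intercept.
slope, intercept = np.polyfit(cv, lb, 1)

# Residual standard deviation with n - 2 degrees of freedom.
residuals = lb - (slope * cv + intercept)
resid_std = float(np.sqrt(np.sum(residuals ** 2) / (len(cv) - 2)))

predicted_lb = float(slope * 0.009004 + intercept)  # exp_012 CV
predicted_gain = 0.09316 - predicted_lb             # versus exp_007's LB

print(f'Residual std of fit: {resid_std:.5f}')
print(f'Predicted LB gain:   {predicted_gain:.5f}')
print(f'Gain / residual std: {predicted_gain / resid_std:.2f}')
```

If the gain is not clearly larger than the residual scatter, the fit alone gives weak evidence that exp_012 will beat exp_007.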

In [None]:
# But wait - the linear fit may not hold!
# exp_009 had BETTER CV but WORSE LB than exp_007
print('=== WARNING: CV-LB DECORRELATION ===')
print('exp_007: CV 0.009262 -> LB 0.09316 (BEST LB)')
print('exp_009: CV 0.009192 -> LB 0.09364 (WORSE LB despite better CV)')
print('\nThe [16] model (exp_009) may be OVERFITTING to CV structure.')
print('The [32,16] model (exp_007) generalizes better to LB.')

print('\n=== HYPOTHESIS ===')
print('The ensemble (exp_012) might NOT beat exp_007 on LB.')
print('Reason: Ensemble includes [32,16] MLP + LightGBM.')
print('LightGBM alone (exp_001) had LB 0.10649 - much worse than MLP.')
print('Adding LightGBM might HURT LB performance even if it helps CV.')

In [None]:
# What are our options?
print('=== STRATEGIC OPTIONS ===')
print('\nOption 1: Submit exp_012 (compliant ensemble)')
print('  - CV 0.009004 (best compliant)')
print('  - Predicted LB: ~0.092 (if linear fit holds)')
print('  - Risk: LightGBM component might hurt LB')

print('\nOption 2: Create compliant [32,16] MLP only')
print('  - exp_007 has BEST LB (0.0932)')
print('  - Need to make it template-compliant')
print('  - Lower risk - proven LB performance')

print('\nOption 3: Try different approaches')
print('  - GNN/GAT (but requires significant code changes)')
print('  - Different feature engineering')
print('  - Different ensemble compositions')

print('\n=== RECOMMENDATION ===')
print('Submit exp_012 first to test if ensemble beats [32,16] alone on LB.')
print('If not, create compliant [32,16] MLP for final submission.')

In [None]:
# Reality check: Can we beat the target?
print('=== TARGET ANALYSIS ===')
print(f'Target: 0.0333')
print(f'Best LB: 0.0932 (exp_007)')
print(f'Gap: {(0.0932 - 0.0333) / 0.0333 * 100:.1f}% above target')

print('\nTo beat target 0.0333, we need:')
print('  - LB improvement of 64.3%')
print('  - This is NOT achievable with tabular approaches')

print('\nGNN benchmark achieved 0.0039 MSE using:')
print('  - Graph Attention Networks')
print('  - Molecular graph message-passing')
print('  - Continuous mixture encoding')

print('\n=== REALISTIC GOAL ===')
print('Maximize LB score within tabular constraints.')
print('Best achievable: ~0.09 LB (current: 0.0932)')
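
The hardcoded percentages above can be derived directly, and the linear CV-LB fit offers one more view of the gap. The sketch below recomputes the figures and then asks what CV the fitted line would require to reach the target. This extrapolates far outside the observed CV range, so treat it as illustration only. Variable names are illustrative.

```python
import numpy as np

target = 0.0333
best_lb = 0.0932  # exp_007

# Relative LB reduction needed, and how far best_lb sits above the target.
reduction_needed = (best_lb - target) / best_lb * 100
excess_over_target = (best_lb - target) / target * 100
multiple_of_target = best_lb / target
print(f'LB reduction needed: {reduction_needed:.1f}%')
print(f'Best LB is {excess_over_target:.1f}% above target '
      f'({multiple_of_target:.1f}x the target)')

# Same least-squares line as the CV-LB fit, on the submitted experiments.
cv = np.array([0.011081, 0.012297, 0.010501, 0.010430,
               0.009749, 0.009262, 0.009192])
lb = np.array([0.09816, 0.10649, 0.09719, 0.09691,
               0.09457, 0.09316, 0.09364])
slope, intercept = np.polyfit(cv, lb, 1)

# CV at which the fitted line would hit the target LB.
cv_needed = (target - intercept) / slope
print(f'Fitted intercept (LB at CV = 0): {intercept:.4f}')
print(f'CV needed to reach target under this fit: {cv_needed:.6f}')
```

A negative `cv_needed` means the fitted line never reaches the target for any achievable CV, which is consistent with the conclusion that improving CV along the current trend will not be enough.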

In [None]:
# Summary
print('=== LOOP 14 SUMMARY ===')
print('\n1. exp_013 tested MLP 0.7 / LightGBM 0.3 weights')
print('   Result: CV 0.009012 (0.09% worse than 0.6/0.4)')
print('   Conclusion: 0.6/0.4 weighting is near-optimal')

print('\n2. CV-LB decorrelation is real:')
print('   - exp_009 ([16]) has best submitted CV but worse LB than exp_007 ([32,16])')
print('   - Better CV does NOT guarantee better LB')

print('\n3. Compliant ensemble (exp_012) needs LB validation')
print('   - CV 0.009004 is good, but LB is unknown')
print('   - LightGBM component might hurt LB')

print('\n4. Target 0.0333 is unreachable with tabular approaches')
print('   - Best LB is 0.0932 (2.8x the target)')
print('   - GNN approaches needed for target')

print('\n=== NEXT STEPS ===')
print('1. Submit exp_012 to get LB feedback')
print('2. If ensemble is worse than exp_007, create compliant [32,16] MLP')
print('3. Focus on maximizing LB within tabular constraints')
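
For reference, the weight comparison in point 1 amounts to scoring a convex blend of two models' out-of-fold predictions at several weights. The sketch below shows the idea on synthetic data. The arrays, noise levels, and the `blend_mse` helper are hypothetical stand-ins, not the competition data or the original pipeline, and MSE is assumed as the metric.

```python
import numpy as np

def blend_mse(y_true, pred_a, pred_b, weight_a):
    """MSE of the blend weight_a * pred_a + (1 - weight_a) * pred_b."""
    blended = weight_a * pred_a + (1.0 - weight_a) * pred_b
    return float(np.mean((y_true - blended) ** 2))

# Synthetic stand-ins for out-of-fold predictions (hypothetical data).
rng = np.random.default_rng(0)
y_true = rng.normal(size=200)
pred_mlp = y_true + rng.normal(scale=0.10, size=200)
pred_lgbm = y_true + rng.normal(scale=0.12, size=200)

# Score a small grid of MLP weights; LightGBM gets the remainder.
for w in (0.5, 0.6, 0.7, 0.8):
    score = blend_mse(y_true, pred_mlp, pred_lgbm, w)
    print(f'MLP {w:.1f} / LGBM {1 - w:.1f}: MSE {score:.6f}')
```

Differences between neighbouring weights are often small, so a result like the 0.09% change noted in point 1 is likely within fold-to-fold noise.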