# Loop 33 Analysis: Strategy Review

## Situation
- Best CV: 0.008465 (exp_026)
- Best LB: 0.0887 (exp_026)
- Target: 0.0347
- CV-LB gap: ~10.5x

## Key Questions
1. What's the CV-LB relationship across all submissions?
2. What CV would we need to hit target 0.0347?
3. What approaches haven't been tried?

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Submission history
submissions = [
    {'exp': 'exp_000', 'cv': 0.0111, 'lb': 0.0982},
    {'exp': 'exp_001', 'cv': 0.0123, 'lb': 0.1065},
    {'exp': 'exp_003', 'cv': 0.0105, 'lb': 0.0972},
    {'exp': 'exp_005', 'cv': 0.0104, 'lb': 0.0969},
    {'exp': 'exp_006', 'cv': 0.0097, 'lb': 0.0946},
    {'exp': 'exp_007', 'cv': 0.0093, 'lb': 0.0932},
    {'exp': 'exp_009', 'cv': 0.0092, 'lb': 0.0936},
    {'exp': 'exp_012', 'cv': 0.0090, 'lb': 0.0913},
    {'exp': 'exp_024', 'cv': 0.0087, 'lb': 0.0893},
    {'exp': 'exp_026', 'cv': 0.0085, 'lb': 0.0887},
]

df = pd.DataFrame(submissions)
print('=== Submission History ===')
print(df.to_string(index=False))
print(f'\nMean LB / mean CV: {df["lb"].mean() / df["cv"].mean():.2f}x')
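
The single ratio printed above is a ratio of means, which can hide a trend across submissions. As a minimal sketch, the block below recomputes the ratio per submission from the same ten (CV, LB) pairs. The array names `cv`, `lb`, and `ratio` are illustrative and not part of the original notebook state.

```python
import numpy as np

# CV and LB scores copied from the submission history (exp_000 ... exp_026)
cv = np.array([0.0111, 0.0123, 0.0105, 0.0104, 0.0097,
               0.0093, 0.0092, 0.0090, 0.0087, 0.0085])
lb = np.array([0.0982, 0.1065, 0.0972, 0.0969, 0.0946,
               0.0932, 0.0936, 0.0913, 0.0893, 0.0887])

ratio = lb / cv                          # per-submission LB/CV ratio
ratio_of_means = lb.mean() / cv.mean()   # the aggregate figure

print(f'Ratio of means:       {ratio_of_means:.2f}x')
print(f'Per-submission range: {ratio.min():.2f}x to {ratio.max():.2f}x')
# A clearly negative correlation means the ratio grows as CV shrinks,
# which is the pattern an additive offset (nonzero intercept) would produce.
cv_ratio_corr = np.corrcoef(cv, ratio)[0, 1]
print(f'corr(CV, ratio):      {cv_ratio_corr:.2f}')
```

If the ratio were roughly constant, a proportional model (LB ≈ k · CV) would be adequate. A ratio that rises as CV falls points instead to an additive offset between CV and LB.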

In [None]:
# Linear regression to understand CV-LB relationship
slope, intercept, r_value, p_value, std_err = stats.linregress(df['cv'], df['lb'])

print('=== CV-LB Linear Fit ===')
print(f'LB = {slope:.4f} * CV + {intercept:.4f}')
print(f'R-squared = {r_value**2:.4f}')
print(f'\nTo hit target LB = 0.0347:')
required_cv = (0.0347 - intercept) / slope
print(f'Required CV = (0.0347 - {intercept:.4f}) / {slope:.4f} = {required_cv:.6f}')

if required_cv < 0:
    print('\nWARNING: Required CV is NEGATIVE!')
    print('The fitted intercept alone is already higher than the target.')
    print('Under this linear fit, no CV improvement can reach the target.')
    print('\nThis suggests we need a FUNDAMENTALLY DIFFERENT approach.')
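
The conclusion above rests on extrapolating the fit to CV = 0, well outside the observed CV range (0.0085 to 0.0123), so it helps to see how uncertain the intercept is. The sketch below computes a 95% confidence interval for the intercept using the standard OLS formula. It assumes a linear relationship with independent, roughly normal residuals, which ten related submissions may not satisfy; the variable names and `TARGET_LB` are illustrative.

```python
import numpy as np
from scipy import stats

# Same (CV, LB) pairs as in the submission history
cv = np.array([0.0111, 0.0123, 0.0105, 0.0104, 0.0097,
               0.0093, 0.0092, 0.0090, 0.0087, 0.0085])
lb = np.array([0.0982, 0.1065, 0.0972, 0.0969, 0.0946,
               0.0932, 0.0936, 0.0913, 0.0893, 0.0887])
TARGET_LB = 0.0347

n = len(cv)
slope, intercept = np.polyfit(cv, lb, 1)        # ordinary least squares line
resid = lb - (slope * cv + intercept)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))       # residual standard error
sxx = np.sum((cv - cv.mean()) ** 2)
se_intercept = s * np.sqrt(1.0 / n + cv.mean() ** 2 / sxx)

t_crit = stats.t.ppf(0.975, df=n - 2)           # two-sided 95% critical value
ci_low = intercept - t_crit * se_intercept
ci_high = intercept + t_crit * se_intercept

print(f'Intercept: {intercept:.4f}  (95% CI: {ci_low:.4f} to {ci_high:.4f})')
print(f'Target {TARGET_LB} below the interval: {TARGET_LB < ci_low}')
```

If the target sits below the whole interval, the "intercept above target" reading is not just noise in the fit, though it still depends on the line holding outside the observed CV range.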

In [None]:
# What experiments have we tried?
experiments = [
    ('exp_000', 0.011081, 'MLP Baseline'),
    ('exp_001', 0.012297, 'LightGBM'),
    ('exp_002', 0.016948, 'DRFP + PCA'),
    ('exp_003', 0.010501, 'Spange + DRFP'),
    ('exp_004', 0.051912, 'Deep Residual (FAILED)'),
    ('exp_005', 0.010430, 'Large Ensemble 15'),
    ('exp_006', 0.009749, 'Simpler [64,32]'),
    ('exp_007', 0.009262, 'Even Simpler [32,16]'),
    ('exp_008', 0.011509, 'Ridge Regression'),
    ('exp_009', 0.009192, 'Single Layer [16]'),
    ('exp_010', 0.008829, 'Diverse Ensemble'),
    ('exp_011', 0.008785, 'Simple Ensemble'),
    ('exp_012', 0.009004, 'Compliant Ensemble'),
    ('exp_022', 0.008601, 'ACS PCA Features'),
    ('exp_024', 0.008689, 'ACS PCA Fixed'),
    ('exp_025', 0.009068, 'Per-Target Models'),
    ('exp_026', 0.008465, 'Weighted Loss'),
    ('exp_027', 0.009150, 'Simple Features'),
    ('exp_028', 0.008674, 'Four-Model Ensemble'),
    ('exp_029', 0.016180, 'Normalization (FAILED)'),
    ('exp_030', 0.017057, 'Gaussian Process (FAILED)'),
    ('exp_031', 0.009984, 'CatBoost RFE'),
    ('exp_032', 0.010983, 'CatBoost 18 Features'),
]

print('=== Experiment Summary (sorted by CV) ===')
best_cv = min(cv for _, cv, _ in experiments)  # lowest CV marks the best run
for exp, cv, name in sorted(experiments, key=lambda x: x[1]):
    status = ' (BEST)' if cv == best_cv else ''
    print(f'{exp}: CV {cv:.6f} - {name}{status}')

In [None]:
# The CRITICAL insight
print('=== CRITICAL INSIGHT ===')
print()
print('The CV-LB gap is ~10x, and the intercept of the linear fit is ~0.053.')
print('This means even with PERFECT CV (0.0), the predicted LB would be ~0.053.')
print('The target is 0.0347, which is BELOW the intercept.')
print()
print('This suggests one of two things:')
print('1. The linear relationship breaks down at lower CV values')
print('2. We need a fundamentally different approach')
print()
print('The top LB score on the leaderboard is 0.01727.')
print('That is well below our fitted intercept, so that entry evidently does not follow the CV-LB relationship we observe.')
print()
print('Possible explanations:')
print('- They use a different validation scheme that better matches LB')
print('- They use domain adaptation to reduce distribution shift')
print('- They use a model that generalizes better (e.g., physics-based)')
print('- They exploit some structure in the data we are missing')
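
Before reading too much into the intercept, it is worth checking whether a single submission drives it; exp_001 has the highest CV and therefore the most leverage. A minimal sketch, refitting the line with each submission left out in turn (names such as `intercepts` and `TARGET_LB` are illustrative):

```python
import numpy as np

names = ['exp_000', 'exp_001', 'exp_003', 'exp_005', 'exp_006',
         'exp_007', 'exp_009', 'exp_012', 'exp_024', 'exp_026']
cv = np.array([0.0111, 0.0123, 0.0105, 0.0104, 0.0097,
               0.0093, 0.0092, 0.0090, 0.0087, 0.0085])
lb = np.array([0.0982, 0.1065, 0.0972, 0.0969, 0.0946,
               0.0932, 0.0936, 0.0913, 0.0893, 0.0887])
TARGET_LB = 0.0347

intercepts = {}
for i, name in enumerate(names):
    keep = np.arange(len(cv)) != i              # drop submission i
    slope_i, intercept_i = np.polyfit(cv[keep], lb[keep], 1)
    intercepts[name] = intercept_i
    print(f'without {name}: LB = {slope_i:.3f} * CV + {intercept_i:.4f}')

lowest = min(intercepts.values())
highest = max(intercepts.values())
print(f'\nIntercept range: {lowest:.4f} to {highest:.4f} (target {TARGET_LB})')
```

If every refit keeps the intercept above the target, the conclusion does not hinge on any one submission. It still says nothing about whether the relationship stays linear at CV values lower than any observed so far.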

In [None]:
# What approaches haven't been tried or could be improved?
print('=== UNEXPLORED OR UNDEREXPLORED APPROACHES ===')
print()
print('1. TARGET TRANSFORM (Logit) - Tried but FAILED (exp_032)')
print('   - Logit transform made things worse')
print('   - BUT: Maybe the implementation was wrong?')
print()
print('2. QUANTILE REGRESSION - NOT TRIED')
print('   - CatBoost supports quantile loss')
print('   - Median loss is more robust to outliers, but does not by itself keep predictions in [0,1]')
print()
print('3. BETA REGRESSION - NOT TRIED')
print('   - Specifically designed for [0,1] bounded data')
print('   - Uses Beta distribution for likelihood')
print()
print('4. PHYSICS-INFORMED CONSTRAINTS - PARTIALLY TRIED')
print('   - Arrhenius features used')
print('   - But no explicit kinetic model fitting')
print()
print('5. DOMAIN ADAPTATION - NOT TRIED')
print('   - The CV-LB gap suggests distribution shift')
print('   - Could try domain adaptation techniques')
print()
print('6. DIFFERENT VALIDATION SCHEME - NOT TRIED')
print('   - Current: LOO for single, LORO for full')
print('   - Maybe the validation is too optimistic?')
print()
print('7. STACKING WITH META-LEARNER - PARTIALLY TRIED')
print('   - Simple averaging tried')
print('   - But no proper stacking with held-out predictions')
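
Item 1 asks whether the logit transform was implemented incorrectly. Two common pitfalls are targets at exactly 0 or 1 (which map to infinities) and forgetting to invert the transform before scoring. The sketch below is a round-trip check under the assumption that targets lie in [0, 1]. The helper names `to_logit` and `from_logit`, the margin `EPS`, and the sample values are hypothetical and are not taken from the experiment code.

```python
import numpy as np

EPS = 1e-6  # assumed clipping margin; tune to the precision of the targets

def to_logit(y, eps=EPS):
    """Map targets in [0, 1] to the real line, clipping away exact 0 and 1."""
    y = np.clip(np.asarray(y, dtype=float), eps, 1.0 - eps)
    return np.log(y / (1.0 - y))

def from_logit(z):
    """Inverse transform (sigmoid): map model outputs back to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

# Made-up targets, including both boundary values
y = np.array([0.0, 0.02, 0.5, 0.97, 1.0])
z = to_logit(y)
y_back = from_logit(z)

all_finite = bool(np.all(np.isfinite(z)))
max_error = float(np.max(np.abs(y_back - y)))
print(f'All transformed targets finite: {all_finite}')
print(f'Max round-trip error: {max_error:.2e}')
```

In such a setup the model would train on `to_logit(y)`, and its predictions would go through `from_logit` before the metric is computed, so CV is measured on the original scale.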