# Loop 29 Analysis: Critical Assessment After 28 Experiments

**Current State:**
- Best CV: 0.008465 (exp_026)
- Best LB: 0.08875 (exp_026)
- Target: 0.01727
- CV-LB ratio: ~10.5x for exp_026 (~8.9x for exp_000; the ratio grows as CV improves)
- Linear fit: LB = 4.22*CV + 0.0533 (R²=0.96)
- Submissions remaining: 3

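Inverting the fit for the target LB, using the rounded coefficients above (slope $m = 4.22$, intercept $b = 0.0533$), gives a negative required CV. If the linear relationship holds, no achievable CV reaches the target:

$$
\mathrm{CV}_{\mathrm{req}} = \frac{\mathrm{LB}_{\mathrm{target}} - b}{m} = \frac{0.01727 - 0.0533}{4.22} \approx -0.0085
$$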
**Latest Experiment (exp_028):**
- Four-model ensemble (MLP+LGBM+XGB+CatBoost)
- CV 0.008674 (2.47% WORSE than exp_026)
- Adding more tree models did NOT help

**Critical Question:**
What approaches remain unexplored that could break the CV-LB pattern?

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# All 10 submissions with CV and LB scores
submissions = [
    {'id': 'exp_000', 'cv': 0.011081, 'lb': 0.09816},
    {'id': 'exp_001', 'cv': 0.012297, 'lb': 0.10649},
    {'id': 'exp_003', 'cv': 0.010501, 'lb': 0.09719},
    {'id': 'exp_005', 'cv': 0.01043, 'lb': 0.09691},
    {'id': 'exp_006', 'cv': 0.009749, 'lb': 0.09457},
    {'id': 'exp_007', 'cv': 0.009262, 'lb': 0.09316},
    {'id': 'exp_009', 'cv': 0.009192, 'lb': 0.09364},
    {'id': 'exp_012', 'cv': 0.009004, 'lb': 0.09134},
    {'id': 'exp_024', 'cv': 0.008689, 'lb': 0.08929},
    {'id': 'exp_026', 'cv': 0.008465, 'lb': 0.08875},
]

df = pd.DataFrame(submissions)
print('All submissions:')
print(df.to_string(index=False))

# Linear fit
cv = df['cv'].values
lb = df['lb'].values
slope, intercept, r_value, p_value, std_err = stats.linregress(cv, lb)
print(f'\nLinear fit: LB = {slope:.4f} * CV + {intercept:.5f}')
print(f'R² = {r_value**2:.4f}')
print(f'\nTarget: 0.01727')
print(f'Best LB: {df["lb"].min():.5f}')
print(f'Gap to target: {df["lb"].min() / 0.01727:.2f}x')

In [None]:
# What CV would we need to reach the target?
target = 0.01727
required_cv = (target - intercept) / slope
print('=== Target Analysis ===')
print(f'Target LB: {target}')
print(f'Required CV (from linear fit): {required_cv:.6f}')
if required_cv < 0:
    print('This is NEGATIVE: CV improvements alone cannot reach the target under this fit.')
print(f'\nThe intercept ({intercept:.5f}) is {intercept/target:.2f}x the target.')
print(f'Even with CV=0, predicted LB would be {intercept:.5f}')

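It is worth checking whether the intercept sits above the target robustly or only because of noise in 10 points. Below is a minimal sketch of a 95% t-interval on the fitted intercept. It uses `scipy.stats.linregress`, whose `intercept_stderr` attribute requires SciPy 1.6 or later, with n - 2 degrees of freedom. The (CV, LB) pairs are re-entered from the table above so the cell runs on its own.

```python
import numpy as np
from scipy import stats

# (CV, LB) pairs for the 10 scored submissions listed above
cv = np.array([0.011081, 0.012297, 0.010501, 0.01043, 0.009749,
               0.009262, 0.009192, 0.009004, 0.008689, 0.008465])
lb = np.array([0.09816, 0.10649, 0.09719, 0.09691, 0.09457,
               0.09316, 0.09364, 0.09134, 0.08929, 0.08875])
target = 0.01727

res = stats.linregress(cv, lb)

# Two-sided 95% t-interval for the intercept, n - 2 degrees of freedom
t_crit = stats.t.ppf(0.975, df=len(cv) - 2)
ci_low = res.intercept - t_crit * res.intercept_stderr
ci_high = res.intercept + t_crit * res.intercept_stderr

print(f'Intercept: {res.intercept:.5f} (95% CI {ci_low:.5f} to {ci_high:.5f})')
print(f'Lower CI bound / target: {ci_low / target:.2f}x')
```

If even the lower bound is well above 0.01727, the gap is not an artifact of noise in these 10 points. The straight-line assumption itself is still untested outside the observed CV range (about 0.0085 to 0.0123).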
In [None]:
# What experiments have been tried?
experiments_tried = [
    ('exp_000', 'MLP [128,128,64], Spange only, HuberLoss, 3 models'),
    ('exp_001', 'LightGBM, Spange only'),
    ('exp_002', 'DRFP with PCA (100 components) - FAILED'),
    ('exp_003', 'MLP [256,128,64], Spange+DRFP, HuberLoss, 5 models'),
    ('exp_004', 'Deep Residual MLP - FAILED'),
    ('exp_005', 'MLP [256,128,64], Spange+DRFP, 15 models'),
    ('exp_006', 'MLP [64,32], Spange+DRFP, 5 models'),
    ('exp_007', 'MLP [32,16], Spange+DRFP, 5 models'),
    ('exp_008', 'MLP [16], single layer'),
    ('exp_009', 'Ridge Regression'),
    ('exp_010', 'MLP [16], single layer'),
    ('exp_011', 'Diverse ensemble'),
    ('exp_012', 'MLP [32,16] + LightGBM ensemble'),
    ('exp_013', 'Compliant ensemble'),
    ('exp_014', 'Ensemble weights tuning'),
    ('exp_015', 'Three-model ensemble'),
    ('exp_016', 'Final summary'),
    ('exp_017', 'Attention model'),
    ('exp_018', 'Fragprints features'),
    ('exp_019', 'ACS PCA features'),
    ('exp_023', 'ACS PCA compliant'),
    ('exp_024', 'ACS PCA fixed'),
    ('exp_025', 'Per-target models - FAILED'),
    ('exp_026', 'Weighted loss [1,1,2] - BEST'),
    ('exp_027', 'Simple features (23) - FAILED'),
    ('exp_028', 'Four-model ensemble (MLP+LGBM+XGB+CatBoost) - FAILED'),
]

print(f'=== Experiments Tried ({len(experiments_tried)} listed; exp_020 to exp_022 not listed) ===')
for exp_id, desc in experiments_tried:
    print(f'{exp_id}: {desc}')

In [None]:
# What approaches have NOT been tried?
print('=== UNEXPLORED Approaches ===')
unexplored = [
    ('Gaussian Process Regression', 'Competition description mentions GP for imputation'),
    ('Physics constraint (SM+P2+P3≈1)', 'Mass balance constraint for regularization'),
    ('Higher SM weights [1,1,3] or [1,1,4]', 'SM is still the bottleneck'),
    ('Stacking meta-learner', 'Train a meta-model on base predictions'),
    ('Learned loss weights (homoscedastic)', 'Kendall et al. uncertainty weighting'),
    ('Post-processing normalization', 'Normalize predictions to sum to 1'),
    ('Domain adaptation', 'Handle distribution shift explicitly'),
    ('Adversarial validation', 'Identify features causing distribution shift'),
]

for approach, rationale in unexplored:
    print(f'\n{approach}:')
    print(f'  Rationale: {rationale}')

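For the Gaussian Process item, here is a minimal sketch with scikit-learn. The data are random stand-ins (an assumption for illustration only). In practice `X` would be the Spange/DRFP descriptors plus reaction conditions, and `Y` the three yield columns. The sketch fits one independent GP per target, which ignores correlations between targets. The kernel, an anisotropic RBF plus a `WhiteKernel` noise term, is a common default rather than a tuned choice.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF, WhiteKernel

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 80 rows, 5 features, 3 targets in [0, 1]
X = rng.uniform(size=(80, 5))
Y = np.column_stack([
    0.5 + 0.5 * np.sin(3 * X[:, 0]),
    X[:, 1] * X[:, 2],
    1.0 - X[:, 3],
])
X_train, X_test = X[:60], X[60:]
Y_train, Y_test = Y[:60], Y[60:]

# Anisotropic RBF (one length scale per feature) plus an explicit noise term
kernel = (ConstantKernel(1.0) * RBF(length_scale=np.ones(X.shape[1]))
          + WhiteKernel(noise_level=1e-2))

# One independent GP per target column; fit() clones the kernel each time
means, stds = [], []
for j in range(Y.shape[1]):
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
    gp.fit(X_train, Y_train[:, j])
    m, s = gp.predict(X_test, return_std=True)
    means.append(m)
    stds.append(s)

mean = np.column_stack(means)
std = np.column_stack(stds)
mse = float(np.mean((mean - Y_test) ** 2))
print(f'Held-out MSE: {mse:.5f}')
print(f'Mean predictive std per target: {std.mean(axis=0).round(4)}')
```

The per-row `std` is the uncertainty estimate that distinguishes this model type from the MLP and tree ensembles already tried.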
In [None]:
# Key insight from "mixall" kernel
print('=== Key Insight from Kaggle Kernels ===')
print('\nThe "mixall" kernel uses GroupKFold(n_splits=5) instead of leave-one-group-out.')
print('This is a DIFFERENT CV scheme from ours.')
print('\nOur CV scheme:')
print('- Single solvents: Leave-one-solvent-out (24 folds)')
print('- Mixtures: Leave-one-ramp-out (13 folds)')
print('- Total: 37 folds')
print('\nPossible differences in how the LB is scored (unconfirmed):')
print('- GroupKFold (5 folds)')
print('- Different random seed')
print('- Different data ordering')
print('\nIf the LB split differs from ours, this could explain part of the CV-LB gap (hypothesis).')

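The fold structure alone shows how the two schemes differ. Below is a sketch with scikit-learn's `LeaveOneGroupOut` and `GroupKFold`. It uses 24 hypothetical solvent group labels with random row counts (stand-in data, not the competition's) and counts how many distinct solvents each validation fold holds out.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, LeaveOneGroupOut

rng = np.random.default_rng(0)

# Hypothetical stand-in: 24 solvents, each with a random number of rows
n_solvents = 24
groups = np.repeat(np.arange(n_solvents), rng.integers(10, 30, size=n_solvents))
X = np.zeros((len(groups), 1))  # features do not affect a group split


def held_out_groups_per_fold(splitter):
    """Number of distinct groups in each fold's validation set."""
    return [len(np.unique(groups[val])) for _, val in splitter.split(X, groups=groups)]


logo_counts = held_out_groups_per_fold(LeaveOneGroupOut())
gkf_counts = held_out_groups_per_fold(GroupKFold(n_splits=5))

print(f'LeaveOneGroupOut: {len(logo_counts)} folds, groups held out per fold: {set(logo_counts)}')
print(f'GroupKFold(5):    {len(gkf_counts)} folds, groups held out per fold: {gkf_counts}')
```

Each GroupKFold training set sees about 19 of the 24 solvents instead of 23, so its CV is more pessimistic. Whether that matches how the LB is scored is still unconfirmed.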
In [None]:
# Another key insight: Post-processing normalization
print('=== Post-Processing Normalization ===')
print('\nFrom "mr0106/catechol" kernel:')
print('```python')
print('# Post-processing: Chemical constraints (Clip and Normalize)')
print('# Ensure outputs are between 0 and 1')
print('preds = np.clip(preds, 0, 1)')
print('')
print('# Normalize rows so the sum of products equals 1 (or 100%)')
print('row_sums = preds.sum(axis=1)[:, np.newaxis]')
print('row_sums[row_sums == 0] = 1 # Avoid division by zero')
print('preds = preds / row_sums')
print('```')
print('\nThis enforces SM + P2 + P3 = 1, which is only justified if the training targets actually sum to ~1 (check first).')
print('We have NOT tried this post-processing step.')

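Here is a runnable version of that post-processing step, together with a helper for checking the assumption behind it. Normalizing is only appropriate if the true targets sum to about 1, so `row_sum_summary` should be run on the real training targets first. The `preds` array is illustrative, not real model output. One small deviation from the kernel: it uses an `eps` threshold instead of an exact `== 0` check.

```python
import numpy as np


def row_sum_summary(Y):
    """(min, mean, max) of per-row sums; run on the real training targets
    before trusting the SM + P2 + P3 = 1 assumption."""
    sums = np.asarray(Y, dtype=float).sum(axis=1)
    return float(sums.min()), float(sums.mean()), float(sums.max())


def clip_and_normalize(preds, eps=1e-12):
    """Clip predictions to [0, 1], then rescale each row to sum to 1.

    Rows that are (near) all-zero after clipping are left unchanged.
    """
    preds = np.clip(np.asarray(preds, dtype=float), 0.0, 1.0)
    row_sums = preds.sum(axis=1, keepdims=True)
    row_sums[row_sums < eps] = 1.0  # avoid division by zero
    return preds / row_sums


# Illustrative predictions only (not real model output)
preds = np.array([[0.35, 0.12, 0.60],
                  [-0.02, 0.05, 0.90],
                  [0.00, 0.00, 0.00]])
print('Raw row sums (min, mean, max):', row_sum_summary(preds))
out = clip_and_normalize(preds)
print(out.round(4))
print('Normalized row sums:', out.sum(axis=1))
```

Normalization also scales rows whose sum is below 1 upward. That is only correct if the mass balance really holds, for example if there are no unmeasured side products.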
In [None]:
# Priority ranking for next experiments
print('=== PRIORITY RANKING ===')
print('\n1. HIGHEST PRIORITY: Post-Processing Normalization')
print('   - Enforce SM + P2 + P3 = 1 constraint')
print('   - Simple to implement, no retraining needed')
print('   - Used in the "mr0106/catechol" kernel')
print('   - Imposes a physical constraint on outputs (post hoc, not a training-time regularizer)')

print('\n2. HIGH PRIORITY: Higher SM Weights [1,1,3]')
print('   - SM is still the hardest target')
print('   - Weighted loss [1,1,2] improved CV by 2.58% over exp_024 (0.008689 -> 0.008465)')
print('   - More aggressive weighting may help further')

print('\n3. MEDIUM PRIORITY: Gaussian Process Regression')
print('   - Competition description mentions GP (for imputation)')
print('   - Different model type with uncertainty quantification')
print('   - May have different generalization properties')

print('\n4. LOWER PRIORITY: Stacking Meta-Learner')
print('   - Train a simple model on base predictions')
print('   - Can learn optimal combination weights')
print('   - May improve generalization')

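For item 2, here is a minimal NumPy sketch of a per-target weighted loss, so that [1,1,2] and [1,1,3] can be compared on the same residuals. It makes two assumptions. First, the column order is [P2, P3, SM], which the "[1,1,2] = higher SM weight" notation implies. Second, the base loss is squared error; exp_026's actual base loss is not stated here and may be Huber, as in exp_000. The optional `mass_penalty` term is a hypothetical soft version of the SM + P2 + P3 ≈ 1 constraint from the unexplored list.

```python
import numpy as np


def weighted_mse(y_true, y_pred, weights, mass_penalty=0.0):
    """Per-target weighted squared error, normalized by the weight sum.

    weights: one weight per target column (assumed order: P2, P3, SM).
    mass_penalty: optional (hypothetical) weight on (row sum of predictions - 1)^2.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    w = np.asarray(weights, dtype=float)
    per_target = ((y_pred - y_true) ** 2).mean(axis=0)  # MSE for each column
    loss = float((w * per_target).sum() / w.sum())
    if mass_penalty > 0:
        loss += mass_penalty * float(((y_pred.sum(axis=1) - 1.0) ** 2).mean())
    return loss


y_true = np.array([[0.2, 0.1, 0.7],
                   [0.3, 0.2, 0.5]])
y_pred = np.array([[0.2, 0.1, 0.6],
                   [0.3, 0.2, 0.4]])  # errors only in the SM column

for w in ([1, 1, 1], [1, 1, 2], [1, 1, 3]):
    print(w, round(weighted_mse(y_true, y_pred, w), 5))
```

Dividing by the weight sum keeps the loss on a comparable scale across weightings, so a change in value reflects the shift toward SM errors rather than a larger total weight.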
In [None]:
# Final recommendation
print('=== FINAL RECOMMENDATION ===')
print('\nGiven:')
print('- 3 submissions remaining')
print('- CV-LB gap is ~10x (structural, not model-specific)')
print('- Best LB 0.08875 is ~5.1x the target 0.01727')
print('- exp_028 (four-model ensemble) was WORSE than exp_026')

print('\nStrategy:')
print('1. Try post-processing normalization (SM+P2+P3=1)')
print('2. Try higher SM weights [1,1,3]')
print('3. Combine both: weighted loss + normalization')

print('\nKey insight:')
print('The CV-LB gap is the fundamental problem.')
print('We need approaches that GENERALIZE better, not just improve CV.')
print('Post-processing normalization is a physics-based constraint that may help.')
print('\nDO NOT SUBMIT exp_028 - it is worse than exp_026.')