# Loop 65 Analysis: Strategic Assessment

## Key Problem
- Best CV: 0.008194 (exp_032)
- Best LB: 0.0877 (exp_030)
- CV-LB relationship: LB = 4.21 × CV + 0.0535 (R² = 0.98)
- Target: 0.0347
- **CRITICAL**: Intercept (0.0535) > Target (0.0347)

## What This Means
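Even at CV = 0 (unattainable in practice), the fitted line predicts LB = 0.0535, which is still above the target of 0.0347.
Note that CV = 0 lies well outside the observed CV range (0.0083 to 0.0145), so this is an extrapolation of the linear fit.
If the linear relationship holds, the current approach CANNOT reach the target by minimizing CV alone; the relationship itself (the intercept) has to change.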

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Submission history
submissions = [
    {'exp': 'exp_000', 'cv': 0.0111, 'lb': 0.0982},
    {'exp': 'exp_001', 'cv': 0.0123, 'lb': 0.1065},
    {'exp': 'exp_003', 'cv': 0.0105, 'lb': 0.0972},
    {'exp': 'exp_005', 'cv': 0.0104, 'lb': 0.0969},
    {'exp': 'exp_006', 'cv': 0.0097, 'lb': 0.0946},
    {'exp': 'exp_007', 'cv': 0.0093, 'lb': 0.0932},
    {'exp': 'exp_009', 'cv': 0.0092, 'lb': 0.0936},
    {'exp': 'exp_012', 'cv': 0.0090, 'lb': 0.0913},
    {'exp': 'exp_024', 'cv': 0.0087, 'lb': 0.0893},
    {'exp': 'exp_026', 'cv': 0.0085, 'lb': 0.0887},
    {'exp': 'exp_030', 'cv': 0.0083, 'lb': 0.0877},
    {'exp': 'exp_041', 'cv': 0.0090, 'lb': 0.0932},
    {'exp': 'exp_042', 'cv': 0.0145, 'lb': 0.1147},
]

df = pd.DataFrame(submissions)
print('Submission History:')
print(df.to_string())

In [None]:
# Fit the CV-LB relationship with ordinary least squares
from scipy import stats

target = 0.0347  # target LB score, defined once instead of hard-coded in each print

cv = df['cv'].values
lb = df['lb'].values

slope, intercept, r_value, p_value, std_err = stats.linregress(cv, lb)
print('\nCV-LB Relationship:')
print(f'LB = {slope:.2f} × CV + {intercept:.4f}')
print(f'R² = {r_value**2:.4f}')
print(f'\nIntercept: {intercept:.4f}')
print(f'Target: {target:.4f}')
print(f'Gap (intercept - target): {intercept - target:.4f}')
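
Whether the intercept truly exceeds the target depends on how tightly the 13 points pin it down. The sketch below is not part of the original analysis: it estimates a 95% confidence interval for the intercept from the standard simple-regression formulas, using NumPy only. The CV/LB values are copied from the submission table, so the coefficients may differ slightly from the rounded headline figures, and the t critical value 2.201 is hard-coded on the assumption of 11 degrees of freedom (n = 13).

```python
import numpy as np

# CV and LB scores copied from the submission history above
cv = np.array([0.0111, 0.0123, 0.0105, 0.0104, 0.0097, 0.0093, 0.0092,
               0.0090, 0.0087, 0.0085, 0.0083, 0.0090, 0.0145])
lb = np.array([0.0982, 0.1065, 0.0972, 0.0969, 0.0946, 0.0932, 0.0936,
               0.0913, 0.0893, 0.0887, 0.0877, 0.0932, 0.1147])
target = 0.0347

n = len(cv)
slope, intercept = np.polyfit(cv, lb, 1)  # coefficients, highest degree first

# Residual variance with n - 2 degrees of freedom
residuals = lb - (slope * cv + intercept)
s2 = np.sum(residuals**2) / (n - 2)

# Standard error of the intercept in simple linear regression
sxx = np.sum((cv - cv.mean())**2)
intercept_se = np.sqrt(s2 * (1.0 / n + cv.mean()**2 / sxx))

# Assumption: two-sided 95% t critical value for df = 11 (only valid for n = 13)
T_CRIT_95_DF11 = 2.201
ci_low = intercept - T_CRIT_95_DF11 * intercept_se
ci_high = intercept + T_CRIT_95_DF11 * intercept_se

print(f'Intercept: {intercept:.4f} (95% CI {ci_low:.4f} to {ci_high:.4f})')
print(f'Target below the whole interval: {target < ci_low}')
```

If the interval sits entirely above the target, the conclusion does not hinge on sampling noise in these points, though it still assumes the linear form holds all the way down to CV = 0.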

In [None]:
# What CV would we need to hit target?
target = 0.0347
required_cv = (target - intercept) / slope
print(f'\nRequired CV to hit target: {required_cv:.6f}')
if required_cv < 0:
    print('IMPOSSIBLE: Required CV is NEGATIVE!')
    print('If the linear trend holds, lowering CV alone CANNOT reach the target.')
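
A negative required CV is only as trustworthy as the fit behind it, and a single high-leverage submission (for example exp_042 at CV = 0.0145) could in principle drive the intercept. As a hedged robustness check, not something the original analysis ran, the sketch below refits the line with each submission left out in turn and reports the resulting range of intercepts and required CVs. The data are copied from the submission table; the variable names are illustrative.

```python
import numpy as np

exps = ['exp_000', 'exp_001', 'exp_003', 'exp_005', 'exp_006', 'exp_007', 'exp_009',
        'exp_012', 'exp_024', 'exp_026', 'exp_030', 'exp_041', 'exp_042']
cv = np.array([0.0111, 0.0123, 0.0105, 0.0104, 0.0097, 0.0093, 0.0092,
               0.0090, 0.0087, 0.0085, 0.0083, 0.0090, 0.0145])
lb = np.array([0.0982, 0.1065, 0.0972, 0.0969, 0.0946, 0.0932, 0.0936,
               0.0913, 0.0893, 0.0887, 0.0877, 0.0932, 0.1147])
target = 0.0347

# Refit the line once per submission, leaving that submission out
loo = []
for i, name in enumerate(exps):
    keep = np.arange(len(cv)) != i
    slope_i, intercept_i = np.polyfit(cv[keep], lb[keep], 1)
    required_cv_i = (target - intercept_i) / slope_i
    loo.append((name, slope_i, intercept_i, required_cv_i))

intercepts = np.array([row[2] for row in loo])
required = np.array([row[3] for row in loo])

print(f'Leave-one-out intercept range: {intercepts.min():.4f} to {intercepts.max():.4f}')
print(f'Leave-one-out required CV range: {required.min():.6f} to {required.max():.6f}')
print(f'Intercept above target in every refit: {bool((intercepts > target).all())}')
```

If every refit still places the intercept above the target, no single submission is responsible for the conclusion.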

In [None]:
# Analyze residuals - which submissions beat the trend?
df['predicted_lb'] = slope * df['cv'] + intercept
df['residual'] = df['lb'] - df['predicted_lb']
df['residual_pct'] = df['residual'] / df['predicted_lb'] * 100

print('\nResidual Analysis (negative = better than expected):')
print(df[['exp', 'cv', 'lb', 'predicted_lb', 'residual', 'residual_pct']].to_string())

print(f'\nBest residual: {df["residual"].min():.4f} ({df.loc[df["residual"].idxmin(), "exp"]})')
print(f'Worst residual: {df["residual"].max():.4f} ({df.loc[df["residual"].idxmax(), "exp"]})')

In [None]:
# What approaches have been tried?
approaches_tried = [
    'MLP with Spange features',
    'LightGBM',
    'DRFP features (PCA)',
    'Combined Spange + DRFP + ACS PCA',
    'Deep Residual MLP (FAILED)',
    'Large Ensemble (15 models)',
    'Simpler models [64, 32]',
    'Ridge Regression',
    'Diverse Ensemble (MLP + LightGBM)',
    'GP + MLP + LGBM ensemble',
    'Per-target optimization',
    'Per-solvent-type models (FAILED)',
    'GNN/GAT (FAILED)',
    'ChemBERTa (FAILED)',
    'TabNet (FAILED)',
    'Importance weighting',
    'Mixup augmentation',
    'Uncertainty weighting',
    'Isotonic calibration',
    'Prediction shrinkage',
    'GroupKFold CV',
    'Aggressive regularization',
    'Physical constraints (mass balance)',
    'Conformalized Quantile Regression',
]

print('Approaches Tried (66 experiments):')
for i, approach in enumerate(approaches_tried, 1):
    print(f'{i}. {approach}')

In [None]:
# What HASN'T been tried?
untried_approaches = [
    'Multi-task learning with auxiliary targets (e.g., predict solvent properties)',
    'Domain adaptation techniques (e.g., DANN)',
    'Meta-learning (MAML) for few-shot adaptation to new solvents',
    'Bayesian Neural Networks for uncertainty quantification',
    'Neural Processes for conditional predictions',
    'Prototypical networks for solvent similarity',
    'Contrastive learning for solvent representations',
    'Self-supervised pre-training on solvent data',
    'Transfer learning from related chemistry tasks',
    'Ensemble of fundamentally different architectures (not just weights)',
]

print('\nPotentially Untried Approaches:')
for i, approach in enumerate(untried_approaches, 1):
    print(f'{i}. {approach}')

In [None]:
# Key insight: the problem is out-of-distribution (OOD) generalization.
# The test set likely contains solvents NOT seen in training.
# Our CV (leave-one-solvent-out) simulates this, but the CV-LB gap suggests
# the test solvents differ MORE from the training set than any single
# held-out training solvent does from the rest.

print('\n=== KEY INSIGHT ===')
print('The CV-LB gap suggests the test solvents are MORE different from training')
print('than any single training solvent is from the rest.')
print('')
print('This is an EXTRAPOLATION problem, not an INTERPOLATION problem.')
print('')
print('Possible reasons for the gap:')
print('1. Test solvents have properties outside the training range')
print('2. Test solvents belong to different chemical families')
print('3. The model overfits to training solvent patterns')
print('')
print('What might help:')
print('1. Features that generalize better across chemical space')
print('2. Models that are more robust to distribution shift')
print('3. Regularization that prevents overfitting to training solvents')
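
The first possible reason listed above, test solvents with properties outside the training range, can be checked directly whenever descriptors for the test solvents are available. A minimal sketch, assuming a hypothetical feature matrix per split (rows are solvents, columns are descriptors). The function name `out_of_range_fraction` and the toy numbers are illustrative placeholders, not real solvent data:

```python
import numpy as np

def out_of_range_fraction(train_X, test_X):
    """Per-feature fraction of test rows outside the training [min, max] range."""
    train_X = np.asarray(train_X, dtype=float)
    test_X = np.asarray(test_X, dtype=float)
    lo = train_X.min(axis=0)
    hi = train_X.max(axis=0)
    outside = (test_X < lo) | (test_X > hi)
    return outside.mean(axis=0)

# Toy placeholder values (NOT real solvent descriptors):
# 3 training solvents and 2 test solvents, 2 features each.
train_X = [[0.1, 10.0], [0.5, 20.0], [0.9, 30.0]]
test_X = [[0.4, 35.0], [1.2, 40.0]]

frac = out_of_range_fraction(train_X, test_X)
print(frac)  # feature 0: one of two test rows outside; feature 1: both outside
```

Per-feature range checks miss solvents whose descriptors are individually in range but jointly unusual, so a nearest-neighbor distance in feature space would be a natural complement.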