# Loop 45 Strategic Analysis

## Key Questions:
1. What is the actual CV-LB relationship?
2. What would it take to reach target 0.073?
3. What unexplored approaches remain?
4. Should we submit exp_044 or try something else?

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Submission history
submissions = [
    ('exp_000', 0.011081, 0.09816),
    ('exp_001', 0.012297, 0.10649),
    ('exp_003', 0.010501, 0.09719),
    ('exp_005', 0.01043, 0.09691),
    ('exp_006', 0.009749, 0.09457),
    ('exp_007', 0.009262, 0.09316),
    ('exp_009', 0.009192, 0.09364),
    ('exp_012', 0.009004, 0.09134),
    ('exp_024', 0.008689, 0.08929),
    ('exp_026', 0.008465, 0.08875),
    ('exp_030', 0.008298, 0.08772),
    ('exp_035', 0.009825, 0.09696),
]

df = pd.DataFrame(submissions, columns=['exp', 'cv', 'lb'])
print('=== Submission History ===')
print(df.to_string(index=False))
print()

# Fit linear regression
X = df['cv'].values.reshape(-1, 1)
y = df['lb'].values
reg = LinearRegression().fit(X, y)
print(f'CV-LB Relationship: LB = {reg.coef_[0]:.4f} * CV + {reg.intercept_:.4f}')
print(f'R² = {reg.score(X, y):.4f}')
print()

# What CV would we need to hit target?
target = 0.073
required_cv = (target - reg.intercept_) / reg.coef_[0]
print(f'Target LB: {target}')
print(f'Required CV to hit target: {required_cv:.6f}')
print(f'Current best CV: {df["cv"].min():.6f}')
print(f'Gap: {df["cv"].min() - required_cv:.6f}')
print()

# Best LB so far
best_lb = df['lb'].min()
best_cv = df.loc[df['lb'].idxmin(), 'cv']
print(f'Best LB: {best_lb:.5f} (from CV {best_cv:.6f})')
print(f'Gap to target: {best_lb - target:.5f} ({(best_lb - target)/target*100:.1f}%)')

=== Submission History ===
    exp       cv      lb
exp_000 0.011081 0.09816
exp_001 0.012297 0.10649
exp_003 0.010501 0.09719
exp_005 0.010430 0.09691
exp_006 0.009749 0.09457
exp_007 0.009262 0.09316
exp_009 0.009192 0.09364
exp_012 0.009004 0.09134
exp_024 0.008689 0.08929
exp_026 0.008465 0.08875
exp_030 0.008298 0.08772
exp_035 0.009825 0.09696

CV-LB Relationship: LB = 4.2876 * CV + 0.0528
R² = 0.9523

Target LB: 0.073
Required CV to hit target: 0.004715
Current best CV: 0.008298
Gap: 0.003583

Best LB: 0.08772 (from CV 0.008298)
Gap to target: 0.01472 (20.2%)
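The required-CV figure above comes from inverting a line fitted to only 12 points, so it carries real uncertainty. The sketch below assumes the submission table above is the complete history. It does three things:

- refits the same line with `np.polyfit` and reproduces the point estimate;
- predicts the LB the line implies for exp_044 (CV 0.008597, taken from the experiment list later in this notebook);
- bootstraps the submissions to get a rough range for the required CV.

The helper name `cv_for_target`, the number of resamples and the seed are illustrative choices, not part of the original analysis.

```python
import numpy as np

# CV/LB pairs copied from the submission history above
cv = np.array([0.011081, 0.012297, 0.010501, 0.01043, 0.009749, 0.009262,
               0.009192, 0.009004, 0.008689, 0.008465, 0.008298, 0.009825])
lb = np.array([0.09816, 0.10649, 0.09719, 0.09691, 0.09457, 0.09316,
               0.09364, 0.09134, 0.08929, 0.08875, 0.08772, 0.09696])
target = 0.073

def cv_for_target(cv, lb, target):
    """Fit LB = slope * CV + intercept and solve for the CV that yields `target`."""
    slope, intercept = np.polyfit(cv, lb, deg=1)
    return (target - intercept) / slope, slope, intercept

req, slope, intercept = cv_for_target(cv, lb, target)
print(f'Point estimate: LB = {slope:.4f} * CV + {intercept:.4f}, required CV = {req:.6f}')

# LB the fitted line implies for exp_044 (CV 0.008597, not submitted)
print(f'Implied LB for exp_044: {slope * 0.008597 + intercept:.5f}')

# Bootstrap: resample submissions with replacement, refit, re-solve
rng = np.random.default_rng(0)
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(cv), size=len(cv))
    if len(np.unique(cv[idx])) < 3:  # skip near-degenerate resamples
        continue
    boot.append(cv_for_target(cv[idx], lb[idx], target)[0])
ci_lo, ci_hi = np.percentile(boot, [5, 95])
print(f'90% bootstrap interval for required CV: [{ci_lo:.6f}, {ci_hi:.6f}]')
```

With so few points, the interval is a rough guide to how much the required CV could move, not a formal confidence statement.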


In [2]:
# Analyze exp_035: much worse CV than exp_030, with a smaller relative LB gap
print('=== Anomaly Analysis ===')
print()
print('exp_030: CV=0.008298, LB=0.08772 (BEST LB)')
print('exp_035: CV=0.009825, LB=0.09696')
print()
print('exp_035 had 18.4% worse CV but only 10.5% worse LB')
print('With a positive intercept the fit itself predicts only a ~7.4% LB gap,')
print('so exp_035 actually sits above the line (largest positive residual below)')
print()

# Calculate residuals
df['predicted_lb'] = reg.predict(df['cv'].values.reshape(-1, 1))
df['residual'] = df['lb'] - df['predicted_lb']
print('Residuals from linear fit:')
print(df[['exp', 'cv', 'lb', 'predicted_lb', 'residual']].to_string(index=False))
print()
print(f'Mean residual: {df["residual"].mean():.6f}')
print(f'Std residual: {df["residual"].std():.6f}')

=== Anomaly Analysis ===

exp_030: CV=0.008298, LB=0.08772 (BEST LB)
exp_035: CV=0.009825, LB=0.09696

exp_035 had 18.4% worse CV but only 10.5% worse LB
With a positive intercept the fit itself predicts only a ~7.4% LB gap,
so exp_035 actually sits above the line (largest positive residual below)

Residuals from linear fit:
    exp       cv      lb  predicted_lb  residual
exp_000 0.011081 0.09816      0.100296 -0.002136
exp_001 0.012297 0.10649      0.105510  0.000980
exp_003 0.010501 0.09719      0.097809 -0.000619
exp_005 0.010430 0.09691      0.097505 -0.000595
exp_006 0.009749 0.09457      0.094585 -0.000015
exp_007 0.009262 0.09316      0.092497  0.000663
exp_009 0.009192 0.09364      0.092196  0.001444
exp_012 0.009004 0.09134      0.091390 -0.000050
exp_024 0.008689 0.08929      0.090040 -0.000750
exp_026 0.008465 0.08875      0.089079 -0.000329
exp_030 0.008298 0.08772      0.088363 -0.000643
exp_035 0.009825 0.09696      0.094911  0.002049

Mean residual: -0.000000
Std residual: 0.001131
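The 18.4% vs 10.5% comparison can be checked directly against the fitted line. This sketch uses the rounded coefficients printed above (4.2876 and 0.0528), so it matches the table only up to rounding.

With a positive intercept, the line predicts a relative LB gap smaller than the relative CV gap. Comparing that prediction with the observed gap shows which side of the line exp_035 falls on. The helper `rel_gap` is illustrative.

```python
# Rounded coefficients from the linear fit printed above
slope, intercept = 4.2876, 0.0528

def rel_gap(a, b):
    """Relative increase of b over a."""
    return b / a - 1

cv_030, lb_030 = 0.008298, 0.08772
cv_035, lb_035 = 0.009825, 0.09696

cv_gap = rel_gap(cv_030, cv_035)
observed_lb_gap = rel_gap(lb_030, lb_035)
# LB gap the line predicts from the CV difference alone
predicted_lb_gap = rel_gap(slope * cv_030 + intercept, slope * cv_035 + intercept)

print(f'CV gap: {cv_gap:.1%}')
print(f'Observed LB gap: {observed_lb_gap:.1%}')
print(f'Predicted LB gap from the fit: {predicted_lb_gap:.1%}')
```

The observed gap exceeds the predicted one here. That means exp_035 did worse on LB than its CV implies, which is consistent with its positive residual in the table.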


In [3]:
# What if we could reduce the intercept?
print('=== Intercept Analysis ===')
print()
print(f'Current intercept: {reg.intercept_:.4f}')
print(f'Target: {target}')
print()
print('If intercept were 0.04 (instead of 0.0528):')
new_intercept = 0.04
required_cv_new = (target - new_intercept) / reg.coef_[0]
print(f'  Required CV: {required_cv_new:.6f}')
print(f'  Not yet reached: current best CV {df["cv"].min():.6f} is still ~0.0006 higher')
print()
print('Key insight: the intercept, more than CV alone, is the bottleneck')
print('We need to find an approach that reduces the intercept')

=== Intercept Analysis ===

Current intercept: 0.0528
Target: 0.073

If intercept were 0.04 (instead of 0.0528):
  Required CV: 0.007697
  Not yet reached: current best CV 0.008298 is still ~0.0006 higher

Key insight: the intercept, more than CV alone, is the bottleneck
We need to find an approach that reduces the intercept
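The same line can be turned around: holding the slope fixed, how low would the intercept have to go for the current best CV to reach the target? This back-of-the-envelope sketch uses the printed coefficients. It assumes the slope stays at 4.2876 when the intercept moves, which nothing in the submission history guarantees.

```python
slope, intercept = 4.2876, 0.0528   # fit printed above
target = 0.073
best_cv = 0.008298                  # exp_030

# Intercept at which best_cv would map exactly onto the target
needed_intercept = target - slope * best_cv
print(f'Needed intercept: {needed_intercept:.4f} '
      f'(a drop of {intercept - needed_intercept:.4f} from {intercept})')
```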


In [4]:
# What approaches haven't been tried?
print('=== Unexplored Approaches ===')
print()
print('TRIED AND FAILED:')
print('- GNN (exp_040): CV 0.068767 - too high')
print('- ChemBERTa (exp_041): CV 0.010288 - no improvement')
print('- Calibration (exp_042): CV 0.010008 - no improvement')
print('- Non-linear mixture (exp_043): CV 0.073776 - mixture only')
print('- Hybrid model (exp_044): CV 0.008597 - slight degradation')
print('- Learned embeddings (exp_039): CV 0.080438 - OOD failure')
print()
print('POTENTIALLY UNEXPLORED:')
print('1. Importance-weighted CV (address distribution shift)')
print('2. Adversarial validation (identify drifting features)')
print('3. Mean reversion (blend predictions toward training mean)')
print('4. Separate models for single vs mixture (not just features)')
print('5. Target-specific models (SM, Product 2, Product 3)')
print('6. Ensemble of diverse model families (GP + MLP + LGBM + Ridge)')

=== Unexplored Approaches ===

TRIED AND FAILED:
- GNN (exp_040): CV 0.068767 - too high
- ChemBERTa (exp_041): CV 0.010288 - no improvement
- Calibration (exp_042): CV 0.010008 - no improvement
- Non-linear mixture (exp_043): CV 0.073776 - mixture only
- Hybrid model (exp_044): CV 0.008597 - slight degradation
- Learned embeddings (exp_039): CV 0.080438 - OOD failure

POTENTIALLY UNEXPLORED:
1. Importance-weighted CV (address distribution shift)
2. Adversarial validation (identify drifting features)
3. Mean reversion (blend predictions toward training mean)
4. Separate models for single vs mixture (not just features)
5. Target-specific models (SM, Product 2, Product 3)
6. Ensemble of diverse model families (GP + MLP + LGBM + Ridge)
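Items 1 and 2 can be sketched together.

- **Adversarial validation** trains a classifier to tell training rows from test rows. An AUC near 0.5 means no detectable shift; a high AUC flags drifting features.
- **Importance weighting** uses the classifier's out-of-fold probabilities to build weights p/(1-p) for reweighting training rows.

This is a minimal sketch on synthetic data under several assumptions:

- `X_train` and `X_test` are hypothetical stand-ins for the real feature matrices (e.g. Spange + DRFP features).
- Test-time features are available to fit against.
- Logistic regression is an arbitrary choice of classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

rng = np.random.default_rng(0)
# Hypothetical stand-ins: feature 0 drifts between train and test
X_train = rng.normal(size=(500, 5))
X_test = rng.normal(size=(500, 5))
X_test[:, 0] += 1.0

# Label rows by origin: 0 = train, 1 = test
X_all = np.vstack([X_train, X_test])
is_test = np.concatenate([np.zeros(len(X_train), dtype=int),
                          np.ones(len(X_test), dtype=int)])

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = LogisticRegression(max_iter=1000)
# Out-of-fold probability that each row comes from the test set
p_test = cross_val_predict(clf, X_all, is_test, cv=skf, method='predict_proba')[:, 1]
auc = roc_auc_score(is_test, p_test)
print(f'Adversarial AUC: {auc:.3f}')  # ~0.5 means no detectable shift

# Rank drifting features by coefficient magnitude
# (comparable here only because all synthetic features share one scale)
coefs = np.abs(clf.fit(X_all, is_test).coef_[0])
print('Features ranked by drift:', np.argsort(coefs)[::-1])

# Importance weights for training rows: p/(1-p), rescaled for class sizes
p_tr = np.clip(p_test[:len(X_train)], 1e-6, 1 - 1e-6)
weights = p_tr / (1 - p_tr) * (len(X_train) / len(X_test))
print(f'Weight range: {weights.min():.3f} to {weights.max():.3f}')
```

The weights can then be passed as `sample_weight` to models that accept it, so that training rows resembling the test distribution count more.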


In [5]:
# Analyze what made exp_030 the best LB
print('=== exp_030 Analysis ===')
print()
print('exp_030 achieved best LB (0.08772) with CV 0.008298')
print('This was a GP+MLP+LGBM ensemble with weights (0.15, 0.55, 0.3)')
print()
print('Key features of exp_030:')
print('- GP weight: 0.15 (Gaussian Process for uncertainty)')
print('- MLP weight: 0.55 (Neural network for non-linear patterns)')
print('- LGBM weight: 0.30 (Gradient boosting for tabular data)')
print('- Combined Spange + DRFP features')
print('- Arrhenius kinetics features')
print()
print('The ensemble diversity may be key to the good LB performance')

=== exp_030 Analysis ===

exp_030 achieved best LB (0.08772) with CV 0.008298
This was a GP+MLP+LGBM ensemble with weights (0.15, 0.55, 0.3)

Key features of exp_030:
- GP weight: 0.15 (Gaussian Process for uncertainty)
- MLP weight: 0.55 (Neural network for non-linear patterns)
- LGBM weight: 0.30 (Gradient boosting for tabular data)
- Combined Spange + DRFP features
- Arrhenius kinetics features

The ensemble diversity may be key to the good LB performance
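The exp_030 blend is a fixed-weight average of the three models' predictions. This minimal sketch assumes each model outputs an array of shape (n_samples, n_targets). The prediction arrays are made up, and `blend` is a hypothetical helper, not code from exp_030.

```python
import numpy as np

def blend(preds, weights):
    """Weighted average of model predictions, with weights normalised to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    stacked = np.stack(preds)               # (n_models, n_samples, n_targets)
    return np.tensordot(w, stacked, axes=1)  # sum over the model axis

# Hypothetical predictions for 2 samples x 3 targets (e.g. SM, Product 2, Product 3)
gp = np.array([[0.50, 0.20, 0.10], [0.40, 0.30, 0.20]])
mlp = np.array([[0.60, 0.10, 0.20], [0.30, 0.40, 0.10]])
lgbm = np.array([[0.55, 0.15, 0.15], [0.35, 0.35, 0.15]])

ens = blend([gp, mlp, lgbm], weights=(0.15, 0.55, 0.30))
print(ens)
```

Normalising inside the helper means weight tuples that do not sum exactly to 1 still produce an average rather than a rescaled prediction.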


In [6]:
# Strategic recommendation
print('=== STRATEGIC RECOMMENDATION ===')
print()
print('CURRENT SITUATION:')
print('- Best LB: 0.08772 (exp_030)')
print('- Target: 0.073')
print('- Gap: 20.2%')
print('- Remaining submissions: 4')
print()
print('THE INTERCEPT PROBLEM:')
print('- CV-LB relationship: LB = 4.29*CV + 0.0528')
print('- Intercept (0.0528) is 72% of target (0.073)')
print('- Even CV=0 would give LB=0.0528')
print()
print('RECOMMENDED APPROACH:')
print('1. DO NOT submit exp_044 (CV 0.008597 is worse than exp_030)')
print('2. Focus on approaches that could change the CV-LB relationship:')
print('   a. Mean reversion: blend predictions toward training mean')
print('   b. Separate models: train completely different models for single vs mixture')
print('   c. Target-specific tuning: optimize for each target separately')
print('3. The key is to reduce the INTERCEPT, not just improve CV')
print()
print('SUBMISSION STRATEGY:')
print('- Submission 1: Mean reversion on exp_030 (alpha=0.8-0.9)')
print('- Submission 2: Based on results, refine or try separate models')
print('- Save 2 submissions for final refinements')

=== STRATEGIC RECOMMENDATION ===

CURRENT SITUATION:
- Best LB: 0.08772 (exp_030)
- Target: 0.073
- Gap: 20.2%
- Remaining submissions: 4

THE INTERCEPT PROBLEM:
- CV-LB relationship: LB = 4.29*CV + 0.0528
- Intercept (0.0528) is 72% of target (0.073)
- Even CV=0 would give LB=0.0528

RECOMMENDED APPROACH:
1. DO NOT submit exp_044 (CV 0.008597 is worse than exp_030)
2. Focus on approaches that could change the CV-LB relationship:
   a. Mean reversion: blend predictions toward training mean
   b. Separate models: train completely different models for single vs mixture
   c. Target-specific tuning: optimize for each target separately
3. The key is to reduce the INTERCEPT, not just improve CV

SUBMISSION STRATEGY:
- Submission 1: Mean reversion on exp_030 (alpha=0.8-0.9)
- Submission 2: Based on results, refine or try separate models
- Save 2 submissions for final refinements
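Mean reversion, as proposed for Submission 1, shrinks each prediction toward the training mean. This minimal sketch rests on two readings of the plan, both assumptions:

- `alpha` is the weight kept on the model's prediction, so alpha=0.85 keeps 85% of the prediction and moves 15% toward the mean.
- The mean is taken per target column.

The arrays are illustrative only, and `mean_revert` is a hypothetical helper.

```python
import numpy as np

def mean_revert(preds, y_train, alpha=0.85):
    """Blend predictions toward the per-target training mean.

    alpha is the weight on the model prediction; 1 - alpha goes to the mean.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError('alpha must be in [0, 1]')
    train_mean = np.asarray(y_train).mean(axis=0)  # one mean per target column
    return alpha * np.asarray(preds) + (1 - alpha) * train_mean

# Illustrative data: 4 training rows, 3 targets
y_train = np.array([[0.6, 0.2, 0.1],
                    [0.4, 0.3, 0.2],
                    [0.5, 0.1, 0.3],
                    [0.5, 0.2, 0.2]])
preds = np.array([[0.9, 0.05, 0.02]])

for alpha in (0.8, 0.9):
    print(alpha, mean_revert(preds, y_train, alpha))
```

Every prediction moves toward the same constant. The blend can therefore only help if test targets sit closer to the training mean than the model predicts.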


In [7]:
# Key insight from kernel analysis
print('=== KEY INSIGHT FROM KERNEL ANALYSIS ===')
print()
print('The "mixall" kernel uses GroupKFold (5 splits) instead of Leave-One-Out!')
print('This is a DIFFERENT CV scheme that may have a different CV-LB relationship.')
print()
print('Our CV scheme: Leave-One-Solvent-Out (24 folds for single, 13 for mixture)')
print('Their CV scheme: GroupKFold (5 folds)')
print()
print('This could explain why our CV-LB relationship has a large intercept.')
print('The competition may be using a different evaluation scheme.')
print()
print('HOWEVER: The competition rules state that the submission must follow the template.')
print('The template uses leave-one-out splits, so we cannot change the CV scheme.')
print()
print('ALTERNATIVE INTERPRETATION:')
print('The intercept in our CV-LB relationship may be due to:')
print('1. Distribution shift between train/test')
print('2. The competition evaluating on a different metric')
print('3. The competition using different weights for single vs mixture')
print()
print('NEXT STEPS:')
print('1. Focus on approaches that could reduce the intercept')
print('2. Try mean reversion (blend predictions toward training mean)')
print('3. Try separate models for single vs mixture')
print('4. Try target-specific tuning')

=== KEY INSIGHT FROM KERNEL ANALYSIS ===

The "mixall" kernel uses GroupKFold (5 splits) instead of Leave-One-Out!
This is a DIFFERENT CV scheme that may have a different CV-LB relationship.

Our CV scheme: Leave-One-Solvent-Out (24 folds for single, 13 for mixture)
Their CV scheme: GroupKFold (5 folds)

This could explain why our CV-LB relationship has a large intercept.
The competition may be using a different evaluation scheme.

HOWEVER: The competition rules state that the submission must follow the template.
The template uses leave-one-out splits, so we cannot change the CV scheme.

ALTERNATIVE INTERPRETATION:
The intercept in our CV-LB relationship may be due to:
1. Distribution shift between train/test
2. The competition evaluating on a different metric
3. The competition using different weights for single vs mixture

NEXT STEPS:
1. Focus on approaches that could reduce the intercept
2. Try mean reversion (blend predictions toward training mean)
3. Try separate models for single