# Loop 48 Analysis: Post-Ridge Failure - Strategic Decision

## Key Results:
1. Simple Ridge (exp_049): CV 0.016324 (99% WORSE than best CV 0.008194)
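2. This gives NO support to the hypothesis that simpler models have a lower intercept: exp_049 has no LB score, and the submitted Ridge (exp_009) sits on the same line as the other submissions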
3. The CV-LB relationship is STRUCTURAL, not model-dependent

## Critical Situation:
- 3 submissions remaining
- Target: 0.0347
- Best LB: 0.0877 (2.53x away from target)
- CV-LB relationship: LB = 4.23×CV + 0.0533 (R²=0.981)
- Intercept (0.0533) > Target (0.0347)
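
The last bullet is what makes the situation critical. A short derivation, using only the fitted coefficients quoted above and assuming the linear relationship keeps holding, shows why:

$$
\mathrm{LB} = a \cdot \mathrm{CV} + b, \qquad a = 4.23,\; b = 0.0533
\;\;\Longrightarrow\;\;
\mathrm{CV}^{*} = \frac{\mathrm{LB}_{\text{target}} - b}{a} = \frac{0.0347 - 0.0533}{4.23} \approx -0.0044 < 0
$$

A CV error cannot be negative, so on this line no CV improvement alone reaches the target.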

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Submission history: CV and LB score for each of the 13 submitted experiments
submissions = [
    {'exp': 'exp_000', 'cv': 0.0111, 'lb': 0.0982, 'model': 'MLP'},
    {'exp': 'exp_001', 'cv': 0.0123, 'lb': 0.1065, 'model': 'LightGBM'},
    {'exp': 'exp_003', 'cv': 0.0105, 'lb': 0.0972, 'model': 'MLP+DRFP'},
    {'exp': 'exp_005', 'cv': 0.0104, 'lb': 0.0969, 'model': 'MLP Ensemble'},
    {'exp': 'exp_006', 'cv': 0.0097, 'lb': 0.0946, 'model': 'Simpler MLP'},
    {'exp': 'exp_007', 'cv': 0.0093, 'lb': 0.0932, 'model': 'Even Simpler'},
    {'exp': 'exp_009', 'cv': 0.0092, 'lb': 0.0936, 'model': 'Ridge'},
    {'exp': 'exp_012', 'cv': 0.0090, 'lb': 0.0913, 'model': 'Simple Ensemble'},
    {'exp': 'exp_024', 'cv': 0.0087, 'lb': 0.0893, 'model': 'ACS PCA'},
    {'exp': 'exp_026', 'cv': 0.0085, 'lb': 0.0887, 'model': 'Weighted Loss'},
    {'exp': 'exp_030', 'cv': 0.0083, 'lb': 0.0877, 'model': 'GP Ensemble'},
    {'exp': 'exp_041', 'cv': 0.0090, 'lb': 0.0932, 'model': 'Aggressive Reg'},
    {'exp': 'exp_042', 'cv': 0.0145, 'lb': 0.1147, 'model': 'Pure GP'},
]

df = pd.DataFrame(submissions)
print('Submission History:')
print(df.to_string(index=False))

In [None]:
# Fit linear regression to understand CV-LB relationship
from sklearn.linear_model import LinearRegression

X = df['cv'].values.reshape(-1, 1)
y = df['lb'].values

reg = LinearRegression()
reg.fit(X, y)

slope = reg.coef_[0]
intercept = reg.intercept_
r2 = reg.score(X, y)

print(f'CV-LB Relationship: LB = {slope:.2f} × CV + {intercept:.4f}')
print(f'R² = {r2:.4f}')
target = 0.0347  # competition target LB score
gap = intercept - target
print(f'\nIntercept ({intercept:.4f}) vs Target ({target})')
print(f'Gap: {gap:.4f} ({gap / target * 100:.1f}% above target)')
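
The fit above rests on only 13 points, so it is worth asking how uncertain the intercept itself is. The sketch below computes a 95% confidence interval for the intercept with the textbook simple-linear-regression formula. It assumes independent Gaussian residuals, which is optimistic here because the submissions were chosen adaptively, so treat the interval as a rough guide rather than a guarantee.

```python
import numpy as np
from scipy import stats

# CV and LB scores of the 13 submitted experiments (copied from the table above)
cv = np.array([0.0111, 0.0123, 0.0105, 0.0104, 0.0097, 0.0093, 0.0092,
               0.0090, 0.0087, 0.0085, 0.0083, 0.0090, 0.0145])
lb = np.array([0.0982, 0.1065, 0.0972, 0.0969, 0.0946, 0.0932, 0.0936,
               0.0913, 0.0893, 0.0887, 0.0877, 0.0932, 0.1147])
target = 0.0347

n = len(cv)
slope, intercept = np.polyfit(cv, lb, deg=1)  # highest power first

# Residual standard error with n - 2 degrees of freedom
residuals = lb - (slope * cv + intercept)
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))

# Standard error of the intercept in simple linear regression
sxx = np.sum((cv - cv.mean()) ** 2)
se_intercept = s * np.sqrt(1.0 / n + cv.mean() ** 2 / sxx)

# Two-sided 95% confidence interval from the t distribution
t_crit = stats.t.ppf(0.975, df=n - 2)
ci_low = intercept - t_crit * se_intercept
ci_high = intercept + t_crit * se_intercept

print(f'Intercept: {intercept:.4f}, 95% CI: [{ci_low:.4f}, {ci_high:.4f}]')
print(f'Target below CI lower bound: {target < ci_low}')
```

If the lower bound stays above the target, the "intercept > target" conclusion does not hinge on noise in the 13 points, at least under the stated assumptions.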

In [None]:
# Analyze what we've learned from 49 experiments
print('=== WHAT WE HAVE LEARNED FROM 49 EXPERIMENTS ===')
print('\n1. ALL 13 submissions follow the SAME CV-LB relationship')
print('   - Model families tried: MLP, LightGBM, Ridge, GP, k-NN, Stacking, CatBoost, XGBoost')
print('   - The relationship is: LB = 4.23×CV + 0.0533 (R²=0.981)')
print('\n2. The intercept (0.0533) > target (0.0347)')
print('   - This means we CANNOT reach the target by improving CV alone')
print('   - Even with CV = 0, predicted LB would be 0.0533')
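print('\n3. Simpler models show NO evidence of a lower intercept')
print('   - Ridge (exp_049): CV 0.016324 (99% worse than best CV), not submitted')
print('   - Submitted Ridge (exp_009) falls on the same line as the other submissions')
print('   - The problem does NOT appear to be model complexity')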
print('\n4. Feature engineering has limited impact on the relationship')
print('   - Spange + DRFP + ACS PCA is the best feature set')
print('   - RDKit descriptors (exp_048): CV 62% worse')
print('   - Similarity features (exp_046): 6.38% worse')
print('\n5. Regularization does NOT help')
print('   - Aggressive regularization (exp_043): 9.79% worse')
print('   - The CV-LB gap is NOT due to overfitting')
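
The claim that every submission sits on one line can be checked directly by looking at per-submission residuals from the fit. The sketch below rebuilds the submission table and sorts it by residual. It only covers the 13 submitted experiments, since those are the only ones with an LB score.

```python
import numpy as np
import pandas as pd

# Submission history copied from the table above
subs = pd.DataFrame({
    'exp': ['exp_000', 'exp_001', 'exp_003', 'exp_005', 'exp_006', 'exp_007',
            'exp_009', 'exp_012', 'exp_024', 'exp_026', 'exp_030', 'exp_041',
            'exp_042'],
    'model': ['MLP', 'LightGBM', 'MLP+DRFP', 'MLP Ensemble', 'Simpler MLP',
              'Even Simpler', 'Ridge', 'Simple Ensemble', 'ACS PCA',
              'Weighted Loss', 'GP Ensemble', 'Aggressive Reg', 'Pure GP'],
    'cv': [0.0111, 0.0123, 0.0105, 0.0104, 0.0097, 0.0093, 0.0092,
           0.0090, 0.0087, 0.0085, 0.0083, 0.0090, 0.0145],
    'lb': [0.0982, 0.1065, 0.0972, 0.0969, 0.0946, 0.0932, 0.0936,
           0.0913, 0.0893, 0.0887, 0.0877, 0.0932, 0.1147],
})

# Ordinary least-squares line through (CV, LB)
slope, intercept = np.polyfit(subs['cv'].to_numpy(), subs['lb'].to_numpy(), deg=1)

# Residual = actual LB minus LB predicted by the line
subs['pred_lb'] = slope * subs['cv'] + intercept
subs['residual'] = subs['lb'] - subs['pred_lb']

cols = ['exp', 'model', 'cv', 'lb', 'residual']
print(subs.sort_values('residual')[cols].to_string(index=False))
print(f"\nMax |residual|: {subs['residual'].abs().max():.4f}")
```

A submission with a clearly negative residual would be the most interesting one to study, since it would beat the line; small residuals throughout support the "structural" reading.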

In [None]:
# What's the gap to target?
target = 0.0347
best_lb = 0.0877
best_cv = 0.008194

print('=== GAP TO TARGET ===')
print(f'Target: {target}')
print(f'Best LB: {best_lb}')
print(f'Gap: {best_lb - target:.4f} ({(best_lb - target)/target*100:.1f}% above target)')
print(f'\nBest CV: {best_cv}')
print(f'Predicted LB: {slope * best_cv + intercept:.4f}')
print(f'\nTo reach target with current relationship:')
required_cv = (target - intercept) / slope
print(f'Required CV: {required_cv:.6f}')
if required_cv < 0:
    print('IMPOSSIBLE! Required CV is negative!')
else:
    print(f'This is {required_cv/best_cv:.2f}x our best CV')

In [None]:
# What would it take to reach the target?
print('=== WHAT WOULD IT TAKE TO REACH THE TARGET? ===')
print('\nOption 1: Reduce intercept to 0.03 (keep slope)')
new_intercept = 0.03
required_cv = (target - new_intercept) / slope
print(f'  Required CV: {required_cv:.6f}')
print(f'  This is {required_cv/best_cv:.2f}x our best CV')

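print('\nOption 2: Reduce slope to 2.0 (keep intercept)')
new_slope = 2.0
required_cv = (target - intercept) / new_slope
print(f'  Required CV: {required_cv:.6f}')
if required_cv < 0:
    # With intercept > target, no positive slope can make this feasible
    print('  IMPOSSIBLE! Required CV is negative!')
else:
    print(f'  This is {required_cv/best_cv:.2f}x our best CV')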

print('\nOption 3: Reduce intercept to 0.02 AND slope to 2.0')
new_intercept = 0.02
new_slope = 2.0
required_cv = (target - new_intercept) / new_slope
print(f'  Required CV: {required_cv:.6f}')
print(f'  This is {required_cv/best_cv:.2f}x our best CV')

print('\nOption 4: Find a completely different approach')
print('  - The GNN benchmark achieved 0.0039 CV')
print('  - This suggests graph-based approaches might work')
print('  - But we cannot implement GNN within the template constraints')
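
Options 1 to 3 are all instances of the same calculation, so they can be folded into one helper and swept over a small grid. This is a minimal sketch: `required_cv` is a hypothetical helper name, and the slope and intercept values are the rounded figures quoted in this notebook, not a fresh fit.

```python
from typing import Optional


def required_cv(target: float, slope: float, intercept: float) -> Optional[float]:
    """CV score that maps to `target` under LB = slope * CV + intercept.

    Returns None when no non-negative CV can reach the target.
    """
    if slope <= 0:
        return None
    cv = (target - intercept) / slope
    return cv if cv >= 0 else None


TARGET = 0.0347
BEST_CV = 0.008194
FITTED_SLOPE, FITTED_INTERCEPT = 4.23, 0.0533  # rounded values from the fit

# Sweep the scenarios discussed above: current line, lower intercept, lower slope
for intercept in (FITTED_INTERCEPT, 0.03, 0.02):
    for slope in (FITTED_SLOPE, 2.0):
        cv = required_cv(TARGET, slope, intercept)
        if cv is None:
            verdict = 'infeasible'
        else:
            verdict = f'CV <= {cv:.6f} ({cv / BEST_CV:.2f}x best CV)'
        print(f'intercept={intercept:.4f}, slope={slope:.2f}: {verdict}')
```

The grid makes the asymmetry visible: changing the slope alone never helps while the intercept exceeds the target, whereas any intercept below the target yields some feasible CV.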

In [None]:
# Analyze the unsubmitted experiments
print('=== UNSUBMITTED EXPERIMENTS ===')
print('\nexp_032: CV 0.008194 (BEST CV, not submitted)')
print('  - GP(0.15) + MLP(0.55) + LGBM(0.3) with Spange + DRFP + ACS PCA')
print(f'  - Predicted LB: {slope * 0.008194 + intercept:.4f}')
print('  - Similar to best LB (0.0877)')

print('\nexp_049: CV 0.016324 (99% worse, just tested)')
print('  - Simple Ridge Regression')
print(f'  - Predicted LB: {slope * 0.016324 + intercept:.4f}')
print('  - Much worse than best LB')

print('\nexp_048: CV 0.013306 (62% worse)')
print('  - RDKit descriptors')
print(f'  - Predicted LB: {slope * 0.013306 + intercept:.4f}')
print('  - Much worse than best LB')
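
The predicted LB values above are point estimates. Before spending one of the three remaining submissions, it may help to attach a prediction interval to each one. The sketch below uses the standard prediction-interval formula for simple linear regression; `predict_lb_interval` is a hypothetical helper, and the same independence assumption about residuals applies.

```python
import numpy as np
from scipy import stats

# CV and LB scores of the 13 submitted experiments (copied from the table above)
cv = np.array([0.0111, 0.0123, 0.0105, 0.0104, 0.0097, 0.0093, 0.0092,
               0.0090, 0.0087, 0.0085, 0.0083, 0.0090, 0.0145])
lb = np.array([0.0982, 0.1065, 0.0972, 0.0969, 0.0946, 0.0932, 0.0936,
               0.0913, 0.0893, 0.0887, 0.0877, 0.0932, 0.1147])


def predict_lb_interval(cv_new, cv, lb, level=0.95):
    """Point prediction and prediction interval for LB at a new CV score."""
    n = len(cv)
    slope, intercept = np.polyfit(cv, lb, deg=1)
    resid = lb - (slope * cv + intercept)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))
    sxx = np.sum((cv - cv.mean()) ** 2)
    # Extra "1 +" term accounts for noise in the new observation itself
    se_pred = s * np.sqrt(1.0 + 1.0 / n + (cv_new - cv.mean()) ** 2 / sxx)
    t_crit = stats.t.ppf(0.5 + level / 2.0, df=n - 2)
    pred = slope * cv_new + intercept
    return pred, pred - t_crit * se_pred, pred + t_crit * se_pred


for name, cv_new in [('exp_032', 0.008194), ('exp_048', 0.013306), ('exp_049', 0.016324)]:
    pred, low, high = predict_lb_interval(cv_new, cv, lb)
    inside = cv.min() <= cv_new <= cv.max()
    note = '' if inside else ' (extrapolated beyond submitted CV range)'
    print(f'{name}: predicted LB {pred:.4f}, 95% PI [{low:.4f}, {high:.4f}]{note}')
```

Predictions flagged as extrapolated lie outside the CV range of the submitted experiments, so the line is least trustworthy there.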

In [None]:
# Strategic decision
print('=== STRATEGIC DECISION ===')
print('\n3 submissions remaining, target is 0.0347, best LB is 0.0877 (2.53x away)')
print('\nKey insight: The CV-LB relationship is STRUCTURAL')
print('- ALL 13 submissions follow the SAME line (only submitted experiments have LB scores)')
print('- The intercept (0.0533) > target (0.0347)')
print('- We CANNOT reach the target with the current approach')

print('\n=== WHAT COULD CHANGE THE RELATIONSHIP? ===')
print('\n1. DIFFERENT PROBLEM FORMULATION')
print('   - The hidden test might have a different structure')
print('   - Maybe the test uses different solvents or conditions')
print('   - We need to find what makes the test different')

print('\n2. ENSEMBLE WITH DIFFERENT MODELS')
print('   - The mixall kernel uses MLP + XGBoost + RF + LightGBM')
print('   - Different model families might have different biases')
print('   - Combining them might reduce the intercept')

print('\n3. UNCERTAINTY-AWARE PREDICTIONS')
print('   - Be more conservative on novel solvents')
print('   - Use GP uncertainty to weight predictions')
print('   - Blend with a simple baseline (e.g., mean)')

print('\n4. DIFFERENT FEATURE ENGINEERING')
print('   - We have tried Spange, DRFP, ACS PCA, RDKit')
print('   - Maybe we need domain-specific features')
print('   - Or features that capture solvent similarity')
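
Idea 3 (uncertainty-aware predictions) can be sketched as shrinking each prediction toward a simple baseline in proportion to its predictive uncertainty. The function below is one possible weighting, not a method used in any experiment so far. `shrink_to_baseline`, the `scale` parameter, and the example numbers are all illustrative assumptions. In practice `pred_std` could come from scikit-learn's `GaussianProcessRegressor.predict(X, return_std=True)`.

```python
import numpy as np


def shrink_to_baseline(pred, std, baseline, scale):
    """Blend predictions with a baseline; higher std means more weight on the baseline.

    weight = 1 / (1 + (std / scale)^2), so std = 0 keeps the model prediction
    and std >> scale falls back to the baseline.
    """
    pred = np.asarray(pred, dtype=float)
    std = np.asarray(std, dtype=float)
    weight = 1.0 / (1.0 + (std / scale) ** 2)
    return weight * pred + (1.0 - weight) * baseline


# Made-up example: identical model predictions with increasing uncertainty
model_pred = np.array([0.80, 0.80, 0.80])
pred_std = np.array([0.0, 0.1, 1.0])
blended = shrink_to_baseline(model_pred, pred_std, baseline=0.50, scale=0.1)
print(blended)
```

The `scale` value sets what counts as "uncertain" and would need tuning, for example on a leave-one-solvent-out split if novel solvents are the concern.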

In [None]:
# Final recommendation
print('=== FINAL RECOMMENDATION ===')
print('\nGiven 3 submissions remaining and the structural CV-LB gap:')
print('\n1. DO NOT submit exp_049 (Ridge) - CV is 99% worse')
print('\n2. Try a MULTI-MODEL ENSEMBLE with different model families:')
print('   - MLP + XGBoost + RandomForest + LightGBM')
print('   - Each model might have different biases')
print('   - The ensemble might have a different CV-LB relationship')
print('\n3. If that fails, try UNCERTAINTY-AWARE PREDICTIONS:')
print('   - Use GP uncertainty to weight predictions')
print('   - Be more conservative on novel solvents')
print('\n4. Submit exp_032 (best CV) as a baseline:')
print('   - CV 0.008194 is the best we have')
print('   - Predicted LB: ~0.088 (similar to best LB 0.0877)')
print('   - This would confirm the CV-LB relationship')

print('\n=== THE TARGET IS REACHABLE ===')
print('The target (0.0347) exists, which suggests someone has achieved it.')
print('We need to find an approach that has a DIFFERENT CV-LB relationship.')
print('The key is to find what makes the hidden test different from our CV.')
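
Recommendation 2 (a multi-model ensemble across different families) can be prototyped with scikit-learn's `VotingRegressor`, which averages member predictions. This is a minimal sketch under several assumptions: the data is synthetic, `GradientBoostingRegressor` stands in for XGBoost and LightGBM to keep the example dependency-free, and MAE is used as a placeholder metric since the competition metric is not stated here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              VotingRegressor)
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the real features (Spange, DRFP, ACS PCA) are not used here
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
y = (y - y.mean()) / y.std()  # standardize the target so the MLP trains easily

# One member per model family; gradient boosting substitutes for XGBoost/LightGBM
members = [
    ('mlp', make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                                       random_state=0))),
    ('rf', RandomForestRegressor(n_estimators=50, random_state=0)),
    ('gbr', GradientBoostingRegressor(random_state=0)),
]
ensemble = VotingRegressor(estimators=members)  # unweighted average of members


def cv_mae(model):
    """Mean absolute error averaged over 3 cross-validation folds."""
    scores = cross_val_score(model, X, y, cv=3, scoring='neg_mean_absolute_error')
    return -scores.mean()


results = {name: cv_mae(model) for name, model in members}
results['ensemble'] = cv_mae(ensemble)
for name, mae in results.items():
    print(f'{name:>8}: CV MAE = {mae:.4f}')
```

Whether such an ensemble moves off the existing CV-LB line can only be settled by a submission; a good CV score alone would not show it.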