# Loop 56 Analysis: Post-Chemical Constraints Assessment

**Situation:**
- 56 experiments completed, 25 consecutive failures since exp_030
- Best LB: 0.0877 (exp_030), Target: 0.0347
- Gap: 2.53x (0.0877 / 0.0347)
- 5 submissions remaining
- exp_055 (Chemical Constraints with Softmax) FAILED - CV 0.020769 (~2.5x the best CV of 0.008298, i.e. ~150% worse)

**Key Finding from exp_055:**
The targets DON'T sum to 1 (per-row target sums average ~0.80), so a softmax constraint, which forces every predicted row to sum to exactly 1, is INCORRECT.
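A minimal NumPy sketch of why the constraint fails, assuming each row has several non-negative targets (three here, an arbitrary choice) and the model emits one logit per target. A plain softmax makes every row sum to exactly 1, so rows whose true targets sum to ~0.80 can never be matched. The `softmax_with_slack` function is a hypothetical alternative, not something tried in exp_055: it appends one extra "slack" logit, applies softmax, and drops the slack column, so outputs stay non-negative but each row sums to strictly less than 1.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax; every output row sums to exactly 1."""
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_with_slack(z: np.ndarray, slack_logit: float = 0.0) -> np.ndarray:
    """Hypothetical variant: add a fixed 'slack' logit per row, apply softmax,
    then drop the slack column. Outputs are non-negative and each row sums to < 1.
    In a real model the slack logit could instead be a learned output."""
    slack = np.full((z.shape[0], 1), slack_logit)
    p = softmax(np.hstack([z, slack]))
    return p[:, :-1]

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))  # 4 rows, 3 targets (illustrative shapes only)

print(softmax(logits).sum(axis=1))             # every row sums to 1
print(softmax_with_slack(logits).sum(axis=1))  # every row sums to less than 1
```

The slack term absorbs the "missing" mass, so the model can represent row sums anywhere in (0, 1) instead of being pinned to 1.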

**Questions:**
1. What fundamentally different approaches remain?
2. Can we change the CV-LB relationship?
3. What do the top kernels do differently?

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Submission history
submissions = [
    {'exp': 'exp_000', 'cv': 0.0111, 'lb': 0.0982},
    {'exp': 'exp_001', 'cv': 0.0123, 'lb': 0.1065},
    {'exp': 'exp_003', 'cv': 0.0105, 'lb': 0.0972},
    {'exp': 'exp_005', 'cv': 0.0104, 'lb': 0.0969},
    {'exp': 'exp_006', 'cv': 0.0097, 'lb': 0.0946},
    {'exp': 'exp_007', 'cv': 0.0093, 'lb': 0.0932},
    {'exp': 'exp_009', 'cv': 0.0092, 'lb': 0.0936},
    {'exp': 'exp_012', 'cv': 0.0090, 'lb': 0.0913},
    {'exp': 'exp_024', 'cv': 0.0087, 'lb': 0.0893},
    {'exp': 'exp_026', 'cv': 0.0085, 'lb': 0.0887},
    {'exp': 'exp_030', 'cv': 0.0083, 'lb': 0.0877},
    {'exp': 'exp_035', 'cv': 0.0098, 'lb': 0.0970},
]

df = pd.DataFrame(submissions)
print("Submission History:")
print(df.to_string(index=False))
print(f"\nTarget LB: 0.0347")
print(f"Best LB: {df['lb'].min():.4f} ({df.loc[df['lb'].idxmin(), 'exp']})")
print(f"Gap to target: {df['lb'].min() / 0.0347:.2f}x")

In [None]:
# CV-LB relationship analysis
cv = df['cv'].values
lb = df['lb'].values

slope, intercept, r_value, p_value, std_err = stats.linregress(cv, lb)

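TARGET_LB = 0.0347

print("CV-LB Linear Relationship:")
print(f"  LB = {slope:.2f} * CV + {intercept:.4f}")
print(f"  R² = {r_value**2:.4f}")
print(f"  Intercept = {intercept:.4f}")
print(f"  Target LB = {TARGET_LB}")
print()
print("CRITICAL INSIGHT:")
if intercept > TARGET_LB:
    print(f"  Intercept ({intercept:.4f}) > Target ({TARGET_LB})")
    print(f"  Even with CV=0, the fitted line predicts LB = {intercept:.4f} > {TARGET_LB}")
else:
    print(f"  Intercept ({intercept:.4f}) <= Target ({TARGET_LB})")
# CV=0 lies far below the observed CV range, so this is an extrapolation of the fit.
print(f"  (Observed CV range: {cv.min():.4f} to {cv.max():.4f}; CV=0 is an extrapolation.)")
print()
print("Required CV to hit target:")
required_cv = (TARGET_LB - intercept) / slope
print(f"  CV = ({TARGET_LB} - {intercept:.4f}) / {slope:.2f} = {required_cv:.6f}")
if required_cv < 0:
    print("  NEGATIVE CV required: target is UNREACHABLE if this linear relationship holds!")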
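The "unreachable" conclusion rests on an intercept fitted from only 12 submissions and extrapolated well below the observed CV range. A minimal sketch of how sensitive that intercept is, using the submission table above: a percentile bootstrap over submissions plus a leave-one-out refit. The resample count (2000) and seed are arbitrary choices. This only measures sampling uncertainty; it cannot show whether the relationship stays linear near CV=0.

```python
import numpy as np

# Submission history copied from the table above (CV, LB pairs).
cv = np.array([0.0111, 0.0123, 0.0105, 0.0104, 0.0097, 0.0093,
               0.0092, 0.0090, 0.0087, 0.0085, 0.0083, 0.0098])
lb = np.array([0.0982, 0.1065, 0.0972, 0.0969, 0.0946, 0.0932,
               0.0936, 0.0913, 0.0893, 0.0887, 0.0877, 0.0970])
exps = ["exp_000", "exp_001", "exp_003", "exp_005", "exp_006", "exp_007",
        "exp_009", "exp_012", "exp_024", "exp_026", "exp_030", "exp_035"]
TARGET_LB = 0.0347

# Point estimate: np.polyfit with deg=1 returns [slope, intercept].
slope, intercept = np.polyfit(cv, lb, 1)

# Bootstrap: resample submissions with replacement and refit the line.
rng = np.random.default_rng(42)
boot_intercepts = []
for _ in range(2000):
    idx = rng.integers(0, len(cv), size=len(cv))
    if np.unique(cv[idx]).size < 2:  # skip degenerate resamples with no CV spread
        continue
    _, b = np.polyfit(cv[idx], lb[idx], 1)
    boot_intercepts.append(b)
boot_intercepts = np.array(boot_intercepts)
lo, hi = np.percentile(boot_intercepts, [2.5, 97.5])

print(f"Intercept: {intercept:.4f} (bootstrap 95% interval: {lo:.4f} to {hi:.4f})")
print(f"Share of bootstrap fits with intercept > target: {(boot_intercepts > TARGET_LB).mean():.2%}")

# Leave-one-out: how far does dropping each single submission move the intercept?
for i, name in enumerate(exps):
    mask = np.arange(len(cv)) != i
    _, b_loo = np.polyfit(cv[mask], lb[mask], 1)
    print(f"  without {name}: intercept = {b_loo:.4f} (shift {b_loo - intercept:+.4f})")
```

If the interval stays above 0.0347 and no single submission (for example the high-CV exp_001) swings the intercept below it, the "lowering CV alone will not reach the target" reading is at least robust to which submissions happened to be made.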

In [None]:
# Analyze what the 'mixall' kernel does differently
print("="*60)
print("ANALYSIS: What does 'mixall' kernel do differently?")
print("="*60)

print("\n1. VALIDATION SCHEME:")
print("   - Our approach: Leave-One-Solvent-Out (24 folds for single, 13 for mixtures)")
print("   - mixall kernel: GroupKFold (5 splits)")
print("   - This is a LESS PESSIMISTIC CV scheme")
print("   - BUT: The LB evaluation uses the OFFICIAL scheme")

print("\n2. MODEL ARCHITECTURE:")
print("   - mixall uses: MLP + XGBoost + RandomForest + LightGBM ensemble")
print("   - Our best (exp_030): GP + MLP + LightGBM ensemble")
print("   - Key difference: mixall uses XGBoost and RandomForest")

print("\n3. FEATURE ENGINEERING:")
print("   - mixall uses: Spange descriptors only")
print("   - Our best: Spange + DRFP + ACS PCA + Arrhenius")
print("   - We have MORE features")

print("\n4. HYPERPARAMETER OPTIMIZATION:")
print("   - mixall uses: Optuna for hyperparameter tuning")
print("   - Our approach: Manual tuning")
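To make the validation difference concrete, here is a sketch using scikit-learn's `LeaveOneGroupOut` and `GroupKFold`. The data are synthetic: 24 "solvents" with 10 rows each, sizes chosen only to mirror the 24 single-solvent folds mentioned above. Both schemes keep every row of a held-out solvent out of training; they differ in how many solvents are held out per fold (exactly one versus roughly a fifth of them), and therefore in how many solvents each fold's model is trained on.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, LeaveOneGroupOut

# Synthetic stand-in: 24 solvents with 10 rows each (sizes are illustrative only).
n_solvents, rows_per_solvent = 24, 10
groups = np.repeat(np.arange(n_solvents), rows_per_solvent)
X = np.zeros((groups.size, 1))  # features are irrelevant when only counting splits

logo_folds = list(LeaveOneGroupOut().split(X, groups=groups))
gkf_folds = list(GroupKFold(n_splits=5).split(X, groups=groups))

def solvents_held_out(folds):
    """Number of distinct solvents in each test fold."""
    return [np.unique(groups[test]).size for _, test in folds]

print("LOGO folds:", len(logo_folds), "| solvents per test fold:", solvents_held_out(logo_folds))
print("GroupKFold folds:", len(gkf_folds), "| solvents per test fold:", solvents_held_out(gkf_folds))
```

Whichever scheme is used locally, CV scores from the two are not directly comparable, which matters when comparing our CV numbers with mixall's.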

In [None]:
# Analyze the CV-LB gap for different experiments
print("="*60)
print("CV-LB GAP ANALYSIS")
print("="*60)

df['gap'] = df['lb'] / df['cv']
df['residual'] = df['lb'] - (slope * df['cv'] + intercept)

print("\nCV-LB Gap (LB/CV ratio):")
for _, row in df.iterrows():
    print(f"  {row['exp']}: CV={row['cv']:.4f}, LB={row['lb']:.4f}, Gap={row['gap']:.2f}x, Residual={row['residual']:.4f}")

print(f"\nBest residual (below regression line): {df.loc[df['residual'].idxmin(), 'exp']} ({df['residual'].min():.4f})")
print(f"Worst residual (above regression line): {df.loc[df['residual'].idxmax(), 'exp']} ({df['residual'].max():.4f})")

In [None]:
# What approaches have NOT been tried?
print("="*60)
print("APPROACHES NOT YET TRIED")
print("="*60)

print("\n1. DIFFERENT ENSEMBLE MEMBERS:")
print("   - XGBoost (used in mixall, not in our best model)")
print("   - RandomForest (used in mixall, not in our best model)")
print("   - CatBoost (not tried)")

print("\n2. DIFFERENT LOSS FUNCTIONS:")
print("   - Quantile loss (for uncertainty estimation)")
print("   - Asymmetric loss (penalize over/under predictions differently)")
print("   - Focal loss (focus on hard examples)")

print("\n3. DIFFERENT VALIDATION SCHEMES:")
print("   - GroupKFold (like mixall) - but this is a gray area in rules")
print("   - Stratified by target values")
print("   - Time-based splits (if there's temporal structure)")

print("\n4. POST-PROCESSING:")
print("   - Prediction calibration (isotonic regression)")
print("   - Ensemble of predictions from different CV folds")
print("   - Prediction clipping/normalization")

print("\n5. DOMAIN-SPECIFIC APPROACHES:")
print("   - Physics-informed constraints (NOT sum-to-1, but other constraints)")
print("   - Solvent similarity-based weighting")
print("   - Transfer learning from related datasets")

In [None]:
# Strategic recommendations
print("="*60)
print("STRATEGIC RECOMMENDATIONS FOR LOOP 56")
print("="*60)

print("\n1. TRY XGBOOST + RANDOMFOREST IN ENSEMBLE:")
print("   - The mixall kernel uses these, we don't")
print("   - Could provide different inductive biases")
print("   - May have different CV-LB relationship")

print("\n2. TRY PREDICTION CALIBRATION:")
print("   - Use isotonic regression to calibrate predictions")
print("   - Fit on out-of-fold predictions, it corrects monotone systematic bias seen in CV")
print("   - Could reduce the intercept in the CV-LB relationship, but only if the LB bias resembles the CV bias")

print("\n3. TRY DIFFERENT LOSS FUNCTION:")
print("   - Quantile loss for uncertainty estimation")
print("   - Median (q=0.5) regression is less sensitive to outlier targets than MSE")

print("\n4. TRY OPTUNA HYPERPARAMETER OPTIMIZATION:")
print("   - The mixall kernel uses this")
print("   - Could find better hyperparameters")

print("\n5. SUBMISSION STRATEGY:")
print("   - 5 submissions remaining")
print("   - Try 2-3 fundamentally different approaches")
print("   - Save 2 submissions for final attempts")
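A sketch of the calibration recommendation with scikit-learn's `IsotonicRegression`. Everything here is synthetic and assumed: targets in [0, 1], 24 hypothetical solvent IDs, and out-of-fold (OOF) predictions deliberately compressed toward the mean. The calibrator itself is evaluated with `GroupKFold`, so it is scored on solvents it was not fitted on. With several targets, one calibrator per target column would be fitted. Note that it can only undo bias that is visible in the OOF predictions; it will not fix a CV-LB shift with a different cause.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import GroupKFold

# Synthetic setup (assumption): true targets in [0, 1] and OOF predictions that
# are systematically compressed toward the mean.
rng = np.random.default_rng(0)
n = 600
groups = rng.integers(0, 24, size=n)  # hypothetical solvent IDs
y_true = rng.uniform(0.0, 1.0, size=n)
oof_pred = 0.3 + 0.4 * y_true + rng.normal(0.0, 0.03, size=n)  # biased predictions

# Fit the calibrator on some solvents and apply it to held-out solvents.
calibrated = np.empty(n)
splitter = GroupKFold(n_splits=5)
for train_idx, test_idx in splitter.split(oof_pred.reshape(-1, 1), y_true, groups):
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    iso.fit(oof_pred[train_idx], y_true[train_idx])
    calibrated[test_idx] = iso.predict(oof_pred[test_idx])

mse_raw = np.mean((oof_pred - y_true) ** 2)
mse_cal = np.mean((calibrated - y_true) ** 2)
print(f"MSE before calibration: {mse_raw:.4f}, after: {mse_cal:.4f}")
```

Isotonic regression only assumes the mapping from prediction to target is monotone, which is why it suits a "systematically too high/too low" bias without imposing a linear correction.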

In [None]:
# Final summary
print("="*60)
print("LOOP 56 SUMMARY")
print("="*60)

print("\nCurrent Status:")
print(f"  - Best CV: 0.008298 (exp_030)")
print(f"  - Best LB: 0.0877 (exp_030)")
print(f"  - Target LB: 0.0347")
print(f"  - Gap: 2.53x")
print(f"  - Submissions remaining: 5")
print(f"  - Consecutive failures: 25")

print("\nKey Findings:")
print("  1. CV-LB relationship: LB = 4.31*CV + 0.0525")
print("  2. Intercept (0.0525) > Target (0.0347): target unreachable by lowering CV alone, if the linear fit holds")
print("  3. exp_055 (softmax constraint) FAILED because targets don't sum to 1")
print("  4. The 'mixall' kernel uses different ensemble members (XGBoost, RF)")

print("\nRecommended Next Steps:")
print("  1. Try XGBoost + RandomForest in ensemble (like mixall)")
print("  2. Try prediction calibration (isotonic regression)")
print("  3. Try Optuna hyperparameter optimization")
print("  4. Focus on approaches that might change the CV-LB relationship")