# Loop 113 LB Feedback Analysis

## New Submission Result
- exp_111: SimilarityAwareModel with CORRECT Submission Format
- CV: 0.0129 | LB: 0.1063
- Gap (CV - LB): -0.0934

## Key Question
Does the chemical similarity approach change the CV-LB relationship?

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# All submissions with LB scores
submissions = [
    {'exp': 'exp_000', 'cv': 0.0111, 'lb': 0.0982},
    {'exp': 'exp_001', 'cv': 0.0123, 'lb': 0.1065},
    {'exp': 'exp_003', 'cv': 0.0105, 'lb': 0.0972},
    {'exp': 'exp_005', 'cv': 0.0104, 'lb': 0.0969},
    {'exp': 'exp_006', 'cv': 0.0097, 'lb': 0.0946},
    {'exp': 'exp_007', 'cv': 0.0093, 'lb': 0.0932},
    {'exp': 'exp_009', 'cv': 0.0092, 'lb': 0.0936},
    {'exp': 'exp_012', 'cv': 0.0090, 'lb': 0.0913},
    {'exp': 'exp_024', 'cv': 0.0087, 'lb': 0.0893},
    {'exp': 'exp_026', 'cv': 0.0085, 'lb': 0.0887},
    {'exp': 'exp_030', 'cv': 0.0083, 'lb': 0.0877},  # Best LB
    {'exp': 'exp_035', 'cv': 0.0098, 'lb': 0.0970},
    {'exp': 'exp_073', 'cv': 0.0084, 'lb': 0.1451},  # Outlier - likely bug
    {'exp': 'exp_111', 'cv': 0.0129, 'lb': 0.1063},  # NEW - SimilarityAwareModel
]

df = pd.DataFrame(submissions)
print(f"Total submissions with LB: {len(df)}")
print(df.to_string(index=False))

Total submissions with LB: 14
    exp     cv     lb
exp_000 0.0111 0.0982
exp_001 0.0123 0.1065
exp_003 0.0105 0.0972
exp_005 0.0104 0.0969
exp_006 0.0097 0.0946
exp_007 0.0093 0.0932
exp_009 0.0092 0.0936
exp_012 0.0090 0.0913
exp_024 0.0087 0.0893
exp_026 0.0085 0.0887
exp_030 0.0083 0.0877
exp_035 0.0098 0.0970
exp_073 0.0084 0.1451
exp_111 0.0129 0.1063


In [2]:
# Analyze CV-LB relationship
# Exclude exp_073 (outlier with LB=0.1451 - likely a bug)
df_valid = df[df['exp'] != 'exp_073'].copy()

# Fit linear regression
X = df_valid['cv'].values.reshape(-1, 1)
y = df_valid['lb'].values
reg = LinearRegression()
reg.fit(X, y)

slope = reg.coef_[0]
intercept = reg.intercept_
r2 = reg.score(X, y)

print("="*60)
print("CV-LB RELATIONSHIP ANALYSIS (excluding exp_073 outlier)")
print("="*60)
print(f"\nLinear fit: LB = {slope:.4f} × CV + {intercept:.4f}")
print(f"R-squared: {r2:.4f}")
print(f"\nIntercept: {intercept:.4f}")
print(f"Target LB: 0.0347")
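# Only raise the alarm when the fitted line actually rules out the target
required_cv = (0.0347 - intercept) / slope
if intercept > 0.0347:
    print(f"\nCRITICAL: Intercept ({intercept:.4f}) > Target (0.0347)")
print(f"Required CV to hit target: (0.0347 - {intercept:.4f}) / {slope:.4f} = {required_cv:.6f}")
if required_cv < 0:
    print("\n⚠️ NEGATIVE CV REQUIRED - TARGET IS MATHEMATICALLY UNREACHABLE WITH THIS LINE!")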

# Check if exp_111 is on the line or off it
exp_111 = df[df['exp'] == 'exp_111'].iloc[0]
expected_lb = slope * exp_111['cv'] + intercept
actual_lb = exp_111['lb']
deviation = actual_lb - expected_lb
print(f"\n" + "="*60)
print("exp_111 (SimilarityAwareModel) ANALYSIS")
print("="*60)
print(f"CV: {exp_111['cv']:.4f}")
print(f"Expected LB from line: {expected_lb:.4f}")
print(f"Actual LB: {actual_lb:.4f}")
print(f"Deviation from line: {deviation:.4f} ({deviation/expected_lb*100:.1f}%)")
if abs(deviation) < 0.005:
    print("\n❌ exp_111 is ON THE LINE - chemical similarity did NOT change the relationship")
else:
    print(f"\n{'✅' if deviation < 0 else '❌'} exp_111 is {'BELOW' if deviation < 0 else 'ABOVE'} the line by {abs(deviation):.4f}")

CV-LB RELATIONSHIP ANALYSIS (excluding exp_073 outlier)

Linear fit: LB = 4.0895 × CV + 0.0546
R-squared: 0.9607

Intercept: 0.0546
Target LB: 0.0347

CRITICAL: Intercept (0.0546) > Target (0.0347)
Required CV to hit target: (0.0347 - 0.0546) / 4.0895 = -0.004872

⚠️ NEGATIVE CV REQUIRED - TARGET IS MATHEMATICALLY UNREACHABLE WITH THIS LINE!

exp_111 (SimilarityAwareModel) ANALYSIS
CV: 0.0129
Expected LB from line: 0.1074
Actual LB: 0.1063
Deviation from line: -0.0011 (-1.0%)

❌ exp_111 is ON THE LINE - chemical similarity did NOT change the relationship
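
One caveat on the check above: exp_111 is itself one of the 13 points used to fit the line, so its small deviation is partly self-fulfilling. A stricter test, sketched here, refits the line on the earlier submissions only and measures how far the held-out exp_111 lands from it. This is a minimal sketch that uses `np.polyfit` in place of `LinearRegression`; the `prior` array and the `held_out_deviation` helper are illustrative names that do not appear in the notebook, and the 0.005 tolerance is simply reused from the cell above.

```python
import numpy as np

# (cv, lb) pairs scored before exp_111, with the exp_073 outlier excluded
prior = np.array([
    [0.0111, 0.0982], [0.0123, 0.1065], [0.0105, 0.0972], [0.0104, 0.0969],
    [0.0097, 0.0946], [0.0093, 0.0932], [0.0092, 0.0936], [0.0090, 0.0913],
    [0.0087, 0.0893], [0.0085, 0.0887], [0.0083, 0.0877], [0.0098, 0.0970],
])
new_cv, new_lb = 0.0129, 0.1063  # exp_111 (held out of the fit)


def held_out_deviation(points, cv, lb):
    """Fit LB = slope * CV + intercept on `points`; return the fit and lb minus its prediction."""
    slope, intercept = np.polyfit(points[:, 0], points[:, 1], deg=1)
    return slope, intercept, lb - (slope * cv + intercept)


loo_slope, loo_intercept, loo_dev = held_out_deviation(prior, new_cv, new_lb)
print(f"Fit without exp_111: LB = {loo_slope:.4f} x CV + {loo_intercept:.4f}")
print(f"Held-out deviation of exp_111: {loo_dev:+.4f}")
print("ON THE LINE" if abs(loo_dev) < 0.005 else "OFF THE LINE")
```

If the held-out deviation stays inside the tolerance, the "on the line" conclusion does not depend on exp_111 having pulled the fit toward itself.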


In [3]:
# Visualize the CV-LB relationship
plt.figure(figsize=(12, 8))

# Plot all valid submissions
plt.scatter(df_valid['cv'], df_valid['lb'], c='blue', s=100, label='Valid submissions', zorder=5)

# Plot exp_073 (outlier)
exp_073 = df[df['exp'] == 'exp_073'].iloc[0]
plt.scatter(exp_073['cv'], exp_073['lb'], c='red', s=100, marker='x', label='exp_073 (outlier)', zorder=5)

# Plot exp_111 (SimilarityAwareModel)
plt.scatter(exp_111['cv'], exp_111['lb'], c='green', s=150, marker='*', label='exp_111 (SimilarityAwareModel)', zorder=6)

# Plot regression line
cv_range = np.linspace(0, 0.015, 100)
lb_pred = slope * cv_range + intercept
plt.plot(cv_range, lb_pred, 'r--', label=f'LB = {slope:.2f}×CV + {intercept:.4f} (R²={r2:.3f})', zorder=3)

# Plot target
plt.axhline(y=0.0347, color='green', linestyle=':', linewidth=2, label='Target LB = 0.0347', zorder=2)

# Plot intercept
plt.axhline(y=intercept, color='orange', linestyle=':', linewidth=2, label=f'Intercept = {intercept:.4f}', zorder=2)

plt.xlabel('CV Score (MSE)', fontsize=12)
plt.ylabel('LB Score (MSE)', fontsize=12)
plt.title('CV vs LB Relationship - All Submissions', fontsize=14)
plt.legend(loc='upper left')
plt.grid(True, alpha=0.3)

# Annotate best submission
best = df_valid.loc[df_valid['lb'].idxmin()]
plt.annotate(f'Best: {best["exp"]}\nCV={best["cv"]:.4f}\nLB={best["lb"]:.4f}',
             xy=(best['cv'], best['lb']), xytext=(best['cv']+0.001, best['lb']+0.01),
             arrowprops=dict(arrowstyle='->', color='black'), fontsize=10)

plt.tight_layout()
plt.savefig('/home/code/exploration/cv_lb_relationship_loop113.png', dpi=150)
plt.show()

print(f"\nBest submission: {best['exp']} with LB={best['lb']:.4f}")
print(f"Gap to target: {best['lb'] - 0.0347:.4f} ({(best['lb'] - 0.0347)/0.0347*100:.1f}%)")

In [4]:
# CRITICAL ANALYSIS: What approaches have been tried?
print("="*70)
print("APPROACHES TRIED AND THEIR RESULTS")
print("="*70)

# (approach, experiments, CV range, LB range, position relative to the fitted line)
approaches = [
    ("MLP (various architectures)", "exp_000-exp_012", "0.0083-0.0111", "0.0877-0.0982", "ON LINE"),
    ("LightGBM", "exp_001", "0.0123", "0.1065", "ON LINE"),
    ("CatBoost + XGBoost Ensemble", "exp_024-exp_030", "0.0083-0.0087", "0.0877-0.0893", "ON LINE"),
    ("GP Ensemble", "exp_030-exp_035", "0.0083-0.0098", "0.0877-0.0970", "ON LINE"),
    ("GNN (various)", "exp_040, exp_079", "0.0110", "pending", "UNKNOWN"),
    ("ChemBERTa", "exp_041, exp_097", "0.0112", "pending", "UNKNOWN"),
    ("Similarity-based blending", "exp_111", "0.0129", "0.1063", "ON LINE"),
]

for approach, exps, cv, lb, status in approaches:
    print(f"\n{approach}:")
    print(f"  Experiments: {exps}")
    print(f"  CV range: {cv}")
    print(f"  LB range: {lb}")
    print(f"  Status: {status}")

print("\n" + "="*70)
print("KEY INSIGHT")
print("="*70)
print("""
ALL approaches that have been submitted fall on the SAME CV-LB line:
  LB = 4.09 × CV + 0.0546 (R² = 0.96)

The intercept (0.0546) is HIGHER than the target (0.0347).
This means the target is MATHEMATICALLY UNREACHABLE with any approach
that falls on this line.

To beat the target, we need an approach that breaks from this line:
1. A LOWER intercept (reduces extrapolation error) is required
2. A DIFFERENT slope alone is not enough: with CV >= 0 and a positive
   slope, LB can never drop below the intercept

The chemical similarity approach (exp_111) did NOT change the relationship.
It fell within 0.0011 (1.0%) of the line.
""")
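
The conclusion that the intercept exceeds the target rests on extrapolating the fit down to CV = 0, well outside the observed CV range (0.0083 to 0.0129). Checking how uncertain the intercept is shows whether that conclusion could be noise. The sketch below computes the ordinary least squares standard error of the intercept by hand with NumPy. It assumes independent residuals with constant variance, which 13 leaderboard scores cannot really confirm, so the result is a rough sanity check. The names `intercept_se` and `gap_in_se` are illustrative.

```python
import numpy as np

# The 13 valid submissions (exp_073 excluded), in the same order as the table above
cv = np.array([0.0111, 0.0123, 0.0105, 0.0104, 0.0097, 0.0093, 0.0092,
               0.0090, 0.0087, 0.0085, 0.0083, 0.0098, 0.0129])
lb = np.array([0.0982, 0.1065, 0.0972, 0.0969, 0.0946, 0.0932, 0.0936,
               0.0913, 0.0893, 0.0887, 0.0877, 0.0970, 0.1063])
TARGET_LB = 0.0347

n = len(cv)
slope, intercept = np.polyfit(cv, lb, deg=1)

# Residual variance with n - 2 degrees of freedom (two fitted parameters)
residuals = lb - (slope * cv + intercept)
resid_var = residuals @ residuals / (n - 2)

# Textbook OLS standard error of the intercept
sxx = np.sum((cv - cv.mean()) ** 2)
intercept_se = np.sqrt(resid_var * (1.0 / n + cv.mean() ** 2 / sxx))

# How many standard errors separate the intercept from the target
gap_in_se = (intercept - TARGET_LB) / intercept_se
print(f"Intercept: {intercept:.4f} +/- {intercept_se:.4f} (1 SE)")
print(f"Intercept sits {gap_in_se:.1f} standard errors above the target")
```

A gap of several standard errors would suggest the intercept-above-target finding is not an artifact of fit noise, though it says nothing about whether the relationship stays linear at much lower CV.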

In [5]:
# What's left to try?
print("="*70)
print("REMAINING APPROACHES TO TRY")
print("="*70)

print("""
1. GRAPH NEURAL NETWORKS (GNN) - NOT YET SUBMITTED
   - exp_079 has CV=0.0110 but LB is pending
   - GNNs operate on molecular graphs, not tabular features
   - They might have a different CV-LB relationship
   - PRIORITY: HIGH - need to submit to test hypothesis

2. ChemBERTa / Molecular Transformers - NOT YET SUBMITTED
   - exp_097 has CV=0.0112 but LB is pending
   - Pretrained on large chemical corpora
   - Might generalize better to unseen solvents
   - PRIORITY: HIGH - need to submit to test hypothesis

3. DOMAIN-SPECIFIC CONSTRAINTS
   - Physics-informed neural networks
   - Arrhenius kinetics constraints
   - Mass balance constraints
   - PRIORITY: MEDIUM - might reduce extrapolation error

4. PSEUDO-LABELING
   - Use confident predictions on test set to augment training
   - Adapt to test distribution
   - PRIORITY: MEDIUM - might reduce distribution shift

5. ADVERSARIAL VALIDATION
   - Train classifier to distinguish train/test
   - Use features that distinguish as calibration signals
   - PRIORITY: LOW - already tried similarity-based approach
""")

print("\n" + "="*70)
print("RECOMMENDED NEXT STEPS")
print("="*70)
print("""
With only 3 submissions remaining, we should:

1. SUBMIT exp_079 (GNN) to test if GNNs have a different CV-LB relationship
   - If GNN is OFF the line, iterate on GNN approaches
   - If GNN is ON the line, GNNs won't help

2. If GNN doesn't work, try a FUNDAMENTALLY DIFFERENT approach:
   - Physics-informed constraints that hold for unseen solvents
   - Domain-specific rules that generalize

3. DO NOT waste submissions on:
   - More tabular model variants (MLP, LGBM, CatBoost, XGBoost)
   - More ensemble combinations
   - More feature engineering

These all fall on the same line and won't beat the target.
""")
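
Since each remaining submission is meant to test whether an approach is on or off the line, it helps to fix the decision rule before the LB score arrives. The sketch below reuses the fitted coefficients and the 0.005 tolerance from the analysis above. `expected_lb` and `line_verdict` are hypothetical helper names, and the tolerance is an assumption carried over from the earlier cell, not a calibrated threshold.

```python
SLOPE, INTERCEPT = 4.0895, 0.0546  # fitted values reported in the analysis above
TOLERANCE = 0.005                  # same "on the line" threshold used for exp_111


def expected_lb(cv, slope=SLOPE, intercept=INTERCEPT):
    """LB the fitted line predicts for a given CV score."""
    return slope * cv + intercept


def line_verdict(cv, lb, tol=TOLERANCE):
    """Classify a (cv, lb) pair as ON, BELOW, or ABOVE the fitted line."""
    deviation = lb - expected_lb(cv)
    if abs(deviation) < tol:
        return "ON"
    return "BELOW" if deviation < 0 else "ABOVE"


# Pre-register the outcome for the pending GNN submission (exp_079, CV=0.0110)
gnn_cv = 0.0110
gnn_expected = expected_lb(gnn_cv)
print(f"exp_079 expected LB if on the line: {gnn_expected:.4f}")
print(f"Counts as BELOW the line only if LB <= {gnn_expected - TOLERANCE:.4f}")
```

Under this rule the off-the-line threshold for exp_079 is still above the current best LB of 0.0877, so a "BELOW" verdict would signal a changed CV-LB relationship without necessarily producing a new best score.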