# Loop 80 Analysis: Submission Failure Diagnosis

**CRITICAL FINDING:** The GroupKFold(5) submission failed because the evaluation expects a Leave-One-Out fold structure (24 folds for the single-solvent task, 13 folds for the full-data task), NOT the 5 folds that GroupKFold(5) produces.
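
To make the mismatch concrete, here is a minimal sketch using scikit-learn's `LeaveOneGroupOut` and `GroupKFold` on synthetic data. The 24 solvent groups and 10 rows per solvent are illustrative assumptions, not the competition data. The point is only that leave-one-group-out yields one fold per group, while `GroupKFold(n_splits=5)` always yields 5.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, LeaveOneGroupOut

# Synthetic stand-in (assumption): 24 hypothetical solvents, 10 rows each
n_solvents, rows_per_solvent = 24, 10
groups = np.repeat(np.arange(n_solvents), rows_per_solvent)
X = np.zeros((len(groups), 1))  # feature values are irrelevant for counting folds

logo_folds = LeaveOneGroupOut().get_n_splits(X, groups=groups)
gkf_folds = GroupKFold(n_splits=5).get_n_splits(X, groups=groups)

print(f'LeaveOneGroupOut folds: {logo_folds}')  # one fold per solvent
print(f'GroupKFold(5) folds:    {gkf_folds}')   # fixed at n_splits

# Each leave-one-group-out test fold contains exactly one solvent
held_out = [np.unique(groups[test])
            for _, test in LeaveOneGroupOut().split(X, groups=groups)]
print('One solvent per test fold:', all(len(h) == 1 for h in held_out))
```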

## Key Insights

1. **GroupKFold(5) is INCOMPATIBLE with the competition evaluation**
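   - The competition requires Leave-One-Out validation: each fold holds out one solvent (single-solvent task) or one ramp (full-data task)
   - Submissions must therefore have 24 folds for the single-solvent task and 13 folds for the full-data task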
   - GroupKFold(5) produces only 5 folds → evaluation fails

2. **The "mixall" kernel approach cannot be used directly**
   - The kernel may work on Kaggle, whose evaluation differs from ours
   - Our competition requires the strict Leave-One-Out fold structure

3. **CV-LB Relationship Analysis (from 12 submissions)**
   - Linear fit: LB = 4.29 * CV + 0.0528 (R² = 0.95)
   - Intercept (0.0528) > Target (0.0347)
   - Required CV to hit target: (0.0347 - 0.0528) / 4.29 = -0.0042, which is NEGATIVE; since CV cannot go below 0, no approach on this line can reach the target
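
Written out, the required-CV figure in item 3 comes from inverting the fitted line for the target LB, using the slope $a$ and intercept $b$ reported above:

```latex
\begin{aligned}
\mathrm{LB} &= a\,\mathrm{CV} + b, \qquad a = 4.29,\; b = 0.0528 \\
\mathrm{CV}_{\mathrm{req}} &= \frac{\mathrm{LB}_{\mathrm{target}} - b}{a}
  = \frac{0.0347 - 0.0528}{4.29}
  = \frac{-0.0181}{4.29} \approx -0.0042
\end{aligned}
```

Since $a > 0$ and CV cannot be negative, the smallest LB this line can predict is $b = 0.0528$ at CV = 0, which is already above the target. The argument holds only for approaches that stay on the same fitted line.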

In [None]:
import pandas as pd

# Expected Leave-One-Out fold counts: one fold per solvent (task 0), one per ramp (task 1)
EXPECTED_FOLDS = {0: 24, 1: 13}

# Analyze the failed submission
df = pd.read_csv('/home/submission/submission.csv')
print('Failed submission structure:')
print(f'Total rows: {len(df)}')
for task, label in [(0, 'single'), (1, 'full')]:
    task_df = df[df['task'] == task]
    folds = sorted(task_df['fold'].unique())
    print(f'Task {task} ({label}) folds: {folds} ({len(folds)} found, {EXPECTED_FOLDS[task]} expected)')
    print(f'Task {task} rows: {len(task_df)}')

print('\n--- Expected structure (Leave-One-Out) ---')
print('Task 0 (single): 24 folds (one per solvent)')
print('Task 1 (full): 13 folds (one per ramp)')
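
A pre-submission check along these lines could catch a GroupKFold(5) file before upload. This is a minimal sketch: the helper name `check_fold_structure` is hypothetical, and it assumes only the `task` and `fold` columns read above plus the expected counts of 24 and 13 folds.

```python
import pandas as pd

# Expected Leave-One-Out fold counts per task (24 solvents, 13 ramps)
EXPECTED_FOLDS = {0: 24, 1: 13}


def check_fold_structure(df: pd.DataFrame, expected: dict = EXPECTED_FOLDS) -> dict:
    """Hypothetical helper: return {task: (found, expected)} for every task whose
    number of distinct folds differs from the expected count. Empty dict = OK."""
    problems = {}
    for task, n_expected in expected.items():
        n_found = df.loc[df['task'] == task, 'fold'].nunique()
        if n_found != n_expected:
            problems[task] = (n_found, n_expected)
    return problems


# Synthetic example: a GroupKFold(5)-style file with 5 folds per task
bad = pd.DataFrame({'task': [0] * 5 + [1] * 5, 'fold': list(range(5)) * 2})
# Synthetic example: a Leave-One-Out-style file with 24 and 13 folds
good = pd.DataFrame({'task': [0] * 24 + [1] * 13,
                     'fold': list(range(24)) + list(range(13))})

print(check_fold_structure(bad))   # {0: (5, 24), 1: (5, 13)}
print(check_fold_structure(good))  # {}
```

On a real submission, a non-empty result from `check_fold_structure(df)` would mean the file should not be uploaded.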

In [None]:
# CV-LB relationship analysis from all submissions
submissions = [
    ('exp_000', 0.0111, 0.0982),
    ('exp_001', 0.0123, 0.1065),
    ('exp_003', 0.0105, 0.0972),
    ('exp_005', 0.0104, 0.0969),
    ('exp_006', 0.0097, 0.0946),
    ('exp_007', 0.0093, 0.0932),
    ('exp_009', 0.0092, 0.0936),
    ('exp_012', 0.0090, 0.0913),
    ('exp_024', 0.0087, 0.0893),
    ('exp_026', 0.0085, 0.0887),
    ('exp_030', 0.0083, 0.0877),
    ('exp_035', 0.0098, 0.0970),
]

cv_scores = [s[1] for s in submissions]
lb_scores = [s[2] for s in submissions]

# Fit LB = slope * CV + intercept by ordinary least squares
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(cv_scores, lb_scores)

TARGET_LB = 0.0347
required_cv = (TARGET_LB - intercept) / slope

print('CV-LB Relationship Analysis:')
print(f'Linear fit: LB = {slope:.4f} * CV + {intercept:.4f}')
print(f'R² = {r_value**2:.4f}')
print(f'\nIntercept: {intercept:.4f}')
print(f'Target LB: {TARGET_LB}')
print(f'\nRequired CV to hit target: ({TARGET_LB} - {intercept:.4f}) / {slope:.4f} = {required_cv:.4f}')
# Only flag impossibility when the fitted line actually demands a negative CV
if required_cv < 0:
    print('\n*** IMPOSSIBLE: Required CV is NEGATIVE ***')
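
Because the intercept is estimated from only 12 points, it is worth checking that "intercept > target" is not an artifact of noise. A minimal sketch, assuming SciPy 1.6 or later (where the `linregress` result exposes `intercept_stderr`) and the usual OLS assumptions (independent, roughly normal residuals), computes a 95% t-interval for the intercept from the same 12 (CV, LB) pairs:

```python
from scipy import stats

# Same (CV, LB) pairs as the submissions table above
cv = [0.0111, 0.0123, 0.0105, 0.0104, 0.0097, 0.0093,
      0.0092, 0.0090, 0.0087, 0.0085, 0.0083, 0.0098]
lb = [0.0982, 0.1065, 0.0972, 0.0969, 0.0946, 0.0932,
      0.0936, 0.0913, 0.0893, 0.0887, 0.0877, 0.0970]
TARGET_LB = 0.0347

fit = stats.linregress(cv, lb)
# Two-sided 95% critical value with n - 2 degrees of freedom
t_crit = stats.t.ppf(0.975, df=len(cv) - 2)
lo = fit.intercept - t_crit * fit.intercept_stderr
hi = fit.intercept + t_crit * fit.intercept_stderr

print(f'Intercept: {fit.intercept:.4f}  (95% CI {lo:.4f} to {hi:.4f})')
print('Lower bound above target:', lo > TARGET_LB)
```

If the lower bound stays above 0.0347, the gap is unlikely to be a fitting artifact, although a small sample and possible outliers (such as exp_001) still warrant caution.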

In [None]:
# What approaches have been tried?
print('='*60)
print('APPROACHES TRIED (80 experiments)')
print('='*60)

approaches = [
    ('MLP variants', '50+', 'All fall on same CV-LB line'),
    ('LightGBM', '10+', 'Same line'),
    ('XGBoost', '10+', 'Same line'),
    ('CatBoost', '5+', 'Same line'),
    ('Random Forest', '3+', 'Same line'),
    ('Gaussian Process', '3+', 'Same line'),
    ('Ridge Regression', '2+', 'Same line'),
    ('GNN (attempted)', '2', 'Model class mismatch issues'),
    ('ChemBERTa (attempted)', '2', 'Model class mismatch issues'),
    ('GroupKFold(5)', '1', 'INCOMPATIBLE with evaluation'),
]

for approach, count, result in approaches:
    print(f'{approach:30s} | {count:5s} | {result}')

print('\n' + '='*60)
print('CRITICAL: All tabular approaches fall on the SAME CV-LB line!')
print('The intercept (0.0528) is a STRUCTURAL GAP that lowering CV alone cannot close.')
print('='*60)
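
One way to make "falls on the same CV-LB line" testable for the next submission is to measure how far its (CV, LB) point sits from the fitted line, in units of the fit's residual standard deviation. A minimal sketch: the helper `line_residual_z` and the 2-sigma rule of thumb are hypothetical choices, not something established in earlier loops; `numpy.polyfit` fits the same least-squares line as `stats.linregress`.

```python
import numpy as np

cv = np.array([0.0111, 0.0123, 0.0105, 0.0104, 0.0097, 0.0093,
               0.0092, 0.0090, 0.0087, 0.0085, 0.0083, 0.0098])
lb = np.array([0.0982, 0.1065, 0.0972, 0.0969, 0.0946, 0.0932,
               0.0936, 0.0913, 0.0893, 0.0887, 0.0877, 0.0970])

# Least-squares line LB = slope * CV + intercept (highest power first from polyfit)
slope, intercept = np.polyfit(cv, lb, deg=1)
residuals = lb - (slope * cv + intercept)
# Residual standard deviation with n - 2 degrees of freedom
resid_sd = np.sqrt(np.sum(residuals**2) / (len(cv) - 2))


def line_residual_z(new_cv: float, new_lb: float) -> float:
    """Hypothetical helper: signed distance of a new (CV, LB) point from the
    fitted line, in residual standard deviations (negative = better than the line)."""
    return (new_lb - (slope * new_cv + intercept)) / resid_sd


# A point exactly on the line scores ~0; a much lower LB at the same CV scores strongly negative
print(f'{line_residual_z(0.0083, slope * 0.0083 + intercept):.2f}')
print(f'{line_residual_z(0.0083, 0.0700):.2f}')
# Hypothetical rule of thumb: |z| > 2 suggests the approach changed the relationship
```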

In [None]:
# What hasn't been tried properly?
print('='*60)
print('APPROACHES NOT PROPERLY IMPLEMENTED')
print('='*60)

untried = [
    ('GNN with correct submission cells', 'Model class mismatch in previous attempts'),
    ('ChemBERTa with correct submission cells', 'Model class mismatch in previous attempts'),
    ('Extrapolation detection features', 'Not implemented'),
    ('Uncertainty-weighted predictions', 'Not implemented'),
    ('Domain constraints (yield normalization)', 'Partially tried, not effective'),
    ('Pseudo-labeling', 'Not implemented'),
    ('Adversarial validation features', 'Not implemented'),
]

for approach, status in untried:
    print(f'{approach:45s} | {status}')

print('\n' + '='*60)
print('NEXT STEPS: Must try approaches that CHANGE the CV-LB relationship')
print('='*60)
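
Of the untried items, adversarial validation is the simplest to prototype. The sketch below uses synthetic features in place of the real solvent descriptors (an assumption), labels training rows 0 and held-out rows 1, and scores how well a classifier separates them with cross-validated ROC AUC. An AUC near 0.5 means the held-out rows look like the training data; an AUC near 1.0 flags the kind of extrapolation that might sit behind the CV-LB intercept.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
# Synthetic stand-ins (assumption): 200 "train" rows and 200 "held-out" rows, 5 features;
# the held-out rows are shifted along feature 0 to mimic an unseen solvent
X_train = rng.normal(size=(200, 5))
X_held = rng.normal(size=(200, 5))
X_held[:, 0] += 2.0

X = np.vstack([X_train, X_held])
y = np.r_[np.zeros(len(X_train)), np.ones(len(X_held))]  # 1 = held-out row

# Out-of-fold probabilities, so the AUC is never measured on rows the model was fit on
proba = cross_val_predict(LogisticRegression(), X, y, cv=5,
                          method='predict_proba')[:, 1]
auc = roc_auc_score(y, proba)
print(f'Adversarial AUC: {auc:.3f}')
```

The per-row probabilities in `proba` are one hypothetical candidate for the "adversarial validation features" listed above; whether they actually shift the CV-LB line would need to be tested like any other approach.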