# Loop 26 Analysis: Per-Target Failure and Next Steps

## Key Findings from exp_025

The per-target model experiment FAILED (CV 0.009068 vs the exp_024 baseline of 0.008689, 4.36% worse).

**Per-Target MSE Breakdown:**
- Product 2 MSE: 0.005917 (IMPROVED vs. the estimated joint-model value of ~0.006)
- Product 3 MSE: 0.007797 (IMPROVED vs. the estimated ~0.008)
- SM MSE: 0.014034 (MUCH WORSE than the estimated ~0.012; this is the culprit)

**Key Insight:** The SM model, with its larger [64,32] architecture, is likely OVERFITTING. The joint model appears to provide multi-task regularization that helps SM prediction.
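A minimal scikit-learn sketch of the two setups being compared, on synthetic data. The data, the use of `MLPRegressor`, and the hyperparameters are assumptions for illustration, not the exp_024/exp_025 code. Fitting one regressor on a three-column target gives a single network whose hidden layers are shared by SM, Product 2, and Product 3; the per-target variant trains a separate network per column, so SM no longer shares any parameters with the products.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (assumption): 200 samples, 10 features, 3 targets in (0, 1)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Y = 0.5 + 0.5 * np.tanh(X @ rng.normal(size=(10, 3)))

# Joint model: one network whose hidden layers are shared by all three targets
joint = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0).fit(X, Y)

# Per-target models: an independent network per target column,
# mirroring the larger [64,32] architecture used for SM in exp_025
per_target = [
    MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0).fit(X, Y[:, j])
    for j in range(Y.shape[1])
]

joint_pred = joint.predict(X)                                          # (200, 3)
per_target_pred = np.column_stack([m.predict(X) for m in per_target])  # (200, 3)
print('joint output layer:', joint.coefs_[-1].shape)
print('per-target output layer:', per_target[0].coefs_[-1].shape)
```

In the joint network only the output layer is target-specific; every layer below it is updated by the errors of all three targets, which is the mechanism behind the multi-task regularization hypothesis.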

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Submission history
submissions = [
    {'exp': 'exp_000', 'cv': 0.0111, 'lb': 0.0982},
    {'exp': 'exp_001', 'cv': 0.0123, 'lb': 0.1065},
    {'exp': 'exp_003', 'cv': 0.0105, 'lb': 0.0972},
    {'exp': 'exp_005', 'cv': 0.0104, 'lb': 0.0969},
    {'exp': 'exp_006', 'cv': 0.0097, 'lb': 0.0946},
    {'exp': 'exp_007', 'cv': 0.0093, 'lb': 0.0932},
    {'exp': 'exp_009', 'cv': 0.0092, 'lb': 0.0936},
    {'exp': 'exp_012', 'cv': 0.0090, 'lb': 0.0913},
    {'exp': 'exp_024', 'cv': 0.0087, 'lb': 0.0893},
]

df = pd.DataFrame(submissions)
print('Submission History:')
print(df.to_string(index=False))

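# Linear fit of leaderboard score against CV score
from scipy import stats
slope, intercept, r, p, se = stats.linregress(df['cv'], df['lb'])
print(f'\nCV-LB Relationship: LB = {slope:.2f}*CV + {intercept:.4f} (R²={r**2:.3f})')

TARGET_LB = 0.01727
required_cv = (TARGET_LB - intercept) / slope
print(f'Target LB: {TARGET_LB}')
print(f'Predicted CV for target LB: {required_cv:.6f}')
if required_cv <= 0:
    print('Target LB is below the fitted intercept: no positive CV reaches it on this line.')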
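The last step inverts the fitted line. Writing the fit as LB ≈ a·CV + b, the CV needed for a target leaderboard score, and the condition for that CV to be positive, are:

```latex
\mathrm{CV}^{*} = \frac{\mathrm{LB}_{\mathrm{target}} - b}{a},
\qquad
\mathrm{CV}^{*} > 0 \iff \mathrm{LB}_{\mathrm{target}} > b \quad \text{(for } a > 0\text{)}
```

For the nine submissions listed, the fitted intercept b lies well above the 0.01727 target, so the computed CV comes out negative: along this line, lowering CV alone cannot reach the target. Nine points is a small sample and the required CV is far outside the observed range, so treat this as a rough diagnostic rather than a firm bound.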

In [None]:
# Analyze what approaches have been tried
experiments = [
    {'id': 'exp_000', 'cv': 0.0111, 'approach': 'MLP baseline'},
    {'id': 'exp_001', 'cv': 0.0123, 'approach': 'LightGBM baseline'},
    {'id': 'exp_003', 'cv': 0.0105, 'approach': 'Combined Spange+DRFP'},
    {'id': 'exp_005', 'cv': 0.0104, 'approach': 'Large ensemble (15 models)'},
    {'id': 'exp_006', 'cv': 0.0097, 'approach': 'Simpler [64,32]'},
    {'id': 'exp_007', 'cv': 0.0093, 'approach': 'Even simpler [32,16]'},
    {'id': 'exp_009', 'cv': 0.0092, 'approach': 'Ridge regression'},
    {'id': 'exp_012', 'cv': 0.0090, 'approach': 'Simple ensemble MLP+LGBM'},
    {'id': 'exp_024', 'cv': 0.0087, 'approach': 'ACS PCA features'},
    {'id': 'exp_025', 'cv': 0.0091, 'approach': 'Per-target models (WORSE)'},
]

print('Experiment Trajectory:')
for e in experiments:
    print(f"{e['id']}: CV {e['cv']:.4f} - {e['approach']}")

print('\n=== WHAT WORKED ===')
print('1. Simpler architectures [32,16] > [256,128,64]')
print('2. MLP + LightGBM ensemble')
print('3. ACS PCA features (additional 5 features)')
print('4. Arrhenius kinetics features')
print('5. TTA for mixtures')

print('\n=== WHAT FAILED ===')
print('1. Per-target models (SM likely overfits as a separate model)')
print('2. Deep residual networks')
print('3. Attention mechanisms')
print('4. Fragprints instead of DRFP')
print('5. Very large ensembles alone')
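The lists above name Arrhenius kinetics features and TTA for mixtures without spelling out how they were built. The sketch below shows one plausible form of each; the inputs (temperature in °C, a positive time), the exact feature set, and the order-dependent `[A descriptors, B descriptors, fraction of B]` mixture encoding are all assumptions, and the function names are hypothetical rather than the experiments' actual code.

```python
import numpy as np


def arrhenius_features(temp_c, time):
    """Hypothetical Arrhenius-style features from temperature (deg C) and time.

    ln(k) = ln(A) - Ea / (R * T) is linear in 1/T, so 1/T and ln(time)
    are used as inputs, plus their interaction.
    """
    inv_temp = 1.0 / (np.asarray(temp_c, dtype=float) + 273.15)  # 1/T in 1/K
    log_time = np.log(np.asarray(time, dtype=float))             # assumes time > 0
    return np.column_stack([inv_temp, log_time, inv_temp * log_time])


def mixture_features(desc_a, desc_b, pct_b):
    """Order-dependent encoding (assumption): [A descriptors, B descriptors, fraction of B]."""
    pct_b = np.asarray(pct_b, dtype=float).reshape(-1, 1)
    return np.hstack([desc_a, desc_b, pct_b])


def tta_mixture_predict(predict_fn, desc_a, desc_b, pct_b):
    """Average predictions over both orderings of the same mixture."""
    pct_b = np.asarray(pct_b, dtype=float)
    as_given = predict_fn(mixture_features(desc_a, desc_b, pct_b))
    # Same mixture written as B first, with A as the second solvent at fraction 1 - pct_b
    swapped = predict_fn(mixture_features(desc_b, desc_a, 1.0 - pct_b))
    return 0.5 * (as_given + swapped)


# Usage with toy data and a linear stand-in for a fitted model's predict()
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(4, 5))
desc_b = rng.normal(size=(4, 5))
pct_b = np.array([0.0, 0.25, 0.5, 1.0])
toy_weights = rng.normal(size=(11, 3))  # 5 + 5 + 1 features -> 3 targets


def toy_predict(X):
    return X @ toy_weights


preds = tta_mixture_predict(toy_predict, desc_a, desc_b, pct_b)
feats = arrhenius_features([25.0, 100.0], [1.0, 10.0])
```

Averaging both orderings makes the prediction symmetric under swapping A and B, which an order-dependent encoding does not guarantee on its own.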

In [None]:
# Analyze the per-target failure more deeply
print('=== PER-TARGET ANALYSIS ===')
print('\nexp_024 (joint model) vs exp_025 (per-target):')
print('\nTarget    | exp_024 (joint) | exp_025 (per-target) | Change')
print('-' * 60)

# exp_024 per-target breakdown (estimated from overall CV)
# Overall CV 0.008689, assuming similar distribution
exp024_overall = 0.008689
exp025_p2 = 0.005917
exp025_p3 = 0.007797
exp025_sm = 0.014034
exp025_overall = 0.009068

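pct_change = (exp025_overall / exp024_overall - 1) * 100
print(f'{"Product 2":<9} | {"~0.006 (est)":<15} | {exp025_p2:<20.6f} | Improved (vs. est.)')
print(f'{"Product 3":<9} | {"~0.008 (est)":<15} | {exp025_p3:<20.6f} | Improved (vs. est.)')
print(f'{"SM":<9} | {"~0.012 (est)":<15} | {exp025_sm:<20.6f} | WORSE')
print(f'{"Overall":<9} | {exp024_overall:<15.6f} | {exp025_overall:<20.6f} | {pct_change:.2f}% worse')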

print('\n=== KEY INSIGHT ===')
print('The joint model likely provides MULTI-TASK REGULARIZATION:')
print('- SM benefits from a representation shared with the products')
print('- Separating SM removes this regularization')
print('- The [64,32] architecture for SM alone appears to OVERFIT')

print('\n=== NEXT STEPS ===')
print('1. Try LOSS WEIGHTING instead of separate models')
print('   - Keep joint model, but weight SM loss higher')
print('   - loss = 2.0*SM_loss + 1.0*P2_loss + 1.0*P3_loss')
print('2. Try CONSISTENCY REGULARIZATION')
print('   - Add constraint: SM + P2 + P3 ≈ 1 (mass balance)')
print('3. Try 4-MODEL ENSEMBLE')
print('   - Add XGBoost and RandomForest for diversity')
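Loss weighting and consistency regularization both change the joint model's training objective rather than its structure. A minimal NumPy sketch of that objective, assuming the targets are stored as columns ordered (SM, Product 2, Product 3); `weighted_joint_loss` and its `mass_balance_lambda` parameter are hypothetical names. NumPy only shows the arithmetic here: for training, the same expression would need to be written with the tensor operations of the framework in use so that gradients flow.

```python
import numpy as np


def weighted_joint_loss(y_true, y_pred, weights=(2.0, 1.0, 1.0), mass_balance_lambda=0.0):
    """Weighted per-target MSE with an optional soft mass-balance penalty.

    Columns are assumed to be ordered (SM, Product 2, Product 3).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    per_target_mse = ((y_pred - y_true) ** 2).mean(axis=0)  # one MSE per column
    loss = float(np.dot(np.asarray(weights, dtype=float), per_target_mse))
    if mass_balance_lambda > 0:
        # Squared deviation of predicted SM + P2 + P3 from 1, averaged over rows
        residual = y_pred.sum(axis=1) - 1.0
        loss += mass_balance_lambda * float(np.mean(residual ** 2))
    return loss


y_true = np.array([[0.5, 0.3, 0.2], [0.2, 0.5, 0.3]])
y_pred = np.array([[0.4, 0.3, 0.2], [0.2, 0.5, 0.3]])
print(weighted_joint_loss(y_true, y_pred))                           # SM error counted twice
print(weighted_joint_loss(y_true, y_pred, mass_balance_lambda=1.0))  # plus row-sum penalty
```

The penalty assumes the three yields should sum to roughly 1. If side products account for part of the mass, penalizing only sums above 1 may be a better fit.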

In [None]:
# What unexplored approaches remain?
print('=== UNEXPLORED APPROACHES ===')

approaches = [
    ('Loss weighting for SM', 'HIGH', 'Weight SM loss 2x in joint model'),
    ('Consistency regularization', 'MEDIUM', 'Add SM + P2 + P3 ≈ 1 constraint'),
    ('4-model ensemble', 'MEDIUM', 'Add XGBoost + RandomForest'),
    ('Stacking meta-learner', 'MEDIUM', 'Learn optimal combination weights'),
    ('Non-linear mixture encoding', 'LOW', 'Add A*B*pct*(1-pct) interaction'),
    ('Larger MLP ensemble (10+)', 'LOW', 'More models for variance reduction'),
    ('Different optimizer (AdamW)', 'LOW', 'May help with regularization'),
    ('Cosine annealing LR', 'LOW', 'Better learning rate schedule'),
]

print(f'\n{"Priority":8} | {"Approach":30} | Description')
print('-' * 70)
for approach, priority, desc in approaches:
    print(f'{priority:8} | {approach:30} | {desc}')

print('\n=== RECOMMENDED NEXT EXPERIMENT ===')
print('exp_026: Loss-Weighted Joint Model')
print('- Keep joint [32,16] MLP + LightGBM ensemble')
print('- Weight SM loss 2x higher in training')
print('- Preserves multi-task regularization')
print('- Focuses optimization on hardest target (SM)')
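For the stacking and 4-model ensemble ideas, one simple way to learn combination weights is a linear stacker: non-negative least squares of the targets on each base model's out-of-fold predictions. The sketch uses scikit-learn `Ridge`, `RandomForestRegressor`, and `MLPRegressor` on synthetic data as stand-ins for the project's MLP, LightGBM, and XGBoost models. The 5-fold `KFold` split and the single weight per model shared across all three targets are assumptions; in practice the out-of-fold predictions should come from the same CV split used for the reported scores.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neural_network import MLPRegressor


def learn_blend_weights(oof_preds, y):
    """Non-negative blend weights, normalised to sum to 1, from out-of-fold predictions.

    oof_preds: list of arrays shaped like y, one per base model.
    All targets are pooled, so each model gets a single weight.
    """
    A = np.column_stack([p.ravel() for p in oof_preds])
    weights, _ = nnls(A, np.asarray(y, dtype=float).ravel())
    total = weights.sum()
    if total == 0:
        return np.full(len(oof_preds), 1.0 / len(oof_preds))  # fall back to equal weights
    return weights / total


# Synthetic stand-in data: 150 samples, 8 features, 3 targets
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 8))
y = np.column_stack([X[:, 0] + 0.1 * rng.normal(size=150), X[:, 1] ** 2, np.sin(X[:, 2])])

base_models = {
    'ridge': Ridge(alpha=1.0),
    'rf': RandomForestRegressor(n_estimators=100, random_state=0),
    'mlp': MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0),
}
splitter = KFold(n_splits=5, shuffle=True, random_state=0)
oof = [cross_val_predict(model, X, y, cv=splitter) for model in base_models.values()]

weights = learn_blend_weights(oof, y)
blend_oof = sum(w * p for w, p in zip(weights, oof))
for name, w in zip(base_models, weights):
    print(f'{name}: {w:.3f}')
print('blended OOF MSE:', np.mean((blend_oof - y) ** 2))
```

Normalizing the NNLS weights keeps the blend a convex combination of the base models, which is easy to interpret but can be slightly worse than the raw NNLS solution; fitting one weight vector per target is a straightforward variant.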