# Loop 7 Analysis: Simpler Model Results

## Key Finding
The simpler model [64, 32] achieved the **best CV score so far** (0.009749), a 6.5% improvement over the previous best (0.010430, exp_005).

## Questions to Answer
1. What does this mean for the overfitting hypothesis?
2. Should we submit to LB?
3. What's the predicted LB score?
4. What should we try next?

In [None]:
import pandas as pd
import numpy as np

# All experiments
experiments = [
    {'exp': 'exp_000', 'name': 'Baseline MLP [128,128,64]', 'cv': 0.011081, 'lb': 0.0982, 'models': 3},
    {'exp': 'exp_001', 'name': 'LightGBM', 'cv': 0.012297, 'lb': 0.1065, 'models': 3},
    {'exp': 'exp_002', 'name': 'DRFP with PCA', 'cv': 0.016948, 'lb': None, 'models': 5},
    {'exp': 'exp_003', 'name': 'Combined [256,128,64]', 'cv': 0.010501, 'lb': 0.0972, 'models': 5},
    {'exp': 'exp_004', 'name': 'Deep Residual (FAILED)', 'cv': 0.051912, 'lb': None, 'models': 10},
    {'exp': 'exp_005', 'name': 'Large Ensemble [256,128,64]', 'cv': 0.010430, 'lb': 0.0969, 'models': 15},
    {'exp': 'exp_006', 'name': 'Simpler [64,32]', 'cv': 0.009749, 'lb': None, 'models': 5},
]

df = pd.DataFrame(experiments)
df['ratio'] = df['lb'] / df['cv']
print('=== ALL EXPERIMENTS ===')
print(df.to_string(index=False))

In [None]:
# Analyze the simpler model result
print('=== SIMPLER MODEL ANALYSIS ===')
print()
print('exp_006 (Simpler [64,32]) vs exp_005 (Large Ensemble [256,128,64]):')
print(f'  CV: 0.009749 vs 0.010430 → {(0.010430 - 0.009749) / 0.010430 * 100:.1f}% BETTER')
print('  Architecture: [64, 32] vs [256, 128, 64]')
print('  Dropout: 0.1 vs 0.3')
print('  Models: 5 vs 15')
print('  Training time: ~63 min vs ~6.5 hours')
print()
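print('KEY INSIGHT: The simpler model has BETTER CV, not worse!')
print('This is consistent with the larger models overfitting even within CV.')
print('Caveat: architecture, dropout, and ensemble size all changed, so the cause is not isolated.')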
print()
print('Breakdown by task:')
print('  Single Solvent: 0.011120 (exp_006) vs 0.011533 (exp_005) → 3.6% better')
print('  Full Data:      0.009016 (exp_006) vs 0.009841 (exp_005) → 8.4% better')
print()
print('The improvement is LARGER for Full Data (mixtures), suggesting:')
print('  - Simpler models generalize better to mixture combinations')
print('  - The complex model was overfitting to specific solvent pairs')
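
The percentages printed above are hard-coded. A minimal sketch of how they could be recomputed from the raw scores follows; `pct_improvement` is a hypothetical helper, not part of the original notebook, and it assumes a lower-is-better metric.

```python
def pct_improvement(old: float, new: float) -> float:
    """Relative reduction from old to new, in percent (lower score is better)."""
    return (old - new) / old * 100

# Scores copied from the comparison above, as (exp_005, exp_006) pairs
scores = {
    'Overall CV': (0.010430, 0.009749),
    'Single Solvent': (0.011533, 0.011120),
    'Full Data': (0.009841, 0.009016),
}

improvements = {name: pct_improvement(old, new) for name, (old, new) in scores.items()}
for name, pct in improvements.items():
    print(f'{name:<15} {pct:.1f}% better')
```

Computing the figures this way keeps the printed percentages in sync if any of the scores are later updated.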

In [None]:
# Predict LB score for exp_006
print('=== LB PREDICTION ===')
print()

# Calculate average ratio from submitted experiments
submitted = df[df['lb'].notna()]
avg_ratio = submitted['ratio'].mean()
std_ratio = submitted['ratio'].std()

print(f'Average CV-LB ratio: {avg_ratio:.2f}x (std: {std_ratio:.2f})')
print()

# Predict LB for exp_006
exp006_cv = 0.009749
predicted_lb = exp006_cv * avg_ratio
predicted_lb_low = exp006_cv * (avg_ratio - std_ratio)
predicted_lb_high = exp006_cv * (avg_ratio + std_ratio)

print(f'exp_006 CV: {exp006_cv:.6f}')
print(f'Predicted LB: {predicted_lb:.4f} (range: {predicted_lb_low:.4f} - {predicted_lb_high:.4f})')
print()
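# Look up the best submitted LB instead of hard-coding it
best = submitted.loc[submitted['lb'].idxmin()]
print(f"Best current LB: {best['lb']:.4f} ({best['exp']})")
print(f"Predicted improvement: {(best['lb'] - predicted_lb) / best['lb'] * 100:.1f}%")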
print()
print('If the CV-LB ratio holds, exp_006 should achieve LB ~0.088')
print('This would be the BEST LB score so far!')
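
The ratio-based prediction assumes LB scales proportionally with CV. As a cross-check, the sketch below fits a straight line with an intercept to the four submitted (CV, LB) pairs using `np.polyfit`. This is not the notebook's method, only an alternative under the assumption of a linear relationship. With just four points the fit is rough and is better read as a sanity check than as a forecast.

```python
import numpy as np

# (CV, LB) pairs for the four submitted experiments, copied from the table above
cv = np.array([0.011081, 0.012297, 0.010501, 0.010430])
lb = np.array([0.0982, 0.1065, 0.0972, 0.0969])
exp006_cv = 0.009749

# Proportional model used in the notebook: LB = mean(LB / CV) * CV
ratio_pred = exp006_cv * np.mean(lb / cv)

# Alternative (assumption): LB = slope * CV + intercept
slope, intercept = np.polyfit(cv, lb, deg=1)
linear_pred = slope * exp006_cv + intercept

print(f'Ratio-based prediction: {ratio_pred:.4f}')
print(f'Linear-fit prediction:  {linear_pred:.4f} (slope={slope:.2f}, intercept={intercept:.4f})')
```

If the two predictions disagree noticeably, the constant-ratio assumption is in doubt, and the actual LB result for exp_006 would help tell the two models apart.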

In [None]:
# Should we submit?
print('=== SUBMISSION DECISION ===')
print()
print('Arguments FOR submitting exp_006:')
print('  1. Best CV score (0.009749) - 6.5% better than previous best')
print('  2. Tests the simpler-model hypothesis on LB')
print('  3. If LB improves, opens direction for even simpler models')
print('  4. 3 submissions remaining - can afford to test')
print()
print('Arguments AGAINST submitting exp_006:')
print('  1. Could save submission for final ensemble')
print('  2. CV improvement may not translate to LB')
print()
print('RECOMMENDATION: SUBMIT exp_006')
print('  - The 6.5% CV improvement is large next to the 0.7% gain from exp_003 to exp_005')
print('  - We need to check whether simpler models have a different CV-LB ratio')
print('  - This informs our strategy for remaining experiments')

In [None]:
# What to try next based on result
print('=== NEXT STEPS BASED ON LB RESULT ===')
print()
print('IF LB improves (e.g., 0.088 or better):')
print('  → Simpler model hypothesis VALIDATED')
print('  → Try even simpler: [32, 16], [32], or linear models')
print('  → The optimal architecture may be much simpler than expected')
print('  → Target may be achievable with very simple models')
print()
print('IF LB stays the same (~0.097):')
print('  → CV improvement does NOT translate to LB')
print('  → The CV-LB gap is NOT due to model complexity')
print('  → Need fundamentally different approach:')
print('     - Gaussian Processes with Tanimoto kernel')
print('     - Per-target models')
print('     - Domain adaptation techniques')
print()
print('IF LB gets worse:')
print('  → Simpler models underfit on LB')
print('  → The complex model was actually better for generalization')
print('  → Focus on other approaches (GPs, per-target)')
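
The three branches above can be sketched as a small helper. `classify_lb_result` and its `tol` threshold are hypothetical: the notebook gives no exact cutoff for treating two LB scores as equal, so the tolerance of 0.001 is an assumption chosen purely for illustration.

```python
def classify_lb_result(new_lb: float, best_lb: float = 0.0969, tol: float = 0.001) -> str:
    """Map a new LB score to one of the three scenarios (lower is better).

    tol is an assumed tolerance for treating two LB scores as equal.
    """
    if new_lb < best_lb - tol:
        return 'improved'  # simpler-model hypothesis supported
    if new_lb > best_lb + tol:
        return 'worse'     # simpler model likely underfits on LB
    return 'same'          # CV gain did not transfer to LB

for score in (0.088, 0.097, 0.105):
    print(score, '->', classify_lb_result(score))
```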

In [None]:
# Target analysis
print('=== TARGET ANALYSIS ===')
print()
target = 0.0333
best_lb = 0.0969
best_cv = 0.009749

print(f'Target: {target}')
print(f'Best LB: {best_lb}')
print(f'Best CV: {best_cv}')
print()
print(f'Gap to target: LB must drop {(best_lb - target) / best_lb * 100:.0f}% (best LB is {best_lb / target:.1f}x the target)')
print()
print(f'With the average {avg_ratio:.1f}x CV-LB ratio:')
print(f'  To beat {target}, need CV < {target / avg_ratio:.6f}')
print(f'  Current best CV: {best_cv}')
print(f'  CV improvement needed: {(best_cv - target/avg_ratio) / best_cv * 100:.0f}%')
print()
print('REALITY CHECK:')
print('  - We need ~62% CV improvement to beat target')
print('  - Simpler model gave 6.5% improvement')
print('  - Would need ~14 such improvements, since relative gains compound (unlikely)')
print('  - BUT: simpler models may have DIFFERENT CV-LB ratio')
print('  - If ratio drops to 3x, current CV would give LB ~0.029 (beats target!)')
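
The reality check can be made concrete with a short calculation. The sketch below assumes that successive 6.5% gains compound multiplicatively and that the CV-LB ratio stays at roughly 9x (the rounded value used above). Both are assumptions, and the variable names are illustrative.

```python
import math

target = 0.0333     # target LB score
best_cv = 0.009749  # exp_006 CV
ratio = 9.0         # assumed CV-LB ratio (rounded average of submitted runs)
step = 0.065        # relative CV gain per improvement (6.5%)

# CV needed to reach the target if the ratio holds
required_cv = target / ratio

# Each improvement multiplies CV by (1 - step);
# solve (1 - step) ** n = required_cv / best_cv for n
n_steps = math.log(required_cv / best_cv) / math.log(1 - step)

# Alternatively, the ratio at which the current CV would hit the target
required_ratio = target / best_cv

print(f'Required CV at {ratio:.0f}x ratio: {required_cv:.6f}')
print(f'Successive {step:.1%} improvements needed: {n_steps:.1f}')
print(f'Ratio needed at current CV: {required_ratio:.2f}x')
```

The last quantity gives the break-even ratio: any CV-LB ratio below it would put the current CV under the target.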