# Loop 20 Strategic Analysis

## Situation Summary
- **Best CV**: 0.0623 (exp_004/016/017/018) - HGB+ETR per-target with Spange descriptors
- **Best LB**: 0.0956 (exp_004/016) - 53% CV-LB gap
- **Target**: 0.01727 (5.5x away from best LB)
- **Submissions remaining**: 2

## GNN Experiment (exp_020) Results
- Basic GNN: CV 0.099 (59% worse than best)
- Improved GNN (GAT): ~0.11 MAE on quick test
- RDKit MLP: ~0.17 MAE on quick test
- **Conclusion**: The GNN underperformed, most likely because the dataset is too small (only 24 solvents)

## Key Questions
1. Why is there a 53% CV-LB gap?
2. What approaches haven't been tried?
3. How can we close the gap to target?
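
For question 1, one check worth sketching is whether the CV folds hold out whole solvents, since a test set with unseen solvents would be poorly approximated by folds that mix rows of the same solvent. The snippet below is a minimal, stdlib-only illustration of leave-one-solvent-out splitting. The function name `leave_one_group_out` and the solvent labels are hypothetical, and nothing here is known to match the CV scheme actually used in these experiments.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), holding each group out once."""
    index_by_group = defaultdict(list)
    for i, g in enumerate(groups):
        index_by_group[g].append(i)
    for held_out, test_idx in index_by_group.items():
        # Training rows are all rows that do not belong to the held-out group
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical solvent label per training row (illustrative only)
solvents = ["water", "water", "ethanol", "dmso", "ethanol", "dmso"]
folds = list(leave_one_group_out(solvents))

for held_out, train_idx, test_idx in folds:
    print(f"held out {held_out}: train={train_idx}, test={test_idx}")
```

With 24 solvents this would give 24 folds, each scored on a solvent the model never saw during training.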

In [1]:
import pandas as pd
import numpy as np
import json

# Load session state
with open('/home/code/session_state.json') as f:
    state = json.load(f)

# Analyze submissions
print('=== SUBMISSIONS ANALYSIS ===')
for s in state.get('submissions', []):
    cv = s.get('cv_score', 'N/A')
    lb = s.get('lb_score', 'N/A')
    if isinstance(lb, (int, float)) and isinstance(cv, (int, float)):
        gap = (lb - cv) / cv * 100
        print(f"{s['model_name'][:40]}: CV={cv:.4f}, LB={lb:.4f}, Gap={gap:.1f}%")
    else:
        print(f"{s['model_name'][:40]}: CV={cv}, LB={lb}")

=== SUBMISSIONS ANALYSIS ===
005_no_tta_per_target: CV=0.0623, LB=0.0956, Gap=53.5%
007_intermediate_regularization: CV=0.0688, LB=0.0991, Gap=43.9%
exp_012: Template-Compliant GroupKFold E: CV=0.0844, LB=
exp_017: Replicate exp_004's EXACT Archi: CV=0.0623, LB=0.0956, Gap=53.4%
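
A relative gap is only one way to read the scored submissions above. Another is to fit a straight line LB = slope * CV + intercept and look at the intercept. The sketch below does this with plain least squares on the three scored rows. The helper `fit_line` is hypothetical, and because only two of the (CV, LB) pairs are distinct, the line is an exact interpolation rather than evidence for a linear relationship.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# (CV, LB) pairs copied from the scored submissions printed above
cv_scores = [0.0623, 0.0688, 0.0623]
lb_scores = [0.0956, 0.0991, 0.0956]

slope, intercept = fit_line(cv_scores, lb_scores)
print(f"LB ~= {slope:.3f} * CV + {intercept:.4f}")
```

If later submissions add more distinct points, the same fit would show whether the gap behaves more like a multiplicative factor or an additive offset.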


In [2]:
# Analyze all experiments
print('\n=== EXPERIMENT SCORES ===')
exps = state.get('experiments', [])
scores = [(e['id'], e['score'], e['name'][:50]) for e in exps]
scores_sorted = sorted(scores, key=lambda x: x[1])

print('\nTop 10 by CV:')
for exp_id, score, name in scores_sorted[:10]:
    print(f'{exp_id}: {score:.4f} - {name}')

print('\nBottom 5 by CV:')
for exp_id, score, name in scores_sorted[-5:]:
    print(f'{exp_id}: {score:.4f} - {name}')


=== EXPERIMENT SCORES ===

Top 10 by CV:
exp_004: 0.0623 - 005_no_tta_per_target
exp_016: 0.0623 - exp_017: Replicate exp_004's EXACT Architecture
exp_018: 0.0624 - exp_019: MLP with Strong Regularization + exp_004 
exp_009: 0.0669 - exp_010: MLP + GBDT Ensemble (Like Top Kernel)
exp_008: 0.0673 - 009_diverse_ensemble
exp_017: 0.0681 - exp_018: DRFP-Based Ensemble with Prediction Combi
exp_006: 0.0688 - 007_intermediate_regularization
exp_007: 0.0721 - 008_gaussian_process
exp_002: 0.0805 - 003_simple_rf_regularized
exp_001: 0.0810 - 002_template_compliant_ensemble

Bottom 5 by CV:
exp_010: 0.0841 - exp_011: GroupKFold + Top Kernel Architecture
exp_011: 0.0844 - exp_012: Template-Compliant GroupKFold Ensemble
exp_014: 0.0891 - exp_015: Per-Target + MLP Hybrid with COMBINED Fea
exp_005: 0.0896 - 006_regularized_ridge_baseline
exp_019: 0.0990 - exp_020_gnn_basic


In [3]:
# Key insight: CV-LB gap analysis
print('\n=== CV-LB GAP ANALYSIS ===')
print('\nSubmissions with valid LB scores:')
for s in state.get('submissions', []):
    cv = s.get('cv_score')
    lb = s.get('lb_score')
    if isinstance(lb, (int, float)) and isinstance(cv, (int, float)):
        gap = (lb - cv) / cv * 100
        print(f"CV: {cv:.4f} -> LB: {lb:.4f} (Gap: {gap:.1f}%)")

print('\n=== KEY INSIGHT ===')
print('The CV-LB gap is large for every scored submission (about 44-53%).')
print('A likely cause, not verified here, is that the test set contains solvents unlike those seen in CV.')
print('\nTo reach target 0.01727:')
print('- If gap stays at 53%: Need CV ~0.011')
print('- If gap reduces to 30%: Need CV ~0.013')
print('- If gap reduces to 20%: Need CV ~0.014')
print('\nCurrent best CV: 0.0623 -> Need a roughly 4.3-5.5x lower CV!')


=== CV-LB GAP ANALYSIS ===

Submissions with valid LB scores:
CV: 0.0623 -> LB: 0.0956 (Gap: 53.5%)
CV: 0.0688 -> LB: 0.0991 (Gap: 43.9%)
CV: 0.0623 -> LB: 0.0956 (Gap: 53.4%)

=== KEY INSIGHT ===
The CV-LB gap is large for every scored submission (about 44-53%).
A likely cause, not verified here, is that the test set contains solvents unlike those seen in CV.

To reach target 0.01727:
- If gap stays at 53%: Need CV ~0.011
- If gap reduces to 30%: Need CV ~0.013
- If gap reduces to 20%: Need CV ~0.014

Current best CV: 0.0623 -> Need a roughly 4.3-5.5x lower CV!
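
The "Need CV" figures above follow from assuming LB = CV * (1 + gap), so the required CV is target / (1 + gap). The sketch below makes that arithmetic explicit. The helper `required_cv` is a hypothetical name, and the proportional-gap model is itself an assumption rather than something the submissions establish.

```python
def required_cv(target_lb: float, relative_gap: float) -> float:
    """CV needed to reach target_lb, assuming LB = CV * (1 + relative_gap)."""
    if relative_gap <= -1.0:
        raise ValueError("relative_gap must be greater than -1")
    return target_lb / (1.0 + relative_gap)


TARGET_LB = 0.01727  # target from the situation summary
BEST_CV = 0.0623     # best CV from the situation summary

for gap in (0.53, 0.30, 0.20):
    cv_needed = required_cv(TARGET_LB, gap)
    factor = BEST_CV / cv_needed  # how many times lower the CV must become
    print(f"gap={gap:.0%}: need CV ~{cv_needed:.4f} ({factor:.1f}x lower than {BEST_CV})")
```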


In [4]:
# What approaches have been tried?
print('\n=== APPROACHES TRIED ===')
approaches = {
    'Tree-based (RF, HGB, ETR, XGB, LGB)': 'exp_002-008, exp_013-018',
    'MLP': 'exp_000-001, exp_009-012, exp_019',
    'Gaussian Process': 'exp_007',
    'Ensemble (MLP+GBDT)': 'exp_009-012',
    'Per-target models': 'exp_003-005, exp_013-018',
    'GNN (basic)': 'exp_020',
    'Optuna hyperparameter tuning': 'exp_013',
    'DRFP features': 'exp_017',
    'Spange descriptors': 'exp_004-005, exp_016-019',
    'ACS descriptors': 'exp_004-005, exp_016-017',
    'Arrhenius kinetics features': 'exp_000-001, exp_004-005',
}

for approach, exps in approaches.items():
    print(f'- {approach}: {exps}')

print('\n=== APPROACHES NOT YET TRIED ===')
not_tried = [
    'Pre-trained molecular representations (ChemBERTa, MolBERT)',
    'Transfer learning from larger datasets',
    'SOAP descriptors + gradient boosting',
    'Uncertainty-aware predictions',
    'Domain adaptation techniques',
    'Multi-fidelity learning',
]
for approach in not_tried:
    print(f'- {approach}')


=== APPROACHES TRIED ===
- Tree-based (RF, HGB, ETR, XGB, LGB): exp_002-008, exp_013-018
- MLP: exp_000-001, exp_009-012, exp_019
- Gaussian Process: exp_007
- Ensemble (MLP+GBDT): exp_009-012
- Per-target models: exp_003-005, exp_013-018
- GNN (basic): exp_020
- Optuna hyperparameter tuning: exp_013
- DRFP features: exp_017
- Spange descriptors: exp_004-005, exp_016-019
- ACS descriptors: exp_004-005, exp_016-017
- Arrhenius kinetics features: exp_000-001, exp_004-005

=== APPROACHES NOT YET TRIED ===
- Pre-trained molecular representations (ChemBERTa, MolBERT)
- Transfer learning from larger datasets
- SOAP descriptors + gradient boosting
- Uncertainty-aware predictions
- Domain adaptation techniques
- Multi-fidelity learning
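
"Uncertainty-aware predictions" is not defined further in this analysis. One simple reading, sketched below, is to treat disagreement between ensemble members as a rough uncertainty signal. The function `ensemble_mean_and_spread` and all prediction values are hypothetical and chosen only to show the mechanics.

```python
from statistics import mean, pstdev


def ensemble_mean_and_spread(predictions):
    """Return per-sample mean and population std across models.

    predictions: list of per-model prediction lists, all of equal length.
    """
    per_sample = list(zip(*predictions))
    return [mean(s) for s in per_sample], [pstdev(s) for s in per_sample]


# Made-up predictions from three hypothetical models on three samples
preds = [
    [0.20, 0.35, 0.80],  # model A
    [0.22, 0.50, 0.65],  # model B
    [0.18, 0.41, 0.95],  # model C
]

means, spreads = ensemble_mean_and_spread(preds)
# Index of the sample on which the models disagree the most
most_uncertain = max(range(len(spreads)), key=spreads.__getitem__)
print(f"means={means}, spreads={spreads}, most uncertain sample={most_uncertain}")
```

Samples with a large spread could be flagged for inspection or shrunk toward a conservative prediction; whether that helps on unseen solvents would need to be tested.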


In [5]:
# Strategic decision analysis
print('\n=== STRATEGIC DECISION ===')
print('\nWith 2 submissions remaining, we need to be strategic.')
print('\nOption A: Submit best model (exp_004/016) again')
print('- Already submitted twice, got LB 0.0956')
print('- No point in submitting again')

print('\nOption B: Try fundamentally different approach')
print('- Pre-trained molecular representations')
print('- Transfer learning from larger datasets')
print('- BUT: May not have time to implement properly')

print('\nOption C: Optimize what works')
print('- exp_004 architecture is best (CV 0.0623)')
print('- Try stronger regularization to reduce CV-LB gap')
print('- Try different feature combinations')
print('- Try ensemble of best models')

print('\nOption D: Focus on reducing CV-LB gap')
print('- The 53% gap is the real problem')
print('- Need approaches that generalize better')
print('- Consider uncertainty-aware predictions')

print('\n=== RECOMMENDATION ===')
print('Given limited time and submissions:')
print('1. DO NOT submit exp_020 (GNN) - it is worse than baseline')
print('2. Focus on reducing CV-LB gap with existing architecture')
print('3. Try ensemble of diverse models for better generalization')
print('4. Save submissions for models that show CV improvement')


=== STRATEGIC DECISION ===

With 2 submissions remaining, we need to be strategic.

Option A: Submit best model (exp_004/016) again
- Already submitted twice, got LB 0.0956
- No point in submitting again

Option B: Try fundamentally different approach
- Pre-trained molecular representations
- Transfer learning from larger datasets
- BUT: May not have time to implement properly

Option C: Optimize what works
- exp_004 architecture is best (CV 0.0623)
- Try stronger regularization to reduce CV-LB gap
- Try different feature combinations
- Try ensemble of best models

Option D: Focus on reducing CV-LB gap
- The 53% gap is the real problem
- Need approaches that generalize better
- Consider uncertainty-aware predictions

=== RECOMMENDATION ===
Given limited time and submissions:
1. DO NOT submit exp_020 (GNN) - it is worse than baseline
2. Focus on reducing CV-LB gap with existing architecture
3. Try ensemble of diverse models for better generalization
4. Save submissions for models that show CV improvement
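
Recommendation 3 can be illustrated with a plain averaging ensemble scored by MAE. This is a minimal sketch, not the ensemble used in any experiment: `average_ensemble`, `mae`, and all numbers are made up for illustration.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def average_ensemble(predictions, weights=None):
    """Weighted average across models; weights default to uniform."""
    n_models = len(predictions)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


# Made-up targets and predictions, not results from any experiment
y_true = [0.10, 0.40, 0.70]
pred_tree = [0.20, 0.35, 0.80]  # hypothetical tree-model predictions
pred_mlp = [0.05, 0.50, 0.65]   # hypothetical MLP predictions

blend = average_ensemble([pred_tree, pred_mlp])
print(f"tree MAE={mae(y_true, pred_tree):.4f}, "
      f"mlp MAE={mae(y_true, pred_mlp):.4f}, "
      f"blend MAE={mae(y_true, blend):.4f}")
```

Averaging helps most when the members make errors in different directions, as in this toy example; models that fail the same way on unseen solvents would gain little.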