# Loop 23 Strategic Analysis

## Critical Situation
- **Only 1 submission remaining**
- **Best LB: 0.0956** (exp_004 with TTA)
- **Target: 0.01727** (5.5x better than best LB)

## Key Discovery from Evaluator
TTA (Test-Time Augmentation) is responsible for the CV improvement:
- Without TTA: Full data CV = 0.0943
- With TTA: Full data CV = 0.0603 (36% improvement)

However, exp_004's LB (0.0956) is 53% higher than its CV (0.0623), which suggests TTA may be overfitting to the CV scheme rather than generalizing to the test set.
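As a quick check, the percentages above are relative changes, (new - old) / old, and the target ratio is a simple division of the numbers already quoted:

```python
def pct_change(reference: float, value: float) -> float:
    """Relative change of value vs. reference, in percent."""
    return (value - reference) / reference * 100

# Full-data CV without vs. with TTA (lower is better, so negate the change)
tta_improvement = -pct_change(0.0943, 0.0603)
# exp_004: LB relative to its CV
cv_lb_gap = pct_change(0.0623, 0.0956)
# Best LB relative to the target
target_ratio = 0.0956 / 0.01727

print(f'TTA improvement: {tta_improvement:.0f}%')   # 36%
print(f'CV-LB gap: {cv_lb_gap:.0f}%')               # 53%
print(f'Best LB / target: {target_ratio:.1f}x')     # 5.5x
```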

In [1]:
import pandas as pd
import numpy as np

# Submission history analysis
submissions = [
    {'exp': 'exp_004', 'cv': 0.0623, 'lb': 0.0956, 'model': 'HGB+ETR with TTA'},
    {'exp': 'exp_006', 'cv': 0.0688, 'lb': 0.0991, 'model': 'HGB+ETR depth=5/7'},
    {'exp': 'exp_011', 'cv': 0.0844, 'lb': None, 'model': 'GroupKFold ensemble'},
    {'exp': 'exp_016', 'cv': 0.0623, 'lb': 0.0956, 'model': 'exp_004 replica'},
    {'exp': 'exp_021', 'cv': 0.0901, 'lb': 0.1231, 'model': 'Multi-seed ensemble'},
]

df = pd.DataFrame(submissions)
df['cv_lb_gap'] = (df['lb'] - df['cv']) / df['cv'] * 100
print('=== SUBMISSION HISTORY ===')
print(df.to_string(index=False))

print('\n=== CV-LB CORRELATION ===')
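valid = df[df['lb'].notna()]
# exp_016 replicates exp_004 with identical (CV, LB); keep one copy so the point is not double-counted
valid = valid.drop_duplicates(subset=['cv', 'lb'])
r = np.corrcoef(valid['cv'], valid['lb'])[0, 1]
print(f'Correlation: {r:.4f} (n = {len(valid)} distinct submissions)')
print(f'Lower CV → Lower LB? {"YES" if r > 0 else "NO"} (correlation is {"positive" if r > 0 else "not positive"})')

=== SUBMISSION HISTORY ===
    exp     cv     lb               model  cv_lb_gap
exp_004 0.0623 0.0956    HGB+ETR with TTA  53.451043
exp_006 0.0688 0.0991   HGB+ETR depth=5/7  44.040698
exp_011 0.0844    NaN GroupKFold ensemble        NaN
exp_016 0.0623 0.0956     exp_004 replica  53.451043
exp_021 0.0901 0.1231 Multi-seed ensemble  36.625971

=== CV-LB CORRELATION ===
Correlation: 0.9941 (n = 3 distinct submissions)
Lower CV → Lower LB? YES (correlation is positive)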
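Since exp_016 is a replica of exp_004, the history contains only three independent (CV, LB) pairs, and a correlation estimated from three points is weak evidence even when it is close to 1. A minimal pure-Python sketch of the usual t-test for a Pearson correlation illustrates this. It relies on the fact that with n = 3 there is a single degree of freedom, where Student's t reduces to the standard Cauchy distribution and the two-sided p-value has a closed form; it is not valid for other sample sizes.

```python
import math

# Distinct (CV, LB) pairs from the submission history; exp_016 duplicates exp_004
pairs = [(0.0623, 0.0956), (0.0688, 0.0991), (0.0901, 0.1231)]

def pearson_r(points):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(pairs)
n = len(pairs)
# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Closed form only holds for 1 degree of freedom (t with 1 dof is Cauchy)
assert n == 3, 'closed-form p-value below assumes exactly 1 degree of freedom'
p_two_sided = 1 - 2 * math.atan(abs(t)) / math.pi

print(f'r = {r:.4f}, t = {t:.2f}, two-sided p = {p_two_sided:.3f}')
```

For these three points the two-sided p-value comes out at roughly 0.07, so the near-perfect correlation is not significant at the conventional 5% level. That does not make the relationship wrong, but it is thin support for predicting new submissions from it.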


In [2]:
# Analyze the TTA hypothesis
print('=== TTA HYPOTHESIS ANALYSIS ===')
print()
print('exp_004 (WITH TTA):')
print('  Single CV: 0.0659')
print('  Full CV: 0.0603 (with TTA averaging)')
print('  Combined CV: 0.0623')
print('  LB: 0.0956')
print('  CV-LB Gap: 53%')
print()
print('exp_023 (WITHOUT TTA):')
print('  Single CV: 0.0677')
print('  Full CV: 0.0943 (no TTA)')
print('  Combined CV: 0.0810')
print('  LB: ???')
print()
print('HYPOTHESIS: TTA is overfitting to the LOO CV scheme.')
print('If true, removing TTA should IMPROVE LB despite worse CV.')
print()
print('COUNTER-ARGUMENT: CV-LB correlation is 0.994 (very high).')
print('This suggests lower CV → lower LB, so exp_023 would have WORSE LB.')

=== TTA HYPOTHESIS ANALYSIS ===

exp_004 (WITH TTA):
  Single CV: 0.0659
  Full CV: 0.0603 (with TTA averaging)
  Combined CV: 0.0623
  LB: 0.0956
  CV-LB Gap: 53%

exp_023 (WITHOUT TTA):
  Single CV: 0.0677
  Full CV: 0.0943 (no TTA)
  Combined CV: 0.0810
  LB: ???

HYPOTHESIS: TTA is overfitting to the LOO CV scheme.
If true, removing TTA should IMPROVE LB despite worse CV.

COUNTER-ARGUMENT: CV-LB correlation is 0.994 (very high).
This suggests lower CV → lower LB, so exp_023 would have WORSE LB.
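The notebook does not show how exp_004's TTA is implemented. As a generic illustration only, TTA averages a model's predictions over several transformed copies of each test input. The augmentations below (`identity`, `swap_halves`) and the toy linear model are hypothetical placeholders, for example swapping two feature blocks that describe interchangeable inputs; they are not taken from exp_004.

```python
import numpy as np

def tta_predict(predict_fn, X, augmentations):
    """Average predict_fn over augmented copies of X (generic TTA sketch)."""
    preds = [predict_fn(aug(X)) for aug in augmentations]
    return np.mean(preds, axis=0)

# Hypothetical augmentations: identity, and swapping the two halves of the
# feature vector (e.g. two interchangeable input blocks).
def identity(X):
    return X

def swap_halves(X):
    half = X.shape[1] // 2
    return np.concatenate([X[:, half:], X[:, :half]], axis=1)

# Toy linear model that is NOT symmetric in the two halves
weights = np.array([1.0, 2.0, 3.0, 4.0])

def toy_predict(X):
    return X @ weights

X_test = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
tta_preds = tta_predict(toy_predict, X_test, [identity, swap_halves])
print(tta_preds)  # [2. 2.]
```

Without TTA the two rows would be predicted as 1.0 and 3.0; averaging over the swap makes the prediction invariant to which half holds the signal. Whether that invariance genuinely helps on unseen data, or mainly flatters the CV scheme, is the open question in the hypothesis above.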


In [3]:
# Calculate expected LB for exp_023 based on CV-LB relationship
print('=== EXPECTED LB FOR exp_023 ===')
print()

# Linear regression on CV-LB relationship
from scipy import stats
valid = [(0.0623, 0.0956), (0.0688, 0.0991), (0.0901, 0.1231)]
cvs = [x[0] for x in valid]
lbs = [x[1] for x in valid]

slope, intercept, r, p, se = stats.linregress(cvs, lbs)
print(f'Linear model: LB = {slope:.4f} * CV + {intercept:.4f}')
print(f'R-squared: {r**2:.4f}')
print()

# Predict LB for exp_023
exp023_cv = 0.0810
exp023_lb_pred = slope * exp023_cv + intercept
print(f'exp_023 CV: {exp023_cv:.4f}')
print(f'Predicted LB: {exp023_lb_pred:.4f}')
print()
print(f'If the linear model holds, exp_023 would have LB ~{exp023_lb_pred:.3f}')
print('This is WORSE than exp_004 (0.0956)')

=== EXPECTED LB FOR exp_023 ===

Linear model: LB = 1.0234 * CV + 0.0305
R-squared: 0.9883

exp_023 CV: 0.0810
Predicted LB: 0.1134

If the linear model holds, exp_023 would have LB ~0.113
This is WORSE than exp_004 (0.0956)
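Because the fit rests on three points, a quick robustness check is to drop each point in turn and refit. A sketch using `numpy.polyfit`; with two remaining points each refit is simply the line through them.

```python
import numpy as np

# Distinct (CV, LB) pairs used in the regression above
points = {
    'exp_004': (0.0623, 0.0956),
    'exp_006': (0.0688, 0.0991),
    'exp_021': (0.0901, 0.1231),
}
exp023_cv = 0.0810
best_lb = 0.0956

loo_preds = {}
for dropped in points:
    kept = [v for k, v in points.items() if k != dropped]
    cv = np.array([p[0] for p in kept])
    lb = np.array([p[1] for p in kept])
    # polyfit returns coefficients highest power first: [slope, intercept]
    slope, intercept = np.polyfit(cv, lb, deg=1)
    loo_preds[dropped] = slope * exp023_cv + intercept
    print(f'drop {dropped}: predicted exp_023 LB = {loo_preds[dropped]:.4f}')

print('All refits worse than best LB:', all(p > best_lb for p in loo_preds.values()))
```

Across these refits the predicted exp_023 LB ranges from about 0.106 to 0.114, all above exp_004's 0.0956, so the conclusion does not hinge on any single submission. It does not test the TTA hypothesis itself: if removing TTA changes how CV maps to LB, no fit on past submissions can reveal that.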


In [4]:
# Strategic decision analysis
print('=== STRATEGIC DECISION ===')
print()
print('Option A: Submit exp_023 (no TTA)')
print('  - CV: 0.0810 (30% worse than exp_004)')
print('  - Expected LB: ~0.113 (based on linear model)')
print('  - Risk: High - likely worse than current best')
print('  - Potential: If TTA is overfitting, could be better')
print()
print('Option B: Keep exp_004 as final submission')
print('  - CV: 0.0623 (best CV)')
print('  - LB: 0.0956 (best LB)')
print('  - Risk: None - already submitted')
print('  - Potential: None - already at 0.0956')
print()
print('Option C: Try a fundamentally different approach')
print('  - GNN with pre-training (not yet tried)')
print('  - Transfer learning from larger datasets')
print('  - Risk: High - may not work')
print('  - Potential: High - could break the 0.09 barrier')
print()
print('RECOMMENDATION: Option C - try GNN or transfer learning')
print('The target (0.01727) is 5.5x better than best LB.')
print('Incremental improvements on tree-based models will NOT reach it.')

=== STRATEGIC DECISION ===

Option A: Submit exp_023 (no TTA)
  - CV: 0.0810 (30% worse than exp_004)
  - Expected LB: ~0.113 (based on linear model)
  - Risk: High - likely worse than current best
  - Potential: If TTA is overfitting, could be better

Option B: Keep exp_004 as final submission
  - CV: 0.0623 (best CV)
  - LB: 0.0956 (best LB)
  - Risk: None - already submitted
  - Potential: None - already at 0.0956

Option C: Try a fundamentally different approach
  - GNN with pre-training (not yet tried)
  - Transfer learning from larger datasets
  - Risk: High - may not work
  - Potential: High - could break the 0.09 barrier

RECOMMENDATION: Option C - try GNN or transfer learning
The target (0.01727) is 5.5x better than best LB.
Incremental improvements on tree-based models will NOT reach it.
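The fitted line from the regression cell gives a quick bound behind the last claim. Taken at face value, LB = 1.0234 * CV + 0.0305 has an intercept above the 0.01727 target, so even a perfect CV of 0 would map to an LB of about 0.0305. A sketch of that arithmetic, assuming the line (fitted on past tree-based submissions) still applies:

```python
# Coefficients printed by the linear fit above (rounded to 4 decimals)
slope, intercept = 1.0234, 0.0305
target_lb = 0.01727
best_lb = 0.0956

# CV that the line says would be needed to hit the target
required_cv = (target_lb - intercept) / slope
# LB the line predicts even for a perfect CV of 0
lb_at_zero_cv = intercept
# Relative LB reduction still needed from the current best, in percent
needed_reduction = (best_lb - target_lb) / best_lb * 100

print(f'Required CV under the linear model: {required_cv:.4f}')
print(f'Predicted LB at CV = 0: {lb_at_zero_cv:.4f}')
print(f'LB reduction still needed: {needed_reduction:.0f}%')
```

Assuming the metric is a non-negative error, a negative required CV is unreachable, so under this model the gap cannot be closed by lowering CV alone. The bound is only as reliable as a three-point fit extrapolated well below its observed CV range.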


In [5]:
# What approaches haven't been tried?
print('=== UNEXPLORED APPROACHES ===')
print()
print('1. Pre-trained molecular representations')
print('   - ChemBERTa embeddings')
print('   - MolBERT embeddings')
print('   - SMILES-based transformers')
print()
print('2. Graph Neural Networks (properly implemented)')
print('   - exp_020 tried GNN but got CV 0.099 (worse than baseline)')
print('   - Need: Pre-training, better architecture, proper hyperparameters')
print()
print('3. Transfer learning')
print('   - Pre-train on larger reaction datasets')
print('   - Fine-tune on Catechol data')
print()
print('4. Physics-informed features')
print('   - Quantum chemical descriptors')
print('   - Solvent-solute interaction features')
print()
print('5. Ensemble of diverse model families')
print('   - Current best uses only tree-based models')
print('   - Need: NN + Tree + GP ensemble')

=== UNEXPLORED APPROACHES ===

1. Pre-trained molecular representations
   - ChemBERTa embeddings
   - MolBERT embeddings
   - SMILES-based transformers

2. Graph Neural Networks (properly implemented)
   - exp_020 tried GNN but got CV 0.099 (worse than baseline)
   - Need: Pre-training, better architecture, proper hyperparameters

3. Transfer learning
   - Pre-train on larger reaction datasets
   - Fine-tune on Catechol data

4. Physics-informed features
   - Quantum chemical descriptors
   - Solvent-solute interaction features

5. Ensemble of diverse model families
   - Current best uses only tree-based models
   - Need: NN + Tree + GP ensemble
