# Loop 51 Strategic Analysis

## Situation Assessment
- **Best CV**: 0.008194 (exp_032: GP 0.15 + MLP 0.55 + LGBM 0.3)
- **Best LB**: 0.0877 (exp_030)
- **Target**: 0.0347 (best LB is 2.53x the target)
- **Submissions remaining**: 3
- **Latest experiment**: ChemBERTa, hybrid CV 0.019444 (137% WORSE) and pure CV 0.033498 (309% WORSE)

## Key Findings from ChemBERTa Experiment
1. Pure ChemBERTa embeddings (768-dim) performed 309% WORSE than best CV
2. Hybrid ChemBERTa + Spange + DRFP + ACS performed 137% WORSE than best CV
3. ChemBERTa embeddings (pre-trained on ZINC) did NOT help in either configuration tested
4. Likely cause (a hypothesis): domain mismatch, since ChemBERTa was pre-trained on drug-like molecules, not solvents
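The "% worse" figures are relative to the best CV, i.e. (CV - best) / best, where lower CV is better. A quick check of both ChemBERTa numbers:

```python
# Relative CV degradation versus the best CV (exp_032); lower CV is better.
BEST_CV = 0.008194

def pct_worse(cv: float, best: float = BEST_CV) -> float:
    """Percent by which `cv` exceeds `best`."""
    return (cv - best) / best * 100

for name, cv in [("Pure ChemBERTa", 0.033498), ("Hybrid ChemBERTa", 0.019444)]:
    print(f"{name}: {pct_worse(cv):.0f}% worse")
# Pure ChemBERTa: 309% worse
# Hybrid ChemBERTa: 137% worse
```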

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Submission history
submissions = [
    {'exp': 'exp_000', 'cv': 0.0111, 'lb': 0.0982},
    {'exp': 'exp_001', 'cv': 0.0123, 'lb': 0.1065},
    {'exp': 'exp_003', 'cv': 0.0105, 'lb': 0.0972},
    {'exp': 'exp_005', 'cv': 0.0104, 'lb': 0.0969},
    {'exp': 'exp_006', 'cv': 0.0097, 'lb': 0.0946},
    {'exp': 'exp_007', 'cv': 0.0093, 'lb': 0.0932},
    {'exp': 'exp_009', 'cv': 0.0092, 'lb': 0.0936},
    {'exp': 'exp_012', 'cv': 0.0090, 'lb': 0.0913},
    {'exp': 'exp_024', 'cv': 0.0087, 'lb': 0.0893},
    {'exp': 'exp_026', 'cv': 0.0085, 'lb': 0.0887},
    {'exp': 'exp_030', 'cv': 0.0083, 'lb': 0.0877},
    {'exp': 'exp_041', 'cv': 0.0090, 'lb': 0.0932},
    {'exp': 'exp_042', 'cv': 0.0145, 'lb': 0.1147},
]

df = pd.DataFrame(submissions)
print('Submission History:')
print(df.to_string(index=False))

Submission History:
    exp     cv     lb
exp_000 0.0111 0.0982
exp_001 0.0123 0.1065
exp_003 0.0105 0.0972
exp_005 0.0104 0.0969
exp_006 0.0097 0.0946
exp_007 0.0093 0.0932
exp_009 0.0092 0.0936
exp_012 0.0090 0.0913
exp_024 0.0087 0.0893
exp_026 0.0085 0.0887
exp_030 0.0083 0.0877
exp_041 0.0090 0.0932
exp_042 0.0145 0.1147


In [2]:
# Fit linear regression to CV-LB relationship
from sklearn.linear_model import LinearRegression

X = df['cv'].values.reshape(-1, 1)
y = df['lb'].values

model = LinearRegression()
model.fit(X, y)

print(f'\nCV-LB Relationship:')
print(f'LB = {model.coef_[0]:.2f} * CV + {model.intercept_:.4f}')
print(f'R² = {model.score(X, y):.4f}')

# Predict LB for best CV (0.008194)
best_cv = 0.008194
predicted_lb = model.predict([[best_cv]])[0]
print(f'\nBest CV (0.008194) → Predicted LB: {predicted_lb:.4f}')

# What CV would we need to reach target 0.0347?
target_lb = 0.0347
required_cv = (target_lb - model.intercept_) / model.coef_[0]
print(f'\nTo reach target LB {target_lb}:')
print(f'Required CV: {required_cv:.6f}')
if required_cv < 0:
    print('WARNING: Required CV is NEGATIVE - target is BELOW the intercept!')


CV-LB Relationship:
LB = 4.23 * CV + 0.0533
R² = 0.9807

Best CV (0.008194) → Predicted LB: 0.0880

To reach target LB 0.0347:
Required CV: -0.004396
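
The intercept comes from only 13 submissions whose CVs lie between 0.0083 and 0.0145, so CV=0 is far outside the observed range. Before treating 0.0533 as a hard floor it is worth checking how stable the estimate is. A minimal bootstrap sketch (numpy only; resampling submissions with replacement is an assumed sanity check, not part of the original analysis, and 13 points give only a rough interval):

```python
import numpy as np

# Submission history (CV, LB) copied from the table above.
cv = np.array([0.0111, 0.0123, 0.0105, 0.0104, 0.0097, 0.0093, 0.0092,
               0.0090, 0.0087, 0.0085, 0.0083, 0.0090, 0.0145])
lb = np.array([0.0982, 0.1065, 0.0972, 0.0969, 0.0946, 0.0932, 0.0936,
               0.0913, 0.0893, 0.0887, 0.0877, 0.0932, 0.1147])

def bootstrap_intercepts(x, y, n_boot=2000, seed=0):
    """Refit LB = a * CV + b on resampled submissions and collect the intercepts b."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))
        if np.ptp(x[idx]) == 0:  # all resampled CVs identical: slope undefined, skip
            continue
        _, intercept = np.polyfit(x[idx], y[idx], deg=1)  # returns [slope, intercept]
        out.append(intercept)
    return np.array(out)

b = bootstrap_intercepts(cv, lb)
lo, hi = np.percentile(b, [2.5, 97.5])
print(f"intercept 95% bootstrap interval: [{lo:.4f}, {hi:.4f}]")
print(f"share of resamples with intercept <= 0.0347: {(b <= 0.0347).mean():.3f}")
```

If the whole interval sits above 0.0347, the conclusion below does not hinge on one or two submissions.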


In [3]:
# ChemBERTa experiment results
print('\n=== ChemBERTa Experiment Results ===')
print('Pure ChemBERTa: CV 0.033498 (309% WORSE than best CV 0.008194)')
print('Hybrid ChemBERTa: CV 0.019444 (137% WORSE than best CV 0.008194)')
print()
print('CONCLUSION: ChemBERTa embeddings did not help in either configuration')
print('Likely reasons (hypotheses):')
print('1. ChemBERTa trained on ZINC (drug-like molecules), not solvents')
print('2. 768-dim embeddings may not capture solvent-specific properties')
print('3. Spange descriptors (polarity, H-bonding) appear more informative for this task')


=== ChemBERTa Experiment Results ===
Pure ChemBERTa: CV 0.033498 (309% WORSE than best CV 0.008194)
Hybrid ChemBERTa: CV 0.019444 (137% WORSE than best CV 0.008194)

CONCLUSION: ChemBERTa embeddings did not help in either configuration
Likely reasons (hypotheses):
1. ChemBERTa trained on ZINC (drug-like molecules), not solvents
2. 768-dim embeddings may not capture solvent-specific properties
3. Spange descriptors (polarity, H-bonding) appear more informative for this task


In [4]:
# What approaches have been tried?
print('\n=== APPROACHES TRIED ===')
approaches = [
    ('MLP', 'exp_000', 0.0111, 'Baseline'),
    ('LightGBM', 'exp_001', 0.0123, 'Baseline'),
    ('DRFP + PCA', 'exp_002', 0.0169, 'WORSE'),
    ('Spange + DRFP', 'exp_003', 0.0105, 'Better'),
    ('Deep Residual MLP', 'exp_004', 0.0519, 'FAILED'),
    ('Large Ensemble (15)', 'exp_005', 0.0104, 'Marginal'),
    ('Simpler MLP [64,32]', 'exp_006', 0.0097, 'Better'),
    ('Even Simpler [32,16]', 'exp_008', 0.0093, 'Better'),
    ('Ridge Regression', 'exp_009', 0.0092, 'Better'),
    ('Single Layer [16]', 'exp_010', 0.0091, 'Better'),
    ('Diverse Ensemble', 'exp_011', 0.0091, 'Similar'),
    ('Simple Ensemble', 'exp_012', 0.0090, 'Better'),
    ('ACS PCA features', 'exp_024', 0.0087, 'Better'),
    ('Weighted Loss', 'exp_026', 0.0085, 'Better'),
    ('GP + MLP + LGBM', 'exp_030', 0.0083, 'BEST LB'),
    ('Pure GP', 'exp_032', 0.0082, 'BEST CV'),
    ('Aggressive Regularization', 'exp_041', 0.0090, 'No help'),
    ('Pure GP (different)', 'exp_042', 0.0145, 'WORSE'),
    ('GNN (Hybrid)', 'exp_051', 0.0141, 'WORSE'),
    ('ChemBERTa Pure', 'exp_050', 0.0335, 'MUCH WORSE'),
    ('ChemBERTa Hybrid', 'exp_052', 0.0194, 'WORSE'),
]

for approach, exp, cv, result in approaches:
    print(f'{approach:25s} | {exp} | CV {cv:.4f} | {result}')


=== APPROACHES TRIED ===
MLP                       | exp_000 | CV 0.0111 | Baseline
LightGBM                  | exp_001 | CV 0.0123 | Baseline
DRFP + PCA                | exp_002 | CV 0.0169 | WORSE
Spange + DRFP             | exp_003 | CV 0.0105 | Better
Deep Residual MLP         | exp_004 | CV 0.0519 | FAILED
Large Ensemble (15)       | exp_005 | CV 0.0104 | Marginal
Simpler MLP [64,32]       | exp_006 | CV 0.0097 | Better
Even Simpler [32,16]      | exp_008 | CV 0.0093 | Better
Ridge Regression          | exp_009 | CV 0.0092 | Better
Single Layer [16]         | exp_010 | CV 0.0091 | Better
Diverse Ensemble          | exp_011 | CV 0.0091 | Similar
Simple Ensemble           | exp_012 | CV 0.0090 | Better
ACS PCA features          | exp_024 | CV 0.0087 | Better
Weighted Loss             | exp_026 | CV 0.0085 | Better
GP + MLP + LGBM           | exp_030 | CV 0.0083 | BEST LB
Pure GP                   | exp_032 | CV 0.0082 | BEST CV
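Aggressive Regularization | exp_041 | CV 0.0090 | No help
Pure GP (different)       | exp_042 | CV 0.0145 | WORSE
GNN (Hybrid)              | exp_051 | CV 0.0141 | WORSE
ChemBERTa Pure            | exp_050 | CV 0.0335 | MUCH WORSE
ChemBERTa Hybrid          | exp_052 | CV 0.0194 | WORSE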

In [5]:
# What HASN'T been tried?
print('\n=== APPROACHES NOT YET TRIED ===')
print()
print('1. DIFFERENT EVALUATION WEIGHTING')
print('   - Server may weight tasks/targets differently')
print('   - Try optimizing for specific targets (SM, Product 2, Product 3)')
print('   - Try different weights for single vs full data')
print()
print('2. PER-SOLVENT-TYPE MODELS')
print('   - Different models for alcohols, esters, ethers, etc.')
print('   - May reduce CV-LB gap if different solvent types behave differently')
print()
print('3. REACTION-AWARE FEATURES')
print('   - Current features only describe solvents')
print('   - Add features about the reaction itself')
print('   - Catechol oxidation mechanism may be solvent-dependent')
print()
print('4. TEMPERATURE-SOLVENT INTERACTIONS')
print('   - Add explicit interaction terms between T and solvent properties')
print('   - Different solvents may have different T-sensitivity')
print()
print('5. MIXTURE-SPECIFIC FEATURES')
print('   - Non-linear mixing effects')
print('   - Solvent-solvent interaction terms')
print('   - Excess properties (deviation from ideal mixing)')


=== APPROACHES NOT YET TRIED ===

1. DIFFERENT EVALUATION WEIGHTING
   - Server may weight tasks/targets differently
   - Try optimizing for specific targets (SM, Product 2, Product 3)
   - Try different weights for single vs full data

2. PER-SOLVENT-TYPE MODELS
   - Different models for alcohols, esters, ethers, etc.
   - May reduce CV-LB gap if different solvent types behave differently

3. REACTION-AWARE FEATURES
   - Current features only describe solvents
   - Add features about the reaction itself
   - Catechol oxidation mechanism may be solvent-dependent

4. TEMPERATURE-SOLVENT INTERACTIONS
   - Add explicit interaction terms between T and solvent properties
   - Different solvents may have different T-sensitivity

5. MIXTURE-SPECIFIC FEATURES
   - Non-linear mixing effects
   - Solvent-solvent interaction terms
   - Excess properties (deviation from ideal mixing)
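
Items 4 and 5 can be prototyped as extra feature columns. A minimal sketch, assuming a hypothetical schema in which each descriptor `d` is stored as `d_a` / `d_b` for solvents A and B, `frac_b` is the fraction of solvent B, and `T` is temperature. The excess column borrows the regular-solution form x(1 - x)(Δd)², applied here heuristically to any descriptor:

```python
import pandas as pd

def add_mixture_features(df, descriptors, temp_col="T", frac_col="frac_b"):
    """Add ideal-mix, temperature-interaction and excess-style columns per descriptor.

    Hypothetical schema: descriptor d lives in columns f"{d}_a" and f"{d}_b"
    (solvents A and B); frac_col is the fraction of solvent B in [0, 1].
    """
    out = df.copy()
    x = out[frac_col]
    t = out[temp_col]
    for d in descriptors:
        va, vb = out[f"{d}_a"], out[f"{d}_b"]
        mix = (1 - x) * va + x * vb
        out[f"{d}_mix"] = mix                             # ideal (linear) mixing
        out[f"{d}_mix_x_T"] = mix * t                     # temperature x solvent interaction
        out[f"{d}_excess"] = x * (1 - x) * (va - vb) ** 2  # regular-solution-style non-ideality
    return out

demo = pd.DataFrame({"T": [25.0, 60.0], "frac_b": [0.0, 0.5],
                     "polarity_a": [0.8, 0.8], "polarity_b": [0.2, 0.2]})
feats = add_mixture_features(demo, ["polarity"])
print(feats[["polarity_mix", "polarity_mix_x_T", "polarity_excess"]])
```

Single-solvent rows need `frac_b = 0` and finite `_b` values (e.g. a copy of the A descriptors); otherwise the products become NaN. Standardizing `T` and the descriptors before forming products keeps the interaction columns on a comparable scale.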


In [6]:
# Strategic analysis
print('\n=== STRATEGIC ANALYSIS ===')
print()
print('CRITICAL INSIGHT:')
print('CV-LB relationship: LB = 4.23 * CV + 0.0533')
print('Intercept (0.0533) > Target (0.0347)')
print('Even at CV=0 the fit predicts LB = 0.0533 > 0.0347')
print('(CV=0 is an extrapolation: submitted CVs span 0.0083-0.0145)')
print()
print('This means:')
print('1. If the linear trend holds, improving CV alone cannot reach the target')
print('2. We need to CHANGE the CV-LB relationship')
print('3. Under this fit, the intercept acts as a "baseline error" on test data')
print()
print('Possible causes of the high intercept:')
print('1. Distribution shift: test solvents may be chemically different')
print('2. Evaluation scheme: server may weight tasks/targets differently')
print('3. Overfitting: models fit the CV folds but do not generalize to test')
print('4. Missing information: features don\'t capture what matters for test')


=== STRATEGIC ANALYSIS ===

CRITICAL INSIGHT:
CV-LB relationship: LB = 4.23 * CV + 0.0533
Intercept (0.0533) > Target (0.0347)
Even at CV=0 the fit predicts LB = 0.0533 > 0.0347
(CV=0 is an extrapolation: submitted CVs span 0.0083-0.0145)

This means:
1. If the linear trend holds, improving CV alone cannot reach the target
2. We need to CHANGE the CV-LB relationship
3. Under this fit, the intercept acts as a "baseline error" on test data

Possible causes of the high intercept:
1. Distribution shift: test solvents may be chemically different
2. Evaluation scheme: server may weight tasks/targets differently
3. Overfitting: models fit the CV folds but do not generalize to test
4. Missing information: features don't capture what matters for test
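
How much the relationship would have to change can be read off the fitted line. A small arithmetic sketch using the slope, intercept and target quoted above (holding one coefficient fixed at a time is a simplifying assumption):

```python
# Fitted CV-LB line from the submission history: LB = slope * CV + intercept.
slope, intercept = 4.23, 0.0533
target_lb, best_cv = 0.0347, 0.008194

# Slope held fixed: intercept required to hit the target at today's best CV.
needed_intercept = target_lb - slope * best_cv
# Intercept held fixed: slope required at today's best CV.
# A negative value means no positive-slope relationship can reach the target.
needed_slope = (target_lb - intercept) / best_cv

print(f"intercept needed at slope {slope}: {needed_intercept:.5f}")
print(f"slope needed at intercept {intercept}: {needed_slope:.2f}")
```

With the slope unchanged, the intercept would have to fall to roughly 0.00004 (essentially zero) at the current best CV, which is why the plan focuses on changing the relationship rather than polishing CV.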


In [7]:
# Final recommendation
print('\n=== FINAL RECOMMENDATION ===')
print()
print('With 3 submissions remaining and target 2.53x away:')
print()
print('PRIORITY 1: Try approaches that could CHANGE the CV-LB relationship')
print('  - Per-target optimization (different models for SM vs Products)')
print('  - Per-solvent-type models')
print('  - Simpler, more generalizable features')
print()
print('PRIORITY 2: Investigate the evaluation scheme')
print('  - What if server weights single solvent data more heavily?')
print('  - What if server weights certain targets more heavily?')
print()
print('PRIORITY 3: Save at least 1 submission for final model')
print('  - Don\'t waste submissions on approaches that won\'t help')
print('  - Focus on understanding the gap before submitting')
print()
print('DO NOT SUBMIT:')
print('  - ChemBERTa models (CV is much worse)')
print('  - GNN models (CV is worse)')
print('  - Any model with CV > 0.010')


=== FINAL RECOMMENDATION ===

With 3 submissions remaining and target 2.53x away:

PRIORITY 1: Try approaches that could CHANGE the CV-LB relationship
  - Per-target optimization (different models for SM vs Products)
  - Per-solvent-type models
  - Simpler, more generalizable features

PRIORITY 2: Investigate the evaluation scheme
  - What if server weights single solvent data more heavily?
  - What if server weights certain targets more heavily?

PRIORITY 3: Save at least 1 submission for final model
  - Don't waste submissions on approaches that won't help
  - Focus on understanding the gap before submitting

DO NOT SUBMIT:
  - ChemBERTa models (CV is much worse)
  - GNN models (CV is worse)
  - Any model with CV > 0.010
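
Priority 1's per-target idea can be explored offline, without spending a submission, by comparing out-of-fold predictions target by target. A minimal sketch; the dict layout, the model names and plain MSE as the selection metric are all assumptions (the server's metric and weighting are unknown, which is what Priority 2 is about):

```python
import numpy as np

def pick_model_per_target(oof_preds, y_true, targets):
    """Return {target: model_name}, choosing the lowest out-of-fold MSE per target.

    oof_preds: dict mapping a model name to an (n_samples, n_targets) array of
               out-of-fold predictions (hypothetical layout; all models must share
               the same CV splits for the comparison to be fair).
    y_true:    (n_samples, n_targets) array of observed values.
    """
    choice = {}
    for j, target in enumerate(targets):
        mse = {name: float(np.mean((pred[:, j] - y_true[:, j]) ** 2))
               for name, pred in oof_preds.items()}
        choice[target] = min(mse, key=mse.get)
    return choice

# Toy example: "gp" is exact on the first target, "mlp" on the other two.
y = np.array([[0.1, 0.2, 0.3], [0.2, 0.3, 0.4]])
oof = {"gp": y + np.array([0.0, 0.1, 0.1]), "mlp": y + np.array([0.1, 0.0, 0.0])}
print(pick_model_per_target(oof, y, ["SM", "Product 2", "Product 3"]))
# {'SM': 'gp', 'Product 2': 'mlp', 'Product 3': 'mlp'}
```

Selecting per target on the same folds that produced the CV scores is itself a mild form of overfitting, so a gain here is worth confirming on a held-out split before it earns a submission.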
