# Loop 15 Analysis: Why exp_015 Full Data Performance Degraded

## Key Question
exp_015 scored better than exp_004 on single solvent (0.0638 vs 0.0659; lower is better) but much worse on full data (0.1027 vs 0.0603).

What's different between exp_004 and exp_015?

In [1]:
import pandas as pd
import numpy as np

# Key differences between exp_004 and exp_015:
print("=== exp_004 (Best CV 0.0623, LB 0.0956) ===")
print("HGB: max_depth=7, max_iter=700, learning_rate=0.04")
print("ETR: n_estimators=500, max_depth=10, min_samples_leaf=2")
print("Features: 0.8*ACS_PCA + 0.2*Spange")
print("Arrhenius: YES (1/T, ln(t), t*T interaction)")
print("MLP: NO")
print("TTA: NO")
print()
print("=== exp_015 (CV 0.0891) ===")
print("HGB: max_depth=None (unlimited)")
print("ETR: max_depth=None (unlimited)")
print("Features: 0.8*ACS_PCA + 0.2*Spange")
print("Arrhenius: NO (MISSING!)")
print("MLP: YES [128, 64, 32]")
print("TTA: NO")
print()
print("=== KEY DIFFERENCES ===")
print("1. exp_015 uses UNLIMITED depth -> overfitting on full data")
print("2. exp_015 is MISSING Arrhenius kinetics features")
print("3. exp_015 adds MLP component")

=== exp_004 (Best CV 0.0623, LB 0.0956) ===
HGB: max_depth=7, max_iter=700, learning_rate=0.04
ETR: n_estimators=500, max_depth=10, min_samples_leaf=2
Features: 0.8*ACS_PCA + 0.2*Spange
Arrhenius: YES (1/T, ln(t), t*T interaction)
MLP: NO
TTA: NO

=== exp_015 (CV 0.0891) ===
HGB: max_depth=None (unlimited)
ETR: max_depth=None (unlimited)
Features: 0.8*ACS_PCA + 0.2*Spange
Arrhenius: NO (MISSING!)
MLP: YES [128, 64, 32]
TTA: NO

=== KEY DIFFERENCES ===
1. exp_015 uses UNLIMITED depth -> overfitting on full data
2. exp_015 is MISSING Arrhenius kinetics features
3. exp_015 adds MLP component
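
The Arrhenius features listed for exp_004 (1/T, ln(t), t*T interaction) can be sketched as follows. This is a minimal illustration, not the exp_004 implementation: the helper name `add_arrhenius_features`, the output column names, the assumption that `Temperature` is stored in degrees Celsius, and the demo values are all hypothetical.

```python
import numpy as np
import pandas as pd

def add_arrhenius_features(df, temp_col="Temperature", time_col="Residence Time",
                           temp_in_celsius=True):
    """Return a copy of df with 1/T, ln(t) and t*T columns appended.

    Assumption: temperatures are in Celsius unless temp_in_celsius=False,
    and residence times are strictly positive (needed for the log).
    """
    out = df.copy()
    # Arrhenius-style terms are defined on absolute temperature (Kelvin).
    temp_k = out[temp_col] + 273.15 if temp_in_celsius else out[temp_col]
    out["inv_T"] = 1.0 / temp_k               # 1/T
    out["ln_t"] = np.log(out[time_col])       # ln(t)
    out["t_x_T"] = out[time_col] * temp_k     # t*T interaction
    return out

# Made-up rows, only to show the shape of the result.
demo = pd.DataFrame({"Temperature": [175.0, 225.0], "Residence Time": [2.0, 15.0]})
features = add_arrhenius_features(demo)
print(features)
```

Whether the interaction term should use Kelvin or the raw temperature column is a design choice this summary does not settle; the sketch uses Kelvin for consistency with 1/T.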


In [2]:
# Load data to understand the problem
DATA_PATH = '/home/data'

df_single = pd.read_csv(f'{DATA_PATH}/catechol_single_solvent_yields.csv')
df_full = pd.read_csv(f'{DATA_PATH}/catechol_full_data_yields.csv')

print(f"Single solvent: {len(df_single)} samples, {df_single['SOLVENT NAME'].nunique()} solvents")
print(f"Full data: {len(df_full)} samples, {df_full[['SOLVENT A NAME', 'SOLVENT B NAME']].drop_duplicates().shape[0]} ramps")

# Check if full data has more complexity
print("\nSingle solvent features: Residence Time, Temperature, SOLVENT NAME")
print("Full data features: Residence Time, Temperature, SOLVENT A NAME, SOLVENT B NAME, SolventB%")
print("\nFull data has 2 solvents + mixing ratio -> more complex!")

Single solvent: 656 samples, 24 solvents
Full data: 1227 samples, 13 ramps

Single solvent features: Residence Time, Temperature, SOLVENT NAME
Full data features: Residence Time, Temperature, SOLVENT A NAME, SOLVENT B NAME, SolventB%

Full data has 2 solvents + mixing ratio -> more complex!


In [3]:
# Hypothesis: Deep models overfit on full data because:
# 1. More features (2 solvents + mixing ratio)
# 2. Fewer distinct groups to learn from (13 ramps vs 24 solvents)
# 3. Non-linear mixing effects are hard to generalize

# Check samples per ramp
ramps = df_full.groupby(['SOLVENT A NAME', 'SOLVENT B NAME']).size()
print(f"Samples per ramp: {ramps.describe()}")
print(f"\nMin samples: {ramps.min()}, Max samples: {ramps.max()}")

# Check samples per solvent
solvents = df_single.groupby('SOLVENT NAME').size()
print(f"\nSamples per solvent: {solvents.describe()}")
print(f"Min samples: {solvents.min()}, Max samples: {solvents.max()}")

Samples per ramp: count     13.000000
mean      94.384615
std       41.552253
min       34.000000
25%       36.000000
50%      122.000000
75%      125.000000
max      127.000000
dtype: float64

Min samples: 34, Max samples: 127

Samples per solvent: count    24.000000
mean     27.333333
std      13.528285
min       5.000000
25%      18.000000
50%      22.000000
75%      37.000000
max      59.000000
dtype: float64
Min samples: 5, Max samples: 59
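
Since the hypothesis is about generalizing to ramps the model has not seen, one way to probe it locally is to hold out one whole ramp at a time. The sketch below is a plain-Python illustration of that split; the function name `leave_one_group_out` and the toy solvent labels are hypothetical, and it is meant for local diagnostics only, not as the competition's official split.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx) for each distinct group.

    `groups` holds one hashable label per row, e.g. a
    (solvent A, solvent B) tuple identifying the ramp.
    """
    for held_out in sorted(set(groups)):
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx

# Toy labels standing in for (SOLVENT A NAME, SOLVENT B NAME) pairs.
groups = [("A", "B"), ("A", "B"), ("A", "C"), ("D", "E"), ("D", "E"), ("D", "E")]

splits = list(leave_one_group_out(groups))
for held_out, train_idx, test_idx in splits:
    print(f"held out {held_out}: {len(train_idx)} train rows, {len(test_idx)} test rows")
```

With 13 ramps of 34 to 127 samples each, every fold removes a sizeable and uneven share of the data, so per-fold scores are worth inspecting individually rather than only as an average.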


In [4]:
# CRITICAL INSIGHT:
# Full data averages ~94 samples per ramp; single solvent averages ~27 per solvent
# But full data has MORE complexity (2 solvents + mixing ratio) and only 13 ramps
# Deep models can memorize the training patterns but fail on unseen ramps

# SOLUTION:
# 1. Use SHALLOWER models for full data (like exp_004: depth=7/10)
# 2. Add Arrhenius kinetics features (1/T, ln(t), interaction)
# 3. Maybe REMOVE MLP for full data (it may be overfitting)

print("=== RECOMMENDED NEXT EXPERIMENT ===")
print("1. Use exp_004's exact hyperparameters (depth=7/10)")
print("2. Add Arrhenius kinetics features (MISSING in exp_015!)")
print("3. Try WITHOUT MLP for full data")
print("4. Keep MLP for single solvent (where it works well)")
print()
print("=== ALTERNATIVE: SEPARATE HYPERPARAMETERS ===")
print("Single solvent: Deep models (depth=None) + MLP")
print("Full data: Shallow models (depth=7/10) + NO MLP")
print("This is ALLOWED per competition rules!")

=== RECOMMENDED NEXT EXPERIMENT ===
1. Use exp_004's exact hyperparameters (depth=7/10)
2. Add Arrhenius kinetics features (MISSING in exp_015!)
3. Try WITHOUT MLP for full data
4. Keep MLP for single solvent (where it works well)

=== ALTERNATIVE: SEPARATE HYPERPARAMETERS ===
Single solvent: Deep models (depth=None) + MLP
Full data: Shallow models (depth=7/10) + NO MLP
This is ALLOWED per competition rules!
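
The "separate hyperparameters" alternative can be written down as a per-task configuration. The values below are copied from the exp_004 and exp_015 summaries; the dictionary layout, the key names, and the `config_for` helper are hypothetical and only illustrate the idea.

```python
# Shallow settings, as reported for exp_004.
SHALLOW = {
    "hgb": {"max_depth": 7, "max_iter": 700, "learning_rate": 0.04},
    "etr": {"n_estimators": 500, "max_depth": 10, "min_samples_leaf": 2},
    "mlp_hidden": None,          # no MLP component
    "arrhenius": True,           # exp_004 used Arrhenius features
}

# Deep settings, as reported for exp_015.
DEEP = {
    "hgb": {"max_depth": None},  # unlimited depth
    "etr": {"max_depth": None},  # unlimited depth
    "mlp_hidden": (128, 64, 32),
    "arrhenius": False,          # exp_015 as run; flip to True to test the fix
}

TASK_CONFIG = {"single": DEEP, "full": SHALLOW}

def config_for(task):
    """Return the hyperparameter set for 'single' or 'full'."""
    try:
        return TASK_CONFIG[task]
    except KeyError:
        raise ValueError(f"unknown task {task!r}; expected 'single' or 'full'") from None

for task in ("single", "full"):
    cfg = config_for(task)
    print(task, "-> HGB", cfg["hgb"], "| ETR", cfg["etr"], "| MLP", cfg["mlp_hidden"])
```

Assuming HGB, ETR and MLP refer to scikit-learn's `HistGradientBoostingRegressor`, `ExtraTreesRegressor` and `MLPRegressor`, the `hgb` and `etr` entries could be unpacked directly as keyword arguments, and `mlp_hidden` passed as `hidden_layer_sizes`.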


In [5]:
# Let's also check the CV-LB gap pattern
print("=== CV-LB GAP ANALYSIS ===")
print()
print("exp_004: CV 0.0623 -> LB 0.0956 (53% gap)")
print("exp_006: CV 0.0688 -> LB 0.0991 (44% gap)")
print("exp_011: CV 0.0844 -> LB FAILED (GroupKFold broke submission)")
print()
print("Pattern: higher CV -> higher LB (only two scored submissions, so tentative)")
print("The gap is ~50%, which is plausible when test solvents are unseen")
print()
print("exp_015: CV 0.0891 -> Expected LB ~0.13 (if 50% gap)")
print("This is WORSE than exp_004's LB 0.0956")
print()
print("CONCLUSION: exp_015 should NOT be submitted")
print("Need to improve CV first, then submit")

=== CV-LB GAP ANALYSIS ===

exp_004: CV 0.0623 -> LB 0.0956 (53% gap)
exp_006: CV 0.0688 -> LB 0.0991 (44% gap)
exp_011: CV 0.0844 -> LB FAILED (GroupKFold broke submission)

Pattern: higher CV -> higher LB (only two scored submissions, so tentative)
The gap is ~50%, which is plausible when test solvents are unseen

exp_015: CV 0.0891 -> Expected LB ~0.13 (if 50% gap)
This is WORSE than exp_004's LB 0.0956

CONCLUSION: exp_015 should NOT be submitted
Need to improve CV first, then submit
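
The gap percentages and the projected LB above can be reproduced with a few lines of arithmetic. This sketch defines the gap as (LB - CV) / CV and projects exp_015 using the mean gap of the two scored submissions; the function names are hypothetical, and a projection from two points is a rough guide at best.

```python
# (CV, LB) pairs for the two submissions that received a leaderboard score.
submissions = {
    "exp_004": (0.0623, 0.0956),
    "exp_006": (0.0688, 0.0991),
}

def relative_gap(cv, lb):
    """Relative increase from CV to LB, e.g. 0.5 means LB is 50% higher."""
    return (lb - cv) / cv

def project_lb(cv, gap):
    """Project a leaderboard score from a CV score and a relative gap."""
    return cv * (1.0 + gap)

gaps = {name: relative_gap(cv, lb) for name, (cv, lb) in submissions.items()}
mean_gap = sum(gaps.values()) / len(gaps)
projected = project_lb(0.0891, mean_gap)   # exp_015 CV

for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.0%}")
print(f"mean gap = {mean_gap:.0%}, projected exp_015 LB = {projected:.3f}")
```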


In [6]:
# FINAL RECOMMENDATION:
print("=== FINAL RECOMMENDATION FOR NEXT EXPERIMENT ===")
print()
print("OPTION A: Fix exp_015 by adding Arrhenius features + shallow models")
print("- Add Arrhenius kinetics: 1/T, ln(t), t*T interaction")
print("- Use depth=7 for HGB, depth=10 for ETR (like exp_004)")
print("- Keep MLP for single solvent only")
print("- Expected CV: ~0.0623 (matching exp_004)")
print()
print("OPTION B: Separate hyperparameters for single vs full")
print("- Single: Deep models + MLP (current exp_015 approach)")
print("- Full: Shallow models + NO MLP (exp_004 approach)")
print("- This is ALLOWED per competition rules")
print()
print("OPTION C: Ensemble exp_004 + exp_015")
print("- exp_004 is better for full data (0.0603)")
print("- exp_015 is better for single solvent (0.0638)")
print("- Ensemble: Use exp_015 for single, exp_004 for full")
print("- Expected CV: 0.0638 * 0.35 + 0.0603 * 0.65 = 0.0615")
print()
print("RECOMMENDED: OPTION C (Ensemble) - quickest path to improvement")

=== FINAL RECOMMENDATION FOR NEXT EXPERIMENT ===

OPTION A: Fix exp_015 by adding Arrhenius features + shallow models
- Add Arrhenius kinetics: 1/T, ln(t), t*T interaction
- Use depth=7 for HGB, depth=10 for ETR (like exp_004)
- Keep MLP for single solvent only
- Expected CV: ~0.0623 (matching exp_004)

OPTION B: Separate hyperparameters for single vs full
- Single: Deep models + MLP (current exp_015 approach)
- Full: Shallow models + NO MLP (exp_004 approach)
- This is ALLOWED per competition rules

OPTION C: Ensemble exp_004 + exp_015
- exp_004 is better for full data (0.0603)
- exp_015 is better for single solvent (0.0638)
- Ensemble: Use exp_015 for single, exp_004 for full
- Expected CV: 0.0638 * 0.35 + 0.0603 * 0.65 = 0.0615

RECOMMENDED: OPTION C (Ensemble) - quickest path to improvement
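
The 0.35 / 0.65 weights in Option C are consistent with weighting each task by its sample count (656 single-solvent rows and 1227 full-data rows). The sketch below checks that this weighting reproduces the reported overall CV for exp_004 and exp_015 and the Option C estimate. That the overall CV is a sample-weighted mean is an assumption inferred from these numbers, and `combined_cv` is a hypothetical helper.

```python
N_SINGLE = 656    # rows in the single-solvent file
N_FULL = 1227     # rows in the full-data file

def combined_cv(single_score, full_score, n_single=N_SINGLE, n_full=N_FULL):
    """Sample-weighted mean of the per-task scores."""
    total = n_single + n_full
    return (single_score * n_single + full_score * n_full) / total

scores = {
    "exp_004": combined_cv(0.0659, 0.0603),    # reported overall CV: 0.0623
    "exp_015": combined_cv(0.0638, 0.1027),    # reported overall CV: 0.0891
    "option_c": combined_cv(0.0638, 0.0603),   # exp_015 single + exp_004 full
}

for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

The Option C gain over exp_004 is small in absolute terms because the single-solvent task carries only about 35% of the weight.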
