# Loop 63 Analysis - Final Push

## Situation
- Best LB: 0.0873 (exp_032)
- Target: 0.0702
- Gap: 24.3% (best LB relative to target)
- 4 submissions remaining

## Key Insights from Kernels
1. matthewmaree: CatBoost + XGBoost ensemble with correlation-filtered features
2. mixall: MLP + XGB + RF + LGBM ensemble
3. Both use multiple feature sources with filtering

In [1]:
import pandas as pd
import numpy as np
import json

# Load session state to see all experiments
with open('/home/code/session_state.json') as f:
    state = json.load(f)

# Get submission history
submissions = state.get('submissions', [])
print('Submission History:')
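for s in submissions:
    # Format each score only when it is present; applying ':.4f' to the 'N/A' fallback would raise
    cv, lb = s.get('cv_score'), s.get('lb_score')
    cv_txt = f'{cv:.4f}' if cv is not None else 'N/A'
    lb_txt = f'{lb:.4f}' if lb is not None else 'N/A'
    print(f"  {s.get('experiment_id', 'N/A')}: CV={cv_txt}, LB={lb_txt}")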

# Calculate CV-LB relationship
# Keep only submissions that have both scores so the two lists stay aligned
scored = [s for s in submissions if s.get('cv_score') and s.get('lb_score')]
cv_scores = [s['cv_score'] for s in scored]
lb_scores = [s['lb_score'] for s in scored]

if len(cv_scores) >= 2:
    from scipy import stats
    slope, intercept, r_value, p_value, std_err = stats.linregress(cv_scores, lb_scores)
    print(f'\nCV-LB Relationship: LB = {slope:.2f}*CV + {intercept:.4f} (R²={r_value**2:.3f})')
    print(f'To hit target 0.0702: Need CV = {(0.0702 - intercept) / slope:.6f}')

Submission History:
  exp_000: CV=0.0111, LB=0.0982
  exp_001: CV=0.0123, LB=0.1065
  exp_003: CV=0.0105, LB=0.0972
  exp_005: CV=0.0104, LB=0.0969
  exp_006: CV=0.0097, LB=0.0946
  exp_007: CV=0.0093, LB=0.0932
  exp_009: CV=0.0092, LB=0.0936
  exp_012: CV=0.0090, LB=0.0913
  exp_024: CV=0.0087, LB=0.0893
  exp_026: CV=0.0085, LB=0.0887
  exp_030: CV=0.0083, LB=0.0877
  exp_035: CV=0.0098, LB=0.0970
  exp_032: CV=0.0082, LB=0.0873



CV-LB Relationship: LB = 4.34*CV + 0.0523 (R²=0.958)
To hit target 0.0702: Need CV = 0.004136
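
The printed fit can be turned into a quick feasibility check. The sketch below is not part of the original run. It plugs the rounded coefficients from the output above (slope 4.34, intercept 0.0523) into the same inversion, so its result differs slightly from the 0.004136 printed above. The names `required_cv`, `relative_cv_cut`, and `lb_floor` are illustrative.

```python
# Rounded coefficients copied from the printed fit: LB = 4.34*CV + 0.0523
slope, intercept = 4.34, 0.0523
target_lb = 0.0702
best_cv = 0.0082  # exp_032, taken from the submission history above

# Invert the fitted line to get the CV that maps onto the target LB
required_cv = (target_lb - intercept) / slope

# Fraction by which the best CV would have to shrink to reach that value
relative_cv_cut = 1 - required_cv / best_cv

# The LB the line predicts for a perfect CV of 0 is just the intercept
lb_floor = intercept

print(f'Required CV: {required_cv:.4f}')
print(f'CV reduction needed vs best: {relative_cv_cut:.0%}')
print(f'Headroom between LB floor and target: {target_lb - lb_floor:.4f}')
```

Read this way, the intercept accounts for most of the target LB on its own, which is why the required CV ends up far below anything achieved so far.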


In [2]:
# Analyze what worked and what didn't
experiments = state.get('experiments', [])

# Best CV score across all experiments (experiments without a score default to 1.0)
best_cv = min(e.get('score', 1.0) for e in experiments)
print(f'Best CV achieved: {best_cv:.6f}')

# Find experiments with best CV
best_exps = [e for e in experiments if e.get('score', 1.0) < 0.009]
print(f'\nExperiments with CV < 0.009:')
for e in best_exps:
    print(f"  {e.get('id')}: {e.get('name')} - CV={e.get('score', 'N/A'):.6f}")

Best CV achieved: 0.008194

Experiments with CV < 0.009:
  exp_010: Diverse Ensemble: MLP[32,16] + LightGBM + MLP[64,32] - CV=0.008829
  exp_011: Simple Ensemble: [32,16] MLP + LightGBM Only - CV=0.008785
  exp_022: ACS PCA Features - 4.47% CV Improvement - CV=0.008601
  exp_023: ACS PCA Compliant Submission - CV=0.008964
  exp_024: ACS PCA Fixed Compliant - HuberLoss + Scheduler - CV=0.008689
  exp_026: Weighted Loss Joint Model - 2.58% Improvement - CV=0.008465
  exp_028: Four-Model Ensemble (MLP+LGBM+XGB+CatBoost) - 2.47% Worse - CV=0.008674
  exp_030: GP+MLP+LGBM Ensemble - 1.97% Improvement - CV=0.008298
  exp_032: Lower GP Weight (0.15) + Higher MLP (0.55) - NEW BEST CV! - CV=0.008194
  exp_033: No GP (MLP 0.6 + LGBM 0.4) - 3.29% WORSE - CV=0.008463
  exp_036: exp_035 Regenerated - Best CV Model (SUBMIT) - CV=0.008194
  exp_038: exp_035 Best Model - Ready for Submission - CV=0.008194
  exp_039: exp_035 Regenerated - Best CV Model for Submission - CV=0.008194
  exp_044: Hybrid Mod

In [3]:
# Key question: What approaches haven't been tried?
# From kernels:
# 1. CatBoost with MultiRMSE loss (matthewmaree)
# 2. Correlation-based feature filtering (matthewmaree)
# 3. Different ensemble weights for single vs full (matthewmaree)
# 4. Clipping + normalization (matthewmaree)

# Our experiments tried:
# - MLP, LightGBM, XGBoost, CatBoost, RandomForest, GP
# - Various feature combinations
# - Various ensemble weights

# What's different in matthewmaree?
# 1. Uses ALL feature sources (spange, acs_pca, drfps, fragprints, smiles)
# 2. Correlation filtering with threshold 0.90
# 3. CatBoost with specific hyperparameters
# 4. Different weights for single (7:6) vs full (1:2)

print('Key differences in matthewmaree kernel:')
print('1. Uses ALL feature sources with correlation filtering')
print('2. CatBoost with MultiRMSE loss')
print('3. Different ensemble weights for single vs full')
print('4. Clipping + normalization of predictions')

Key differences in matthewmaree kernel:
1. Uses ALL feature sources with correlation filtering
2. CatBoost with MultiRMSE loss
3. Different ensemble weights for single vs full
4. Clipping + normalization of predictions
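
Correlation filtering at a 0.90 threshold can be sketched as follows. This is a common pandas idiom and not necessarily how the matthewmaree kernel implements it: for each pair of columns whose absolute Pearson correlation exceeds the threshold, the later column is dropped. The function name `drop_correlated_features` and the demo data are made up for illustration.

```python
import numpy as np
import pandas as pd

def drop_correlated_features(features: pd.DataFrame, threshold: float = 0.90) -> pd.DataFrame:
    """Drop every column that correlates above `threshold` with an earlier column."""
    corr = features.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop)

# Synthetic demo (assumption: not competition data)
a = np.arange(10, dtype=float)
demo = pd.DataFrame({
    'a': a,
    'b': 2 * a + 1,          # perfectly correlated with 'a', should be dropped
    'c': [1.0, -1.0] * 5,    # only weakly correlated with 'a', should be kept
})
filtered = drop_correlated_features(demo, threshold=0.90)
print(list(filtered.columns))
```

Because the first column of each correlated pair survives, the column order of the combined feature table decides which source wins when two sources carry near-duplicate information.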


In [4]:
# Let's check what features we have available
import os

data_path = '/home/data'
print('Available data files:')
for f in os.listdir(data_path):
    print(f'  {f}')

# Load and check feature dimensions
spange = pd.read_csv(f'{data_path}/spange_descriptors_lookup.csv')
drfp = pd.read_csv(f'{data_path}/drfps_catechol_lookup.csv')
acs_pca = pd.read_csv(f'{data_path}/acs_pca_descriptors_lookup.csv')
fragprints = pd.read_csv(f'{data_path}/fragprints_lookup.csv')

print(f'\nFeature dimensions:')
print(f'  Spange: {spange.shape}')
print(f'  DRFP: {drfp.shape}')
print(f'  ACS PCA: {acs_pca.shape}')
print(f'  Fragprints: {fragprints.shape}')

Available data files:
  smiles_lookup.csv
  drfps_catechol_lookup.csv
  acs_pca_descriptors_lookup.csv
  catechol_single_solvent_yields.csv
  instructions.txt
  catechol_full_data_yields.csv
  fragprints_lookup.csv
  description.md
  spange_descriptors_lookup.csv
  utils.py

Feature dimensions:
  Spange: (26, 14)
  DRFP: (24, 2049)
  ACS PCA: (24, 6)
  Fragprints: (24, 2134)
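
The row counts above differ (26 for Spange, 24 for the others), so combining all feature sources needs an explicit join. A minimal sketch follows, assuming each lookup table has an identifier column shared across files. The column name `solvent`, the helper `combine_lookups`, and the demo frames are hypothetical and are not confirmed by the output above.

```python
import pandas as pd

def combine_lookups(tables: dict, key: str = 'solvent') -> pd.DataFrame:
    """Inner-join several lookup tables on `key`, prefixing columns by source name."""
    combined = None
    for prefix, table in tables.items():
        indexed = table.set_index(key).add_prefix(f'{prefix}__')
        combined = indexed if combined is None else combined.join(indexed, how='inner')
    return combined

# Tiny stand-ins for the real lookup files (assumption: synthetic values)
spange_demo = pd.DataFrame({'solvent': ['A', 'B', 'C'], 'sp_1': [0.1, 0.2, 0.3]})
acs_demo = pd.DataFrame({'solvent': ['A', 'B'], 'pc_1': [1.0, 2.0]})

combined_demo = combine_lookups({'spange': spange_demo, 'acs': acs_demo})
print(combined_demo)
```

The inner join keeps only identifiers present in every source, so the extra Spange rows would drop out instead of introducing missing values.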


In [5]:
# Strategy for final push:
# 1. Try CatBoost + XGBoost ensemble like matthewmaree
# 2. Use correlation-filtered features
# 3. Different weights for single vs full

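# But first, let's understand the CV-LB gap better
# The gap looks structural: the fit above has intercept 0.0523, so LB stays high even as CV -> 0
# Hitting 0.0702 on that line needs CV ~ 0.0041, about half our best CV (0.0082),
# so improving CV alone is unlikely to be enough; we need to reduce the intercept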

# Options:
# 1. Try a fundamentally different approach (GNN failed)
# 2. Try different feature engineering
# 3. Try different model architecture
# 4. Try different ensemble strategy

print('Final strategy options:')
print('1. CatBoost + XGBoost ensemble (matthewmaree style)')
print('2. Try fragprints features (not used in our best model)')
print('3. Try correlation filtering (not used in our best model)')
print('4. Try different ensemble weights for single vs full')

Final strategy options:
1. CatBoost + XGBoost ensemble (matthewmaree style)
2. Try fragprints features (not used in our best model)
3. Try correlation filtering (not used in our best model)
4. Try different ensemble weights for single vs full
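
Options 1 and 4 can be sketched together: blend two models with a task-specific weight ratio, then post-process the blend. The ratios 7:6 (single) and 1:2 (full) come from the notes above, but which model each number belongs to is an assumption here, as is the reading of "normalization" as rescaling rows whose predicted yields sum to more than 1. `blend`, `clip_and_normalize`, and the demo arrays are illustrative.

```python
import numpy as np

def blend(pred_a, pred_b, ratio):
    """Weighted average of two prediction arrays using an (a, b) weight ratio."""
    w_a, w_b = ratio
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    return (w_a * pred_a + w_b * pred_b) / (w_a + w_b)

def clip_and_normalize(preds, max_total=1.0):
    """Clip predictions to [0, 1], then rescale rows whose sum exceeds max_total."""
    clipped = np.clip(np.asarray(preds, dtype=float), 0.0, 1.0)
    totals = clipped.sum(axis=1, keepdims=True)
    # Guard the division so all-zero rows do not trigger a divide-by-zero warning
    scale = np.where(totals > max_total, max_total / np.maximum(totals, 1e-12), 1.0)
    return clipped * scale

# Synthetic predictions for one sample with three targets (assumption: not real outputs)
model_a = np.array([[0.6, 0.5, 0.2]])
model_b = np.array([[0.7, 0.4, -0.1]])

single_pred = clip_and_normalize(blend(model_a, model_b, ratio=(7, 6)))
full_pred = clip_and_normalize(blend(model_a, model_b, ratio=(1, 2)))
print(single_pred, full_pred)
```

Keeping the ratio as a parameter makes it cheap to compare the single-solvent and full-data settings without retraining either model.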
