# Loop 69 — LB Feedback (exp_068 / 065_clean_exp030_template)

**Result:** CV 0.0083 | **LB 0.0877**

Goal of this notebook:
- Update the empirical CV→LB mapping with this confirmed submission
- Quantify what kind of change is needed (slope/intercept) to reach target **LB < 0.0347**
- Translate that into the next strategy (distribution-shift / applicability-domain correction + mixture interaction representation)


In [None]:
import json
import numpy as np
import pandas as pd
from pathlib import Path
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

state_path = Path('/home/code/session_state.json')
state = json.loads(state_path.read_text())

# Confirmed (cv, lb) points, hard-coded from the submission history in the prompt
# (only submissions with a known LB score).
# Note: `state` is loaded above but not used in this fit; it could serve as a
# fallback source if it contains submissions with an lb_score.
points = [
    (0.0111, 0.0982),
    (0.0123, 0.1065),
    (0.0105, 0.0972),
    (0.0104, 0.0969),
    (0.0097, 0.0946),
    (0.0093, 0.0932),
    (0.0092, 0.0936),
    (0.0090, 0.0913),
    (0.0087, 0.0893),
    (0.0085, 0.0887),
    (0.0083, 0.0877),
]

cv = np.array([p[0] for p in points]).reshape(-1, 1)
lb = np.array([p[1] for p in points])

reg = LinearRegression().fit(cv, lb)
pred = reg.predict(cv)

slope = float(reg.coef_[0])
intercept = float(reg.intercept_)
r2 = float(r2_score(lb, pred))

slope, intercept, r2

In [None]:
target = 0.0347

# What LB would we expect at current best CV?
cv_best = float(np.min(cv))
lb_at_bestcv = float(reg.predict([[cv_best]])[0])

# How much intercept reduction is needed at a fixed CV to hit target?
# target = slope*cv + intercept_new => intercept_new = target - slope*cv
intercept_needed_at_bestcv = float(target - slope * cv_best)
intercept_reduction_needed = float(intercept - intercept_needed_at_bestcv)

pd.DataFrame({
    'metric': ['slope','intercept','R2','best_CV_in_points','expected_LB_at_best_CV','target_LB','intercept_needed_at_best_CV','intercept_reduction_needed'],
    'value': [slope, intercept, r2, cv_best, lb_at_bestcv, target, intercept_needed_at_bestcv, intercept_reduction_needed]
})

In [None]:
# Sanity check: residuals (are we systematically above/below line?)
resid = lb - pred
pd.DataFrame({'cv': cv.flatten(), 'lb': lb, 'pred': pred, 'resid': resid}).sort_values('cv')
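
A complementary check on the residuals is to ask how much the fitted line depends on any single submission, since the intercept is an extrapolation well outside the observed CV range. The sketch below is an illustration only, not part of the original analysis. It refits the line with each point left out in turn, using `np.polyfit` in place of `LinearRegression` (both are ordinary least squares). The helper name `leave_one_out_fits` is hypothetical.

```python
import numpy as np

# Same confirmed (cv, lb) points as in the fit above.
points = [
    (0.0111, 0.0982), (0.0123, 0.1065), (0.0105, 0.0972), (0.0104, 0.0969),
    (0.0097, 0.0946), (0.0093, 0.0932), (0.0092, 0.0936), (0.0090, 0.0913),
    (0.0087, 0.0893), (0.0085, 0.0887), (0.0083, 0.0877),
]
cv = np.array([p[0] for p in points])
lb = np.array([p[1] for p in points])


def leave_one_out_fits(x, y):
    """Refit the line len(x) times, dropping one point each time (hypothetical helper).

    Returns an array of shape (len(x), 2) with columns [slope, intercept].
    """
    fits = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        # Degree-1 polyfit returns [slope, intercept].
        slope_i, intercept_i = np.polyfit(x[keep], y[keep], 1)
        fits.append((slope_i, intercept_i))
    return np.array(fits)


loo = leave_one_out_fits(cv, lb)
print("slope range:    ", loo[:, 0].min(), loo[:, 0].max())
print("intercept range:", loo[:, 1].min(), loo[:, 1].max())
```

If the intercept range stays well above the target for every left-out point, the conclusion does not hinge on one submission.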

## Interpretation

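- The CV→LB mapping remains close to linear across the 11 confirmed points (high R²), so *standard* modeling improvements mainly move us along the same line.
- To reach **LB < 0.0347**, we need to **change the relationship** itself, especially by reducing systematic OOD error (the effective intercept). Since CV cannot drop below 0, the fitted intercept is the lowest LB reachable along the current line; `intercept_reduction_needed` in the table above gives the gap at the current best CV.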
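
The intercept argument can be made concrete by solving the fitted line for the CV that would be required to hit the target. The sketch below is a minimal illustration, assuming the same 11 confirmed points and an ordinary least-squares line via `np.polyfit`. The helper name `cv_required_for_target` is hypothetical and not part of the notebook.

```python
import numpy as np

# Same confirmed (cv, lb) points as in the fit above.
points = [
    (0.0111, 0.0982), (0.0123, 0.1065), (0.0105, 0.0972), (0.0104, 0.0969),
    (0.0097, 0.0946), (0.0093, 0.0932), (0.0092, 0.0936), (0.0090, 0.0913),
    (0.0087, 0.0893), (0.0085, 0.0887), (0.0083, 0.0877),
]
cv = np.array([p[0] for p in points])
lb = np.array([p[1] for p in points])

# Degree-1 least-squares fit; np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(cv, lb, 1)


def cv_required_for_target(target, slope, intercept):
    """Solve target = slope * cv + intercept for cv (hypothetical helper)."""
    return (target - intercept) / slope


target = 0.0347
cv_needed = cv_required_for_target(target, slope, intercept)

# A non-positive cv_needed means the target lies below the line's floor
# (the intercept, i.e. the predicted LB at CV = 0), so no CV improvement
# along this line can reach it.
reachable_on_line = bool(cv_needed > 0)
print(f"slope={slope:.3f} intercept={intercept:.4f} "
      f"cv_needed={cv_needed:.4f} reachable_on_line={reachable_on_line}")
```

A negative `cv_needed` is the quantitative form of "we need to change the relationship": the gap has to be closed by lowering the intercept (or the slope), not by lowering CV.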

## Strategic implication

Next experiments must be designed to **reduce extrapolation error** on unseen solvents/ramps:
1. **Applicability-domain (AD) / distance-based shrinkage** (cross-fitted): blend model prediction toward a robust baseline when the held-out solvent/mixture is far from the training manifold.
2. **Mixture representation upgrade**: add explicit (A,B,pct) pair-interaction features (concat, differences, products) instead of pure linear blending.
3. **Representation change** (if AD doesn’t move the LB): switch to GNN / ChemBERTa embeddings, while keeping strict template compliance and using the identical model class in the CV and submission cells.
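
Items 1 and 2 above could be sketched as follows. This is a rough illustration under stated assumptions, not the implementation used in any experiment: each solvent is assumed to have a numeric descriptor vector, distance to the training manifold is approximated by the mean Euclidean distance to the `k` nearest training rows, and the shrinkage weight is `exp(-distance / tau)`. All function names, `k`, and `tau` are hypothetical. In practice `tau` would need to be chosen by cross-fitting on held-out solvents, which is not shown here.

```python
import numpy as np


def pair_interaction_features(desc_a, desc_b, pct_b):
    """Mixture features from descriptors of solvents A and B (illustrative layout).

    desc_a, desc_b: (n_samples, n_desc) arrays; pct_b: (n_samples,) fraction of B in [0, 1].
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    w = np.asarray(pct_b, dtype=float).reshape(-1, 1)
    linear_blend = (1.0 - w) * desc_a + w * desc_b  # the pure linear-blending baseline
    diff = desc_a - desc_b                          # differences between the two solvents
    prod = desc_a * desc_b                          # elementwise products (interaction)
    # Concatenate raw descriptors and pct with the derived terms.
    return np.hstack([desc_a, desc_b, w, linear_blend, diff, prod])


def knn_distance(x_train, x_query, k=3):
    """Mean Euclidean distance from each query row to its k nearest training rows."""
    x_train = np.asarray(x_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    d = np.linalg.norm(x_query[:, None, :] - x_train[None, :, :], axis=2)
    k = min(k, x_train.shape[0])
    return np.sort(d, axis=1)[:, :k].mean(axis=1)


def shrink_toward_baseline(pred, baseline, distance, tau):
    """Blend pred toward baseline as distance grows.

    weight = exp(-distance / tau): 1 at distance 0 (trust the model), near 0 far away.
    """
    pred = np.asarray(pred, dtype=float)
    weight = np.exp(-np.asarray(distance, dtype=float) / tau)
    if pred.ndim == 2:  # multi-target predictions: one weight per row
        weight = weight[:, None]
    return weight * pred + (1.0 - weight) * np.asarray(baseline, dtype=float)


# Toy usage with synthetic data (no real solvent descriptors are used here).
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(5, 4))
desc_b = rng.normal(size=(5, 4))
pct_b = np.linspace(0.0, 1.0, 5)
features = pair_interaction_features(desc_a, desc_b, pct_b)

x_train = rng.normal(size=(20, 4))
x_query = np.vstack([x_train[0] + 0.01, x_train[0] + 10.0])  # one near, one far
dist = knn_distance(x_train, x_query, k=3)
shrunk = shrink_toward_baseline(pred=[1.0, 1.0], baseline=0.0, distance=dist, tau=1.0)
print(features.shape, dist, shrunk)
```

The far query is pulled almost entirely to the baseline while the near one keeps most of the model prediction, which is the behavior the AD correction is meant to produce on unseen solvents.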
