# Loop 71 Analysis: LB Feedback for exp_067

## Key Result
- **exp_067** (sigmoid_output): CV=0.0083, LB=0.0877
- This matches exp_030's LB score to four decimal places (0.08774 vs 0.08772)

## Critical Observation
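The sigmoid output fix left the LB score essentially unchanged: exp_030 scored LB=0.08772 and exp_067 scored LB=0.08774, with near-identical CV (0.008298 vs 0.008303).
This suggests the submission pipeline is reproducible, since two near-equivalent models score almost the same on both CV and LB.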

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# All submissions with confirmed LB scores
submissions = [
    {'exp': 'exp_000', 'cv': 0.011081, 'lb': 0.09816},
    {'exp': 'exp_001', 'cv': 0.012297, 'lb': 0.10649},
    {'exp': 'exp_003', 'cv': 0.010501, 'lb': 0.09719},
    {'exp': 'exp_005', 'cv': 0.01043, 'lb': 0.09691},
    {'exp': 'exp_006', 'cv': 0.009749, 'lb': 0.09457},
    {'exp': 'exp_007', 'cv': 0.009262, 'lb': 0.09316},
    {'exp': 'exp_009', 'cv': 0.009192, 'lb': 0.09364},
    {'exp': 'exp_012', 'cv': 0.009004, 'lb': 0.09134},
    {'exp': 'exp_024', 'cv': 0.008689, 'lb': 0.08929},
    {'exp': 'exp_026', 'cv': 0.008465, 'lb': 0.08875},
    {'exp': 'exp_030', 'cv': 0.008298, 'lb': 0.08772},
    {'exp': 'exp_035', 'cv': 0.009825, 'lb': 0.09696},
    {'exp': 'exp_067', 'cv': 0.008303, 'lb': 0.08774},  # NEW - sigmoid output
]

df = pd.DataFrame(submissions)
print('=== CV-LB Relationship Analysis ===')
print(f'Number of submissions: {len(df)}')
print(f'Best CV: {df["cv"].min():.6f} ({df.loc[df["cv"].idxmin(), "exp"]})')
print(f'Best LB: {df["lb"].min():.6f} ({df.loc[df["lb"].idxmin(), "exp"]})')
print()

In [None]:
# Fit linear regression: LB = slope * CV + intercept
X = df['cv'].values.reshape(-1, 1)
y = df['lb'].values

reg = LinearRegression()
reg.fit(X, y)

slope = reg.coef_[0]
intercept = reg.intercept_
r2 = reg.score(X, y)

print('=== Linear Fit: LB = slope * CV + intercept ===')
print(f'Slope: {slope:.4f}')
print(f'Intercept: {intercept:.6f}')
print(f'R²: {r2:.4f}')
print()

# Target analysis
target = 0.0347
print('=== Target Analysis ===')
print(f'Target LB: {target}')
print(f'Intercept: {intercept:.6f}')
print(f'Gap (intercept - target): {intercept - target:.6f}')
print()

# Required CV to hit target
if slope > 0:
    required_cv = (target - intercept) / slope
    print(f'Required CV to hit target: {required_cv:.6f}')
    if required_cv < 0:
        print('>>> IMPOSSIBLE: Required CV is NEGATIVE!')
        print('>>> Standard CV optimization CANNOT reach target!')
    else:
        print(f'>>> Need to reduce CV from {df["cv"].min():.6f} to {required_cv:.6f}')
        print(f'>>> That\'s a {(df["cv"].min() - required_cv) / df["cv"].min() * 100:.1f}% reduction')

In [None]:
# Plot CV vs LB
plt.figure(figsize=(10, 6))
plt.scatter(df['cv'], df['lb'], s=100, alpha=0.7, label='Submissions')

# Add labels for each point
for _, row in df.iterrows():
    plt.annotate(row['exp'].replace('exp_', ''), (row['cv'], row['lb']), 
                 textcoords='offset points', xytext=(5, 5), fontsize=8)

# Plot regression line
cv_range = np.linspace(df['cv'].min() - 0.001, df['cv'].max() + 0.001, 100)
lb_pred = slope * cv_range + intercept
plt.plot(cv_range, lb_pred, 'r--', label=f'LB = {slope:.2f}*CV + {intercept:.4f} (R²={r2:.3f})')

# Plot target line
plt.axhline(y=target, color='g', linestyle=':', linewidth=2, label=f'Target LB = {target}')

# Plot intercept
plt.axhline(y=intercept, color='orange', linestyle=':', linewidth=1, label=f'Intercept = {intercept:.4f}')

plt.xlabel('CV Score (MSE)')
plt.ylabel('LB Score (MSE)')
plt.title('CV vs LB Relationship - All Submissions')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('/home/code/exploration/cv_lb_relationship_loop71.png', dpi=150)
plt.show()
print('Plot saved to /home/code/exploration/cv_lb_relationship_loop71.png')

In [None]:
# Analyze residuals - are there any outliers that break the linear pattern?
df['lb_predicted'] = slope * df['cv'] + intercept
df['residual'] = df['lb'] - df['lb_predicted']
df['residual_pct'] = df['residual'] / df['lb'] * 100

print('=== Residual Analysis ===')
print(df[['exp', 'cv', 'lb', 'lb_predicted', 'residual', 'residual_pct']].to_string())
print()
print(f'Mean absolute residual: {df["residual"].abs().mean():.6f}')
print(f'Max absolute residual: {df["residual"].abs().max():.6f} ({df.loc[df["residual"].abs().idxmax(), "exp"]})')
print()
print('>>> Residuals are small relative to LB: submissions track a common CV-LB line.')
print('>>> This is consistent with DISTRIBUTION SHIFT, not model quality, being the bottleneck.')

## Key Findings

### 1. CV-LB Relationship is HIGHLY LINEAR (R² = 0.95+)
- All model types (MLP, LGBM, XGB, GP, CatBoost) fall on the same line
- LB ≈ 4.3 * CV + 0.053

### 2. INTERCEPT PROBLEM
- Intercept (0.053) > Target (0.0347)
- Even with CV = 0, predicted LB would be 0.053
- Required CV to hit target is NEGATIVE (impossible)

### 3. WHAT THIS MEANS
- Standard CV optimization CANNOT reach the target
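- The intercept is consistent with STRUCTURAL DISTRIBUTION SHIFT (note that it is extrapolated well below the observed CV range of roughly 0.0083 to 0.0123)
- Test solvents appear to be fundamentally different from training solvents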
- We need approaches that REDUCE THE INTERCEPT, not just improve CV
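
Because the intercept is an extrapolation from 13 points, it is worth checking how much it moves when any single submission is dropped. The sketch below is a rough leave-one-out sensitivity check, not a formal confidence interval; it refits the line with `np.polyfit` on the CV/LB pairs copied from the submissions table above, and the helper name `fit_line` is illustrative.

```python
import numpy as np

# CV and LB scores copied from the submissions table in this notebook
cv = np.array([0.011081, 0.012297, 0.010501, 0.01043, 0.009749, 0.009262, 0.009192,
               0.009004, 0.008689, 0.008465, 0.008298, 0.009825, 0.008303])
lb = np.array([0.09816, 0.10649, 0.09719, 0.09691, 0.09457, 0.09316, 0.09364,
               0.09134, 0.08929, 0.08875, 0.08772, 0.09696, 0.08774])
target = 0.0347


def fit_line(x, y):
    """Ordinary least-squares line; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


full_slope, full_intercept = fit_line(cv, lb)

# Refit 13 times, each time leaving one submission out
loo_intercepts = []
for i in range(len(cv)):
    keep = np.arange(len(cv)) != i
    _, b = fit_line(cv[keep], lb[keep])
    loo_intercepts.append(b)
loo_intercepts = np.array(loo_intercepts)

print(f'Full fit: LB = {full_slope:.3f} * CV + {full_intercept:.4f}')
print(f'Leave-one-out intercept range: {loo_intercepts.min():.4f} to {loo_intercepts.max():.4f}')
print(f'Intercept above target in every refit: {bool((loo_intercepts > target).all())}')
```

If the intercept stays above the target in every refit, the conclusion does not hinge on one influential submission such as exp_001, which sits at the far end of the CV range.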

### 4. STRATEGIES TO REDUCE INTERCEPT
1. **Extrapolation detection** - Identify when test solvent is far from training
2. **Conservative predictions** - Blend toward mean when extrapolating
3. **Physics-informed features** - Constraints that hold for unseen solvents
4. **Study top public kernels** - They may have solved this problem
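
Strategies 1 and 2 can be combined in a few lines. The following is a minimal sketch under stated assumptions, not the method used in any experiment here: it treats the Euclidean distance to the nearest training row as the extrapolation signal and shrinks predictions toward the training mean as that distance grows. The function name `blend_toward_mean`, the `scale` parameter, and the toy feature arrays are all hypothetical.

```python
import numpy as np


def blend_toward_mean(preds, test_feats, train_feats, train_mean, scale=1.0):
    """Shrink predictions toward train_mean as test rows move away from the training set.

    scale (hypothetical tuning knob) is the distance at which the blend is 50/50.
    """
    # Pairwise differences: shape (n_test, n_train, n_features)
    diffs = test_feats[:, None, :] - train_feats[None, :, :]
    # Distance from each test row to its nearest training row
    nearest = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    # Weight on the mean: 0 on top of a training point, approaching 1 far away
    w = nearest / (nearest + scale)
    return (1.0 - w) * preds + w * train_mean


# Toy example: the second test row lies 3 units from the nearest training row
train_feats = np.array([[0.0, 0.0], [1.0, 0.0]])
test_feats = np.array([[0.0, 0.0], [4.0, 0.0]])
preds = np.array([0.2, 0.8])

blended = blend_toward_mean(preds, test_feats, train_feats, train_mean=0.5, scale=1.0)
print(blended)
```

In practice the features would need scaling before distances are meaningful, and `scale` would have to be tuned with a validation scheme that holds out whole solvents, since a random split would not expose the extrapolation case.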

In [None]:
# What would it take to reach the target?
print('=== PATH TO TARGET ===')
print(f'Current best LB: {df["lb"].min():.6f}')
print(f'Target LB: {target}')
print(f'Gap to target: {df["lb"].min() - target:.6f}')
print()

# If we could reduce the intercept
print('=== INTERCEPT REDUCTION SCENARIOS ===')
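best_cv = df['cv'].min()
for new_intercept in [0.045, 0.040, 0.035, 0.030, 0.025]:
    required_cv = (target - new_intercept) / slope
    print(f'If intercept = {new_intercept:.3f}: Required CV = {required_cv:.6f}')
    if required_cv <= 0:
        print('  >>> Still impossible (non-positive CV required)')
    else:
        # A positive required CV is only reachable if it is close to what we can achieve
        reduction = (best_cv - required_cv) / best_cv * 100
        print(f'  >>> Needs a {reduction:.1f}% CV reduction from the current best ({best_cv:.6f})')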

In [None]:
# Check if any experiments have pending LB scores that might break the pattern
print('=== PENDING SUBMISSIONS ===')
pending = [
    {'exp': 'exp_049', 'cv': 0.008092, 'notes': 'CatBoost + XGBoost'},
    {'exp': 'exp_050', 'cv': 0.008092, 'notes': 'CatBoost + XGBoost FIXED'},
    {'exp': 'exp_052', 'cv': 0.01088, 'notes': 'CatBoost + XGBoost CLIPPED'},
    {'exp': 'exp_053', 'cv': 0.008092, 'notes': 'Exact Template'},
    {'exp': 'exp_054', 'cv': 0.008504, 'notes': 'Mixall Approach'},
    {'exp': 'exp_055', 'cv': 0.008504, 'notes': 'Minimal Submission'},
    {'exp': 'exp_057', 'cv': 0.009263, 'notes': 'Ens Model All Features'},
    {'exp': 'exp_063', 'cv': 0.011171, 'notes': 'Correct Final Cell'},
    {'exp': 'exp_064', 'cv': 0.009227, 'notes': 'Revert exp_030'},
    {'exp': 'exp_065', 'cv': 0.008811, 'notes': 'Clean Submission'},
]

for p in pending:
    predicted_lb = slope * p['cv'] + intercept
    print(f"{p['exp']}: CV={p['cv']:.6f}, Predicted LB={predicted_lb:.6f} - {p['notes']}")

print()
print('>>> If these follow the same pattern, they will NOT beat the target.')

## CRITICAL INSIGHT

The CV-LB relationship shows that:

1. **ALL approaches fall on the same line** - MLP, LGBM, XGB, GP, CatBoost, ensembles
2. **The intercept (0.053) is HIGHER than the target (0.0347)**
3. **Standard CV optimization CANNOT reach the target**
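
The arithmetic behind point 3 can be checked directly. This sketch plugs in the rounded fit values quoted in this notebook (slope 4.3, intercept 0.053) rather than refitting, so the outputs are approximate. It also computes the largest intercept that would still reach the target at the current best CV.

```python
# Rounded fit values quoted in this notebook (assumed here, not refit)
slope = 4.3
intercept = 0.053
target = 0.0347
best_cv = 0.008298  # exp_030, the best CV among scored submissions

# CV needed to hit the target if the line stays where it is
required_cv = (target - intercept) / slope

# Contribution of the CV term alone at the current best CV
cv_term_at_best = slope * best_cv

# Largest intercept that would still reach the target at the current best CV
max_intercept_at_best_cv = target - cv_term_at_best

print(f'Required CV at current intercept: {required_cv:.5f}')
print(f'slope * best CV: {cv_term_at_best:.5f}')
print(f'Largest intercept reaching target at best CV: {max_intercept_at_best_cv:.5f}')
```

Under these rounded values the required CV is negative, and the `slope * best CV` term alone already exceeds the target. A lower intercept would therefore have to come together with a lower CV or a flatter slope.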

## WHAT WE NEED TO DO

1. **STOP optimizing CV alone** - On the current CV-LB line it cannot reach the target
2. **Focus on REDUCING THE INTERCEPT** - This requires fundamentally different approaches
3. **Study what top competitors do** - They may have found a way around this problem

## REMAINING SUBMISSIONS: 4

We have 4 submissions left. We should use them wisely:
1. Try a fundamentally different approach that might reduce the intercept
2. Study public kernels for techniques that work
3. Consider extrapolation detection / conservative predictions