# Loop 65 Analysis: Major Breakthrough with Ens Model Approach

exp_069 achieved CV 0.005146, a 35% improvement over the previous best CV (0.007938).

Key questions:
1. What is the CV-LB relationship?
2. What LB score can we expect from this CV?
3. Is the target (0.0347) reachable?

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# All submissions data
submissions = [
    {'exp': 'exp_000', 'cv': 0.0111, 'lb': 0.0982},
    {'exp': 'exp_001', 'cv': 0.0123, 'lb': 0.1065},
    {'exp': 'exp_003', 'cv': 0.0105, 'lb': 0.0972},
    {'exp': 'exp_005', 'cv': 0.0104, 'lb': 0.0969},
    {'exp': 'exp_006', 'cv': 0.0097, 'lb': 0.0946},
    {'exp': 'exp_007', 'cv': 0.0093, 'lb': 0.0932},
    {'exp': 'exp_009', 'cv': 0.0092, 'lb': 0.0936},
    {'exp': 'exp_012', 'cv': 0.0090, 'lb': 0.0913},
    {'exp': 'exp_024', 'cv': 0.0087, 'lb': 0.0893},
    {'exp': 'exp_026', 'cv': 0.0085, 'lb': 0.0887},
    {'exp': 'exp_030', 'cv': 0.0083, 'lb': 0.0877},
    {'exp': 'exp_041', 'cv': 0.0090, 'lb': 0.0932},
    {'exp': 'exp_042', 'cv': 0.0145, 'lb': 0.1147},
]

df = pd.DataFrame(submissions)
print('Submission history:')
print(df.to_string(index=False))

Submission history:
    exp     cv     lb
exp_000 0.0111 0.0982
exp_001 0.0123 0.1065
exp_003 0.0105 0.0972
exp_005 0.0104 0.0969
exp_006 0.0097 0.0946
exp_007 0.0093 0.0932
exp_009 0.0092 0.0936
exp_012 0.0090 0.0913
exp_024 0.0087 0.0893
exp_026 0.0085 0.0887
exp_030 0.0083 0.0877
exp_041 0.0090 0.0932
exp_042 0.0145 0.1147


In [2]:
# Fit linear regression: LB = slope * CV + intercept
X = df['cv'].values.reshape(-1, 1)
y = df['lb'].values

reg = LinearRegression()
reg.fit(X, y)

slope = reg.coef_[0]
intercept = reg.intercept_
r2 = reg.score(X, y)

print('CV-LB Relationship:')
print(f'  LB = {slope:.2f} * CV + {intercept:.4f}')
print(f'  R² = {r2:.4f}')
print()
print('Interpretation:')
print(f'  - Slope: {slope:.2f}x amplification of CV to LB')
print(f'  - Intercept: {intercept:.4f} (structural gap even at CV=0)')
print('  - Target: 0.0347')

CV-LB Relationship:
  LB = 4.23 * CV + 0.0533
  R² = 0.9807

Interpretation:
  - Slope: 4.23x amplification of CV to LB
  - Intercept: 0.0533 (structural gap even at CV=0)
  - Target: 0.0347
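
The fit above reports point estimates only. Since the later cells lean heavily on the intercept, it may help to see how uncertain it is with 13 points. The following is a minimal sketch, not part of the original notebook: it refits the same line with plain Python and attaches textbook OLS standard errors. The names `resid_se`, `se_intercept`, and `T_CRIT` are introduced here for illustration, and `T_CRIT = 2.201` is the tabulated two-sided 95% Student-t value for 11 degrees of freedom, hard-coded as an assumption.

```python
import math

# CV and LB scores copied from the submission table above.
cv = [0.0111, 0.0123, 0.0105, 0.0104, 0.0097, 0.0093, 0.0092,
      0.0090, 0.0087, 0.0085, 0.0083, 0.0090, 0.0145]
lb = [0.0982, 0.1065, 0.0972, 0.0969, 0.0946, 0.0932, 0.0936,
      0.0913, 0.0893, 0.0887, 0.0877, 0.0932, 0.1147]

n = len(cv)
mean_cv = sum(cv) / n
mean_lb = sum(lb) / n
sxx = sum((x - mean_cv) ** 2 for x in cv)
sxy = sum((x - mean_cv) * (y - mean_lb) for x, y in zip(cv, lb))

# Ordinary least squares: the same model as the LinearRegression fit above.
slope = sxy / sxx
intercept = mean_lb - slope * mean_cv

# Residual standard error with n - 2 degrees of freedom.
sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(cv, lb))
resid_se = math.sqrt(sse / (n - 2))

# Standard errors of the two coefficients.
se_slope = resid_se / math.sqrt(sxx)
se_intercept = resid_se * math.sqrt(1 / n + mean_cv ** 2 / sxx)

# Assumption: two-sided 95% Student-t critical value for 11 degrees of freedom.
T_CRIT = 2.201

ci_slope = (slope - T_CRIT * se_slope, slope + T_CRIT * se_slope)
ci_intercept = (intercept - T_CRIT * se_intercept,
                intercept + T_CRIT * se_intercept)

print(f'slope     = {slope:.2f}, 95% CI [{ci_slope[0]:.2f}, {ci_slope[1]:.2f}]')
print(f'intercept = {intercept:.4f}, '
      f'95% CI [{ci_intercept[0]:.4f}, {ci_intercept[1]:.4f}]')
```

On these 13 points the lower end of the intercept interval still sits above the 0.0347 target, so sampling noise in the fit alone does not appear to explain the gap. The interval assumes independent, roughly normal residuals, which a series of related experiments may not satisfy.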


In [3]:
# Predict LB for exp_069 (CV = 0.005146), assuming it follows the fitted line
cv_069 = 0.005146
target = 0.0347
predicted_lb_069 = slope * cv_069 + intercept

# Best leaderboard score among the submissions so far
best = df.loc[df['lb'].idxmin()]

print('Prediction for exp_069:')
print(f'  CV = {cv_069:.6f}')
print(f'  Predicted LB = {slope:.2f} * {cv_069:.6f} + {intercept:.4f} = {predicted_lb_069:.4f}')
print()
print(f'Target: {target}')
print(f'Gap to target: {predicted_lb_069 - target:.4f}')
print()
print(f"Best LB so far: {best['lb']:.4f} ({best['exp']})")
print(f"Expected improvement: {best['lb'] - predicted_lb_069:.4f} ({(best['lb'] - predicted_lb_069) / best['lb'] * 100:.1f}%)")

Prediction for exp_069:
  CV = 0.005146
  Predicted LB = 4.23 * 0.005146 + 0.0533 = 0.0751

Target: 0.0347
Gap to target: 0.0404

Best LB so far: 0.0877 (exp_030)
Expected improvement: 0.0126 (14.4%)
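
The prediction above is a single number, and exp_069's CV lies below every CV in the submission table, so the line is being extrapolated. The sketch below is an illustration rather than part of the original analysis. It flags the extrapolation and computes a standard OLS 95% prediction interval for one new observation. `cv_new`, `se_pred`, and `T_CRIT` are names introduced here, and the critical value 2.201 (11 degrees of freedom) is hard-coded as an assumption.

```python
import math

# CV and LB scores copied from the submission table above.
cv = [0.0111, 0.0123, 0.0105, 0.0104, 0.0097, 0.0093, 0.0092,
      0.0090, 0.0087, 0.0085, 0.0083, 0.0090, 0.0145]
lb = [0.0982, 0.1065, 0.0972, 0.0969, 0.0946, 0.0932, 0.0936,
      0.0913, 0.0893, 0.0887, 0.0877, 0.0932, 0.1147]
cv_new = 0.005146  # CV of exp_069

n = len(cv)
mean_cv = sum(cv) / n
mean_lb = sum(lb) / n
sxx = sum((x - mean_cv) ** 2 for x in cv)
sxy = sum((x - mean_cv) * (y - mean_lb) for x, y in zip(cv, lb))
slope = sxy / sxx
intercept = mean_lb - slope * mean_cv

# Residual standard error of the fitted line (n - 2 degrees of freedom).
sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(cv, lb))
resid_se = math.sqrt(sse / (n - 2))

predicted = slope * cv_new + intercept

# Standard error for a single new observation at cv_new; the last term
# grows as cv_new moves away from the mean of the observed CVs.
se_pred = resid_se * math.sqrt(1 + 1 / n + (cv_new - mean_cv) ** 2 / sxx)

# Assumption: two-sided 95% Student-t critical value for 11 degrees of freedom.
T_CRIT = 2.201
low = predicted - T_CRIT * se_pred
high = predicted + T_CRIT * se_pred

# True when cv_new falls outside the range of CVs used to fit the line.
is_extrapolation = not (min(cv) <= cv_new <= max(cv))

print(f'Predicted LB: {predicted:.4f} (95% PI [{low:.4f}, {high:.4f}])')
print(f'Observed CV range: [{min(cv):.4f}, {max(cv):.4f}]')
print(f'Extrapolating beyond observed CVs: {is_extrapolation}')
```

The interval only describes noise around the existing line. It says nothing about whether a different model family follows that line at all.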


In [4]:
# What CV would be needed to reach target?
# target = slope * cv_needed + intercept
# cv_needed = (target - intercept) / slope

target = 0.0347
cv_needed = (target - intercept) / slope

print(f'To reach target LB = {target}:')
print(f'  Required CV = ({target} - {intercept:.4f}) / {slope:.2f} = {cv_needed:.6f}')
print()
if cv_needed < 0:
    print('  CRITICAL: Required CV is NEGATIVE!')
    print('  This means the target is UNREACHABLE with the current CV-LB relationship.')
    print(f'  The intercept ({intercept:.4f}) is higher than the target ({target}).')
else:
    print(f'  Current best CV: {cv_069:.6f}')
    print(f'  CV improvement needed: {(cv_069 - cv_needed) / cv_069 * 100:.1f}%')

To reach target LB = 0.0347:
  Required CV = (0.0347 - 0.0533) / 4.23 = -0.004396

  CRITICAL: Required CV is NEGATIVE!
  This means the target is UNREACHABLE with the current CV-LB relationship.
  The intercept (0.0533) is higher than the target (0.0347).
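
Because the required CV is negative, a more useful question may be how far the intercept itself would have to move. The sketch below simply inverts the same line, using the rounded coefficients printed above. `required_intercept` and `required_cv` are hypothetical helper names, and the 0.0300 intercept in the last call is an invented value used only to show the reachable case.

```python
SLOPE = 4.23        # fitted slope printed above (rounded)
INTERCEPT = 0.0533  # fitted intercept printed above (rounded)
TARGET = 0.0347
CV_069 = 0.005146


def required_intercept(target, slope, cv):
    """Intercept at which LB = slope * cv + intercept equals the target."""
    return target - slope * cv


def required_cv(target, slope, intercept):
    """CV needed to hit the target, or None if it would be negative.

    Assumes a positive slope, as in the fit above.
    """
    cv = (target - intercept) / slope
    return cv if cv >= 0 else None


needed = required_intercept(TARGET, SLOPE, CV_069)
print(f'Intercept needed for exp_069 to hit the target: {needed:.4f}')
print(f'Required CV at the fitted intercept: {required_cv(TARGET, SLOPE, INTERCEPT)}')

# Hypothetical lower intercept, used only to illustrate the reachable case.
print(f'Required CV at intercept 0.0300: {required_cv(TARGET, SLOPE, 0.0300):.6f}')
```

With the slope held at 4.23, exp_069 would hit the target only if the intercept were about 0.0129 instead of 0.0533.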


In [5]:
# Key hypothesis: the Ens Model approach may have a DIFFERENT CV-LB relationship
print('=' * 60)
print('CRITICAL ANALYSIS')
print('=' * 60)
print()
print(f'Current CV-LB relationship (based on {len(df)} submissions):')
print(f'  LB = {slope:.2f} * CV + {intercept:.4f}')
print()
print('If exp_069 follows the SAME relationship:')
print(f'  Predicted LB = {predicted_lb_069:.4f}')
print(f'  This would be the BEST LB so far (vs {df["lb"].min():.4f})')
print(f'  But still far from target ({target})')
print()
print('HOWEVER: The Ens Model approach is FUNDAMENTALLY DIFFERENT:')
print('  - Uses CatBoost + XGBoost (not GP + MLP + LGBM)')
print('  - Uses feature priority-based correlation filtering')
print('  - Uses different ensemble weights for single vs full data')
print('  - Uses multi-target normalization')
print()
print('The CV-LB relationship MAY BE DIFFERENT for this approach!')
print('  - If the intercept is lower, the target becomes reachable')
print('  - This is the key hypothesis to test with a submission')
print()
print('RECOMMENDATION: SUBMIT exp_069 to verify the CV-LB relationship')

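============================================================
CRITICAL ANALYSIS
============================================================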

Current CV-LB relationship (based on 13 submissions):
  LB = 4.23 * CV + 0.0533

If exp_069 follows the SAME relationship:
  Predicted LB = 0.0751
  This would be the BEST LB so far (vs 0.0877)
  But still far from target (0.0347)

HOWEVER: The Ens Model approach is FUNDAMENTALLY DIFFERENT:
  - Uses CatBoost + XGBoost (not GP + MLP + LGBM)
  - Uses feature priority-based correlation filtering
  - Uses different ensemble weights for single vs full data
  - Uses multi-target normalization

The CV-LB relationship MAY BE DIFFERENT for this approach!
  - If the intercept is lower, the target becomes reachable
  - This is the key hypothesis to test with a submission

RECOMMENDATION: SUBMIT exp_069 to verify the CV-LB relationship
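
Once the LB score for exp_069 is known, the hypothesis can be checked with a few lines of arithmetic. The sketch below is one possible way to read the result, not the notebook's own procedure. It uses the rounded coefficients printed earlier and assumes the slope stays at 4.23, since a single new point cannot separate slope from intercept. `interpret_submission` is a hypothetical helper, and both LB values in the loop are invented for illustration.

```python
SLOPE = 4.23        # fitted slope printed earlier (rounded)
INTERCEPT = 0.0533  # fitted intercept printed earlier (rounded)
TARGET = 0.0347
CV_069 = 0.005146


def interpret_submission(lb_observed, cv=CV_069, slope=SLOPE,
                         intercept=INTERCEPT, target=TARGET):
    """Compare an observed LB score with the existing CV-LB line.

    Assumption: the slope is unchanged, so any deviation from the
    prediction is attributed entirely to a shifted intercept.
    """
    predicted = slope * cv + intercept
    implied_intercept = lb_observed - slope * cv
    return {
        'residual': lb_observed - predicted,
        'implied_intercept': implied_intercept,
        # The target needs a non-negative CV only if the intercept is below it.
        'target_reachable_at_nonnegative_cv': implied_intercept < target,
    }


# Hypothetical LB outcomes, not real submission results.
for hypothetical_lb in (0.0751, 0.0500):
    result = interpret_submission(hypothetical_lb)
    print(f'LB = {hypothetical_lb:.4f}: '
          f"residual = {result['residual']:+.4f}, "
          f"implied intercept = {result['implied_intercept']:.4f}, "
          f"reachable = {result['target_reachable_at_nonnegative_cv']}")
```

Under the fixed-slope assumption, exp_069 would have to score below roughly 0.0565 on the LB (0.0347 + 4.23 × 0.005146) for the implied intercept to drop under the target.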
