# Loop 54 Analysis: Submission Error Investigation

**Issue:** Submission failed with 'Evaluation metric raised an unexpected error'

**Goal:** Understand what's causing the error and fix it.

In [1]:
import pandas as pd
import numpy as np

# Load submission
df = pd.read_csv('/home/submission/submission.csv')

print('Submission format:')
print(f'Columns: {list(df.columns)}')
print(f'Shape: {df.shape}')
print()
print('First 5 rows:')
print(df.head())
print()
print('Target statistics:')
for col in ['target_1', 'target_2', 'target_3']:
    print(f'  {col}: min={df[col].min():.6f}, max={df[col].max():.6f}')
    print(f'    NaN: {df[col].isna().sum()}, Inf: {np.isinf(df[col]).sum()}')

Submission format:
Columns: ['id', 'index', 'task', 'fold', 'row', 'target_1', 'target_2', 'target_3']
Shape: (1883, 8)

First 5 rows:
   id  index  task  fold  row  target_1  target_2  target_3
0   0      0     0     0    0  0.016488  0.026118  0.897280
1   1      1     0     0    1  0.023463  0.033189  0.879097
2   2      2     0     0    2  0.039003  0.051166  0.847558
3   3      3     0     0    3  0.065632  0.074234  0.780666
4   4      4     0     0    4  0.084883  0.100093  0.714652

Target statistics:
  target_1: min=0.000000, max=0.382686
    NaN: 0, Inf: 0
  target_2: min=0.000000, max=0.356263
    NaN: 0, Inf: 0
  target_3: min=0.000000, max=1.000000
    NaN: 0, Inf: 0
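
The cell above checks only for NaN and Inf. A minimal sketch of a few further sanity checks follows. The helper name `check_submission`, the rule that `id` must be unique, and the `[0, 1]` range for targets are assumptions made for illustration; none of them is a confirmed requirement of the evaluation metric.

```python
import numpy as np
import pandas as pd

TARGET_COLS = ['target_1', 'target_2', 'target_3']

def check_submission(df, target_cols=TARGET_COLS, id_col='id', low=0.0, high=1.0):
    """Return a list of human-readable problems found in a submission frame.

    Hypothetical helper: the [low, high] range and the unique-id rule are
    assumptions, not documented requirements of the metric.
    """
    problems = []
    missing = [c for c in [id_col, *target_cols] if c not in df.columns]
    if missing:
        return [f'missing columns: {missing}']
    n_dup = int(df[id_col].duplicated().sum())
    if n_dup:
        problems.append(f'{n_dup} duplicate {id_col} values')
    for col in target_cols:
        if not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f'{col}: non-numeric dtype {df[col].dtype}')
            continue
        values = df[col].to_numpy(dtype=float)
        n_bad = int((~np.isfinite(values)).sum())
        if n_bad:
            problems.append(f'{col}: {n_bad} NaN/Inf values')
        finite = values[np.isfinite(values)]
        n_out = int(((finite < low) | (finite > high)).sum())
        if n_out:
            problems.append(f'{col}: {n_out} values outside [{low}, {high}]')
    return problems

# Toy frame for illustration only (not the real submission)
demo = pd.DataFrame({
    'id': [0, 1, 1],
    'target_1': [0.1, 1.2, 0.3],
    'target_2': [0.2, np.nan, 0.1],
    'target_3': [0.7, 0.5, 0.6],
})
demo_problems = check_submission(demo)
print(demo_problems)
```

Calling `check_submission(df)` on the loaded submission would return an empty list if none of these assumed rules is violated, which still would not prove the file matches what the metric expects.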


In [2]:
# CV-LB relationship analysis
# Each entry is (experiment name, CV score, LB score)
submissions = [
    ('exp_000', 0.0111, 0.0982),
    ('exp_001', 0.0123, 0.1065),
    ('exp_003', 0.0105, 0.0972),
    ('exp_005', 0.0104, 0.0969),
    ('exp_006', 0.0097, 0.0946),
    ('exp_007', 0.0093, 0.0932),
    ('exp_009', 0.0092, 0.0936),
    ('exp_012', 0.0090, 0.0913),
    ('exp_024', 0.0087, 0.0893),
    ('exp_026', 0.0085, 0.0887),
    ('exp_030', 0.0083, 0.0877),
    ('exp_035', 0.0098, 0.0970),
]

cv_scores = [s[1] for s in submissions]
lb_scores = [s[2] for s in submissions]
TARGET = 0.0347  # LB score we are trying to reach

# Fit linear regression: LB = slope * CV + intercept
from sklearn.linear_model import LinearRegression
X = np.array(cv_scores).reshape(-1, 1)
y = np.array(lb_scores)
reg = LinearRegression().fit(X, y)
slope, intercept = reg.coef_[0], reg.intercept_

print('CV-LB Relationship Analysis:')
print(f'  Linear fit: LB = {slope:.2f} * CV + {intercept:.4f}')
print(f'  R-squared = {reg.score(X, y):.4f}')
print(f'  Intercept = {intercept:.4f}')
print(f'  Target = {TARGET}')
print()
# Note: CV = 0 lies far outside the observed CV range (0.0083 to 0.0123),
# so the intercept is an extrapolation of the fitted line.
print('CRITICAL INSIGHT:')
print(f'  Even with CV = 0, predicted LB = {intercept:.4f}')
print(f'  Intercept ({intercept:.4f}) > Target ({TARGET})')
print(f'  Required CV to hit target: ({TARGET} - {intercept:.4f}) / {slope:.2f} = {(TARGET - intercept) / slope:.6f}')

CV-LB Relationship Analysis:
  Linear fit: LB = 4.31 * CV + 0.0525
  R-squared = 0.9505
  Intercept = 0.0525
  Target = 0.0347

CRITICAL INSIGHT:
  Even with CV = 0, predicted LB = 0.0525
  Intercept (0.0525) > Target (0.0347)
  Required CV to hit target: (0.0347 - 0.0525) / 4.31 = -0.004130
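
The required-CV arithmetic can be cross-checked without scikit-learn. The sketch below refits the same twelve (CV, LB) pairs with `np.polyfit` and solves the line for the target. It also prints the observed CV range and the largest residual, to show how far outside the data the intercept sits. The variable names are illustrative. The remark that a negative CV is unreachable assumes CV is a non-negative error metric, which this notebook does not state.

```python
import numpy as np

# (CV, LB) pairs copied from the submissions list above
cv = np.array([0.0111, 0.0123, 0.0105, 0.0104, 0.0097, 0.0093,
               0.0092, 0.0090, 0.0087, 0.0085, 0.0083, 0.0098])
lb = np.array([0.0982, 0.1065, 0.0972, 0.0969, 0.0946, 0.0932,
               0.0936, 0.0913, 0.0893, 0.0887, 0.0877, 0.0970])
TARGET = 0.0347

# np.polyfit returns coefficients from highest degree down: [slope, intercept]
slope, intercept = np.polyfit(cv, lb, deg=1)

# Solve TARGET = slope * CV + intercept for CV
required_cv = (TARGET - intercept) / slope
residuals = lb - (slope * cv + intercept)

print(f'slope={slope:.2f}, intercept={intercept:.4f}')
print(f'required CV = {required_cv:.6f}')
print(f'observed CV range: [{cv.min():.4f}, {cv.max():.4f}]')
print(f'largest |residual| = {np.abs(residuals).max():.4f}')
```

A negative `required_cv` means the fitted line never reaches the target for any non-negative CV. Since the fit is extrapolated well below the smallest observed CV, this is evidence about the current family of models rather than a hard bound.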


In [3]:
# Summary
print('\n' + '='*60)
print('SUMMARY')
print('='*60)

print('\n1. SUBMISSION FORMAT:')
print('   The submission format appears correct.')
print('   Columns: id, index, task, fold, row, target_1, target_2, target_3')
print('   Rows: 1883 (656 single + 1227 full)')

print('\n2. CV-LB RELATIONSHIP:')
print(f'   Linear fit: LB = {reg.coef_[0]:.2f} * CV + {reg.intercept_:.4f}')
print(f'   R-squared = {reg.score(X, y):.4f}')
print(f'   Intercept ({reg.intercept_:.4f}) > Target (0.0347)')

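print('\n3. NEXT STEPS:')
print('   - The checks above found no format problem, so the cause of the evaluation error is still open')
print('   - Need to verify the submission against the exact expected format')
print('   - Improving CV alone cannot reach the target under this fit;')
print('     consider approaches that could change the CV-LB relationship')


============================================================
SUMMARY
============================================================

1. SUBMISSION FORMAT:
   The submission format appears correct.
   Columns: id, index, task, fold, row, target_1, target_2, target_3
   Rows: 1883 (656 single + 1227 full)

2. CV-LB RELATIONSHIP:
   Linear fit: LB = 4.31 * CV + 0.0525
   R-squared = 0.9505
   Intercept (0.0525) > Target (0.0347)

3. NEXT STEPS:
   - The checks above found no format problem, so the cause of the evaluation error is still open
   - Need to verify the submission against the exact expected format
   - Improving CV alone cannot reach the target under this fit;
     consider approaches that could change the CV-LB relationship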
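
One way to carry out the format verification listed under next steps is to compare the submission with a known-good reference frame. This is a sketch only: this notebook mentions no template or sample submission file, so `template`, the function name `compare_to_template`, and the choice of `id` as the key column are all hypothetical.

```python
import pandas as pd

def compare_to_template(sub, template, key_cols=('id',)):
    """List structural differences between a submission and a reference frame.

    Hypothetical helper: `template` stands in for whatever reference file
    defines the expected format; none is referenced in this notebook.
    """
    diffs = []
    if list(sub.columns) != list(template.columns):
        diffs.append(f'columns differ: {list(sub.columns)} vs {list(template.columns)}')
    if len(sub) != len(template):
        diffs.append(f'row count differs: {len(sub)} vs {len(template)}')
    shared = [c for c in template.columns if c in sub.columns]
    for col in shared:
        # Compare dtype kinds (integer, float, object, ...) rather than exact widths
        if sub[col].dtype.kind != template[col].dtype.kind:
            diffs.append(f'{col}: dtype {sub[col].dtype} vs {template[col].dtype}')
    for col in key_cols:
        if col in shared and len(sub) == len(template):
            if not (sub[col].to_numpy() == template[col].to_numpy()).all():
                diffs.append(f'{col}: values or order differ from template')
    return diffs

# Toy frames for illustration only
template = pd.DataFrame({'id': [0, 1, 2], 'target_1': [0.0, 0.0, 0.0]})
shuffled = pd.DataFrame({'id': [0, 2, 1], 'target_1': [0.1, 0.2, 0.3]})
template_diffs = compare_to_template(shuffled, template)
print(template_diffs)
```

An empty list means the two frames agree on column order, row count, dtype kinds, and key values. It does not guarantee that the metric will accept the file.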
