# Loop 9 Analysis: Final Strategy Assessment

## Situation
- Best CV: 0.0093 (exp_007, [32,16] MLP)
- Best LB: 0.0932 (exp_007)
- Target: 0.0333
- Gap to target: 2.8x
- Submissions remaining: 1

## Latest Experiment
- Ridge Regression CV: 0.011509 (24.3% worse than the [32,16] MLP)
- Suggests [32,16] is close to the best capacity among the architectures tested
- A purely linear model underfits: some non-linearity is needed

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# All experiments summary
experiments = {
    'exp_000': {'name': 'Baseline MLP [128,128,64]', 'cv': 0.011081, 'lb': 0.0982, 'params': '~77K'},
    'exp_001': {'name': 'LightGBM', 'cv': 0.012297, 'lb': 0.1065, 'params': 'N/A'},
    'exp_002': {'name': 'DRFP with PCA', 'cv': 0.016948, 'lb': None, 'params': '~100K'},
    'exp_003': {'name': 'Combined [256,128,64]', 'cv': 0.010501, 'lb': 0.0972, 'params': '~77K'},
    'exp_004': {'name': 'Deep Residual (FAILED)', 'cv': 0.051912, 'lb': None, 'params': '~200K'},
    'exp_005': {'name': 'Large Ensemble (15 models)', 'cv': 0.010430, 'lb': 0.0969, 'params': '~77K x 15'},
    'exp_006': {'name': 'Simpler [64,32]', 'cv': 0.009749, 'lb': 0.0946, 'params': '~11K'},
    'exp_007': {'name': 'Even Simpler [32,16]', 'cv': 0.009262, 'lb': 0.0932, 'params': '~5K'},
    'exp_008': {'name': 'Ridge Regression', 'cv': 0.011509, 'lb': None, 'params': '~420 (linear)'}
}

df = pd.DataFrame(experiments).T
print(df[['name', 'cv', 'lb', 'params']])

                               name        cv      lb         params
exp_000   Baseline MLP [128,128,64]  0.011081  0.0982           ~77K
exp_001                    LightGBM  0.012297  0.1065            N/A
exp_002               DRFP with PCA  0.016948    None          ~100K
exp_003       Combined [256,128,64]  0.010501  0.0972           ~77K
exp_004      Deep Residual (FAILED)  0.051912    None          ~200K
exp_005  Large Ensemble (15 models)   0.01043  0.0969      ~77K x 15
exp_006             Simpler [64,32]  0.009749  0.0946           ~11K
exp_007        Even Simpler [32,16]  0.009262  0.0932            ~5K
exp_008            Ridge Regression  0.011509    None  ~420 (linear)


In [2]:
# Analyze the simplification trend
simplification_arc = [
    ('exp_003 [256,128,64]', 0.010501, 0.0972),
    ('exp_006 [64,32]', 0.009749, 0.0946),
    ('exp_007 [32,16]', 0.009262, 0.0932),
    ('exp_008 Ridge', 0.011509, None)
]

print('\n=== SIMPLIFICATION ARC ===')
print('Model                    CV        LB        CV-LB Ratio')
print('-' * 60)
for name, cv, lb in simplification_arc:
    if lb is not None:  # explicit None check; a bare truthiness test would also skip a score of 0.0
        ratio = lb / cv
        print(f'{name:24} {cv:.6f}  {lb:.4f}    {ratio:.2f}x')
    else:
        print(f'{name:24} {cv:.6f}  N/A       N/A')

print('\n=== KEY INSIGHT ===')
print('Ridge Regression (linear) is 24.3% WORSE than [32,16] MLP')
print('This confirms that [32,16] is near the OPTIMAL simplicity level')
print('Some non-linearity (ReLU activations) is NECESSARY')


=== SIMPLIFICATION ARC ===
Model                    CV        LB        CV-LB Ratio
------------------------------------------------------------
exp_003 [256,128,64]     0.010501  0.0972    9.26x
exp_006 [64,32]          0.009749  0.0946    9.70x
exp_007 [32,16]          0.009262  0.0932    10.06x
exp_008 Ridge            0.011509  N/A       N/A

=== KEY INSIGHT ===
Ridge Regression (linear) is 24.3% WORSE than [32,16] MLP
This confirms that [32,16] is near the OPTIMAL simplicity level
Some non-linearity (ReLU activations) is NECESSARY


In [3]:
# CV-LB correlation analysis
submissions = [
    ('exp_000', 0.0111, 0.0982),
    ('exp_001', 0.0123, 0.1065),
    ('exp_003', 0.0105, 0.0972),
    ('exp_005', 0.0104, 0.0969),
    ('exp_006', 0.0097, 0.0946),
    ('exp_007', 0.0093, 0.0932)
]

cv_scores = [s[1] for s in submissions]
lb_scores = [s[2] for s in submissions]
ratios = [lb/cv for cv, lb in zip(cv_scores, lb_scores)]

print('\n=== CV-LB CORRELATION ===')
print('Exp       CV       LB       Ratio')
print('-' * 40)
for exp, cv, lb in submissions:
    print(f'{exp}    {cv:.4f}   {lb:.4f}   {lb/cv:.2f}x')

print(f'\nAverage ratio: {np.mean(ratios):.2f}x')
print(f'Std ratio: {np.std(ratios):.2f}x')

# Correlation
corr = np.corrcoef(cv_scores, lb_scores)[0, 1]
print(f'\nCV-LB Correlation: {corr:.4f}')


=== CV-LB CORRELATION ===
Exp       CV       LB       Ratio
----------------------------------------
exp_000    0.0111   0.0982   8.85x
exp_001    0.0123   0.1065   8.66x
exp_003    0.0105   0.0972   9.26x
exp_005    0.0104   0.0969   9.32x
exp_006    0.0097   0.0946   9.75x
exp_007    0.0093   0.0932   10.02x

Average ratio: 9.31x
Std ratio: 0.47x

CV-LB Correlation: 0.9708
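
The Pearson correlation above measures linear agreement. For model selection, what matters more is whether CV ranks the experiments in the same order as the LB. The sketch below checks this on the same six (CV, LB) pairs from the `submissions` list; the `ranks` helper is a hypothetical name introduced here and assumes there are no tied scores.

```python
import numpy as np

# CV and LB scores for the six submitted experiments (copied from `submissions`).
cv_scores = np.array([0.0111, 0.0123, 0.0105, 0.0104, 0.0097, 0.0093])
lb_scores = np.array([0.0982, 0.1065, 0.0972, 0.0969, 0.0946, 0.0932])


def ranks(values):
    """Return 0-based ranks (0 = lowest score). Assumes no ties."""
    return np.argsort(np.argsort(values))


cv_ranks = ranks(cv_scores)
lb_ranks = ranks(lb_scores)

# Without ties, Spearman correlation equals the Pearson correlation of the ranks.
spearman = np.corrcoef(cv_ranks, lb_ranks)[0, 1]
same_order = bool(np.array_equal(cv_ranks, lb_ranks))

print(f'CV ranks: {cv_ranks}')
print(f'LB ranks: {lb_ranks}')
print(f'Spearman rank correlation: {spearman:.4f}')
print(f'Identical ordering: {same_order}')
```

On these six submissions the two orderings are identical, so a lower CV has always meant a lower LB so far. With only six points this is encouraging evidence rather than a guarantee.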


In [4]:
# What would it take to beat the target? (Assumes the LB/CV ratio stays at its average.)
target = 0.0333
best_lb = 0.0932
best_cv = 0.0093
avg_ratio = np.mean(ratios)

print('\n=== TARGET ANALYSIS ===')
print(f'Target LB: {target}')
print(f'Best LB: {best_lb}')
print(f'Gap: {best_lb/target:.2f}x ({(best_lb - target)/target * 100:.1f}% worse)')
print(f'\nTo beat target {target} with avg ratio {avg_ratio:.2f}x:')
required_cv = target / avg_ratio
print(f'Would need CV: {required_cv:.6f}')
print(f'Current best CV: {best_cv:.6f}')
print(f'Required improvement: {(best_cv - required_cv)/best_cv * 100:.1f}%')


=== TARGET ANALYSIS ===
Target LB: 0.0333
Best LB: 0.0932
Gap: 2.80x (179.9% worse)

To beat target 0.0333 with avg ratio 9.31x:
Would need CV: 0.003577
Current best CV: 0.009300
Required improvement: 61.5%
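
The estimate above assumes the LB score is a fixed multiple of the CV score. The per-submission ratios drift upward (from 8.66x to 10.02x) as CV improves, which hints that LB may contain a component that does not shrink with CV. One possible cross-check, sketched here under the assumption that a straight line with an intercept is a reasonable model over this range, is an ordinary least-squares fit of LB on CV.

```python
import numpy as np

# Same six (CV, LB) pairs as in the correlation analysis.
cv_scores = np.array([0.0111, 0.0123, 0.0105, 0.0104, 0.0097, 0.0093])
lb_scores = np.array([0.0982, 0.1065, 0.0972, 0.0969, 0.0946, 0.0932])
target = 0.0333

# Least-squares line: LB ~= slope * CV + intercept.
slope, intercept = np.polyfit(cv_scores, lb_scores, deg=1)

# CV value that the fitted line maps onto the target LB.
required_cv = (target - intercept) / slope

print(f'Fit: LB ~= {slope:.3f} * CV + {intercept:.4f}')
print(f'CV required for LB {target}: {required_cv:.6f}')
if intercept > target:
    print('Fitted intercept exceeds the target: no non-negative CV reaches it under this fit.')
```

On these points the fitted intercept is itself above 0.0333, so under the linear model no CV improvement alone would reach the target. This extrapolates far outside the observed CV range from only six points, so it should be read as a sanity check on the constant-ratio estimate, not as a precise forecast.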


In [5]:
# Final decision analysis
print('\n' + '='*70)
print('FINAL DECISION ANALYSIS')
print('='*70)

print('\n1. WHAT WE KNOW:')
print('   - [32,16] MLP is the optimal architecture (CV 0.0093, LB 0.0932)')
print('   - Simpler (Ridge) is worse: CV 0.0115 (+24%)')
print('   - More complex is worse: [256,128,64] CV 0.0105 (+13%)')
print('   - CV-LB correlation is 0.97 (very strong)')
print('   - CV-LB ratio is ~9x consistently')

print('\n2. WHAT WE CANNOT DO:')
print('   - Beat target 0.0333 with current approach (need 64% improvement)')
print('   - GNN/Transformer would require significant code changes')
print('   - Competition template limits model flexibility')

print('\n3. REMAINING OPTIONS:')
print('   a) Submit exp_007 as final (already submitted, LB 0.0932)')
print('   b) Try ensemble of [32,16] + Ridge for diversity')
print('   c) Try [16] single hidden layer (between [32,16] and Ridge)')
print('   d) Try hyperparameter tuning on [32,16]')

print('\n4. RECOMMENDATION:')
print('   With 1 submission remaining, the safest option is:')
print('   - exp_007 is already our best LB (0.0932)')
print('   - Any new experiment risks being worse')
print('   - The simplification arc is COMPLETE')
print('   - [32,16] is the sweet spot between underfitting and overfitting')


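======================================================================
FINAL DECISION ANALYSIS
======================================================================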

1. WHAT WE KNOW:
   - [32,16] MLP is the optimal architecture (CV 0.0093, LB 0.0932)
   - Simpler (Ridge) is worse: CV 0.0115 (+24%)
   - More complex is worse: [256,128,64] CV 0.0105 (+13%)
   - CV-LB correlation is 0.97 (very strong)
   - CV-LB ratio is ~9x consistently

2. WHAT WE CANNOT DO:
   - Beat target 0.0333 with current approach (need 64% improvement)
   - GNN/Transformer would require significant code changes
   - Competition template limits model flexibility

3. REMAINING OPTIONS:
   a) Submit exp_007 as final (already submitted, LB 0.0932)
   b) Try ensemble of [32,16] + Ridge for diversity
   c) Try [16] single hidden layer (between [32,16] and Ridge)
   d) Try hyperparameter tuning on [32,16]

4. RECOMMENDATION:
   With 1 submission remaining, the safest option is:
   - exp_007 is already our best LB (0.0932)
   - Any new experiment risks being worse
   - The simplification arc is COMPLETE
   - [32,16] is the sweet spot between underfitting and overfitting

In [6]:
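# Potential ensemble analysis
# Caveat: averaging the two CV scores is only a rough guide. For a loss that is
# convex in the predictions (e.g. MSE or MAE), that average is an upper bound on
# the score of an averaged-prediction ensemble, not an estimate of it.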
print('\n=== POTENTIAL ENSEMBLE ANALYSIS ===')
print('\nIf we ensemble [32,16] MLP with Ridge:')
print('- [32,16] MLP CV: 0.009262')
print('- Ridge CV: 0.011509')
print('- Simple average would be: ~0.0104 (WORSE than [32,16] alone)')
print('\nEnsembling with a worse model typically hurts performance.')
print('This is NOT recommended.')

print('\n=== ALTERNATIVE: Try [16] single hidden layer ===')
print('- Between [32,16] and Ridge')
print('- May find a better sweet spot')
print('- Risk: Could be worse than [32,16]')
print('- Potential: Small chance of improvement')

print('\n=== FINAL VERDICT ===')
print('The best strategy is to KEEP exp_007 as our final submission.')
print('- It achieved the best LB score (0.0932)')
print('- The simplification arc is complete')
print('- Any further experiments risk being worse')
print('- The target (0.0333) is likely unachievable with tabular methods')


=== POTENTIAL ENSEMBLE ANALYSIS ===

If we ensemble [32,16] MLP with Ridge:
- [32,16] MLP CV: 0.009262
- Ridge CV: 0.011509
- Simple average would be: ~0.0104 (WORSE than [32,16] alone)

Ensembling with a worse model typically hurts performance.
This is NOT recommended.

=== ALTERNATIVE: Try [16] single hidden layer ===
- Between [32,16] and Ridge
- May find a better sweet spot
- Risk: Could be worse than [32,16]
- Potential: Small chance of improvement

=== FINAL VERDICT ===
The best strategy is to KEEP exp_007 as our final submission.
- It achieved the best LB score (0.0932)
- The simplification arc is complete
- Any further experiments risk being worse
- The target (0.0333) is likely unachievable with tabular methods
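
The ensemble analysis estimates the combined score by averaging the two CV scores. For averaged predictions, the actual error depends on how correlated the two models' errors are. The simulation below illustrates this on purely synthetic Gaussian errors, assuming the metric is MSE (the competition metric is not stated here). The function name, noise levels, and correlation values are all illustrative assumptions, not measurements from exp_007 or exp_008.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # number of synthetic samples


def mse_of_pair(rho, sigma_a=1.0, sigma_b=1.1):
    """Simulate two models whose errors have correlation `rho`.

    Returns (mse_a, mse_b, mse_ensemble), where the ensemble averages
    the two predictions and therefore averages the two errors.
    """
    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)
    err_a = sigma_a * z1
    err_b = sigma_b * (rho * z1 + np.sqrt(1.0 - rho**2) * z2)
    err_ens = 0.5 * (err_a + err_b)
    return np.mean(err_a**2), np.mean(err_b**2), np.mean(err_ens**2)


# Model B is the weaker model (larger error scale), like Ridge versus the MLP.
results = {rho: mse_of_pair(rho) for rho in (0.0, 0.5, 1.0)}

for rho, (mse_a, mse_b, mse_ens) in results.items():
    avg_of_scores = 0.5 * (mse_a + mse_b)
    print(f'rho={rho:.1f}  A={mse_a:.3f}  B={mse_b:.3f}  '
          f'ensemble={mse_ens:.3f}  avg of scores={avg_of_scores:.3f}')
```

In this toy setting the ensemble beats the stronger model when errors are uncorrelated and loses to it when they are perfectly correlated, while never exceeding the average of the two scores. Whether a [32,16] + Ridge blend helps therefore depends on the error correlation between the two models, which could be measured offline on out-of-fold predictions without spending the remaining submission.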
