# Loop 19 Strategic Analysis

## Key Findings from Evaluator
1. exp_007 has CORRUPTED DATA (NaN values for N=24) - the 0.348 'improvement' was never real
2. The GA experiment found 0 improvements - the baseline is at an extremely strong local optimum
3. We need MORE external data sources (top kernels use 17-19, we have ~8)
4. The prebuilt bbox3 binary has GLIBC issues - we need to compile it from source

In [1]:
import pandas as pd
import numpy as np
import os
import glob

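# Verify exp_007 corruption
print('=== Verifying exp_007 Corruption ===')
df = pd.read_csv('/home/code/experiments/007_ensemble_fractional/submission.csv')
# Row ids look like '024_0': puzzle size N, then tree index
df['N'] = df['id'].str.split('_').str[0].astype(int)

# Values are stored as strings with an 's' prefix, so a NaN shows up as 'snan'.
# Only the x column is checked, and the loop stops at the first corrupted N.
for n in range(1, 201):
    group = df[df['N'] == n]
    x_vals = group['x'].astype(str)
    if x_vals.str.contains('nan', case=False).any():
        print(f'N={n}: CORRUPTED with NaN values!')
        print(group[['id', 'x', 'y', 'deg']].head(3))
        break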

=== Verifying exp_007 Corruption ===
N=24: CORRUPTED with NaN values!
        id     x                        y                        deg
276  024_0  snan  s2.80374906259153888755  s113.62937398705318514658
277  024_1  snan  s2.80374906259153888755  s113.62937398705318514658
278  024_2  snan  s2.80374906259153888755  s113.62937398705318514658
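The cell above stops at the first corrupted group (and checks only `x`), so it cannot show whether N=24 is the only damaged size. A minimal sketch that lists every affected N across all three columns follows. It assumes, based on the printed rows, that each value is a string with a leading `s` marker and that `id` has the form `NNN_i`; `find_corrupted_groups` is a hypothetical helper, not part of the experiment code.

```python
import numpy as np
import pandas as pd


def find_corrupted_groups(df: pd.DataFrame, cols=("x", "y", "deg")) -> list[int]:
    """Return every puzzle size N whose rows contain a non-finite value.

    Hypothetical helper. Assumes values look like 's2.8037...' (leading 's'
    marker) and ids look like '024_0' (N, then tree index).
    """
    n = df["id"].str.split("_").str[0].astype(int)
    bad = pd.Series(False, index=df.index)
    for col in cols:
        # Drop the leading 's' marker, then parse; unparseable text becomes NaN.
        values = pd.to_numeric(df[col].astype(str).str.lstrip("s"), errors="coerce")
        # NaN and +/-inf are both treated as corruption.
        bad |= ~np.isfinite(values)
    return sorted(n[bad].unique().tolist())
```

Calling `find_corrupted_groups(pd.read_csv(path))` on the exp_007 submission would give the full list of sizes to discard before any rescoring.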


In [2]:
# Count external data sources we have
print('\n=== External Data Sources Available ===')

# Every CSV under any snapshot's code/ tree (these are file paths, not directories)
csv_files = glob.glob('/home/nonroot/snapshots/santa-2025/*/code/**/*.csv', recursive=True)
print(f'Total CSV files in snapshots: {len(csv_files)}')

# Group by source: the first keyword found in the lower-cased path wins
keywords = {'bucket': 'bucket-of-chump', 'saspav': 'saspav',
            'telegram': 'telegram', 'chistyakov': 'chistyakov'}
sources = {}
for f in csv_files:
    for kw, name in keywords.items():
        if kw in f.lower():
            sources[name] = sources.get(name, 0) + 1
            break

print('\nIdentified sources:')
for src, count in sorted(sources.items()):
    print(f'  {src}: {count} files')


=== External Data Sources Available ===
Total CSV files in snapshots: 3618

Identified sources:
  bucket-of-chump: 40 files
  chistyakov: 45 files
  saspav: 104 files
  telegram: 120 files
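Only 309 of the 3,618 CSV files (40 + 45 + 104 + 120) match one of the four hard-coded keywords, so the keyword count may understate what is already on disk. One way to see the rest is to tally CSVs by their parent directory. This is a sketch under an unverified assumption: that each source is unpacked into a directory whose name identifies it. `count_csvs_by_parent` is a hypothetical helper.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_csvs_by_parent(paths: list[str]) -> Counter:
    """Count CSV files per immediate parent directory name.

    Hypothetical helper. Assumes (unverified) that each dataset lives in a
    directory whose name identifies its source.
    """
    return Counter(
        PurePosixPath(p).parent.name for p in paths if p.lower().endswith(".csv")
    )
```

Passing it the glob result from the cell above and printing `.most_common(30)` would show the largest groups that the keyword scan never labels.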


In [3]:
# Compare to jonathanchan kernel sources (17-19 sources)
print('\n=== Sources Used by Top Kernels (jonathanchan) ===')
top_sources = [
    'bucket-of-chump',
    'SmartManoj/Santa-Scoreboard',
    'santa-2025-try3',
    'santa25-public',
    'telegram-public-shared-solution-for-santa-2025',
    'santa-2025-simple-optimization-new-slow-version',
    'santa25-improved-sa-with-translations',
    'santa2025-ver2',
    'santa-submission',
    'santa25-simulated-annealing-with-translations',
    'santa-2025-fix-direction',
    '72-71-santa-2025-jit-parallel-sa-c',
    'santa-claude',
    'blending-multiple-oplimisation',
    'santa2025-just-keep-on-trying',
    'decent-starting-solution',
    'why-not',
]

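print(f'Top kernels use {len(top_sources)} sources:')
for src in top_sources:
    # NOTE: tests whether the full slug is a substring of one of the four short keys
    # above, so partial matches (e.g. 'telegram' vs the full telegram slug) show ❌
    have = '✅' if any(src.lower() in s.lower() for s in sources.keys()) else '❌'
    print(f'  {have} {src}')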


=== Sources Used by Top Kernels (jonathanchan) ===
Top kernels use 17 sources:
  ✅ bucket-of-chump
  ❌ SmartManoj/Santa-Scoreboard
  ❌ santa-2025-try3
  ❌ santa25-public
  ❌ telegram-public-shared-solution-for-santa-2025
  ❌ santa-2025-simple-optimization-new-slow-version
  ❌ santa25-improved-sa-with-translations
  ❌ santa2025-ver2
  ❌ santa-submission
  ❌ santa25-simulated-annealing-with-translations
  ❌ santa-2025-fix-direction
  ❌ 72-71-santa-2025-jit-parallel-sa-c
  ❌ santa-claude
  ❌ blending-multiple-oplimisation
  ❌ santa2025-just-keep-on-trying
  ❌ decent-starting-solution
  ❌ why-not
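The ✅/❌ test above checks whether each full slug is a substring of one of the four short keys, so `telegram-public-shared-solution-for-santa-2025` is reported missing even though 120 `telegram` files were counted. A sketch of a stricter check matches each slug against the CSV paths themselves. It assumes a downloaded source leaves its slug (or, for `owner/name` entries such as `SmartManoj/Santa-Scoreboard`, the `name` part) somewhere in the path; `split_by_presence` is a hypothetical helper.

```python
def split_by_presence(slugs: list[str], paths: list[str]) -> tuple[list[str], list[str]]:
    """Partition slugs into (present, missing) by substring match on file paths.

    Hypothetical helper. Only the last '/'-separated part of each slug is used,
    so 'owner/name' matches on 'name'. Matching is case-insensitive.
    """
    lowered = [p.lower() for p in paths]
    present, missing = [], []
    for slug in slugs:
        key = slug.lower().rsplit("/", 1)[-1]
        (present if any(key in p for p in lowered) else missing).append(slug)
    return present, missing
```

Short, generic slugs such as `why-not` or `santa-submission` can match unrelated paths, so any hit from this check still deserves a manual look before a source is crossed off the download list.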


In [4]:
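import math

# Current best scores (derived numbers are computed rather than hard-coded)
best_lb, target, rate = 70.353516, 68.877877, 0.01  # rate: approx. LB gain per experiment
gap = best_lb - target
print('\n=== Current Score Status ===')
print(f'Best LB: {best_lb:.6f} (exp_016)')
print(f'Target: {target:.6f}')
print(f'Gap: {gap:.3f} points ({gap / best_lb:.1%})')
print()
print(f'At current improvement rate (~{rate}/exp), need {math.ceil(gap / rate)} more experiments')
print('This is NOT sustainable - need breakthrough approach')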
print()
print('=== What Has Been Tried (All Failed) ===')
failed_approaches = [
    'Simulated Annealing (SA)',
    'Genetic Algorithm (GA)',
    'Exhaustive search for N=2',
    'No-Fit Polygon (NFP) placement',
    'Backward propagation (N to N-1)',
    'Multi-start random initialization',
    'Fractional translation',
]
for approach in failed_approaches:
    print(f'  ❌ {approach}')

print()
print('=== What Worked ===')
worked = [
    'Ensemble from multiple external sources (+0.25 points)',
    'MIN_IMPROVEMENT=0.001 threshold (prevents overlap failures)',
]
for approach in worked:
    print(f'  ✅ {approach}')


=== Current Score Status ===
Best LB: 70.353516 (exp_016)
Target: 68.877877
Gap: 1.476 points (2.1%)

At current improvement rate (~0.01/exp), need 148 more experiments
This is NOT sustainable - need breakthrough approach

=== What Has Been Tried (All Failed) ===
  ❌ Simulated Annealing (SA)
  ❌ Genetic Algorithm (GA)
  ❌ Exhaustive search for N=2
  ❌ No-Fit Polygon (NFP) placement
  ❌ Backward propagation (N to N-1)
  ❌ Multi-start random initialization
  ❌ Fractional translation

=== What Worked ===
  ✅ Ensemble from multiple external sources (+0.25 points)
  ✅ MIN_IMPROVEMENT=0.001 threshold (prevents overlap failures)
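The two levers that worked, per-N ensembling across external sources and the `MIN_IMPROVEMENT=0.001` threshold, combine naturally into one selection rule. A minimal sketch, assuming per-N scores have already been computed for the baseline and for each source (lower is better) and that the threshold is an absolute per-N score margin; `pick_per_n` and its data layout are illustrative, not the exp_016 code.

```python
MIN_IMPROVEMENT = 0.001  # threshold value quoted in the notes above


def pick_per_n(baseline: dict[int, float],
               candidates: dict[str, dict[int, float]],
               min_improvement: float = MIN_IMPROVEMENT) -> dict[int, str]:
    """Choose, for each N, which source to take its layout from.

    Hypothetical helper. Lower scores are better. The best candidate for N is
    used only if it beats the baseline by more than min_improvement.
    Returns a mapping N -> source name ('baseline' if nothing qualifies).
    """
    choice = {}
    for n, base_score in baseline.items():
        # Every source that actually has a layout for this N.
        offers = [(scores[n], src) for src, scores in candidates.items() if n in scores]
        choice[n] = "baseline"
        if offers:
            best_score, best_src = min(offers)
            if best_score < base_score - min_improvement:
                choice[n] = best_src
    return choice
```

Requiring a margin means a source that is only marginally better for some N is ignored; per the notes above, that filter is what avoided overlap failures in practice.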


In [5]:
# Key insight: The path forward
print('\n=== PATH FORWARD ===')
print()
print('1. MORE EXTERNAL DATA SOURCES (PRIMARY LEVER)')
print('   - Download ALL missing datasets from jonathanchan list')
print('   - Each new source could have better solutions for some N values')
print('   - Expected gain: 0.1-0.5 points')
print()
print('2. COMPILE bbox3 FROM SOURCE (SECONDARY LEVER)')
print('   - bbox3.cpp exists in experiments folder')
print('   - Compile with: g++ -O3 -march=native -std=c++17 -o bbox3_local bbox3.cpp')
print('   - Run for extended periods (hours, not minutes)')
print('   - Expected gain: 0.05-0.2 points')
print()
print('3. TRY C++ OPTIMIZER FROM JONATHANCHAN (TERTIARY LEVER)')
print('   - sa_v1_parallel.cpp with population-based search')
print('   - Uses basin hopping + fractional translation')
print('   - More sophisticated than our SA')
print()
print('CRITICAL: Stop trying new Python optimization algorithms!')
print('The baseline is at an EXTREMELY strong local optimum.')


=== PATH FORWARD ===

1. MORE EXTERNAL DATA SOURCES (PRIMARY LEVER)
   - Download ALL missing datasets from jonathanchan list
   - Each new source could have better solutions for some N values
   - Expected gain: 0.1-0.5 points

2. COMPILE bbox3 FROM SOURCE (SECONDARY LEVER)
   - bbox3.cpp exists in experiments folder
   - Compile with: g++ -O3 -march=native -std=c++17 -o bbox3_local bbox3.cpp
   - Run for extended periods (hours, not minutes)
   - Expected gain: 0.05-0.2 points

3. TRY C++ OPTIMIZER FROM JONATHANCHAN (TERTIARY LEVER)
   - sa_v1_parallel.cpp with population-based search
   - Uses basin hopping + fractional translation
   - More sophisticated than our SA

CRITICAL: Stop trying new Python optimization algorithms!
The baseline is at an EXTREMELY strong local optimum.
