# Loop 4 Analysis: Understanding the Gap

## Key Findings:
1. Best VALID score: 70.615745 (ensemble_strict.csv) - 0.060 better than baseline
2. Target: 68.891380 - Gap: 1.72 points (2.50%)
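3. The eazy optimizer introduces tiny coordinate changes that can create overlaps: 004_eazy_long_run was rejected by Kaggle ("Overlapping trees in group 013")
4. Many snapshot solutions with 'better' scores contain overlapping trees and are therefore INVALID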

In [4]:
import pandas as pd
import numpy as np
import json
from pathlib import Path

# Load session state
with open('/home/code/session_state.json') as f:
    state = json.load(f)

print('=== Experiment History ===')
for exp in state['experiments']:
    print(f"{exp['name']}: CV={exp['score']:.6f}")

print('\n=== Submission History ===')
for sub in state['submissions']:
    print(f"{sub['model_name']}: CV={sub['cv_score']}, LB={sub['lb_score']}, Error={sub.get('error', 'None')}")

print('\n=== Key Findings ===')
for finding in state['data_findings'][-5:]:
    print(f"- {finding['finding'][:100]}...")

=== Experiment History ===
001_baseline: CV=70.676102
002_tessellation_attempts: CV=70.676102
003_eazy_optimizer: CV=70.676059
004_eazy_long_run: CV=70.675672

=== Submission History ===
001_baseline: CV=70.676102, LB=70.676102398091, Error=None
003_eazy_optimizer: CV=70.676059, LB=70.676059085435, Error=None
004_eazy_long_run: CV=70.675672, LB=, Error=Overlapping trees in group 013

=== Key Findings ===
- All external CSV sources (telegram, santa25-public, bucket-of-chump, chistyakov) have scores WORSE t...
- Key techniques to escape local optima in polygon packing: 1) No-fit polygon (NFP) for O(1) overlap c...
- CRITICAL: The eazy C++ optimizer introduces tiny coordinate changes that cause overlaps Kaggle detec...
- Strict ensemble (70.615745) is 0.059 points better than baseline (70.676102) with NO overlaps. This ...
- Small N values have worst packing efficiency: N=1 (37.1%), N=2 (54.5%), N=3 (56.5%), N=4-5 (~59%), N...
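
The overlap rejection in the submission history makes a local validity check worth running before any submission. Below is a minimal pure-Python sketch that reuses the TX/TY tree outline from the scoring cell. Its semantics are an assumption: boundaries that only touch are allowed, while crossing edges or any point strictly inside another tree count as an overlap. Kaggle's exact rule and tolerance may differ, and the function names and the `eps` tolerance are hypothetical.

```python
import math
from itertools import combinations

# Tree outline, same vertices as TX/TY in the scoring cell
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]
INTERIOR = (0.0, 0.1)  # a point strictly inside the tree (bottom tier); catches coincident trees


def place(x, y, deg, pts):
    """Rotate local points by deg degrees counter-clockwise, then translate by (x, y)."""
    r = math.radians(deg)
    c, s = math.cos(r), math.sin(r)
    return [(c * px - s * py + x, s * px + c * py + y) for px, py in pts]


def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def opposite(a, b, eps):
    return (a > eps and b < -eps) or (a < -eps and b > eps)


def segments_cross(p1, p2, q1, q2, eps):
    """True only for a proper crossing; shared endpoints and collinear contact do not count."""
    return (opposite(cross(q1, q2, p1), cross(q1, q2, p2), eps)
            and opposite(cross(p1, p2, q1), cross(p1, p2, q2), eps))


def seg_dist(p, a, b):
    """Euclidean distance from point p to segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - a[0] - t * dx, p[1] - a[1] - t * dy)


def strictly_inside(p, poly, eps):
    """Even-odd ray casting; points within eps of the boundary are treated as outside."""
    n = len(poly)
    if min(seg_dist(p, poly[i], poly[(i + 1) % n]) for i in range(n)) <= eps:
        return False
    inside = False
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > p[1]) != (y2 > p[1]):
            if p[0] < x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def find_overlaps(xs, ys, degs, eps=1e-12):
    """Return index pairs (i, j) of trees whose interiors appear to intersect."""
    local = list(zip(TX, TY))
    polys = [place(x, y, d, local) for x, y, d in zip(xs, ys, degs)]
    inner = [place(x, y, d, [INTERIOR])[0] for x, y, d in zip(xs, ys, degs)]
    boxes = [(min(p[0] for p in P), min(p[1] for p in P),
              max(p[0] for p in P), max(p[1] for p in P)) for P in polys]
    bad = []
    for i, j in combinations(range(len(polys)), 2):
        a, b = boxes[i], boxes[j]
        if a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]:
            continue  # bounding boxes are disjoint, so no overlap is possible
        A, B = polys[i], polys[j]
        ea = list(zip(A, A[1:] + A[:1]))
        eb = list(zip(B, B[1:] + B[:1]))
        if (any(segments_cross(p1, p2, q1, q2, eps) for p1, p2 in ea for q1, q2 in eb)
                or any(strictly_inside(p, B, eps) for p in A + [inner[i]])
                or any(strictly_inside(p, A, eps) for p in B + [inner[j]])):
            bad.append((i, j))
    return bad
```

For example, `find_overlaps([0, 0.69, 3], [0, 0, 0], [0, 0, 0])` flags the first two trees, whose bottom tiers overlap by 0.01. Running such a check on every group before submitting would catch problems like the one in group 013 locally instead of through a rejected submission.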


In [5]:
# Analyze the gap between current best and target
current_best = 70.615745  # ensemble_strict
target = 68.891380
baseline = 70.676102

print(f'Current best (ensemble_strict): {current_best:.6f}')
print(f'Baseline: {baseline:.6f}')
print(f'Target: {target:.6f}')
print(f'Gap to target: {current_best - target:.6f} ({(current_best - target) / target * 100:.2f}%)')
print(f'Improvement from baseline: {baseline - current_best:.6f}')

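# Lower bound: N non-overlapping trees fit in a square of side s only if s^2 >= N * tree_area,
# so every term s^2 / N is at least tree_area (loose: perfect packing is not achievable)
tree_area = 0.245625  # area of one tree polygon
theoretical_min = sum(tree_area for _ in range(1, 201))  # N = 1..200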
print(f'\nTheoretical minimum (perfect packing): {theoretical_min:.2f}')
print(f'Gap from theoretical: {current_best - theoretical_min:.2f}')

Current best (ensemble_strict): 70.615745
Baseline: 70.676102
Target: 68.891380
Gap to target: 1.724365 (2.50%)
Improvement from baseline: 0.060357

Theoretical minimum (perfect packing): 49.12
Gap from theoretical: 21.49
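
You can check the `tree_area` constant against the tree outline with the shoelace formula. The sketch below copies the TX/TY vertices from the scoring cell, and `shoelace_area` is a hypothetical helper name. The resulting bound is loose because perfect packing of this irregular shape is not achievable; for N=1 the efficiency is only about 37%.

```python
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]


def shoelace_area(xs, ys):
    """Area of a simple polygon from its vertex coordinates (shoelace formula)."""
    total = 0.0
    for i in range(len(xs)):
        j = (i + 1) % len(xs)
        total += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(total) / 2.0


tree_area = shoelace_area(TX, TY)
lower_bound = 200 * tree_area  # sum over N = 1..200 of tree_area
print(f'Tree area: {tree_area:.6f}')     # Tree area: 0.245625
print(f'Lower bound: {lower_bound:.4f}')  # Lower bound: 49.1250
```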


In [6]:
# Load and analyze the ensemble_strict.csv to understand where improvements came from
import math

TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]

# Score of one group: (side of the axis-aligned bounding square)^2 / N
def score_group(xs, ys, degs):
    n = len(xs)
    V = len(TX)
    mnx = mny = 1e300
    mxx = mxy = -1e300
    for i in range(n):
        r = degs[i] * math.pi / 180.0
        c, s = math.cos(r), math.sin(r)
        for j in range(V):
            X = c * TX[j] - s * TY[j] + xs[i]
            Y = s * TX[j] + c * TY[j] + ys[i]
            mnx, mxx = min(mnx, X), max(mxx, X)
            mny, mxy = min(mny, Y), max(mxy, Y)
    side = max(mxx - mnx, mxy - mny)
    return side * side / n

# CSV values carry an 's' prefix; remove it before converting to float
def strip(a):
    return np.array([float(str(v).replace('s', '')) for v in a], np.float64)

# Load baseline and ensemble
baseline_df = pd.read_csv('/home/code/experiments/003_long_sa/submission_best.csv')
ensemble_df = pd.read_csv('/home/code/experiments/005_ensemble/submission_ensemble_strict.csv')

baseline_df['N'] = baseline_df['id'].str.split('_').str[0].astype(int)
ensemble_df['N'] = ensemble_df['id'].str.split('_').str[0].astype(int)

improvements = []
for n in range(1, 201):
    b_g = baseline_df[baseline_df['N'] == n]
    e_g = ensemble_df[ensemble_df['N'] == n]
    
    b_score = score_group(strip(b_g['x']), strip(b_g['y']), strip(b_g['deg']))
    e_score = score_group(strip(e_g['x']), strip(e_g['y']), strip(e_g['deg']))
    
    if e_score < b_score - 1e-8:
        improvements.append({'N': n, 'baseline': b_score, 'ensemble': e_score, 'improvement': b_score - e_score})

print(f'N values improved: {len(improvements)}')
if improvements:
    imp_df = pd.DataFrame(improvements)
    print(f'Total improvement: {imp_df["improvement"].sum():.6f}')
    print('\nTop 10 improvements:')
    print(imp_df.nlargest(10, 'improvement').to_string())

N values improved: 157
Total improvement: 0.060358

Top 10 improvements:
      N  baseline  ensemble  improvement
22   54  0.361321  0.356260     0.005060
25   57  0.358045  0.353509     0.004536
50   87  0.353823  0.349960     0.003864
51   88  0.350672  0.347501     0.003171
86  128  0.343762  0.340751     0.003011
16   43  0.370040  0.367065     0.002975
57   94  0.352274  0.349956     0.002318
3    15  0.379203  0.376949     0.002254
32   65  0.363795  0.361611     0.002184
62  100  0.345531  0.343395     0.002136
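
The per-N comparison shows that the ensemble's gain comes from swapping in a better configuration group by group. Below is a minimal sketch of how such a "strict" per-N ensemble could work. It assumes that ensemble_strict keeps, for each N, the lowest-scoring configuration that passes a validity check; the actual procedure behind ensemble_strict.csv is not shown in this notebook. `ensemble_strict` and `is_valid` are hypothetical names, and `score_group` mirrors the notebook's function.

```python
import math

TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]


def score_group(xs, ys, degs):
    """side^2 / n for the axis-aligned bounding square of one group."""
    mnx = mny = math.inf
    mxx = mxy = -math.inf
    for x, y, d in zip(xs, ys, degs):
        r = math.radians(d)
        c, s = math.cos(r), math.sin(r)
        for tx, ty in zip(TX, TY):
            X, Y = c * tx - s * ty + x, s * tx + c * ty + y
            mnx, mxx = min(mnx, X), max(mxx, X)
            mny, mxy = min(mny, Y), max(mxy, Y)
    side = max(mxx - mnx, mxy - mny)
    return side * side / len(xs)


def ensemble_strict(candidates, is_valid):
    """Pick, for every N, the lowest-scoring configuration that passes is_valid.

    candidates: {source_name: {n: (xs, ys, degs)}}
    is_valid:   callable(xs, ys, degs) -> bool, e.g. an overlap checker
    Returns {n: (source_name, score)}.
    """
    best = {}
    for source, groups in candidates.items():
        for n, (xs, ys, degs) in groups.items():
            if len(xs) != n or not is_valid(xs, ys, degs):
                continue  # wrong tree count or overlapping: never eligible
            score = score_group(xs, ys, degs)
            if n not in best or score < best[n][1] - 1e-12:
                best[n] = (source, score)
    return best
```

This sketch filters for validity inside the selection rather than afterwards. As a result, an invalid low score from one source can never displace a valid configuration from another.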


In [7]:
# Check what approaches we haven't tried yet
print('=== Approaches Tried ===')
print('1. Baseline from pre-optimized CSV - 70.676102')
print('2. Tessellation for large N - WORSE than baseline')
print('3. bbox3 C++ optimizer - NO improvement')
print('4. eazy C++ optimizer - Small improvement but CORRUPTS coordinates')
print('5. Ensemble from multiple sources - 70.615745 (BEST so far)')

print('\n=== Approaches NOT Tried ===')
print('1. Backward propagation (deletion cascade) - chistyakov kernel')
print('2. Long-running SA (hours, not minutes) - jonathanchan runs 80+ rounds')
print('3. Genetic algorithm with crossover')
print('4. Random restarts with different initial configurations')
print('5. No-fit polygon (NFP) precomputation')
print('6. Branch-and-bound for small N (N=1-20)')
print('7. Constraint programming')

=== Approaches Tried ===
1. Baseline from pre-optimized CSV - 70.676102
2. Tessellation for large N - WORSE than baseline
3. bbox3 C++ optimizer - NO improvement
4. eazy C++ optimizer - Small improvement but CORRUPTS coordinates
5. Ensemble from multiple sources - 70.615745 (BEST so far)

=== Approaches NOT Tried ===
1. Backward propagation (deletion cascade) - chistyakov kernel
2. Long-running SA (hours, not minutes) - jonathanchan runs 80+ rounds
3. Genetic algorithm with crossover
4. Random restarts with different initial configurations
5. No-fit polygon (NFP) precomputation
6. Branch-and-bound for small N (N=1-20)
7. Constraint programming
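
Backward propagation (deletion cascade) is listed as untried. Below is a sketch of one way it might work. This is an assumption, not the chistyakov kernel's code. The idea is to walk N from large to small and try deleting each tree from the N-tree configuration. If the best deletion yields a smaller bounding square than the current (N-1)-tree configuration, it replaces that configuration, and the improved N-1 then feeds N-2. Deleting a tree cannot create an overlap, so a valid input stays valid. `configs` maps N to a list of `(x, y, deg)` tuples, and all names here are hypothetical.

```python
import math

TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]


def tree_bbox(x, y, deg):
    """Axis-aligned bounding box (minx, miny, maxx, maxy) of one placed tree."""
    r = math.radians(deg)
    c, s = math.cos(r), math.sin(r)
    X = [c * tx - s * ty + x for tx, ty in zip(TX, TY)]
    Y = [s * tx + c * ty + y for tx, ty in zip(TX, TY)]
    return min(X), min(Y), max(X), max(Y)


def square_side(boxes):
    """Side of the bounding square enclosing all given tree boxes."""
    mnx = min(b[0] for b in boxes)
    mny = min(b[1] for b in boxes)
    mxx = max(b[2] for b in boxes)
    mxy = max(b[3] for b in boxes)
    return max(mxx - mnx, mxy - mny)


def backward_propagate(configs, tol=1e-12):
    """Deletion cascade: try to improve configs[n-1] by removing one tree from configs[n]."""
    configs = {n: list(trees) for n, trees in configs.items()}  # work on a copy
    improved = []
    for n in sorted(configs, reverse=True):
        if n < 2 or n - 1 not in configs:
            continue
        trees = configs[n]  # may already have been improved by the step for n + 1
        boxes = [tree_bbox(*t) for t in trees]
        best_side = square_side([tree_bbox(*t) for t in configs[n - 1]])
        best_i = None
        for i in range(n):
            side = square_side(boxes[:i] + boxes[i + 1:])
            if side < best_side - tol:
                best_side, best_i = side, i
        if best_i is not None:
            # For a fixed tree count, a smaller side means a smaller score side^2 / (n - 1)
            configs[n - 1] = trees[:best_i] + trees[best_i + 1:]
            improved.append(n - 1)
    return configs, improved
```

Each pass costs O(N^2) box comparisons per N, which is cheap enough to run repeatedly alongside a slower optimizer that produces new large-N configurations.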


In [8]:
# The key insight: we need to find DIFFERENT configurations, not optimize existing ones
# ensemble_strict improved 157 of the 200 N values using configs from other sources
# This proves that better configs EXIST - we just need to find more of them

print('=== Strategy for Closing the Gap ===')
print(f'Current gap: {current_best - target:.4f} points')
print(f'Improvement from ensemble: {baseline - current_best:.4f} points')
print(f'Need {(current_best - target) / (baseline - current_best):.1f}x more improvement')

print('\n=== Recommended Next Steps ===')
print('1. SUBMIT ensemble_strict.csv (70.615745) - never submitted!')
print('2. Run backward propagation (deletion cascade) for hours')
print('3. Try genetic algorithm with population of diverse solutions')
print('4. Focus on small N (1-20) where efficiency is worst')

=== Strategy for Closing the Gap ===
Current gap: 1.7244 points
Improvement from ensemble: 0.0604 points
Need 28.6x more improvement

=== Recommended Next Steps ===
1. SUBMIT ensemble_strict.csv (70.615745) - never submitted!
2. Run backward propagation (deletion cascade) for hours
3. Try genetic algorithm with population of diverse solutions
4. Focus on small N (1-20) where efficiency is worst
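
To decide which N values to target first, per-N scores can be turned into packing efficiency and headroom. Since score_N = side^2 / N, efficiency is tree_area / score_N, and headroom is score_N - tree_area. Below is a small sketch with hypothetical helper names. Headroom is only an upper bound on what any N could gain, since perfect packing is unattainable.

```python
TREE_AREA = 0.245625  # area of one tree, as in the gap-analysis cell


def packing_efficiency(score_n):
    """Fraction of the bounding square covered by trees: n * area / side^2 = area / score_n."""
    return TREE_AREA / score_n


def rank_by_headroom(scores):
    """Sort N values by how far their score sits above the per-tree area bound, largest first."""
    return sorted(scores, key=lambda n: scores[n] - TREE_AREA, reverse=True)


# Ensemble scores for three N values taken from the top-10 improvements table
scores = {15: 0.376949, 54: 0.356260, 100: 0.343395}
for n in rank_by_headroom(scores):
    print(n, f'{packing_efficiency(scores[n]):.1%}', f'{scores[n] - TREE_AREA:.4f}')
```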
