# Evolver Loop 3 Analysis

## Situation Assessment

Three experiments so far, none of which improved on the baseline score of 70.676102:
1. exp_000: Baseline from pre-optimized snapshot
2. exp_001: Full ensemble from 30 CSV files (failed due to precision loss)
3. exp_002: Deletion cascade (found ZERO improvements)

This strongly suggests that the baseline sits in a deep local optimum.
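The precision loss in exp_001 is not diagnosed in detail here. One plausible cause, and it is only an assumption, is that coordinates were re-serialized after a round trip through 64-bit floats. The sketch below shows that effect on a made-up value and a hypothetical helper, `pick_best_per_n`, that ensembles sources while passing the raw CSV strings through untouched. The data, scores, and helper name are illustrative and not taken from the experiment.

```python
from decimal import Decimal

def pick_best_per_n(sources):
    """Return {n: (score, rows)}, keeping the lowest score for each n.

    `sources` is a list of dicts mapping n -> (score, rows), where rows are
    raw CSV strings. The strings are never parsed or re-formatted here.
    """
    best = {}
    for source in sources:
        for n, (score, rows) in source.items():
            if n not in best or score < best[n][0]:
                best[n] = (score, rows)
    return best

raw = "0.12345678901234567891"    # made-up coordinate with 20 decimals
via_float = str(float(raw))       # round trip through a 64-bit float
via_decimal = str(Decimal(raw))   # Decimal keeps every digit of the string

# Two made-up sources; scores and rows are placeholders.
source_a = {1: (0.70, ["s" + raw]), 2: (0.45, ["s0.1", "s0.2"])}
source_b = {1: (0.66, ["s0.5"]), 2: (0.46, ["s0.3", "s0.4"])}
merged = pick_best_per_n([source_a, source_b])

print("float round trip preserved digits:", via_float == raw)
print("Decimal round trip preserved digits:", via_decimal == raw)
print(merged)
```

Scores can still be computed from floats for comparison; only the strings written to the output file need to stay verbatim.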

## Key Insight
The pre-optimized solutions are already the product of extensive optimization by the Kaggle community.
The local search methods tried so far (deletion cascade, combining existing solutions) have not escaped this optimum, and further local moves such as SA are unlikely to do so on their own.

## What We Need
1. **Multi-start random initialization** for small N (1-20)
2. **Grid/lattice initialization** for large N (50-200)
3. **Use bbox3 C++ optimizer** for fast refinement
4. **Explore different solution basins** - not just modify existing solutions
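As a rough illustration of item 2, the sketch below builds a plain rectangular grid of upright trees. It is an assumption-laden starting point, not the method used in any experiment: the helper names (`grid_init`, `tree_bounds`, `side_length`) and the spacings `dx`, `dy` are hypothetical, chosen slightly larger than the tree's 0.7 x 1.0 bounding box so that no two trees can overlap. Only the standard library is used, with the vertex lists copied from the notebook.

```python
import math

# Tree vertices copied from the notebook's TX/TY lists.
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075,
      -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]

def tree_bounds(x, y, deg):
    """Axis-aligned bounds of one tree rotated counterclockwise by deg."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    xs = [x + tx * c - ty * s for tx, ty in zip(TX, TY)]
    ys = [y + tx * s + ty * c for tx, ty in zip(TX, TY)]
    return min(xs), min(ys), max(xs), max(ys)

def grid_init(n, dx=0.71, dy=1.01):
    """Place n upright trees row by row on a roughly square grid."""
    cols = math.ceil(math.sqrt(n * dy / dx))
    return [((i % cols) * dx, (i // cols) * dy, 0.0) for i in range(n)]

def side_length(trees):
    """Side of the smallest axis-aligned square containing all trees."""
    boxes = [tree_bounds(*t) for t in trees]
    width = max(b[2] for b in boxes) - min(b[0] for b in boxes)
    height = max(b[3] for b in boxes) - min(b[1] for b in boxes)
    return max(width, height)

trees = grid_init(50)
side = side_length(trees)
print(f"N=50 grid start: side = {side:.4f}, score = {side**2 / 50:.4f}")
```

A grid like this is valid but loose, so its score will be far above the baseline. Its only purpose is to give a refinement step a starting point in a different basin; denser lattices (for example, interleaving trees rotated by 180 degrees) would need a real overlap check.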

In [1]:
import pandas as pd
import numpy as np
from shapely.geometry import Polygon
from shapely import affinity
import os

# Tree geometry
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]

def get_tree_polygon(x, y, deg):
    base_poly = Polygon(zip(TX, TY))
    rotated = affinity.rotate(base_poly, deg, origin=(0, 0))
    translated = affinity.translate(rotated, x, y)
    return translated

def get_bounding_box_side(trees):
    if not trees:
        return float('inf')
    all_x, all_y = [], []
    for x, y, deg in trees:
        poly = get_tree_polygon(x, y, deg)
        bounds = poly.bounds
        all_x.extend([bounds[0], bounds[2]])
        all_y.extend([bounds[1], bounds[3]])
    return max(max(all_x) - min(all_x), max(all_y) - min(all_y))

print('Functions defined')

Functions defined


In [2]:
# Load baseline and analyze per-N scores
baseline_path = '/home/code/experiments/001_baseline/santa-2025.csv'
df = pd.read_csv(baseline_path, dtype=str)

# Parse configurations
configs = {}
for n in range(1, 201):
    prefix = f'{n:03d}_'
    rows = df[df['id'].str.startswith(prefix)]
    trees = []
    for _, row in rows.iterrows():
        x = float(str(row['x']).replace('s', ''))
        y = float(str(row['y']).replace('s', ''))
        deg = float(str(row['deg']).replace('s', ''))
        trees.append((x, y, deg))
    configs[n] = trees

# Calculate per-N scores
per_n_scores = []
for n in range(1, 201):
    side = get_bounding_box_side(configs[n])
    score = side**2 / n
    per_n_scores.append({'n': n, 'side': side, 'score': score})

df_scores = pd.DataFrame(per_n_scores)
print(f'Total baseline score: {df_scores["score"].sum():.6f}')
print(f'\nTop 10 highest per-N scores (most room for improvement):')
print(df_scores.nlargest(10, 'score')[['n', 'side', 'score']])

Total baseline score: 70.676102

Top 10 highest per-N scores (most room for improvement):
     n      side     score
0    1  0.813173  0.661250
1    2  0.949504  0.450779
2    3  1.142031  0.434745
4    5  1.443692  0.416850
3    4  1.290806  0.416545
6    7  1.673104  0.399897
5    6  1.548438  0.399610
8    9  1.867280  0.387415
7    8  1.755921  0.385407
14  15  2.384962  0.379203


In [3]:
# Analyze N=1 specifically - this is the highest contributor
n1_trees = configs[1]
print(f'N=1 configuration: {n1_trees}')
print(f'N=1 side: {get_bounding_box_side(n1_trees):.6f}')
print(f'N=1 score: {get_bounding_box_side(n1_trees)**2:.6f}')

# The baseline uses a 45 degree rotation for N=1, which should minimize the bounding box.
# Check this against a coarse sweep of angles.
for angle in [0, 15, 30, 45, 60, 75, 90]:
    test_tree = [(0, 0, angle)]
    side = get_bounding_box_side(test_tree)
    print(f'  Angle {angle:3d}°: side = {side:.6f}, score = {side**2:.6f}')

N=1 configuration: [(-48.196086194214246, 58.770984615214225, 45.0)]
N=1 side: 0.813173
N=1 score: 0.661250
  Angle   0°: side = 1.000000, score = 1.000000
  Angle  15°: side = 0.985337, score = 0.970890
  Angle  30°: side = 0.903525, score = 0.816358
  Angle  45°: side = 0.813173, score = 0.661250
  Angle  60°: side = 0.903525, score = 0.816358
  Angle  75°: side = 0.985337, score = 0.970890
  Angle  90°: side = 1.000000, score = 1.000000
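The sweep above uses 15 degree steps, so it cannot rule out a better angle between the sampled points. A finer sweep is cheap. The sketch below repeats the check in 0.1 degree steps using only the standard library; `side_at` is a hypothetical helper that rotates the notebook's vertex lists directly instead of going through shapely. Because rotating by 90 degrees only swaps width and height, sweeping [0, 90) is sufficient.

```python
import math

# Tree vertices copied from the notebook's TX/TY lists.
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075,
      -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]

def side_at(deg):
    """Bounding-square side of a single tree rotated counterclockwise by deg."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    xs = [tx * c - ty * s for tx, ty in zip(TX, TY)]
    ys = [tx * s + ty * c for tx, ty in zip(TX, TY)]
    return max(max(xs) - min(xs), max(ys) - min(ys))

# Sweep 0.0, 0.1, ..., 89.9 degrees and keep the angle with the smallest side.
angles = [i * 0.1 for i in range(900)]
best_angle = min(angles, key=side_at)
best_side = side_at(best_angle)
print(f"best angle = {best_angle:.1f} deg, side = {best_side:.6f}, "
      f"score = {best_side**2:.6f}")
```

If the fine sweep agrees with the coarse one, N=1 offers no further gain and the search effort is better spent on larger N.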


In [4]:
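# Check if bbox3 is available.
# Note: `-h` is only a probe. As the output shows, the binary does not print help;
# it starts an optimization run on submission.csv instead.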
import subprocess

bbox3_path = '/home/nonroot/snapshots/santa-2025/21116303805/code/bbox3'
if os.path.exists(bbox3_path):
    print(f'bbox3 found at: {bbox3_path}')
    result = subprocess.run([bbox3_path, '-h'], capture_output=True, text=True)
    print('bbox3 help output:')
    print(result.stdout[:500] if result.stdout else result.stderr[:500])
else:
    print('bbox3 not found')

bbox3 found at: /home/nonroot/snapshots/santa-2025/21116303805/code/bbox3
bbox3 help output:
Loading submission.csv...
Initial score: 0.000000
Round 1/16
Score after round 1: 0.000000
Round 2/16
Score after round 2: 0.000000
Round 3/16
Score after round 3: 0.000000
Round 4/16
Score after round 4: 0.000000
Round 5/16
Score after round 5: 0.000000
Round 6/16
Score after round 6: 0.000000
Round 7/16
Score after round 7: 0.000000
Round 8/16
Score after round 8: 0.000000
Round 9/16
Score after round 9: 0.000000
Round 10/16
Score after round 10: 0.000000
Round 11/16
Score after round 11: 0.00
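The scores of 0.000000 suggest that the binary did not find a usable `submission.csv`, presumably because it looks for that file in the current working directory. That reading is an assumption based only on the "Loading submission.csv..." line; bbox3's actual command-line interface is not documented here. Under that assumption, one way to drive such a tool is to copy the input into a scratch directory, run the command there with a timeout, and read the file back. `run_in_scratch_dir` is a hypothetical helper, and the demo uses the Python interpreter as a stand-in for the real binary.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

def run_in_scratch_dir(cmd, input_csv, timeout_s=60):
    """Run cmd in a temp dir that holds input_csv as 'submission.csv'.

    Returns (returncode, stdout, contents of submission.csv after the run).
    subprocess.run raises TimeoutExpired if the command exceeds timeout_s.
    """
    with tempfile.TemporaryDirectory() as scratch:
        target = Path(scratch) / "submission.csv"
        shutil.copy(input_csv, target)
        result = subprocess.run(cmd, cwd=scratch, capture_output=True,
                                text=True, timeout=timeout_s)
        return result.returncode, result.stdout, target.read_text()

# Stand-in command: echoes the first line of submission.csv.
# Replace it with the optimizer's path once its arguments are known.
with tempfile.TemporaryDirectory() as tmp:
    demo_csv = Path(tmp) / "demo.csv"
    demo_csv.write_text("id,x,y,deg\n001_0,s0.0,s0.0,s45.0\n")
    stand_in = [sys.executable, "-c",
                "print(open('submission.csv').readline().strip())"]
    code, stdout, csv_after = run_in_scratch_dir(stand_in, demo_csv)

print(code, stdout.strip())
```

Running in a scratch directory also keeps the baseline file safe if the tool overwrites its input in place.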


In [5]:
# Summary of what we know
print('='*60)
print('SUMMARY OF SITUATION')
print('='*60)
print(f'\nCurrent best score: 70.676102')
print(f'Target score: 68.919154')
print(f'Gap: 1.756948 points (2.55%)')
print(f'\nExperiments tried:')
print('  1. Baseline from pre-optimized snapshot: 70.676102')
print('  2. Full ensemble from 30 CSVs: FAILED (precision loss)')
print('  3. Deletion cascade: 70.676102 (0 improvements found)')
print(f'\nKey insight: The baseline is at a VERY strong local optimum.')
print('Local search methods cannot escape this optimum.')
print(f'\nNext steps:')
print('  1. Multi-start random initialization for N=1-20')
print('  2. Grid/lattice initialization for N>50')
print('  3. Use bbox3 C++ optimizer for fast refinement')
print('  4. Explore different solution basins')

SUMMARY OF SITUATION

Current best score: 70.676102
Target score: 68.919154
Gap: 1.756948 points (2.55%)

Experiments tried:
  1. Baseline from pre-optimized snapshot: 70.676102
  2. Full ensemble from 30 CSVs: FAILED (precision loss)
  3. Deletion cascade: 70.676102 (0 improvements found)

Key insight: The baseline is at a VERY strong local optimum.
Local search methods cannot escape this optimum.

Next steps:
  1. Multi-start random initialization for N=1-20
  2. Grid/lattice initialization for N>50
  3. Use bbox3 C++ optimizer for fast refinement
  4. Explore different solution basins
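To put the gap in geometric terms: each per-N score is side² / N, so a fractional cut in score corresponds to roughly half that cut in side length. The sketch below works through the arithmetic under the simplifying assumption that the improvement is spread uniformly across all N, which will not hold in practice. Note that the 2.55% quoted above is relative to the target; relative to the current score the gap is slightly smaller.

```python
import math

current = 70.676102   # current best score, from the summary above
target = 68.919154    # target score, from the summary above

gap = current - target
score_cut = gap / current                  # fraction of the current score to remove
side_cut = 1 - math.sqrt(1 - score_cut)    # matching fractional cut in side length

print(f"gap = {gap:.6f}")
print(f"required score reduction = {score_cut:.2%} of current")
print(f"required side reduction  = {side_cut:.2%} per configuration")
```

A uniform shrink of a little over one percent in every bounding square is the scale of improvement that new solution basins would have to deliver.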
