# Evolver Loop 7 Analysis

## Key Questions:
1. Why is there a 17-point gap between Zaburo (87.99) and the pre-optimized baseline (70.6)?
2. What techniques can close this gap?
3. What's the per-N breakdown of the gap?

In [1]:
import pandas as pd
import numpy as np
import json
from decimal import Decimal, getcontext
from shapely import affinity
from shapely.geometry import Polygon
from shapely.ops import unary_union

getcontext().prec = 25
scale_factor = Decimal('1e15')

print('Analysis notebook initialized')

Analysis notebook initialized


In [2]:
# Load Zaburo solution (our best valid)
zaburo_path = '/home/code/experiments/005_zaburo_rowbased/submission.csv'
zaburo_df = pd.read_csv(zaburo_path)

# Load pre-optimized baseline (has overlaps but shows what's possible)
baseline_path = '/home/nonroot/snapshots/santa-2025/21329067673/submission/submission.csv'
baseline_df = pd.read_csv(baseline_path)

print(f'Zaburo: {len(zaburo_df)} rows')
print(f'Baseline: {len(baseline_df)} rows')

Zaburo: 20100 rows
Baseline: 20100 rows
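The baseline is described as having overlaps. Below is a minimal sketch of how to list the overlapping pairs for one N group. It assumes the same tree template and rotate-then-translate convention as `compute_score_for_n`. It also assumes that two trees overlap when their intersection has positive area, so trees that only touch are allowed. `AREA_TOL` and the helper names are hypothetical, and the official validity rule may differ in detail.

```python
from itertools import combinations

from shapely import affinity
from shapely.geometry import Polygon

# Same 15-vertex tree outline used by compute_score_for_n
TREE_TEMPLATE = [
    (0.0, 0.8), (0.125, 0.5), (0.0625, 0.5), (0.2, 0.25), (0.1, 0.25),
    (0.35, 0.0), (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0),
    (-0.35, 0.0), (-0.1, 0.25), (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5),
]
BASE_TREE = Polygon(TREE_TEMPLATE)
AREA_TOL = 1e-12  # hypothetical threshold: smaller intersections count as touching


def make_tree(x, y, deg):
    """Rotate the template about the origin, then translate it to (x, y)."""
    rotated = affinity.rotate(BASE_TREE, deg, origin=(0, 0))
    return affinity.translate(rotated, xoff=x, yoff=y)


def overlapping_pairs(placements):
    """Return index pairs (i, j), i < j, whose trees share more than AREA_TOL area.

    Brute force over all pairs (at most 19,900 for N = 200): an intersects()
    predicate first, then the exact intersection area.
    """
    trees = [make_tree(x, y, deg) for x, y, deg in placements]
    pairs = []
    for i, j in combinations(range(len(trees)), 2):
        if not trees[i].intersects(trees[j]):
            continue
        if trees[i].intersection(trees[j]).area > AREA_TOL:
            pairs.append((i, j))
    return pairs
```

For one N, the `placements` list can be built as `list(zip(xs, ys, degs))` using the same `parse_value` parsing as `compute_score_for_n`. A non-empty result for a baseline group confirms that the group is invalid.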


In [3]:
def parse_value(val):
    if isinstance(val, str) and val.startswith('s'):
        return float(val[1:])
    return float(val)

def compute_score_for_n(df, n):
    """Compute score for a specific N value."""
    group = df[df['id'].str.startswith(f'{n:03d}_')]
    if len(group) == 0:
        return None
    
    # Get all coordinates
    xs = [parse_value(x) for x in group['x']]
    ys = [parse_value(y) for y in group['y']]
    degs = [parse_value(d) for d in group['deg']]
    
    # Tree template
    TREE_TEMPLATE = [
        (0.0, 0.8), (0.125, 0.5), (0.0625, 0.5), (0.2, 0.25), (0.1, 0.25),
        (0.35, 0.0), (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0),
        (-0.35, 0.0), (-0.1, 0.25), (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5),
    ]
    
    all_points = []
    for x, y, deg in zip(xs, ys, degs):
        base = Polygon(TREE_TEMPLATE)
        rotated = affinity.rotate(base, deg, origin=(0, 0))
        translated = affinity.translate(rotated, xoff=x, yoff=y)
        coords = list(translated.exterior.coords)
        all_points.extend(coords)
    
    all_points = np.array(all_points)
    minx, miny = all_points.min(axis=0)
    maxx, maxy = all_points.max(axis=0)
    side = max(maxx - minx, maxy - miny)
    return (side ** 2) / n

print('Score function defined')

Score function defined
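For reference, `compute_score_for_n` evaluates one term of the sum below, and the totals printed later add the terms for N = 1 to 200. Here $s_n$ is the side of the smallest axis-aligned square that contains every vertex of the $n$ placed trees. The code obtains it by taking the larger of the x-extent and the y-extent.

```latex
\text{score} = \sum_{n=1}^{200} \frac{s_n^2}{n},
\qquad
s_n = \max\!\left(x_{\max}^{(n)} - x_{\min}^{(n)},\; y_{\max}^{(n)} - y_{\min}^{(n)}\right)
```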


In [4]:
# Compare per-N scores
print('Computing per-N scores...')
print('='*80)

zaburo_scores = {}
baseline_scores = {}
gaps = []

for n in range(1, 201):
    z_score = compute_score_for_n(zaburo_df, n)
    b_score = compute_score_for_n(baseline_df, n)
    
    zaburo_scores[n] = z_score
    baseline_scores[n] = b_score
    
    if z_score is not None and b_score is not None:
        gap = z_score - b_score
        gaps.append((n, gap, z_score, b_score))

print(f'Zaburo total: {sum(zaburo_scores.values()):.6f}')
print(f'Baseline total: {sum(baseline_scores.values()):.6f}')
print(f'Gap: {sum(zaburo_scores.values()) - sum(baseline_scores.values()):.6f}')

Computing per-N scores...


Zaburo total: 87.991248
Baseline total: 70.615745
Gap: 17.375503


In [5]:
# Sort by gap (largest first)
gaps_sorted = sorted(gaps, key=lambda x: x[1], reverse=True)

print('\nTop 20 N values with largest gaps (Zaburo - Baseline):')
print('='*80)
print(f'{"N":>4} | {"Gap":>10} | {"Zaburo":>12} | {"Baseline":>12}')
print('-'*80)
for n, gap, z, b in gaps_sorted[:20]:
    print(f'{n:4d} | {gap:10.6f} | {z:12.6f} | {b:12.6f}')

print(f'\nSum of top 20 gaps: {sum(g[1] for g in gaps_sorted[:20]):.6f}')


Top 20 N values with largest gaps (Zaburo - Baseline):
   N |        Gap |       Zaburo |     Baseline
--------------------------------------------------------------------------------
   5 |   0.383150 |     0.800000 |     0.416850
   4 |   0.349080 |     0.765625 |     0.416545
   2 |   0.269221 |     0.720000 |     0.450779
   6 |   0.267056 |     0.666667 |     0.399610
  13 |   0.230783 |     0.603077 |     0.372294
   7 |   0.230103 |     0.630000 |     0.399897
  15 |   0.223051 |     0.600000 |     0.376949
   3 |   0.218588 |     0.653333 |     0.434745
  14 |   0.190457 |     0.560000 |     0.369543
  16 |   0.188372 |     0.562500 |     0.374128
  11 |   0.170758 |     0.545682 |     0.374924
   8 |   0.165843 |     0.551250 |     0.385407
  28 |   0.163270 |     0.529375 |     0.366105
  17 |   0.159371 |     0.529412 |     0.370040
  19 |   0.153622 |     0.522237 |     0.368615
   9 |   0.150363 |     0.537778 |     0.387415
  31 |   0.145800 |     0.516129 |     0.370329
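Each term is $s_n^2/n$, so the table can be converted back into bounding-square sides. This shows how much tighter the baseline packings are. The sketch below takes its three rows from the table above. `side_from_term` is a hypothetical helper.

```python
import math


def side_from_term(term, n):
    """Invert the per-N term s**2 / n to recover the bounding-square side s."""
    return math.sqrt(term * n)


# (N, Zaburo term, baseline term) copied from the top-gap table above
rows = [(2, 0.720000, 0.450779), (4, 0.765625, 0.416545), (5, 0.800000, 0.416850)]

for n, z_term, b_term in rows:
    z_side = side_from_term(z_term, n)
    b_side = side_from_term(b_term, n)
    shrink = 1 - b_side / z_side  # fractional reduction in side length
    print(f'N={n}: side {z_side:.4f} -> {b_side:.4f} ({shrink:.1%} smaller)')
```

The term scales with the square of the side, so a modest side reduction has an outsized effect. For N=5, the baseline square is roughly 28% narrower, yet its term (0.416850) is only about half of Zaburo's (0.800000).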

In [6]:
# Analyze where the gap comes from
print('\nGap distribution by N range:')
print('='*80)

ranges = [(1, 10), (11, 50), (51, 100), (101, 150), (151, 200)]
for start, end in ranges:
    range_gaps = [g for g in gaps if start <= g[0] <= end]
    total_gap = sum(g[1] for g in range_gaps)
    avg_gap = total_gap / len(range_gaps) if range_gaps else 0
    print(f'N={start:3d}-{end:3d}: Total gap = {total_gap:8.4f}, Avg gap = {avg_gap:.6f}')


Gap distribution by N range:
N=  1- 10: Total gap =   2.1408, Avg gap = 0.214077
N= 11- 50: Total gap =   4.6565, Avg gap = 0.116413
N= 51-100: Total gap =   3.8488, Avg gap = 0.076976
N=101-150: Total gap =   3.4536, Avg gap = 0.069073
N=151-200: Total gap =   3.2757, Avg gap = 0.065515
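The averages above favor small N, but the totals point in a different direction. Below is a quick calculation of each range's share of the gap. The range totals are copied from the output above.

```python
# Range totals copied from the output above
range_totals = {
    '1-10': 2.1408, '11-50': 4.6565, '51-100': 3.8488,
    '101-150': 3.4536, '151-200': 3.2757,
}
total = sum(range_totals.values())

for name, gap in range_totals.items():
    print(f'N={name:>7}: {gap:.4f} ({gap / total:.1%} of total gap)')

large_n_share = sum(range_totals[k] for k in ('51-100', '101-150', '151-200')) / total
print(f'N=51-200 share: {large_n_share:.1%}')
```

N=1-10 contributes only about 12% of the gap, while N=51-200 contributes about 61%. The largest per-N gaps are therefore not where most of the recoverable score lies.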


In [7]:
# Key insight: the baseline has overlaps, but it shows what is POSSIBLE.
# The gap is ~17 points. Per-N gaps are largest for small N, but most of the total
# (~10.6 of 17.4 points) comes from N=51-200, because that range contains the most N values.

print('\n' + '='*80)
print('KEY INSIGHT:')
print('='*80)
print(f'''
The pre-optimized baseline scores 70.6 but has overlaps.
Zaburo's row-based approach scores 87.99 but is VALID.

The gap of ~17 points likely comes from:
1. Zaburo uses simple row-based placement (not optimized)
2. Baseline uses sophisticated optimization (but creates overlaps)

To close the gap, we need to:
1. Start from Zaburo's VALID solution
2. Apply optimization that MAINTAINS validity
3. Use fractional translation (step sizes from 0.001 down to 0.00001)
4. Focus on small N first (largest per-N gaps), then N=51-200 (~61% of the total gap)
''')


KEY INSIGHT:

The pre-optimized baseline scores 70.6 but has overlaps.
Zaburo's row-based approach scores 87.99 but is VALID.

The gap of ~17 points likely comes from:
1. Zaburo uses simple row-based placement (not optimized)
2. Baseline uses sophisticated optimization (but creates overlaps)

To close the gap, we need to:
1. Start from Zaburo's VALID solution
2. Apply optimization that MAINTAINS validity
3. Use fractional translation (step sizes from 0.001 down to 0.00001)
4. Focus on small N first (largest per-N gaps), then N=51-200 (~61% of the total gap)
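The validity-preserving fractional translation plan above can be sketched as a greedy compaction loop. This sketch assumes a simple move rule that is not taken from any experiment in this notebook. Each tree is nudged toward the center of the layout's bounding box by a fixed step: 0.001, then 0.0001, then 0.00001. A move is kept only if the tree overlaps no other tree and the bounding square does not grow. `squeeze`, `AREA_TOL`, `max_passes` and the center-seeking rule are all hypothetical, and rotations are left unchanged.

```python
from shapely import affinity
from shapely.geometry import Polygon

# Same 15-vertex tree outline used by compute_score_for_n
TREE_TEMPLATE = [
    (0.0, 0.8), (0.125, 0.5), (0.0625, 0.5), (0.2, 0.25), (0.1, 0.25),
    (0.35, 0.0), (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0),
    (-0.35, 0.0), (-0.1, 0.25), (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5),
]
BASE_TREE = Polygon(TREE_TEMPLATE)
AREA_TOL = 1e-12  # hypothetical: smaller intersections count as touching
SIDE_TOL = 1e-12  # hypothetical: tolerance when comparing square sides


def make_tree(x, y, deg):
    """Rotate the template about the origin, then translate it to (x, y)."""
    return affinity.translate(affinity.rotate(BASE_TREE, deg, origin=(0, 0)), xoff=x, yoff=y)


def union_bounds(trees):
    """(minx, miny, maxx, maxy) over all trees."""
    bs = [t.bounds for t in trees]
    return (min(b[0] for b in bs), min(b[1] for b in bs),
            max(b[2] for b in bs), max(b[3] for b in bs))


def bounding_side(trees):
    """Side of the axis-aligned bounding square used by the score."""
    minx, miny, maxx, maxy = union_bounds(trees)
    return max(maxx - minx, maxy - miny)


def overlaps_any(tree, others):
    """True if tree shares more than AREA_TOL area with any other tree."""
    return any(tree.intersection(o).area > AREA_TOL for o in others)


def toward(target, value, step):
    """+step or -step toward target, or 0.0 when already within one step."""
    if abs(target - value) < step:
        return 0.0
    return step if target > value else -step


def squeeze(placements, steps=(1e-3, 1e-4, 1e-5), max_passes=50):
    """Greedy fractional translation of each tree toward the layout's center.

    A move is kept only if the moved tree overlaps no other tree and the
    bounding-square side does not grow, so a valid input stays valid.
    """
    placements = [list(p) for p in placements]
    trees = [make_tree(*p) for p in placements]
    for step in steps:
        for _ in range(max_passes):
            minx, miny, maxx, maxy = union_bounds(trees)
            cx, cy = (minx + maxx) / 2, (miny + maxy) / 2
            moved = False
            for i, tree in enumerate(trees):
                # Use the tree's own bounding-box center, not its (x, y) anchor
                tminx, tminy, tmaxx, tmaxy = tree.bounds
                dx = toward(cx, (tminx + tmaxx) / 2, step)
                dy = toward(cy, (tminy + tmaxy) / 2, step)
                if dx == 0.0 and dy == 0.0:
                    continue
                candidate = affinity.translate(tree, xoff=dx, yoff=dy)
                others = trees[:i] + trees[i + 1:]
                if overlaps_any(candidate, others):
                    continue
                if bounding_side(others + [candidate]) > bounding_side(trees) + SIDE_TOL:
                    continue
                trees[i] = candidate
                placements[i][0] += dx
                placements[i][1] += dy
                moved = True
            if not moved:
                break  # nothing moved at this step size; try a finer step
    return [tuple(p) for p in placements]
```

If the input is overlap-free, every accepted state is too, which is the property the plan asks for. The overlap check is O(n²) per pass, so large N would likely need a spatial index. A purely greedy rule can also stall where no single small move is accepted.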



In [8]:
# Check what the SA experiment improved
sa_metrics_path = '/home/code/experiments/006_sa_from_scratch/metrics.json'
with open(sa_metrics_path) as f:
    sa_metrics = json.load(f)

print('SA Experiment Results:')
print('='*80)
print(f"Initial score: {sa_metrics['initial_score']:.6f}")
print(f"Final score: {sa_metrics['cv_score']:.6f}")
print(f"Improvement: {sa_metrics['improvement']:.6f}")

# Per-N improvements from SA
sa_per_n = sa_metrics.get('per_n_scores', {})
if sa_per_n:
    print('\nSA improvements by N (where improved):')
    for n in range(1, 11):
        z_score = zaburo_scores.get(n)
        sa_score = sa_per_n.get(str(n))
        if z_score is not None and sa_score is not None:
            diff = z_score - sa_score
            if diff > 0.0001:
                print(f'  N={n}: {z_score:.6f} -> {sa_score:.6f} (improved by {diff:.6f})')

SA Experiment Results:
Initial score: 87.991248
Final score: 87.811181
Improvement: 0.180067

SA improvements by N (where improved):
  N=2: 0.720000 -> 0.655506 (improved by 0.064494)
  N=3: 0.653333 -> 0.578656 (improved by 0.074677)
  N=4: 0.765625 -> 0.724729 (improved by 0.040896)
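The per-N gains above (0.064494 + 0.074677 + 0.040896) add up to the reported 0.180067 improvement. Essentially all of the SA gain therefore came from N=2 to 4. Assuming SA here stands for simulated annealing, the part that sets it apart from a greedy squeeze is the Metropolis acceptance rule. It is sketched below together with a geometric cooling schedule. The function names and the schedule are illustrative assumptions, not the 006 experiment's code.

```python
import math
import random


def metropolis_accept(delta, temperature, rng=random):
    """Metropolis rule for minimization.

    delta = new_score - old_score (lower is better). Improvements are always
    accepted; a worse move is accepted with probability exp(-delta / T).
    """
    if delta <= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-delta / temperature)


def geometric_schedule(t_start, t_end, n_steps):
    """Temperatures decaying geometrically from t_start to t_end (n_steps >= 2)."""
    ratio = (t_end / t_start) ** (1 / (n_steps - 1))
    return [t_start * ratio ** k for k in range(n_steps)]
```

At high temperature, the rule occasionally accepts moves that enlarge the bounding square. This lets the search leave the local minima where a greedy translation loop would stop.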
