# Evolver Loop 6 - LB Feedback Analysis

## Key Results:
- exp_005 (Zaburo row-based): CV=87.99, LB=87.99 ✅ ACCEPTED!
- This confirms that our submission format is correct and gives us a valid starting point

## Gap Analysis:
- Current best valid: 87.99
- Target: 68.89
- Gap: 19.1 points (27.7% above the target, i.e. a 21.7% reduction from the current score)
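
The notebook quotes two percentages for the same gap (27.7% here, ~22% in a later cell). They differ only in the denominator. A minimal check, using just the two scores stated above:

```python
# Values taken from the analysis above.
current = 87.99   # best valid LB score (exp_005)
target = 68.89    # target score

gap = current - target
pct_of_target = 100 * gap / target     # how far the current score sits above the target
pct_of_current = 100 * gap / current   # reduction needed relative to the current score

print(f"gap={gap:.2f}  vs target={pct_of_target:.1f}%  vs current={pct_of_current:.1f}%")
```

The first figure measures distance from the target; the second is the reduction an optimizer has to deliver starting from 87.99.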

## Strategy Analysis:
The pre-optimized baselines (~70.6) have subtle overlaps that Kaggle rejects.
We need to either:
1. Fix the overlaps in pre-optimized baselines, OR
2. Optimize from Zaburo's valid starting point (87.99)

In [1]:
import pandas as pd
import numpy as np
import json
import os
from decimal import Decimal, getcontext
from shapely.geometry import Polygon
from shapely import affinity
from shapely.ops import unary_union

getcontext().prec = 25
scale_factor = Decimal('1e15')

print("Analysis of current state...")

Analysis of current state...


In [2]:
# Load the valid Zaburo solution and analyze per-N scores
zaburo_path = '/home/code/experiments/005_zaburo_rowbased/metrics.json'
with open(zaburo_path) as f:
    zaburo_metrics = json.load(f)

zaburo_per_n = {int(k): v for k, v in zaburo_metrics['per_n_scores'].items()}
print(f"Zaburo total score: {zaburo_metrics['cv_score']:.6f}")
print(f"\nTop 10 worst N values (highest score contribution):")
sorted_n = sorted(zaburo_per_n.items(), key=lambda x: x[1], reverse=True)
for n, score in sorted_n[:10]:
    print(f"  N={n:3d}: {score:.6f}")

Zaburo total score: 87.991248

Top 10 worst N values (highest score contribution):
  N=  5: 0.800000
  N=  4: 0.765625
  N=  2: 0.720000
  N=  6: 0.666667
  N=  1: 0.661250
  N=  3: 0.653333
  N=  7: 0.630000
  N= 13: 0.603077
  N= 15: 0.600000
  N= 16: 0.562500
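
The per-N values above look like a squared length divided by N. The sketch below backs out the side of the bounding square under that ASSUMPTION (score = side² / N); the notebook never states the metric, so treat the formula as a guess that happens to fit the printed numbers. `implied_side` is a hypothetical helper name.

```python
import math

# Per-N scores copied from the output above.
scores = {5: 0.800000, 4: 0.765625, 2: 0.720000, 15: 0.600000, 16: 0.562500}

def implied_side(n, score):
    """Side of the bounding square, ASSUMING score = side**2 / n."""
    return math.sqrt(score * n)

for n, s in scores.items():
    print(f"N={n:2d}: implied side ~ {implied_side(n, s):.4f}")
```

If the assumption holds, N=15 and N=16 share the same implied side, which would suggest the row-based layout leaves slack at some N values.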


In [3]:
# Compare with baseline per-N scores
baseline_path = '/home/code/experiments/001_valid_baseline/metrics.json'
if os.path.exists(baseline_path):
    with open(baseline_path) as f:
        baseline_metrics = json.load(f)
    baseline_per_n = {int(k): v for k, v in baseline_metrics.get('per_n_scores', {}).items()}
    print(f"Baseline total score: {baseline_metrics['cv_score']:.6f}")
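else:
    # No metrics file for the baseline: nothing is loaded here, so the
    # per-N comparison cells below are skipped (they check `if baseline_per_n`)
    print("Baseline metrics not found; skipping per-N comparison.")
    baseline_per_n = {}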

Baseline total score: 70.615745


In [4]:
# Key insight: Where does Zaburo lose the most vs baseline?
if baseline_per_n:
    print("\nN values where Zaburo is MUCH worse than baseline:")
    gaps = []
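    for n in range(1, 201):
        # Skip N values missing from either file so that a missing entry
        # is not silently treated as a score of 0
        if n not in zaburo_per_n or n not in baseline_per_n:
            continue
        z_score = zaburo_per_n[n]
        b_score = baseline_per_n[n]
        gap = z_score - b_score
        gaps.append((n, gap, z_score, b_score))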
    
    gaps.sort(key=lambda x: x[1], reverse=True)
    total_gap = sum(g[1] for g in gaps)
    print(f"Total gap: {total_gap:.6f}")
    print(f"\nTop 20 N values with largest gaps:")
    for n, gap, z, b in gaps[:20]:
        print(f"  N={n:3d}: gap={gap:.6f} (Zaburo={z:.6f}, Baseline={b:.6f})")


N values where Zaburo is MUCH worse than baseline:
Total gap: 17.375503

Top 20 N values with largest gaps:
  N=  5: gap=0.383150 (Zaburo=0.800000, Baseline=0.416850)
  N=  4: gap=0.349080 (Zaburo=0.765625, Baseline=0.416545)
  N=  2: gap=0.269221 (Zaburo=0.720000, Baseline=0.450779)
  N=  6: gap=0.267056 (Zaburo=0.666667, Baseline=0.399610)
  N= 13: gap=0.230783 (Zaburo=0.603077, Baseline=0.372294)
  N=  7: gap=0.230103 (Zaburo=0.630000, Baseline=0.399897)
  N= 15: gap=0.223051 (Zaburo=0.600000, Baseline=0.376949)
  N=  3: gap=0.218588 (Zaburo=0.653333, Baseline=0.434745)
  N= 14: gap=0.190457 (Zaburo=0.560000, Baseline=0.369543)
  N= 16: gap=0.188372 (Zaburo=0.562500, Baseline=0.374128)
  N= 11: gap=0.170758 (Zaburo=0.545682, Baseline=0.374924)
  N=  8: gap=0.165843 (Zaburo=0.551250, Baseline=0.385407)
  N= 28: gap=0.163270 (Zaburo=0.529375, Baseline=0.366105)
  N= 17: gap=0.159371 (Zaburo=0.529412, Baseline=0.370040)
  N= 19: gap=0.153622 (Zaburo=0.522237, Baseline=0.368615)
  N=  

In [5]:
# Strategy: The path forward
print("="*60)
print("STRATEGY ANALYSIS")
print("="*60)
print("""
1. CURRENT STATE:
   - Zaburo (87.99) is VALID but far from target (68.89)
   - Pre-optimized baselines (~70.6) have overlaps Kaggle rejects
   - Gap: 19.1 points

2. KEY INSIGHT FROM KERNELS:
   - Top teams use simulated annealing (SA) on valid solutions
   - SA moves: small translations, rotations, swaps
   - Must maintain no-overlap constraint
   - The bbox3 binary is a C++ SA optimizer

3. PATH FORWARD:
   Option A: Implement SA from scratch in Python
   - Start from Zaburo (87.99)
   - Apply SA moves: translate, rotate, swap
   - Accept moves that improve score or pass SA temperature
   - Reject moves that create overlaps
   
   Option B: Fix overlaps in pre-optimized baseline
   - Load baseline (~70.6)
   - For each N with overlaps, apply separation
   - This is risky - may not fix all overlaps
   
   Option C: Hybrid approach
   - Use Zaburo for N values where baseline has overlaps
   - Use baseline for N values where it's valid
   - Then apply SA to improve

4. RECOMMENDED: Option A (SA from scratch)
   - Most reliable path
   - Zaburo is guaranteed valid
   - SA can improve from 88 toward 70
""")

STRATEGY ANALYSIS

1. CURRENT STATE:
   - Zaburo (87.99) is VALID but far from target (68.89)
   - Pre-optimized baselines (~70.6) have overlaps Kaggle rejects
   - Gap: 19.1 points

2. KEY INSIGHT FROM KERNELS:
   - Top teams use simulated annealing (SA) on valid solutions
   - SA moves: small translations, rotations, swaps
   - Must maintain no-overlap constraint
   - The bbox3 binary is a C++ SA optimizer

3. PATH FORWARD:
   Option A: Implement SA from scratch in Python
   - Start from Zaburo (87.99)
   - Apply SA moves: translate, rotate, swap
   - Accept moves that improve score or pass SA temperature
   - Reject moves that create overlaps
   
   Option B: Fix overlaps in pre-optimized baseline
   - Load baseline (~70.6)
   - For each N with overlaps, apply separation
   - This is risky - may not fix all overlaps
   
   Option C: Hybrid approach
   - Use Zaburo for N values where baseline has overlaps
   - Use baseline for N values where it's valid
   - Then apply SA to improve
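
The rule "accept moves that improve score or pass SA temperature" in Option A can be sketched as a standard Metropolis test with a geometric cooling schedule. This is a generic sketch, not the logic of bbox3 or of any specific kernel; the function names and the schedule shape are assumptions.

```python
import math
import random

def accept_move(old_score, new_score, temperature, rng=random):
    """Metropolis rule for minimization (lower score is better)."""
    delta = new_score - old_score
    if delta <= 0:
        return True           # improvements are always accepted
    if temperature <= 0:
        return False          # at zero temperature, behave greedily
    # Worse moves pass with probability exp(-delta / T)
    return rng.random() < math.exp(-delta / temperature)

def geometric_schedule(t_start, t_end, steps):
    """Yield `steps` temperatures decaying geometrically from t_start to t_end."""
    ratio = (t_end / t_start) ** (1.0 / max(steps - 1, 1))
    t = t_start
    for _ in range(steps):
        yield t
        t *= ratio
```

Overlap rejection is deliberately left out of `accept_move`: a candidate that creates an overlap would be discarded before its score is even compared.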



In [6]:
# Calculate theoretical improvement potential
print("\nTHEORETICAL IMPROVEMENT POTENTIAL:")
print("="*60)

# N=1 is already optimal (45 degrees)
n1_score = zaburo_per_n.get(1, 0.661)
print(f"N=1: {n1_score:.6f} (already optimal at 45°)")

# Small N (2-10) have highest per-tree contribution
small_n_total = sum(zaburo_per_n.get(n, 0) for n in range(2, 11))
print(f"N=2-10 total: {small_n_total:.6f}")

# If we could match baseline for all N:
if baseline_per_n:
    potential_score = sum(baseline_per_n.get(n, zaburo_per_n.get(n, 0)) for n in range(1, 201))
    print(f"\nIf we matched baseline for all N: {potential_score:.6f}")
    print(f"But baseline has overlaps, so this is not directly achievable.")

print(f"\nTarget: 68.89")
print(f"Current: 87.99")
print(f"Gap: 19.1 points")
print(f"\nThis requires ~22% improvement - achievable with SA!")


THEORETICAL IMPROVEMENT POTENTIAL:
N=1: 0.661250 (already optimal at 45°)
N=2-10 total: 5.808653

If we matched baseline for all N: 70.615745
But baseline has overlaps, so this is not directly achievable.

Target: 68.89
Current: 87.99
Gap: 19.1 points

This requires ~22% improvement - achievable with SA!
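
Option C from the strategy cell can be estimated per N before any optimization. The sketch below takes the baseline score wherever the baseline layout is valid and falls back to Zaburo otherwise. The three rows are copied from the gap table above; the set of overlapping N values is purely hypothetical, since the notebook does not list which N actually overlap.

```python
def hybrid_per_n(valid_per_n, fast_per_n, invalid_ns):
    """Per N, use the better score unless the faster layout is known to be invalid."""
    chosen = {}
    for n, valid_score in valid_per_n.items():
        fast_score = fast_per_n.get(n)
        use_fast = (
            fast_score is not None
            and n not in invalid_ns
            and fast_score < valid_score
        )
        chosen[n] = fast_score if use_fast else valid_score
    return chosen

# Rows copied from the gap table; `overlapping` is a made-up example set.
zaburo = {5: 0.800000, 4: 0.765625, 2: 0.720000}
baseline = {5: 0.416850, 4: 0.416545, 2: 0.450779}
overlapping = {4}

mix = hybrid_per_n(zaburo, baseline, overlapping)
print(mix, round(sum(mix.values()), 6))
```

Running the same function over all 200 N values with the real overlap set would give the starting score for the hybrid before SA is applied.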


In [7]:
# Record key findings
print("\n" + "="*60)
print("KEY FINDINGS FOR NEXT EXPERIMENT")
print("="*60)
print("""
1. exp_005 (Zaburo, 87.99) is ACCEPTED by Kaggle - valid baseline!
2. Pre-optimized baselines (~70.6) have overlaps - can't use directly
3. Gap to target is 19.1 points (27.7%)
4. Top kernels use simulated annealing (SA) to optimize
5. SA moves: small translations (±0.01 to ±0.1), rotations (±1° to ±10°), swaps
6. Must maintain no-overlap constraint during SA
7. Small N values (1-10) have highest per-tree score contribution
8. N=1 is already optimal at 45° - no improvement possible

NEXT STEP: Implement simulated annealing from scratch in Python
- Start from Zaburo solution (87.99)
- Apply SA with temperature schedule
- Focus on high-impact N values first
- Expected improvement: 10-15 points with good SA
""")


KEY FINDINGS FOR NEXT EXPERIMENT

1. exp_005 (Zaburo, 87.99) is ACCEPTED by Kaggle - valid baseline!
2. Pre-optimized baselines (~70.6) have overlaps - can't use directly
3. Gap to target is 19.1 points (27.7%)
4. Top kernels use simulated annealing (SA) to optimize
5. SA moves: small translations (±0.01 to ±0.1), rotations (±1° to ±10°), swaps
6. Must maintain no-overlap constraint during SA
7. Small N values (1-10) have highest per-tree score contribution
8. N=1 is already optimal at 45° - no improvement possible

NEXT STEP: Implement simulated annealing from scratch in Python
- Start from Zaburo solution (87.99)
- Apply SA with temperature schedule
- Focus on high-impact N values first
- Expected improvement: 10-15 points with good SA
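
Findings 5 and 6 can be combined into a move generator that only returns overlap-free candidates. This is a minimal sketch under several assumptions: poses are `(x, y, angle_degrees)` tuples, a "swap" exchanges the angles of two trees (swapping full poses of identical shapes would change nothing), and `is_valid` is a hypothetical caller-supplied overlap check.

```python
import math
import random

def propose_move(poses, rng):
    """Return a copy of `poses` with one random translate, rotate, or swap applied."""
    new = list(poses)
    i = rng.randrange(len(new))
    x, y, a = new[i]
    kinds = ["translate", "rotate"] + (["swap"] if len(new) > 1 else [])
    kind = rng.choice(kinds)
    if kind == "translate":
        # Step length in [0.01, 0.1], in a uniformly random direction
        step = rng.uniform(0.01, 0.1)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        new[i] = (x + step * math.cos(theta), y + step * math.sin(theta), a)
    elif kind == "rotate":
        # Rotation of 1 to 10 degrees, either direction
        delta = rng.choice([-1.0, 1.0]) * rng.uniform(1.0, 10.0)
        new[i] = (x, y, (a + delta) % 360.0)
    else:
        # Assumed swap semantics: exchange the angles of trees i and j
        j = rng.choice([k for k in range(len(new)) if k != i])
        xj, yj, aj = new[j]
        new[i], new[j] = (x, y, aj), (xj, yj, a)
    return new

def constrained_proposal(poses, is_valid, rng, tries=10):
    """Return the first candidate that passes `is_valid`, or None if all tries fail."""
    for _ in range(tries):
        candidate = propose_move(poses, rng)
        if is_valid(candidate):
            return candidate
    return None
```

In a real run `is_valid` would wrap a polygon intersection test (for example with shapely, which this notebook already imports), and the returned candidate would then go through the temperature-based acceptance test.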

