# Evolver Loop 3 Analysis

## Key Insight from Evaluator
The raw lattice construction (score 88.33) is a STARTING POINT for optimization, not a final solution. The correct workflow is:
1. Generate lattice configurations
2. Apply SA optimization to lattice output
3. Compare OPTIMIZED lattice vs baseline
4. Ensemble: pick best per-N from multiple sources
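
Step 4 of the workflow above can be sketched as follows. This is a minimal illustration, assuming each source has already been reduced to a `{n: score}` mapping; the helper name `pick_best_per_n`, the source names, and the numbers are hypothetical and are not measured results.

```python
def pick_best_per_n(sources):
    """Return {n: (source_name, score)} holding the lowest score for each n.

    `sources` maps a (hypothetical) source name to a {n: score} dict.
    Ties keep the source that was seen first.
    """
    best = {}
    for name, scores in sources.items():
        for n, score in scores.items():
            if n not in best or score < best[n][1]:
                best[n] = (name, score)
    return best


# Illustrative numbers only, not measured results.
example_sources = {
    "baseline": {1: 0.66, 2: 0.45, 3: 0.43},
    "lattice_sa": {1: 0.70, 2: 0.44, 3: 0.43},
}
ensemble = pick_best_per_n(example_sources)
ensemble_total = sum(score for _, score in ensemble.values())
print(ensemble)
```

Because the total score is a plain sum over N, the ensemble total can never be worse than the best single source, which is why even a weak source is worth including if it wins for a few values of N.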

## Current Status
- Best CV/LB: 70.659959
- Target: 68.919154
- Gap: 1.74 points (2.46% of the current score; lower is better)
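
The gap figures above follow directly from the two scores. A short check, assuming the percentage is taken relative to the current best score:

```python
current_best = 70.659959
target = 68.919154

gap = current_best - target
gap_pct_of_current = gap / current_best * 100
# Average improvement needed per configuration if spread evenly over N = 1..200.
gap_per_n = gap / 200

print(f"Gap: {gap:.2f} points ({gap_pct_of_current:.2f}% of the current score)")
print(f"Average improvement needed per N: {gap_per_n:.4f}")
```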

## Questions to Answer
1. What does the Jonathan Chan kernel achieve?
2. Can we apply SA to lattice output?
3. What per-N improvements are possible?

In [None]:
import pandas as pd
import numpy as np
import os

# Load baseline
df_baseline = pd.read_csv('/home/code/external_data/saspav/santa-2025.csv')

# Tree geometry
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]

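def parse_value(s):
    """Convert a submission value to float, stripping a leading 's' if present."""
    if isinstance(s, str) and s.startswith('s'):
        return float(s[1:])
    return float(s)

def create_tree_polygon(x, y, deg):
    """Build the tree outline rotated by `deg` degrees and translated to (x, y)."""
    from shapely.geometry import Polygon
    angle_rad = np.radians(deg)
    cos_a, sin_a = np.cos(angle_rad), np.sin(angle_rad)
    # Rotate each template vertex about the origin, then translate.
    vertices = [(tx * cos_a - ty * sin_a + x, tx * sin_a + ty * cos_a + y) for tx, ty in zip(TX, TY)]
    return Polygon(vertices)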

def compute_bounding_side(polygons):
    """Side of the axis-aligned bounding square enclosing all polygons."""
    if not polygons:
        return 0.0
    # Stack the exterior vertices of every polygon into one (k, 2) array.
    all_points = np.array([pt for poly in polygons for pt in poly.exterior.coords])
    min_x, min_y = all_points.min(axis=0)
    max_x, max_y = all_points.max(axis=0)
    return max(max_x - min_x, max_y - min_y)

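def compute_score_for_n(df, n):
    """Score for the n-tree configuration: bounding-square area divided by n.

    Returns inf when the rows for this n are missing or incomplete.
    Overlaps between trees are not checked here.
    """
    prefix = f"{n:03d}_"
    trees = df[df['id'].str.startswith(prefix)]
    if len(trees) != n:
        return float('inf')
    polygons = [
        create_tree_polygon(parse_value(row['x']), parse_value(row['y']), parse_value(row['deg']))
        for _, row in trees.iterrows()
    ]
    side = compute_bounding_side(polygons)
    return side**2 / n

def compute_total_score(df):
    """Sum of per-N scores for N = 1..200 (lower is better)."""
    return sum(compute_score_for_n(df, n) for n in range(1, 201))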

print(f"Baseline total score: {compute_total_score(df_baseline):.6f}")
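
As a sanity check on the metric, the per-N score can be reproduced without shapely for a single tree. The sketch below reuses the `TX`/`TY` template from the cell above; the helper names are hypothetical, and no overlap check is performed. An unrotated tree spans 0.7 horizontally and 1.0 vertically, so its bounding side is 1.0 and its score is 1.0.

```python
import math

TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]


def rotated_vertices(x, y, deg):
    """Tree template rotated by `deg` degrees about the origin, then moved to (x, y)."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [(tx * c - ty * s + x, tx * s + ty * c + y) for tx, ty in zip(TX, TY)]


def bounding_side(trees):
    """Side of the axis-aligned bounding square for a list of (x, y, deg) trees."""
    pts = [p for x, y, deg in trees for p in rotated_vertices(x, y, deg)]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return max(max(xs) - min(xs), max(ys) - min(ys))


def score(trees):
    """Bounding-square area divided by the number of trees (no overlap check)."""
    return bounding_side(trees) ** 2 / len(trees)


upright = score([(0.0, 0.0, 0.0)])
sideways = score([(0.0, 0.0, 90.0)])
print(upright, sideways)
```

Rotating the single tree by 90 degrees swaps width and height, so the bounding side and the score stay the same up to floating-point error.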

In [None]:
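# Analyze per-N scores and identify worst performers
# score is already the bounding-square area per tree, so dividing by the area of
# one tree gives a ratio that is comparable across N (1.0 would be a perfect packing).
tree_area = create_tree_polygon(0, 0, 0).area
per_n_scores = []
for n in range(1, 201):
    score = compute_score_for_n(df_baseline, n)
    per_n_scores.append({'n': n, 'score': score, 'efficiency': score / tree_area})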

df_analysis = pd.DataFrame(per_n_scores)
df_analysis['contribution'] = df_analysis['score'] / df_analysis['score'].sum() * 100

print("Top 20 worst efficiency (most room for improvement):")
print(df_analysis.nlargest(20, 'efficiency')[['n', 'score', 'efficiency', 'contribution']])
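
A useful reference point for the per-N analysis is the area of a single tree. Since n non-overlapping trees need at least n times that area, each per-N score (side² / n) is bounded below by the tree area, and the total over N = 1..200 is bounded below by 200 times it. The sketch below computes this with the shoelace formula; the helper names are hypothetical, and the bound is not attainable in practice because the trees do not tile a square.

```python
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]


def shoelace_area(xs, ys):
    """Area of a simple polygon given its vertices in order (either orientation)."""
    n = len(xs)
    twice_area = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))
    return abs(twice_area) / 2


tree_area = shoelace_area(TX, TY)
lower_bound_total = 200 * tree_area


def packing_density(score_n):
    """Fraction of the bounding square covered by trees for a given per-N score."""
    return tree_area / score_n


print(f"Tree area: {tree_area:.6f}")
print(f"Lower bound on total score: {lower_bound_total:.3f}")
```

With a tree area of 0.245625 the bound on the total is 49.125, so the current best of 70.659959 corresponds to a bound-to-score ratio of roughly 0.70.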

In [None]:
# Check if C++ SA optimizer exists
import subprocess

# List available optimizers
print("Checking for C++ optimizers...")
for path in ['/home/code/experiments/002_cpp_optimizer', '/home/code/research/kernels']:
    if os.path.exists(path):
        result = subprocess.run(['find', path, '-name', '*.cpp', '-o', '-name', 'sa_*'], capture_output=True, text=True)
        print(f"\nIn {path}:")
        print(result.stdout)

In [None]:
# Notes on the Jonathan Chan kernel's C++ optimizer (static summary; this cell does not read the kernel)
print("Jonathan Chan kernel uses:")
print("1. Ensemble from 15+ sources")
print("2. C++ SA optimizer with fractional translation")
print("3. Steps: 0.001, 0.0005, 0.0002, 0.0001, 0.00005, 0.00002, 0.00001")
print("4. Multi-restart with perturbation")
print("\nKey functions:")
print("- sa_v3(): Simulated annealing with temperature schedule")
print("- ls_v3(): Local search in 8 directions + rotations")
print("- fractional_translation(): Fine-grained position adjustments")
print("- opt_v3(): Combined optimization with population of 3 best")
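
The fractional-translation idea noted above (a shrinking step schedule combined with moves in 8 directions) can be sketched in Python as a greedy search. This is a toy under stated assumptions, not the kernel's C++ code: the function name, the acceptance rule (accept only strict improvements), the `max_sweeps` cap, and the quadratic objective are all hypothetical. A real objective would return the bounding side and reject positions that cause overlaps.

```python
STEPS = [0.001, 0.0005, 0.0002, 0.0001, 0.00005, 0.00002, 0.00001]
DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (1, -1), (-1, 1), (-1, -1)]


def fractional_translation_sketch(pos, objective, steps=STEPS, max_sweeps=1000):
    """Greedy 8-direction search, repeated for each step size from coarse to fine."""
    x, y = pos
    best = objective(x, y)
    for step in steps:
        for _ in range(max_sweeps):
            improved = False
            for dx, dy in DIRECTIONS:
                cx, cy = x + dx * step, y + dy * step
                val = objective(cx, cy)
                if val < best:  # accept strict improvements only
                    x, y, best = cx, cy, val
                    improved = True
            if not improved:
                break  # no direction helps at this step size; move to a finer one
    return (x, y), best


def toy_objective(x, y):
    """Stand-in objective with its minimum at (0.0123, -0.0045)."""
    return (x - 0.0123) ** 2 + (y + 0.0045) ** 2


start_val = toy_objective(0.0, 0.0)
end_pos, end_val = fractional_translation_sketch((0.0, 0.0), toy_objective)
print(end_pos, end_val)
```

The coarse steps cover distance quickly and the fine steps recover the last few decimal places, which matters here because the gap to the target is spread thinly across 200 configurations.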

In [None]:
# The key insight: we need to apply SA to lattice output
# Let's check if we have the lattice output saved

lattice_path = '/home/code/experiments/003_lattice_construction/lattice_output.csv'
if os.path.exists(lattice_path):
    df_lattice = pd.read_csv(lattice_path)
    print(f"Lattice output exists: {len(df_lattice)} rows")
    print(f"Lattice score: {compute_total_score(df_lattice):.6f}")
else:
    print("Lattice output not saved - need to regenerate and save it")
    print("\nNext experiment should:")
    print("1. Generate lattice configurations for all N")
    print("2. Save to CSV")
    print("3. Apply C++ SA optimizer")
    print("4. Compare optimized lattice vs baseline")
    print("5. Create ensemble picking best per-N")

In [None]:
# Summary of what we know
print("="*60)
print("ANALYSIS SUMMARY")
print("="*60)
print("\nCurrent best: 70.659959")
print("Target: 68.919154")
print("Gap: 1.74 points (2.46% of current best)")
print("\nKey findings:")
print("1. Pre-optimized baseline is at a local optimum - SA shows 0 improvement")
print("2. Raw lattice construction (88.33) is worse than baseline")
print("3. BUT: Lattice is a STARTING POINT for SA, not a final solution")
print("4. Jonathan Chan kernel shows the workflow:")
print("   - Ensemble from 15+ sources")
print("   - Apply C++ SA with fractional translation")
print("   - Keep best per-N")
print("\nNext steps:")
print("1. Generate lattice configurations and SAVE them")
print("2. Apply C++ SA optimizer to lattice output")
print("3. Compare optimized lattice vs baseline per-N")
print("4. Create ensemble submission")
print("="*60)
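
The next steps rely on simulated annealing to move a lattice start out of its initial basin. A minimal sketch of the technique, assuming Metropolis acceptance and geometric cooling on a one-dimensional toy objective; the function name, parameters, and objective are hypothetical and are unrelated to the C++ optimizer's actual settings.

```python
import math
import random


def anneal_sketch(x0, objective, t_start=1.0, t_end=1e-3, n_iter=5000, step=0.1, seed=0):
    """Simulated annealing with Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    cooling = (t_end / t_start) ** (1.0 / n_iter)
    x, fx = x0, objective(x0)
    best_x, best_f = x, fx
    t = t_start
    for _ in range(n_iter):
        cand = x + rng.uniform(-step, step)
        fc = objective(cand)
        delta = fc - fx
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling
    return best_x, best_f


def bumpy(x):
    """Toy objective with several local minima."""
    return x * x + 0.3 * math.sin(8 * x)


start_f = bumpy(2.0)
best_x, best_f = anneal_sketch(2.0, bumpy)
print(best_x, best_f)
```

Tracking the best state separately from the current state means the returned value is never worse than the start, which mirrors the "keep best per-N" rule in the ensemble step.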