# Evolver Loop 4 - LB Feedback Analysis

## Submission Results
- **exp_000**: CV 70.6761 | LB 70.6761 (gap: 0.0000)
- **exp_003**: CV 70.6761 | LB 70.6761 (gap: 0.0000)

## Key Finding: Perfect CV-LB Alignment
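The CV and LB scores match to all four reported decimal places, which indicates:
1. Our scoring function reproduces the leaderboard metric
2. Our overlap detection agrees with the leaderboard on these files (neither submission was rejected)
3. The baseline is truly at 70.676102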

## The Challenge
- Target: 68.919154
- Current: 70.676102
- Gap: 1.756948 (2.49% improvement needed)

## What We've Tried (All Failed)
1. Fractional translation - 0 improvement
2. SA + local search + fractional translation - 0 improvement
3. bbox3 optimizer (80 rounds) - 0 improvement
4. tree_packer_v21 - 0 improvement
5. Ensemble from 731 CSV files - all better scores have overlaps
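
Item 5 depends on rejecting any candidate whose trees overlap. The notebook's own overlap check is not shown, so the following is only a minimal standard-library sketch of one way to test a pair of trees. It assumes that touching boundaries do not count as an overlap, it only detects proper edge crossings plus a vertex lying inside the other tree, and the helper names (`tree_vertices`, `trees_overlap`) are hypothetical.

```python
from math import cos, sin, radians

# Tree outline copied from the notebook's TX / TY arrays.
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075,
      -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2,
      -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]


def tree_vertices(x, y, deg):
    """Rotate the outline by deg degrees, then translate it by (x, y)."""
    c, s = cos(radians(deg)), sin(radians(deg))
    return [(tx * c - ty * s + x, tx * s + ty * c + y) for tx, ty in zip(TX, TY)]


def _cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _properly_cross(p1, p2, q1, q2):
    """True when the segments cross at a point interior to both of them."""
    d1, d2 = _cross(q1, q2, p1), _cross(q1, q2, p2)
    d3, d4 = _cross(p1, p2, q1), _cross(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def _inside(pt, poly):
    """Ray-casting point-in-polygon test (boundary points are not special-cased)."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_at_y = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_at_y:
                inside = not inside
    return inside


def trees_overlap(a, b):
    """a and b are (x, y, deg) placements; degenerate touching cases are ignored."""
    pa, pb = tree_vertices(*a), tree_vertices(*b)
    edges_a = list(zip(pa, pa[1:] + pa[:1]))
    edges_b = list(zip(pb, pb[1:] + pb[:1]))
    for p1, p2 in edges_a:
        for q1, q2 in edges_b:
            if _properly_cross(p1, p2, q1, q2):
                return True
    return _inside(pa[0], pb) or _inside(pb[0], pa)


print(trees_overlap((0.0, 0.0, 0.0), (0.1, 0.0, 0.0)))  # nearly coincident trees
print(trees_overlap((0.0, 0.0, 0.0), (5.0, 5.0, 0.0)))  # trees far apart
```

A production check would also need a tolerance policy for near-touching trees, which this sketch leaves open.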

## Key Insight from Research
To break below 70, we need **multi-tree lattice packings** with 3-, 4-, or 5-tree repeating units, not just optimization of existing solutions.
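
To make the idea concrete, here is a hedged sketch of how a k-tree repeating unit could be tiled along two lattice vectors to produce N placements. The function name `lattice_layout`, the unit-cell offsets, and the lattice vectors are all illustrative assumptions. The placeholder values below are not validated for overlaps and are not taken from any known good solution.

```python
from math import ceil, sqrt


def lattice_layout(n, unit_cell, a, b):
    """Return n (x, y, deg) placements by tiling unit_cell along vectors a and b.

    unit_cell: list of (dx, dy, deg) offsets for the k trees in one repeating unit.
    a, b:      2D lattice vectors given as (x, y) tuples.
    """
    k = len(unit_cell)
    cells = ceil(n / k)          # number of repeating units needed
    cols = ceil(sqrt(cells))     # arrange units in a roughly square grid
    placements = []
    for idx in range(cells):
        i, j = idx % cols, idx // cols
        ox = i * a[0] + j * b[0]
        oy = i * a[1] + j * b[1]
        for dx, dy, deg in unit_cell:
            placements.append((ox + dx, oy + dy, deg))
    return placements[:n]        # drop surplus trees from the last unit


# Hypothetical 3-tree unit cell: placeholder offsets, NOT checked for overlaps.
cell = [(0.0, 0.0, 0.0), (0.4, 0.3, 180.0), (0.8, 0.0, 0.0)]
layout = lattice_layout(10, cell, a=(1.2, 0.0), b=(0.0, 1.1))
print(len(layout))
```

In this framing an optimizer would search over the unit-cell offsets, the angles, and the two lattice vectors rather than over every tree independently, and each candidate would still have to pass an overlap check before scoring.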

In [None]:
import pandas as pd
import numpy as np
from shapely.geometry import Polygon
from shapely.ops import unary_union
import matplotlib.pyplot as plt

# Tree geometry
TX = np.array([0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125])
TY = np.array([0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5])

def parse_value(val):
    if isinstance(val, str) and val.startswith('s'):
        return val[1:]
    return str(val)

def build_polygon(x, y, angle):
    angle_rad = float(angle) * np.pi / 180.0
    cos_a = np.cos(angle_rad)
    sin_a = np.sin(angle_rad)
    vertices = [(TX[i] * cos_a - TY[i] * sin_a + float(x),
                 TX[i] * sin_a + TY[i] * cos_a + float(y)) for i in range(15)]
    return Polygon(vertices)

def get_score_for_n(df, n):
    prefix = f"{n:03d}_"
    rows = df[df['id'].str.startswith(prefix)]
    if len(rows) != n:
        return float('inf')
    
    all_points = []
    for _, row in rows.iterrows():
        x = float(parse_value(row['x']))
        y = float(parse_value(row['y']))
        deg = float(parse_value(row['deg']))
        poly = build_polygon(x, y, deg)
        all_points.extend(list(poly.exterior.coords))
    
    all_points = np.array(all_points)
    side = max(all_points.max(axis=0) - all_points.min(axis=0))
    return (side ** 2) / n

print("Functions loaded")

In [None]:
# Load baseline and analyze score breakdown
baseline_df = pd.read_csv('/home/code/submission_candidates/candidate_000.csv')

scores = []
for n in range(1, 201):
    score = get_score_for_n(baseline_df, n)
    scores.append({'n': n, 'score': score, 'contribution': score})

scores_df = pd.DataFrame(scores)
print(f"Total score: {scores_df['score'].sum():.6f}")
print(f"Target: 68.919154")
print(f"Gap: {scores_df['score'].sum() - 68.919154:.6f}")
print()
print("Score breakdown by N range:")
# Contiguous bins covering every N, so the per-range sums add up to the total
for start, end in [(1, 10), (11, 20), (21, 50), (51, 100), (101, 150), (151, 200)]:
    range_score = scores_df[(scores_df['n'] >= start) & (scores_df['n'] <= end)]['score'].sum()
    print(f"  N={start:3d}-{end:3d}: {range_score:.4f}")

In [None]:
# Identify which N values have the most room for improvement
# Area lower bound: side^2 >= N * tree_area, so each per-N score (side^2 / N) is at least tree_area
# For small N the packing efficiency is much worse, so scores sit further above that bound

print("Top 20 N values by score contribution:")
top_n = scores_df.nlargest(20, 'score')
for _, row in top_n.iterrows():
    print(f"  N={int(row['n']):3d}: score={row['score']:.6f}")

print("\nSmall N values (1-20) contribute:", scores_df[scores_df['n'] <= 20]['score'].sum())
print("Medium N values (21-100) contribute:", scores_df[(scores_df['n'] > 20) & (scores_df['n'] <= 100)]['score'].sum())
print("Large N values (101-200) contribute:", scores_df[scores_df['n'] > 100]['score'].sum())

In [None]:
# Calculate efficiency for each N
# Efficiency = (N * tree_area) / (bounding_box_area)
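# Tree area is the exact area of the outline defined by TX/TY (about 0.2456).
# A rough 0.35 guess (half of the 0.7 x 1.0 bounding box) would overstate every efficiency.

tree_area = Polygon(list(zip(TX, TY))).area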

efficiencies = []
for n in range(1, 201):
    score = scores_df[scores_df['n'] == n]['score'].values[0]
    bbox_area = score * n  # score = side^2 / n, so side^2 = score * n
    efficiency = (n * tree_area) / bbox_area
    efficiencies.append({'n': n, 'efficiency': efficiency, 'score': score})

eff_df = pd.DataFrame(efficiencies)

print("Efficiency by N range (higher is better):")
# Contiguous bins covering every N, so no range is skipped in the averages
for start, end in [(1, 10), (11, 20), (21, 50), (51, 100), (101, 150), (151, 200)]:
    avg_eff = eff_df[(eff_df['n'] >= start) & (eff_df['n'] <= end)]['efficiency'].mean()
    print(f"  N={start:3d}-{end:3d}: avg efficiency={avg_eff:.4f}")

print("\nLowest efficiency N values (most room for improvement):")
lowest_eff = eff_df.nsmallest(10, 'efficiency')
for _, row in lowest_eff.iterrows():
    print(f"  N={int(row['n']):3d}: efficiency={row['efficiency']:.4f}, score={row['score']:.6f}")
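
As a cross-check on the efficiency ratio, the exact area of the tree outline can be computed from the TX/TY vertices with the shoelace formula. This standard-library sketch also derives the area-only floor on the total score: since side^2 >= N * tree_area, every per-N score is at least tree_area. The names `shoelace_area` and `score_floor` are illustrative, and the floor is loose because it assumes the trees could fill a square with no wasted space.

```python
# Tree outline copied from the notebook's TX / TY arrays.
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075,
      -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2,
      -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]


def shoelace_area(xs, ys):
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2.0


tree_area = shoelace_area(TX, TY)

# Each per-N score is side^2 / N >= tree_area, so summing over N = 1..200
# gives an area-only floor on the total score.
score_floor = 200 * tree_area

print(f"tree area: {tree_area:.6f}")
print(f"area-only floor on total score: {score_floor:.3f}")
```

The tree area comes out at roughly 0.2456, so the floor is about 49.1, well below the 68.919154 target. Perfect tiling is not achievable with this shape, so the bound only confirms that the target is not ruled out on area grounds.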

In [None]:
# Key insight: To close the 1.76 point gap, we need to improve across all N values
# But small N values have the worst efficiency and contribute disproportionately

# Calculate how much improvement we need per N value
target = 68.919154
current = scores_df['score'].sum()
gap = current - target

print(f"Current total: {current:.6f}")
print(f"Target: {target:.6f}")
print(f"Gap: {gap:.6f}")
print()
print(f"Average improvement needed per N: {gap / 200:.6f}")
print(f"Percentage improvement needed: {100 * gap / current:.2f}%")
print()
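pct = 100 * gap / current  # uniform percentage reduction that would close the gap
print(f"If we could improve each N by {pct:.2f}%:")
for n in [1, 2, 5, 10, 50, 100, 200]:
    current_score = scores_df[scores_df['n'] == n]['score'].values[0]
    improved_score = current_score * (1 - gap / current)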
    improvement = current_score - improved_score
    print(f"  N={n:3d}: {current_score:.6f} -> {improved_score:.6f} (save {improvement:.6f})")

## Strategy Analysis

### What We Know:
1. The baseline sits at a very tight local optimum: none of the optimizers tried so far improved it
2. The gap to target is 1.76 points (2.49% improvement needed)
3. Small N values (1-20) have the worst efficiency and contribute disproportionately
4. All CSV files with better scores have overlapping trees

### What Research Suggests:
1. **Multi-tree lattice packings** with 3-, 4-, or 5-tree repeating units can break below 70
2. **Symmetric solutions** dramatically shrink the search space while still covering the optimal region
3. **Grid-based approaches** with SA optimization on lattice parameters
4. The "super-fast-simulated-annealing-with-translations" kernel uses grid configs

### Key Approaches to Try:
1. **Lattice-based packing for large N** - Generate grid configurations and optimize
2. **Symmetric solutions for small N** - Exhaustive search with symmetry constraints
3. **Novel repeating patterns** - 3-tree, 4-tree, 5-tree unit cells
4. **Very long random restarts** - Start from scratch with different initial configs
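
One way the symmetry constraint in approach 2 could be encoded is sketched below. The research notes do not say which symmetry group is meant, so 180-degree point symmetry is an assumption chosen for illustration, and `symmetric_layout` is a hypothetical helper. Only half of the trees are free variables and each one generates a mirrored partner, which halves the number of parameters to search for even N.

```python
def symmetric_layout(half, center=(0.0, 0.0)):
    """Expand free placements into a layout with 180-degree point symmetry.

    half:   list of (x, y, deg) placements that the optimizer controls.
    center: point about which the layout is symmetric.
    Returns a list of 2 * len(half) placements.
    """
    cx, cy = center
    layout = []
    for x, y, deg in half:
        layout.append((x, y, deg % 360.0))
        # Partner tree: reflect the position through the center and turn it 180 degrees.
        layout.append((2 * cx - x, 2 * cy - y, (deg + 180.0) % 360.0))
    return layout


# Hypothetical free placements for an N = 4 layout (not checked for overlaps).
free_trees = [(0.3, 0.2, 10.0), (0.9, -0.4, 275.0)]
full_layout = symmetric_layout(free_trees)
print(len(full_layout))
```

Odd N does not fit this scheme directly, because a single tree is not invariant under a 180-degree rotation, so those cases would need one unconstrained tree or a different symmetry.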