# Evolver Loop 2 Analysis

## Situation
- Submission failed: "Overlapping trees in group 040"
- Best CV/LB: 70.676102 (from saspav solution)
- Target: 68.922808
- Gap: 1.753294 points (about 2.5% of the current score)

## Key Questions
1. Why did the ensemble create overlapping trees?
2. What's the structure of the best solution?
3. What techniques can break through the local optimum?
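
Question 1 can be probed locally by re-checking every group for overlaps before resubmitting. The sketch below is one dependency-free way to do that and is not the competition's validator. It splits the tree outline from `get_tree_vertices()` (cell `In [4]`) into four convex pieces, three tiers plus the trunk, and runs a separating-axis test on every pair of pieces. The names `PIECES`, `place`, `convex_overlap`, `trees_overlap`, `find_overlaps` and the tolerance `EPS` are assumptions made for this example.

```python
import math
from itertools import combinations

# Convex pieces of the tree outline: top tier, middle tier, bottom tier, trunk.
# Coordinates follow get_tree_vertices(); the split itself is an assumption
# made for this sketch.
PIECES = [
    [(0.0, 0.8), (0.125, 0.5), (-0.125, 0.5)],
    [(0.0625, 0.5), (0.2, 0.25), (-0.2, 0.25), (-0.0625, 0.5)],
    [(0.1, 0.25), (0.35, 0.0), (-0.35, 0.0), (-0.1, 0.25)],
    [(0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0)],
]
EPS = 1e-12  # hypothetical tolerance: shared edges or corners count as touching


def place(piece, x, y, deg):
    """Rotate a piece counter-clockwise by deg about the origin, then shift by (x, y)."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [(x + px * c - py * s, y + px * s + py * c) for px, py in piece]


def convex_overlap(a, b):
    """True if two convex polygons share interior area (separating-axis test)."""
    for poly in (a, b):
        for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
            nx, ny = y2 - y1, x1 - x2  # normal of this edge (not normalised)
            pa = [nx * px + ny * py for px, py in a]
            pb = [nx * px + ny * py for px, py in b]
            if max(pa) <= min(pb) + EPS or max(pb) <= min(pa) + EPS:
                return False  # this axis separates the two polygons
    return True


def trees_overlap(t1, t2):
    """Each tree is an (x, y, deg) tuple; trees overlap if any two pieces do."""
    return any(
        convex_overlap(place(p, *t1), place(q, *t2))
        for p in PIECES
        for q in PIECES
    )


def find_overlaps(trees):
    """Return index pairs (i, j), i < j, of overlapping trees within one group."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(trees), 2)
        if trees_overlap(a, b)
    ]


# Two nearly coincident trees plus one far away: only the first pair overlaps.
group = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(find_overlaps(group))  # [(0, 1)]
```

Running `find_overlaps` on the `(x_val, y_val, deg_val)` rows of each candidate group before writing a submission would show whether group 040 already overlaps in the source files or only after ensembling.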

In [1]:
import numpy as np
import pandas as pd
from shapely.geometry import Polygon
from shapely.affinity import rotate, translate
import matplotlib.pyplot as plt

# Load the saspav solution (known good)
df = pd.read_csv('/home/code/santa-2025-csv/santa-2025.csv')
print(f"Loaded {len(df)} rows")
print(df.head())

Loaded 20100 rows
      id                       x                       y  \
0  001_0    s-48.196086194214246     s58.770984615214225   
1  002_0   s0.154097069621355887  s-0.038540742694794648   
2  002_1  s-0.154097069621372845  s-0.561459257305224058   
3  003_0      s1.123655816140301      s0.781101815992563   
4  003_1       s1.23405569584216      s1.275999500663759   

                       deg  
0                    s45.0  
1  s203.629377730656841550  
2   s23.629377730656791812  
3        s111.125132292893  
4         s66.370622269343  


In [2]:
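# Parse values: x, y and deg are stored as strings with an 's' prefix (e.g. 's45.0'),
# so strip the prefix before converting to float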
def parse_value(s):
    if isinstance(s, str) and s.startswith('s'):
        return float(s[1:])
    return float(s)

df['n'] = df['id'].apply(lambda x: int(x.split('_')[0]))
df['tree_idx'] = df['id'].apply(lambda x: int(x.split('_')[1]))
df['x_val'] = df['x'].apply(parse_value)
df['y_val'] = df['y'].apply(parse_value)
df['deg_val'] = df['deg'].apply(parse_value)

print(f"N values: {df['n'].min()} to {df['n'].max()}")
print(f"Total trees: {len(df)}")
print(f"Expected: {sum(range(1, 201))} = 20100")

N values: 1 to 200
Total trees: 20100
Expected: 20100 = 20100
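
Several of the coordinate strings shown above carry more digits than a 64-bit float can represent (roughly 15 to 17 significant decimal digits), so `float(s[1:])` rounds them. Whether that rounding played any part in the overlap reported for group 040 is not established here. The sketch below only shows how to detect the rounding, using `decimal.Decimal` from the standard library. `parse_exact` and `survives_float_round_trip` are hypothetical helper names.

```python
from decimal import Decimal


def parse_exact(s):
    """Parse an 's'-prefixed submission value without dropping any digits."""
    text = s[1:] if s.startswith("s") else s
    return Decimal(text)


def survives_float_round_trip(s):
    """True if converting to float and writing it back keeps the exact value."""
    exact = parse_exact(s)
    return Decimal(repr(float(exact))) == exact


raw = "s203.629377730656841550"  # deg of row 002_0 in the output above
print(parse_exact(raw))          # every digit of the string is preserved
print(float(parse_exact(raw)))   # rounded to the nearest 64-bit float
print(survives_float_round_trip(raw))      # False: digits are lost
print(survives_float_round_trip("s45.0"))  # True: exactly representable
```

If an ensemble is written from the float columns rather than from the original strings, a check like this indicates which rows no longer match their source exactly.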


In [3]:
# Analyze the structure of N=200 (largest configuration)
n200 = df[df['n'] == 200].copy()
print(f"N=200 has {len(n200)} trees")

# Analyze angle distribution
angles = n200['deg_val'].values % 360
print(f"\nAngle distribution:")
print(f"  Min: {angles.min():.2f}")
print(f"  Max: {angles.max():.2f}")
print(f"  Mean: {angles.mean():.2f}")

# Count trees pointing up vs down (0 deg = tip up; within 45 deg of 0 or 180)
up_trees = ((angles > 315) | (angles < 45)).sum()
down_trees = ((angles > 135) & (angles < 225)).sum()
print(f"\nOrientation:")
print(f"  Up (0+/-45 deg): {up_trees}")
print(f"  Down (180+/-45 deg): {down_trees}")
print(f"  Other: {len(n200) - up_trees - down_trees}")

N=200 has 200 trees

Angle distribution:
  Min: 76.70
  Max: 293.62
  Mean: 172.53

Orientation:
  Up (0+/-45 deg): 0
  Down (180+/-45 deg): 0
  Other: 200
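
Because every tree lands in "Other", the up/down split says little about the layout. A four-sector split may be more informative. The sketch below assumes shapely's convention that positive angles rotate counter-clockwise, so with the tip at +y for 0 degrees, 90 degrees points the tip left. `SECTORS`, `sector` and `sector_counts` are illustrative names, and only the first two sample angles (the min and max printed above) come from the data.

```python
from collections import Counter

SECTORS = ["up", "left", "down", "right"]  # centred on 0, 90, 180, 270 degrees


def sector(deg):
    """Map an angle in degrees to the 90-degree-wide sector the tip points into."""
    return SECTORS[int(((deg % 360) + 45) // 90) % 4]


def sector_counts(angles):
    """Count trees per sector; the four sectors cover the full circle."""
    return Counter(sector(a) for a in angles)


# 76.70 and 293.62 are the min and max printed above; the rest are made up.
print(sector(76.70), sector(293.62))
print(sector_counts([76.70, 293.62, 0.0, 180.0, -90.0]))
```

In the notebook this could be called as `sector_counts(n200['deg_val'])` to see whether the N=200 layout is dominated by left- and right-pointing trees.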


In [4]:
# Calculate score breakdown by N range
def get_tree_vertices():
    trunk_w = 0.15
    trunk_h = 0.2
    base_w = 0.7
    mid_w = 0.4
    top_w = 0.25
    tip_y = 0.8
    tier_1_y = 0.5
    tier_2_y = 0.25
    base_y = 0.0
    trunk_bottom_y = -trunk_h
    
    vertices = [
        (0.0, tip_y), (top_w / 2, tier_1_y), (top_w / 4, tier_1_y),
        (mid_w / 2, tier_2_y), (mid_w / 4, tier_2_y), (base_w / 2, base_y),
        (trunk_w / 2, base_y), (trunk_w / 2, trunk_bottom_y),
        (-trunk_w / 2, trunk_bottom_y), (-trunk_w / 2, base_y),
        (-base_w / 2, base_y), (-mid_w / 4, tier_2_y), (-mid_w / 2, tier_2_y),
        (-top_w / 4, tier_1_y), (-top_w / 2, tier_1_y),
    ]
    return vertices

BASE_TREE = Polygon(get_tree_vertices())

def create_tree_polygon(x, y, deg):
    tree = rotate(BASE_TREE, deg, origin=(0, 0))
    tree = translate(tree, x, y)
    return tree

def get_bounding_box_side(trees_df):
    # Side of the smallest axis-aligned square that encloses every tree in the group
    all_x, all_y = [], []
    for _, row in trees_df.iterrows():
        poly = create_tree_polygon(row['x_val'], row['y_val'], row['deg_val'])
        minx, miny, maxx, maxy = poly.bounds
        all_x.extend([minx, maxx])
        all_y.extend([miny, maxy])
    width = max(all_x) - min(all_x)
    height = max(all_y) - min(all_y)
    return max(width, height)

# Calculate score for each N: side^2 / n, summed over N = 1..200 for the total
scores = []
for n in range(1, 201):
    trees_n = df[df['n'] == n]
    side = get_bounding_box_side(trees_n)
    score_n = side ** 2 / n
    scores.append({'n': n, 'side': side, 'score': score_n})

scores_df = pd.DataFrame(scores)
print(f"Total score: {scores_df['score'].sum():.6f}")
print(f"\nScore breakdown:")
print(f"  N=1-50: {scores_df[scores_df['n'] <= 50]['score'].sum():.4f}")
print(f"  N=51-100: {scores_df[(scores_df['n'] > 50) & (scores_df['n'] <= 100)]['score'].sum():.4f}")
print(f"  N=101-200: {scores_df[scores_df['n'] > 100]['score'].sum():.4f}")

Total score: 70.676102

Score breakdown:
  N=1-50: 19.0422
  N=51-100: 17.6411
  N=101-200: 33.9928


In [5]:
# Analyze efficiency (how close each N is to a zero-waste packing)
# Lower bound: the square must hold n trees, so side >= sqrt(n * tree_area);
# efficiency = that bound / actual side (1.0 would mean no wasted area)
tree_area = BASE_TREE.area
print(f"Single tree area: {tree_area:.6f}")

scores_df['theoretical_side'] = np.sqrt(scores_df['n'] * tree_area)
scores_df['efficiency'] = scores_df['theoretical_side'] / scores_df['side']

print(f"\nEfficiency by N range:")
print(f"  N=1-50: {scores_df[scores_df['n'] <= 50]['efficiency'].mean():.4f}")
print(f"  N=51-100: {scores_df[(scores_df['n'] > 50) & (scores_df['n'] <= 100)]['efficiency'].mean():.4f}")
print(f"  N=101-200: {scores_df[scores_df['n'] > 100]['efficiency'].mean():.4f}")

Single tree area: 0.245625

Efficiency by N range:
  N=1-50: 0.8059
  N=51-100: 0.8344
  N=101-200: 0.8501
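
Efficiency and per-N score are two views of the same quantity: with side s_n and tree area A, efficiency is e_n = sqrt(n * A) / s_n, so s_n^2 / n = A / e_n^2. The sketch below uses that identity to translate the current and target totals into an equivalent uniform efficiency. Treating all 200 groups as equally efficient is a simplifying assumption, and `uniform_efficiency` is a hypothetical helper name.

```python
import math

TREE_AREA = 0.245625       # single tree area printed above
N_GROUPS = 200             # N runs from 1 to 200
CURRENT_TOTAL = 70.676102  # total score printed in cell In [4]
TARGET_TOTAL = 68.922808   # target from the Situation section


def uniform_efficiency(total_score, area=TREE_AREA, groups=N_GROUPS):
    """Efficiency e such that `groups` scores of area / e**2 sum to total_score."""
    return math.sqrt(area * groups / total_score)


lower_bound = TREE_AREA * N_GROUPS  # total score if every group had efficiency 1
print(f"Score at efficiency 1.0: {lower_bound:.3f}")
print(f"Current equivalent efficiency: {uniform_efficiency(CURRENT_TOTAL):.4f}")
print(f"Target equivalent efficiency:  {uniform_efficiency(TARGET_TOTAL):.4f}")
```

Under this simplification, closing the gap corresponds to raising the equivalent efficiency from roughly 0.834 to roughly 0.844, about one percentage point.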


In [6]:
# Summary of findings
print("\n" + "="*60)
print("SUMMARY")
print("="*60)
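# Derive the headline numbers from scores_df instead of hard-coding them
current_score = scores_df['score'].sum()
target_score = 68.922808
gap = current_score - target_score
print(f"Current best score: {current_score:.6f}")
print(f"Target score: {target_score:.6f}")
print(f"Gap: {gap:.6f} ({gap / current_score:.1%})")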
print()
print("Key insights:")
print("1. Current solutions are at LOCAL OPTIMA")
print("   - bbox3 C++ optimizer: 0 improvement")
print("   - tree_packer v18/v21: 0 improvement")
print("   - Ensembling: saspav already best for all N")
print()
print("2. To break through, we need CONSTRUCTIVE approaches:")
print("   - Lattice-based construction (egortrushin kernel)")
print("   - Start from scratch with different initial configs")
print("   - Use SA to optimize lattice parameters")
print()
print("3. The egortrushin kernel achieves this by:")
print("   - Creating 2-tree unit cells")
print("   - Tiling with fractional translations")
print("   - SA on translation parameters")
print("   - Backward propagation for smaller N")


SUMMARY
Current best score: 70.676102
Target score: 68.922808
Gap: 1.753294 (2.5%)

Key insights:
1. Current solutions are at LOCAL OPTIMA
   - bbox3 C++ optimizer: 0 improvement
   - tree_packer v18/v21: 0 improvement
   - Ensembling: saspav already best for all N

2. To break through, we need CONSTRUCTIVE approaches:
   - Lattice-based construction (egortrushin kernel)
   - Start from scratch with different initial configs
   - Use SA to optimize lattice parameters

3. The egortrushin kernel achieves this by:
   - Creating 2-tree unit cells
   - Tiling with fractional translations
   - SA on translation parameters
   - Backward propagation for smaller N
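
Points 2 and 3 describe a constructive pipeline. As a rough illustration of two of its steps, tiling a 2-tree unit cell and stepping backward from N to N-1 trees, here is a minimal sketch. It is not the egortrushin kernel's code: the function names, the 180-degree pairing inside a cell (suggested only by rows `002_0` and `002_1` above, whose angles differ by about 180 degrees), the greedy removal rule, and all parameter values are assumptions made for this example. Reading "backward propagation" as removing one tree from an N-tree layout to seed N-1 is likewise an interpretation that the summary does not spell out.

```python
import math

# Tree outline from get_tree_vertices(), written out numerically.
TREE = [
    (0.0, 0.8), (0.125, 0.5), (0.0625, 0.5), (0.2, 0.25), (0.1, 0.25),
    (0.35, 0.0), (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0),
    (-0.35, 0.0), (-0.1, 0.25), (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5),
]


def bounding_side(trees):
    """Side of the axis-aligned bounding square of a list of (x, y, deg) trees."""
    xs, ys = [], []
    for x, y, deg in trees:
        t = math.radians(deg)
        c, s = math.cos(t), math.sin(t)
        for vx, vy in TREE:
            xs.append(x + vx * c - vy * s)
            ys.append(y + vx * s + vy * c)
    return max(max(xs) - min(xs), max(ys) - min(ys))


def lattice(n, a, b, offset, deg):
    """First n trees of a 2-tree unit cell tiled along lattice vectors a and b.

    Each cell holds one tree at angle deg and a second tree rotated by 180
    degrees and shifted by offset. Overlaps are NOT checked here.
    """
    cols = math.ceil(math.sqrt(n / 2))
    rows = math.ceil(n / (2 * cols))
    trees = []
    for i in range(rows):
        for j in range(cols):
            ox, oy = j * a[0] + i * b[0], j * a[1] + i * b[1]
            trees.append((ox, oy, deg))
            trees.append((ox + offset[0], oy + offset[1], deg + 180.0))
    return trees[:n]


def drop_one(trees):
    """Backward step: remove the tree whose removal gives the smallest square."""
    candidates = [trees[:k] + trees[k + 1:] for k in range(len(trees))]
    return min(candidates, key=bounding_side)


# Loose, hand-picked parameters for illustration only (not tuned, not from the kernel).
cfg = lattice(8, a=(1.0, 0.0), b=(0.0, 1.2), offset=(0.5, 0.6), deg=0.0)
smaller = drop_one(cfg)
print(len(cfg), round(bounding_side(cfg) ** 2 / len(cfg), 4))
print(len(smaller), round(bounding_side(smaller) ** 2 / len(smaller), 4))
```

In an SA loop, `a`, `b`, `offset` and `deg` would be the state being perturbed, with `bounding_side(trees) ** 2 / n` as the energy and an overlap check acting as a hard constraint on every accepted move.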
