# Loop 5 Analysis: Finding Different Basins

## Key Insight from Evaluator
The baseline is packed SO TIGHTLY that even small perturbations cause collisions. This means:
1. Local optimization started from the baseline cannot escape its current local optimum
2. We need to find DIFFERENT BASINS, not optimize the current one
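To make "collision" concrete, here is a minimal sketch of an overlap check. It rebuilds the tree outline from the same vertices as `TX`/`TY` in the setup cell and uses shapely's `affinity.rotate`/`affinity.translate`. The helpers `place_tree` and `trees_collide` and the zero-area tolerance are illustrative assumptions, not the evaluator's code. In particular, the assumption is that trees touching at an edge or vertex do not count as a collision. The example shows how a 0.01 shift turns two touching trees into overlapping ones.

```python
from shapely.geometry import Polygon
from shapely import affinity

# Tree outline: same vertices as TX/TY in the setup cell (hypothetical standalone copy)
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]
BASE_TREE = Polygon(list(zip(TX, TY)))

def place_tree(x, y, deg):
    """Rotate the base tree counterclockwise by deg about (0, 0), then move it to (x, y)."""
    return affinity.translate(affinity.rotate(BASE_TREE, deg, origin=(0, 0)), xoff=x, yoff=y)

def trees_collide(a, b, tol=1e-12):
    """Assumption: trees collide only if their interiors overlap by more than tol area;
    touching edges or vertices (zero-area intersection) is allowed."""
    return a.intersection(b).area > tol

a = place_tree(0.0, 0.0, 0.0)
print(trees_collide(a, place_tree(0.70, 0.0, 0.0)))  # False: bottom tiers touch at one point
print(trees_collide(a, place_tree(0.69, 0.0, 0.0)))  # True: 0.01 shift creates a small overlap
```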

## Key Insight from Web Search
"By solving the optimal layout for a small group of trees (e.g., 8) and then tiling that pattern to cover larger instances, and by applying incremental 'pocket-filling' heuristics that greedily place extra trees to exploit leftover space—often refined with local-search tweaks—top teams push their scores below 69."
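The tiling idea in the quote can be sketched as repeating a small unit cell of `(x, y, deg)` placements on a rectangular lattice and keeping the first N trees. The function `tile_unit_cell`, the lattice periods `dx`/`dy`, and the near-square choice of grid (by cell count, ignoring the cell's aspect ratio) are all assumptions for illustration. The sketch does not check for overlaps, so the unit cell and periods must already be collision-free. Any partial last row is the leftover space that a pocket-filling pass would target.

```python
import math

def tile_unit_cell(cell, dx, dy, n):
    """Repeat a unit cell of (x, y, deg) placements with periods dx (along x) and
    dy (along y) on a near-square grid of cells; return the first n placements.
    Hypothetical helper: assumes the cell and periods are already overlap-free."""
    k = len(cell)
    cells_needed = math.ceil(n / k)
    cols = math.ceil(math.sqrt(cells_needed))  # near-square by cell count
    rows = math.ceil(cells_needed / cols)
    placements = []
    for r in range(rows):
        for c in range(cols):
            for x, y, deg in cell:
                placements.append((x + c * dx, y + r * dy, deg))
    return placements[:n]  # truncation may leave a partially filled last row

# Hypothetical 1-tree cell: an upright tree repeated on its 0.7 x 1.0 bounding box
layout = tile_unit_cell([(0.0, 0.0, 0.0)], dx=0.7, dy=1.0, n=5)
print(len(layout))  # 5
```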

## Strategy
1. Analyze which N values have the most room for improvement
2. Implement tiling approach for large N
3. Run long optimization from different starting points
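The "pocket-filling" heuristic from the web-search quote can be sketched as a greedy scan. Try candidate reference points on a grid inside the current bounding square, at a few fixed angles, and accept the first tree that stays inside the square without overlapping anything already placed. The function `fill_pocket`, the grid `step`, the four axis-aligned angles, and the zero-area overlap tolerance are illustrative assumptions. A real implementation would refine accepted positions with local search, as the quote suggests.

```python
import numpy as np
from shapely.geometry import Polygon, box
from shapely import affinity

TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]
BASE_TREE = Polygon(list(zip(TX, TY)))

def fill_pocket(polys, side, step=0.05, angles=(0.0, 90.0, 180.0, 270.0), tol=1e-12):
    """Greedy pocket-filling sketch (hypothetical helper): return the first (x, y, deg)
    on a grid whose tree lies inside [0, side]^2 and overlaps none of polys, else None."""
    square = box(0.0, 0.0, side, side)
    rotated = {deg: affinity.rotate(BASE_TREE, deg, origin=(0, 0)) for deg in angles}
    grid = np.arange(0.0, side + 1e-9, step)
    for y in grid:
        for x in grid:
            for deg in angles:
                cand = affinity.translate(rotated[deg], xoff=x, yoff=y)
                if not square.covers(cand):
                    continue  # candidate would grow the bounding square
                if all(cand.intersection(p).area <= tol for p in polys):
                    return float(x), float(y), float(deg)
    return None
```

Scanning bottom-to-top and left-to-right is an arbitrary choice. Other orders, or scoring all feasible candidates, would find different pockets.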

In [None]:
import numpy as np
import pandas as pd
from decimal import Decimal, getcontext
from shapely.geometry import Polygon
from shapely import affinity
from shapely.ops import unary_union
import matplotlib.pyplot as plt

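# Decimal settings for high-precision coordinate handling (not used by the cells below)
getcontext().prec = 25
scale_factor = Decimal("1e15")

# Tree polygon: 15 vertices in the tree's local frame (tip at y=0.8, trunk bottom at y=-0.2)
TX = np.array([0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125])
TY = np.array([0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5])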

print("Setup complete")

In [None]:
# Load baseline and analyze per-N scores
baseline_df = pd.read_csv('/home/nonroot/snapshots/santa-2025/21116303805/code/preoptimized/santa-2025-csv/santa-2025.csv')

def parse_value(val):
    """Strip the leading 's' marker from a CSV value (if present) and return it as a string."""
    if isinstance(val, str) and val.startswith('s'):
        return val[1:]
    return str(val)

def calculate_n_score(df, n):
    """Return (side**2 / n, side) for group n, where side is the edge of the
    smallest axis-aligned square containing all n trees."""
    prefix = f"{n:03d}_"
    rows = df[df['id'].str.startswith(prefix)]
    
    all_vertices = []
    for _, row in rows.iterrows():
        x = float(parse_value(row['x']))
        y = float(parse_value(row['y']))
        deg = float(parse_value(row['deg']))
        
        rad = np.radians(deg)
        c, s = np.cos(rad), np.sin(rad)
        for tx, ty in zip(TX, TY):
            vx = tx * c - ty * s + x
            vy = tx * s + ty * c + y
            all_vertices.append([vx, vy])
    
    vertices = np.array(all_vertices)
    min_x, min_y = vertices.min(axis=0)
    max_x, max_y = vertices.max(axis=0)
    side = max(max_x - min_x, max_y - min_y)
    return side**2 / n, side

# Calculate scores for all N
n_scores = []
for n in range(1, 201):
    score, side = calculate_n_score(baseline_df, n)
    n_scores.append({'n': n, 'score': score, 'side': side, 'efficiency': n / side**2})

df_scores = pd.DataFrame(n_scores)
TARGET = 68.919154  # leaderboard score to reach
total = df_scores['score'].sum()
print(f"Total score: {total:.6f}")
print(f"Target: {TARGET:.6f}")
print(f"Gap: {total - TARGET:.6f}")
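The per-row loop in `calculate_n_score` is easy to read but slow across all 200 groups. A vectorized sketch of the same bounding-square score follows. It uses the same vertex arrays and the same counterclockwise rotation, and assumes the caller has already parsed `x`, `y`, `deg` into floats (for example with `parse_value`). The name `group_score` is hypothetical. The two checks use layouts whose answers can be worked out by hand: one upright tree has a 0.7 x 1.0 bounding box, so side = 1.0. Two upright trees 0.7 apart span 1.4 x 1.0, so the score is 1.4² / 2 = 0.98.

```python
import numpy as np

TX = np.array([0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125])
TY = np.array([0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5])

def group_score(xs, ys, degs):
    """Vectorized (side**2 / n, side) for one group of n trees (hypothetical helper)."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    rad = np.radians(np.asarray(degs, dtype=float))
    c, s = np.cos(rad)[:, None], np.sin(rad)[:, None]    # shape (n, 1)
    vx = TX[None, :] * c - TY[None, :] * s + xs[:, None]  # shape (n, 15)
    vy = TX[None, :] * s + TY[None, :] * c + ys[:, None]
    side = max(vx.max() - vx.min(), vy.max() - vy.min())
    return side ** 2 / len(xs), side

print(group_score([0.0], [0.0], [0.0]))                 # side 1.0, score 1.0
print(group_score([0.0, 0.7], [0.0, 0.0], [0.0, 0.0]))  # side 1.4, score ~0.98
```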

In [None]:
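# Analyze which N values have the most potential for improvement
# Packing efficiency = total tree area / bounding-square area; lower means more empty space
# Rough rule of thumb (assumption, not a proven bound for this shape): good irregular packings reach ~0.85-0.95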

# Exact tree area from the polygon vertices (shoelace formula via shapely): 0.245625
# (the earlier rough estimate of 0.205 treated the trapezoidal tiers as triangles)
tree_area = Polygon(list(zip(TX, TY))).area
print(f"Tree area: {tree_area:.6f}")

# Calculate packing efficiency for each N
df_scores['packing_efficiency'] = df_scores['n'] * tree_area / df_scores['side']**2

# Find N values with lowest efficiency (most room for improvement)
print("\nN values with LOWEST packing efficiency (most room for improvement):")
print(df_scores.nsmallest(20, 'packing_efficiency')[['n', 'score', 'side', 'packing_efficiency']])

In [None]:
# Analyze score contribution by N range
ranges = [(1, 10), (11, 50), (51, 100), (101, 150), (151, 200)]

print("Score contribution by N range:")
for start, end in ranges:
    mask = (df_scores['n'] >= start) & (df_scores['n'] <= end)
    range_score = df_scores[mask]['score'].sum()
    pct = range_score / df_scores['score'].sum() * 100
    print(f"  N={start}-{end}: {range_score:.4f} ({pct:.1f}%)")

# Calculate how much improvement needed per range to close the gap
gap = df_scores['score'].sum() - 68.919154
print(f"\nTotal gap to close: {gap:.4f}")
print("\nRelative improvement needed if the whole gap is closed within a single range:")
for start, end in ranges:
    mask = (df_scores['n'] >= start) & (df_scores['n'] <= end)
    range_score = df_scores[mask]['score'].sum()
    improvement_pct = gap / range_score * 100
    print(f"  N={start}-{end}: {improvement_pct:.2f}% improvement needed")

In [None]:
# Identify perfect square N values (good for tiling)
perfect_squares = [n for n in range(1, 201) if int(np.sqrt(n))**2 == n]
print("Perfect square N values (good for tiling):")
for n in perfect_squares:
    score, side = calculate_n_score(baseline_df, n)
    sqrt_n = int(np.sqrt(n))
    print(f"  N={n} ({sqrt_n}x{sqrt_n}): score={score:.6f}, side={side:.4f}")

# Also check rectangular tilings
print("\nRectangular tiling candidates (nx * ny = N):")
for n in [72, 100, 110, 144, 156, 196, 200]:
    score, side = calculate_n_score(baseline_df, n)
    # Find factors
    factors = [(i, n//i) for i in range(2, int(np.sqrt(n))+1) if n % i == 0]
    print(f"  N={n}: score={score:.6f}, factors={factors}")
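The factor pairs above only cover exact rectangles with nx * ny = N. A hedged sketch of choosing grid dimensions for a unit cell with bounding box `cell_w` x `cell_h` follows. It also allows a few empty slots (nx * ny >= N), which is often better than an exact factorization. The function `best_grid` is hypothetical, and the model assumes cells repeat at their bounding-box periods with no interlocking between neighbouring cells, so it gives only a crude upper bound on the achievable side. Worked by hand for one upright tree (0.7 x 1.0) and N = 100: the exact 10 x 10 grid gives side 10. A 12 x 9 grid (108 slots) gives side max(8.4, 9) = 9, that is a score of 81 / 100 = 0.81.

```python
import math

def best_grid(n, cell_w, cell_h, trees_per_cell=1):
    """Return (nx, ny, side) minimizing the bounding-square side of an nx-by-ny grid
    of unit cells that holds at least n trees (hypothetical helper; no interlocking)."""
    cells = math.ceil(n / trees_per_cell)
    best = None
    for nx in range(1, cells + 1):
        ny = math.ceil(cells / nx)  # fewest rows that fit all cells
        side = max(nx * cell_w, ny * cell_h)
        if best is None or side < best[2]:
            best = (nx, ny, side)
    return best

nx, ny, side = best_grid(100, 0.7, 1.0)
print(nx, ny, side, side ** 2 / 100)  # 12 9 9.0 0.81 (score up to float rounding)
```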

In [None]:
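# Key insight from the output above: the gap is about 1.76 points (2.49% of the current score)
# Rough estimate: improving large N (>100) by about 3-5% may be enough; Scenario 2 computes the exact figure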

# Let's calculate what score we need for each N range to hit target
target = 68.919154
current = df_scores['score'].sum()

print("Scenario Analysis:")
print("="*60)

# Scenario 1: Uniform improvement across all N
improvement_pct = (current - target) / current * 100
print(f"\n1. Uniform {improvement_pct:.2f}% improvement across all N")

# Scenario 2: Focus on large N (>100)
large_n_score = df_scores[df_scores['n'] > 100]['score'].sum()
needed_improvement = (current - target) / large_n_score * 100
print(f"\n2. Focus on N>100 (score={large_n_score:.4f}):")
print(f"   Need {needed_improvement:.2f}% improvement on large N only")

# Scenario 3: Focus on tiling candidates
tiling_ns = [72, 100, 110, 144, 156, 196, 200]
tiling_score = df_scores[df_scores['n'].isin(tiling_ns)]['score'].sum()
needed_improvement = (current - target) / tiling_score * 100
print(f"\n3. Focus on tiling candidates {tiling_ns}:")
print(f"   Current score: {tiling_score:.4f}")
print(f"   Need {needed_improvement:.2f}% improvement on these N values only")

In [None]:
# Summary of findings
print("="*60)
print("SUMMARY OF FINDINGS")
print("="*60)
print(f"""
1. Current score: {current:.6f}
2. Target score: {target:.6f}
3. Gap: {current - target:.6f} ({(current - target)/current*100:.2f}%)

4. Key insight: The baseline is at a VERY TIGHT local optimum.
   - Even small perturbations cause collisions
   - Standard optimization cannot escape
   - Need to find DIFFERENT BASINS

5. Approaches to try:
   a) TILING: Optimize 2-tree configuration, then tile for large N
      - Candidates: N=72, 100, 110, 144, 156, 196, 200
      - These contribute {tiling_score:.4f} to total score
   
   b) LONG OPTIMIZATION: Run C++ optimizer for HOURS, not minutes
      - Current runs: 5-10 minutes
      - Top solutions: 1+ hours per N
   
   c) DIFFERENT STARTING POINTS:
      - Random initialization
      - Lattice patterns
      - Different base configurations

6. The target score (68.919154) IS achievable:
   - It exists on the leaderboard
   - Gap is only {(current - target)/current*100:.2f}%
   - Focus on finding different basins, not optimizing current one
""")
print("="*60)
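For approach (c), a random initialization can be sketched with rejection sampling. Draw random `(x, y, deg)` triples and keep each tree only if it overlaps none already kept. The result is a loose, collision-free starting layout that an optimizer would then compress, which is one way to start in a different basin than the baseline. The function `random_start`, the box size `side`, `max_tries`, and the zero-area overlap tolerance are illustrative assumptions. Only the reference points are confined to the box, so trees may extend past its edges.

```python
import random
from shapely.geometry import Polygon
from shapely import affinity

TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]
BASE_TREE = Polygon(list(zip(TX, TY)))

def random_start(n, side, max_tries=10000, seed=None, tol=1e-12):
    """Rejection-sample n non-overlapping trees with reference points in [0, side]^2
    (hypothetical helper). Returns a list of (x, y, deg)."""
    rng = random.Random(seed)
    placed, polys = [], []
    for _ in range(max_tries):
        if len(placed) == n:
            break
        x, y, deg = rng.uniform(0, side), rng.uniform(0, side), rng.uniform(0, 360)
        poly = affinity.translate(affinity.rotate(BASE_TREE, deg, origin=(0, 0)), xoff=x, yoff=y)
        if all(poly.intersection(p).area <= tol for p in polys):
            placed.append((x, y, deg))
            polys.append(poly)
    if len(placed) < n:
        raise RuntimeError("could not place all trees; try a larger side")
    return placed

start = random_start(5, side=10.0, seed=0)
print(len(start))  # 5
```

Different seeds give different starting layouts, which is the point when searching for new basins rather than refining one.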