# Loop 5 Analysis: Finding Different Basins

## Key Insight from Evaluator
The baseline is packed SO TIGHTLY that even small perturbations cause collisions. This means:
1. Standard local-search optimization cannot escape this local optimum
2. We need to find DIFFERENT BASINS, not optimize the current one

## Key Insight from Web Search
"By solving the optimal layout for a small group of trees (e.g., 8) and then tiling that pattern to cover larger instances, and by applying incremental 'pocket-filling' heuristics that greedily place extra trees to exploit leftover space—often refined with local-search tweaks—top teams push their scores below 69."

## Strategy
1. Analyze which N values have the most room for improvement
2. Implement tiling approach for large N
3. Run long optimization from different starting points
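
A minimal sketch of the tiling idea in step 2, assuming a small motif of trees has already been optimized. The function name `tile_cell`, the 2-tree motif, and the spacings `dx`, `dy` are hypothetical placeholders, not values from the baseline; any real spacing would still need a collision check before scoring.

```python
def tile_cell(cell, dx, dy, nx, ny, n):
    """Repeat a small motif on an nx-by-ny lattice and keep the first n trees.

    cell   : list of (x, y, deg) placements for one motif (hypothetical input)
    dx, dy : lattice spacings, assumed large enough that copies do not overlap
    """
    placements = []
    for j in range(ny):          # rows of the lattice
        for i in range(nx):      # columns of the lattice
            for x, y, deg in cell:
                placements.append((x + i * dx, y + j * dy, deg))
    if len(placements) < n:
        raise ValueError("lattice too small for n trees")
    # Surplus trees are dropped from the end of the last row.
    return placements[:n]

# Hypothetical 2-tree motif: one upright tree and one rotated by 180 degrees.
cell = [(0.0, 0.0, 0.0), (0.35, 0.6, 180.0)]
layout = tile_cell(cell, dx=0.7, dy=1.0, nx=3, ny=2, n=11)
print(len(layout))
```

The resulting `(x, y, deg)` tuples have the same shape as the rows the scoring code reads, so a tiled layout could be scored the same way once it has been checked for overlaps.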

In [1]:
import numpy as np
import pandas as pd
from decimal import Decimal, getcontext
from shapely.geometry import Polygon
from shapely import affinity
from shapely.ops import unary_union
import matplotlib.pyplot as plt

getcontext().prec = 25
scale_factor = Decimal("1e15")

TX = np.array([0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125])
TY = np.array([0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5])

print("Setup complete")

Setup complete


In [2]:
# Load baseline and analyze per-N scores
baseline_df = pd.read_csv('/home/nonroot/snapshots/santa-2025/21116303805/code/preoptimized/santa-2025-csv/santa-2025.csv')

def parse_value(val):
    if isinstance(val, str) and val.startswith('s'):
        return val[1:]
    return str(val)

def calculate_n_score(df, n):
    """Calculate score for a specific N"""
    prefix = f"{n:03d}_"
    rows = df[df['id'].str.startswith(prefix)]
    
    all_vertices = []
    for _, row in rows.iterrows():
        x = float(parse_value(row['x']))
        y = float(parse_value(row['y']))
        deg = float(parse_value(row['deg']))
        
        rad = np.radians(deg)
        c, s = np.cos(rad), np.sin(rad)
        for tx, ty in zip(TX, TY):
            vx = tx * c - ty * s + x
            vy = tx * s + ty * c + y
            all_vertices.append([vx, vy])
    
    vertices = np.array(all_vertices)
    min_x, min_y = vertices.min(axis=0)
    max_x, max_y = vertices.max(axis=0)
    side = max(max_x - min_x, max_y - min_y)
    return side**2 / n, side

# Calculate scores for all N
n_scores = []
for n in range(1, 201):
    score, side = calculate_n_score(baseline_df, n)
    n_scores.append({'n': n, 'score': score, 'side': side, 'efficiency': n / side**2})

df_scores = pd.DataFrame(n_scores)
target = 68.919154  # leaderboard score we are trying to reach
total = df_scores['score'].sum()
print(f"Total score: {total:.6f}")
print(f"Target: {target:.6f}")
print(f"Gap: {total - target:.6f}")

Total score: 70.676102
Target: 68.919154
Gap: 1.756948
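
For reference, the total printed above is the sum of the per-N terms returned by `calculate_n_score`, where $s_n$ denotes the side of the axis-aligned bounding square around all vertices of the N = n layout (the symbol $s_n$ is introduced here for illustration):

$$
S = \sum_{n=1}^{200} \frac{s_n^2}{n}
$$

As a check, the N = 1 layout has $s_1 = 0.813173$, so its term is $0.813173^2 / 1 \approx 0.661250$.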


In [3]:
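# Analyze which N values have the most potential for improvement
# Assumption (not verified here): good packings of irregular polygons reach a density of roughly 0.85-0.95

# Tree area (rough estimate): the two lower tiers are treated as triangles although
# they are trapezoids, so this underestimates the exact polygon area of 0.245625.
# The ranking below is unaffected because the area is a constant factor.
tree_area = 0.5 * 0.7 * 0.25 + 0.5 * 0.4 * 0.25 + 0.5 * 0.25 * 0.3 + 0.15 * 0.2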
print(f"Approximate tree area: {tree_area:.4f}")

# Calculate packing efficiency for each N
df_scores['packing_efficiency'] = df_scores['n'] * tree_area / df_scores['side']**2

# Find N values with lowest efficiency (most room for improvement)
print("\nN values with LOWEST packing efficiency (most room for improvement):")
print(df_scores.nsmallest(20, 'packing_efficiency')[['n', 'score', 'side', 'packing_efficiency']])

Approximate tree area: 0.2050

N values with LOWEST packing efficiency (most room for improvement):
     n     score      side  packing_efficiency
0    1  0.661250  0.813173            0.310019
1    2  0.450779  0.949504            0.454768
2    3  0.434745  1.142031            0.471541
4    5  0.416850  1.443692            0.491784
3    4  0.416545  1.290806            0.492144
6    7  0.399897  1.673104            0.512633
5    6  0.399610  1.548438            0.513000
8    9  0.387415  1.867280            0.529148
7    8  0.385407  1.755921            0.531905
14  15  0.379203  2.384962            0.540608
9   10  0.376630  1.940696            0.544301
20  21  0.376451  2.811667            0.544560
19  20  0.376057  2.742469            0.545130
10  11  0.375736  2.033002            0.545596
21  22  0.375258  2.873270            0.546291
15  16  0.374128  2.446640            0.547941
25  26  0.373997  3.118320            0.548133
11  12  0.372724  2.114873            0.550005
12  13 
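
The tree area used in this cell is a hand estimate. As a cross-check, the exact area of the polygon defined by `TX` and `TY` can be computed with the shoelace formula. This is a sketch in plain Python; the helper name `polygon_area` is introduced here for illustration.

```python
# Tree outline, copied from the setup cell.
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075,
      -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2,
      -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]

def polygon_area(xs, ys):
    """Shoelace formula: half the absolute sum of cross products of consecutive vertices."""
    total = 0.0
    count = len(xs)
    for i in range(count):
        j = (i + 1) % count  # wrap around to close the polygon
        total += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(total) / 2.0

exact_area = polygon_area(TX, TY)
print(f"Exact tree area: {exact_area:.6f}")  # prints 0.245625
```

With the exact area, every `packing_efficiency` value would be about 20% higher than the figures above (0.245625 / 0.2050), while the ordering of N values stays the same.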

In [4]:
# Analyze score contribution by N range
ranges = [(1, 10), (11, 50), (51, 100), (101, 150), (151, 200)]

print("Score contribution by N range:")
for start, end in ranges:
    mask = (df_scores['n'] >= start) & (df_scores['n'] <= end)
    range_score = df_scores[mask]['score'].sum()
    pct = range_score / df_scores['score'].sum() * 100
    print(f"  N={start}-{end}: {range_score:.4f} ({pct:.1f}%)")

# Improvement each range would need if that range ALONE had to close the whole gap
gap = df_scores['score'].sum() - 68.919154
print(f"\nTotal gap to close: {gap:.4f}")
print("\nImprovement needed per range to close gap:")
for start, end in ranges:
    mask = (df_scores['n'] >= start) & (df_scores['n'] <= end)
    range_score = df_scores[mask]['score'].sum()
    improvement_pct = gap / range_score * 100
    print(f"  N={start}-{end}: {improvement_pct:.2f}% improvement needed")

Score contribution by N range:
  N=1-10: 4.3291 (6.1%)
  N=11-50: 14.7130 (20.8%)
  N=51-100: 17.6411 (25.0%)
  N=101-150: 17.1441 (24.3%)
  N=151-200: 16.8487 (23.8%)

Total gap to close: 1.7569

Improvement needed per range to close gap:
  N=1-10: 40.58% improvement needed
  N=11-50: 11.94% improvement needed
  N=51-100: 9.96% improvement needed
  N=101-150: 10.25% improvement needed
  N=151-200: 10.43% improvement needed


In [5]:
# Identify perfect square N values (good for tiling)
perfect_squares = [n for n in range(1, 201) if int(np.sqrt(n))**2 == n]
print("Perfect square N values (good for tiling):")
for n in perfect_squares:
    score, side = calculate_n_score(baseline_df, n)
    sqrt_n = int(np.sqrt(n))
    print(f"  N={n} ({sqrt_n}x{sqrt_n}): score={score:.6f}, side={side:.4f}")

# Also check rectangular tilings
print("\nRectangular tiling candidates (nx * ny = N):")
for n in [72, 100, 110, 144, 156, 196, 200]:
    score, side = calculate_n_score(baseline_df, n)
    # Find factors
    factors = [(i, n//i) for i in range(2, int(np.sqrt(n))+1) if n % i == 0]
    print(f"  N={n}: score={score:.6f}, factors={factors}")

Perfect square N values (good for tiling):
  N=1 (1x1): score=0.661250, side=0.8132
  N=4 (2x2): score=0.416545, side=1.2908
  N=9 (3x3): score=0.387415, side=1.8673
  N=16 (4x4): score=0.374128, side=2.4466
  N=25 (5x5): score=0.372144, side=3.0502
  N=36 (6x6): score=0.358820, side=3.5941
  N=49 (7x7): score=0.363432, side=4.2200
  N=64 (8x8): score=0.350468, side=4.7360
  N=81 (9x9): score=0.355438, side=5.3657
  N=100 (10x10): score=0.345531, side=5.8782
  N=121 (11x11): score=0.351331, side=6.5201
  N=144 (12x12): score=0.342276, side=7.0205
  N=169 (13x13): score=0.342500, side=7.6081
  N=196 (14x14): score=0.333299, side=8.0825

Rectangular tiling candidates (nx * ny = N):
  N=72: score=0.348559, factors=[(2, 36), (3, 24), (4, 18), (6, 12), (8, 9)]
  N=100: score=0.345531, factors=[(2, 50), (4, 25), (5, 20), (10, 10)]
  N=110: score=0.337604, factors=[(2, 55), (5, 22), (10, 11)]
  N=144: score=0.342276, factors=[(2, 72), (3, 48), (4, 36), (6, 24), (8, 18), (9, 16), (12, 12)]
  N
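
For N values that are not a perfect multiple of a square grid, one way to pick the tiling dimensions is to search over column counts and keep the grid with the smallest bounding side. The sketch below assumes a motif of `k` trees with lattice spacings `w` and `h`; the function name `best_grid` and all numeric values are hypothetical, and the side it returns ignores any overhang of tree shapes beyond the lattice.

```python
import math

def best_grid(n, k, w, h):
    """Pick (nx, ny) with k * nx * ny >= n that minimises the bounding square side.

    n    : number of trees required
    k    : trees per motif
    w, h : lattice spacings of one motif (hypothetical values)
    """
    cells = math.ceil(n / k)           # motifs needed to hold n trees
    best = None
    for nx in range(1, cells + 1):
        ny = math.ceil(cells / nx)     # fewest rows that still fit all motifs
        side = max(nx * w, ny * h)     # side of the enclosing square
        if best is None or side < best[0]:
            best = (side, nx, ny)
    return best

side, nx, ny = best_grid(n=200, k=2, w=0.7, h=1.0)
print(nx, ny, side)
```

The estimated score for such a grid would be `side**2 / n`, the same formula `calculate_n_score` uses, which gives a quick way to discard motifs that cannot beat the baseline before running any collision checks.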

In [6]:
# Key insight: the gap is 1.76 points (2.49% of the current total)
# If only N > 100 improves, those scores must drop by about 5.2% to close the gap

# Let's calculate what score we need for each N range to hit target
target = 68.919154
current = df_scores['score'].sum()

print("Scenario Analysis:")
print("="*60)

# Scenario 1: Uniform improvement across all N
improvement_pct = (current - target) / current * 100
print(f"\n1. Uniform {improvement_pct:.2f}% improvement across all N")

# Scenario 2: Focus on large N (>100)
large_n_score = df_scores[df_scores['n'] > 100]['score'].sum()
needed_improvement = (current - target) / large_n_score * 100
print(f"\n2. Focus on N>100 (score={large_n_score:.4f}):")
print(f"   Need {needed_improvement:.2f}% improvement on large N only")

# Scenario 3: Focus on tiling candidates
tiling_ns = [72, 100, 110, 144, 156, 196, 200]
tiling_score = df_scores[df_scores['n'].isin(tiling_ns)]['score'].sum()
needed_improvement = (current - target) / tiling_score * 100
print(f"\n3. Focus on tiling candidates {tiling_ns}:")
print(f"   Current score: {tiling_score:.4f}")
print(f"   Need {needed_improvement:.2f}% improvement on these N values only")

Scenario Analysis:

1. Uniform 2.49% improvement across all N

2. Focus on N>100 (score=33.9928):
   Need 5.17% improvement on large N only

3. Focus on tiling candidates [72, 100, 110, 144, 156, 196, 200]:
   Current score: 2.3750
   Need 73.98% improvement on these N values only


In [7]:
# Summary of findings
print("="*60)
print("SUMMARY OF FINDINGS")
print("="*60)
print(f"""
1. Current score: {current:.6f}
2. Target score: {target:.6f}
3. Gap: {current - target:.6f} ({(current - target)/current*100:.2f}%)

4. Key insight: The baseline is at a VERY TIGHT local optimum.
   - Even small perturbations cause collisions
   - Standard optimization cannot escape
   - Need to find DIFFERENT BASINS

5. Approaches to try:
   a) TILING: Optimize 2-tree configuration, then tile for large N
      - Candidates: N=72, 100, 110, 144, 156, 196, 200
      - These contribute {tiling_score:.4f} to total score
   
   b) LONG OPTIMIZATION: Run C++ optimizer for HOURS, not minutes
      - Current runs: 5-10 minutes
      - Top solutions: 1+ hours per N
   
   c) DIFFERENT STARTING POINTS:
      - Random initialization
      - Lattice patterns
      - Different base configurations

6. The target score (68.919154) IS achievable:
   - It exists on the leaderboard
   - Gap is only 2.49%
   - Focus on finding different basins, not optimizing current one
""")
print("="*60)

SUMMARY OF FINDINGS

1. Current score: 70.676102
2. Target score: 68.919154
3. Gap: 1.756948 (2.49%)

4. Key insight: The baseline is at a VERY TIGHT local optimum.
   - Even small perturbations cause collisions
   - Standard optimization cannot escape
   - Need to find DIFFERENT BASINS

5. Approaches to try:
   a) TILING: Optimize 2-tree configuration, then tile for large N
      - Candidates: N=72, 100, 110, 144, 156, 196, 200
      - These contribute 2.3750 to total score
   
   b) LONG OPTIMIZATION: Run C++ optimizer for HOURS, not minutes
      - Current runs: 5-10 minutes
      - Top solutions: 1+ hours per N
   
   c) DIFFERENT STARTING POINTS:
      - Random initialization
      - Lattice patterns
      - Different base configurations

6. The target score (68.919154) IS achievable:
   - It exists on the leaderboard
   - Gap is only 2.49%
   - Focus on finding different basins, not optimizing current one

