# Loop 9 Strategic Analysis

## Key Question: Why are we stuck at 70.734327?

After 9 experiments, all achieving the EXACT SAME score, we need to understand:
1. Is the baseline truly at a global optimum?
2. What techniques haven't we tried?
3. What do the top kernels do differently?

In [1]:
import pandas as pd
import numpy as np
import glob
import os

# Tree shape
TX = np.array([0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125])
TY = np.array([0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5])

def get_tree_polygon(cx, cy, deg):
    rad = np.radians(deg)
    c, s = np.cos(rad), np.sin(rad)
    x = TX * c - TY * s + cx
    y = TX * s + TY * c + cy
    return x, y

def score_n(df, n):
    rows = df[df['id'].str.startswith(f'{n:03d}_')]
    if len(rows) == 0:
        return float('inf')
    
    all_x, all_y = [], []
    for _, row in rows.iterrows():
        # Values are stored as strings carrying an 's' marker; strip it before parsing
        x_val = float(str(row['x']).replace('s', ''))
        y_val = float(str(row['y']).replace('s', ''))
        deg = float(str(row['deg']).replace('s', ''))
        px, py = get_tree_polygon(x_val, y_val, deg)
        all_x.extend(px)
        all_y.extend(py)
    
    side = max(max(all_x) - min(all_x), max(all_y) - min(all_y))
    return side * side / n

print("Analysis of baseline vs overlapping configurations")

Analysis of baseline vs overlapping configurations
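
As a sanity check on the scoring rule used in `score_n` (bounding-square side squared, divided by N), the N=1 case can be worked by hand. The sketch below reuses the tree outline from the cell above; `single_tree_side` is a helper name introduced here for illustration and is not part of the original notebook.

```python
import numpy as np

# Tree outline copied from the notebook's first cell.
TX = np.array([0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125])
TY = np.array([0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5])

def single_tree_side(deg):
    """Side of the smallest axis-aligned square enclosing one tree rotated by deg."""
    rad = np.radians(deg)
    c, s = np.cos(rad), np.sin(rad)
    x = TX * c - TY * s
    y = TX * s + TY * c
    return max(x.max() - x.min(), y.max() - y.min())

side_0 = single_tree_side(0.0)
side_45 = single_tree_side(45.0)
# For N=1 the score is simply side**2 / 1.
print(f"deg=0:  side={side_0:.4f}, score={side_0**2:.4f}")
print(f"deg=45: side={side_45:.4f}, score={side_45**2:.4f}")
```

An upright tree spans 0.7 by 1.0, so its side is 1.0. At 45 degrees the side shrinks to about 0.813, which shows that rotation alone changes the N=1 term of the total.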


In [2]:
# Load baseline and overlapping CSV
baseline_df = pd.read_csv('/home/nonroot/snapshots/santa-2025/21105319338/code/datasets/santa-2025-csv/santa-2025.csv')
overlap_df = pd.read_csv('/home/code/experiments/submission_v21.csv')

# Compare scores for each N
print("Per-N Score Comparison: Baseline vs Overlapping")
print("="*70)

baseline_total = 0
overlap_total = 0
gap_by_n = []

for n in range(1, 201):
    b_score = score_n(baseline_df, n)
    o_score = score_n(overlap_df, n)
    baseline_total += b_score
    overlap_total += o_score
    gap = b_score - o_score
    gap_by_n.append((n, b_score, o_score, gap))

print(f"\nBaseline total: {baseline_total:.6f}")
print(f"Overlap total:  {overlap_total:.6f}")
print(f"Gap:            {baseline_total - overlap_total:.6f}")
print(f"Target:         68.931058")
print(f"Gap to target:  {baseline_total - 68.931058:.6f}")

Per-N Score Comparison: Baseline vs Overlapping



Baseline total: 70.734327
Overlap total:  67.727119
Gap:            3.007208
Target:         68.931058
Gap to target:  1.803269


In [3]:
# Find N values with biggest gaps
gap_by_n.sort(key=lambda x: -x[3])  # Sort by gap descending

print("\nTop 20 N values with biggest improvement potential:")
print("="*70)
print(f"{'N':>4} {'Baseline':>12} {'Overlap':>12} {'Gap':>12} {'% of Total Gap':>15}")
print("-"*70)

total_gap = baseline_total - overlap_total
for n, b, o, gap in gap_by_n[:20]:
    pct = gap / total_gap * 100 if total_gap > 0 else 0
    print(f"{n:>4} {b:>12.6f} {o:>12.6f} {gap:>12.6f} {pct:>14.1f}%")


Top 20 N values with biggest improvement potential:
   N     Baseline      Overlap          Gap  % of Total Gap
----------------------------------------------------------------------
   7     0.399897     0.157468     0.242429            8.1%
   6     0.399610     0.173625     0.225985            7.5%
  10     0.376630     0.164911     0.211719            7.0%
   9     0.387415     0.178013     0.209402            7.0%
   5     0.416850     0.212694     0.204155            6.8%
   8     0.385407     0.187564     0.197844            6.6%
   4     0.416545     0.227236     0.189308            6.3%
  12     0.372724     0.189158     0.183566            6.1%
  15     0.379203     0.210706     0.168497            5.6%
  17     0.370040     0.207258     0.162782            5.4%
   3     0.434745     0.296773     0.137973            4.6%
  11     0.375736     0.257667     0.118070            3.9%
  18     0.368771     0.273961     0.094811            3.2%
  22     0.375258     0.294317     0.080941            2.7%

In [4]:
# Analyze where the improvements come from
print("\nScore breakdown by N range:")
print("="*70)

ranges = [(1, 10), (11, 20), (21, 30), (31, 50), (51, 100), (101, 200)]

for start, end in ranges:
    b_sum = sum(x[1] for x in gap_by_n if start <= x[0] <= end)
    o_sum = sum(x[2] for x in gap_by_n if start <= x[0] <= end)
    gap_sum = b_sum - o_sum
    pct = gap_sum / total_gap * 100 if total_gap > 0 else 0
    print(f"N={start:>3}-{end:>3}: Baseline={b_sum:>8.4f}, Overlap={o_sum:>8.4f}, Gap={gap_sum:>8.4f} ({pct:>5.1f}%)")


Score breakdown by N range:
N=  1- 10: Baseline=  4.3291, Overlap=  2.7103, Gap=  1.6188 ( 53.8%)
N= 11- 20: Baseline=  3.7287, Overlap=  2.8090, Gap=  0.9197 ( 30.6%)
N= 21- 30: Baseline=  3.6895, Overlap=  3.3341, Gap=  0.3554 ( 11.8%)
N= 31- 50: Baseline=  7.2977, Overlap=  7.1873, Gap=  0.1104 (  3.7%)
N= 51-100: Baseline= 17.6641, Overlap= 17.6612, Gap=  0.0029 (  0.1%)
N=101-200: Baseline= 34.0252, Overlap= 34.0252, Gap=  0.0000 (  0.0%)


In [5]:
# Key insight: The gap is concentrated in small N values
# Let's see what the overlapping configs look like for the top N values

print("\nAnalysis of overlapping configurations for top gap N values:")
print("="*70)

from numba import njit

@njit
def pip_numba(px, py, poly_x, poly_y):
    n = len(poly_x)
    inside = False
    j = n - 1
    for i in range(n):
        if ((poly_y[i] > py) != (poly_y[j] > py)) and \
           (px < (poly_x[j] - poly_x[i]) * (py - poly_y[i]) / (poly_y[j] - poly_y[i]) + poly_x[i]):
            inside = not inside
        j = i
    return inside

@njit
def seg_intersect(ax, ay, bx, by, cx, cy, dx, dy):
    d1 = (dx-cx)*(ay-cy) - (dy-cy)*(ax-cx)
    d2 = (dx-cx)*(by-cy) - (dy-cy)*(bx-cx)
    d3 = (bx-ax)*(cy-ay) - (by-ay)*(cx-ax)
    d4 = (bx-ax)*(dy-ay) - (by-ay)*(dx-ax)
    return ((d1 > 0) != (d2 > 0)) and ((d3 > 0) != (d4 > 0))

@njit
def polygons_overlap(px1, py1, px2, py2):
    n1, n2 = len(px1), len(px2)
    if max(px1) < min(px2) or max(px2) < min(px1):
        return False
    if max(py1) < min(py2) or max(py2) < min(py1):
        return False
    for i in range(n1):
        if pip_numba(px1[i], py1[i], px2, py2):
            return True
    for i in range(n2):
        if pip_numba(px2[i], py2[i], px1, py1):
            return True
    for i in range(n1):
        ni = (i + 1) % n1
        for j in range(n2):
            nj = (j + 1) % n2
            if seg_intersect(px1[i], py1[i], px1[ni], py1[ni],
                           px2[j], py2[j], px2[nj], py2[nj]):
                return True
    return False

def count_overlaps(df, n):
    rows = df[df['id'].str.startswith(f'{n:03d}_')]
    trees = []
    for _, row in rows.iterrows():
        x = float(str(row['x']).replace('s', ''))
        y = float(str(row['y']).replace('s', ''))
        deg = float(str(row['deg']).replace('s', ''))
        trees.append((x, y, deg))
    
    overlaps = 0
    for i in range(len(trees)):
        px1, py1 = get_tree_polygon(*trees[i])
        for j in range(i+1, len(trees)):
            px2, py2 = get_tree_polygon(*trees[j])
            if polygons_overlap(px1, py1, px2, py2):
                overlaps += 1
    return overlaps

print(f"{'N':>4} {'Gap':>10} {'Overlaps':>10}")
print("-"*30)
for n, b, o, gap in gap_by_n[:15]:
    overlaps = count_overlaps(overlap_df, n)
    print(f"{n:>4} {gap:>10.6f} {overlaps:>10}")


Analysis of overlapping configurations for top gap N values:
   N        Gap   Overlaps
------------------------------


   7   0.242429         21
   6   0.225985         15
  10   0.211719         41
   9   0.209402         34
   5   0.204155         10
   8   0.197844         27
   4   0.189308          6
  12   0.183566         48
  15   0.168497         61
  17   0.162782         55
   3   0.137973          3
  11   0.118070         30
  18   0.094811         36
  22   0.080941         58
  23   0.078445         48


In [6]:
# Key finding: ALL the improvement comes from overlapping configurations
# The question is: Can we find VALID configurations that are better than baseline?

print("\n" + "="*70)
print("CRITICAL FINDING:")
print("="*70)
print("")
print("1. The baseline (70.734327) is the best VALID configuration available")
print("2. The overlapping CSV (67.727) achieves better scores by allowing overlaps")
print("3. ALL 3.0 points of improvement are locked behind overlaps")
print("")
print("The question is: Are there VALID configurations that beat the baseline?")
print("")
print("Options:")
print("1. The baseline IS the global optimum for valid configurations")
print("2. There exist better valid configurations that we haven't found")
print("")
print("Evidence for option 1:")
print("- 9 experiments with various approaches all achieved 70.734327")
print("- 100,000 random attempts per N found no improvement")
print("- C++ optimizer with 100K iterations, 32 restarts found no improvement")
print("")
print("Evidence for option 2:")
print("- The target (68.931058) IS on the leaderboard")
print("- Top teams must have found better valid configurations")
print("- We may be missing a key technique")


CRITICAL FINDING:

1. The baseline (70.734327) is the best VALID configuration available
2. The overlapping CSV (67.727) achieves better scores by allowing overlaps
3. ALL 3.0 points of improvement are locked behind overlaps

The question is: Are there VALID configurations that beat the baseline?

Options:
1. The baseline IS the global optimum for valid configurations
2. There exist better valid configurations that we haven't found

Evidence for option 1:
- 9 experiments with various approaches all achieved 70.734327
- 100,000 random attempts per N found no improvement
- C++ optimizer with 100K iterations, 32 restarts found no improvement

Evidence for option 2:
- The target (68.931058) IS on the leaderboard
- Top teams must have found better valid configurations
- We may be missing a key technique


In [7]:
# Let's check what the target score implies
target = 68.931058
baseline = 70.734327
gap_to_target = baseline - target

print(f"\nTarget analysis:")
print(f"Baseline:      {baseline:.6f}")
print(f"Target:        {target:.6f}")
print(f"Gap to target: {gap_to_target:.6f}")
print(f"")
print(f"If the gap is distributed across all N values:")
print(f"  Average improvement per N: {gap_to_target / 200:.6f}")
print(f"")
print(f"If the gap is concentrated in small N (1-32):")
print(f"  Average improvement per N: {gap_to_target / 32:.6f}")
print(f"")
print(f"The overlapping CSV achieves {overlap_total:.6f}")
print(f"This is {target - overlap_total:.6f} BELOW the target")
print(f"")
print(f"This means: the overlapping CSV beats the target by 1.2 points, but only by breaking the no-overlap rule.")
print(f"That alone does not show the target is reachable; the leaderboard entry at the target score does.")


Target analysis:
Baseline:      70.734327
Target:        68.931058
Gap to target: 1.803269

If the gap is distributed across all N values:
  Average improvement per N: 0.009016

If the gap is concentrated in small N (1-32):
  Average improvement per N: 0.056352

The overlapping CSV achieves 67.727119
This is 1.203939 BELOW the target

This means: the overlapping CSV beats the target by 1.2 points, but only by breaking the no-overlap rule.
That alone does not show the target is reachable; the leaderboard entry at the target score does.
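
To put the target in proportion, the arithmetic below compares the gap to the target with the gap to the overlapping CSV, using only the totals and the per-range gaps printed earlier. It is a small worked calculation, not a new measurement; the variable names are introduced here for illustration.

```python
baseline = 70.734327
target = 68.931058
overlap_total = 67.727119

gap_to_target = baseline - target           # what must be gained with valid configs
gap_to_overlap = baseline - overlap_total   # what the invalid CSV gains

# Share of the overlap CSV's gain that a valid solution would need to match.
share_of_overlap_gap = gap_to_target / gap_to_overlap

# Per-range gaps copied from the "Score breakdown by N range" output.
gap_n_1_10 = 1.6188
gap_n_11_20 = 0.9197
share_if_only_small_n = gap_to_target / (gap_n_1_10 + gap_n_11_20)

print(f"Share of overlap gap needed:      {share_of_overlap_gap:.1%}")
print(f"Share needed if only N<=20 moves: {share_if_only_small_n:.1%}")
```

Reaching the target requires roughly 60% of the improvement that the overlapping CSV obtains by overlapping, or roughly 71% of it if all gains have to come from N=1-20.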


In [8]:
# What techniques haven't we tried?
print("\nTechniques we've tried:")
print("="*70)
print("1. Baseline from pre-optimized CSV")
print("2. C++ tree_packer_v21 optimizer")
print("3. Python SA with collision constraints")
print("4. Structured grid packing")
print("5. Overlap repair with SA")
print("6. High-param C++ optimization (20K iter, 10 restarts)")
print("7. SA with translations (jiweiliu approach)")
print("8. Multi-generation C++ (100K iter, 32 restarts)")
print("9. Targeted N optimization with novel approaches")
print("")
print("Techniques we HAVEN'T tried:")
print("="*70)
print("1. Crystalline/lattice packing for large N (>= 58)")
print("2. Asymmetric packing strategies")
print("3. Genetic algorithm with crossover between configurations")
print("4. Beam search with backtracking")
print("5. Different starting configurations (not just baseline)")
print("6. Per-N specialized optimization with different algorithms")
print("7. Ensemble from MULTIPLE external sources (not just local)")
print("8. TPU/GPU-accelerated massive parallel search")
print("9. Mathematical optimization (LP/MIP formulations)")


Techniques we've tried:
1. Baseline from pre-optimized CSV
2. C++ tree_packer_v21 optimizer
3. Python SA with collision constraints
4. Structured grid packing
5. Overlap repair with SA
6. High-param C++ optimization (20K iter, 10 restarts)
7. SA with translations (jiweiliu approach)
8. Multi-generation C++ (100K iter, 32 restarts)
9. Targeted N optimization with novel approaches

Techniques we HAVEN'T tried:
1. Crystalline/lattice packing for large N (>= 58)
2. Asymmetric packing strategies
3. Genetic algorithm with crossover between configurations
4. Beam search with backtracking
5. Different starting configurations (not just baseline)
6. Per-N specialized optimization with different algorithms
7. Ensemble from MULTIPLE external sources (not just local)
8. TPU/GPU-accelerated massive parallel search
9. Mathematical optimization (LP/MIP formulations)
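
For the lattice idea in the untried list, the simplest possible lattice gives a reference point: upright trees on an axis-aligned grid spaced by the tree's 0.7 by 1.0 bounding box. This is a deliberately naive sketch, not the technique used by any top solution; `grid_score` and `best_grid` are hypothetical helpers, and trees placed this way touch at their branch tips, so a strict validity check may need a small extra margin.

```python
import math

# Bounding box of one upright tree, read off the TX/TY outline.
TREE_W, TREE_H = 0.7, 1.0

def grid_score(n, cols):
    """Score (side^2 / n) of n upright trees laid out row by row in `cols` columns."""
    rows = math.ceil(n / cols)
    used_cols = min(cols, n)
    side = max(used_cols * TREE_W, rows * TREE_H)
    return side * side / n

def best_grid(n):
    """Best (score, cols) over all column counts for this naive layout."""
    return min((grid_score(n, c), c) for c in range(1, n + 1))

score_200, cols_200 = best_grid(200)
print(f"N=200: best naive grid uses {cols_200} columns, score={score_200:.4f}")
```

For N=200 this layout scores 0.72, roughly double the 0.34 average per-N score the baseline shows for N=101-200. A useful lattice would therefore have to interlock rotated trees rather than tile bounding boxes.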


In [9]:
# Check if there are any external datasets we haven't explored
print("\nSearching for additional external datasets...")
print("="*70)

# List all CSV files in snapshots
all_csvs = glob.glob('/home/nonroot/snapshots/santa-2025/**/*.csv', recursive=True)
print(f"Total CSV files found: {len(all_csvs)}")

# Score each one
results = []
for fp in all_csvs:
    try:
        df = pd.read_csv(fp)
        # score_n reads all four columns; skip unrelated CSVs
        if not {'id', 'x', 'y', 'deg'}.issubset(df.columns):
            continue
        total = sum(score_n(df, n) for n in range(1, 201))
        results.append((total, fp))
    except Exception:
        # Skip unreadable or malformed files without swallowing KeyboardInterrupt
        pass

results.sort()
print(f"\nTop 10 CSV files by score:")
for score, fp in results[:10]:
    print(f"{score:.6f}  {os.path.basename(fp)}")


Searching for additional external datasets...
Total CSV files found: 261



Top 10 CSV files by score:
67.727119  submission_v21.csv
67.772662  ensemble_submission.csv
67.772662  submission.csv
70.734327  ensemble_best.csv
70.734327  gen1.csv
70.734327  submission_aggressive.csv
70.734327  submission_final.csv
70.734327  submission_frac.csv
70.734327  candidate_001.csv
70.734327  submission.csv
