# Loop 4 Analysis: Finding Novel Approaches to Beat 68.919

## Current Situation
- Best CV/LB: 70.676102 (validated, no overlaps)
- Target: 68.919154
- Gap: 1.76 points (2.49%)

## What's Been Tried (All Failed)
1. bbox3 optimizer - 80 rounds, 0 improvement
2. tree_packer_v21 - 0 improvement
3. Fractional translation - 0.000000003 improvement on N=128 only
4. sa_v1_parallel - 4 generations, 0 improvement
5. Ensemble from 731 CSV files - best score 51.42, but invalid: 168 of 200 groups contain overlaps

## Key Insight
The baseline sits in a very tight local optimum: none of the attempts above produced a meaningful, valid improvement, so local search alone is unlikely to escape it.

## Novel Approaches to Explore
1. **Rebuild from corners** (chistyakov kernel) - Extract smaller layouts from larger ones
2. **Grid-based SA with deletion cascade** (jiweiliu kernel) - Different starting point
3. **Symmetric solutions** - Discussed in forums as key to winning
4. **Analytical solutions for small N** - N=1,2,3 may have closed-form optima

In [1]:
import pandas as pd
import numpy as np
from shapely.geometry import Polygon
from shapely.ops import unary_union
import matplotlib.pyplot as plt

# Tree geometry
TX = np.array([0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125])
TY = np.array([0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5])

def parse_value(val):
    if isinstance(val, str) and val.startswith('s'):
        return val[1:]
    return str(val)

def build_polygon(x, y, angle):
    angle_rad = float(angle) * np.pi / 180.0
    cos_a = np.cos(angle_rad)
    sin_a = np.sin(angle_rad)
    vertices = [(TX[i] * cos_a - TY[i] * sin_a + float(x),
                 TX[i] * sin_a + TY[i] * cos_a + float(y)) for i in range(15)]
    return Polygon(vertices)

def get_score_for_n(df, n):
    prefix = f"{n:03d}_"
    rows = df[df['id'].str.startswith(prefix)]
    if len(rows) != n:
        return float('inf'), None
    
    all_points = []
    for _, row in rows.iterrows():
        x = float(parse_value(row['x']))
        y = float(parse_value(row['y']))
        deg = float(parse_value(row['deg']))
        poly = build_polygon(x, y, deg)
        all_points.extend(list(poly.exterior.coords))
    
    all_points = np.array(all_points)
    side = max(all_points.max(axis=0) - all_points.min(axis=0))
    return (side ** 2) / n, rows

print('Functions loaded')

Functions loaded


In [2]:
# Load baseline
baseline_df = pd.read_csv('/home/code/submission_candidates/candidate_000.csv')

# Calculate per-N scores
scores = {}
for n in range(1, 201):
    score, _ = get_score_for_n(baseline_df, n)
    scores[n] = score

total_score = sum(scores.values())
print(f'Total baseline score: {total_score:.6f}')
print(f'Target: 68.919154')
print(f'Gap: {total_score - 68.919154:.6f}')

# Find N values with the highest per-N score (largest contributors, not necessarily the most room for improvement)
scores_sorted = sorted(scores.items(), key=lambda x: -x[1])
print('\nTop 20 N values by score contribution:')
for n, s in scores_sorted[:20]:
    print(f'  N={n:3d}: {s:.6f}')

Total baseline score: 70.676102
Target: 68.919154
Gap: 1.756948

Top 20 N values by score contribution:
  N=  1: 0.661250
  N=  2: 0.450779
  N=  3: 0.434745
  N=  5: 0.416850
  N=  4: 0.416545
  N=  7: 0.399897
  N=  6: 0.399610
  N=  9: 0.387415
  N=  8: 0.385407
  N= 15: 0.379203
  N= 10: 0.376630
  N= 21: 0.376451
  N= 20: 0.376057
  N= 11: 0.375736
  N= 22: 0.375258
  N= 16: 0.374128
  N= 26: 0.373997
  N= 12: 0.372724
  N= 13: 0.372323
  N= 25: 0.372144


In [3]:
# Estimate packing efficiency for each N
# Efficiency here = theoretical_side / side, where
#   side             = sqrt(score * n)      (actual bounding-square side)
#   theoretical_side = sqrt(n * tree_area)  (side if the trees filled the square with no gaps)
# NOTE: tree_area = 0.5 is a rough guess that overestimates the outline's true area
# (about 0.2456 by the shoelace formula), so the ratios printed below can exceed 1.
# Treat them as a relative indicator across N, not as an absolute packing density.

efficiencies = {}
for n, score in scores.items():
    # score = side^2 / n, so side = sqrt(score * n)
    side = np.sqrt(score * n)
    # Side of a gap-free square holding n trees of area tree_area
    tree_area = 0.5  # rough guess, see NOTE above
    theoretical_side = np.sqrt(n * tree_area)
    efficiency = theoretical_side / side if side > 0 else 0
    efficiencies[n] = efficiency

print('Efficiency by N (higher = better packed):')
for n in [1, 2, 3, 4, 5, 10, 20, 50, 100, 150, 200]:
    print(f'  N={n:3d}: efficiency={efficiencies[n]:.4f}, score={scores[n]:.6f}')

Efficiency by N (higher = better packed):
  N=  1: efficiency=0.8696, score=0.661250
  N=  2: efficiency=1.0532, score=0.450779
  N=  3: efficiency=1.0724, score=0.434745
  N=  4: efficiency=1.0956, score=0.416545
  N=  5: efficiency=1.0952, score=0.416850
  N= 10: efficiency=1.1522, score=0.376630
  N= 20: efficiency=1.1531, score=0.376057
  N= 50: efficiency=1.1773, score=0.360753
  N=100: efficiency=1.2029, score=0.345531
  N=150: efficiency=1.2179, score=0.337065
  N=200: efficiency=1.2167, score=0.337731
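
The efficiency values above exceed 1 because `tree_area = 0.5` is only a rough guess. As a sanity check, the sketch below computes the area of the tree outline with the shoelace formula and converts a few of the per-N scores printed earlier into a packing density `tree_area / score` (the same as `n * tree_area / side**2`), which cannot exceed 1 for a valid layout. The helper names `shoelace_area` and `packing_density` are illustrative and not part of the original notebook.

```python
# Tree outline, copied from TX/TY in the first cell.
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075,
      -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2,
      -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]


def shoelace_area(xs, ys):
    """Area of a simple polygon from its vertex lists (shoelace formula)."""
    n = len(xs)
    twice_area = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i]
                     for i in range(n))
    return abs(twice_area) / 2.0


def packing_density(score, tree_area):
    """Fraction of the bounding square covered by trees.

    score = side**2 / n, so n * tree_area / side**2 == tree_area / score.
    """
    return tree_area / score


TREE_AREA = shoelace_area(TX, TY)
print(f"tree area: {TREE_AREA:.6f}")

# Per-N scores taken from the baseline output earlier in the notebook.
for n, score in [(1, 0.661250), (10, 0.376630), (100, 0.345531), (200, 0.337731)]:
    print(f"N={n:3d}: density={packing_density(score, TREE_AREA):.4f}")
```

With the outline area of 0.245625, the baseline density comes out at roughly 0.37 for N=1 and 0.73 for N=200, so the upward trend with N matches the efficiency table while staying below 1.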


In [4]:
# Analyze what improvement is needed per N to reach target
target = 68.919154
current = total_score
gap = current - target

print(f'Total gap to close: {gap:.6f}')
print(f'Average gap per N: {gap/200:.6f}')

# If we could improve each N by the same percentage
required_reduction = gap / current
print(f'Required reduction: {required_reduction*100:.2f}%')

# What if we focus on small N values?
small_n_score = sum(scores[n] for n in range(1, 21))
print(f'\nScore from N=1-20: {small_n_score:.6f} ({small_n_score/current*100:.1f}% of total)')
print(f'If we improve N=1-20 by 50%: saves {small_n_score*0.5:.6f}')

medium_n_score = sum(scores[n] for n in range(21, 101))
print(f'\nScore from N=21-100: {medium_n_score:.6f} ({medium_n_score/current*100:.1f}% of total)')

large_n_score = sum(scores[n] for n in range(101, 201))
print(f'Score from N=101-200: {large_n_score:.6f} ({large_n_score/current*100:.1f}% of total)')

Total gap to close: 1.756948
Average gap per N: 0.008785
Required reduction: 2.49%

Score from N=1-20: 8.057295 (11.4% of total)
If we improve N=1-20 by 50%: saves 4.028647

Score from N=21-100: 28.626026 (40.5% of total)
Score from N=101-200: 33.992781 (48.1% of total)
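
To put the band totals in perspective, the sketch below asks how large a relative reduction each band would need if it had to close the whole gap on its own. The figures are copied from the output above; `required_reduction` is a hypothetical helper name.

```python
# Gap and per-band totals copied from the output above.
GAP = 1.756948
BAND_SCORES = {
    "N=1-20": 8.057295,
    "N=21-100": 28.626026,
    "N=101-200": 33.992781,
}


def required_reduction(gap, band_score):
    """Relative reduction a band needs to close the whole gap by itself."""
    return gap / band_score


for name, band_score in BAND_SCORES.items():
    pct = required_reduction(GAP, band_score) * 100
    print(f"{name}: {pct:.1f}% reduction needed")
```

This gives about 21.8% for N=1-20, 6.1% for N=21-100, and 5.2% for N=101-200, which suggests that improvements on small N alone are unlikely to be enough.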


In [5]:
# Try the "rebuild from corners" approach from chistyakov kernel
# For each large N, extract smaller layouts by taking trees closest to each corner

def get_trees_for_n(df, n):
    """Get tree data for group N"""
    prefix = f"{n:03d}_"
    rows = df[df['id'].str.startswith(prefix)]
    trees = []
    for _, row in rows.iterrows():
        x = float(parse_value(row['x']))
        y = float(parse_value(row['y']))
        deg = float(parse_value(row['deg']))
        trees.append({'x': x, 'y': y, 'deg': deg, 'polygon': build_polygon(x, y, deg)})
    return trees

def get_side_length(trees):
    """Get bounding box side length for a list of trees"""
    all_points = []
    for t in trees:
        all_points.extend(list(t['polygon'].exterior.coords))
    all_points = np.array(all_points)
    return max(all_points.max(axis=0) - all_points.min(axis=0))

def rebuild_from_corners(large_n_trees, target_n):
    """Extract target_n trees from large_n_trees by taking trees closest to each corner"""
    # Get bounding box
    all_points = []
    for t in large_n_trees:
        all_points.extend(list(t['polygon'].exterior.coords))
    all_points = np.array(all_points)
    bounds = (all_points.min(axis=0)[0], all_points.min(axis=0)[1],
              all_points.max(axis=0)[0], all_points.max(axis=0)[1])
    
    corners = [
        (bounds[0], bounds[1]),  # bottom-left
        (bounds[0], bounds[3]),  # top-left
        (bounds[2], bounds[1]),  # bottom-right
        (bounds[2], bounds[3]),  # top-right
    ]
    
    best_side = float('inf')
    best_trees = None
    
    for corner_x, corner_y in corners:
        # Calculate max distance to corner for each tree
        distances = []
        for i, t in enumerate(large_n_trees):
            b = t['polygon'].bounds
            dist = max(
                abs(b[0] - corner_x), abs(b[2] - corner_x),
                abs(b[1] - corner_y), abs(b[3] - corner_y)
            )
            distances.append((dist, i))
        
        # Sort by distance and take closest target_n trees
        distances.sort()
        selected_indices = [idx for _, idx in distances[:target_n]]
        selected_trees = [large_n_trees[i] for i in selected_indices]
        
        side = get_side_length(selected_trees)
        if side < best_side:
            best_side = side
            best_trees = selected_trees
    
    return best_trees, best_side

print('Rebuild from corners function defined')

Rebuild from corners function defined


In [6]:
# Test rebuild from corners on a few large N values
print('Testing rebuild from corners approach...')
print('='*60)

improvements = []

for large_n in [50, 100, 150, 200]:
    large_trees = get_trees_for_n(baseline_df, large_n)
    
    for target_n in range(1, min(large_n, 50)):
        current_score = scores[target_n]
        
        rebuilt_trees, rebuilt_side = rebuild_from_corners(large_trees, target_n)
        rebuilt_score = (rebuilt_side ** 2) / target_n
        
        if rebuilt_score < current_score - 1e-9:
            improvement = current_score - rebuilt_score
            improvements.append((target_n, large_n, improvement, current_score, rebuilt_score))
            print(f'N={target_n} from N={large_n}: {current_score:.6f} -> {rebuilt_score:.6f} (improvement: {improvement:.6f})')

if not improvements:
    print('No improvements found with rebuild from corners')
else:
    total_improvement = sum(imp[2] for imp in improvements)
    print(f'\nTotal potential improvement: {total_improvement:.6f}')

Testing rebuild from corners approach...
============================================================
No improvements found with rebuild from corners


In [7]:
# Analyze the structure of optimal solutions for small N
# N=1: Single tree - optimal is 45-degree rotation
# N=2: Two trees - what's the optimal arrangement?

print('Analyzing N=1 configuration:')
trees_n1 = get_trees_for_n(baseline_df, 1)
print(f'  Position: ({trees_n1[0]["x"]:.6f}, {trees_n1[0]["y"]:.6f})')
print(f'  Angle: {trees_n1[0]["deg"]:.6f} degrees')
print(f'  Score: {scores[1]:.6f}')

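# For N=1, the baseline uses a 45-degree rotation
# NOTE: sqrt(0.7^2 + 1.0^2) = 1.22 is the diagonal of the tree's 0.7 x 1.0 bounding rectangle,
# not the bounding-square side at 45 degrees (sqrt(0.661250) ~ 0.813 for the baseline).
# It is only a loose upper bound on the side, so the "gap" printed below comes out negative.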
optimal_n1_side = np.sqrt(0.7**2 + 1.0**2)
optimal_n1_score = optimal_n1_side**2 / 1
print(f'  Theoretical optimal score: {optimal_n1_score:.6f}')
print(f'  Gap: {scores[1] - optimal_n1_score:.6f}')

print('\nAnalyzing N=2 configuration:')
trees_n2 = get_trees_for_n(baseline_df, 2)
for i, t in enumerate(trees_n2):
    print(f'  Tree {i}: ({t["x"]:.6f}, {t["y"]:.6f}), angle={t["deg"]:.6f}')
print(f'  Score: {scores[2]:.6f}')

Analyzing N=1 configuration:
  Position: (-48.196086, 58.770985)
  Angle: 45.000000 degrees
  Score: 0.661250
  Theoretical optimal score: 1.490000
  Gap: -0.828750

Analyzing N=2 configuration:
  Tree 0: (0.154097, -0.038541), angle=203.629378
  Tree 1: (-0.154097, -0.561459), angle=23.629378
  Score: 0.450779
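
The N=1 case can be checked directly rather than through the rectangle diagonal. The sketch below rotates the tree outline through a sweep of angles and records the bounding-square side at each one. The sweep resolution of 0.01 degrees and the helper name `bounding_square_side` are assumptions for illustration. Only 0 to 90 degrees is swept because a quarter turn swaps width and height, so the side repeats every 90 degrees.

```python
import math

# Tree outline, copied from TX/TY in the first cell.
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075,
      -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2,
      -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]


def bounding_square_side(angle_deg):
    """Side of the axis-aligned bounding square of one tree rotated by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    xs = [x * cos_a - y * sin_a for x, y in zip(TX, TY)]
    ys = [x * sin_a + y * cos_a for x, y in zip(TX, TY)]
    return max(max(xs) - min(xs), max(ys) - min(ys))


# Sweep [0, 90) degrees in 0.01-degree steps (assumed resolution).
angles = [i / 100 for i in range(9000)]
best_angle = min(angles, key=bounding_square_side)
best_score = bounding_square_side(best_angle) ** 2  # N=1, so score = side^2

print(f"best angle: {best_angle:.2f} degrees")
print(f"best score: {best_score:.6f}")
```

For this outline the sweep lands on 45 degrees with a score of 0.661250, which matches the baseline. At 45 degrees the width and height are both (0.8 + 0.35) / sqrt(2), so the N=1 entry is already at the optimum of a single-tree rotation sweep.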


In [8]:
# Check what's in the snapshots - are there any solutions with different approaches?
import glob
import os

snapshot_dirs = glob.glob('/home/nonroot/snapshots/santa-2025/*/')
print(f'Found {len(snapshot_dirs)} snapshot directories')

# Look for any solutions with significantly different scores
best_per_n = {n: scores[n] for n in range(1, 201)}

for snap_dir in snapshot_dirs[:10]:  # Check first 10
    csv_files = glob.glob(os.path.join(snap_dir, 'code/**/*.csv'), recursive=True)
    for csv_path in csv_files:
        try:
            df = pd.read_csv(csv_path)
            if 'id' not in df.columns or len(df) != 20100:
                continue
            
            # Check a few N values
            for n in [1, 2, 3, 5, 10]:
                score, _ = get_score_for_n(df, n)
                if score < best_per_n[n] - 0.001:
                    print(f'{csv_path}: N={n} score={score:.6f} (baseline={best_per_n[n]:.6f})')
        except Exception:  # skip unreadable or malformed CSVs, but let Ctrl-C through
            continue

print('\nSnapshot analysis complete')

Found 52 snapshot directories



Snapshot analysis complete


In [9]:
# Summary of findings
print('='*60)
print('SUMMARY OF ANALYSIS')
print('='*60)
print(f'\nCurrent score: {total_score:.6f}')
print(f'Target: 68.919154')
print(f'Gap: {gap:.6f} ({gap/current*100:.2f}%)')

print('\nKey findings:')
print('1. Baseline is at extremely tight local optimum')
print('2. Standard optimizers (bbox3, SA, fractional translation) cannot improve it')
print('3. Rebuild from corners found no improvements (targets N=1-49, sources N=50/100/150/200)')
print('4. Small N values (1-20) have the highest per-N scores but contribute only 11.4% of the total')
print('5. Need fundamentally different approach to close 1.76 point gap')

print('\nRecommended next steps:')
print('1. Implement grid-based SA with deletion cascade (jiweiliu kernel)')
print('2. Try symmetric solution search for small N')
print('3. Explore analytical solutions for N=1,2,3')
print('4. Run very long optimization from random starting points')

============================================================
SUMMARY OF ANALYSIS
============================================================

Current score: 70.676102
Target: 68.919154
Gap: 1.756948 (2.49%)

Key findings:
1. Baseline is at extremely tight local optimum
2. Standard optimizers (bbox3, SA, fractional translation) cannot improve it
3. Rebuild from corners found no improvements (targets N=1-49, sources N=50/100/150/200)
4. Small N values (1-20) have the highest per-N scores but contribute only 11.4% of the total
5. Need fundamentally different approach to close 1.76 point gap

Recommended next steps:
1. Implement grid-based SA with deletion cascade (jiweiliu kernel)
2. Try symmetric solution search for small N
3. Explore analytical solutions for N=1,2,3
4. Run very long optimization from random starting points
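
As a starting point for the symmetric search, the sketch below checks whether the N=2 baseline printed earlier is point-symmetric, meaning one tree is the other rotated 180 degrees about the midpoint of the two positions. It also shows a hypothetical `symmetric_pair` parametrization that builds both trees from a single (x, y, deg) triple, which halves the number of free parameters. The coordinates are the 6-decimal values from the N=2 output, and all helper names are illustrative.

```python
import math

# Tree outline, copied from TX/TY in the first cell.
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075,
      -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2,
      -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]


def tree_vertices(x, y, deg):
    """Vertices of one tree rotated by deg degrees and translated to (x, y)."""
    a = math.radians(deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [(tx * cos_a - ty * sin_a + x, tx * sin_a + ty * cos_a + y)
            for tx, ty in zip(TX, TY)]


def symmetric_pair(x, y, deg, center=(0.0, 0.0)):
    """Point-symmetric pair built from one tree: 3 free parameters instead of 6."""
    cx, cy = center
    return [(x, y, deg), (2 * cx - x, 2 * cy - y, (deg + 180.0) % 360.0)]


def layout_score(trees):
    """side^2 / n for the bounding square of all trees (overlaps are not checked)."""
    pts = [p for t in trees for p in tree_vertices(*t)]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    side = max(max(xs) - min(xs), max(ys) - min(ys))
    return side ** 2 / len(trees)


# N=2 baseline as printed earlier (rounded to 6 decimals).
t0 = (0.154097, -0.038541, 203.629378)
t1 = (-0.154097, -0.561459, 23.629378)
mid = ((t0[0] + t1[0]) / 2, (t0[1] + t1[1]) / 2)

# Reflect tree 1 through the midpoint and compare it vertex by vertex with tree 0.
reflected = [(2 * mid[0] - vx, 2 * mid[1] - vy) for vx, vy in tree_vertices(*t1)]
max_dev = max(math.hypot(ax - bx, ay - by)
              for (ax, ay), (bx, by) in zip(tree_vertices(*t0), reflected))

print(f"midpoint: ({mid[0]:.6f}, {mid[1]:.6f})")
print(f"max vertex deviation after reflection: {max_dev:.2e}")
print(f"score of symmetric pair: {layout_score(symmetric_pair(*t1, center=mid)):.6f}")
```

The deviation is at floating-point noise level, so the N=2 baseline is already a point-symmetric layout. A symmetric search over (x, y, deg) would still need a separate overlap check, which this sketch leaves out.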
