# Loop 4 Analysis: Breaking the Local Optimum

## Key Insight from Evaluator
The evaluator correctly identified that we're STUCK at a local optimum:
- 4 experiments have produced only 0.000430 total improvement
- Gap to target is 1.78 points (2.59%)
- Local optimization from the current solution CANNOT realistically close this gap

## What We Need to Do Differently
1. STOP running local optimizers (SA, bbox3, eazy)
2. Find FUNDAMENTALLY DIFFERENT configurations
3. Research what approaches haven't been tried

In [1]:
import pandas as pd
import numpy as np
from decimal import Decimal, getcontext
from shapely.geometry import Polygon
from shapely import affinity
import json
import os

getcontext().prec = 25

# Tree shape
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]

print('Analysis notebook initialized')

Analysis notebook initialized


## Experiment History Analysis

Let's look at what we've tried and why it's not working:

In [2]:
# Load session state
with open('/home/code/session_state.json', 'r') as f:
    state = json.load(f)

print('Experiment History:')
print('=' * 60)
for exp in state['experiments']:
    print(f"\n{exp['id']}: {exp['name']}")
    print(f"  Score: {exp['score']:.6f}")
    print(f"  Notes: {exp['notes'][:200]}...")

print('\n' + '=' * 60)
print(f"\nTotal experiments: {len(state['experiments'])}")
print(f"Best score: {min(e['score'] for e in state['experiments']):.6f}")
print(f"Target: 68.891380")
print(f"Gap: {min(e['score'] for e in state['experiments']) - 68.891380:.6f}")

Experiment History:

exp_000: 001_baseline
  Score: 70.676102
  Notes: Baseline established using pre-optimized santa-2025.csv from snapshots. Score: 70.676102, Target: 68.894234, Gap: 1.78 points (2.59%). No overlaps detected in validation. Score breakdown: N=1-10 contr...

exp_001: 002_tessellation_attempts
  Score: 70.676102
  Notes: Attempted multiple approaches to improve baseline: 1) Tessellation approach for large N - produced WORSE results (0.75 vs 0.35 for N=100) because simple grid spacing is not optimal. 2) Ensemble from 3...

exp_002: 003_eazy_optimizer
  Score: 70.676059
  Notes: Ran jazivxt eazy optimizer with Square Calculus Pressure, Elastic Pulse, and Complex Orbital Move. Used 3 scale phases (1e-3, 1e-5, 1e-7) with 5 seconds per N value. Found SMALL but VALID improvement:...

exp_003: 004_eazy_long_run
  Score: 70.675672
  Notes: Ran eazy optimizer with LONGER time per N (30s then 60s). Multiple phases with scales 1e-3, 1e-5, 1e-7. Total improvement from baseline: 0.0

## Key Observation: All Experiments Converge to Same Score

The experiments show:
- exp_000: 70.676102 (baseline)
- exp_001: 70.676102 (tessellation attempts - NO improvement)
- exp_002: 70.676059 (eazy optimizer - 0.000043 improvement)
- exp_003: 70.675672 (eazy longer - 0.000430 total improvement)

**This is the LOCAL OPTIMUM TRAP!**

All approaches converge to ~70.676 because they're all optimizing the SAME configurations.

In [3]:
# Calculate improvement rate
baseline = 70.676102
best = 70.675672
improvement = baseline - best
gap = best - 68.891380

print(f'Total improvement so far: {improvement:.6f}')
print(f'Gap to target: {gap:.6f}')
print(f'Improvement as % of gap: {improvement/gap*100:.4f}%')
print(f'\nAt this rate, we would need {gap/improvement:.0f}x more optimization')
print(f"That's {gap/improvement * 1.5:.0f} hours of optimization (at 1.5 hours per 0.000430 improvement)")
print('\n>>> THIS IS NOT VIABLE! Need fundamentally different approach.')

Total improvement so far: 0.000430
Gap to target: 1.784292
Improvement as % of gap: 0.0241%

At this rate, we would need 4150x more optimization
That's 6224 hours of optimization (at 1.5 hours per 0.000430 improvement)

>>> THIS IS NOT VIABLE! Need fundamentally different approach.


## What the Target Score Tells Us

The target of 68.891380 IS achievable (it's on the leaderboard). This means:
1. Better configurations EXIST
2. They can't be found by local search from current starting point
3. We need to find them through DIFFERENT means
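
To put the target in context, a crude floor on the score can be derived from the tree's area alone. Each N contributes `side**2 / N`, and N non-overlapping trees inside the square need `side**2 >= N * tree_area`, so every one of the 200 terms is at least `tree_area`. The sketch below computes that floor with the shoelace formula; it ignores the tree's shape entirely, so it is a loose bound and not an estimate of what is reachable.

```python
# Tree outline copied from the first cell of this notebook.
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]


def polygon_area(xs, ys):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(xs)
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to close the polygon
        total += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(total) / 2.0


tree_area = polygon_area(TX, TY)

# Every N contributes at least tree_area, and N runs from 1 to 200.
lower_bound = 200 * tree_area

print(f"Tree area: {tree_area:.6f}")
print(f"Area-only lower bound on total score: {lower_bound:.3f}")
```

Both the current best (70.675672) and the target (68.891380) sit well above this floor, which is consistent with better configurations existing but says nothing about how to find them.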

## Approaches That Could Work:

### 1. Random Restarts with Different Initial Configurations
- Generate completely random initial placements
- Run SA from each
- Keep the best result
- This explores different basins of attraction
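
The restart loop above can be sketched as follows. This is a minimal illustration on a hypothetical 1-D toy objective (`toy_score`), not the packing score; `anneal`, `multi_start`, and `sample_start` are names invented for this sketch, and the cooling schedule and step size are arbitrary assumptions.

```python
import math
import random


def anneal(score, x, rng, steps=2000, t0=1.0, t1=1e-3, step_size=0.3):
    """Basic simulated annealing on a scalar x with geometric cooling."""
    cur_x, cur_s = x, score(x)
    best_x, best_s = cur_x, cur_s
    for k in range(steps):
        t = t0 * (t1 / t0) ** (k / (steps - 1))  # temperature goes t0 -> t1
        cand = cur_x + rng.uniform(-step_size, step_size)
        s = score(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability
        if s < cur_s or rng.random() < math.exp((cur_s - s) / t):
            cur_x, cur_s = cand, s
            if s < best_s:
                best_x, best_s = cand, s
    return best_x, best_s


def multi_start(score, sample_start, n_starts, seed=0):
    """Run SA from n_starts independent random starts and keep the best result."""
    rng = random.Random(seed)
    results = [anneal(score, sample_start(rng), rng) for _ in range(n_starts)]
    return min(results, key=lambda r: r[1])


def toy_score(x):
    # Hypothetical objective with many local minima (stand-in for the real score)
    return 0.05 * x * x + math.sin(3.0 * x)


def sample_start(rng):
    # Completely random initial point, analogous to a random initial placement
    return rng.uniform(-10.0, 10.0)


single = multi_start(toy_score, sample_start, n_starts=1)
multi = multi_start(toy_score, sample_start, n_starts=20)
print(f"1 start:   x={single[0]:.3f}, score={single[1]:.4f}")
print(f"20 starts: x={multi[0]:.3f}, score={multi[1]:.4f}")
```

With the same seed, the first of the 20 starts is identical to the single-start run, so the multi-start result can never be worse. For the real problem the start would be a full (x, y, deg) layout and each start would cost a full SA run.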

### 2. Genetic Algorithm with Population Diversity
- Maintain population of diverse solutions
- Crossover: swap partial configurations between solutions
- Mutation: random perturbations
- Selection: keep best + diverse solutions
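
A minimal sketch of that loop, under several assumptions: a solution is flattened into a list of floats, `fitness` is a hypothetical function to minimize (here a toy sum of squares, not the packing score), and "keep best + diverse" is interpreted as elitism plus a few freshly generated random individuals per generation. A real fitness for this problem would also have to penalize or repair overlaps created by crossover.

```python
import random


def crossover(a, b, rng):
    """One-point crossover: take a prefix from a and the rest from b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(genome, rng, rate=0.1, sigma=0.05):
    """Perturb each gene with probability `rate` by Gaussian noise."""
    return [g + rng.gauss(0, sigma) if rng.random() < rate else g for g in genome]


def evolve(fitness, random_genome, pop_size=30, generations=50,
           n_elite=2, n_immigrants=3, seed=0):
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]                # truncation selection
        children = [list(p) for p in pop[:n_elite]]   # elitism: best survive unchanged
        children += [random_genome(rng) for _ in range(n_immigrants)]  # inject diversity
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = children
    return min(pop, key=fitness), history


def toy_fitness(genome):
    # Hypothetical stand-in objective: smaller is better
    return sum(g * g for g in genome)


def random_genome(rng):
    return [rng.uniform(-1.0, 1.0) for _ in range(6)]


best, history = evolve(toy_fitness, random_genome)
print(f"first generation best: {history[0]:.6f}")
print(f"final best:            {toy_fitness(best):.6f}")
```

Because the elite individuals are copied unchanged, the best fitness cannot get worse from one generation to the next.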

### 3. Constructive Heuristics
- Build solutions tree-by-tree from scratch
- Bottom-left placement with rotation search
- Greedy with backtracking
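
A rough sketch of bottom-left placement with a small rotation search. To stay short it makes a strong simplifying assumption: each rotated tree is represented by its axis-aligned bounding box, so the overlap test is conservative (it never allows a real overlap, but it wastes space the true polygon test would use). The strip width, grid step, and angle set are arbitrary, the function names are hypothetical, and there is no backtracking.

```python
import math

# Tree outline copied from the first cell of this notebook.
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]


def rotated_bbox(deg):
    """Bounding box (minx, miny, maxx, maxy) of the tree rotated by deg about its origin."""
    rad = math.radians(deg)
    c, s = math.cos(rad), math.sin(rad)
    xs = [x * c - y * s for x, y in zip(TX, TY)]
    ys = [x * s + y * c for x, y in zip(TX, TY)]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a, b):
    """True if two boxes share interior area (touching edges are allowed)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def bottom_left_position(box, placed, width, step):
    """First free grid slot, scanning left to right within a row, rows bottom to top."""
    bw, bh = box[2] - box[0], box[3] - box[1]
    if bw > width:
        raise ValueError("strip is narrower than the tree")
    y = 0.0
    while True:
        x = 0.0
        while x + bw <= width + 1e-9:
            cand = (x, y, x + bw, y + bh)
            if not any(boxes_overlap(cand, p) for p in placed):
                return cand
            x += step
        y += step


def greedy_pack(n, angles=(0, 90, 180, 270), width=3.0, step=0.1):
    placed, layout = [], []
    for _ in range(n):
        best = None
        for deg in angles:  # rotation search over a small angle set
            box = rotated_bbox(deg)
            cand = bottom_left_position(box, placed, width, step)
            key = (cand[3], cand[0])  # prefer the lowest top edge, then leftmost
            if best is None or key < best[0]:
                best = (key, cand, deg, box)
        _, cand, deg, box = best
        placed.append(cand)
        # Convert the box corner back to the tree's reference point
        layout.append((cand[0] - box[0], cand[1] - box[1], deg))
    side = max(max(b[2] for b in placed) - min(b[0] for b in placed),
               max(b[3] for b in placed) - min(b[1] for b in placed))
    return layout, placed, side


layout, boxes, side = greedy_pack(8)
print(f"N=8 greedy side: {side:.3f}, contribution side^2/N: {side * side / 8:.4f}")
```

The value of such a layout is not its score, which will be poor, but that it starts a later optimizer in a different basin from the baseline.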

### 4. External Data Sources
- The jonathanchan kernel uses 15+ external sources
- These may contain better configurations we haven't found
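
One way to use several sources, sketched below, is a per-N merge: because the total score is a sum of independent per-N terms, taking the best source for each N can never score worse than the best single source. This is an illustration only and not a description of what the jonathanchan kernel does; `best_per_n` is a hypothetical helper and the numbers in `demo` are made up.

```python
def best_per_n(per_source_scores):
    """Map {source: {n: score_n}} to {n: (best_source, best_score_n)}."""
    best = {}
    for source, scores in per_source_scores.items():
        for n, s in scores.items():
            if n not in best or s < best[n][1]:
                best[n] = (source, s)
    return best


# Made-up per-N scores, purely for illustration
demo = {
    "source_a": {1: 0.66, 2: 0.45, 3: 0.44},
    "source_b": {1: 0.67, 2: 0.44, 3: 0.43},
}

picked = best_per_n(demo)
merged_total = sum(score for _, score in picked.values())
source_totals = {name: sum(scores.values()) for name, scores in demo.items()}

for n in sorted(picked):
    print(f"N={n}: take {picked[n][0]} ({picked[n][1]:.2f})")
print(f"merged total: {merged_total:.2f}, single-source totals: {source_totals}")
```

Each selected configuration still needs its own overlap validation before it goes into a submission.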

In [4]:
# Check what external data sources are available (os is already imported in the first cell)

print('Checking for external CSV sources...')

# Check snapshots
snapshot_dir = '/home/nonroot/snapshots/santa-2025'
if os.path.exists(snapshot_dir):
    snapshots = os.listdir(snapshot_dir)
    print(f'\nFound {len(snapshots)} snapshot directories')
    
# Check for any pre-optimized CSVs
preopt_dir = '/home/nonroot/snapshots/santa-2025/21116303805/code/preoptimized'
if os.path.exists(preopt_dir):
    print(f'\nPreoptimized directory contents:')
    for item in os.listdir(preopt_dir):
        item_path = os.path.join(preopt_dir, item)
        if os.path.isdir(item_path):
            files = os.listdir(item_path)
            print(f'  {item}/: {len(files)} files')
        else:
            print(f'  {item}')

Checking for external CSV sources...

Found 105 snapshot directories

Preoptimized directory contents:
  bbox3
  ensemble.csv
  telegram/: 4 files
  santa25-public/: 16 files
  santa-2025-csv/: 2 files
  bucket-of-chump/: 3 files
  submission.csv
  blended/: 1 files
  santa-2025-try3/: 6 files
  santa-2025.csv
  santa-2025-try3.zip
  submission visualization.pdf
  bucket-of-chump.zip
  santa25-public.zip
  best_ensemble.csv
  chistyakov/: 2 files
  santa-2025-submission-blended-i.zip
  santa-2025-csv.zip


## Critical Insight: The Evaluator's Recommendations

The evaluator suggested:
1. **Run sa_v1_parallel for HOURS** - but this is still local optimization
2. **Implement proper tessellation** - we tried this, it was WORSE
3. **Random restarts** - THIS IS THE KEY!
4. **Backward propagation** - we tried this, no improvement

## The Real Problem

The baseline appears to be at or near the best achievable score for its configuration TYPE: four experiments of local search moved it by only 0.000430.
To beat it, we need a DIFFERENT configuration type.

## What Makes a "Different" Configuration?

1. **Different rotation angles** - not just 63°/243° (blue/pink)
2. **Different packing patterns** - not just the standard tessellation
3. **Asymmetric layouts** - discussions suggest asymmetric beats symmetric

## Next Steps

1. **Analyze what angles/patterns the target score uses**
2. **Try completely random initial configurations**
3. **Implement genetic algorithm with crossover**
4. **Search for external datasets with better solutions**

In [5]:
# Let's analyze the current best solution to understand its structure
df = pd.read_csv('/home/code/experiments/003_long_sa/submission_eazy_longer.csv')

print('Current best solution structure:')
print(f'Total rows: {len(df)}')

# Parse angles
df['n'] = df['id'].apply(lambda x: int(x.split('_')[0]))
df['deg_val'] = df['deg'].apply(lambda x: float(str(x).replace('s', '')))

# Analyze angle distribution for large N
for n in [100, 144, 196, 200]:
    n_data = df[df['n'] == n]
    angles = n_data['deg_val'].values
    print(f'\nN={n}: {len(angles)} trees')
    print(f'  Angle range: [{angles.min():.1f}, {angles.max():.1f}]')
    
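    # Count angles in 8 equal-width bins over the raw range. Angles are not
    # wrapped to [0, 360), so a single outlier such as 786.1 stretches the bins.
    bins = np.histogram(angles, bins=8)[0]
    print(f'  Angle distribution: {bins}')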

Current best solution structure:
Total rows: 20100

N=100: 100 trees
  Angle range: [66.3, 786.1]
  Angle distribution: [49  0 50  0  0  0  0  1]

N=144: 144 trees
  Angle range: [-5693.3, 247.6]
  Angle distribution: [  1   0   0   0   0   0   0 143]

N=196: 196 trees
  Angle range: [66.4, 440.2]
  Angle distribution: [68  7  0  1 90  7  0 23]

N=200: 200 trees
  Angle range: [76.7, 293.7]
  Angle distribution: [96  0  0  0  0  0 97  7]
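
The extreme values in these ranges (786.1, -5693.3, 440.2) are not exotic orientations: the angles are simply not wrapped to one full turn. A small sketch of the wrap, applied to the values printed above (`normalize_deg` is a name chosen for this sketch):

```python
def normalize_deg(deg):
    """Map any angle in degrees to the equivalent angle in [0, 360)."""
    return deg % 360.0


# Extreme values copied from the printed angle ranges above
for raw in (786.1, -5693.3, 440.2):
    print(f"{raw:9.1f} -> {normalize_deg(raw):6.1f}")
```

After wrapping, 786.1 and -5693.3 both land near 66°, next to the low end of the ranges printed for N=100 and N=196. In the notebook the same wrap would be `df['deg_val'] % 360` before calling `np.histogram`, which keeps one outlier from stretching the bins.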


In [6]:
# Key insight: What would a DIFFERENT configuration look like?

# The current solution uses ~63° and ~243° (blue/pink pattern)
# What if we tried:
# 1. All trees at 0° (upright)
# 2. All trees at 90° (sideways)
# 3. Random angles
# 4. Different tessellation patterns

print('Potential alternative configurations to try:')
print('=' * 50)
print('1. All upright (0°) - simple baseline')
print('2. All sideways (90°) - different packing')
print('3. Random angles - explore solution space')
print('4. Hexagonal pattern - different tessellation')
print('5. Diagonal pattern - asymmetric layout')
print('\nThe key is to find configurations that are NOT')
print('reachable by local optimization from current solution.')

Potential alternative configurations to try:
1. All upright (0°) - simple baseline
2. All sideways (90°) - different packing
3. Random angles - explore solution space
4. Hexagonal pattern - different tessellation
5. Diagonal pattern - asymmetric layout

The key is to find configurations that are NOT
reachable by local optimization from current solution.


In [None]:
# Check scores of all preoptimized CSVs

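def score_csv(csv_path):
    """Total score for a submission CSV: sum of side**2 / n over N=1..200.

    Returns None if the file is unreadable or any N has the wrong row count.
    Overlaps are NOT checked here.
    """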
    try:
        df = pd.read_csv(csv_path)
        df['n'] = df['id'].apply(lambda x: int(x.split('_')[0]))
        df['x_val'] = df['x'].apply(lambda x: float(str(x).replace('s', '')))
        df['y_val'] = df['y'].apply(lambda x: float(str(x).replace('s', '')))
        df['deg_val'] = df['deg'].apply(lambda x: float(str(x).replace('s', '')))
        
        total_score = 0
        for n in range(1, 201):
            n_data = df[df['n'] == n]
            if len(n_data) != n:
                return None  # Invalid
            
            # Calculate bounding box
            all_x, all_y = [], []
            for _, row in n_data.iterrows():
                rad = np.radians(row['deg_val'])
                c, s = np.cos(rad), np.sin(rad)
                for j in range(len(TX)):
                    x = TX[j] * c - TY[j] * s + row['x_val']
                    y = TX[j] * s + TY[j] * c + row['y_val']
                    all_x.append(x)
                    all_y.append(y)
            
            side = max(max(all_x) - min(all_x), max(all_y) - min(all_y))
            total_score += side**2 / n
        
        return total_score
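    except Exception as e:
        # Unreadable or malformed file: report it and skip instead of failing silently
        print(f'Skipping {csv_path}: {e}')
        return None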

# Score all preoptimized CSVs
preopt_dir = '/home/nonroot/snapshots/santa-2025/21116303805/code/preoptimized'
csv_scores = []

for root, dirs, files in os.walk(preopt_dir):
    for f in files:
        if f.endswith('.csv'):
            path = os.path.join(root, f)
            score = score_csv(path)
            if score is not None:
                csv_scores.append((path, score))
                print(f'{score:.6f}: {os.path.relpath(path, preopt_dir)}')