# Loop 32 Strategic Analysis

## Critical Situation
- **Current best LB**: 70.315537 (exp_029)
- **Target**: 68.870973 (Jingle bins)
- **Gap**: 1.44 points (2.05% of our current score; 2.10% relative to the target)
- **Last 12 experiments**: 0.001 points total improvement
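Here is a quick check of the gap arithmetic, using values copied from the list above. The two percentages differ only in their denominator. Spreading the gap evenly over all 200 N values shows the size of the per-N change required, which can be compared with the 0.001 total gained over the last 12 experiments:

```python
# Values copied from the summary above
best_lb = 70.315537  # exp_029
target = 68.870973   # Jingle bins

gap = best_lb - target
print(f"Gap: {gap:.6f}")
print(f"Relative to our score:  {gap / best_lb:.2%}")
print(f"Relative to the target: {gap / target:.2%}")
# Average reduction needed per N if spread evenly over N = 1..200
print(f"Average reduction per N: {gap / 200:.6f}")
```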

## The Problem
The last 12 experiments (exp_021-032) found essentially ZERO improvement:
- bbox3 extended runs: 0.0000003 improvement
- External data mining: diminishing returns
- All algorithmic approaches converge to same score

## Key Insight from Leaderboard
Top teams score in the 68.8-69.3 range, so our 70.3 trails them by 1.0-1.5 points.
This is NOT a micro-optimization problem: we need a FUNDAMENTALLY different approach.

In [1]:
```python
# Analyze the score breakdown by N range
import pandas as pd
import numpy as np

# Load our best submission
df = pd.read_csv('/home/code/experiments/029_final_ensemble_v2/submission.csv')

def parse_coord(val):
    if isinstance(val, str) and val.startswith('s'):
        return float(val[1:])
    return float(val)

def parse_id(id_str):
    parts = str(id_str).split('_')
    return int(parts[0]), int(parts[1])

df['n'] = df['id'].apply(lambda x: parse_id(x)[0])
df['i'] = df['id'].apply(lambda x: parse_id(x)[1])
for col in ['x', 'y', 'deg']:
    df[col] = df[col].apply(parse_coord)

# Tree polygon vertices
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]

def get_tree_vertices(x, y, deg):
    rad = np.radians(deg)
    cos_a, sin_a = np.cos(rad), np.sin(rad)
    vertices = []
    for tx, ty in zip(TX, TY):
        rx = tx * cos_a - ty * sin_a + x
        ry = tx * sin_a + ty * cos_a + y
        vertices.append((rx, ry))
    return vertices

def calculate_score(trees, n):
    all_x, all_y = [], []
    for tree in trees:
        vertices = get_tree_vertices(tree['x'], tree['y'], tree['deg'])
        for vx, vy in vertices:
            all_x.append(vx)
            all_y.append(vy)
    side = max(max(all_x) - min(all_x), max(all_y) - min(all_y))
    return side**2 / n

# Calculate per-N scores
per_n_scores = {}
for n in range(1, 201):
    n_df = df[df['n'] == n].sort_values('i')
    trees = n_df[['x', 'y', 'deg']].to_dict('records')
    per_n_scores[n] = calculate_score(trees, n)

total = sum(per_n_scores.values())
print(f"Total score: {total:.6f}")
print(f"Target: 68.870973")
print(f"Gap: {total - 68.870973:.6f}")
print()

# Score by N range
ranges = [(1, 10), (11, 30), (31, 50), (51, 100), (101, 150), (151, 200)]
for start, end in ranges:
    range_score = sum(per_n_scores[n] for n in range(start, end+1))
    pct = range_score / total * 100
    print(f"N={start:3d}-{end:3d}: {range_score:.4f} ({pct:.1f}%)")

print(f"\nTotal: {total:.6f}")
print(f"Target: 68.870973")
print(f"Gap: {total - 68.870973:.6f} ({(total - 68.870973)/68.870973*100:.2f}%)")
```


```
Total score: 70.315537
Target: 68.870973
Gap: 1.444564

N=  1- 10: 4.3247 (6.2%)
N= 11- 30: 7.3737 (10.5%)
N= 31- 50: 7.2540 (10.3%)
N= 51-100: 17.4771 (24.9%)
N=101-150: 17.0665 (24.3%)
N=151-200: 16.8194 (23.9%)

Total: 70.315537
Target: 68.870973
Gap: 1.444564 (2.10%)
```

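Re-scoring many candidate CSVs can become slow because of the per-vertex Python loop in `calculate_score`. NumPy broadcasting can replace that loop. The following is a sketch in which `score_n` and `TREE` are hypothetical names. It applies the same rotation and bounding-square rule, so it should return the same `side**2 / n`:

```python
import numpy as np

# Same 15-vertex outline as TX/TY above, as an array of shape (15, 2)
TREE = np.array([
    [0.0, 0.8], [0.125, 0.5], [0.0625, 0.5], [0.2, 0.25], [0.1, 0.25],
    [0.35, 0.0], [0.075, 0.0], [0.075, -0.2], [-0.075, -0.2], [-0.075, 0.0],
    [-0.35, 0.0], [-0.1, 0.25], [-0.2, 0.25], [-0.0625, 0.5], [-0.125, 0.5],
])

def score_n(x, y, deg):
    """side**2 / n for one N, given equal-length sequences of x, y and deg."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rad = np.radians(np.asarray(deg, dtype=float))
    cos_a = np.cos(rad)[:, None]  # shape (n, 1), broadcasts against (15,)
    sin_a = np.sin(rad)[:, None]
    # Rotated and translated vertices of every tree at once, shape (n, 15)
    vx = TREE[:, 0] * cos_a - TREE[:, 1] * sin_a + x[:, None]
    vy = TREE[:, 0] * sin_a + TREE[:, 1] * cos_a + y[:, None]
    side = max(vx.max() - vx.min(), vy.max() - vy.min())
    return side ** 2 / len(x)

print(score_n([0.0], [0.0], [0.0]))  # one upright tree: 0.7 wide, 1.0 tall
print(score_n([0.0, 1.0], [0.0, 0.0], [0.0, 0.0]))  # two trees side by side
```

With the dataframe from the first cell, `score_n(*df[df['n'] == n][['x', 'y', 'deg']].to_numpy().T)` should match `per_n_scores[n]` up to floating-point rounding.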

In [2]:
```python
# Identify which N values have the most room for improvement
# Compare to theoretical lower bound (area of N trees / N)

# Single tree area (rough approximation; overestimates the exact polygon area)
tree_area = 0.5 * 0.7 * 0.8  # triangle with the tree's base width (0.7) and canopy height (0.8)
print(f"Approximate single tree area: {tree_area:.4f}")

# For each N, the theoretical minimum score would be if we could pack
# N trees with zero wasted space. This is impossible, but gives a lower bound.

print("\nN values with highest potential for improvement:")
print("(comparing current score to theoretical minimum)")
print()

potential_improvements = []
for n in range(1, 201):
    current = per_n_scores[n]
    # Theoretical minimum: if trees packed perfectly, side = sqrt(n * tree_area)
    # Score = side^2 / n = n * tree_area / n = tree_area
    # But this ignores the tree shape, so it's a very loose bound.
    # Because the bound is the same constant for every N, ranking by
    # (current - tree_area) gives the same order as ranking by the raw score.

    # Tighter reference points exist for small N:
    # For N=1, optimal is known: 0.6612
    # For N=2, optimal is known: 0.4508

    # So we simply look at which N values have the largest scores
    potential_improvements.append((n, current))

# Sort by score (highest first: these contribute most to total)
potential_improvements.sort(key=lambda x: -x[1])
print("Top 20 N values by score contribution:")
for n, score in potential_improvements[:20]:
    print(f"  N={n:3d}: {score:.6f}")
```


```
Approximate single tree area: 0.2800

N values with highest potential for improvement:
(comparing current score to theoretical minimum)

Top 20 N values by score contribution:
  N=  1: 0.661250
  N=  2: 0.450779
  N=  3: 0.434745
  N=  5: 0.416850
  N=  4: 0.416545
  N=  7: 0.399842
  N=  6: 0.399610
  N=  8: 0.385407
  N=  9: 0.383047
  N= 10: 0.376630
  N= 11: 0.374921
  N= 15: 0.374381
  N= 12: 0.372724
  N= 13: 0.372267
  N= 20: 0.371795
  N= 16: 0.370191
  N= 17: 0.370040
  N= 22: 0.369818
  N= 14: 0.369543
  N= 33: 0.369347
```

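The 0.28 used above is a rough triangle approximation. The shoelace formula gives the exact area of the 15-vertex outline. That exact area yields a slightly tighter, though still very loose, per-N floor, because non-overlapping trees inside a square need `side**2 >= n * tree_area`. In the sketch below, `polygon_area` is a hypothetical helper, and the example scores are copied from the output above:

```python
import numpy as np

# Same 15-vertex tree outline as TX/TY in the first cell
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]

def polygon_area(xs, ys):
    """Shoelace formula: |sum_i (x_i * y_{i+1} - x_{i+1} * y_i)| / 2."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    # np.roll(a, -1) shifts so that element i holds a[i + 1] (wrapping around)
    return abs(np.dot(xs, np.roll(ys, -1)) - np.dot(np.roll(xs, -1), ys)) / 2

tree_area = polygon_area(TX, TY)
print(f"Exact single tree area: {tree_area:.6f}")  # 0.245625

# side**2 / n >= tree_area for every N, so the 200-term total can never drop below:
print(f"Loose lower bound on total score: {200 * tree_area:.4f}")  # 49.1250

# Packing density (tree area / score) for a few N values from the output above
for n, score in {1: 0.661250, 2: 0.450779, 33: 0.369347}.items():
    print(f"N={n:3d}: score {score:.6f}, density {tree_area / score:.1%}")
```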

In [3]:
```python
# Check what external data sources we have and their scores
import glob
import os

print("External data sources:")
for path in glob.glob('/home/code/research/**/*.csv', recursive=True):
    try:
        ext_df = pd.read_csv(path)
        if 'id' in ext_df.columns and len(ext_df) > 0:
            # Try to calculate total score
            ext_df['n'] = ext_df['id'].apply(lambda x: int(str(x).split('_')[0]))
            for col in ['x', 'y', 'deg']:
                if col in ext_df.columns:
                    ext_df[col] = ext_df[col].apply(parse_coord)

            ext_total = 0
            valid = True
            for n in range(1, 201):
                n_df = ext_df[ext_df['n'] == n]
                if len(n_df) != n:  # incomplete or malformed submission
                    valid = False
                    break
                trees = n_df[['x', 'y', 'deg']].to_dict('records')
                ext_total += calculate_score(trees, n)

            if valid:
                # Only the basename is printed, so files from different folders can share a name
                print(f"  {os.path.basename(path)}: {ext_total:.6f}")
    except Exception:
        continue  # skip files that do not parse as submissions
```


```
External data sources:
  70.378875862989_20260126_045659.csv: 70.378876
  submission.csv: 70.319061
  submission_ensembled.csv: 70.319061
  submission_ensemble.csv: 70.318517
  submission.csv: 70.319407
  submission_ensemble.csv: 70.319733
  submission.csv: 70.319731
  submission.csv: 70.329927
  submi-n1200_r10_i12.csv: 70.329956
  submi-n100_r10_i1.csv: 70.329958
  submi-n900_r10_i9.csv: 70.329956
  submi-n1400_r10_i14.csv: 70.329955
  submi-n200_r10_i2.csv: 70.329958
  submission_shake.csv: 70.329927
  submi-n1300_r10_i13.csv: 70.329956
  submi-n600_r10_i6.csv: 70.329958
  submi-n1500_r10_i15.csv: 70.329955
  submi-n800_r10_i8.csv: 70.329957
  submi-n700_r10_i7.csv: 70.329958
  submi-n400_r10_i4.csv: 70.329958
  submi-n1100_r10_i11.csv: 70.329956
  submi-n1700_r10_i17.csv: 70.329955
  submission_new.csv: 70.329955
  submi-n1000_r10_i10.csv: 70.329956
  submi-n1600_r10_i16.csv: 70.329955
  submi-n500_r10_i5.csv: 70.329958
  submi-n300_r10_i3.csv: 70.329958
  santa-2025.csv: 70.329958
  submission_best.csv: 70.926150
  70.378875862989_20260126_045659.csv: 70.378876
```

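Every N is scored independently, so one way to combine sources is to keep each N's configuration from whichever source scores lowest on that N. This is the kind of per-N selection an ensemble file may perform, though the cells above do not confirm it. Below is a minimal sketch of that selection step. It assumes per-N score dicts have already been computed for each source, for example with `calculate_score` as in the cell above. `best_per_n` is a hypothetical helper, and the chosen configurations should still be re-checked for overlaps before submitting:

```python
def best_per_n(sources, ns=range(1, 201)):
    """For each N, pick the source with the lowest per-N score.

    sources: dict mapping source name -> {n: per_n_score}.
    Returns (combined_total, {n: winning_source_name}).
    Raises ValueError if some N is missing from every source.
    """
    choice = {}
    total = 0.0
    for n in ns:
        candidates = [(scores[n], name) for name, scores in sources.items() if n in scores]
        best_score, best_name = min(candidates)  # ties broken by source name
        choice[n] = best_name
        total += best_score
    return total, choice


# Toy example with two sources and N = 1..2 (made-up numbers, for illustration only)
toy = {
    'run_a': {1: 0.70, 2: 0.45},
    'run_b': {1: 0.66, 2: 0.46},
}
total, choice = best_per_n(toy, ns=[1, 2])
print(f"{total:.2f}", choice)  # 1.11 {1: 'run_b', 2: 'run_a'}
```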

## Key Observations

1. **The gap is 1.44 points (2.05% of our current score)** - This is NOT achievable through micro-optimization

2. **Top teams (68.8-69.3) are 1+ points better** - Their solutions are likely fundamentally different

3. **bbox3 extended runs found only 0.0000003 improvement** - The optimizer appears to have converged on our current configurations

4. **External data mining is exhausted** - Every public source we scored is worse than our current best (best external: 70.318517 vs. our 70.315537)

## What Top Teams Are Doing Differently

From the discussions and kernels:
1. **Running for 24-72 HOURS** with 24+ CPUs (our runs lasted 53 minutes)
2. **Using shake_public** - A different optimizer we haven't tried
3. **Asymmetric solutions** - Breaking symmetry for better packing
4. **Per-N specialization** - Different strategies for different N ranges

## The Path Forward

### Option 1: MUCH LONGER COMPUTE TIME
- Run bbox3 for 8-24 hours (instead of 53 minutes)
- This is roughly 9-27x longer than our current runs
- May find improvements that shorter runs miss
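The multipliers for the proposed run lengths, and for the 72-hour upper end quoted for top teams, are quick to check:

```python
# Wall-clock ratios relative to our 53-minute bbox3 runs
current_minutes = 53
ratios = {hours: hours * 60 / current_minutes for hours in (8, 24, 72)}
for hours, ratio in ratios.items():
    print(f"{hours:2d} h run = {ratio:.1f}x our current run length")
```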

### Option 2: TRY SHAKE_PUBLIC
- We have shake_public binary but haven't used it
- It may find different local optima than bbox3
- Library compatibility issues need to be resolved

### Option 3: FUNDAMENTALLY DIFFERENT ALGORITHM
- Implement constructive heuristics from scratch
- Focus on high-impact N values (N=1-50 contribute 27% of score)
- Try asymmetric placements
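A from-scratch constructive heuristic needs a valid starting point. The sketch below builds the simplest one: upright trees on a `cols x rows` grid, spaced by the tree's bounding box (0.7 wide and 1.0 tall, from the `TX`/`TY` outline). It searches over grid shapes separately for each N. This is a naive baseline, not a competitive method, and `grid_layout` is a hypothetical helper. Neighbouring trees only touch at their bounding-box edges. If the official overlap check treats touching as overlap, add a small gap to the spacing:

```python
import math

# Bounding box of one upright tree (deg = 0) from the TX/TY outline:
# x in [-0.35, 0.35], y in [-0.2, 0.8]
TREE_W, TREE_H = 0.7, 1.0

def grid_layout(n):
    """Naive constructive baseline: best cols x rows grid of upright trees for N = n.

    Returns (score, trees) where score = side**2 / n and trees is a list of
    {'x', 'y', 'deg'} dicts in the same format calculate_score expects.
    """
    best_score, best_cols = math.inf, 1
    for cols in range(1, n + 1):
        rows = math.ceil(n / cols)
        # The first row is always full (cols <= n), so this is the exact bounding-square side
        side = max(cols * TREE_W, rows * TREE_H)
        score = side ** 2 / n
        if score < best_score:
            best_score, best_cols = score, cols
    trees = [
        {'x': (k % best_cols) * TREE_W, 'y': (k // best_cols) * TREE_H, 'deg': 0.0}
        for k in range(n)
    ]
    return best_score, trees


for n in (1, 2, 10, 50):
    score, _ = grid_layout(n)
    print(f"N={n:3d}: grid baseline {score:.4f}")
```

The baseline sits far above the current per-N scores (1.0 vs. 0.661250 at N=1). It is only useful as a guaranteed-valid seed that a local optimizer could then compress.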

### Option 4: FIND BETTER EXTERNAL DATA
- Check for new Kaggle kernels with better scores
- Look for private/unpublished solutions
- Analyze what makes top solutions different