# Loop 16 Strategic Analysis

## Current Situation
- Best CV/LB: 70.630478 (from ensemble of 25+ sources)
- Target: 68.919154
- Gap: 1.711 points (2.42% of the current score, 2.48% of the target)
- Submissions used: 6/100 (84 remaining)
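
The two gap percentages that appear in this analysis (2.42% and 2.48%) differ only in the denominator. A short worked calculation, using the per-N score computed later in this notebook (side squared over N); the symbol `s_n` for the side length of the N=n configuration is notation assumed here, not taken from the source:

```latex
\begin{aligned}
\text{score} &= \sum_{n=1}^{200} \frac{s_n^2}{n} = 70.630478 \\
\text{gap} &= 70.630478 - 68.919154 = 1.711324 \\
\frac{\text{gap}}{\text{current}} &= \frac{1.711324}{70.630478} \approx 2.42\% \\
\frac{\text{gap}}{\text{target}} &= \frac{1.711324}{68.919154} \approx 2.48\%
\end{aligned}
```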

## Key Findings from 17 Experiments
1. All public solutions converge to ~70.63 (same local optimum)
2. SA optimizers (bbox3, sa_v1_parallel) produce overlapping trees
3. Grid-based approaches (zaburo, tessellation) are fundamentally worse
4. Tree removal found only 1 tiny improvement (0.00001345)
5. Basin hopping, genetic algorithms (GA), and constraint programming produced no improvement

In [1]:
# Load current best and analyze per-N scores
import pandas as pd
import numpy as np
from decimal import Decimal, getcontext
from shapely import affinity
from shapely.geometry import Polygon
from shapely.ops import unary_union

getcontext().prec = 25
scale_factor = Decimal("1")

class ChristmasTree:
    def __init__(self, center_x='0', center_y='0', angle='0'):
        self.center_x = Decimal(center_x)
        self.center_y = Decimal(center_y)
        self.angle = Decimal(angle)
        trunk_w = Decimal('0.15')
        trunk_h = Decimal('0.2')
        base_w = Decimal('0.7')
        mid_w = Decimal('0.4')
        top_w = Decimal('0.25')
        tip_y = Decimal('0.8')
        tier_1_y = Decimal('0.5')
        tier_2_y = Decimal('0.25')
        base_y = Decimal('0.0')
        trunk_bottom_y = -trunk_h
        initial_polygon = Polygon([
            (Decimal('0.0') * scale_factor, tip_y * scale_factor),
            (top_w / Decimal('2') * scale_factor, tier_1_y * scale_factor),
            (top_w / Decimal('4') * scale_factor, tier_1_y * scale_factor),
            (mid_w / Decimal('2') * scale_factor, tier_2_y * scale_factor),
            (mid_w / Decimal('4') * scale_factor, tier_2_y * scale_factor),
            (base_w / Decimal('2') * scale_factor, base_y * scale_factor),
            (trunk_w / Decimal('2') * scale_factor, base_y * scale_factor),
            (trunk_w / Decimal('2') * scale_factor, trunk_bottom_y * scale_factor),
            (-(trunk_w / Decimal('2')) * scale_factor, trunk_bottom_y * scale_factor),
            (-(trunk_w / Decimal('2')) * scale_factor, base_y * scale_factor),
            (-(base_w / Decimal('2')) * scale_factor, base_y * scale_factor),
            (-(mid_w / Decimal('4')) * scale_factor, tier_2_y * scale_factor),
            (-(mid_w / Decimal('2')) * scale_factor, tier_2_y * scale_factor),
            (-(top_w / Decimal('4')) * scale_factor, tier_1_y * scale_factor),
            (-(top_w / Decimal('2')) * scale_factor, tier_1_y * scale_factor),
        ])
        rotated = affinity.rotate(initial_polygon, float(self.angle), origin=(0, 0))
        self.polygon = affinity.translate(rotated,
                                          xoff=float(self.center_x * scale_factor),
                                          yoff=float(self.center_y * scale_factor))

def get_tree_list_side_length(tree_list):
    all_polygons = [t.polygon for t in tree_list]
    bounds = unary_union(all_polygons).bounds
    return Decimal(max(bounds[2] - bounds[0], bounds[3] - bounds[1])) / scale_factor

def parse_csv(csv_path):
    result = pd.read_csv(csv_path)
    result['x'] = result['x'].str.strip('s')
    result['y'] = result['y'].str.strip('s')
    result['deg'] = result['deg'].str.strip('s')
    result[['group_id', 'item_id']] = result['id'].str.split('_', n=2, expand=True)
    dict_of_tree_list = {}
    dict_of_side_length = {}
    for group_id, group_data in result.groupby('group_id'):
        tree_list = [ChristmasTree(center_x=row['x'], center_y=row['y'], angle=row['deg']) for _, row in group_data.iterrows()]
        dict_of_tree_list[group_id] = tree_list
        dict_of_side_length[group_id] = get_tree_list_side_length(tree_list)
    return dict_of_tree_list, dict_of_side_length

print("Loading ensemble_best.csv...")
dict_of_tree_list, dict_of_side_length = parse_csv('/home/code/exploration/datasets/ensemble_best.csv')

# Calculate per-N scores
per_n_scores = {}
for n in range(1, 201):
    key = f'{n:03d}'
    side = dict_of_side_length[key]
    score = float(side ** 2 / Decimal(n))
    per_n_scores[n] = {'side': float(side), 'score': score}

total_score = sum(v['score'] for v in per_n_scores.values())
print(f"Total score: {total_score:.8f}")
print(f"Target: 68.919154")
print(f"Gap: {total_score - 68.919154:.6f} ({(total_score - 68.919154)/68.919154*100:.2f}%)")

# Find N values with highest contribution to score
contributions = [(n, v['score'], v['score']/total_score*100) for n, v in per_n_scores.items()]
contributions.sort(key=lambda x: -x[1])
print("\nTop 20 N values by score contribution:")
for n, score, pct in contributions[:20]:
    print(f"  N={n:3d}: score={score:.6f} ({pct:.2f}%)")

Loading ensemble_best.csv...


Total score: 70.63047845
Target: 68.919154
Gap: 1.711324 (2.48%)

Top 20 N values by score contribution:
  N=  1: score=0.661250 (0.94%)
  N=  2: score=0.450779 (0.64%)
  N=  3: score=0.434745 (0.62%)
  N=  5: score=0.416850 (0.59%)
  N=  4: score=0.416545 (0.59%)
  N=  7: score=0.399897 (0.57%)
  N=  6: score=0.399610 (0.57%)
  N=  9: score=0.387415 (0.55%)
  N=  8: score=0.385407 (0.55%)
  N= 15: score=0.376978 (0.53%)
  N= 10: score=0.376630 (0.53%)
  N= 21: score=0.376451 (0.53%)
  N= 20: score=0.376057 (0.53%)
  N= 22: score=0.375258 (0.53%)
  N= 11: score=0.374924 (0.53%)
  N= 16: score=0.374128 (0.53%)
  N= 26: score=0.373997 (0.53%)
  N= 12: score=0.372724 (0.53%)
  N= 13: score=0.372294 (0.53%)
  N= 25: score=0.372144 (0.53%)
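
To put the ranking above in context, a minimal sketch that compares the combined share of the top 20 groups with what 20 groups would contribute if all 200 contributed equally. The scores and total are copied from the printout above; the constant names are hypothetical.

```python
# Per-N scores copied from the "Top 20" printout above.
top20_scores = [
    0.661250, 0.450779, 0.434745, 0.416850, 0.416545,
    0.399897, 0.399610, 0.387415, 0.385407, 0.376978,
    0.376630, 0.376451, 0.376057, 0.375258, 0.374924,
    0.374128, 0.373997, 0.372724, 0.372294, 0.372144,
]
TOTAL_SCORE = 70.63047845  # total score printed above
NUM_GROUPS = 200           # N runs from 1 to 200

# Share of the total carried by the 20 largest contributors.
top20_share = sum(top20_scores) / TOTAL_SCORE * 100
# Share 20 groups would carry if every group contributed equally.
uniform_share = len(top20_scores) / NUM_GROUPS * 100

print(f"Top-20 share of total: {top20_share:.2f}%")
print(f"Uniform share for 20 groups: {uniform_share:.2f}%")
```

The top 20 groups carry only slightly more than a uniform share, so the per-N contributions are fairly flat apart from the smallest N values.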


In [2]:
# Calculate what improvement is needed per N to reach target
import math

target = 68.919154
current = total_score
gap = current - target

print(f"Current: {current:.6f}")
print(f"Target: {target:.6f}")
print(f"Gap: {gap:.6f}")

# If we improve each N proportionally
print(f"\nIf we improve each N by the same percentage:")
required_reduction = gap / current * 100
print(f"  Required reduction: {required_reduction:.2f}%")

# Calculate required side length reduction for each N
print(f"\nRequired side length reduction per N (to close gap proportionally):")
for n in [1, 10, 50, 100, 150, 200]:
    current_side = per_n_scores[n]['side']
    current_score = per_n_scores[n]['score']
    # If we reduce score by required_reduction%, what's the new side?
    new_score = current_score * (1 - required_reduction/100)
    new_side = math.sqrt(new_score * n)
    side_reduction = (current_side - new_side) / current_side * 100
    print(f"  N={n:3d}: side {current_side:.4f} -> {new_side:.4f} (reduction: {side_reduction:.2f}%)")

Current: 70.630478
Target: 68.919154
Gap: 1.711324

If we improve each N by the same percentage:
  Required reduction: 2.42%

Required side length reduction per N (to close gap proportionally):
  N=  1: side 0.8132 -> 0.8033 (reduction: 1.22%)
  N= 10: side 1.9407 -> 1.9170 (reduction: 1.22%)
  N= 50: side 4.2471 -> 4.1953 (reduction: 1.22%)
  N=100: side 5.8603 -> 5.7888 (reduction: 1.22%)
  N=150: side 7.1105 -> 7.0239 (reduction: 1.22%)
  N=200: side 8.2164 -> 8.1163 (reduction: 1.22%)
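
The side reduction is the same 1.22% for every N because the per-N score is `side**2 / N`: scaling a side by a factor `k` scales its score by `k**2`, independent of N. A minimal sketch of that relationship (the function name is hypothetical):

```python
import math


def side_reduction_pct(score_reduction_pct: float) -> float:
    """Side-length reduction (%) that yields the given score reduction (%).

    score = side**2 / N, so scaling the side by k scales the score by k**2.
    """
    k = math.sqrt(1.0 - score_reduction_pct / 100.0)
    return (1.0 - k) * 100.0


current, target = 70.630478, 68.919154
required = (current - target) / current * 100  # score reduction in percent
print(f"score reduction {required:.2f}% -> "
      f"side reduction {side_reduction_pct(required):.2f}%")
```

For small reductions the side reduction is roughly half the score reduction, which is why a 2.42% score gap corresponds to about 1.22% in side length.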


In [3]:
# Analyze which N values have the most "room for improvement"
# by comparing to theoretical lower bounds

# For N trees arranged in a grid, the minimum side is approximately:
# side = sqrt(N) * tree_spacing
# where tree_spacing depends on tree dimensions and rotation

# Tree bounding box at 0 degrees: width=0.7, height=1.0 (tip at 0.8, trunk bottom at -0.2)
# Bounding-box diagonal: sqrt(0.7**2 + 1.0**2) ~ 1.22

print("Efficiency analysis (comparing to theoretical minimum):")
print("="*60)

for n in range(1, 201):
    current_side = per_n_scores[n]['side']
    current_score = per_n_scores[n]['score']
    
    # Reference side: assume each tree occupies its full 0.7 x 1.0 bounding box
    # (area 0.7 sq units), so reference area = N * 0.7 and side = sqrt(N * 0.7).
    # The tree polygon is smaller than its bounding box, so this is NOT a true
    # lower bound; ratios above 100% below mean trees interlock more tightly
    # than their bounding boxes would allow.
    theoretical_min_side = math.sqrt(n * 0.7)
    theoretical_min_score = theoretical_min_side ** 2 / n
    
    # Ratio of reference side to actual side (higher = denser packing)
    efficiency = theoretical_min_side / current_side * 100
    
    # Show N <= 10, every 50th N, and any N whose ratio falls below 50%
    if efficiency < 50 or n <= 10 or n % 50 == 0:
        print(f"N={n:3d}: side={current_side:.4f}, theoretical_min={theoretical_min_side:.4f}, efficiency={efficiency:.1f}%")

Efficiency analysis (comparing to theoretical minimum):
N=  1: side=0.8132, theoretical_min=0.8367, efficiency=102.9%
N=  2: side=0.9495, theoretical_min=1.1832, efficiency=124.6%
N=  3: side=1.1420, theoretical_min=1.4491, efficiency=126.9%
N=  4: side=1.2908, theoretical_min=1.6733, efficiency=129.6%
N=  5: side=1.4437, theoretical_min=1.8708, efficiency=129.6%
N=  6: side=1.5484, theoretical_min=2.0494, efficiency=132.4%
N=  7: side=1.6731, theoretical_min=2.2136, efficiency=132.3%
N=  8: side=1.7559, theoretical_min=2.3664, efficiency=134.8%
N=  9: side=1.8673, theoretical_min=2.5100, efficiency=134.4%
N= 10: side=1.9407, theoretical_min=2.6458, efficiency=136.3%
N= 50: side=4.2471, theoretical_min=5.9161, efficiency=139.3%
N=100: side=5.8603, theoretical_min=8.3666, efficiency=142.8%
N=150: side=7.1105, theoretical_min=10.2470, efficiency=144.1%
N=200: side=8.2164, theoretical_min=11.8322, efficiency=144.0%
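
Every ratio above exceeds 100%, which indicates that the 0.7 bounding-box area is not a valid lower bound. A minimal sketch of an area-based bound that uses the tree polygon's own area instead, computed with the shoelace formula from the same vertices as `ChristmasTree`. The helper names are hypothetical, and the side lengths are copied from the printout above.

```python
import math

# Tree outline, in the same vertex order as the ChristmasTree polygon.
TREE_VERTICES = [
    (0.0, 0.8), (0.125, 0.5), (0.0625, 0.5), (0.2, 0.25), (0.1, 0.25),
    (0.35, 0.0), (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0),
    (-0.35, 0.0), (-0.1, 0.25), (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5),
]


def shoelace_area(vertices):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


TREE_AREA = shoelace_area(TREE_VERTICES)


def area_lower_bound_side(n):
    """No square with a smaller side can hold n non-overlapping trees."""
    return math.sqrt(n * TREE_AREA)


def packing_density(n, side):
    """Fraction of the bounding square covered by trees (at most 1)."""
    return n * TREE_AREA / side ** 2


# Side lengths copied from the efficiency printout above.
sides = {1: 0.8132, 10: 1.9407, 50: 4.2471, 100: 5.8603, 200: 8.2164}
print(f"tree area = {TREE_AREA:.6f}")
for n, side in sides.items():
    print(f"N={n:3d}: bound={area_lower_bound_side(n):.4f}, "
          f"density={packing_density(n, side):.1%}")
```

This bound is loose: it gives a per-N score of at least the tree area (0.245625), so summing over N = 1..200 yields 49.125, well below the 68.919154 target. It is still useful as a density measure that cannot exceed 100%.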


## Key Insights

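1. **The gap is 2.42% of the current score** - This is significant but not impossible; closed proportionally, it means shrinking every side length by about 1.22%
2. **Small N values contribute the most per group** - N=1 alone is 0.94% of the total against a uniform share of 0.50%, but the rest of the top 20 sit between 0.53% and 0.64%, so gains are needed across the whole range, with small N values counting slightly more
3. **Packing efficiency varies** - The ratio against the 0.7-area bounding-box reference rises from 102.9% at N=1 to 144.0% at N=200; values above 100% show the reference is not a true lower bound, and the small N values with the lowest ratios may have more room for improvement
4. **All public solutions converge to the same optimum** - A fundamentally different approach is needed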

## What Hasn't Been Tried

1. **Egortrushin tessellation SA** - Creates grid patterns with specific dimensions
   - N=72: [4,9] grid
   - N=100: [5,10] grid
   - N=200: [7,15] grid (210 trees, delete 10 worst)
   
2. **Asymmetric layouts** - A discussion thread mentions that top teams use asymmetric solutions

3. **Very high temperature SA from random starts** - Escape current basin entirely

4. **Hybrid: tessellation + tree deletion** - Generate tessellation, then extract for smaller N
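
The tessellation-plus-deletion idea can be sketched as follows. This is not Egortrushin's implementation: the function names are hypothetical, two trees per grid cell is an assumption inferred from the [7,15] grid giving 210 trees, each tree is represented only by its axis-aligned bounds, and "worst" is interpreted as the tree whose removal leaves the smallest bounding square.

```python
from typing import List, Tuple

Bounds = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def grid_candidates(n: int, trees_per_cell: int = 2,
                    max_surplus: int = 10) -> List[Tuple[int, int, int]]:
    """List (rows, cols, surplus) with rows <= cols and 0 <= surplus <= max_surplus."""
    out = []
    max_cells = (n + max_surplus) // trees_per_cell
    for rows in range(1, max_cells + 1):
        for cols in range(rows, max_cells // rows + 1):
            surplus = rows * cols * trees_per_cell - n
            if 0 <= surplus <= max_surplus:
                out.append((rows, cols, surplus))
    return out


def side_of(bounds: List[Bounds]) -> float:
    """Side of the smallest axis-aligned square covering all bounds."""
    minx = min(b[0] for b in bounds)
    miny = min(b[1] for b in bounds)
    maxx = max(b[2] for b in bounds)
    maxy = max(b[3] for b in bounds)
    return max(maxx - minx, maxy - miny)


def greedy_delete(bounds: List[Bounds], target_n: int) -> List[Bounds]:
    """Drop one tree at a time, always the one whose removal shrinks the side most."""
    if target_n < 1:
        raise ValueError("target_n must be at least 1")
    kept = list(bounds)
    while len(kept) > target_n:
        best_i = min(range(len(kept)),
                     key=lambda i: side_of(kept[:i] + kept[i + 1:]))
        kept.pop(best_i)
    return kept


# Grid sizes from the list above appear among the candidates.
print((4, 9, 0) in grid_candidates(72), (7, 15, 10) in grid_candidates(200))

# Toy example: a 2x3 block of unit boxes trimmed to 4 boxes.
cells = [(float(c), float(r), c + 1.0, r + 1.0)
         for r in range(2) for c in range(3)]
trimmed = greedy_delete(cells, 4)
print(side_of(cells), "->", side_of(trimmed))
```

The greedy step is myopic: a single deletion often leaves the side unchanged until a whole row or column is cleared, so a real implementation would likely need to evaluate deletions in groups or re-optimize after deleting.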