# Loop 12 Analysis: Understanding the Gap to Target

## Current Situation
- Best CV/LB: 70.659437
- Target: 68.919154
- Gap: 1.74 points (2.53% of the target; 2.46% of the current score)

## Key Questions
1. Where is the gap coming from?
2. What techniques have NOT been tried?
3. What do top teams know that we don't?

In [1]:
import pandas as pd
import numpy as np
from shapely.geometry import Polygon
from shapely import affinity
import matplotlib.pyplot as plt

# The correct tree shape
def get_tree_polygon(x, y, deg):
    trunk_w = 0.15
    trunk_h = 0.2
    base_w = 0.7
    mid_w = 0.4
    top_w = 0.25
    tip_y = 0.8
    tier_1_y = 0.5
    tier_2_y = 0.25
    base_y = 0.0
    trunk_bottom_y = -trunk_h
    
    vertices = [
        (0.0, tip_y),
        (top_w / 2, tier_1_y),
        (top_w / 4, tier_1_y),
        (mid_w / 2, tier_2_y),
        (mid_w / 4, tier_2_y),
        (base_w / 2, base_y),
        (trunk_w / 2, base_y),
        (trunk_w / 2, trunk_bottom_y),
        (-trunk_w / 2, trunk_bottom_y),
        (-trunk_w / 2, base_y),
        (-base_w / 2, base_y),
        (-mid_w / 4, tier_2_y),
        (-mid_w / 2, tier_2_y),
        (-top_w / 4, tier_1_y),
        (-top_w / 2, tier_1_y),
    ]
    
    poly = Polygon(vertices)
    poly = affinity.rotate(poly, deg, origin=(0, 0))
    poly = affinity.translate(poly, xoff=x, yoff=y)
    return poly

# Tree area: reuse the same polygon with no rotation or offset instead of repeating the vertices
tree_poly = get_tree_polygon(0, 0, 0)
print(f'Tree area: {tree_poly.area:.6f}')
print(f'Tree bounding box: {tree_poly.bounds}')
print(f'Tree width: {tree_poly.bounds[2] - tree_poly.bounds[0]:.3f}')
print(f'Tree height: {tree_poly.bounds[3] - tree_poly.bounds[1]:.3f}')

Tree area: 0.245625
Tree bounding box: (-0.35, -0.2, 0.35, 0.8)
Tree width: 0.700
Tree height: 1.000


In [2]:
# Read the best submission
df = pd.read_csv('/home/submission/submission.csv')
df['x'] = df['x'].str.strip('s').astype(float)
df['y'] = df['y'].str.strip('s').astype(float)
df['deg'] = df['deg'].str.strip('s').astype(float)
df['N'] = df['id'].str.split('_').str[0].astype(int)

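# Score per N = side**2 / N, where side is the larger of the x- and y-extents
# over every tree vertex (the axis-aligned bounding square)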
scores = []
for n in range(1, 201):
    group = df[df['N'] == n]
    if len(group) == 0:
        continue
    
    all_x, all_y = [], []
    for _, row in group.iterrows():
        poly = get_tree_polygon(row['x'], row['y'], row['deg'])
        coords = list(poly.exterior.coords)
        for cx, cy in coords:
            all_x.append(cx)
            all_y.append(cy)
    
    side = max(max(all_x) - min(all_x), max(all_y) - min(all_y))
    score = side**2 / n
    scores.append({'N': n, 'side': side, 'score': score})

scores_df = pd.DataFrame(scores)
print(f'Total score: {scores_df["score"].sum():.6f}')
print(f'Target: 68.919154')
print(f'Gap: {scores_df["score"].sum() - 68.919154:.6f}')

Total score: 70.659437
Target: 68.919154
Gap: 1.740283
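The per-N loop above can be packaged as a reusable scoring function for candidate placements. This is a minimal sketch, assuming placements arrive as `(x, y, deg)` tuples in the submission's units. `score_for_placements` and `placed_tree` are hypothetical helpers, and the square is taken from the per-tree bounding boxes, which gives the same extents as collecting every exterior vertex.

```python
from shapely import affinity
from shapely.geometry import Polygon

# Same 15-vertex tree outline as get_tree_polygon in In [1].
TREE_VERTICES = [
    (0.0, 0.8), (0.125, 0.5), (0.0625, 0.5), (0.2, 0.25), (0.1, 0.25),
    (0.35, 0.0), (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0),
    (-0.35, 0.0), (-0.1, 0.25), (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5),
]
BASE_TREE = Polygon(TREE_VERTICES)


def placed_tree(x: float, y: float, deg: float) -> Polygon:
    """Rotate about the origin (degrees), then translate, as get_tree_polygon does."""
    rotated = affinity.rotate(BASE_TREE, deg, origin=(0, 0))
    return affinity.translate(rotated, xoff=x, yoff=y)


def score_for_placements(placements: list[tuple[float, float, float]]) -> float:
    """Hypothetical helper: side**2 / n for the axis-aligned bounding square."""
    bounds = [placed_tree(x, y, deg).bounds for x, y, deg in placements]
    min_x = min(b[0] for b in bounds)
    min_y = min(b[1] for b in bounds)
    max_x = max(b[2] for b in bounds)
    max_y = max(b[3] for b in bounds)
    side = max(max_x - min_x, max_y - min_y)
    return side ** 2 / len(placements)


# One upright tree: side = max(width 0.7, height 1.0) = 1.0, so score = 1.0.
print(score_for_placements([(0.0, 0.0, 0.0)]))
# Two upright trees side by side: side = 1.4, so score = 1.4**2 / 2 ≈ 0.98.
print(score_for_placements([(0.0, 0.0, 0.0), (0.7, 0.0, 0.0)]))
```

Working from bounding boxes keeps the function cheap enough to call inside a search loop; it does not check for overlaps, which a real optimizer would also need.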


In [3]:
# Calculate how much improvement is needed per N
current_total = scores_df['score'].sum()
target_total = 68.919154
gap = current_total - target_total

print(f'Gap to close: {gap:.6f}')
print(f'Average improvement needed per N: {gap/200:.6f}')
print()

# If every side shrank by the same fraction r, the total score would scale by (1 - r)**2; solve for r
r = 1 - np.sqrt(target_total / current_total)
print(f'Uniform side reduction needed: {r*100:.4f}%')
print(f'This means reducing each side by {r:.6f} of its current value')

Gap to close: 1.740283
Average improvement needed per N: 0.008701

Uniform side reduction needed: 1.2391%
This means reducing each side by 0.012391 of its current value
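The square root comes from the score being quadratic in side length. With $s_N$ the current side for each N, $C$ the current total and $T$ the target, shrinking every side by a factor $(1-r)$ gives:

```latex
\begin{aligned}
T &= \sum_{N=1}^{200} \frac{\big((1-r)\,s_N\big)^2}{N}
   = (1-r)^2 \sum_{N=1}^{200} \frac{s_N^2}{N}
   = (1-r)^2\, C \\
r &= 1 - \sqrt{T / C} = 1 - \sqrt{68.919154 / 70.659437} \approx 0.012391
\end{aligned}
```

So a 1.24% cut in every side buys the 2.46% cut in total score ($1.740283 / 70.659437 \approx 0.0246$).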


In [4]:
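# Analyze which N values have the most room for improvement.
# "efficiency" is the ratio of bounding-square area to total tree area,
# side**2 / (n * tree_area): 1.0 would be perfect packing, so HIGHER is WORSE.
tree_area = 0.245625  # area of one tree polygon (In [1])

efficiency_data = []
for _, row in scores_df.iterrows():
    n = int(row['N'])
    side = row['side']
    actual_area = side ** 2
    theoretical_min = n * tree_area
    efficiency = actual_area / theoretical_min
    
    # Score saved if this N were packed at target_efficiency.
    # score = efficiency * tree_area is linear in efficiency, so no extra square here.
    target_efficiency = 1.35
    potential_improvement = row['score'] * (1 - target_efficiency / efficiency) if efficiency > target_efficiency else 0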
    
    efficiency_data.append({
        'N': n,
        'side': side,
        'score': row['score'],
        'efficiency': efficiency,
        'potential_improvement': potential_improvement
    })

eff_df = pd.DataFrame(efficiency_data)
print('Top 20 N values by efficiency (worst packing):')  
print(eff_df.nlargest(20, 'efficiency')[['N', 'side', 'score', 'efficiency']].to_string())

Top 20 N values by efficiency (worst packing):
     N      side     score  efficiency
0    1  0.813173  0.661250    2.692112
1    2  0.949504  0.450779    1.835233
2    3  1.142031  0.434745    1.769955
4    5  1.443692  0.416850    1.697098
3    4  1.290806  0.416545    1.695857
6    7  1.673104  0.399897    1.628078
5    6  1.548438  0.399610    1.626912
8    9  1.867280  0.387415    1.577262
7    8  1.755921  0.385407    1.569088
14  15  2.384962  0.379203    1.543828
9   10  1.940696  0.376630    1.533354
20  21  2.811667  0.376451    1.532625
19  20  2.742469  0.376057    1.531020
10  11  2.033002  0.375736    1.529714
21  22  2.873270  0.375258    1.527768
15  16  2.446640  0.374128    1.523167
25  26  3.118320  0.373997    1.522634
11  12  2.114873  0.372724    1.517451
12  13  2.199960  0.372294    1.515701
24  25  3.050182  0.372144    1.515092
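Because `efficiency` divides the same `side**2` by $N$ times a constant, it is the score rescaled by the tree area $A = 0.245625$, so this ranking is identical to ranking by score (check for N=1: $0.661250 / 0.245625 \approx 2.6921$). The same identity gives the score saved by reaching a target ratio $e^\ast$:

```latex
\begin{aligned}
e_N &= \frac{s_N^2}{N A} = \frac{\mathrm{score}_N}{A} \\
\Delta_N &= \mathrm{score}_N - e^\ast A
          = \mathrm{score}_N \left(1 - \frac{e^\ast}{e_N}\right),
          \qquad e_N > e^\ast
\end{aligned}
```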


In [5]:
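# Score in sample N windows: 10-wide windows plus 151-200. These are NOT a partition
# (N=31-50, 61-100 and 111-150 are skipped), so they do not sum to the total.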
print('Score contribution by N range:')
for start in [1, 11, 21, 51, 101, 151]:
    end = min(start + 9, 200) if start < 151 else 200
    subset = eff_df[(eff_df['N'] >= start) & (eff_df['N'] <= end)]
    print(f'N={start:3d}-{end:3d}: total_score={subset["score"].sum():.4f}, avg_efficiency={subset["efficiency"].mean():.4f}')

print()
print('Total score from N=1-10:', eff_df[eff_df['N'] <= 10]['score'].sum())
print('Total score from N=11-50:', eff_df[(eff_df['N'] > 10) & (eff_df['N'] <= 50)]['score'].sum())
print('Total score from N=51-200:', eff_df[eff_df['N'] > 50]['score'].sum())

Score contribution by N range:
N=  1- 10: total_score=4.3291, avg_efficiency=1.7625
N= 11- 20: total_score=3.7280, avg_efficiency=1.5178
N= 21- 30: total_score=3.6889, avg_efficiency=1.5018
N= 51- 60: total_score=3.5910, avg_efficiency=1.4620
N=101-110: total_score=3.4331, avg_efficiency=1.3977
N=151-200: total_score=16.8451, avg_efficiency=1.3716

Total score from N=1-10: 4.3291279238767775
Total score from N=11-50: 14.712640496840867
Total score from N=51-200: 51.61766871010656
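The windows above skip several ranges, so only the last three lines add up to the total. A contiguous partition for any set of edges can be sketched with `pd.cut`; `score_by_range` is a hypothetical helper, and the demo frame uses made-up unit scores rather than submission data.

```python
import pandas as pd


def score_by_range(df: pd.DataFrame, edges: list[int]) -> pd.Series:
    """Sum df['score'] over contiguous, right-closed N buckets.

    edges=[0, 10, 50, 200] gives buckets (0, 10], (10, 50], (50, 200].
    """
    buckets = pd.cut(df["N"], bins=edges, right=True)
    # observed=False keeps empty buckets in the result (with sum 0).
    return df.groupby(buckets, observed=False)["score"].sum()


# Demo with every N scoring exactly 1.0, so each bucket sum equals its width.
demo = pd.DataFrame({"N": range(1, 201), "score": [1.0] * 200})
print(score_by_range(demo, [0, 10, 50, 200]))
```

Applied to `eff_df` with the same edges, it should reproduce the three totals printed above.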


## Key Findings

1. **Small N values (1-10) contribute ~4.33 points** (≈0.43 per N); these have the worst packing ratio
2. **Medium N values (11-50) contribute ~14.71 points** (≈0.37 per N)
3. **Large N values (51-200) contribute ~51.62 points** (≈0.34 per N), about 73% of the total
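N=1 has the worst ratio (2.69), but a single tree has only one degree of freedom that matters, its rotation. A brute-force angle scan (a sketch; the 0.5° step and the `best_single_tree_side` helper are assumptions) finds 45°, where the rotated width and height are both $1.15/\sqrt{2} \approx 0.813173$, giving score $1.15^2/2 = 0.66125$. That matches the submitted N=1 row, so N=1 appears to have no room left.

```python
import numpy as np
from shapely import affinity
from shapely.geometry import Polygon

# Same tree outline as get_tree_polygon in In [1].
TREE = Polygon([
    (0.0, 0.8), (0.125, 0.5), (0.0625, 0.5), (0.2, 0.25), (0.1, 0.25),
    (0.35, 0.0), (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0),
    (-0.35, 0.0), (-0.1, 0.25), (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5),
])


def square_side(deg: float) -> float:
    """Side of the axis-aligned bounding square of one tree rotated by deg."""
    min_x, min_y, max_x, max_y = affinity.rotate(TREE, deg, origin=(0, 0)).bounds
    return max(max_x - min_x, max_y - min_y)


def best_single_tree_side(step: float = 0.5) -> tuple[float, float]:
    """Hypothetical brute-force scan over [0, 90] degrees.

    The tree is mirror-symmetric and a 90-degree turn only swaps width and
    height, so angles outside [0, 90] cannot give a smaller square.
    """
    angles = np.arange(0.0, 90.0 + step, step)
    sides = [square_side(a) for a in angles]
    i = int(np.argmin(sides))
    return float(angles[i]), sides[i]


deg, side = best_single_tree_side()
print(deg, side, side ** 2)  # side should be close to 1.15 / sqrt(2)
```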

## The Gap Analysis

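To close the 1.74 point gap:
- We need a uniform ~1.24% reduction in every side length (the total score then falls ~2.46%, because each N's score is proportional to side²)
- OR we need larger, targeted improvements concentrated on specific N values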
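To see how "targeted" the second option would have to be, suppose the whole 1.740283-point gap were closed inside one N range by shrinking every side in that range by the same fraction. A sketch using the range totals from In [5] (`side_cut_needed` is a hypothetical helper):

```python
import math

GAP = 1.740283  # current total minus target (In [2])

# Score totals per N range, copied from the In [5] output.
RANGE_SCORES = {
    "N=1-10": 4.3291279238767775,
    "N=11-50": 14.712640496840867,
    "N=51-200": 51.61766871010656,
}


def side_cut_needed(range_score: float, gap: float) -> float:
    """Fraction r with (1 - r)**2 * range_score == range_score - gap."""
    if gap >= range_score:
        raise ValueError("gap exceeds the whole range's score")
    return 1.0 - math.sqrt((range_score - gap) / range_score)


for name, score in RANGE_SCORES.items():
    print(f"{name}: {side_cut_needed(score, GAP) * 100:.2f}% side reduction")
```

With these totals the required cut is roughly 22.7% for N=1-10, 6.1% for N=11-50 and 1.7% for N=51-200, against 1.24% spread over all N. Small N alone cannot plausibly close the gap; most of it has to come from the large-N range that holds most of the score.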

## What techniques have NOT been tried?

1. **Genetic algorithms with topology crossover** - exchange tree arrangements between solutions
2. **Strip packing to square conversion** - different optimization landscape
3. **Constraint programming** - exact methods for small N
4. **Learning from top LB solutions** - if any are shared
5. **Asymmetric configurations for specific N values** - not just N=22, N=24