# Loop 34 Analysis: Strategy Assessment

## Current Situation
- **Best LB**: 70.316492 (exp_022)
- **Current CV**: 70.315389 (not yet submitted)
- **Target**: 68.870074
- **Gap**: 1.445 points (2.06%)

## Key Findings from 35 Experiments
1. **13+ algorithms tested** - ALL found ZERO improvement over baseline
2. **Only ensemble worked** - combining external sources improved 70.615 → 70.316
3. **External sources exhausted** - our baseline is BETTER than all 297+ external files
4. **exp_007 mystery** - achieved 70.265 but had NaN in N=24 (invalid)
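
The ensemble step and the exp_007 NaN problem can be sketched together. This is a minimal illustration, assuming the ensemble keeps the lowest-scoring layout for each N across sources. That selection rule, the `sources` structure, and the helper names are assumptions for illustration, not the code used in the experiments. Overlap validation is deliberately omitted.

```python
import math

def is_finite_layout(trees):
    """True if every (x, y, deg) value in the layout is a finite number."""
    return all(math.isfinite(v) for tree in trees for v in tree)

def ensemble_best_per_n(sources):
    """Pick, for each N, the lowest-scoring layout that passes the finiteness check.

    `sources` maps a source name to {n: (score, trees)} (hypothetical structure).
    """
    best = {}
    for name, layouts in sources.items():
        for n, (score, trees) in layouts.items():
            if not math.isfinite(score) or not is_finite_layout(trees):
                continue  # skip invalid entries such as a layout containing NaN
            if n not in best or score < best[n][0]:
                best[n] = (score, name, trees)
    return best

# Toy data (made-up numbers, not real experiment results)
sources = {
    'a': {1: (0.70, [(0.0, 0.0, 0.0)]),
          2: (0.50, [(0.0, 0.0, 0.0), (1.0, 0.0, 180.0)])},
    'b': {1: (0.66, [(0.0, 0.0, 45.0)]),
          2: (0.40, [(float('nan'), 0.0, 0.0), (1.0, 0.0, 0.0)])},
}
best = ensemble_best_per_n(sources)
print({n: (score, name) for n, (score, name, _) in best.items()})
```

In the toy data, source `b` has the lower nominal score for N=2 but is rejected because one coordinate is NaN, which is the kind of check that would have flagged the exp_007 result before it was counted.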

In [None]:
import pandas as pd
import numpy as np
import json
import math

# Load current submission
df = pd.read_csv('/home/submission/submission.csv')

# Tree geometry
TX = np.array([0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125])
TY = np.array([0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5])

def get_tree_vertices(x, y, angle_deg):
    angle_rad = math.radians(angle_deg)
    cos_a = math.cos(angle_rad)
    sin_a = math.sin(angle_rad)
    vertices = []
    for tx, ty in zip(TX, TY):
        rx = tx * cos_a - ty * sin_a
        ry = tx * sin_a + ty * cos_a
        vertices.append((rx + x, ry + y))
    return vertices

def compute_bbox(trees):
    all_x, all_y = [], []
    for x, y, angle in trees:
        vertices = get_tree_vertices(x, y, angle)
        for vx, vy in vertices:
            all_x.append(vx)
            all_y.append(vy)
    return max(max(all_x) - min(all_x), max(all_y) - min(all_y))

def compute_score(trees, n):
    return compute_bbox(trees) ** 2 / n

def parse_val(v):
    # Values may carry an 's' marker; strip it before converting to float
    return float(str(v).replace('s', ''))

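# Compute per-N scores. IDs may be zero-padded ('001_...') or not ('1_...').
per_n_scores = {}
skipped = []  # N values whose row count does not equal N
for n in range(1, 201):
    n_df = df[df['id'].str.startswith(f'{n:03d}_')]
    if len(n_df) == 0:
        n_df = df[df['id'].str.startswith(f'{n}_')]
    if len(n_df) != n:
        skipped.append(n)
        continue
    trees = [(parse_val(row['x']), parse_val(row['y']), parse_val(row['deg'])) for _, row in n_df.iterrows()]
    per_n_scores[n] = compute_score(trees, n)

# A skipped N silently lowers the total, so report it explicitly
if skipped:
    print(f'WARNING: skipped N values with unexpected row counts: {skipped}')

total = sum(per_n_scores.values())
print(f'Total score: {total:.6f}')
print(f'Target: 68.870074')
print(f'Gap: {total - 68.870074:.6f} ({(total - 68.870074)/total*100:.2f}%)')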

In [None]:
# Analyze score distribution by N
import matplotlib.pyplot as plt

ns = list(per_n_scores.keys())
scores = list(per_n_scores.values())

plt.figure(figsize=(14, 5))
plt.subplot(1, 2, 1)
plt.bar(ns, scores, alpha=0.7)
plt.xlabel('N')
plt.ylabel('Score contribution')
plt.title('Score contribution by N')

# Cumulative contribution
cumsum = np.cumsum(scores)
plt.subplot(1, 2, 2)
plt.plot(ns, cumsum)
plt.xlabel('N')
plt.ylabel('Cumulative score')
plt.title('Cumulative score')
plt.axhline(y=68.870074, color='r', linestyle='--', label='Target')
plt.legend()
plt.tight_layout()
plt.savefig('/home/code/exploration/score_distribution.png')
plt.show()

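# scores is ordered by N, so these slices cover N=1-10 and N=1-50 (not the largest contributors)
print(f'\nFirst 10 N values (N=1-10) contribute: {sum(scores[:10]):.4f} ({sum(scores[:10])/total*100:.1f}% of total)')
print(f'First 50 N values (N=1-50) contribute: {sum(scores[:50]):.4f} ({sum(scores[:50])/total*100:.1f}% of total)')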
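
The two sums above slice `scores` in key order, so they cover the first values of N rather than the largest contributors. If a ranking by contribution is wanted, a small helper along these lines would do it. `top_contributors` and the toy numbers are hypothetical and only illustrate the idea.

```python
def top_contributors(per_n_scores, k):
    """Return the k (n, score) pairs with the largest per-N score."""
    ranked = sorted(per_n_scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Toy values for illustration only (not the real per-N scores)
toy_scores = {1: 0.66, 2: 0.45, 3: 0.43, 4: 0.42, 5: 0.44}
top3 = top_contributors(toy_scores, 3)
share = sum(score for _, score in top3) / sum(toy_scores.values())
print(top3, f'{share:.1%}')
```

With the real data, the call would be `top_contributors(per_n_scores, 10)`.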

In [None]:
# Check what improvements would be needed
gap = total - 68.870074

print('To close the gap of {:.4f} points:'.format(gap))
print()

# Option 1: Uniform improvement
print('Option 1: Uniform improvement across all N')
print(f'  Need {gap/200:.6f} improvement per N')
print(f'  That\'s {gap/200/0.35*100:.1f}% improvement per N (avg score ~0.35)')
print()

# Option 2: Focus on small N
print('Option 2: Focus on small N (N=1-20)')
small_n_total = sum(scores[:20])
print(f'  Small N (1-20) contribute: {small_n_total:.4f}')
print(f'  Need to reduce by: {gap/small_n_total*100:.1f}% to close gap')
print()

# Option 3: Find a few big improvements
print('Option 3: Find big improvements in specific N values')
print(f'  If we could improve N=1 by 50%: save {scores[0]*0.5:.4f}')
print(f'  If we could improve N=1-5 by 30%: save {sum(scores[:5])*0.3:.4f}')
print(f'  If we could improve N=1-10 by 20%: save {sum(scores[:10])*0.2:.4f}')

In [None]:
# Check N=1 specifically - this is the highest contributor
n1_df = df[df['id'].str.startswith('001_')]
if len(n1_df) == 0:
    n1_df = df[df['id'].str.startswith('1_')]

print(f'N=1 data:')
print(n1_df)

x1 = parse_val(n1_df.iloc[0]['x'])
y1 = parse_val(n1_df.iloc[0]['y'])
deg1 = parse_val(n1_df.iloc[0]['deg'])

print(f'\nPosition: ({x1:.6f}, {y1:.6f})')
print(f'Angle: {deg1:.6f}°')

# For N=1, the score is just bbox^2
# The tree has height 1.0 (from -0.2 to 0.8) and width 0.7 (from -0.35 to 0.35)
# Optimal rotation minimizes max(width, height) of the rotated tree

# Let's compute the bbox for different angles
angles = np.linspace(0, 360, 3601)
bboxes = []
for angle in angles:
    trees = [(0, 0, angle)]
    bbox = compute_bbox(trees)
    bboxes.append(bbox)

min_bbox = min(bboxes)
min_angle = angles[np.argmin(bboxes)]
print(f'\nOptimal angle for N=1: {min_angle:.2f}° with bbox {min_bbox:.6f}')
print(f'Optimal N=1 score: {min_bbox**2:.6f}')
print(f'Current N=1 score: {per_n_scores[1]:.6f}')
print(f'Potential improvement: {per_n_scores[1] - min_bbox**2:.6f}')

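## Key Insight: N=1 is Already Near-Optimal

The 0.1° sweep above shows that the submitted N=1 angle is at or very close to the optimum, so N=1 offers essentially no room for improvement. Because the score depends only on the larger side of the bounding box, rotating the tree by 90° swaps width and height and leaves the score unchanged, so the minimum recurs at several angles rather than at a single one.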
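
A quick way to double-check the N=1 result is to refine the sweep around its best angle and confirm the 90° periodicity. The sketch below repeats the tree outline so that it runs on its own. `bbox_side` is a hypothetical vectorized helper, and the two-stage grid (0.1° coarse, 0.0001° fine) is an arbitrary choice rather than the notebook's method.

```python
import numpy as np

# Tree outline, copied from the geometry cell
TX = np.array([0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075,
               -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125])
TY = np.array([0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2,
               -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5])

def bbox_side(angle_deg):
    """Larger side of the axis-aligned bounding box of one tree rotated by angle_deg."""
    a = np.radians(angle_deg)
    xs = TX * np.cos(a) - TY * np.sin(a)
    ys = TX * np.sin(a) + TY * np.cos(a)
    return max(xs.max() - xs.min(), ys.max() - ys.min())

# Coarse sweep over one 90° period, then a fine sweep around the coarse optimum
coarse = np.linspace(0.0, 90.0, 901)
best_coarse = coarse[np.argmin([bbox_side(a) for a in coarse])]
fine = np.linspace(best_coarse - 0.1, best_coarse + 0.1, 2001)
sides = np.array([bbox_side(a) for a in fine])
best_angle, best_side = fine[np.argmin(sides)], sides.min()

print(f'angle {best_angle:.4f} deg, side {best_side:.6f}, score {best_side**2:.6f}')
# Periodicity check: a 90° rotation swaps width and height
print(np.isclose(bbox_side(best_angle), bbox_side(best_angle + 90.0)))
```

If the refined score matches `per_n_scores[1]` to the printed precision, N=1 can be dropped from further optimization.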

## Strategy Assessment

After 35 experiments:
1. **All 13+ algorithms failed to improve on the baseline** - SA, GA, B&B, NFP, jostle, etc.
2. **Ensemble was the only success** - but external sources are exhausted
3. **The baseline is at a PUBLIC KERNEL CEILING** - ~70.316

## What Would It Take to Reach 68.87?

The gap is 1.445 points (2.06%). Closing it would require one of:
- ~0.007 improvement per N on average (1.445 / 200)
- Significant improvements on high-contribution N values
- Access to private solutions that are not publicly available
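
The same gap can be read in terms of bounding-box size. Each per-N score is side² / N, so under the simplifying assumption that every N improves by the same relative amount, the required shrink in side length follows from a square root. The uniform-improvement assumption and the variable names are illustrative only.

```python
import math

current, target, n_values = 70.315389, 68.870074, 200

gap = current - target
per_n = gap / n_values                      # average score reduction needed per N
score_ratio = target / current              # uniform scaling applied to every per-N score
side_shrink = 1.0 - math.sqrt(score_ratio)  # score ~ side**2, so the side scales by the square root

print(f'gap {gap:.6f}, per-N {per_n:.6f}, side shrink {side_shrink:.2%}')
```

Under that assumption, every bounding square would need to shrink by roughly 1% in side length, about half of the 2.06% score gap.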

## Remaining Options

1. **Submit current solution** - verify 70.315389 on Kaggle
2. **Extended C++ optimization** - run bbox3 for hours/days
3. **Novel algorithms** - but 13+ have already failed
4. **Private solutions** - not accessible

In [None]:
# Final summary
print('='*60)
print('LOOP 34 ANALYSIS SUMMARY')
print('='*60)
print(f'Current best CV: 70.315389')
print(f'Current best LB: 70.316492 (exp_022)')
print(f'Target: 68.870074')
print(f'Gap: 1.445 points (2.06%)')
print()
print('EXPERIMENTS COMPLETED: 35')
print('- Binary optimizers: 4 (bbox3, sa_fast)')
print('- Ensemble approaches: 10+')
print('- Novel algorithms: 13+ (SA, GA, B&B, NFP, lattice, tessellation, etc.)')
print()
print('WHAT WORKED:')
print('- Ensemble of external sources: 70.615 -> 70.316 (0.30 improvement)')
print()
print('WHAT FAILED:')
print('- ALL 13+ algorithms (SA, GA, B&B, NFP, etc.) found ZERO improvement')
print('- External sources exhausted - our baseline is better than all 297+ files')
print()
print('REMAINING OPTIONS:')
print('1. Submit current solution (70.315389) to verify on Kaggle')
print('2. Extended C++ optimization (hours/days with bbox3)')
print('3. Novel algorithms (but 13+ have already failed)')
print()
print('REALITY CHECK:')
print('- Top public kernel: ~70.3')
print('- Top LB (Rafbill): 69.99')
print('- Target: 68.87')
print('- Gap to 1st place: 0.33 points (0.5%)')
print('- Gap to target: 1.45 points (2.1%)')
print()
print('The target likely requires resources or solutions we don\'t have access to.')