# Loop 4 LB Feedback Analysis

## Submission Results
- **CV Score**: 70.6761
- **LB Score**: 70.6761 (perfect match!)
- **Target**: 68.922808
- **Gap**: 1.75 points (2.5% reduction needed)

## Key Observations
1. CV = LB exactly - this is a deterministic optimization problem with no distribution shift
2. Current best is 70.68, target is 68.92 - need 1.75 point improvement
3. The target of 68.92 is lower (lower is better) than the current #1 on the public leaderboard (71.19)

In [1]:
import pandas as pd
import numpy as np
import os

# Check what sources we have
sources = [
    '/home/code/preoptimized_submission.csv',
    '/home/code/datasets/santa-2025.csv',
    '/home/code/datasets/71.97.csv',
    '/home/code/datasets/72.49.csv',
    '/home/code/datasets/submission.csv',
    '/home/code/datasets/jazivxt_output/submission.csv',
    '/home/code/datasets/eazy_output/submission.csv',
    '/home/code/datasets/ashraful_output/submission.csv',
    '/home/code/datasets/bucket-of-chump/submission.csv',
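    # Note: the dataset listing in the next cell shows chistyakov_packed/ holds
    # submission_best.csv, not submission.csv, so this entry evaluates to None
    # and is silently skipped by the loop below.
    '/home/code/datasets/chistyakov_packed/submission.csv' if os.path.exists('/home/code/datasets/chistyakov_packed/submission.csv') else None,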
    '/home/code/datasets/egortrushin_output/submission.csv' if os.path.exists('/home/code/datasets/egortrushin_output/submission.csv') else None,
]

print('Available sources:')
for src in sources:
    if src and os.path.exists(src):
        print(f'  ✓ {src}')
    elif src:
        print(f'  ✗ {src} (not found)')

Available sources:
  ✓ /home/code/preoptimized_submission.csv
  ✓ /home/code/datasets/santa-2025.csv
  ✓ /home/code/datasets/71.97.csv
  ✓ /home/code/datasets/72.49.csv
  ✓ /home/code/datasets/submission.csv
  ✓ /home/code/datasets/jazivxt_output/submission.csv
  ✓ /home/code/datasets/eazy_output/submission.csv
  ✓ /home/code/datasets/ashraful_output/submission.csv
  ✓ /home/code/datasets/bucket-of-chump/submission.csv
  ✓ /home/code/datasets/egortrushin_output/submission.csv


In [2]:
# Check for additional datasets we might have missed
import glob

print('\nAll CSV files in datasets:')
for f in glob.glob('/home/code/datasets/**/*.csv', recursive=True):
    print(f'  {f}')


All CSV files in datasets:
  /home/code/datasets/submission.csv
  /home/code/datasets/santa-2025.csv
  /home/code/datasets/72.49.csv
  /home/code/datasets/71.97.csv
  /home/code/datasets/chistyakov_kernel_output/submission.csv
  /home/code/datasets/chistyakov_output/submission.csv
  /home/code/datasets/telegram/72.49.csv
  /home/code/datasets/telegram/71.97.csv
  /home/code/datasets/jazivxt_output/submission.csv
  /home/code/datasets/bucket-of-chump/submission.csv
  /home/code/datasets/ashraful_output/submission.csv
  /home/code/datasets/saspav/santa-2025.csv
  /home/code/datasets/saspav_dataset/santa-2025.csv
  /home/code/datasets/chistyakov/submission_best.csv
  /home/code/datasets/chistyakov_packed/submission_best.csv
  /home/code/datasets/eazy_output/submission.csv
  /home/code/datasets/egortrushin_output/submission.csv
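
Several paths in this listing share a file name (for example `71.97.csv`, `72.49.csv` and `santa-2025.csv` each appear in two directories), which suggests some of them may be copies. A minimal sketch for confirming that by content hash follows; `group_by_content` is a hypothetical helper, not part of the notebook, and the demo files are throwaway temporaries.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def group_by_content(paths):
    """Map SHA-256 digest -> list of paths whose bytes are identical."""
    groups = defaultdict(list)
    for p in paths:
        digest = hashlib.sha256(Path(p).read_bytes()).hexdigest()
        groups[digest].append(str(p))
    return dict(groups)


# Small demo on temporary files: a.csv and b.csv are identical, c.csv differs.
with tempfile.TemporaryDirectory() as d:
    a = Path(d, 'a.csv'); a.write_text('id,x\n')
    b = Path(d, 'b.csv'); b.write_text('id,x\n')
    c = Path(d, 'c.csv'); c.write_text('id,y\n')
    demo_groups = group_by_content([a, b, c])

# Keep only digests shared by more than one file.
duplicate_sets = [g for g in demo_groups.values() if len(g) > 1]
print(len(demo_groups), 'distinct contents,', len(duplicate_sets), 'duplicate set')
```

In the notebook this could be called on the result of the `glob.glob(...)` call above, so that only one file per distinct content needs to be scored.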


In [3]:
# Analyze the current submission to find worst N values
import math
from numba import njit

@njit
def make_polygon_template():
    tw=0.15; th=0.2; bw=0.7; mw=0.4; ow=0.25
    tip=0.8; t1=0.5; t2=0.25; base=0.0; tbot=-th
    x=np.array([0,ow/2,ow/4,mw/2,mw/4,bw/2,tw/2,tw/2,-tw/2,-tw/2,-bw/2,-mw/4,-mw/2,-ow/4,-ow/2],np.float64)
    y=np.array([tip,t1,t1,t2,t2,base,base,tbot,tbot,base,base,t2,t2,t1,t1],np.float64)
    return x,y

@njit
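def score_group(xs, ys, degs, tx, ty):
    """Return side**2 / n for the axis-aligned bounding square of n placed trees.

    xs, ys: tree positions; degs: rotations in degrees; tx, ty: template vertices.
    """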
    n = xs.size
    V = tx.size
    mnx = 1e300; mny = 1e300; mxx = -1e300; mxy = -1e300
    for i in range(n):
        r = degs[i] * math.pi / 180.0
        c = math.cos(r); s = math.sin(r)
        xi = xs[i]; yi = ys[i]
        for j in range(V):
            X = c * tx[j] - s * ty[j] + xi
            Y = s * tx[j] + c * ty[j] + yi
            if X < mnx: mnx = X
            if X > mxx: mxx = X
            if Y < mny: mny = Y
            if Y > mxy: mxy = Y
    side = max(mxx - mnx, mxy - mny)
    return side * side / n

def strip(val):
    # Remove the 's' marker from a submission value and parse the rest as a float
    return float(str(val).replace('s', ''))

tx, ty = make_polygon_template()

# Load current submission
df = pd.read_csv('/home/submission/submission.csv')

# Calculate score per N
scores = {}
for n in range(1, 201):
    group = df[df['id'].str.startswith(f'{n:03d}_')]
    xs = np.array([strip(x) for x in group['x']], dtype=np.float64)
    ys = np.array([strip(y) for y in group['y']], dtype=np.float64)
    degs = np.array([strip(d) for d in group['deg']], dtype=np.float64)
    scores[n] = score_group(xs, ys, degs, tx, ty)

print(f'Total score: {sum(scores.values()):.6f}')
print(f'Target: 68.922808')
print(f'Gap: {sum(scores.values()) - 68.922808:.6f}')

Total score: 70.676145
Target: 68.922808
Gap: 1.753337
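
Since every later number depends on `score_group`, a quick independent cross-check can be useful. The sketch below recomputes the same quantity with plain NumPy broadcasting instead of numba; `score_group_np`, `TX` and `TY` are names introduced here for illustration, with the vertex values copied from `make_polygon_template`.

```python
import numpy as np

# Tree outline, evaluated from make_polygon_template (tw=0.15, th=0.2, bw=0.7, mw=0.4, ow=0.25).
TX = np.array([0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075,
               -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125])
TY = np.array([0.8, 0.5, 0.5, 0.25, 0.25, 0.0, 0.0, -0.2,
               -0.2, 0.0, 0.0, 0.25, 0.25, 0.5, 0.5])


def score_group_np(xs, ys, degs):
    """Bounding-square area per tree, same definition as score_group."""
    r = np.deg2rad(np.asarray(degs, dtype=np.float64))[:, None]  # shape (n, 1)
    c, s = np.cos(r), np.sin(r)
    # Rotate every template vertex by each tree's angle, then translate.
    X = c * TX - s * TY + np.asarray(xs, dtype=np.float64)[:, None]
    Y = s * TX + c * TY + np.asarray(ys, dtype=np.float64)[:, None]
    side = max(X.max() - X.min(), Y.max() - Y.min())
    return side * side / len(degs)


upright = score_group_np([0.0], [0.0], [0.0])    # unrotated tree: 0.7 wide, 1.0 tall
diagonal = score_group_np([0.0], [0.0], [45.0])  # same tree rotated by 45 degrees
print(f'{upright:.6f} {diagonal:.6f}')
```

A single unrotated tree scores 1.0, and rotating it by 45 degrees gives 0.661250, which matches the N=1 score reported by this notebook.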


In [4]:
# Find worst N values (highest contribution to score)
sorted_scores = sorted(scores.items(), key=lambda x: -x[1])

print('\nTop 20 worst N values (highest score contribution):')
print('N\tScore\t\tCumulative')
cumulative = 0
for n, sc in sorted_scores[:20]:
    cumulative += sc
    print(f'{n}\t{sc:.6f}\t{cumulative:.6f}')

print(f'\nTop 20 contribute: {cumulative:.6f} ({cumulative/sum(scores.values())*100:.1f}% of total)')


Top 20 worst N values (highest score contribution):
N	Score		Cumulative
1	0.661250	0.661250
2	0.450779	1.112029
3	0.434745	1.546774
5	0.416850	1.963624
4	0.416545	2.380169
7	0.399897	2.780065
6	0.399610	3.179676
9	0.387415	3.567091
8	0.385407	3.952498
15	0.379203	4.331701
10	0.376630	4.708331
21	0.376451	5.084782
20	0.376057	5.460839
11	0.375736	5.836575
22	0.375258	6.211833
16	0.374128	6.585961
26	0.373997	6.959958
12	0.372724	7.332682
13	0.372323	7.705005
25	0.372144	8.077149

Top 20 contribute: 8.077149 (11.4% of total)
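
Read against `score_group`, each row of this table is the area of the bounding square divided by the number of trees, so the values are directly comparable across N:

$$
\text{score}(N) = \frac{s_N^{2}}{N}, \qquad
s_N = \max\bigl(x_{\max} - x_{\min},\; y_{\max} - y_{\min}\bigr), \qquad
\text{total} = \sum_{N=1}^{200} \text{score}(N)
$$

Here the extremes are taken over all vertices of all N placed trees. The average per-N score is 70.676145 / 200 ≈ 0.3534, so N=1 (0.661250) sits at roughly 1.9 times the average, while the 20th-ranked entry (0.372144) is only about 5% above it.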


In [5]:
# Quantify the gap to the target and what improving only the top 20 N values would achieve
target = 68.922808
current = sum(scores.values())
gap = current - target

print(f'\nImprovement Analysis:')
print(f'Current score: {current:.6f}')
print(f'Target score: {target:.6f}')
print(f'Gap to close: {gap:.6f}')
print(f'Percentage reduction needed: {gap/current*100:.2f}%')

# If we could improve top 20 N values by X%, what would we need?
print(f'\nIf we improve top 20 worst N values:')
top20_total = sum([sc for n, sc in sorted_scores[:20]])
for pct in [5, 10, 15, 20, 25, 30]:
    improvement = top20_total * pct / 100
    new_score = current - improvement
    print(f'  {pct}% improvement: {new_score:.6f} (gap: {new_score - target:.6f})')


Improvement Analysis:
Current score: 70.676145
Target score: 68.922808
Gap to close: 1.753337
Percentage reduction needed: 2.48%

If we improve top 20 worst N values:
  5% improvement: 70.272287 (gap: 1.349479)
  10% improvement: 69.868430 (gap: 0.945622)
  15% improvement: 69.464573 (gap: 0.541765)
  20% improvement: 69.060715 (gap: 0.137907)
  25% improvement: 68.656858 (gap: -0.265950)
  30% improvement: 68.253000 (gap: -0.669808)
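
The sweep above brackets the break-even point between 20% and 25%. It can be computed directly, as in the sketch below, which reuses the numbers printed in this notebook. The conversion to side length assumes the per-N score is `side**2 / n`, as in `score_group`; the variable names `break_even`, `uniform` and `side_shrink` are introduced here.

```python
import math

# Values copied from the notebook output.
current = 70.676145      # total score of the current submission
target = 68.922808       # target score
top20_total = 8.077149   # summed score of the 20 highest-scoring N values

gap = current - target

# Fraction by which the top-20 subtotal must shrink to close the gap on its own.
break_even = gap / top20_total

# Equivalent uniform reduction if the improvement is spread over all 200 groups.
uniform = gap / current

# Score scales with side**2, so a score reduction f needs a side reduction of 1 - sqrt(1 - f).
side_shrink = 1 - math.sqrt(1 - break_even)

print(f'Break-even reduction on top 20: {break_even * 100:.1f}%')
print(f'Uniform reduction over all N:   {uniform * 100:.2f}%')
print(f'Side-length shrink for top 20:  {side_shrink * 100:.1f}%')
```

This gives about 21.7% for the top 20 alone (roughly an 11.5% shorter bounding-square side for each of those groups), against 2.48% if the improvement is spread evenly over every N.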


## Strategy Analysis

### Key Insights:
1. **CV = LB exactly** - No distribution shift, pure optimization problem
2. **Target is below current #1** - Need novel techniques, not just ensembling public sources
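3. **Small N values have the highest per-N scores** - The 20 largest contributions all come from N ≤ 26, but together they are only 11.4% of the total (a uniform share would be 10%), so closing the gap from these groups alone would take roughly a 22% reduction in their scores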

### Approaches to Try:
1. **Sparrow optimizer** - Rust-based strip packing for specific N values
2. **Extended C++ optimization** - Run bbox3 for hours with high parameters
3. **Crystalline packing for large N** - Mathematical lattice structures for N > 58
4. **Manual optimization** - Interactive editor for small N values
5. **Different initial configurations** - Start from scratch with different seeds
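
Whichever of these approaches produces new candidates, the total is a sum of independent per-N terms, so results can be merged by keeping the best configuration for each N. The sketch below shows only that selection step; `merge_best` and the run labels are hypothetical, the numbers are made up for illustration, and it assumes every candidate is already a valid (overlap-free) packing, which is not checked here.

```python
def merge_best(score_tables):
    """Pick, for each N, the source with the lowest score.

    score_tables: dict mapping a source label to a dict {N: score}.
    Returns (choice, total), where choice maps N -> (label, score).
    """
    choice = {}
    for label, table in score_tables.items():
        for n, sc in table.items():
            # Lower is better; keep the first source on ties.
            if n not in choice or sc < choice[n][1]:
                choice[n] = (label, sc)
    total = sum(sc for _, sc in choice.values())
    return choice, total


# Toy illustration with made-up numbers (not real results).
tables = {
    'run_a': {1: 0.70, 2: 0.45, 3: 0.44},
    'run_b': {1: 0.66, 2: 0.47, 3: 0.43},
}
choice, total = merge_best(tables)
print(choice)
print(f'{total:.2f}')
```

In the notebook, each inner dict could be filled the same way as `scores` in cell 3, once per candidate file.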