# Loop 2 LB Feedback Analysis

## Submission Results
- **CV Score**: 88.33
- **LB Score**: 88.33
- **Gap**: 0.00 (perfect match!)

## Key Insight
The CV-LB gap is zero at the reported precision (88.33 on both), so the local scorer reproduces the leaderboard metric. This is expected for an optimization problem, where the score is a deterministic function of the submitted coordinates.
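
For reference, the metric as implemented by `score_group` in the cells below can be written as follows. Here $s_n$ is the side of the smallest axis-aligned square enclosing every tree vertex of the $n$-tree configuration. The symbols $S$ and $s_n$ are notation introduced for this note, and the official definition is only assumed to match because the local and leaderboard scores agree.

$$
S = \sum_{n=1}^{200} \frac{s_n^2}{n},
\qquad
s_n = \max\left(x_{\max} - x_{\min},\; y_{\max} - y_{\min}\right)
$$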

In [1]:
import pandas as pd
import numpy as np
import math

# Load our current submission
submission = pd.read_csv('/home/code/experiments/002_grid_placement/submission.csv')
print(f"Submission shape: {submission.shape}")
print(submission.head())

Submission shape: (20100, 4)
      id      x     y     deg
0  001_0   s0.0  s0.0    s0.0
1  002_0   s0.0  s0.0    s0.0
2  002_1  s0.35  s0.8  s180.0
3  003_0   s0.0  s0.0    s0.0
4  003_1   s0.7  s0.0    s0.0


In [2]:
# Calculate score breakdown by N
def strip(a):
    """Drop the 's' prefix from submission values and convert to float64."""
    return np.array([float(str(v).replace('s','')) for v in a], np.float64)

def score_group(xs, ys, degs):
    """Calculate side^2/n for a group of trees."""
    # Tree template
    tw=0.15; th=0.2; bw=0.7; mw=0.4; ow=0.25
    tip=0.8; t1=0.5; t2=0.25; base=0.0; tbot=-th
    tx = np.array([0,ow/2,ow/4,mw/2,mw/4,bw/2,tw/2,tw/2,-tw/2,-tw/2,-bw/2,-mw/4,-mw/2,-ow/4,-ow/2])
    ty = np.array([tip,t1,t1,t2,t2,base,base,tbot,tbot,base,base,t2,t2,t1,t1])
    
    n = len(xs)
    mnx = mny = 1e300
    mxx = mxy = -1e300
    
    for i in range(n):
        r = degs[i] * math.pi / 180.0
        c, s = math.cos(r), math.sin(r)
        for j in range(len(tx)):
            X = c*tx[j] - s*ty[j] + xs[i]
            Y = s*tx[j] + c*ty[j] + ys[i]
            mnx = min(mnx, X)
            mxx = max(mxx, X)
            mny = min(mny, Y)
            mxy = max(mxy, Y)
    
    side = max(mxx - mnx, mxy - mny)
    return side * side / n

# Calculate score for each N
submission['N'] = submission['id'].str.split('_').str[0].astype(int)
scores = []
for n, g in submission.groupby('N'):
    xs = strip(g['x'].values)
    ys = strip(g['y'].values)
    ds = strip(g['deg'].values)
    sc = score_group(xs, ys, ds)
    scores.append({'N': n, 'score': sc, 'side': math.sqrt(sc * n)})

scores_df = pd.DataFrame(scores)
print(f"Total score: {scores_df['score'].sum():.6f}")
print(f"Target: 68.947559")
print(f"Gap: {scores_df['score'].sum() - 68.947559:.6f}")

Total score: 88.329998
Target: 68.947559
Gap: 19.382439


In [3]:
# Analyze which N values contribute most to the gap
# Baseline for comparison: the target score spread evenly over all 200 values of N
avg_target = 68.947559 / 200
scores_df['gap_from_avg'] = scores_df['score'] - avg_target
scores_df['contribution_pct'] = scores_df['score'] / scores_df['score'].sum() * 100

print("Top 20 N values by score contribution:")
print(scores_df.nlargest(20, 'score')[['N', 'score', 'side', 'contribution_pct']])

Top 20 N values by score contribution:
     N     score  side  contribution_pct
0    1  1.000000  1.00          1.132118
4    5  0.800000  2.00          0.905695
3    4  0.765625  1.75          0.866778
1    2  0.720000  1.20          0.815125
5    6  0.666667  2.00          0.754745
2    3  0.653333  1.40          0.739651
6    7  0.630000  2.10          0.713234
12  13  0.603077  2.80          0.682754
14  15  0.600000  3.00          0.679271
15  16  0.562500  3.00          0.636817
13  14  0.560000  2.80          0.633986
7    8  0.551250  2.10          0.624080
10  11  0.545682  2.45          0.617776
8    9  0.537778  2.20          0.608828
16  17  0.529412  3.00          0.599357
27  28  0.529375  3.85          0.599315
18  19  0.522237  3.15          0.591234
30  31  0.516129  4.00          0.584319
28  29  0.511121  3.85          0.578649
11  12  0.500208  2.45          0.566295


In [4]:
# Each N contributes side^2 / n, and the largest per-N scores occur at small N
# Let's see the score breakdown by N ranges
ranges = [(1, 10), (11, 50), (51, 100), (101, 150), (151, 200)]
for start, end in ranges:
    mask = (scores_df['N'] >= start) & (scores_df['N'] <= end)
    range_score = scores_df[mask]['score'].sum()
    range_pct = range_score / scores_df['score'].sum() * 100
    print(f"N={start:3d}-{end:3d}: score={range_score:.4f} ({range_pct:.1f}%)")

print(f"\nTotal: {scores_df['score'].sum():.4f}")

N=  1- 10: score=6.8087 (7.7%)
N= 11- 50: score=19.3601 (21.9%)
N= 51-100: score=21.4551 (24.3%)
N=101-150: score=20.5883 (23.3%)
N=151-200: score=20.1178 (22.8%)

Total: 88.3300


In [5]:
# Check N=1 specifically - a 45 degree rotation should give a smaller bounding square than 0 degrees
n1_row = submission[submission['N'] == 1]
print("N=1 configuration:")
print(n1_row)

# Crude estimate of the N=1 side at 45 degrees (the exact value is computed further down)
# Tree dimensions: width=0.7, height=1.0 (from -0.2 to 0.8)
# At 0 degrees: side = max(0.7, 1.0) = 1.0
# At 45 degrees: sqrt(0.7^2 + 1.0^2) / sqrt(2) ≈ 0.86
# The exact side from score_group is sqrt(0.66125) ≈ 0.8132, so this estimate is too high

print(f"\nN=1 at 0°: side = 1.0, score = 1.0")
print(f"N=1 at 45°: side ≈ {math.sqrt(0.7**2 + 1.0**2) / math.sqrt(2):.4f}")

# Actually calculate it properly
xs = np.array([0.0])
ys = np.array([0.0])
ds_0 = np.array([0.0])
ds_45 = np.array([45.0])

score_0 = score_group(xs, ys, ds_0)
score_45 = score_group(xs, ys, ds_45)
print(f"\nActual N=1 score at 0°: {score_0:.6f}")
print(f"Actual N=1 score at 45°: {score_45:.6f}")
print(f"Improvement from 45°: {score_0 - score_45:.6f}")

N=1 configuration:
      id     x     y   deg  N
0  001_0  s0.0  s0.0  s0.0  1

N=1 at 0°: side = 1.0, score = 1.0
N=1 at 45°: side ≈ 0.8631

Actual N=1 score at 0°: 1.000000
Actual N=1 score at 45°: 0.661250
Improvement from 45°: 0.338750


In [6]:
# Key insight: We need to improve by 19.38 points
# Let's estimate what percentage improvement we need
current_score = 88.33
target_score = 68.947559
improvement_needed = current_score - target_score
improvement_pct = improvement_needed / current_score * 100

print(f"Current score: {current_score:.2f}")
print(f"Target score: {target_score:.2f}")
print(f"Improvement needed: {improvement_needed:.2f} points")
print(f"Improvement needed: {improvement_pct:.1f}%")

# If we reduce each N's side by a fraction X, each score shrinks by 1 - (1 - X)^2,
# which is about 2X for small X (since score = side^2/n)
side_reduction_needed = 1 - math.sqrt(target_score / current_score)
print(f"\nAverage side length reduction needed: {side_reduction_needed*100:.1f}%")

Current score: 88.33
Target score: 68.95
Improvement needed: 19.38 points
Improvement needed: 21.9%

Average side length reduction needed: 11.7%


## Analysis Summary

### Key Findings:
1. **CV-LB gap is 0** - The local scorer matches the leaderboard (88.33 on both)
2. **N=1 quick win** - Changing from 0° to 45° saves ~0.34 points (1.000000 → 0.661250)
3. **Small N values score worst per N** - The seven largest per-N scores all fall in N=1-10, but that range is only 7.7% of the total; the remaining 92.3% is spread over N=11-200
4. **Need ~22% improvement** - From 88.33 to 68.95 (19.38 points)
5. **Side reduction ~12%** - Need to reduce average side length by ~11.7%
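
The N=1 finding only compares 0° and 45°. A minimal sketch of a finer angle sweep is shown here, assuming the same tree outline as `score_group`. The helper name `single_tree_score` and the 0.1° step are hypothetical choices made for illustration, not part of the original notebook.

```python
import numpy as np

# Tree outline, copied from the template in score_group (15 vertices).
TX = np.array([0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075,
               -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125])
TY = np.array([0.8, 0.5, 0.5, 0.25, 0.25, 0.0, 0.0, -0.2,
               -0.2, 0.0, 0.0, 0.25, 0.25, 0.5, 0.5])

def single_tree_score(deg):
    """side^2 / n for one tree (n = 1) at the origin, rotated by deg degrees."""
    t = np.deg2rad(deg)
    X = np.cos(t) * TX - np.sin(t) * TY
    Y = np.sin(t) * TX + np.cos(t) * TY
    side = max(X.max() - X.min(), Y.max() - Y.min())
    return side * side

# Assumed sweep resolution: 0.1 degree steps over half a turn.
angles = np.arange(0.0, 180.0, 0.1)
sweep = np.array([single_tree_score(a) for a in angles])
best_angle = angles[sweep.argmin()]

print(f"0 deg:  {single_tree_score(0.0):.6f}")
print(f"45 deg: {single_tree_score(45.0):.6f}")
print(f"best in sweep: {sweep.min():.6f} at {best_angle:.1f} deg")
```

The first two printed values should reproduce the 1.000000 and 0.661250 reported in cell [5]. The last line shows whether any other angle on this grid does better than 45°.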

### Recommended Next Steps:
1. **Fix N=1** - Use 45° rotation (quick win: ~0.34 points)
2. **Apply rotation tightening** - Rotate each entire config to minimize its bbox (estimated 5-10% improvement, still to be measured)
3. **Local search** - Move trees toward center in small steps
4. **Fractional translation** - Fine-tune with 0.001, 0.0005, etc. steps
5. **Consider C++ implementation** - For faster iteration
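
Rotation tightening (step 2) can be sketched as below. This is an illustrative version, not a tuned implementation: the function names and the 0.5° step are assumptions. The whole configuration is rotated rigidly about the origin, so any pair of trees that did not overlap before still does not overlap afterwards, and no collision check is needed for this step.

```python
import numpy as np

# Tree outline, copied from the template in score_group (15 vertices).
TX = np.array([0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075,
               -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125])
TY = np.array([0.8, 0.5, 0.5, 0.25, 0.25, 0.0, 0.0, -0.2,
               -0.2, 0.0, 0.0, 0.25, 0.25, 0.5, 0.5])

def group_vertices(xs, ys, degs):
    """Return an (n*15, 2) array of all tree vertices in world coordinates."""
    r = np.deg2rad(np.asarray(degs, dtype=float))[:, None]
    c, s = np.cos(r), np.sin(r)
    X = c * TX - s * TY + np.asarray(xs, dtype=float)[:, None]
    Y = s * TX + c * TY + np.asarray(ys, dtype=float)[:, None]
    return np.column_stack([X.ravel(), Y.ravel()])

def bbox_side(pts):
    """Side of the smallest axis-aligned square containing pts."""
    return float((pts.max(axis=0) - pts.min(axis=0)).max())

def tighten_rotation(xs, ys, degs, step_deg=0.5):
    """Try rigid rotations of the whole group and keep the one with the smallest side."""
    xs, ys, degs = (np.asarray(v, dtype=float) for v in (xs, ys, degs))
    pts = group_vertices(xs, ys, degs)
    best_angle, best_side = 0.0, bbox_side(pts)
    # The bounding-square side repeats every 90 degrees, so [0, 90) is enough.
    for a in np.arange(0.0, 90.0, step_deg):
        t = np.deg2rad(a)
        rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
        side = bbox_side(pts @ rot.T)
        if side < best_side - 1e-12:
            best_angle, best_side = a, side
    # Apply the winning rotation to tree centers and add it to each tree's angle.
    t = np.deg2rad(best_angle)
    c, s = np.cos(t), np.sin(t)
    return c * xs - s * ys, s * xs + c * ys, (degs + best_angle) % 360.0, best_side

# Usage on the N=2 rows shown in the submission head (side 1.20 before tightening).
new_xs, new_ys, new_degs, new_side = tighten_rotation([0.0, 0.35], [0.0, 0.8], [0.0, 180.0])
print(f"N=2 side after tightening: {new_side:.4f}, score: {new_side**2 / 2:.6f}")
```

A grid sweep is the simplest option. A one-dimensional minimizer over the angle would likely find slightly better values, at the cost of dealing with several local minima.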