# Loop 3 Analysis: LB Feedback and Strategy

## Key Results
- exp_002 (valid baseline) scored **70.647327** on the LB (70.647326897636), matching its CV to every printed digit
- CV-LB gap: **0.0000**, so local scoring reproduces the LB exactly
- Target: **68.888293**
- Gap to target: **1.759 points (the score must fall by ~2.5%)**

## What We've Learned
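1. The original valid baseline from snapshot 21328309254 is accepted by Kaggle (no overlap errors)
2. Local score = LB score exactly, so the LB appears to be a deterministic function of the submitted coordinates (no hidden test set, no distribution shift)
3. The baseline sits at a strong local optimum: bbox3/sa_fast (exp_001) gained only 0.000006, and that submission was rejected for overlapping trees in group 027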

In [1]:
import pandas as pd
import numpy as np
import json

# Load session state
with open('/home/code/session_state.json', 'r') as f:
    state = json.load(f)

print("=== EXPERIMENTS ===")
for exp in state['experiments']:
    print(f"{exp['id']}: {exp['name']} - CV: {exp['cv_score']:.6f}")

print("\n=== SUBMISSIONS ===")
for sub in state['submissions']:
    lb = sub.get('lb_score', 'N/A')
    error = sub.get('error', None)
    print(f"{sub['experiment_id']}: CV={sub['cv_score']:.6f}, LB={lb}, Error={error}")

=== EXPERIMENTS ===
exp_000: 000_baseline_preoptimized - CV: 70.676102
exp_001: 001_bbox3_sa_optimization - CV: 70.647321
exp_002: 002_valid_baseline_original - CV: 70.647327

=== SUBMISSIONS ===
exp_000: CV=70.676102, LB=, Error=Overlapping trees in group 126
exp_001: CV=70.647321, LB=, Error=Overlapping trees in group 027
exp_002: CV=70.647327, LB=70.647326897636, Error=None


## Key Insights from Research

### From saspav kernel (fix_direction):
- **Rotation tightening**: rigidly rotate each N's entire configuration, using `ConvexHull` + `minimize_scalar` to find the best angle
- Objective: minimize max(width, height) of the axis-aligned bounding box
- Pure Python, no compiled binaries needed
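A minimal sketch of the angle search, assuming the configuration is given as an (m, 2) array of all tree vertices. It uses scipy's `ConvexHull` and `minimize_scalar` as the kernel is described, but it is not the saspav code: the helper names `bbox_side` and `best_rotation`, and the coarse grid scan before the bounded refinement, are our own assumptions (max(width, height) is not unimodal over [0°, 90°], so a bounded scalar search alone can stall in a local minimum).

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.spatial import ConvexHull


def bbox_side(points: np.ndarray, angle_deg: float) -> float:
    """max(width, height) of the axis-aligned box after rotating points CCW by angle_deg."""
    t = np.radians(angle_deg)
    c, s = np.cos(t), np.sin(t)
    x = points[:, 0] * c - points[:, 1] * s
    y = points[:, 0] * s + points[:, 1] * c
    return float(max(x.max() - x.min(), y.max() - y.min()))


def best_rotation(points: np.ndarray, grid_step: float = 0.5) -> tuple:
    """Hypothetical helper: return (angle_deg, side) minimizing the bounding-square side.

    Only convex-hull vertices can touch the bounding box, so the search uses them.
    Rotating by 90 degrees only swaps width and height, so [0, 90] covers all cases.
    """
    hull_pts = points[ConvexHull(points).vertices]
    # Coarse scan first: the objective can have several local minima on [0, 90].
    grid = np.linspace(0.0, 90.0, int(round(90.0 / grid_step)) + 1)
    sides = np.array([bbox_side(hull_pts, a) for a in grid])
    i = int(sides.argmin())
    best_angle, best_side = float(grid[i]), float(sides[i])
    # Refine within the neighbouring grid cells using a bounded scalar search.
    res = minimize_scalar(
        lambda a: bbox_side(hull_pts, a),
        bounds=(max(best_angle - grid_step, 0.0), min(best_angle + grid_step, 90.0)),
        method="bounded",
    )
    if res.fun < best_side:
        best_angle, best_side = float(res.x), float(res.fun)
    return best_angle, best_side


# Usage: a 2 x 1 rectangle tilted by 30 degrees. Rotating it a further 60 degrees
# makes it axis-aligned again, with a bounding square of side 2.
corners = np.array([(-1.0, -0.5), (1.0, -0.5), (1.0, 0.5), (-1.0, 0.5)])
t = np.radians(30.0)
tilted = corners @ np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]]).T
angle, side = best_rotation(tilted)
print(f"rotate by {angle:.3f} deg -> side {side:.6f}")
```

The grid step trades speed for robustness: 181 cheap evaluations on the hull points, followed by a single refinement around the best cell.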

### From jonathanchan kernel (ensemble + fractional translation):
- **Ensemble**: Combine best solutions from multiple sources for each N
- **Fractional translation**: very small translation steps (0.001, 0.0005, etc.)
- Uses simulated annealing (SA) over translations, plus local search
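The fractional-translation idea can be sketched as a greedy coarse-to-fine search. It is written against two hypothetical callbacks, `score(positions)` and `is_valid(positions)`, so the geometry stays pluggable: in this problem they would be the bounding-square score and a tree-polygon overlap test. This is a plain greedy coordinate search, not the jonathanchan kernel's code, and it omits the SA stage. Each tree is nudged by shrinking steps in eight directions, and a move is kept only if the configuration stays valid and the score strictly drops.

```python
import math
from typing import Callable, List, Sequence, Tuple

Position = Tuple[float, float]

# Eight move directions; diagonal moves are step * sqrt(2) long
DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (1, -1), (-1, 1), (-1, -1)]


def fractional_translation(
    positions: List[Position],
    score: Callable[[Sequence[Position]], float],
    is_valid: Callable[[Sequence[Position]], bool],
    steps: Sequence[float] = (0.001, 0.0005, 0.0002, 0.0001),
    max_passes: int = 100,
) -> Tuple[List[Position], float]:
    """Greedy small-step translation search (hypothetical helper, not the kernel's code)."""
    pos = list(positions)
    best = score(pos)
    for step in steps:  # coarse-to-fine step sizes
        for _ in range(max_passes):
            improved = False
            for i in range(len(pos)):
                for dx, dy in DIRECTIONS:
                    trial = pos.copy()
                    trial[i] = (pos[i][0] + dx * step, pos[i][1] + dy * step)
                    if not is_valid(trial):
                        continue  # reject moves that create an overlap
                    s = score(trial)
                    if s < best:
                        # accept the first improving move, then go on to the next tree
                        pos, best, improved = trial, s, True
                        break
            if not improved:
                break  # nothing helps at this step size; try a finer one
    return pos, best


# Toy usage: two "trees" modelled as discs of radius 0.05 that we pull together
def toy_score(p: Sequence[Position]) -> float:
    return (p[1][0] - p[0][0]) ** 2 + (p[1][1] - p[0][1]) ** 2


def toy_valid(p: Sequence[Position]) -> bool:
    return math.dist(p[0], p[1]) >= 0.1  # discs may touch but not overlap


final, best_score = fractional_translation([(0.0, 0.0), (0.2, 0.0)], toy_score, toy_valid)
print(final, best_score)
```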

### From discussions:
- Asymmetric solutions outperform symmetric ones
- Top solutions come from extensive optimization runs
- Need to focus on per-N optimization

In [2]:
# Build the reference tree polygon (deg=0, at the origin) that every placement is derived from
from shapely.geometry import Polygon
from shapely.affinity import rotate, translate
from shapely.ops import unary_union

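def get_tree_polygon():
    """Tree outline at deg=0: tip at y=0.8, tiers at y=0.5 and 0.25, trunk down to y=-0.2."""
    trunk_w, trunk_h = 0.15, 0.2
    base_w, mid_w, top_w = 0.7, 0.4, 0.25
    tip_y, tier_1_y, tier_2_y, base_y = 0.8, 0.5, 0.25, 0.0
    trunk_bottom_y = -trunk_h
    # 15 vertices, clockwise: tip, down the right side, trunk, back up the left side
    vertices = [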
        (0.0, tip_y), (top_w/2, tier_1_y), (top_w/4, tier_1_y),
        (mid_w/2, tier_2_y), (mid_w/4, tier_2_y), (base_w/2, base_y),
        (trunk_w/2, base_y), (trunk_w/2, trunk_bottom_y),
        (-trunk_w/2, trunk_bottom_y), (-trunk_w/2, base_y),
        (-base_w/2, base_y), (-mid_w/4, tier_2_y), (-mid_w/2, tier_2_y),
        (-top_w/4, tier_1_y), (-top_w/2, tier_1_y),
    ]
    return Polygon(vertices)

TREE_POLY = get_tree_polygon()
# exterior.coords repeats the first vertex to close the ring, so 15 vertices print as 16
print(f"Tree polygon: {len(TREE_POLY.exterior.coords)} vertices")
print(f"Tree bounds: {TREE_POLY.bounds}")
print(f"Tree area: {TREE_POLY.area:.6f}")

Tree polygon: 16 vertices
Tree bounds: (-0.35, -0.2, 0.35, 0.8)
Tree area: 0.245625
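The printed area can be checked by hand. Reading the vertex list, the polygon splits into the trunk rectangle, a top triangle, and two trapezoidal tiers, with widths taken at the tier boundaries y = 0.5, 0.25 and 0:

$$
\begin{aligned}
A_\text{trunk} &= 0.15 \times 0.2 = 0.03 \\
A_\text{top} &= \tfrac{1}{2} \times 0.25 \times (0.8 - 0.5) = 0.0375 \\
A_\text{mid} &= \tfrac{1}{2} (0.125 + 0.4)(0.5 - 0.25) = 0.065625 \\
A_\text{base} &= \tfrac{1}{2} (0.2 + 0.7)(0.25 - 0) = 0.1125 \\
A &= 0.03 + 0.0375 + 0.065625 + 0.1125 = 0.245625
\end{aligned}
$$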


In [3]:
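# Helpers: parse submission values, place trees, measure a group's bounding square
def parse_s_value(s_val):
    """Strip the leading 's' that submission values may carry (e.g. 's0.5') and convert to float."""
    if isinstance(s_val, str) and s_val.startswith('s'):
        return float(s_val[1:])
    return float(s_val)

def create_tree(x, y, deg):
    """Rotate the reference tree CCW by `deg` degrees about the origin, then shift it to (x, y)."""
    return translate(rotate(TREE_POLY, deg, origin=(0, 0)), x, y)

def get_bbox_side(polygons):
    """Side of the smallest axis-aligned square containing all polygons: max(width, height)."""
    if not polygons:
        return 0
    combined = unary_union(polygons)
    bounds = combined.bounds  # (minx, miny, maxx, maxy)
    return max(bounds[2] - bounds[0], bounds[3] - bounds[1])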

# Load baseline
df = pd.read_csv('/home/nonroot/snapshots/santa-2025/21328309254/submission/submission.csv')
df['x_val'] = df['x'].apply(parse_s_value)
df['y_val'] = df['y'].apply(parse_s_value)
df['deg_val'] = df['deg'].apply(parse_s_value)
df['n'] = df['id'].apply(lambda x: int(x.split('_')[0]))  # group size N is the part of id before '_'

print(f"Loaded {len(df)} rows")

Loaded 20100 rows


In [4]:
# Per-N score: (bounding-square side)^2 / N; the total over N=1..200 is the LB score
TARGET = 68.888293

scores_by_n = {}
for n in range(1, 201):
    n_df = df[df['n'] == n]
    polygons = [create_tree(row['x_val'], row['y_val'], row['deg_val']) for _, row in n_df.iterrows()]
    side = get_bbox_side(polygons)
    scores_by_n[n] = (side ** 2) / n

total_score = sum(scores_by_n.values())
print(f"Total score: {total_score:.6f}")
print(f"Target: {TARGET:.6f}")
print(f"Gap: {total_score - TARGET:.6f}")

Total score: 70.647327
Target: 68.888293
Gap: 1.759034
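The total above is a sum of per-group terms. Writing $s_n$ for the side of group $n$'s bounding square and $A = 0.245625$ for the area of one tree, the $n$ trees cannot overlap inside that square, so $nA \le s_n^2$. This gives a floor on every term and on the total; the floor is far from tight for small $n$ (the $n = 1$ term is 0.661250 against a floor of 0.245625):

$$
S = \sum_{n=1}^{200} \frac{s_n^2}{n}, \qquad \frac{s_n^2}{n} \ge A = 0.245625, \qquad S \ge 200A = 49.125
$$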


In [5]:
# Share of the total score contributed by each range of N

print("\n=== Score Breakdown by N Range ===")
ranges = [(1, 10), (11, 20), (21, 50), (51, 100), (101, 150), (151, 200)]
for start, end in ranges:
    range_score = sum(scores_by_n[n] for n in range(start, end+1))
    pct = range_score / total_score * 100
    print(f"n={start:3d}-{end:3d}: {range_score:.4f} ({pct:.1f}%)")


=== Score Breakdown by N Range ===
n=  1- 10: 4.3291 (6.1%)
n= 11- 20: 3.7263 (5.3%)
n= 21- 50: 10.9844 (15.5%)
n= 51-100: 17.6279 (25.0%)
n=101-150: 17.1366 (24.3%)
n=151-200: 16.8430 (23.8%)


In [6]:
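# Rank N by score_n / n = side^2 / n^2. Caveat: this decays roughly like 1/n, so it
# always lists the smallest N first and says little about packing quality;
# score_n itself (side^2 / n, the square's area per tree) is comparable across N.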
per_tree_contribution = [(n, scores_by_n[n] / n) for n in range(1, 201)]
per_tree_contribution.sort(key=lambda x: x[1], reverse=True)

print("\n=== Top 20 N values by per-tree contribution ===")
for n, contrib in per_tree_contribution[:20]:
    print(f"n={n:3d}: {contrib:.6f} per tree, total={scores_by_n[n]:.6f}")


=== Top 20 N values by per-tree contribution ===
n=  1: 0.661250 per tree, total=0.661250
n=  2: 0.225390 per tree, total=0.450779
n=  3: 0.144915 per tree, total=0.434745
n=  4: 0.104136 per tree, total=0.416545
n=  5: 0.083370 per tree, total=0.416850
n=  6: 0.066602 per tree, total=0.399610
n=  7: 0.057128 per tree, total=0.399897
n=  8: 0.048176 per tree, total=0.385407
n=  9: 0.043046 per tree, total=0.387415
n= 10: 0.037663 per tree, total=0.376630
n= 11: 0.034084 per tree, total=0.374924
n= 12: 0.031060 per tree, total=0.372724
n= 13: 0.028638 per tree, total=0.372294
n= 14: 0.026396 per tree, total=0.369543
n= 15: 0.025280 per tree, total=0.379203
n= 16: 0.023383 per tree, total=0.374128
n= 17: 0.021767 per tree, total=0.370040
n= 18: 0.020487 per tree, total=0.368771
n= 19: 0.019401 per tree, total=0.368615
n= 20: 0.018803 per tree, total=0.376057


In [7]:
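# Rectangle model for N=1: treat the tree as its 0.7 x 1.0 bounding rectangle
# (height spans y=-0.2 to y=0.8) and rotate that rectangle by 45 degrees.
# Caveat: this is not a theoretical minimum. Rotating a rectangle by 45 degrees
# enlarges its axis-aligned bounding box (side 1.0 -> 1.202 here), and the tapered
# tree fits in a much smaller box than its bounding rectangle does.
import math

# For a w x h rectangle rotated by theta (0 <= theta <= 90 degrees):
# new_width = w*cos(theta) + h*sin(theta)
# new_height = w*sin(theta) + h*cos(theta)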

w, h = 0.7, 1.0  # tree dimensions
theta = math.radians(45)
new_w = w * math.cos(theta) + h * math.sin(theta)
new_h = w * math.sin(theta) + h * math.cos(theta)
side_45 = max(new_w, new_h)

print(f"\n=== Theoretical Analysis ===")
print(f"Tree dimensions: {w} x {h}")
print(f"At 45°: {new_w:.6f} x {new_h:.6f}")
print(f"Bounding box side at 45°: {side_45:.6f}")
print(f"Score for N=1 at 45°: {side_45**2:.6f}")
print(f"Current N=1 score: {scores_by_n[1]:.6f}")


=== Theoretical Analysis ===
Tree dimensions: 0.7 x 1.0
At 45°: 1.202082 x 1.202082
Bounding box side at 45°: 1.202082
Score for N=1 at 45°: 1.445000
Current N=1 score: 0.661250
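The rectangle model overestimates badly because the tree tapers to a point. Using the 15 real polygon vertices (copied from `get_tree_polygon`), the tree rotated by 45° has extents set by the tip and the two base corners, giving a bounding-square side of (0.8 + 0.35)/√2 ≈ 0.813173 and an N=1 score of exactly 0.661250, which equals the current N=1 score. A minimal numpy check is shown below, with a 0.01° scan over all rotations added as a sanity check. The helper name `square_side` is illustrative.

```python
import numpy as np

# Vertices of the tree at deg=0, copied from get_tree_polygon() above
TREE = np.array([
    (0.0, 0.8), (0.125, 0.5), (0.0625, 0.5), (0.2, 0.25), (0.1, 0.25),
    (0.35, 0.0), (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0),
    (-0.35, 0.0), (-0.1, 0.25), (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5),
])


def square_side(points: np.ndarray, angle_deg: float) -> float:
    """Side of the axis-aligned bounding square after rotating points CCW by angle_deg."""
    t = np.radians(angle_deg)
    rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    p = points @ rot.T
    extent = p.max(axis=0) - p.min(axis=0)  # (width, height)
    return float(extent.max())


side_45 = square_side(TREE, 45.0)
print(f"Real tree at 45°: side={side_45:.6f}, N=1 score={side_45 ** 2:.6f}")

# Scan every rotation on a 0.01° grid (the square's side repeats every 90°)
angles = np.linspace(0.0, 90.0, 9001)
sides = np.array([square_side(TREE, a) for a in angles])
best = float(angles[sides.argmin()])
print(f"Best angle on the grid: {best:.2f}°, N=1 score={sides.min() ** 2:.6f}")
```

If the scan agrees, N=1 is already at its best single-tree rotation and offers no room for improvement.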


## Strategy for Next Experiment

### Priority 1: Implement fix_direction (rotation tightening)
This is the evaluator's TOP PRIORITY and is pure Python:
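1. For each N configuration, compute the ConvexHull of all tree vertices (only hull vertices can touch the bounding box)
2. Use minimize_scalar to find the rotation angle in [0°, 90°) that minimizes max(width, height); 90° suffices because rotating by 90° only swaps width and height
3. Apply that rotation to all trees: rotate every tree center about a common pivot and add the same angle to every `deg`
4. The rotation is rigid, so relative positions and pairwise distances are unchanged: it can shrink the bounding box without creating new overlaps (up to floating-point error)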
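Applying the rotation can be done on the submission rows directly. This assumes, as in `create_tree` above, that each tree is the reference polygon rotated CCW by `deg` about the origin and then shifted to `(x, y)`. Rotating the whole group by φ about a pivot c then sends each center to c + R(φ)((x, y) − c) and adds φ to every `deg`. The helper name `rotate_group` is hypothetical. The pivot only translates the group, so it does not change the score.

```python
import math
from typing import List, Tuple

Tree = Tuple[float, float, float]  # (x, y, deg)


def rotate_group(trees: List[Tree], phi_deg: float,
                 pivot: Tuple[float, float] = (0.0, 0.0)) -> List[Tree]:
    """Rigidly rotate a group of trees CCW by phi_deg about pivot (hypothetical helper).

    Each tree is P rotated by deg and shifted by (x, y), so rotating the group by phi
    maps (x, y) -> pivot + R(phi) @ ((x, y) - pivot) and deg -> deg + phi.
    """
    t = math.radians(phi_deg)
    c, s = math.cos(t), math.sin(t)
    px, py = pivot
    out = []
    for x, y, deg in trees:
        dx, dy = x - px, y - py
        out.append((px + c * dx - s * dy, py + s * dx + c * dy, (deg + phi_deg) % 360.0))
    return out


group = [(1.0, 0.0, 0.0), (0.0, 1.0, 90.0), (0.5, -0.25, 350.0)]
rotated = rotate_group(group, 90.0)
print(rotated)
```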

### Priority 2: Per-N analysis
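Identify which N values have the most room for improvement by comparing each N's score with its area floor: non-overlapping trees of area 0.245625 inside a square of side s need s² ≥ N × 0.245625, so score_N = s²/N can never drop below 0.245625.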
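A small sketch of that comparison, assuming per-N scores are available as a dict like `scores_by_n` above. The ratio 0.245625 / score_N is the fraction of the bounding square the trees actually cover (packing density), and score_N − 0.245625 bounds how much that N can still gain. The function name `rank_by_density` is hypothetical, and the example values are the N = 1, 2, 19 totals printed in the per-tree table.

```python
from typing import Dict, List, Tuple

TREE_AREA = 0.245625  # area of one tree polygon, printed in the polygon cell


def rank_by_density(scores_by_n: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (n, density, slack) sorted from least to most densely packed.

    density = TREE_AREA / score_n = n * TREE_AREA / side**2 (fraction of square covered)
    slack   = score_n - TREE_AREA (upper bound on what this N can still gain)
    """
    rows = [(n, TREE_AREA / s, s - TREE_AREA) for n, s in scores_by_n.items()]
    return sorted(rows, key=lambda r: r[1])


example = {1: 0.661250, 2: 0.450779, 19: 0.368615}
for n, density, slack in rank_by_density(example):
    print(f"n={n:3d}: density={density:.3f}, slack={slack:.6f}")
```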

### Priority 3: Ensemble approach
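For each N, take the lowest-scoring configuration across all sources, keeping only groups that pass the overlap check (exp_000 and exp_001 were both rejected for overlapping trees).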
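A minimal sketch of per-N selection. It assumes each source has already been scored per group and checked for overlaps, so invalid groups can be excluded before picking. The data layout `{source: {n: (score_n, is_valid)}}` and the function name are hypothetical, and the "optimized" numbers below are made up for illustration only.

```python
from typing import Dict, Tuple

# source name -> {n: (score_n, passes_overlap_check)}
Candidates = Dict[str, Dict[int, Tuple[float, bool]]]


def pick_best_per_n(candidates: Candidates) -> Dict[int, Tuple[str, float]]:
    """For each N, keep the lowest-scoring configuration that passed validation."""
    best: Dict[int, Tuple[str, float]] = {}
    for source, per_n in candidates.items():
        for n, (score, valid) in per_n.items():
            if not valid:
                continue  # never let an overlapping group into the ensemble
            if n not in best or score < best[n][1]:
                best[n] = (source, score)
    return best


candidates = {
    "baseline": {1: (0.661250, True), 2: (0.450779, True)},
    "optimized": {1: (0.661250, True), 2: (0.449000, False)},  # illustrative values only
}
chosen = pick_best_per_n(candidates)
total = sum(score for _, score in chosen.values())
print(chosen, total)
```

Ties keep the first source seen, so a re-optimized group replaces the baseline only when it is strictly better.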