# Loop 2 LB Feedback Analysis

**LB Score: 71.8128** (matches the local CV score of 71.812779 to four decimal places - perfect calibration!)

## Key Insights:
1. The CV-LB gap is 0.0000, so our local scoring is perfectly calibrated.
2. The valid (non-touching) submission scores 71.81.
3. The touching submission would score 70.65, but it is rejected.
4. Gap to target: 71.81 - 68.92 = 2.89 points (4.2% of the target score).

## Strategy Analysis:
- The 1.17-point penalty for the non-touching submission relative to the touching one (1.165956 summed over all N) is significant.
- We need to do one of the following:
  1. Find better valid configurations from scratch
  2. Apply micro-separation to touching trees to minimize score impact
  3. Run optimization with gap constraints from the start
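
Option 3 needs a validity test that an optimizer can call after every move. The following is a minimal sketch of such a check, not the competition's actual rule: the function name `violates_gap` and the `min_gap=1e-9` threshold are assumptions chosen for illustration, and the unit squares stand in for tree polygons.

```python
from itertools import combinations
from shapely.geometry import Polygon

def violates_gap(polygons, min_gap=1e-9):
    """Return index pairs (i, j) whose separation is below min_gap.

    min_gap is a hypothetical threshold, not a value taken from the
    competition rules.
    """
    return [
        (i, j)
        for i, j in combinations(range(len(polygons)), 2)
        if polygons[i].distance(polygons[j]) < min_gap
    ]

# Unit squares as stand-ins for tree polygons.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
touching = Polygon([(1, 0), (2, 0), (2, 1), (1, 1)])        # shares an edge with `square`
apart = Polygon([(1.5, 0), (2.5, 0), (2.5, 1), (1.5, 1)])   # 0.5 away from `square`

print(violates_gap([square, touching]))  # [(0, 1)]
print(violates_gap([square, apart]))     # []
```

An optimizer could reject any candidate move for which this list is non-empty, so that touching layouts never enter the search in the first place.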

In [1]:
import pandas as pd
import numpy as np
from decimal import Decimal, getcontext
from shapely import affinity
from shapely.geometry import Polygon
from itertools import combinations
import json
import os

getcontext().prec = 30

# Christmas Tree class
class ChristmasTree:
    def __init__(self, center_x='0', center_y='0', angle='0'):
        self.center_x = Decimal(str(center_x))
        self.center_y = Decimal(str(center_y))
        self.angle = Decimal(str(angle))
        
        initial_polygon = Polygon([
            (0.0, 0.8), (0.125, 0.5), (0.0625, 0.5),
            (0.2, 0.25), (0.1, 0.25), (0.35, 0.0),
            (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2),
            (-0.075, 0.0), (-0.35, 0.0), (-0.1, 0.25),
            (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5),
        ])
        rotated = affinity.rotate(initial_polygon, float(self.angle), origin=(0, 0))
        self.polygon = affinity.translate(rotated, xoff=float(self.center_x), yoff=float(self.center_y))

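def parse_value(val):
    # Submission values may carry a leading 's' marker; strip it so the
    # remainder can be parsed as a Decimal.
    if isinstance(val, str) and val.startswith('s'):
        return val[1:]
    return str(val)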

def load_trees_for_n(df, n):
    prefix = f"{n:03d}_"
    rows = df[df['id'].str.startswith(prefix)]
    trees = []
    for _, row in rows.iterrows():
        x = parse_value(row['x'])
        y = parse_value(row['y'])
        deg = parse_value(row['deg'])
        trees.append(ChristmasTree(x, y, deg))
    return trees

print("Classes defined")

Classes defined


In [2]:
# Load both submissions
valid_df = pd.read_csv('/home/code/experiments/002_valid_submission/submission.csv')
touching_df = pd.read_csv('/home/code/experiments/002_valid_ensemble/submission.csv')

print(f"Valid submission: {len(valid_df)} rows")
print(f"Touching submission: {len(touching_df)} rows")

Valid submission: 20100 rows
Touching submission: 20100 rows


In [3]:
def get_bounding_box_side(trees):
    all_points = []
    for tree in trees:
        coords = np.array(tree.polygon.exterior.coords)
        all_points.append(coords)
    all_points = np.vstack(all_points)
    min_x, min_y = all_points.min(axis=0)
    max_x, max_y = all_points.max(axis=0)
    return max(max_x - min_x, max_y - min_y)

def get_min_distance(trees):
    if len(trees) <= 1:
        return float('inf')
    min_dist = float('inf')
    for i, j in combinations(range(len(trees)), 2):
        dist = trees[i].polygon.distance(trees[j].polygon)
        min_dist = min(min_dist, dist)
    return min_dist

# Compare scores for each N
comparison = []
for n in range(1, 201):
    valid_trees = load_trees_for_n(valid_df, n)
    touching_trees = load_trees_for_n(touching_df, n)
    
    valid_side = get_bounding_box_side(valid_trees)
    touching_side = get_bounding_box_side(touching_trees)
    
    valid_score = (valid_side ** 2) / n
    touching_score = (touching_side ** 2) / n
    
    valid_min_dist = get_min_distance(valid_trees)
    touching_min_dist = get_min_distance(touching_trees)
    
    comparison.append({
        'n': n,
        'valid_score': valid_score,
        'touching_score': touching_score,
        'gap': valid_score - touching_score,
        'valid_min_dist': valid_min_dist,
        'touching_min_dist': touching_min_dist
    })

comp_df = pd.DataFrame(comparison)
print(f"Total valid score: {comp_df['valid_score'].sum():.6f}")
print(f"Total touching score: {comp_df['touching_score'].sum():.6f}")
print(f"Total gap: {comp_df['gap'].sum():.6f}")

Total valid score: 71.812779
Total touching score: 70.646824
Total gap: 1.165956
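
For reference, the score computed in the loop above can be written as a formula. The symbols $S$ and $s_n$ are introduced only for this note: $s_n$ is the value returned by `get_bounding_box_side` for the configuration with $n$ trees.

```latex
S = \sum_{n=1}^{200} \frac{s_n^{2}}{n},
\qquad
s_n = \max\bigl(x_{\max} - x_{\min},\; y_{\max} - y_{\min}\bigr)
```

Each term is a bounding-square area per tree, which is why the per-N values in the tables below are comparable across different N.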


In [4]:
# Find N values with largest gaps
comp_df_sorted = comp_df.sort_values('gap', ascending=False)
print("Top 20 N values with largest gap (valid - touching):")
print(comp_df_sorted.head(20).to_string(index=False))

Top 20 N values with largest gap (valid - touching):
  n  valid_score  touching_score      gap  valid_min_dist  touching_min_dist
181     0.369480        0.329945 0.039535    1.174132e-07       1.033920e-11
168     0.368960        0.332475 0.036485    5.109866e-07       1.650286e-12
194     0.367806        0.332999 0.034807    9.590557e-09       6.498541e-15
165     0.360572        0.335569 0.025004    2.852034e-09       6.938894e-17
166     0.358650        0.334819 0.023832    1.527768e-09       4.270089e-17
167     0.356501        0.332835 0.023666    4.109167e-09       5.298539e-13
144     0.365245        0.342276 0.022968    1.354983e-06       8.310107e-12
138     0.362978        0.341028 0.021950    2.643973e-09       3.784851e-17
164     0.358542        0.337328 0.021215    8.326255e-07       2.882310e-16
 96     0.367014        0.346397 0.020617    1.710374e-08       1.029796e-12
 91     0.368029        0.347911 0.020118    3.293714e-09       3.873075e-13
139     0.360962 [output truncated]

In [5]:
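# Analyze: which N values have the worst valid scores?
# Note: valid_score is already side**2 / n, so dividing by n again makes this
# ranking dominated by 1/n (small n ranks worst almost by construction).
comp_df['valid_efficiency'] = comp_df['valid_score'] / comp_df['n']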
print("\nN values with worst valid efficiency (score/n):")
print(comp_df.sort_values('valid_efficiency', ascending=False).head(20)[['n', 'valid_score', 'valid_efficiency']].to_string(index=False))


N values with worst valid efficiency (score/n):
 n  valid_score  valid_efficiency
 1     0.661250          0.661250
 2     0.450779          0.225390
 3     0.434745          0.144915
 4     0.416632          0.104158
 5     0.417047          0.083409
 6     0.399825          0.066638
 7     0.400198          0.057171
 8     0.386583          0.048323
 9     0.387545          0.043061
10     0.377070          0.037707
11     0.376996          0.034272
12     0.375245          0.031270
13     0.373180          0.028706
14     0.382126          0.027295
15     0.379466          0.025298
16     0.375996          0.023500
17     0.370840          0.021814
18     0.370876          0.020604
19     0.378862          0.019940
20     0.378602          0.018930
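
Because `valid_score` is already `side ** 2 / n`, ranking by `valid_score / n` mostly reproduces the order of `n`. Sorting by `valid_score` itself may be a more informative view. The sketch below uses only the first seven rows of the table above; the `sample` and `by_score` names are introduced here for illustration.

```python
import pandas as pd

# Values copied from the first seven rows of the table above.
sample = pd.DataFrame({
    "n": [1, 2, 3, 4, 5, 6, 7],
    "valid_score": [0.661250, 0.450779, 0.434745, 0.416632,
                    0.417047, 0.399825, 0.400198],
})

# Rank by per-tree bounding-square area instead of score / n.
by_score = sample.sort_values("valid_score", ascending=False)
print(by_score["n"].tolist())  # [1, 2, 3, 5, 4, 7, 6]
```

Even in this small sample the order is no longer monotone in `n`: N=5 scores worse than N=4, and N=7 worse than N=6.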


In [6]:
# Key insight: The gap comes from configurations where touching trees were separated
# Let's see if we can apply micro-separation to touching trees

# For a touching configuration, we need to:
# 1. Find all pairs of touching trees
# 2. Calculate the minimum separation vector
# 3. Apply half the vector to each tree
# 4. Re-optimize the bounding box rotation

print("\nAnalyzing touching configurations...")
print(f"N values where touching_min_dist < 1e-9: {(comp_df['touching_min_dist'] < 1e-9).sum()}")
print(f"N values where valid_min_dist < 1e-9: {(comp_df['valid_min_dist'] < 1e-9).sum()}")


Analyzing touching configurations...
N values where touching_min_dist < 1e-9: 199
N values where valid_min_dist < 1e-9: 0
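
Steps 1 to 3 of the micro-separation idea from the comments in this cell could be sketched as follows. This is a heuristic under stated assumptions, not the method used to build the valid submission: the function name `micro_separate`, the `min_gap=1e-6` target, the 10% overshoot, and the choice to push along the centroid-to-centroid direction are all hypothetical. Step 4 (re-optimizing the bounding box rotation) is omitted, and convergence is not guaranteed for dense layouts, where one push can create a new violation elsewhere.

```python
import math
from itertools import combinations
from shapely import affinity
from shapely.geometry import Polygon

def micro_separate(polygons, min_gap=1e-6, max_iter=100):
    """Push apart every pair closer than min_gap, half the move to each polygon.

    Heuristic sketch: min_gap and the centroid push direction are assumptions.
    """
    polys = list(polygons)
    target = min_gap * 1.1  # overshoot slightly to absorb floating-point rounding
    for _ in range(max_iter):
        moved = False
        for i, j in combinations(range(len(polys)), 2):
            dist = polys[i].distance(polys[j])
            if dist >= min_gap:
                continue
            # Direction from polygon i to polygon j, taken between centroids.
            ci, cj = polys[i].centroid, polys[j].centroid
            dx, dy = cj.x - ci.x, cj.y - ci.y
            norm = math.hypot(dx, dy)
            if norm == 0.0:
                dx, dy, norm = 1.0, 0.0, 1.0  # arbitrary direction for coincident centroids
            step = 0.5 * (target - dist)
            ux, uy = dx / norm, dy / norm
            polys[i] = affinity.translate(polys[i], xoff=-ux * step, yoff=-uy * step)
            polys[j] = affinity.translate(polys[j], xoff=ux * step, yoff=uy * step)
            moved = True
        if not moved:
            break  # every pair already satisfies the gap
    return polys

# Two unit squares sharing an edge, as stand-ins for touching trees.
a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
b = Polygon([(1, 0), (2, 0), (2, 1), (1, 1)])
separated = micro_separate([a, b])
print(a.distance(b), separated[0].distance(separated[1]))
```

Every push can enlarge the bounding box, so the result should be re-scored with `get_bounding_box_side` and compared against the existing valid configuration for the same N before it is adopted.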
