# Loop 5 LB Feedback Analysis

**Latest submission:** exp_004 (005_extended_optimization)
- CV: 84.712432
- LB: 84.712432 (gap: 0.0000)

**Key observations:**
1. Perfect CV-LB alignment (expected for a deterministic optimization problem)
2. Extended optimization (57 min) yielded only a 0.19-point improvement
3. Diminishing returns suggest we are stuck in a local optimum
4. Gap to target: 15.78 points (22.9%)

**Analysis goals:**
1. Understand worst-performing N values
2. Identify opportunities for aggressive backward propagation
3. Analyze grid-based initialization potential

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load current best submission
df = pd.read_csv('/home/submission/submission.csv')
print(f"Loaded {len(df)} rows")

# Parse values
def strip_s(val):
    s = str(val)
    return float(s[1:] if s.startswith('s') else s)

df['N'] = df['id'].astype(str).str.split('_').str[0].astype(int)
df['x_val'] = df['x'].apply(strip_s)
df['y_val'] = df['y'].apply(strip_s)
df['deg_val'] = df['deg'].apply(strip_s)

In [None]:
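# Fast scoring
# Vertices of the tree polygon (15 points) in the tree's local frame
TX = np.array([0,0.125,0.0625,0.2,0.1,0.35,0.075,0.075,-0.075,-0.075,-0.35,-0.1,-0.2,-0.0625,-0.125])
TY = np.array([0.8,0.5,0.5,0.25,0.25,0,0,-0.2,-0.2,0,0,0.25,0.25,0.5,0.5])

def score_group_fast(xs, ys, degs):
    """Return (side**2 / n, side) for the bounding square of n placed trees.

    Tree i is rotated by degs[i] degrees about its local origin, then
    translated to (xs[i], ys[i]). Overlaps between trees are not checked.
    """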
    n = len(xs)
    if n == 0:
        return float('inf'), 0
    all_x, all_y = [], []
    for i in range(n):
        rad = np.radians(degs[i])
        c, s = np.cos(rad), np.sin(rad)
        px = TX * c - TY * s + xs[i]
        py = TX * s + TY * c + ys[i]
        all_x.extend(px)
        all_y.extend(py)
    all_x, all_y = np.array(all_x), np.array(all_y)
    side = max(all_x.max() - all_x.min(), all_y.max() - all_y.min())
    return side * side / n, side

# Calculate scores for each N
TARGET = 68.931058
scores = []
for n in range(1, 201):
    g = df[df['N'] == n]
    if len(g) != n:
        # A missing or incomplete group would silently lower the total
        print(f"WARNING: N={n} has {len(g)} rows, expected {n}; skipped")
        continue
    score, side = score_group_fast(g['x_val'].values, g['y_val'].values, g['deg_val'].values)
    scores.append({'N': n, 'score': score, 'side': side, 'contribution': score})

scores_df = pd.DataFrame(scores)
total_score = scores_df['score'].sum()
print(f"Total score: {total_score:.6f}")
print(f"Target: {TARGET}")
print(f"Gap: {total_score - TARGET:.6f} ({(total_score - TARGET)/TARGET*100:.1f}%)")
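For N = 1 the score depends only on the rotation angle, which makes it a convenient sanity check for the scoring geometry. The sketch below sweeps the angle for a single tree. The 0.5° step and the helper name `single_tree_side` are assumptions for illustration, not part of the original pipeline.

```python
import numpy as np

# Tree polygon vertices, copied from the scoring cell.
TX = np.array([0,0.125,0.0625,0.2,0.1,0.35,0.075,0.075,-0.075,-0.075,-0.35,-0.1,-0.2,-0.0625,-0.125])
TY = np.array([0.8,0.5,0.5,0.25,0.25,0,0,-0.2,-0.2,0,0,0.25,0.25,0.5,0.5])

def single_tree_side(deg):
    """Side of the axis-aligned bounding square of one tree rotated by `deg` degrees."""
    rad = np.radians(deg)
    c, s = np.cos(rad), np.sin(rad)
    px = TX * c - TY * s
    py = TX * s + TY * c
    return max(px.max() - px.min(), py.max() - py.min())

# Hypothetical sweep resolution; a finer step or a local refinement could follow.
angles = np.arange(0.0, 180.0, 0.5)
sides = np.array([single_tree_side(a) for a in angles])
best = int(np.argmin(sides))

print(f"unrotated: side={single_tree_side(0.0):.4f}")
print(f"best in sweep: {angles[best]:.1f} deg, side={sides[best]:.4f}, score={sides[best]**2:.4f}")
```

If the submission's N = 1 row scores worse than the best value in the sweep, that row can be replaced directly, since a single tree cannot overlap anything.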

In [None]:
# Analyze worst performing N values
scores_df['pct_contribution'] = scores_df['score'] / total_score * 100
scores_df = scores_df.sort_values('score', ascending=False)

print("\n=== TOP 20 WORST N VALUES ===")
print(scores_df.head(20).to_string())

print(f"\nTop 20 worst contribute: {scores_df.head(20)['pct_contribution'].sum():.2f}% of total")
print(f"Top 10 worst contribute: {scores_df.head(10)['pct_contribution'].sum():.2f}% of total")

In [None]:
# Analyze score distribution by N
plt.figure(figsize=(14, 5))

plt.subplot(1, 2, 1)
plt.scatter(scores_df['N'], scores_df['score'], alpha=0.6, s=20)
plt.xlabel('N (number of trees)')
plt.ylabel('Score (side²/N)')
plt.title('Score by N')
plt.axhline(y=68.931058/200, color='r', linestyle='--', label='Target avg')
plt.legend()

plt.subplot(1, 2, 2)
plt.scatter(scores_df['N'], scores_df['side'], alpha=0.6, s=20)
plt.xlabel('N (number of trees)')
plt.ylabel('Side length')
plt.title('Side length by N')

plt.tight_layout()
plt.savefig('/home/code/exploration/loop5_score_analysis.png', dpi=100)
plt.show()
print("Saved to loop5_score_analysis.png")

In [None]:
# Analyze potential for backward propagation
# For each N, compare side length to N+1, N+2, etc.
scores_sorted = scores_df.sort_values('N').reset_index(drop=True)

print("\n=== BACKWARD PROPAGATION POTENTIAL ===")
print("Looking for N values where larger N has SMALLER side length (propagation opportunity)")

propagation_opportunities = []
for i, row in scores_sorted.iterrows():
    n = int(row['N'])  # iterrows() upcasts the mixed int/float row to float
    current_side = row['side']
    
    # Check if any larger N has smaller side
    larger = scores_sorted[scores_sorted['N'] > n]
    if len(larger) > 0:
        min_larger_side = larger['side'].min()
        min_larger_n = larger.loc[larger['side'].idxmin(), 'N']
        if min_larger_side < current_side:
            improvement = (current_side - min_larger_side) / current_side * 100
            propagation_opportunities.append({
                'N': n,
                'current_side': current_side,
                'best_larger_N': min_larger_n,
                'best_larger_side': min_larger_side,
                'potential_improvement_pct': improvement
            })

if propagation_opportunities:
    prop_df = pd.DataFrame(propagation_opportunities)
    prop_df = prop_df.sort_values('potential_improvement_pct', ascending=False)
    print(f"\nFound {len(prop_df)} N values with propagation potential:")
    print(prop_df.head(20).to_string())
else:
    print("No propagation opportunities found (side length is already non-decreasing in N)")
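Each opportunity listed above can be realized by deleting trees from the larger layout: any subset of a non-overlapping layout is itself non-overlapping, so no extra validity check is needed. The following is a minimal sketch of that idea, not crodoc's actual implementation. The `{n: (n, 3) array}` layout and the names `bounding_side` and `backward_propagate` are assumptions for illustration.

```python
import numpy as np

# Tree polygon vertices, copied from the scoring cell.
TX = np.array([0,0.125,0.0625,0.2,0.1,0.35,0.075,0.075,-0.075,-0.075,-0.35,-0.1,-0.2,-0.0625,-0.125])
TY = np.array([0.8,0.5,0.5,0.25,0.25,0,0,-0.2,-0.2,0,0,0.25,0.25,0.5,0.5])

def bounding_side(cfg):
    """Bounding-square side for cfg, an (n, 3) array with columns x, y, deg."""
    rad = np.radians(cfg[:, 2])[:, None]
    c, s = np.cos(rad), np.sin(rad)
    px = TX * c - TY * s + cfg[:, 0:1]
    py = TX * s + TY * c + cfg[:, 1:2]
    return max(px.max() - px.min(), py.max() - py.min())

def backward_propagate(configs):
    """Walk N downward; replace the (N-1) layout when dropping one tree from N beats it.

    configs: dict {n: (n, 3) array}. Returns (new dict, number of replaced layouts).
    """
    out = {n: np.asarray(c, dtype=float).copy() for n, c in configs.items()}
    improved = 0
    for n in sorted(out, reverse=True):
        if n < 2 or n - 1 not in out:
            continue
        parent = out[n]  # may itself have been improved in the previous step
        # Try deleting each tree in turn and keep the tightest result.
        candidates = [np.delete(parent, i, axis=0) for i in range(n)]
        best = min(candidates, key=bounding_side)
        if bounding_side(best) < bounding_side(out[n - 1]) - 1e-12:
            out[n - 1] = best
            improved += 1
    return out, improved

# Toy data: the N=2 layout is deliberately poor, the N=3 layout is compact.
toy = {
    1: np.array([[0.0, 0.0, 0.0]]),
    2: np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]]),
    3: np.array([[0.0, 0.0, 0.0], [0.8, 0.0, 0.0], [1.6, 0.0, 0.0]]),
}
new, improved = backward_propagate(toy)
print(f"improved layouts: {improved}, new N=2 side: {bounding_side(new[2]):.4f}")
```

Because the loop runs from the largest N down and reads the already-updated parent, a good layout at large N can cascade through several smaller N values in one pass.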

In [None]:
# Analyze packing efficiency: total tree area / bounding-square area
# Exact polygon area from the TX/TY vertices:
#   three tiers 0.0375 + 0.065625 + 0.1125, trunk 0.15 * 0.2 = 0.03  ->  0.245625
# For reference, the tree's bounding box is 0.7 x 1.0 = 0.7
TREE_AREA = 0.245625

print("\n=== EFFICIENCY ANALYSIS ===")
for n in [1, 10, 50, 100, 150, 200]:
    row = scores_sorted[scores_sorted['N'] == n].iloc[0]
    side = row['side']
    area = side * side
    efficiency = TREE_AREA * n / area * 100
    print(f"N={n:3d}: side={side:.4f}, area={area:.4f}, efficiency={efficiency:.1f}%")
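The tree area behind these efficiency figures can be computed exactly from the `TX`/`TY` vertex arrays with the shoelace formula. Since trees may not overlap, side² ≥ N × area for every N, so each per-N score is at least the tree area and the total over N = 1..200 is at least 200 × area. The bound is loose because the trees cannot tile a square perfectly. A minimal sketch (the helper name `polygon_area` is hypothetical):

```python
import numpy as np

# Tree polygon vertices, copied from the scoring cell.
TX = np.array([0,0.125,0.0625,0.2,0.1,0.35,0.075,0.075,-0.075,-0.075,-0.35,-0.1,-0.2,-0.0625,-0.125])
TY = np.array([0.8,0.5,0.5,0.25,0.25,0,0,-0.2,-0.2,0,0,0.25,0.25,0.5,0.5])

def polygon_area(x, y):
    """Shoelace formula for a simple polygon whose vertices are given in order."""
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

tree_area = polygon_area(TX, TY)
n_max = 200  # the notebook scores N = 1..200
lower_bound = n_max * tree_area

print(f"tree area: {tree_area:.6f}")
print(f"area-based floor on the total score: {lower_bound:.3f}")
```

Comparing this floor with the target and the current score gives a rough sense of how much of the remaining gap is attainable at all.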

In [None]:
# Check submission history
print("\n=== SUBMISSION HISTORY ===")
print("#1: exp_000 | CV: 135.8191 | LB: 135.8191")
print("#2: exp_001 | CV: 117.2815 | LB: 117.2815")
print("#3: exp_002 | CV: 84.8940 | LB: FAILED (overlap)")
print("#4: exp_003 | CV: 84.9010 | LB: 84.9010")
print("#5: exp_004 | CV: 84.7124 | LB: 84.7124")

print("\n=== CV-LB RELATIONSHIP ===")
print("Perfect alignment (gap = 0) for all valid submissions")
print("This is expected for a deterministic optimization problem")
print("No distribution shift concerns")

print("\n=== PROGRESS TRAJECTORY ===")
print("exp_000 → exp_001: 18.5 point improvement (13.6%)")
print("exp_001 → exp_002: 32.4 point improvement (27.6%)")
print("exp_003 → exp_004: 0.19 point improvement (0.22%)")
print("\nDIMINISHING RETURNS DETECTED - need new approach!")
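The step sizes in the trajectory can be recomputed from the submission history instead of being typed by hand. The sketch below uses only the valid submissions listed above (exp_002 is left out because it failed on the LB); the `history` list and the helper name `improvements` are assumptions for illustration.

```python
# Scores of the valid submissions, in order, taken from the history printed above.
history = [
    ("exp_000", 135.8191),
    ("exp_001", 117.2815),
    ("exp_003", 84.9010),
    ("exp_004", 84.7124),
]

def improvements(history):
    """Return (from, to, absolute drop, relative drop in %) for consecutive entries."""
    rows = []
    for (prev_name, prev), (name, cur) in zip(history, history[1:]):
        delta = prev - cur
        rows.append((prev_name, name, delta, 100.0 * delta / prev))
    return rows

for prev_name, name, delta, pct in improvements(history):
    print(f"{prev_name} -> {name}: {delta:.4f} points ({pct:.2f}%)")
```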

In [None]:
# Summary and recommendations
print("\n" + "="*60)
print("SUMMARY AND RECOMMENDATIONS")
print("="*60)

print("""
Current Status:
- Best score: 84.712432
- Target: 68.931058
- Gap: 15.78 points (22.9%)
- LB #1: 71.19 (crodoc achieved 74.75 with BackPacking)

Key Findings:
1. Extended optimization (57 min) yielded only 0.19 points
2. Diminishing returns from current SA approach
3. Top 20 worst N values contribute ~11% of total score
4. Perfect CV-LB alignment (no distribution shift)

Recommended Next Steps (Priority Order):

1. AGGRESSIVE BACKWARD PROPAGATION (crodoc style)
   - Current backward prop only found 2 improvements
   - crodoc achieved 74.75 with this technique
   - Key: When N performs poorly, COPY best config from larger N and drop trees
   - This propagates successful patterns across all N values

2. GRID-BASED INITIALIZATION (zaburo style)
   - For worst-performing N values (N=1, 19, 49, 21, 31...)
   - Place trees in alternating rows (0° and 180° rotation)
   - Optimize number of trees per row
   - Creates well-aligned starting point

3. TARGET WORST N VALUES
   - Focus optimization budget on top 20 worst N values
   - These contribute ~11% of the score but have the most room for improvement
   - Run dedicated optimization with 10x iterations

4. ENSEMBLE MORE SOURCES
   - Look for additional pre-computed solutions
   - crodoc/santa2025submission dataset may have better configs
""")

In [None]:
# Check what datasets we have access to
import os

print("\n=== AVAILABLE DATA SOURCES ===")
for path in ['/home/code/data', '/home/code/research/snapshots']:
    if os.path.exists(path):
        files = os.listdir(path)
        print(f"\n{path}:")
        for f in files[:20]:
            print(f"  {f}")
        if len(files) > 20:
            print(f"  ... and {len(files) - 20} more")
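The grid-based initialization recommended in the summary (alternating rows at 0° and 180°, with the number of trees per row tuned) can be sketched as follows. This is not zaburo's actual code. The names `grid_layout` and `best_grid` and the pitches `dx = 0.71`, `dy = 1.01` are assumptions, chosen slightly larger than the 0.7 × 1.0 tree bounding box so that the layout is overlap-free by construction.

```python
import numpy as np

# Tree polygon vertices, copied from the scoring cell.
TX = np.array([0,0.125,0.0625,0.2,0.1,0.35,0.075,0.075,-0.075,-0.075,-0.35,-0.1,-0.2,-0.0625,-0.125])
TY = np.array([0.8,0.5,0.5,0.25,0.25,0,0,-0.2,-0.2,0,0,0.25,0.25,0.5,0.5])

def bounding_side(xs, ys, degs):
    """Side of the bounding square of all trees."""
    rad = np.radians(np.asarray(degs, dtype=float))[:, None]
    c, s = np.cos(rad), np.sin(rad)
    px = TX * c - TY * s + np.asarray(xs, dtype=float)[:, None]
    py = TX * s + TY * c + np.asarray(ys, dtype=float)[:, None]
    return max(px.max() - px.min(), py.max() - py.min())

def grid_layout(n, cols, dx=0.71, dy=1.01):
    """Place n trees row by row, `cols` per row; even rows at 0 deg, odd rows at 180 deg."""
    idx = np.arange(n)
    row, col = idx // cols, idx % cols
    odd = row % 2 == 1
    degs = np.where(odd, 180.0, 0.0)
    xs = col * dx
    # A tree rotated by 180 deg spans [y - 0.8, y + 0.2]; shifting it up by 0.6
    # puts it in the same vertical slab as an upright tree of that row.
    ys = row * dy + np.where(odd, 0.6, 0.0)
    return xs, ys, degs

def best_grid(n):
    """Try every row width and keep the one with the smallest bounding square."""
    cols = min(range(1, n + 1), key=lambda k: bounding_side(*grid_layout(n, k)))
    return cols, grid_layout(n, cols)

cols, (xs, ys, degs) = best_grid(10)
print(f"N=10: {cols} trees per row, side={bounding_side(xs, ys, degs):.4f}")
```

With these conservative pitches the rows do not interlock, so the result is only a starting point. Tightening `dx` and `dy` so that inverted rows nest between upright ones would need an explicit overlap check, which this sketch leaves out.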