# Loop 8 LB Feedback Analysis

## Submission Results
- exp_007 (rotation_backprop): CV=70.6151, LB=70.6151 (EXACT MATCH!)
- Gap to target (68.888293): 1.727 points (2.45% of our score; 2.51% relative to the target)

## Key Observations
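1. CV = LB exactly: our local scorer reproduces the leaderboard metric. It does not detect overlaps, though. Both 000 and 005 were rejected on submission for overlapping trees
2. Improvements among valid submissions are diminishing: 0.0067 (001 → 006) → 0.000637 (006 → 007). The 0.099 gain from 005 does not count, because that submission was rejected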
3. Local search (rotation, backward propagation) finds minimal improvements
4. The baseline is at a tight local optimum

In [1]:
import pandas as pd
import numpy as np
import json

# Load session state to analyze experiments
with open('/home/code/session_state.json', 'r') as f:
    state = json.load(f)

print("Experiment History:")
print("="*60)
for exp in state['experiments']:
    print(f"{exp['name']}: CV={exp['cv_score']:.6f}")

print("\nSubmission History:")
print("="*60)
for sub in state['submissions']:
    lb = sub.get('lb_score', 'N/A')
    error = sub.get('error', None)
    print(f"{sub['model_name']}: CV={sub['cv_score']:.6f}, LB={lb}, Error={error}")

Experiment History:
000_baseline: CV=70.615791
001_fix_overlaps: CV=70.622435
002_python_optimization: CV=70.622435
003_simulated_annealing: CV=70.622435
004_ensemble_constructive: CV=70.622435
005_multi_source_ensemble: CV=70.523320
006_validated_ensemble: CV=70.615744
007_rotation_backprop: CV=70.615107

Submission History:
000_baseline: CV=70.615791, LB=, Error=Overlapping trees in group 040
001_fix_overlaps: CV=70.622435, LB=70.622434913735, Error=None
002_python_optimization: CV=70.622435, LB=70.622434913735, Error=None
005_multi_source_ensemble: CV=70.523320, LB=, Error=Overlapping trees in group 002
006_validated_ensemble: CV=70.615744, LB=70.615743775752, Error=None
007_rotation_backprop: CV=70.615107, LB=70.615106516706, Error=None


In [2]:
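# Analyze improvement trajectory (lower score is better)
TARGET = 68.888293
scores = [exp['cv_score'] for exp in state['experiments']]
names = [exp['name'] for exp in state['experiments']]

print("\nScore Trajectory:")
print("="*60)
for i, (name, score) in enumerate(zip(names, scores)):
    if i > 0:
        # Δ = previous score - current score, so a positive Δ is an improvement
        diff = scores[i-1] - score
        print(f"{name}: {score:.6f} (Δ={diff:+.6f})")
    else:
        print(f"{name}: {score:.6f} (baseline)")

# NOTE: min() runs over every experiment, including 005_multi_source_ensemble,
# which the LB rejected for overlapping trees. The best *valid* score is
# 007_rotation_backprop (70.615107), so the gap printed below is understated.
best = min(scores)
print(f"\nBest score: {best:.6f}")
print(f"Target: {TARGET:.6f}")
print(f"Gap: {best - TARGET:.6f} points ({(best - TARGET)/TARGET*100:.2f}%)")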


Score Trajectory:
000_baseline: 70.615791 (baseline)
001_fix_overlaps: 70.622435 (Δ=-0.006644)
002_python_optimization: 70.622435 (Δ=+0.000000)
003_simulated_annealing: 70.622435 (Δ=+0.000000)
004_ensemble_constructive: 70.622435 (Δ=+0.000000)
005_multi_source_ensemble: 70.523320 (Δ=+0.099115)
006_validated_ensemble: 70.615744 (Δ=-0.092424)
007_rotation_backprop: 70.615107 (Δ=+0.000637)

Best score: 70.523320
Target: 68.888293
Gap: 1.635027 points (2.37%)
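Filtering on the submission error field gives the trajectory of valid runs only. A minimal sketch over the logged values, copied by hand from the cell 1 output; in the notebook they would come from `state['submissions']`, where we assume (as the printout suggests) that `error` is `None` for accepted submissions:

```python
# Submission log copied from the cell 1 output (LB is None when rejected).
SUBMISSIONS = [
    ("000_baseline", 70.615791, None, "Overlapping trees in group 040"),
    ("001_fix_overlaps", 70.622435, 70.622434913735, None),
    ("002_python_optimization", 70.622435, 70.622434913735, None),
    ("005_multi_source_ensemble", 70.523320, None, "Overlapping trees in group 002"),
    ("006_validated_ensemble", 70.615744, 70.615743775752, None),
    ("007_rotation_backprop", 70.615107, 70.615106516706, None),
]
TARGET = 68.888293

# Keep only submissions the leaderboard accepted.
valid = [(name, lb) for name, _cv, lb, error in SUBMISSIONS if error is None]

# Δ between consecutive valid submissions (positive = improvement).
deltas = [(valid[i][0], valid[i - 1][1] - valid[i][1]) for i in range(1, len(valid))]

best_name, best_lb = min(valid, key=lambda item: item[1])
gap = best_lb - TARGET

for name, d in deltas:
    print(f"{name}: Δ={d:+.6f}")
print(f"Best valid: {best_name} ({best_lb:.6f}), "
      f"gap {gap:.6f} ({gap / TARGET * 100:.2f}% of target)")
```

The last two Δ values (0.006691 and 0.000637) are the diminishing gains quoted in Key Observations.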


In [3]:
# Key insight: The jonathanchan kernel uses 15+ diverse sources
# We only have 88 snapshots, and they're all from similar optimization approaches

# Let's check what sources we have vs what jonathanchan uses
print("Sources jonathanchan uses:")
print("="*60)
sources = [
    "GitHub: SmartManoj/Santa-Scoreboard",
    "Kaggle dataset: bucket-of-chump",
    "Kaggle dataset: telegram-public-shared-solution",
    "Kaggle dataset: santa25-public",
    "Kaggle dataset: santa-2025-try3",
    "15+ different notebooks with different optimization approaches"
]
for s in sources:
    print(f"  - {s}")

print("\nSources we have:")
print("="*60)
print("  - 88 snapshot submissions (similar optimization approaches)")
print("  - GitHub SmartManoj (can download!)")
print("  - Our own experiments (same local optimum)")

print("\nGAP: We need MORE DIVERSE sources!")

Sources jonathanchan uses:
  - GitHub: SmartManoj/Santa-Scoreboard
  - Kaggle dataset: bucket-of-chump
  - Kaggle dataset: telegram-public-shared-solution
  - Kaggle dataset: santa25-public
  - Kaggle dataset: santa-2025-try3
  - 15+ different notebooks with different optimization approaches

Sources we have:
  - 88 snapshot submissions (similar optimization approaches)
  - GitHub SmartManoj (can download!)
  - Our own experiments (same local optimum)

GAP: We need MORE DIVERSE sources!


In [4]:
# Download GitHub source and analyze
import subprocess

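# -f: exit non-zero on HTTP errors (otherwise a 404 page would be saved as the CSV)
# -sS: silent, but still report errors; -L: follow redirects
result = subprocess.run(
    ['curl', '-fsSL', 'https://raw.githubusercontent.com/SmartManoj/Santa-Scoreboard/main/submission.csv'],
    capture_output=True, text=True
)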

if result.returncode == 0:
    # Save to file
    with open('/home/code/github_smartmanoj.csv', 'w') as f:
        f.write(result.stdout)
    print("Downloaded GitHub SmartManoj submission!")
    
    # Load and analyze
    df = pd.read_csv('/home/code/github_smartmanoj.csv')
    print(f"Shape: {df.shape}")
    print(f"Columns: {df.columns.tolist()}")
    print(df.head())
else:
    print(f"Failed to download: {result.stderr}")

Downloaded GitHub SmartManoj submission!
Shape: (20100, 4)
Columns: ['id', 'x', 'y', 'deg']
      id                       x                      y  \
0  001_0  s-48.19608619421424578  s58.77098461521422479   
1  002_0    s0.15409706962136058  s-0.03854074269477708   
2  002_1   s-0.15409706962135647  s-0.56145925730522794   
3  003_0    s1.12365581614030097   s0.78110181599256301   
4  003_1    s1.23405569584216002   s1.27599950066375900   

                      deg  
0   s45.00000000000000000  
1  s203.62937773064953717  
2   s23.62937773064970415  
3  s111.12513229289299943  
4   s66.37062226934300213  
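Every coordinate arrives as a string with an `s` prefix (presumably so the values stay exact decimal strings). The `strip` helper in the next cell parses them one element at a time. Below is a minimal vectorized sketch using pandas string methods. `parse_s_column`, `parse_s_exact` and `add_group_column` are hypothetical helper names:

```python
from decimal import Decimal

import numpy as np
import pandas as pd


def parse_s_column(col: pd.Series) -> np.ndarray:
    """'s-48.19...' -> -48.19... as float64, stripping the leading 's'."""
    return col.astype(str).str.lstrip("s").astype(np.float64).to_numpy()


def parse_s_exact(col: pd.Series) -> list:
    """Same values as Decimal, keeping every digit of the original string."""
    return [Decimal(v.lstrip("s")) for v in col.astype(str)]


def add_group_column(df: pd.DataFrame) -> pd.DataFrame:
    """Derive N (the group size) from ids like '002_1'."""
    out = df.copy()
    out["N"] = out["id"].astype(str).str.split("_").str[0].astype(int)
    return out
```

Inside a `groupby('N')` loop, `parse_s_column(g['x'])` can stand in for `strip(g['x'].to_numpy())`. `parse_s_exact` is only worth the cost when float64 rounding matters.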


In [5]:
# Score the GitHub submission
from numba import njit
import math

@njit
def make_polygon_template():
    # 15-vertex Christmas-tree outline (tip, three tiers, trunk), centred on x=0
    # with the trunk hanging below y=0
    tw=0.15; th=0.2; bw=0.7; mw=0.4; ow=0.25
    tip=0.8; t1=0.5; t2=0.25; base=0.0; tbot=-th
    x=np.array([0,ow/2,ow/4,mw/2,mw/4,bw/2,tw/2,tw/2,-tw/2,-tw/2,-bw/2,-mw/4,-mw/2,-ow/4,-ow/2],np.float64)
    y=np.array([tip,t1,t1,t2,t2,base,base,tbot,tbot,base,base,t2,t2,t1,t1],np.float64)
    return x,y

@njit
def score_group(xs,ys,degs,tx,ty):
    # Group score = (side of the axis-aligned bounding square)^2 / n, taken over
    # every rotated + translated vertex. Overlaps are NOT checked here.
    n=xs.size; V=tx.size
    mnx=1e300; mny=1e300; mxx=-1e300; mxy=-1e300
    for i in range(n):
        r=degs[i]*math.pi/180.0
        c=math.cos(r); s=math.sin(r)
        xi=xs[i]; yi=ys[i]
        for j in range(V):
            X=c*tx[j]-s*ty[j]+xi
            Y=s*tx[j]+c*ty[j]+yi
            if X<mnx: mnx=X
            if X>mxx: mxx=X
            if Y<mny: mny=Y
            if Y>mxy: mxy=Y
    side=max(mxx-mnx,mxy-mny)
    return side*side/n

def strip(a):
    # Submission values are strings with an 's' prefix, e.g. 's0.154...'
    return np.array([float(str(v).replace("s","")) for v in a],np.float64)

tx, ty = make_polygon_template()

# Score GitHub submission
df = pd.read_csv('/home/code/github_smartmanoj.csv')
df['N'] = df['id'].astype(str).str.split('_').str[0].astype(int)

github_scores = {}
for n, g in df.groupby('N'):
    xs = strip(g['x'].to_numpy())
    ys = strip(g['y'].to_numpy())
    ds = strip(g['deg'].to_numpy())
    github_scores[n] = score_group(xs, ys, ds, tx, ty)

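github_total = sum(github_scores.values())
# NOTE: min(scores) is 005_multi_source_ensemble (70.523320), which the LB
# rejected for overlaps. Against our best valid submission (007, 70.615107)
# the GitHub file is about 0.129 points worse.
print(f"GitHub SmartManoj total score: {github_total:.6f}")
print(f"Our best score: {min(scores):.6f}")
print(f"Difference: {min(scores) - github_total:.6f}")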

GitHub SmartManoj total score: 70.743774
Our best score: 70.523320
Difference: -0.220454
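`score_group` measures only the bounding square. It never checks whether trees overlap, and that is exactly what sank 000_baseline and 005_multi_source_ensemble on submission. Below is a pure-Python pre-submission screen. It is our own approximation, not the official check. It flags a pair of trees when their edges properly cross, when a vertex of one lies inside the other, or when an assumed interior reference point of one lies inside the other. Trees that merely touch can be reported as overlapping, so the official metric (or a polygon library) should still have the final say:

```python
import math
from itertools import combinations

# 15-vertex tree outline, same constants as make_polygon_template() above.
TW, TH, BW, MW, OW = 0.15, 0.2, 0.7, 0.4, 0.25
TIP, T1, T2, BASE, TBOT = 0.8, 0.5, 0.25, 0.0, -TH
TEMPLATE = [
    (0.0, TIP), (OW / 2, T1), (OW / 4, T1), (MW / 2, T2), (MW / 4, T2),
    (BW / 2, BASE), (TW / 2, BASE), (TW / 2, TBOT), (-TW / 2, TBOT),
    (-TW / 2, BASE), (-BW / 2, BASE), (-MW / 4, T2), (-MW / 2, T2),
    (-OW / 4, T1), (-OW / 2, T1),
]
# Assumption: a point strictly inside the untransformed tree (lowest tier).
INTERIOR_POINT = (0.0, 0.1)


def place(tree, pts):
    """Rotate pts by the tree's deg about the origin, then translate to (x, y)."""
    x, y, deg = tree
    r = math.radians(deg)
    c, s = math.cos(r), math.sin(r)
    return [(c * px - s * py + x, s * px + c * py + y) for px, py in pts]


def orient(a, b, c):
    """Twice the signed area of triangle abc (> 0 means counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, q1, q2, eps=1e-12):
    """True only for a proper crossing (each segment straddles the other)."""
    d1, d2 = orient(q1, q2, p1), orient(q1, q2, p2)
    d3, d4 = orient(p1, p2, q1), orient(p1, p2, q2)
    return (((d1 > eps and d2 < -eps) or (d1 < -eps and d2 > eps)) and
            ((d3 > eps and d4 < -eps) or (d3 < -eps and d4 > eps)))


def point_in_polygon(pt, poly):
    """Even-odd ray casting; points exactly on the boundary are ambiguous."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def trees_overlap(a, b):
    """Approximate interior-overlap test for two (x, y, deg) trees."""
    pa, pb = place(a, TEMPLATE), place(b, TEMPLATE)
    # Cheap rejection: disjoint axis-aligned bounding boxes cannot overlap.
    if (max(p[0] for p in pa) < min(p[0] for p in pb) or
            max(p[0] for p in pb) < min(p[0] for p in pa) or
            max(p[1] for p in pa) < min(p[1] for p in pb) or
            max(p[1] for p in pb) < min(p[1] for p in pa)):
        return False
    edges_a = list(zip(pa, pa[1:] + pa[:1]))
    edges_b = list(zip(pb, pb[1:] + pb[:1]))
    if any(segments_cross(p1, p2, q1, q2) for p1, p2 in edges_a for q1, q2 in edges_b):
        return True
    # No proper crossing: still overlapping if one tree reaches inside the other
    # (this also catches identical placements, whose edges are all collinear).
    probes_a = pa + place(a, [INTERIOR_POINT])
    probes_b = pb + place(b, [INTERIOR_POINT])
    return (any(point_in_polygon(p, pb) for p in probes_a) or
            any(point_in_polygon(p, pa) for p in probes_b))


def find_overlaps(trees):
    """Index pairs (i, j) of trees in one group that appear to overlap."""
    return [(i, j) for (i, a), (j, b) in combinations(enumerate(trees), 2)
            if trees_overlap(a, b)]
```

The pairwise loop is O(n²) per group, which is still cheap for n ≤ 200 thanks to the bounding-box rejection. For example, `find_overlaps(list(zip(xs, ys, ds)))` on each group before writing a submission.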


In [6]:
# Compare per-N scores between GitHub and our best
our_best = pd.read_csv('/home/code/experiments/007_rotation_backprop/submission.csv')
our_best['N'] = our_best['id'].astype(str).str.split('_').str[0].astype(int)

our_scores = {}
for n, g in our_best.groupby('N'):
    xs = strip(g['x'].to_numpy())
    ys = strip(g['y'].to_numpy())
    ds = strip(g['deg'].to_numpy())
    our_scores[n] = score_group(xs, ys, ds, tx, ty)

print("Per-N comparison (GitHub vs Ours):")
print("="*60)
github_better = []
ours_better = []
for n in range(1, 201):
    g_score = github_scores.get(n, float('inf'))
    o_score = our_scores.get(n, float('inf'))
    # diff > 0: GitHub's group is smaller (better); |diff| <= 1e-4 counts as a tie
    diff = o_score - g_score
    if diff > 0.0001:
        github_better.append((n, diff))
    elif diff < -0.0001:
        ours_better.append((n, -diff))

print(f"\nN values where GitHub is better: {len(github_better)}")
for n, diff in sorted(github_better, key=lambda x: -x[1])[:10]:
    print(f"  N={n}: GitHub better by {diff:.6f}")

print(f"\nN values where we are better: {len(ours_better)}")
for n, diff in sorted(ours_better, key=lambda x: -x[1])[:10]:
    print(f"  N={n}: We are better by {diff:.6f}")

Per-N comparison (GitHub vs Ours):

N values where GitHub is better: 0

N values where we are better: 78
  N=14: We are better by 0.010855
  N=74: We are better by 0.005957
  N=93: We are better by 0.005833
  N=57: We are better by 0.005594
  N=54: We are better by 0.005060
  N=197: We are better by 0.004210
  N=87: We are better by 0.003890
  N=193: We are better by 0.003858
  N=143: We are better by 0.003551
  N=73: We are better by 0.003326
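The same per-N comparison is the core of ensembling: for every N, keep whichever source has the lowest group score. Below is a minimal sketch of that selection. It assumes each source is already scored into a `{N: score}` dict like `github_scores` and `our_scores`; `best_per_n` and `assemble` are hypothetical helper names. Because 005_multi_source_ensemble was rejected for an overlap in group 002, every selected group should still pass an overlap check before the file is submitted.

```python
import pandas as pd


def best_per_n(source_scores):
    """{source: {N: score}} -> {N: (source, score)}; lowest score wins, ties keep the first source."""
    best = {}
    for source, per_n in source_scores.items():
        for n, score in per_n.items():
            if n not in best or score < best[n][1]:
                best[n] = (source, score)
    return best


def assemble(frames, choice):
    """Stack the chosen group of each N; every frames[source] needs an 'N' column."""
    parts = [frames[src][frames[src]["N"] == n] for n, (src, _) in sorted(choice.items())]
    return pd.concat(parts, ignore_index=True)
```

For example, `choice = best_per_n({'ours': our_scores, 'github': github_scores})`, then `assemble({'ours': our_best, 'github': df}, choice)`. The sum of the chosen scores is the ensemble's expected total.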


## Strategy for Next Experiment

### Key Insight
The jonathanchan kernel achieves better scores by:
1. **Ensembling from 15+ diverse sources** - not just snapshots
2. **Running C++ simulated annealing** with fractional translation
3. **Accumulating improvements** over many submissions

### What We Need to Do
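1. **Download more diverse sources**: the Kaggle datasets listed above. The GitHub SmartManoj file adds nothing on its own, since it is not better than ours by more than 1e-4 for any N
2. **Implement fractional translation in Python**: the fine-tuning step the jonathanchan kernel runs alongside its C++ simulated annealing
3. **Create a better ensemble** from the more diverse sources, validating every selected group for overlaps before submitting (005 was rejected for exactly this)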
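Fractional translation can be prototyped in Python before anything is ported to C++. The sketch below follows our reading of the term, which is an assumption and not the kernel's actual code. It nudges one tree at a time by a small step in each of eight directions. A move is kept only if the group's bounding-square score drops and the layout is still valid. Once a pass stops improving, it moves on to a smaller step. The step schedule, `max_passes`, and the `is_valid` callback are all hypothetical parameters:

```python
import math

# Same 15-vertex outline as make_polygon_template() in cell 5.
TX = [0.0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075,
      -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0.0, 0.0, -0.2, -0.2, 0.0,
      0.0, 0.25, 0.25, 0.5, 0.5]

# Hypothetical schedule: moves along the axes and diagonals, shrinking step sizes.
DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (1, -1), (-1, 1), (-1, -1)]
STEPS = [1e-3, 5e-4, 2e-4, 1e-4, 5e-5, 2e-5, 1e-5]


def group_score(trees):
    """(side of the bounding square)^2 / n, the same formula as score_group."""
    xs, ys = [], []
    for x, y, deg in trees:
        r = math.radians(deg)
        c, s = math.cos(r), math.sin(r)
        for px, py in zip(TX, TY):
            xs.append(c * px - s * py + x)
            ys.append(s * px + c * py + y)
    side = max(max(xs) - min(xs), max(ys) - min(ys))
    return side * side / len(trees)


def fractional_translation(trees, is_valid, steps=STEPS, max_passes=20):
    """Greedy small-step translation of (x, y, deg) trees; returns (new_trees, new_score)."""
    trees = [list(t) for t in trees]
    best = group_score(trees)
    for step in steps:
        for _ in range(max_passes):
            improved = False
            for i in range(len(trees)):
                for dx, dy in DIRECTIONS:
                    old = trees[i][:]
                    trees[i][0] += dx * step
                    trees[i][1] += dy * step
                    if is_valid(trees):
                        score = group_score(trees)
                        if score < best:
                            best, improved = score, True
                            continue  # keep the move and try the next direction
                    trees[i] = old  # revert: invalid layout or no improvement
            if not improved:
                break  # this step size is exhausted; move on to a finer one
    return [tuple(t) for t in trees], best
```

In practice `is_valid` must reject any layout with overlapping trees. Pure Python is far too slow to run this over all 200 groups for many passes, which is presumably why the kernel does it in C++.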

### The Gap Analysis
- Our best: 70.615107
- Target: 68.888293
- Gap: 1.727 points (2.45% of our score; 2.51% relative to the target)

This gap is too large for local search to close: the last valid improvement (006 → 007) was only 0.000637. We need one of:
1. More diverse sources for the ensemble
2. A fundamentally different optimization approach
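The arithmetic behind these figures uses exp_007's score rounded to six decimals and its gain of 0.000637 over 006:

$$
\text{gap} = 70.615107 - 68.888293 = 1.726814
$$

$$
\frac{1.726814}{68.888293} \approx 2.51\%\ \text{(relative to the target)}, \qquad
\frac{1.726814}{70.615107} \approx 2.45\%\ \text{(relative to our score)}
$$

$$
\frac{1.726814}{0.000637} \approx 2711\ \text{further gains the size of exp\_007's}
$$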