# Loop 20 Strategic Analysis

## Current Situation
- **Best LB**: 70.3535 (exp_016)
- **Best CV**: 70.3166 (020_optimal_whynot_ensemble) - NOT SUBMITTED!
- **Target**: 68.8768
- **Gap**: 1.44 points (2.1%) from best CV; 1.48 points from best LB

## Key Findings
1. The MIN_IMPROVEMENT=0.001 threshold was too conservative
2. There's a better ensemble (70.3166) ready to submit
3. 5/9 submissions failed, 4 of them due to overlapping trees - precision is critical
4. The evaluator correctly identified that the why-not submission should be the base
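
Since most failures were overlap rejections, a local pre-submission check is worth having. The following is a minimal sketch using only the standard library. It assumes the 15-vertex tree outline from `make_polygon_template()` and that the evaluator rejects trees whose interiors intersect (the exact rule, including how touching is treated, is not confirmed here). The names `place`, `trees_overlap`, and the helper functions are hypothetical, and plain floats are used, so near-touching placements remain ambiguous.

```python
import math

# Tree outline: the same 15 vertices produced by make_polygon_template().
TREE = [(0, 0.8), (0.125, 0.5), (0.0625, 0.5), (0.2, 0.25), (0.1, 0.25),
        (0.35, 0.0), (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0),
        (-0.35, 0.0), (-0.1, 0.25), (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5)]
INTERIOR = (0.0, 0.1)  # a point strictly inside the outline (assumption of this sketch)

def _transform(pt, x, y, deg):
    """Rotate pt by deg degrees about the origin, then translate by (x, y)."""
    r = math.radians(deg)
    c, s = math.cos(r), math.sin(r)
    return (c * pt[0] - s * pt[1] + x, s * pt[0] + c * pt[1] + y)

def place(x, y, deg):
    """Return the tree outline placed at (x, y) with rotation deg."""
    return [_transform(p, x, y, deg) for p in TREE]

def _cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def _segments_cross(p1, p2, q1, q2):
    """True if the segments cross at a single interior point (touching is ignored)."""
    d1, d2 = _cross(q1, q2, p1), _cross(q1, q2, p2)
    d3, d4 = _cross(p1, p2, q1), _cross(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0

def _contains(poly, pt):
    """Ray-casting point-in-polygon test."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_hit = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_hit > x:
                inside = not inside
    return inside

def trees_overlap(a, b):
    """a and b are (x, y, deg) tuples; True if the two trees appear to overlap."""
    pa, pb = place(*a), place(*b)
    n, m = len(pa), len(pb)
    for i in range(n):
        for j in range(m):
            if _segments_cross(pa[i], pa[(i + 1) % n], pb[j], pb[(j + 1) % m]):
                return True
    # No edges cross: still catch the case of (nearly) coincident placements.
    return _contains(pa, _transform(INTERIOR, *b)) or _contains(pb, _transform(INTERIOR, *a))
```

Running `trees_overlap` over every pair within a group before writing the CSV would flag suspect groups locally, although a stricter tolerance or exact arithmetic may be needed to match the evaluator.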

In [1]:
import pandas as pd
import numpy as np
import json

# Load session state
with open('/home/code/session_state.json') as f:
    state = json.load(f)

# Analyze submissions
print("=== SUBMISSION HISTORY ===")
for s in state.get('submissions', []):
    status = '✅' if s.get('lb_score') else '❌'
    lb = s.get('lb_score', 'FAILED')
    error = s.get('error', '')
    print(f"{status} {s['experiment_id']}: CV={s['cv_score']:.4f}, LB={lb}, error={error[:30] if error else 'None'}")

print(f"\nTotal submissions: {len(state.get('submissions', []))}")
print(f"Successful: {sum(1 for s in state.get('submissions', []) if s.get('lb_score'))}")
print(f"Failed: {sum(1 for s in state.get('submissions', []) if not s.get('lb_score'))}")
print(f"Remaining: {state.get('remaining_submissions', 100)}")

# Best scores
print("\n=== BEST SCORES ===")
exps = sorted(state['experiments'], key=lambda x: x.get('cv_score', 999))
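for e in exps[:5]:
    # Format the score only when present; applying ':.6f' to the 'N/A' fallback raises ValueError
    cv = e.get('cv_score')
    cv_str = f"{cv:.6f}" if cv is not None else "N/A"
    print(f"{e['name']}: CV={cv_str}")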

=== SUBMISSION HISTORY ===
❌ exp_000: CV=70.5233, LB=, error=Overlapping trees in group 002
✅ exp_001: CV=70.6151, LB=70.615101885765, error=None
✅ exp_002: CV=70.6151, LB=70.615101423027, error=None
❌ exp_007: CV=70.2657, LB=, error=Evaluation metric raised an un
❌ exp_008: CV=70.3732, LB=, error=Overlapping trees in group 002
❌ exp_009: CV=70.3411, LB=, error=Overlapping trees in group 123
✅ exp_010: CV=70.3651, LB=70.365091304619, error=None
❌ exp_013: CV=70.3421, LB=, error=Overlapping trees in group 089
✅ exp_016: CV=70.3535, LB=70.353515934637, error=None

Total submissions: 9
Successful: 4
Failed: 5
Remaining: 100

=== BEST SCORES ===
007_ensemble_fractional: CV=70.265730
009_highprec_ensemble: CV=70.341099
013_selective_threshold: CV=70.342140
019_comprehensive_external_ensemble: CV=70.343408
016_mega_ensemble_external: CV=70.353516


In [2]:
# Check the 020_optimal_whynot_ensemble
import os

ensemble_path = '/home/code/experiments/020_optimal_whynot_ensemble'
if os.path.exists(ensemble_path):
    with open(f'{ensemble_path}/metrics.json') as f:
        metrics = json.load(f)
    print("=== 020_optimal_whynot_ensemble ===")
    print(f"CV Score: {metrics.get('cv_score', 'N/A')}")
    print(f"Improvements: {metrics.get('num_improvements', 'N/A')}")
    print(f"\nThis is BETTER than best LB (70.3535) but NOT SUBMITTED!")
    print(f"Gap to target: {metrics.get('cv_score', 70.3166) - 68.8768:.4f} points")

=== 020_optimal_whynot_ensemble ===
CV Score: 70.316579
Improvements: 156

This is BETTER than best LB (70.3535) but NOT SUBMITTED!
Gap to target: 1.4398 points


In [3]:
# Analyze per-N scores to find where we can improve
from numba import njit
import math

@njit
def make_polygon_template():
    tw=0.15; th=0.2; bw=0.7; mw=0.4; ow=0.25
    tip=0.8; t1=0.5; t2=0.25; base=0.0; tbot=-th
    x=np.array([0,ow/2,ow/4,mw/2,mw/4,bw/2,tw/2,tw/2,-tw/2,-tw/2,-bw/2,-mw/4,-mw/2,-ow/4,-ow/2],np.float64)
    y=np.array([tip,t1,t1,t2,t2,base,base,tbot,tbot,base,base,t2,t2,t1,t1],np.float64)
    return x,y

@njit
def score_group(xs,ys,degs,tx,ty):
    n=xs.size; V=tx.size
    mnx=1e300; mny=1e300; mxx=-1e300; mxy=-1e300
    for i in range(n):
        r=degs[i]*math.pi/180.0
        c=math.cos(r); s=math.sin(r)
        xi=xs[i]; yi=ys[i]
        for j in range(V):
            X=c*tx[j]-s*ty[j]+xi
            Y=s*tx[j]+c*ty[j]+yi
            if X<mnx: mnx=X
            if X>mxx: mxx=X
            if Y<mny: mny=Y
            if Y>mxy: mxy=Y
    side=max(mxx-mnx,mxy-mny)
    return side*side/n

def strip(a):
    return np.array([float(str(v).replace("s","")) for v in a],np.float64)

tx, ty = make_polygon_template()

# Load current best submission
df = pd.read_csv('/home/submission/submission.csv')
df['N'] = df['id'].str.split('_').str[0].astype(int)

per_n_scores = {}
for n in range(1, 201):
    g = df[df['N'] == n]
    xs = strip(g['x'].to_numpy())
    ys = strip(g['y'].to_numpy())
    ds = strip(g['deg'].to_numpy())
    per_n_scores[n] = score_group(xs, ys, ds, tx, ty)

print("=== TOP 20 HIGHEST SCORING N VALUES ===")
sorted_n = sorted(per_n_scores.items(), key=lambda x: x[1], reverse=True)
for n, score in sorted_n[:20]:
    print(f"N={n}: {score:.6f}")

print(f"\nTotal: {sum(per_n_scores.values()):.6f}")

=== TOP 20 HIGHEST SCORING N VALUES ===
N=1: 0.661250
N=2: 0.450779
N=3: 0.434745
N=5: 0.416850
N=4: 0.416545
N=7: 0.399842
N=6: 0.399610
N=8: 0.385407
N=9: 0.383047
N=10: 0.376630
N=11: 0.374921
N=15: 0.374381
N=12: 0.372724
N=13: 0.372267
N=20: 0.371795
N=16: 0.370191
N=17: 0.370040
N=22: 0.369818
N=14: 0.369543
N=33: 0.369347

Total: 70.316579


In [4]:
# Calculate theoretical lower bound
# For N trees, the minimum bounding box is limited by the tree size
# Tree dimensions: width ~0.7, height ~1.0 (from -0.2 to 0.8)

print("=== THEORETICAL ANALYSIS ===")
print("Tree dimensions: width=0.7, height=1.0")
print("Tree area: ~0.245 (approximate)")
print("")

# For N=1 the score is the squared side of the bounding square of a single tree.
# At 0 degrees the side equals the tree height (1.0); rotating trades height for width,
# so the best angle is found by brute force over whole degrees below.
import math

def tree_bbox_at_angle(angle_deg):
    """Calculate bounding box of tree at given angle"""
    # Convex hull of the tree outline; only these vertices affect the bounding box
    vertices = [
        (0, 0.8),  # tip
        (0.35, 0), (0.075, -0.2),  # right side
        (-0.075, -0.2), (-0.35, 0),  # left side
    ]
    
    rad = math.radians(angle_deg)
    c, s = math.cos(rad), math.sin(rad)
    
    xs = [c*x - s*y for x, y in vertices]
    ys = [s*x + c*y for x, y in vertices]
    
    return max(xs) - min(xs), max(ys) - min(ys)

# Find optimal angle for N=1
best_angle = 0
best_side = float('inf')
for angle in range(0, 360):
    w, h = tree_bbox_at_angle(angle)
    side = max(w, h)
    if side < best_side:
        best_side = side
        best_angle = angle

print(f"N=1 optimal angle: {best_angle} degrees")
print(f"N=1 optimal side: {best_side:.4f}")
print(f"N=1 optimal score: {best_side**2:.4f}")
print(f"N=1 current score: {per_n_scores[1]:.4f}")
print(f"N=1 is already optimal: {abs(per_n_scores[1] - best_side**2) < 0.001}")

=== THEORETICAL ANALYSIS ===
Tree dimensions: width=0.7, height=1.0
Tree area: ~0.245 (approximate)

N=1 optimal angle: 45 degrees
N=1 optimal side: 0.8132
N=1 optimal score: 0.6613
N=1 current score: 0.6612
N=1 is already optimal: True
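
The 45 degree result can also be checked by hand from the five hull vertices used in the cell above. At 45 degrees, $\cos\theta = \sin\theta = 1/\sqrt{2}$, so each vertex $(x, y)$ maps to

$$
x' = \frac{x - y}{\sqrt{2}}, \qquad y' = \frac{x + y}{\sqrt{2}}.
$$

The extremes of $x - y$ are $-0.8$ (tip) and $0.35$ (right base corner), and the extremes of $x + y$ are $0.8$ (tip) and $-0.35$ (left base corner), so width and height coincide:

$$
\text{side} = \frac{0.35 + 0.8}{\sqrt{2}} = \frac{1.15}{\sqrt{2}} \approx 0.8132, \qquad \text{score} = \frac{1.15^2}{2} = 0.66125.
$$

This matches the per-N score of 0.661250 reported for N=1. The search above steps in whole degrees, so this only confirms the value at 45 degrees and does not by itself prove that no fractional angle does better.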


## Strategic Recommendations

### IMMEDIATE ACTION: Submit 020_optimal_whynot_ensemble
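The current submission in /home/submission/ has a CV score of 70.3166, which is 0.037 better than the best LB score (70.3535). It should be submitted immediately; because 4 of the 9 earlier submissions were rejected for overlapping trees, only the LB result will confirm that it is valid.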

### Gap Analysis
- Current best (CV): 70.3166
- Target: 68.8768
- Gap: 1.44 points (2.1%)
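
The gap figures can be reproduced with a few lines of arithmetic. This is a small sketch using the scores quoted in this analysis. The percentage is taken relative to the target, and the 200 groups match the N=1..200 range of the per-N scoring loop. The variable names are illustrative only.

```python
best_cv = 70.3166   # 020_optimal_whynot_ensemble (CV, not yet on the LB)
best_lb = 70.3535   # exp_016
target = 68.8768
n_groups = 200      # N = 1..200, as in the per-N scoring loop

gap_cv = best_cv - target          # gap measured from the best CV score
gap_lb = best_lb - target          # gap measured from the best LB score
gap_pct = gap_cv / target          # relative gap, as a fraction of the target
per_group = gap_cv / n_groups      # average per-N reduction needed to reach the target

print(f"Gap from best CV: {gap_cv:.4f} ({gap_pct:.1%})")
print(f"Gap from best LB: {gap_lb:.4f}")
print(f"Average reduction needed per N: {per_group:.4f}")
```

The per-group figure puts the gap in perspective: closing it would take an average improvement of roughly 0.007 in every one of the 200 groups.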

### What Has Worked
1. **Ensemble approach** - combining best per-N from multiple sources
2. **External data** - downloading kernel outputs and datasets
3. **Lowering MIN_IMPROVEMENT threshold** - from 0.001 to 1e-10
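
The per-N ensemble and the effect of the threshold can be sketched as follows. This is an illustration, not the code used in the experiments: `best_per_n`, the dictionary layout, the source name `ext_a`, and the toy scores are all hypothetical.

```python
MIN_IMPROVEMENT = 1e-10  # threshold after the change described above (previously 0.001)

def best_per_n(base, candidates, min_improvement=MIN_IMPROVEMENT):
    """Pick, for each N, the lowest-scoring source (lower is better).

    base:       {n: score} for the base submission
    candidates: {source_name: {n: score}} for alternative sources
    Returns (choice, total, num_improvements), where choice maps n -> source name.
    """
    choice = {n: "base" for n in base}
    best = dict(base)
    for name, scores in candidates.items():
        for n, score in scores.items():
            # Replace only when the gain over the current best exceeds the threshold.
            if n in best and best[n] - score > min_improvement:
                best[n] = score
                choice[n] = name
    num_improvements = sum(1 for src in choice.values() if src != "base")
    return choice, sum(best.values()), num_improvements

# Toy scores, made up for illustration only.
base = {1: 0.66125, 2: 0.4510, 3: 0.4350}
candidates = {"ext_a": {2: 0.4508, 3: 0.4349995}}

choice_new, total_new, n_new = best_per_n(base, candidates)
choice_old, total_old, n_old = best_per_n(base, candidates, min_improvement=0.001)
```

With the toy numbers, the `1e-10` threshold accepts both small gains while the `0.001` threshold accepts neither, which is the sense in which the old setting was too conservative. Any group swapped in this way still has to pass overlap validation before submission.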

### What Has NOT Worked
1. **Python SA/GA optimization** - baseline is at a strong local optimum
2. **Exhaustive search** - too slow for N>10
3. **Random initialization** - cannot generate valid configurations

### Path Forward
1. **Submit the current best** (70.3166) to get LB feedback
2. **Continue ensemble building** with more external sources
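3. **Focus on high-scoring N values** (N=1-10 have the highest per-group scores, but together they sum to about 4.32 of the 70.32 total and N=1 is already optimal, so gains at larger N are also needed)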
4. **Consider running bbox3** for extended periods if available