# Loop 20 LB Feedback Analysis

## Submission Results
- **Experiment**: 022_cpp_sa_2M_all_n
- **CV Score**: 70.6276
- **LB Score**: 70.6276
- **Gap**: 0.0000 (CV and LB agree exactly, as expected: the score is a deterministic function of the submitted coordinates)

## Current Status
- **Best Score**: 70.6276
- **Target**: 68.919154
- **Gap to Target**: 1.708 (2.48% of the target score)

## Key Observations from 22 Experiments
1. All SA-based approaches converge to ~70.63
2. Asymmetric solutions from scratch FAILED (worse than baseline)
3. C++ SA with 2M iterations finds tiny improvements (0.0003 per run)
4. At this rate, reaching the target would take ~5,700 runs (1.708 / 0.0003), which is infeasible

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Analyze submission history
submissions = [
    ('exp_000', 70.6473, 70.6473),
    ('exp_002', 70.6473, 70.6473),
    ('exp_009', 70.6305, 70.6305),
    ('exp_010', 70.6305, 70.6305),
    ('exp_017', 70.6305, 70.6305),
    ('exp_018', 70.6305, 70.6305),
    ('exp_019', 70.6279, 70.6279),
    ('exp_020', 70.6276, 70.6276),
]

print('Submission History Analysis:')
print('=' * 50)
for exp, cv, lb in submissions:
    print(f'{exp}: CV={cv:.4f}, LB={lb:.4f}, gap={lb-cv:.6f}')

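# Improvement between consecutive submissions (steps with no gain are skipped)
TARGET = 68.919154
improvements = [
    prev[1] - curr[1]
    for prev, curr in zip(submissions, submissions[1:])
    if prev[1] - curr[1] > 0
]
gap_to_target = submissions[-1][1] - TARGET

print(f'\nTotal improvement: {submissions[0][1] - submissions[-1][1]:.6f}')
print(f'\nGap to target: {gap_to_target:.6f}')
if improvements:
    # Averaged over improving submissions only, not over all submissions
    avg_improvement = np.mean(improvements)
    print(f'Average improvement per improving submission: {avg_improvement:.6f}')
    print(f'Improving submissions needed at this rate: {gap_to_target / avg_improvement:.0f}')
else:
    print('No improving submissions yet; improvement rate is undefined')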

In [None]:
# Recompute the score from the submission: side of the bounding square for each N
from shapely.geometry import Polygon
from shapely.affinity import rotate, translate

TREE_TEMPLATE = [
    (0.0, 0.8), (0.125, 0.5), (0.0625, 0.5), (0.2, 0.25), (0.1, 0.25),
    (0.35, 0.0), (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0),
    (-0.35, 0.0), (-0.1, 0.25), (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5)
]

def parse_s_value(val):
    if isinstance(val, str) and val.startswith('s'):
        return float(val[1:])
    return float(val)

def create_tree_polygon(x, y, angle):
    tree = Polygon(TREE_TEMPLATE)
    tree = rotate(tree, angle, origin=(0, 0), use_radians=False)
    tree = translate(tree, x, y)
    return tree

def get_n_side(df, n):
    group = df[df['n'] == n]
    all_x, all_y = [], []
    for _, row in group.iterrows():
        tree = create_tree_polygon(row['x'], row['y'], row['deg'])
        minx, miny, maxx, maxy = tree.bounds
        all_x.extend([minx, maxx])
        all_y.extend([miny, maxy])
    return max(max(all_x) - min(all_x), max(all_y) - min(all_y)) if all_x else 0

# Load current best
df = pd.read_csv('/home/submission/submission.csv')
df['x'] = df['x'].apply(parse_s_value)
df['y'] = df['y'].apply(parse_s_value)
df['deg'] = df['deg'].apply(parse_s_value)
df['n'] = df['id'].apply(lambda x: int(x.split('_')[0]))

print('Current best score:', sum((get_n_side(df, n)**2)/n for n in range(1, 201)))
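The bounding box of a polygon equals the bounding box of its vertices, so `get_n_side` can in principle be reproduced without shapely by rotating the 15 template vertices directly. The sketch below is a minimal illustration under that assumption. The function names (`placed_vertices`, `side_for_group`, `score_submission`) and the dict-of-tuples input format are hypothetical and not part of the pipeline. It also assumes `deg` is a counter-clockwise rotation about the origin, which matches shapely's `rotate(..., origin=(0, 0))` used above.

```python
import math

# Tree vertices, same as TREE_TEMPLATE above (repeated so this block is self-contained)
TREE = [
    (0.0, 0.8), (0.125, 0.5), (0.0625, 0.5), (0.2, 0.25), (0.1, 0.25),
    (0.35, 0.0), (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0),
    (-0.35, 0.0), (-0.1, 0.25), (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5),
]

def placed_vertices(x, y, deg):
    """Rotate the template by deg (counter-clockwise, about the origin), then translate by (x, y)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [(x + c * vx - s * vy, y + s * vx + c * vy) for vx, vy in TREE]

def side_for_group(trees):
    """Side of the smallest axis-aligned square containing all trees; trees = [(x, y, deg), ...]."""
    xs, ys = [], []
    for x, y, deg in trees:
        for px, py in placed_vertices(x, y, deg):
            xs.append(px)
            ys.append(py)
    if not xs:
        return 0.0
    return max(max(xs) - min(xs), max(ys) - min(ys))

def score_submission(groups):
    """groups: dict n -> list of n (x, y, deg) tuples; returns sum of side**2 / n."""
    return sum(side_for_group(trees) ** 2 / n for n, trees in groups.items())
```

For example, a single unrotated tree spans 0.7 x 1.0, so its side is 1.0. Two unrotated trees at x = 0 and x = 1 span 1.7 horizontally, so the N=2 term is 1.7² / 2.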

In [None]:
# Area-based lower bound on the score.
# N non-overlapping trees inside a square of side s need s**2 >= N * tree_area,
# so each per-N term s**2 / N is at least tree_area.

# Exact tree area computed from the polygon (a bounding-triangle estimate would overstate it)
tree_area = Polygon(TREE_TEMPLATE).area
print(f'Tree area: {tree_area:.6f}')

theoretical_scores = []
actual_scores = []
efficiency_gaps = []

for n in range(1, 201):
    # Lower bound for this N: (sqrt(n * tree_area))**2 / n == tree_area
    theoretical_score = tree_area

    # Actual score
    actual_side = get_n_side(df, n)
    actual_score = (actual_side ** 2) / n

    theoretical_scores.append(theoretical_score)
    actual_scores.append(actual_score)
    efficiency_gaps.append(actual_score - theoretical_score)

TARGET = 68.919154
print(f'\nArea lower bound (total): {sum(theoretical_scores):.4f}')
print(f'Actual total: {sum(actual_scores):.4f}')
print(f'Gap between actual and bound: {sum(efficiency_gaps):.4f}')
print(f'\nTarget: {TARGET}')
print(f'Gap between target and bound: {TARGET - sum(theoretical_scores):.4f}')
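The area bound above can be checked without shapely using the shoelace formula. Because the trees do not overlap, their total area N·A must fit inside the square, so s² ≥ N·A and every per-N term s²/N is at least A. Summed over N = 1..200, this gives a total of at least 200·A. The bound is loose because it ignores the space wasted between non-convex shapes. It is useful mainly as a sanity check that the target is not below it. The helper name `polygon_area` is illustrative.

```python
# Tree vertices, same as TREE_TEMPLATE above
TREE = [
    (0.0, 0.8), (0.125, 0.5), (0.0625, 0.5), (0.2, 0.25), (0.1, 0.25),
    (0.35, 0.0), (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0),
    (-0.35, 0.0), (-0.1, 0.25), (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5),
]

def polygon_area(vertices):
    """Shoelace formula; the absolute value makes vertex orientation irrelevant."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

tree_area = polygon_area(TREE)
# Each of the 200 per-N terms side**2 / n is at least tree_area
lower_bound = 200 * tree_area
print(f'tree area = {tree_area:.6f}, area lower bound = {lower_bound:.4f}')
# tree area = 0.245625, area lower bound = 49.1250
```

The polygon decomposes into three stacked tiers plus the trunk: 0.0375 + 0.065625 + 0.1125 + 0.03 = 0.245625. The resulting bound of about 49.1 is well below the 68.919 target, so the target is not ruled out on area grounds alone.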

In [None]:
# Identify N values with the largest per-N score contributions (most room for improvement).
# Note: score_n = side_n**2 / n = efficiency_n**2, where efficiency_n = side_n / sqrt(n),
# so ranking by efficiency and ranking by score give exactly the same order.
tree_area = Polygon(TREE_TEMPLATE).area  # exact area of one tree polygon

n_analysis = []
for n in range(1, 201):
    actual_side = get_n_side(df, n)
    n_analysis.append({
        'n': n,
        'side': actual_side,
        'score': (actual_side ** 2) / n,
        # Efficiency = side / sqrt(n); lower is better
        'efficiency': actual_side / np.sqrt(n),
        # Fraction of the bounding square covered by trees; higher is better
        'density': n * tree_area / actual_side ** 2,
    })

df_analysis = pd.DataFrame(n_analysis)

# Worst first: highest score contribution = worst efficiency = lowest density
df_worst = df_analysis.sort_values('score', ascending=False).head(20)
print('Top 20 N values by score contribution (equivalently, worst efficiency):')
print(df_worst.to_string(index=False))

## Strategic Analysis

### What We've Tried (22 experiments):
1. **Ensemble from multiple sources** - Found saspav_best.csv with 14 better N values (0.017 improvement)
2. **bbox3 optimization** - Negligible improvement (0.000001)
3. **SA optimization** - Converges to same local optimum
4. **Zaburo grid** - 25% worse than baseline
5. **Tessellation** - Worse than baseline
6. **Random restart SA** - No improvement
7. **Basin hopping** - No improvement
8. **Constraint programming** - No improvement
9. **Asymmetric solutions** - FAILED (worse than baseline)
10. **C++ SA (nicupetridean)** - Small improvements (0.0003 per run)
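Most of the experiments above are built on simulated annealing. For reference, here is a generic sketch of the annealing loop on a toy 1-D objective. It shows the Metropolis acceptance rule (always accept improvements, accept a worse move with probability exp(-Δ/T)) and geometric cooling. It is not the nicupetridean C++ implementation, whose move set and schedule are not shown here and may differ. All names and parameters are illustrative.

```python
import math
import random

def simulated_annealing(f, x0, step, t0, t_min, alpha, iters_per_t, seed=0):
    """Minimize f over the reals with random +/- step moves and geometric cooling."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    t = t0
    while t > t_min:
        for _ in range(iters_per_t):
            cand = x + rng.uniform(-step, step)
            fc = f(cand)
            delta = fc - fx
            # Metropolis rule: accept improvements, accept worse moves with prob exp(-delta / t)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                x, fx = cand, fc
                if fx < fbest:
                    best, fbest = x, fx
        t *= alpha  # geometric cooling
    return best, fbest

best, fbest = simulated_annealing(
    lambda x: (x - 3.0) ** 2, x0=0.0, step=0.5,
    t0=1.0, t_min=1e-3, alpha=0.95, iters_per_t=100,
)
print(best, fbest)
```

As T falls, the loop turns into greedy descent. This is one reason repeated SA runs from the same strong configuration tend to return to the same local optimum.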

### What's Working:
- C++ SA finds tiny improvements on specific N values (N=35, N=64, N=88)
- Ensemble from multiple sources
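Ensembling works because each N is scored independently: side_N depends only on the rows for that N. The best submission therefore takes, for every N, the configuration with the smallest side across all sources. A minimal sketch follows. It assumes per-source sides have already been computed (for example with `get_n_side`). The function names and the toy numbers are hypothetical. Ties keep the first source seen.

```python
def ensemble_best_per_n(sources):
    """For every N, pick the source whose configuration has the smallest side.

    sources: dict mapping source name -> dict mapping n -> side length.
    Returns dict n -> (source name, side).
    """
    best = {}
    for name, sides in sources.items():
        for n, side in sides.items():
            if n not in best or side < best[n][1]:
                best[n] = (name, side)
    return best

def ensemble_score(best):
    """Total score sum(side**2 / n) of the per-N winners."""
    return sum(side ** 2 / n for n, (_, side) in best.items())

# Toy example with made-up sides, only to show the selection
sources = {
    'baseline': {1: 1.0, 2: 1.6},
    'other':    {1: 0.9, 2: 1.7},
}
best = ensemble_best_per_n(sources)
print(best)  # {1: ('other', 0.9), 2: ('baseline', 1.6)}
print(ensemble_score(best))
```

In the real pipeline, the rows for each winning N would then be copied from the corresponding source CSV into the output submission.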

### What's NOT Working:
- Generating solutions from scratch (all worse than baseline)
- Asymmetric placements
- Different initial configurations

### Key Insight:
The baseline sits in an extremely strong local optimum: every approach that generated new solutions from scratch scored worse than it. The only approaches that have produced gains so far are:
1. Finding better pre-optimized solutions from other sources
2. Running very long SA on the current best

### Gap Analysis:
- Current: 70.6276
- Target: 68.919154
- Gap: 1.708 (2.48% of target)
- At current improvement rate (0.0003/run), need ~5,700 runs
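The gap figures can be reproduced as follows. The 2.48% is measured relative to the target. Relative to the current score, the gap is 2.42%. The ~5,700 figure is a number of runs, not a speed-up factor, and it assumes a constant gain per run. That assumption is optimistic given the diminishing returns observed so far.

```python
current = 70.6276
target = 68.919154
per_run = 0.0003  # observed improvement per C++ SA run

gap = current - target
runs_needed = gap / per_run

print(f'gap = {gap:.6f}')                    # gap = 1.708446
print(f'gap / target = {gap / target:.2%}')  # gap / target = 2.48%
print(f'gap / current = {gap / current:.2%}')  # gap / current = 2.42%
print(f'runs needed = {runs_needed:,.0f}')   # runs needed = 5,695
```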

In [None]:
# Check if there are any unexplored sources
import os

# List all CSV files in snapshots
snapshots_dir = '/home/nonroot/snapshots'
if os.path.exists(snapshots_dir):
    csv_files = [f for f in os.listdir(snapshots_dir) if f.endswith('.csv')]
    print(f'Found {len(csv_files)} CSV files in snapshots')
    
    # Check a few for their scores
    for f in csv_files[:5]:
        try:
            snap_df = pd.read_csv(os.path.join(snapshots_dir, f))
            snap_df['x'] = snap_df['x'].apply(parse_s_value)
            snap_df['y'] = snap_df['y'].apply(parse_s_value)
            snap_df['deg'] = snap_df['deg'].apply(parse_s_value)
            snap_df['n'] = snap_df['id'].apply(lambda x: int(x.split('_')[0]))
            score = sum((get_n_side(snap_df, n)**2)/n for n in range(1, 201))
            print(f'  {f}: {score:.6f}')
        except Exception as e:
            print(f'  {f}: Error - {e}')
else:
    print('No snapshots directory found')

## Conclusion

The target of 68.919154 requires a **1.708 point improvement** (2.48%).

At the current rate of **0.0003 improvement per run**, closing the gap would take roughly **5,700 runs**.

### The Problem:
1. All public solutions are at the same local optimum (~70.63)
2. Generating new solutions from scratch produces WORSE results
3. SA-based optimization finds only tiny improvements

### Potential Solutions:
1. **Find better pre-optimized solutions** - Check if there are private solutions with better scores
2. **Implement fundamentally different algorithms** - Not SA variations
3. **Focus on specific N values** - Some N values may have more room for improvement
4. **Longer optimization runs** - But diminishing returns

### Next Steps:
1. Submit current best (70.6276) to verify LB score (done: LB = 70.6276, matching CV)
2. Research what techniques top teams use (scores below 69)
3. Try implementing Chebyshev distance / scan-line packing mentioned in research