# Loop 16 Strategic Analysis

## Key Findings from Research

1. **bbox3 parameters**: Top kernels use `-n 1000-2000` iterations and `-r 30-90` restarts
2. **Our bbox3 run**: We ran bbox3 but didn't log parameters - likely default settings
3. **Gap**: 70.365 → 68.878 = 1.49 points (2.1%)
4. **Improvement from bbox3**: Only 0.000045 (0.003% of the gap)
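
Because the earlier bbox3 run left no record of its settings, a thin wrapper that logs every invocation could avoid the same problem next time. The sketch below is hedged: the `./bbox3` path, the `bbox3_runs.jsonl` log file, and the helper names are hypothetical, and it assumes the binary accepts only the `-n` and `-r` flags mentioned above.

```python
import json
import subprocess
import time
from pathlib import Path


def build_bbox3_command(n_iters, restarts, binary="./bbox3"):
    """Build the argument list. `binary` is a hypothetical path to the executable."""
    return [binary, "-n", str(n_iters), "-r", str(restarts)]


def append_record(record, log_path="bbox3_runs.jsonl"):
    """Append one run record as a JSON line so the parameters are never lost."""
    with Path(log_path).open("a") as fh:
        fh.write(json.dumps(record) + "\n")


def run_and_log(n_iters, restarts, log_path="bbox3_runs.jsonl", binary="./bbox3"):
    """Run bbox3 once and log its parameters, exit code, and wall time."""
    cmd = build_bbox3_command(n_iters, restarts, binary)
    start = time.time()
    result = subprocess.run(cmd, capture_output=True, text=True)
    record = {
        "cmd": cmd,
        "n_iters": n_iters,
        "restarts": restarts,
        "returncode": result.returncode,
        "seconds": round(time.time() - start, 2),
    }
    append_record(record, log_path)
    return record
```

A call such as `run_and_log(2000, 90)` would then leave one JSON line per run, which makes later comparisons against the top-kernel settings straightforward.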

## Critical Issue: We're at a LOCAL OPTIMUM

Every optimization approach tried so far (SA, bbox3, fractional translation) yields only tiny improvements.
This suggests the baseline sits in a STRONG local optimum.

## What Top Teams Do Differently

1. **Asymmetric solutions** - Discussion says winning solutions will be asymmetric
2. **Multiple external data sources** - 15+ sources vs our 5
3. **Aggressive bbox3 runs** - 3 hours of continuous optimization
4. **fix_direction** - Rotation tightening after optimization

In [None]:
import sys
sys.path.insert(0, '/home/code')
import pandas as pd
import numpy as np
import json
from pathlib import Path

# Load current best submission
df = pd.read_csv('/home/code/experiments/010_safe_ensemble/submission.csv')
print(f'Total rows: {len(df)}')
print(df.head())

In [None]:
# Analyze score distribution by N
from code.tree_geometry import calculate_score
from code.utils import parse_submission

configs = parse_submission(df)

scores = []
for n in range(1, 201):
    score = calculate_score(configs[n])
    scores.append({'n': n, 'score': score, 'contribution': score})

scores_df = pd.DataFrame(scores)
print('Top 20 N values by score contribution:')
print(scores_df.nlargest(20, 'score'))

In [None]:
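# Check which N values have the most room for improvement
# Theoretical minimum: side = sqrt(n * tree_area) for perfect packing
# Tree area ≈ 0.35 * 0.8 = 0.28 (rough estimate)
# Note: side**2 / n then equals tree_area for every n, so this bound is a
# constant and ranking by gap gives the same order as ranking by score.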

tree_area = 0.28  # approximate

scores_df['theoretical_min_side'] = np.sqrt(scores_df['n'] * tree_area)
scores_df['theoretical_min_score'] = scores_df['theoretical_min_side']**2 / scores_df['n']
scores_df['gap_to_theoretical'] = scores_df['score'] - scores_df['theoretical_min_score']

print('N values with largest gap to theoretical minimum:')
print(scores_df.nlargest(20, 'gap_to_theoretical')[['n', 'score', 'theoretical_min_score', 'gap_to_theoretical']])

In [None]:
# Check total score
total_score = scores_df['score'].sum()
theoretical_total = scores_df['theoretical_min_score'].sum()

print(f'Current total score: {total_score:.6f}')
print(f'Theoretical minimum: {theoretical_total:.6f}')
print(f'Gap: {total_score - theoretical_total:.6f}')
print(f'Gap %: {(total_score - theoretical_total) / total_score * 100:.2f}%')

In [None]:
# Check what external data sources we have
import os

external_dir = '/home/code/external_data'
if os.path.exists(external_dir):
    files = os.listdir(external_dir)
    print(f'External data files: {len(files)}')
    for f in sorted(files)[:20]:  # sort for a deterministic listing
        print(f'  - {f}')
else:
    print('No external_data directory')

In [None]:
# Check snapshots
snapshots_dir = '/home/nonroot/snapshots'
if os.path.exists(snapshots_dir):
    snapshot_files = []
    for root, dirs, files in os.walk(snapshots_dir):
        for f in files:
            if f.endswith('.csv'):
                snapshot_files.append(os.path.join(root, f))
    print(f'Total snapshot CSV files: {len(snapshot_files)}')
else:
    print('No snapshots directory')
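
One way the external sources and snapshots could be combined is to keep, for each N, whichever source has the lowest score. The sketch below assumes that lower is better (as the target of 68.878 versus the current 70.365 implies) and that per-N scores have already been computed, for example with `calculate_score`. The function name and the input layout are hypothetical, and it does not check configurations for validity.

```python
def best_per_n(scores_by_source):
    """Pick the lowest-scoring source for each N.

    scores_by_source: {source_name: {n: score}} (hypothetical layout).
    Returns {n: (source_name, score)}.
    """
    best = {}
    for source, scores in scores_by_source.items():
        for n, score in scores.items():
            # Keep the first source seen on ties; replace only on a strict improvement.
            if n not in best or score < best[n][1]:
                best[n] = (source, score)
    return best


# Illustrative values only, not real scores.
example = {
    "snapshot_a": {1: 0.66, 2: 0.45},
    "snapshot_b": {1: 0.70, 2: 0.44, 3: 0.43},
}
winners = best_per_n(example)
```

Summing the winning scores gives the ensemble total, and counting how often each source wins shows which sources are worth collecting more of.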

In [None]:
# Key insight: We need to find DIFFERENT solutions, not optimize existing ones
# The top kernels use 15+ external data sources
# Let's check what datasets are available on Kaggle

print('\n=== STRATEGIC ANALYSIS ===')
print('\nCurrent situation:')
print('  - Best LB score: 70.365091')
print('  - Target: 68.878195')
print('  - Gap: 1.487 points (2.1%)')
print('  - bbox3 improvement: 0.000045 (0.003% of gap)')

print('\nProblem:')
print('  - We are at a STRONG local optimum')
print('  - All optimization approaches give tiny improvements')
print('  - At the current rate, closing the gap would take about 33,000 bbox3 runs')

print('\nSolution paths:')
print('  1. MORE EXTERNAL DATA - Top kernels use 15+ sources')
print('  2. ASYMMETRIC SOLUTIONS - Discussion says winning solutions are asymmetric')
print('  3. LONGER BBOX3 RUNS - 3 hours with proper parameters')
print('  4. FIX_DIRECTION - Rotation tightening after optimization')
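
The figures printed above can be derived from the three underlying numbers rather than hard-coded. This minimal sketch assumes a linear extrapolation, meaning that every bbox3 run would yield the same 0.000045 gain, which is an optimistic simplification. The variable names are illustrative.

```python
best_lb = 70.365091     # current best leaderboard score
target = 68.878195      # target score
bbox3_gain = 0.000045   # improvement observed from one bbox3 run

gap = best_lb - target
gap_pct = gap / best_lb * 100             # gap as a share of the current score
gain_share_pct = bbox3_gain / gap * 100   # one run's gain as a share of the gap
runs_needed = gap / bbox3_gain            # runs needed if every run gained the same

print(f'Gap: {gap:.3f} points ({gap_pct:.1f}%)')
print(f'bbox3 gain as share of gap: {gain_share_pct:.3f}%')
print(f'Runs needed at this rate: {runs_needed:,.0f}')
```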

In [None]:
# Summarize the fix_direction technique and what we would need to implement
# This is a key technique from top kernels

print('\n=== FIX_DIRECTION ANALYSIS ===')
print('\nWhat fix_direction does:')
print('  - After placing trees, rotate entire configuration')
print('  - Find angle that minimizes bounding box')
print('  - Can give 0.1-0.5% improvement per N')

print('\nImplementation needed:')
print('  1. Get convex hull of all tree polygons')
print('  2. Use scipy.optimize.minimize_scalar to find best rotation angle')
print('  3. Apply rotation to all trees')
print('  4. Recalculate score')
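
The four implementation steps listed above can be sketched as follows. This is not the top kernels' code. It assumes the score depends on the side of the axis-aligned bounding square, taken here as the larger of width and height, and that all tree polygon vertices are available as an `(m, 2)` NumPy array. The function names are hypothetical. Because the objective is not unimodal in the angle, the sketch scans a coarse grid first and only then refines with `scipy.optimize.minimize_scalar`.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.spatial import ConvexHull


def square_side(points, angle):
    """Side of the axis-aligned bounding square after rotating by `angle` (radians)."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    rotated = points @ rot.T
    extent = rotated.max(axis=0) - rotated.min(axis=0)
    return float(extent.max())


def best_rotation(vertices, coarse_steps=90):
    """Return (angle, side) minimizing the bounding square of `vertices`."""
    # Only hull points can touch the bounding box, so drop interior points.
    hull = vertices[ConvexHull(vertices).vertices]
    # The side is periodic with period 90 degrees, so scan [0, pi/2).
    grid = np.linspace(0.0, np.pi / 2, coarse_steps, endpoint=False)
    sides = [square_side(hull, a) for a in grid]
    i = int(np.argmin(sides))
    step = grid[1] - grid[0]
    # Refine locally around the best grid angle.
    res = minimize_scalar(
        lambda a: square_side(hull, a),
        bounds=(grid[i] - step, grid[i] + step),
        method="bounded",
    )
    if res.fun < sides[i]:
        return float(res.x), float(res.fun)
    return float(grid[i]), float(sides[i])
```

To apply the result, each tree's position would be rotated about the origin by the returned angle and the same angle added to its own rotation, after which the score for that N is recalculated.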