# Loop 16 Strategic Analysis

## Key Findings from Research

1. **bbox3 parameters**: Top kernels run bbox3 with `-n` between 1000 and 2000 (iterations) and `-r` between 30 and 90 (restarts)
2. **Our bbox3 run**: We ran bbox3 without logging its parameters, so it most likely used the default settings
3. **Gap**: 70.365 → 68.878 = 1.49 points (2.1%)
4. **Improvement from bbox3**: Only 0.000045 (0.003% of the gap, or 0.00006% of the total score)
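
The figures in this list can be re-derived from the two scores and the bbox3 gain reported in this notebook. A minimal sketch (the variable names are illustrative, not taken from the project code):

```python
best_lb = 70.365091     # best LB score reported in this notebook
target = 68.878195      # target score reported in this notebook
bbox3_gain = 0.000045   # improvement from our single bbox3 run

gap = best_lb - target
print(f'gap = {gap:.6f} ({gap / best_lb:.1%} of the current score)')
print(f'bbox3 closed {bbox3_gain / gap:.3%} of the gap')
print(f'bbox3 runs needed at this rate: {gap / bbox3_gain:,.0f}')
```

Dividing the gain by the gap gives the share of the gap that was closed, while dividing it by the total score gives a much smaller percentage. The two should not be confused.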

## Critical Issue: We're at a LOCAL OPTIMUM

All optimization approaches tried so far (SA, bbox3, fractional translation) yield only tiny improvements.
This suggests the baseline sits in a deep local optimum that local search alone is unlikely to escape.

## What Top Teams Do Differently

1. **Asymmetric solutions** - Discussion says winning solutions will be asymmetric
2. **Multiple external data sources** - 15+ sources vs our 5
3. **Aggressive bbox3 runs** - 3 hours of continuous optimization
4. **fix_direction** - Rotation tightening after optimization

In [1]:
import sys
sys.path.insert(0, '/home/code')
import pandas as pd
import numpy as np
import json
from pathlib import Path

# Load current best submission
df = pd.read_csv('/home/code/experiments/010_safe_ensemble/submission.csv')
print(f'Total rows: {len(df)}')
print(df.head())

Total rows: 20100
      id                          x                         y  \
0  001_0  s-48.19608619421424577922  s58.77098461521422478882   
1  002_0    s0.15409706962136429653  s-0.03854074269478543341   
2  002_1   s-0.15409706962136429653  s-0.56145925730521462071   
3  003_0    s1.12365581614030096702   s0.78110181599256300888   
4  003_1    s1.23405569584216001644   s1.27599950066375900093   

                         deg  
0   s45.00000000000000000000  
1  s203.62937773065684154972  
2   s23.62937773065679181173  
3  s111.12513229289299943048  
4   s66.37062226934300213088  


In [2]:
# Analyze score distribution by N
from code.tree_geometry import calculate_score
from code.utils import parse_submission

configs = parse_submission(df)

scores = []
for n in range(1, 201):
    score = calculate_score(configs[n])
    scores.append({'n': n, 'score': score, 'contribution': score})

scores_df = pd.DataFrame(scores)
print('Top 20 N values by score contribution:')
print(scores_df.nlargest(20, 'score'))

Top 20 N values by score contribution:
     n     score  contribution
0    1  0.661250      0.661250
1    2  0.450779      0.450779
2    3  0.434745      0.434745
4    5  0.416850      0.416850
3    4  0.416545      0.416545
6    7  0.399897      0.399897
5    6  0.399610      0.399610
7    8  0.385407      0.385407
8    9  0.383130      0.383130
14  15  0.376949      0.376949
9   10  0.376630      0.376630
10  11  0.374924      0.374924
11  12  0.372724      0.372724
12  13  0.372294      0.372294
20  21  0.372174      0.372174
19  20  0.371795      0.371795
15  16  0.370213      0.370213
16  17  0.370040      0.370040
21  22  0.369832      0.369832
13  14  0.369543      0.369543


In [3]:
# Check what N values have the most room for improvement
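# Theoretical minimum: side = sqrt(n * tree_area) for perfect packing (no wasted space),
# so the per-N lower bound side**2 / n is simply tree_area, the same for every N
# Tree area ≈ 0.35 * 0.8 = 0.28 (rough estimate)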

tree_area = 0.28  # approximate

scores_df['theoretical_min_side'] = np.sqrt(scores_df['n'] * tree_area)
scores_df['theoretical_min_score'] = scores_df['theoretical_min_side']**2 / scores_df['n']
scores_df['gap_to_theoretical'] = scores_df['score'] - scores_df['theoretical_min_score']

print('N values with largest gap to theoretical minimum:')
print(scores_df.nlargest(20, 'gap_to_theoretical')[['n', 'score', 'theoretical_min_score', 'gap_to_theoretical']])

N values with largest gap to theoretical minimum:
     n     score  theoretical_min_score  gap_to_theoretical
0    1  0.661250                   0.28            0.381250
1    2  0.450779                   0.28            0.170779
2    3  0.434745                   0.28            0.154745
4    5  0.416850                   0.28            0.136850
3    4  0.416545                   0.28            0.136545
6    7  0.399897                   0.28            0.119897
5    6  0.399610                   0.28            0.119610
7    8  0.385407                   0.28            0.105407
8    9  0.383130                   0.28            0.103130
14  15  0.376949                   0.28            0.096949
9   10  0.376630                   0.28            0.096630
10  11  0.374924                   0.28            0.094924
11  12  0.372724                   0.28            0.092724
12  13  0.372294                   0.28            0.092294
20  21  0.372174                   0.28           

In [4]:
# Check total score
total_score = scores_df['score'].sum()
theoretical_total = scores_df['theoretical_min_score'].sum()

print(f'Current total score: {total_score:.6f}')
print(f'Theoretical minimum: {theoretical_total:.6f}')
print(f'Gap: {total_score - theoretical_total:.6f}')
print(f'Gap %: {(total_score - theoretical_total) / total_score * 100:.2f}%')

Current total score: 70.365091
Theoretical minimum: 56.000000
Gap: 14.365091
Gap %: 20.42%
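
The constant 0.28 per N and the total of 56.0 follow directly from the definitions in the cell that builds `theoretical_min_score`. Written out, with $A$ standing for the assumed tree area (`tree_area`) and $s_n$ for the bounding-square side for $n$ trees:

```latex
s_n^{\min} = \sqrt{nA}, \qquad
\mathrm{score}_n^{\min} = \frac{\left(s_n^{\min}\right)^2}{n} = A, \qquad
\sum_{n=1}^{200} \mathrm{score}_n^{\min} = 200A = 200 \times 0.28 = 56
```

Because the bound is the same for every N, ranking by `gap_to_theoretical` gives the same order as ranking by `score`. The bound also assumes zero wasted area, which an irregular tree polygon is unlikely to achieve, so the 20.42% figure is best read as an upper limit on the remaining headroom.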


In [5]:
# Check what external data sources we have
import os

external_dir = '/home/code/external_data'
if os.path.exists(external_dir):
    files = os.listdir(external_dir)
    print(f'External data files: {len(files)}')
    for f in files[:20]:
        print(f'  - {f}')
else:
    print('No external_data directory')

External data files: 9
  - bbox3
  - submission.csv
  - santa-2025.csv
  - submission visualization.pdf
  - submission_best.csv
  - shake_public
  - 70.378875862989_20260126_045659.csv
  - 72.49.csv
  - 71.97.csv


In [6]:
# Check snapshots
snapshots_dir = '/home/nonroot/snapshots'
if os.path.exists(snapshots_dir):
    snapshot_files = []
    for root, dirs, files in os.walk(snapshots_dir):
        for f in files:
            if f.endswith('.csv'):
                snapshot_files.append(os.path.join(root, f))
    print(f'Total snapshot CSV files: {len(snapshot_files)}')
else:
    print('No snapshots directory')

Total snapshot CSV files: 3810


In [7]:
# Key insight: We need to find DIFFERENT solutions, not optimize existing ones
# The top kernels use 15+ external data sources
# Let's check what datasets are available on Kaggle

print('\n=== STRATEGIC ANALYSIS ===')
print('\nCurrent situation:')
print(f'  - Best LB score: 70.365091')
print(f'  - Target: 68.878195')
print(f'  - Gap: 1.487 points (2.1%)')
print(f'  - bbox3 improvement: 0.000045 (0.003% of gap)')

print('\nProblem:')
print('  - We are at a STRONG local optimum')
print('  - All optimization approaches give tiny improvements')
print('  - At current rate, would need 33,000 bbox3 runs to close gap')

print('\nSolution paths:')
print('  1. MORE EXTERNAL DATA - Top kernels use 15+ sources')
print('  2. ASYMMETRIC SOLUTIONS - Discussion says winning solutions are asymmetric')
print('  3. LONGER BBOX3 RUNS - 3 hours with proper parameters')
print('  4. FIX_DIRECTION - Rotation tightening after optimization')


=== STRATEGIC ANALYSIS ===

Current situation:
  - Best LB score: 70.365091
  - Target: 68.878195
  - Gap: 1.487 points (2.1%)
  - bbox3 improvement: 0.000045 (0.003% of gap)

Problem:
  - We are at a STRONG local optimum
  - All optimization approaches give tiny improvements
  - At current rate, would need 33,000 bbox3 runs to close gap

Solution paths:
  1. MORE EXTERNAL DATA - Top kernels use 15+ sources
  2. ASYMMETRIC SOLUTIONS - Discussion says winning solutions are asymmetric
  3. LONGER BBOX3 RUNS - 3 hours with proper parameters
  4. FIX_DIRECTION - Rotation tightening after optimization


In [8]:
# Notes on fix_direction, a key technique from top kernels
# (this cell only summarizes the idea; it does not check for an implementation)

print('\n=== FIX_DIRECTION ANALYSIS ===')
print('\nWhat fix_direction does:')
print('  - After placing trees, rotate entire configuration')
print('  - Find angle that minimizes bounding box')
print('  - Can give 0.1-0.5% improvement per N')

print('\nImplementation needed:')
print('  1. Get convex hull of all tree polygons')
print('  2. Use scipy.optimize.minimize_scalar to find best rotation angle')
print('  3. Apply rotation to all trees')
print('  4. Recalculate score')


=== FIX_DIRECTION ANALYSIS ===

What fix_direction does:
  - After placing trees, rotate entire configuration
  - Find angle that minimizes bounding box
  - Can give 0.1-0.5% improvement per N

Implementation needed:
  1. Get convex hull of all tree polygons
  2. Use scipy.optimize.minimize_scalar to find best rotation angle
  3. Apply rotation to all trees
  4. Recalculate score
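
The four steps above can be sketched as follows. This is a dependency-free illustration, not the top kernels' code. It swaps `scipy.optimize.minimize_scalar` for a coarse grid search followed by a golden-section refinement. It also assumes the score depends on the side of the axis-aligned bounding square, as the `side**2 / n` formula used earlier implies. The names `convex_hull`, `square_side` and `best_rotation` are hypothetical, and the input is assumed to be the vertices of all tree polygons flattened into one list of `(x, y)` tuples.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def square_side(points, angle_deg):
    """Side of the axis-aligned bounding square after rotating by angle_deg."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    xs = [x * c - y * s for x, y in points]
    ys = [x * s + y * c for x, y in points]
    return max(max(xs) - min(xs), max(ys) - min(ys))


def best_rotation(points, coarse_steps=900, refine_iters=60):
    """Return (angle_deg, side) that minimizes the bounding square of points."""
    hull = convex_hull(points)   # only hull vertices can touch the bounding box
    step = 90.0 / coarse_steps   # the square side repeats every 90 degrees
    coarse = min((i * step for i in range(coarse_steps)),
                 key=lambda a: square_side(hull, a))

    # Golden-section refinement inside the bracket around the best grid angle.
    inv_phi = (math.sqrt(5) - 1) / 2
    lo, hi = coarse - step, coarse + step
    for _ in range(refine_iters):
        m1 = hi - inv_phi * (hi - lo)
        m2 = lo + inv_phi * (hi - lo)
        if square_side(hull, m1) < square_side(hull, m2):
            hi = m2
        else:
            lo = m1
    refined = (lo + hi) / 2

    # Keep whichever candidate is better, so refinement never makes things worse.
    angle = min((coarse, refined), key=lambda a: square_side(hull, a))
    return angle, square_side(hull, angle)


# Toy input: a diamond whose bounding square shrinks when rotated by 45 degrees.
diamond = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
angle, side = best_rotation(diamond)
print(f'side before: {square_side(diamond, 0.0):.4f}, after: {side:.4f}')
```

To apply the result to a configuration, rotate every tree position about the origin by `angle` and add `angle` to each tree's `deg`. A rigid rotation preserves all distances between trees, so it cannot introduce overlaps. The grid search is there because the side length is not guaranteed to be unimodal in the angle, so a single bounded scalar minimization could settle on a local minimum.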


In [9]:
# Check scores of all external data sources
import os
import glob

external_csvs = []
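# NOTE: with recursive=True, '**/*.csv' also matches top-level files, so the
# second pattern lists those files a second time (hence the repeated entries in the output)
for pattern in ['/home/code/external_data/**/*.csv', '/home/code/external_data/*.csv']:
    external_csvs.extend(glob.glob(pattern, recursive=True))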

print(f'Found {len(external_csvs)} external CSV files')

scores = {}
for csv_path in external_csvs:
    try:
        df = pd.read_csv(csv_path)
        if 'id' in df.columns and 'x' in df.columns:
            configs = parse_submission(df)
            total = sum(calculate_score(configs[n]) for n in range(1, 201))
            scores[csv_path] = total
            print(f'{os.path.basename(csv_path)}: {total:.6f}')
    except Exception as e:
        print(f'{os.path.basename(csv_path)}: ERROR - {e}')

Found 18 external CSV files
submission.csv: 70.647327
santa-2025.csv: 70.348933
submission_best.csv: 70.926150
70.378875862989_20260126_045659.csv: 70.378876
72.49.csv: 72.495739
71.97.csv: 71.972027
72.49.csv: 72.495739
71.97.csv: 71.972027
santa-2025.csv: 70.331237
submission_best.csv: 70.926150
70.378875862989_20260126_045659.csv: 70.378876
submission.csv: 70.647327
submission.csv: 70.647327
santa-2025.csv: 70.348933
submission_best.csv: 70.926150
70.378875862989_20260126_045659.csv: 70.378876
72.49.csv: 72.495739
71.97.csv: 71.972027
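
Several entries in this listing repeat. A likely cause is that `glob` with `recursive=True` lets `**` match zero directories, so the recursive pattern already returns the top-level files that the second pattern returns again. A minimal sketch of the effect and a deduplicated alternative, run on a throwaway directory (`find_csvs` is a hypothetical helper, not part of the project code):

```python
import glob
import os
import tempfile

def find_csvs(root):
    """Return every CSV under root exactly once, in sorted order."""
    pattern = os.path.join(root, '**', '*.csv')
    return sorted(set(glob.glob(pattern, recursive=True)))

# Toy directory tree: one top-level CSV and one CSV in a subdirectory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'sub'))
    for rel in ('a.csv', os.path.join('sub', 'b.csv')):
        open(os.path.join(root, rel), 'w').close()

    # The same two pattern shapes as in the cell: the top-level file matches twice.
    both = []
    for pattern in (os.path.join(root, '**', '*.csv'), os.path.join(root, '*.csv')):
        both.extend(glob.glob(pattern, recursive=True))

    unique = find_csvs(root)
    print(f'two patterns: {len(both)} paths, deduplicated: {len(unique)} paths')
```

Printing `os.path.relpath(path, root)` instead of the basename would also make it clear which directory each file comes from. That matters here, because two files named `santa-2025.csv` appear with different scores.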


In [11]:
# Compare saspav santa-2025.csv with our exp_010
saspav_df = pd.read_csv('/home/code/external_data/saspav_csv/santa-2025.csv')
exp010_df = pd.read_csv('/home/code/experiments/010_safe_ensemble/submission.csv')

saspav_configs = parse_submission(saspav_df)
exp010_configs = parse_submission(exp010_df)

print('Per-N comparison (saspav vs exp_010):')
print('N values where saspav is better:')
better_n = []
for n in range(1, 201):
    saspav_score = calculate_score(saspav_configs[n])
    exp010_score = calculate_score(exp010_configs[n])
    diff = exp010_score - saspav_score
    if diff > 0.0001:
        better_n.append((n, diff, saspav_score, exp010_score))
        
print(f'Found {len(better_n)} N values where saspav is better')
for n, diff, saspav_score, exp010_score in sorted(better_n, key=lambda x: -x[1])[:20]:
    print(f'  N={n}: saspav={saspav_score:.6f} vs exp010={exp010_score:.6f} (diff={diff:.6f})')

print(f'\nTotal potential improvement: {sum(d[1] for d in better_n):.6f}')

Per-N comparison (saspav vs exp_010):
N values where saspav is better:
Found 71 N values where saspav is better
  N=21: saspav=0.368687 vs exp010=0.372174 (diff=0.003487)
  N=123: saspav=0.345275 vs exp010=0.347544 (diff=0.002269)
  N=67: saspav=0.348406 vs exp010=0.349775 (diff=0.001369)
  N=187: saspav=0.339006 vs exp010=0.340234 (diff=0.001229)
  N=87: saspav=0.348837 vs exp010=0.349960 (diff=0.001122)
  N=94: saspav=0.345592 vs exp010=0.346691 (diff=0.001100)
  N=69: saspav=0.353528 vs exp010=0.354528 (diff=0.001000)
  N=112: saspav=0.343525 vs exp010=0.344519 (diff=0.000994)
  N=47: saspav=0.356613 vs exp010=0.357493 (diff=0.000880)
  N=116: saspav=0.342636 vs exp010=0.343513 (diff=0.000877)
  N=139: saspav=0.340148 vs exp010=0.340978 (diff=0.000830)
  N=78: saspav=0.350652 vs exp010=0.351401 (diff=0.000749)
  N=50: saspav=0.360004 vs exp010=0.360753 (diff=0.000749)
  N=49: saspav=0.362683 vs exp010=0.363430 (diff=0.000747)
  N=145: saspav=0.342137 vs exp010=0.342839 (diff=0.00070