# Loop 4 Analysis: Strategic Assessment

## Key Questions:
1. What is the safe_ensemble score and does it pass validation?
2. What would happen if we run C++ optimizers on the safe_ensemble (not baseline)?
3. What is the theoretical minimum score and how far are we?
4. What techniques from top kernels haven't been tried?

In [5]:
import sys
sys.path.insert(0, '/home/code')

import pandas as pd
import numpy as np
from utils import load_submission, score_submission, verify_submission_no_overlaps
import json

# Load session state
with open('/home/code/session_state.json', 'r') as f:
    state = json.load(f)

print('=== EXPERIMENT HISTORY ===')
for exp in state['experiments']:
    print(f"{exp['id']}: {exp['name']} | CV: {exp['cv_score']:.6f} | LB: {exp.get('lb_score', 'N/A')}")
    if exp.get('used_baseline_fallback'):
        print(f"   -> FALLBACK: approach_score was {exp.get('approach_score', 'N/A')}")

=== EXPERIMENT HISTORY ===
exp_000: 000_baseline | CV: 70.676102 | LB: None
exp_001: 001_ensemble | CV: 70.615744 | LB: None
exp_002: 002_fixed_ensemble | CV: 70.615786 | LB: None
exp_003: 003_cpp_optimization | CV: 70.676102 | LB: None
   -> FALLBACK: approach_score was 70.676102


In [6]:
print('\n=== SUBMISSION HISTORY ===')
for sub in state['submissions']:
    print(f"{sub['experiment_id']}: CV={sub['cv_score']:.6f} | LB={sub.get('lb_score', 'N/A')} | Error: {sub.get('error', 'None')}")


=== SUBMISSION HISTORY ===
exp_000: CV=70.676102 | LB=70.676102398091 | Error: None
exp_001: CV=70.615744 | LB= | Error: Overlapping trees in group 002
exp_002: CV=70.615786 | LB= | Error: Overlapping trees in group 003


In [7]:
# Check safe_ensemble
safe_df = load_submission('/home/code/experiments/003_safe_ensemble/submission.csv')
safe_score, safe_scores_by_n, _ = score_submission(safe_df, check_overlaps=False)
print(f'\n=== SAFE ENSEMBLE ===')
print(f'Score: {safe_score:.6f}')
print(f'Gap to target (68.888293): {safe_score - 68.888293:.6f} points ({(safe_score - 68.888293)/68.888293*100:.2f}%)')

# Verify no overlaps
is_valid, overlapping = verify_submission_no_overlaps(safe_df)
print(f'Is valid (no overlaps): {is_valid}')
print(f'Overlapping N values: {overlapping}')


=== SAFE ENSEMBLE ===
Score: 70.615788
Gap to target (68.888293): 1.727495 points (2.51%)


Is valid (no overlaps): True
Overlapping N values: []


In [None]:
# Compare baseline vs safe_ensemble per-N
baseline_df = load_submission('/home/code/experiments/000_baseline/submission.csv')
baseline_score, baseline_scores_by_n, _ = score_submission(baseline_df, check_overlaps=False)

print('\n=== PER-N COMPARISON (safe_ensemble vs baseline) ===')
print('N values where safe_ensemble is BETTER:')
better_count = 0
for n in range(1, 201):
    # Per-N entries may be plain floats or dicts with a 'score' key; handle both
    safe_n = safe_scores_by_n.get(n, {}).get('score', 0) if isinstance(safe_scores_by_n.get(n), dict) else safe_scores_by_n.get(n, 0)
    base_n = baseline_scores_by_n.get(n, {}).get('score', 0) if isinstance(baseline_scores_by_n.get(n), dict) else baseline_scores_by_n.get(n, 0)
    if safe_n < base_n - 1e-8:
        better_count += 1
        if better_count <= 10:
            print(f'  N={n}: safe={safe_n:.6f} vs base={base_n:.6f} (diff={base_n-safe_n:.6f})')
print(f'Total N values where safe_ensemble is better: {better_count}')

In [None]:
# Theoretical minimum analysis
print('\n=== THEORETICAL MINIMUM ANALYSIS ===')

# Area argument: N trees of area A need a bounding square of area at least N * A,
# so 100% packing efficiency gives a lower bound on the side length for each N
# A tighter bound needs the actual minimum bounding box

# Tree dimensions (from the kernel)
tree_height = 1.0  # 0.8 + 0.2 trunk
tree_width = 0.7  # base width

# For N=1, a 45 degree rotation is assumed to be optimal
# Rectangle estimate of the bounding box side at 45 degrees: (h + w) / sqrt(2) = 1.7 / 1.414 = 1.202
# If the score were side / sqrt(1) it would be about 1.2, but the actual N=1 score is 0.661?

# Let me check the actual scoring formula
print('Checking N=1 score from baseline:')
n1_score = baseline_scores_by_n.get(1, 0)
print(f'N=1 score: {n1_score:.6f}')

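# The per-N score cannot be the raw side length of the bounding box:
# a square of side 0.66 has a diagonal of only 0.93, too short for a tree of height 1.0
# A score of 0.661 is consistent with side^2 / N, i.e. a side of about 0.813 for N=1
# (assumption, to be confirmed against score_submission in utils)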
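
As a sanity check on the N=1 reasoning above, the sketch below brute-forces the smallest axis-aligned bounding square of a polygon over rotation angles. The triangle is a hypothetical stand-in built only from the 0.7 base width and 1.0 height quoted above; it is not the real tree polygon, and both helper functions are illustrative names rather than part of `utils`.

```python
import math

def bounding_square_side(vertices, angle_deg):
    """Side of the axis-aligned bounding square after rotating by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    xs = [x * cos_a - y * sin_a for x, y in vertices]
    ys = [x * sin_a + y * cos_a for x, y in vertices]
    return max(max(xs) - min(xs), max(ys) - min(ys))

def min_bounding_square_side(vertices, step_deg=0.5):
    """Brute-force search; the side repeats every 90 degrees, so [0, 90) suffices."""
    steps = int(round(90 / step_deg))
    return min(
        (bounding_square_side(vertices, i * step_deg), i * step_deg)
        for i in range(steps)
    )

# Hypothetical stand-in shape: isosceles triangle, base 0.7, height 1.0
stand_in_tree = [(-0.35, 0.0), (0.35, 0.0), (0.0, 1.0)]

best_side, best_angle = min_bounding_square_side(stand_in_tree)
print(f'Stand-in shape: side={best_side:.4f} at {best_angle:.1f} degrees')
print(f'Side implied by a score of 0.661 if score = side^2 / N: {math.sqrt(0.661):.4f}')
```

Swapping in the actual tree vertices would show whether a side near 0.813 is achievable for N=1, which would support the side^2 / N reading of the score.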

In [None]:
# Break the score down by N range to see where the ensemble gains come from
# (A true theoretical minimum would assume 100% packing efficiency, which is
# impossible for irregular shapes, so it is not computed here)

print('\n=== SCORE BREAKDOWN BY N RANGE ===')
ranges = [(1, 10), (11, 50), (51, 100), (101, 150), (151, 200)]

for start, end in ranges:
    safe_sum = sum(safe_scores_by_n.get(n, 0) for n in range(start, end+1))
    base_sum = sum(baseline_scores_by_n.get(n, 0) for n in range(start, end+1))
    print(f'N={start}-{end}: safe={safe_sum:.4f} vs base={base_sum:.4f} (diff={base_sum-safe_sum:.4f})')

print(f'\nTotal: safe={safe_score:.6f} vs base={baseline_score:.6f}')
print(f'Improvement: {baseline_score - safe_score:.6f} points')

In [None]:
# Key insight: The safe_ensemble is 0.060 points better than baseline
# But we still need 1.73 points to reach the target
# That's roughly 29x more improvement than achieved so far!

print('\n=== GAP ANALYSIS ===')
print(f'Current best (safe_ensemble): {safe_score:.6f}')
print(f'Target: 68.888293')
print(f'Gap: {safe_score - 68.888293:.6f} points')
print(f'Improvement so far (from baseline): {baseline_score - safe_score:.6f} points')
print(f'Improvement still needed: {safe_score - 68.888293:.6f} points')
print(f'Ratio: {(safe_score - 68.888293) / (baseline_score - safe_score):.1f}x more improvement needed')
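
The ratio can be checked by hand from the scores already printed in this notebook (baseline 70.676102, safe_ensemble 70.615788, target 68.888293). A minimal sketch of that arithmetic follows; `gap_report` is a hypothetical helper name introduced only for illustration.

```python
def gap_report(baseline, current, target):
    """Return points gained so far, points still needed, and their ratio."""
    gained = baseline - current
    remaining = current - target
    ratio = remaining / gained if gained > 0 else float('inf')
    return {'gained': gained, 'remaining': remaining, 'ratio': ratio}

# Values copied from the outputs printed earlier in this notebook
report = gap_report(baseline=70.676102, current=70.615788, target=68.888293)
print(f"Gained: {report['gained']:.6f}")
print(f"Remaining: {report['remaining']:.6f}")
print(f"Ratio: {report['ratio']:.1f}x")
```

The guard on `gained` avoids a division by zero when an experiment falls back to the baseline, as exp_003 did.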

In [None]:
# What would it take to reach the target?
# We need to improve by 1.727 points across 200 N values
# That's an average of 0.0086 points per N value

print('\n=== WHAT WOULD IT TAKE? ===')
needed_improvement = safe_score - 68.888293
avg_per_n = needed_improvement / 200
print(f'Average improvement needed per N: {avg_per_n:.6f}')

# Rank N values by score; a high per-N score is only a proxy for room for improvement
print('\nN values with highest scores (most room for improvement):')
n_scores = [(n, safe_scores_by_n.get(n, 0)) for n in range(1, 201)]
n_scores.sort(key=lambda x: x[1], reverse=True)
for n, score in n_scores[:10]:
    print(f'  N={n}: score={score:.6f}')
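
A uniform average treats every N the same, although per-N scores differ. One possible alternative, sketched below, spreads the needed improvement in proportion to each N's current score. `allocate_improvement` is a hypothetical helper and the toy scores are invented for illustration; in the notebook the input would be `safe_scores_by_n`, assuming its values are plain floats.

```python
def allocate_improvement(scores_by_n, needed):
    """Split `needed` points across N values in proportion to their scores."""
    total = sum(scores_by_n.values())
    return {n: needed * score / total for n, score in scores_by_n.items()}

# Toy per-N scores, invented for illustration only
toy_scores = {1: 0.66, 2: 0.45, 3: 0.43, 4: 0.42}
toy_needed = 0.04

targets = allocate_improvement(toy_scores, toy_needed)
for n, delta in sorted(targets.items()):
    print(f'  N={n}: reduce by {delta:.6f}')
```

This is only a budgeting heuristic: it says nothing about whether a given N can actually be improved by that amount.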

In [None]:
# Summary
print('\n' + '='*60)
print('LOOP 4 ANALYSIS SUMMARY')
print('='*60)
print(f'''
1. SAFE ENSEMBLE STATUS:
   - Score: {safe_score:.6f}
   - Valid (no overlaps): {is_valid}
   - NOT YET SUBMITTED to Kaggle
   - Should be submitted to verify it passes validation

2. IMPROVEMENT ANALYSIS:
   - Baseline: {baseline_score:.6f}
   - Safe ensemble: {safe_score:.6f}
   - Improvement: {baseline_score - safe_score:.6f} points
   - Target: 68.888293
   - Gap: {safe_score - 68.888293:.6f} points
   - Need {(safe_score - 68.888293) / (baseline_score - safe_score):.1f}x more improvement

3. KEY INSIGHT:
   - The C++ optimizers were run on BASELINE (local optimum)
   - They should be run on SAFE_ENSEMBLE (different starting point)
   - The safe_ensemble has different configurations that may be improvable

4. NEXT STEPS:
   a) Submit safe_ensemble to verify it passes Kaggle
   b) If it passes, run C++ optimizers on safe_ensemble
   c) Implement fix_direction from bbox3 runner kernel
   d) Consider tessellation patterns for large N
''')
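
Steps a) and b) both rely on keeping only overlap-free configurations when combining sources. The sketch below shows one way such a per-N selection could look. It is not the code that produced the safe_ensemble: `select_per_n`, the `is_valid` callback, and the toy data are all hypothetical, and in this notebook the validity check would be backed by the overlap verification from `utils`.

```python
def select_per_n(candidates, is_valid):
    """Pick, for each N, the lowest-scoring candidate that passes is_valid.

    candidates: {source_name: {n: score}}
    is_valid:   callable (source_name, n) -> bool
    Returns {n: (source_name, score)}.
    """
    best = {}
    for source, scores in candidates.items():
        for n, score in scores.items():
            if not is_valid(source, n):
                continue  # skip configurations that would be rejected
            if n not in best or score < best[n][1]:
                best[n] = (source, score)
    return best

# Toy data, invented for illustration only
toy_candidates = {
    'baseline': {1: 0.66, 2: 0.45},
    'optimized': {1: 0.66, 2: 0.44},
}
toy_overlapping = {('optimized', 2)}  # pretend this configuration overlaps

selection = select_per_n(
    toy_candidates,
    is_valid=lambda source, n: (source, n) not in toy_overlapping,
)
print(selection)
```

Because the check runs per N, a single overlapping group only costs that group's improvement rather than invalidating the whole submission.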