# Evolver Loop 6 Analysis

## Situation Summary

**Current Status:**
- Best CV/LB score: 70.659959 (saspav baseline)
- Target: 68.919154
- Gap: 1.74 points (2.5%)
- Submissions used: 2/100 (93 remaining)

**Key Findings from 6 experiments:**
1. Saspav baseline (70.659959) is the best VALID source
2. C++ SA optimizer shows 0 improvement on pre-optimized solutions
3. Lattice construction produces WORSE scores (88.33 → 85.93 after SA)
4. Comprehensive ensemble provides 0 improvement - saspav wins for all 200 N values
5. Many CSV files have overlapping trees (invalid)
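
The per-N ensemble in finding 4 can be sketched as follows. This is a minimal illustration, not the code used in the experiment: `select_best_per_n`, `total_score`, and the toy side lengths are hypothetical, and every candidate is assumed to have already passed the overlap check from finding 5.

```python
def select_best_per_n(candidates):
    """Pick, for each N, the source with the smallest bounding-square side.

    candidates: {source_name: {n: side}} (valid, non-overlapping layouts only).
    Returns {n: (source_name, side)}.
    """
    best = {}
    for name, sides in candidates.items():
        for n, side in sides.items():
            if n not in best or side < best[n][1]:
                best[n] = (name, side)
    return best


def total_score(best):
    """Sum of side^2 / n over all N, matching the per-N score used in this notebook."""
    return sum(side ** 2 / n for n, (_, side) in best.items())


# Toy values for illustration only (not real results)
toy_candidates = {
    "source_a": {1: 1.0, 2: 1.5},
    "source_b": {1: 1.1, 2: 1.4},
}
toy_best = select_best_per_n(toy_candidates)
print(toy_best)
print(f"Ensemble score: {total_score(toy_best):.4f}")
```

If one source wins for every N, as saspav does here, the ensemble score equals that source's own score, which is why the ensemble gave 0 improvement.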

**Critical Insight:**
We've been searching for pre-optimized solutions but NOT actively optimizing them.
The Eazy optimizer and other sophisticated techniques have NOT been applied to our baseline.

In [None]:
# Check what C++ optimizers are available
import os
import subprocess

# List available optimizers
print("=== Available C++ Optimizers ===")
for root, dirs, files in os.walk('/home/code/experiments'):
    for f in files:
        if f.endswith('.cpp') or ('.' not in f and os.access(os.path.join(root, f), os.X_OK)):
            full_path = os.path.join(root, f)
            size = os.path.getsize(full_path)
            print(f"{full_path}: {size/1024:.1f}KB")

print("\n=== Research Kernels ===")
for d in os.listdir('/home/code/research/kernels'):
    if os.path.isdir(f'/home/code/research/kernels/{d}'):
        print(f"  {d}")

In [None]:
# Check the Eazy optimizer C++ code
import json

with open('/home/code/research/kernels/jazivxt_eazy-optimizer/eazy-optimizer.ipynb', 'r') as f:
    nb = json.load(f)

# Find the C++ code cell
for cell in nb['cells']:
    if cell['cell_type'] == 'code':
        source = ''.join(cell['source'])
        if 'writefile eazy.cpp' in source:
            # Extract key techniques
            print("=== Eazy Optimizer Key Techniques ===")
            if 'apply_square_pressure' in source:
                print("✓ Square Potential Gradient (pushes toward center)")
            if 'Elastic Pulse' in source:
                print("✓ Elastic Pulse (periodic squeeze/relax)")
            if 'Complex Orbital Move' in source:
                print("✓ Complex Orbital Moves (rotation in complex plane)")
            if 'scales' in source and '1e-3' in source:
                print("✓ Multi-scale optimization (1e-3 → 1e-9)")
            if '250000' in source:
                print("✓ 250,000 iterations per N")
            if 'omp parallel' in source:
                print("✓ OpenMP parallelization")
            break
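
To go from inspecting the kernel to compiling it, the C++ source has to be pulled out of the notebook cell. The sketch below assumes the cell starts with a `%%writefile eazy.cpp` magic line (the cell above only confirms that the string `writefile eazy.cpp` appears somewhere in the cell); `extract_writefile_source` is a hypothetical helper.

```python
def extract_writefile_source(nb, filename="eazy.cpp"):
    """Return the body of the first code cell that starts with `%%writefile <filename>`.

    nb: a notebook loaded with json.load(). Returns None if no such cell exists.
    """
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # notebooks may store source as a list of lines
            source = "".join(source)
        lines = source.splitlines(keepends=True)
        if not lines:
            continue
        first = lines[0].split()
        # Assumption: the magic is on the first line and the filename is its last token
        if len(first) >= 2 and first[0] == "%%writefile" and first[-1] == filename:
            return "".join(lines[1:])
    return None


# Tiny stand-in notebook for illustration
demo_nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# notes"]},
        {"cell_type": "code", "source": ["%%writefile eazy.cpp\n", "int main() { return 0; }\n"]},
    ]
}
demo_cpp = extract_writefile_source(demo_nb)
print(demo_cpp)
```

The returned string can then be written to a file in the experiment directory before compiling.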

In [None]:
# Check if we can compile the Eazy optimizer
import os
import subprocess

# Create a working directory for the optimizer
os.makedirs('/home/code/experiments/007_eazy_optimizer', exist_ok=True)

# Check if g++ with OpenMP is available
result = subprocess.run(['g++', '--version'], capture_output=True, text=True)
print("G++ version:")
print(result.stdout[:200])

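# Check OpenMP support by syntax-checking a minimal program that includes <omp.h>.
# The verbose output of `g++ -fopenmp -v` is not a reliable indicator of support.
omp_test = '#include <omp.h>\nint main() { return omp_get_max_threads() > 0 ? 0 : 1; }\n'
result = subprocess.run(['g++', '-fopenmp', '-fsyntax-only', '-x', 'c++', '-'],
                        input=omp_test, capture_output=True, text=True)
if result.returncode == 0:
    print("\n✓ OpenMP support available")
else:
    print("\n⚠ OpenMP may not be available")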
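
Once the compiler check passes, the build step could look like the sketch below. The flag set (`-O3 -march=native -std=c++17 -fopenmp`) is an assumption, not taken from the kernel, which may specify its own compile line; `build_compile_command` and `compile_optimizer` are hypothetical helpers, and the paths simply combine the experiment directory created above with the `eazy.cpp` filename.

```python
import shutil
import subprocess


def build_compile_command(src, out, compiler="g++"):
    """Assemble a g++ command line with optimization and OpenMP enabled (assumed flags)."""
    return [compiler, "-O3", "-march=native", "-std=c++17", "-fopenmp", src, "-o", out]


def compile_optimizer(src, out, compiler="g++"):
    """Run the compiler and return the CompletedProcess so stderr can be inspected."""
    if shutil.which(compiler) is None:
        raise RuntimeError(f"{compiler} not found on PATH")
    return subprocess.run(build_compile_command(src, out, compiler),
                          capture_output=True, text=True)


cmd = build_compile_command(
    "/home/code/experiments/007_eazy_optimizer/eazy.cpp",
    "/home/code/experiments/007_eazy_optimizer/eazy",
)
print(" ".join(cmd))
```

Calling `compile_optimizer(...)` and checking `returncode` and `stderr` shows compiler errors directly in the notebook.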

In [None]:
# Analyze per-N efficiency to identify where improvements are most needed
import pandas as pd
import numpy as np

# Load baseline
df = pd.read_csv('/home/code/external_data/saspav/santa-2025.csv')

# Tree geometry
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]

def parse_value(s):
    if isinstance(s, str) and s.startswith('s'):
        return float(s[1:])
    return float(s)

def compute_side_for_n(df, n):
    prefix = f"{n:03d}_"
    trees = df[df['id'].str.startswith(prefix)]
    if len(trees) != n:
        return float('inf')
    
    all_points = []
    for _, row in trees.iterrows():
        x = parse_value(row['x'])
        y = parse_value(row['y'])
        deg = parse_value(row['deg'])
        angle_rad = np.radians(deg)
        cos_a, sin_a = np.cos(angle_rad), np.sin(angle_rad)
        for tx, ty in zip(TX, TY):
            px = tx * cos_a - ty * sin_a + x
            py = tx * sin_a + ty * cos_a + y
            all_points.append((px, py))
    
    all_points = np.array(all_points)
    return max(all_points.max(axis=0) - all_points.min(axis=0))

# Compute per-N scores
per_n_data = []
for n in range(1, 201):
    side = compute_side_for_n(df, n)
    score = side**2 / n
    efficiency = score / 0.355  # Ratio to the ~0.355 theoretical minimum (1.0 = optimal, higher = worse)
    per_n_data.append({'n': n, 'side': side, 'score': score, 'efficiency': efficiency})

per_n_df = pd.DataFrame(per_n_data)
print("Per-N Analysis:")
print(f"Total score: {per_n_df['score'].sum():.6f}")
# efficiency is score / minimum, so the LARGEST ratios are the worst-packed N values
print("\nWorst efficiency (most room for improvement):")
print(per_n_df.nlargest(10, 'efficiency')[['n', 'side', 'score', 'efficiency']])
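
As a sanity check on the per-N score `side**2 / n`, the same geometry can be applied to a single unrotated tree at the origin. This is a self-contained sketch using only the `TX`/`TY` outline from the cell above; `bounding_side` and `score_for` are hypothetical helper names.

```python
import math

# Tree outline copied from the analysis cell above
TX = [0, 0.125, 0.0625, 0.2, 0.1, 0.35, 0.075, 0.075, -0.075, -0.075, -0.35, -0.1, -0.2, -0.0625, -0.125]
TY = [0.8, 0.5, 0.5, 0.25, 0.25, 0, 0, -0.2, -0.2, 0, 0, 0.25, 0.25, 0.5, 0.5]


def bounding_side(trees):
    """Side of the axis-aligned bounding square for a list of (x, y, deg) trees."""
    xs, ys = [], []
    for x, y, deg in trees:
        a = math.radians(deg)
        c, s = math.cos(a), math.sin(a)
        for tx, ty in zip(TX, TY):
            xs.append(tx * c - ty * s + x)  # rotate, then translate
            ys.append(tx * s + ty * c + y)
    return max(max(xs) - min(xs), max(ys) - min(ys))


def score_for(trees):
    """Per-N score: squared bounding side divided by the number of trees."""
    return bounding_side(trees) ** 2 / len(trees)


single = [(0.0, 0.0, 0.0)]
print(f"side = {bounding_side(single):.4f}, score = {score_for(single):.4f}")
```

The unrotated tree spans 0.7 in x and 1.0 in y, so the square side is set by the height and the N=1 score is 1.0 before any rotation is tried.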

In [None]:
# Calculate potential improvement if we could match best efficiency
best_efficiency = per_n_df['efficiency'].min()
print(f"\nBest efficiency: {best_efficiency:.4f} at N={per_n_df.loc[per_n_df['efficiency'].idxmin(), 'n']}")

# If all N matched best efficiency
theoretical_best = 0.355 * best_efficiency * 200
print(f"If all N matched best efficiency: {theoretical_best:.2f}")

# Improvement needed per N to reach target
target = 68.919154
current = per_n_df['score'].sum()
gap = current - target
print(f"\nCurrent: {current:.6f}")
print(f"Target: {target:.6f}")
print(f"Gap: {gap:.6f}")
print(f"Average improvement needed per N: {gap/200:.6f}")
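
Because each per-N score is `side**2 / n`, the score gap can also be expressed as a required reduction in side length. The sketch below assumes, purely for illustration, that every N shrinks by the same factor; the two totals are the values quoted in the summary.

```python
import math

current_total = 70.659959  # saspav baseline
target_total = 68.919154   # target score

# Scaling every side by k scales every per-N score, and hence the total, by k^2
score_ratio = target_total / current_total
side_factor = math.sqrt(score_ratio)

print(f"Required score ratio: {score_ratio:.6f}")
print(f"Uniform side factor:  {side_factor:.6f}")
print(f"Side reduction:       {(1 - side_factor) * 100:.2f}%")
```

Under this assumption the 2.5% score gap corresponds to shrinking every bounding square by roughly 1.24% per side.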

In [None]:
# Identify N values with most potential for improvement
per_n_df['improvement_potential'] = per_n_df['score'] - 0.355  # vs theoretical minimum
per_n_df['contribution_pct'] = per_n_df['score'] / per_n_df['score'].sum() * 100

print("\nN values with highest improvement potential:")
print(per_n_df.nlargest(20, 'improvement_potential')[['n', 'score', 'improvement_potential', 'contribution_pct']])

## Strategy Analysis

### What We've Tried (All Failed to Improve):
1. ❌ Ensemble from existing CSVs - saspav wins for all N
2. ❌ C++ SA optimizer on pre-optimized - 0 improvement
3. ❌ Lattice construction - much worse (88.33)
4. ❌ Lattice + SA - still worse (85.93)

### What We HAVEN'T Tried:
1. **Eazy Optimizer** - Uses sophisticated techniques:
   - Square potential gradient (pushes toward center)
   - Elastic pulse (periodic squeeze/relax)
   - Complex orbital moves
   - Multi-scale optimization (1e-3 → 1e-9)
   - 250,000 iterations per N with 20-second timeout

2. **Different SA parameters** - The `sa_v1_parallel` optimizer may need different settings

3. **Focus on specific N values** - Small N (1-20) have worst efficiency
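
The multi-scale idea (step sizes shrinking from 1e-3 to 1e-9) can be illustrated with a generic greedy local search. This is not the Eazy optimizer's implementation: `multiscale_search`, the scale schedule, the iteration count, and the toy objective are all assumptions. A real objective would be the bounding-square side plus an overlap penalty.

```python
import random


def multiscale_search(objective, x0, scales=(1e-3, 1e-5, 1e-7, 1e-9),
                      iters_per_scale=2000, seed=0):
    """Greedy coordinate perturbation, run once per step scale from coarse to fine."""
    rng = random.Random(seed)
    best = list(x0)
    best_val = objective(best)
    for scale in scales:
        for _ in range(iters_per_scale):
            i = rng.randrange(len(best))
            cand = best.copy()
            cand[i] += rng.uniform(-scale, scale)  # perturb one coordinate
            val = objective(cand)
            if val < best_val:  # accept only strict improvements
                best, best_val = cand, val
    return best, best_val


def toy_objective(v):
    """Stand-in objective with its minimum at 0.5 in every coordinate."""
    return sum((c - 0.5) ** 2 for c in v)


start = [0.501, 0.499, 0.5005]
x, val = multiscale_search(toy_objective, start)
print(f"start = {toy_objective(start):.3e}, final = {val:.3e}")
```

Coarse scales make most of the progress and fine scales polish the result, which is the regime that matters when starting from an already pre-optimized baseline.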

### Recommended Next Experiment:
**Compile and run the Eazy optimizer on the saspav baseline**

This is the highest-leverage action because:
1. It uses fundamentally different optimization techniques
2. It's designed specifically for this problem
3. It runs 250,000 iterations per N (far more than our previous attempts)
4. It uses a multi-scale approach (1e-3 → 1e-9) for fine-tuning