# Evolver Loop 16 Analysis

## Objective
Assess where the current best solution stands against the target score and identify which approaches have not yet been fully tried.

## Key Questions
1. What is egortrushin's tessellation simulated annealing (SA) approach?
2. What N values does it target?
3. How does it differ from what we've tried?
4. What other approaches might work?

In [1]:
import pandas as pd
import numpy as np
from decimal import Decimal, getcontext
from shapely import affinity
from shapely.geometry import Polygon
from shapely.ops import unary_union
import matplotlib.pyplot as plt

getcontext().prec = 25
scale_factor = Decimal("1")

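class ChristmasTree:
    """A 15-vertex tree polygon, rotated by `angle` degrees about the origin and then moved to (center_x, center_y)."""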
    def __init__(self, center_x='0', center_y='0', angle='0'):
        self.center_x = Decimal(center_x)
        self.center_y = Decimal(center_y)
        self.angle = Decimal(angle)
        trunk_w = Decimal('0.15')
        trunk_h = Decimal('0.2')
        base_w = Decimal('0.7')
        mid_w = Decimal('0.4')
        top_w = Decimal('0.25')
        tip_y = Decimal('0.8')
        tier_1_y = Decimal('0.5')
        tier_2_y = Decimal('0.25')
        base_y = Decimal('0.0')
        trunk_bottom_y = -trunk_h
        initial_polygon = Polygon([
            (Decimal('0.0') * scale_factor, tip_y * scale_factor),
            (top_w / Decimal('2') * scale_factor, tier_1_y * scale_factor),
            (top_w / Decimal('4') * scale_factor, tier_1_y * scale_factor),
            (mid_w / Decimal('2') * scale_factor, tier_2_y * scale_factor),
            (mid_w / Decimal('4') * scale_factor, tier_2_y * scale_factor),
            (base_w / Decimal('2') * scale_factor, base_y * scale_factor),
            (trunk_w / Decimal('2') * scale_factor, base_y * scale_factor),
            (trunk_w / Decimal('2') * scale_factor, trunk_bottom_y * scale_factor),
            (-(trunk_w / Decimal('2')) * scale_factor, trunk_bottom_y * scale_factor),
            (-(trunk_w / Decimal('2')) * scale_factor, base_y * scale_factor),
            (-(base_w / Decimal('2')) * scale_factor, base_y * scale_factor),
            (-(mid_w / Decimal('4')) * scale_factor, tier_2_y * scale_factor),
            (-(mid_w / Decimal('2')) * scale_factor, tier_2_y * scale_factor),
            (-(top_w / Decimal('4')) * scale_factor, tier_1_y * scale_factor),
            (-(top_w / Decimal('2')) * scale_factor, tier_1_y * scale_factor),
        ])
        rotated = affinity.rotate(initial_polygon, float(self.angle), origin=(0, 0))
        self.polygon = affinity.translate(rotated,
                                          xoff=float(self.center_x * scale_factor),
                                          yoff=float(self.center_y * scale_factor))
    def clone(self):
        return ChristmasTree(str(self.center_x), str(self.center_y), str(self.angle))

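def get_tree_list_side_length(tree_list):
    """Return the side of the smallest axis-aligned square enclosing all trees."""
    all_polygons = [t.polygon for t in tree_list]
    # bounds is (minx, miny, maxx, maxy); the square side is the larger extent.
    bounds = unary_union(all_polygons).bounds
    return Decimal(max(bounds[2] - bounds[0], bounds[3] - bounds[1])) / scale_factor

def get_total_score(dict_of_side_length):
    """Sum side**2 / N over all groups; each key is N as a zero-padded string."""
    score = Decimal(0)
    for k, v in dict_of_side_length.items():
        score += v ** 2 / Decimal(k)
    return score

def parse_csv(csv_path):
    """Load a submission CSV and return per-group tree lists and side lengths."""
    result = pd.read_csv(csv_path)
    # Numeric fields are stored as strings carrying an 's' marker; strip it.
    result['x'] = result['x'].str.strip('s')
    result['y'] = result['y'].str.strip('s')
    result['deg'] = result['deg'].str.strip('s')
    # Split the id into a group part (looked up later as f'{n:03d}') and an item part.
    result[['group_id', 'item_id']] = result['id'].str.split('_', n=2, expand=True)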
    dict_of_tree_list = {}
    dict_of_side_length = {}
    for group_id, group_data in result.groupby('group_id'):
        tree_list = [ChristmasTree(center_x=row['x'], center_y=row['y'], angle=row['deg']) for _, row in group_data.iterrows()]
        dict_of_tree_list[group_id] = tree_list
        dict_of_side_length[group_id] = get_tree_list_side_length(tree_list)
    return dict_of_tree_list, dict_of_side_length

print('Helper functions loaded')

Helper functions loaded
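
The metric computed by `get_total_score` is the sum over groups of `side ** 2 / N`. A minimal worked sketch of that arithmetic follows, using only the standard library. The side lengths are made-up illustration values, not competition data, and `total_score` is a hypothetical stand-in name for the helper above.

```python
from decimal import Decimal

def total_score(side_by_group):
    """Sum side**2 / N over groups keyed by zero-padded N (mirrors get_total_score)."""
    return sum((side ** 2 / Decimal(key) for key, side in side_by_group.items()),
               Decimal(0))

# Hypothetical side lengths, for illustration only.
example = {"001": Decimal("1.0"), "002": Decimal("1.0"), "004": Decimal("2.0")}

# Contributions: 1/1 + 1/2 + 4/4
print(total_score(example))
```

Because each term is divided by N, a group's contribution is its bounding-square area per tree, which is why small groups can weigh as much as large ones.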


In [2]:
# Load current best solution
dict_of_tree_list, dict_of_side_length = parse_csv('/home/code/exploration/datasets/ensemble_best.csv')
current_score = get_total_score(dict_of_side_length)
target_score = 68.919154
gap = float(current_score) - target_score
print(f'Current best score: {current_score:.8f}')
print(f'Target score: {target_score:.6f}')
print(f'Gap: {gap:.6f} ({gap / target_score * 100:.2f}%)')

Current best score: 70.63047845
Target score: 68.919154
Gap: 1.711324 (2.48%)


In [3]:
# Analyze per-N scores to find where improvements might be possible
scores_per_n = []
for n in range(1, 201):
    key = f'{n:03d}'
    side = dict_of_side_length[key]
    score = float(side ** 2 / Decimal(n))
    scores_per_n.append({'N': n, 'side_length': float(side), 'score': score})

df_scores = pd.DataFrame(scores_per_n)
print('Top 20 N values by score contribution:')
print(df_scores.nlargest(20, 'score')[['N', 'side_length', 'score']].to_string())

Top 20 N values by score contribution:
     N  side_length     score
0    1     0.813173  0.661250
1    2     0.949504  0.450779
2    3     1.142031  0.434745
4    5     1.443692  0.416850
3    4     1.290806  0.416545
6    7     1.673104  0.399897
5    6     1.548438  0.399610
8    9     1.867280  0.387415
7    8     1.755921  0.385407
14  15     2.377955  0.376978
9   10     1.940696  0.376630
20  21     2.811667  0.376451
19  20     2.742469  0.376057
21  22     2.873270  0.375258
10  11     2.030803  0.374924
15  16     2.446640  0.374128
25  26     3.118320  0.373997
11  12     2.114873  0.372724
12  13     2.199960  0.372294
24  25     3.050182  0.372144
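
One rough way to read this table is against a uniform per-group budget: the target total divided evenly over the 200 groups. The sketch below does that arithmetic with the totals reported above and the first five table rows. The uniform split is an assumption made purely for illustration; nothing in the analysis says the target is distributed evenly across N.

```python
N_GROUPS = 200                 # groups N = 1..200, as in the loop above
CURRENT_TOTAL = 70.63047845    # current best score reported above
TARGET_TOTAL = 68.919154       # target score reported above

avg_current = CURRENT_TOTAL / N_GROUPS
avg_target = TARGET_TOTAL / N_GROUPS

# First five rows of the table above: N -> score contribution.
top_rows = {1: 0.661250, 2: 0.450779, 3: 0.434745, 5: 0.416850, 4: 0.416545}

# Excess of each listed group over the uniform target budget (illustrative yardstick only).
excess = {n: score - avg_target for n, score in top_rows.items()}

print(f"average per group: current {avg_current:.6f}, target {avg_target:.6f}")
for n in sorted(excess):
    print(f"N={n}: {excess[n]:+.6f} above the uniform target budget")
```

On this yardstick the average group would need to drop by roughly 0.0086 to close the gap, which gives a scale for judging per-N improvements such as the 0.000013 gain noted later.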


In [4]:
# Summary of analysis
print('='*70)
print('SUMMARY OF ANALYSIS')
print('='*70)
print()
print('Current best score: 70.630478')
print('Target score: 68.919154')
print('Gap: 1.711 points (2.48%)')
print()
print('WHAT WE HAVE TRIED (17 experiments):')
print('- Ensemble from 25+ public sources (ceiling at 70.630478)')
print('- bbox3 optimization (produces overlapping trees)')
print('- sa_v1_parallel optimization (produces overlapping trees)')
print('- Grid-based approaches (zaburo, tessellation) - fundamentally worse')
print('- Constructive heuristics (scanline, lattice, chebyshev, BL) - all worse')
print('- Random restart SA - no improvement')
print('- Long-running SA (15 generations) - no improvement')
print('- Basin hopping - no improvement')
print('- Genetic algorithm with crossover - no improvement')
print('- Tree removal technique - tiny improvement (0.000013)')
print('- Cross-N extraction (exhaustive) - same tiny improvement')
print()
print('WHAT WE HAVE NOT FULLY TRIED:')
print('1. Egortrushin tessellation SA with TRANSLATION optimization')
print('   - Optimizes base tree positions, not individual trees')
print('   - Creates fundamentally different configurations')
print('   - Targets specific N values: 72, 100, 110, 144, 156, 196, 200')
print()
print('2. Asymmetric solutions (mentioned in discussions)')
print('   - Discussion "Why the winning solutions will be Asymmetric" (34 votes)')
print('   - Top teams use asymmetric layouts')
print()
print('3. Very high temperature SA from random initial configurations')
print('   - All our SA runs started from the baseline')
print('   - Need to explore DIFFERENT basins')
print()
print('CRITICAL OBSERVATION:')
print('The target (68.919) is 2.27 points BELOW the public LB leader (71.19).')
print('This means the target requires techniques NOT in any public kernel.')
print('We need to discover something NEW, not just optimize existing approaches.')

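======================================================================
SUMMARY OF ANALYSIS
======================================================================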

Current best score: 70.630478
Target score: 68.919154
Gap: 1.711 points (2.48%)

WHAT WE HAVE TRIED (17 experiments):
- Ensemble from 25+ public sources (ceiling at 70.630478)
- bbox3 optimization (produces overlapping trees)
- sa_v1_parallel optimization (produces overlapping trees)
- Grid-based approaches (zaburo, tessellation) - fundamentally worse
- Constructive heuristics (scanline, lattice, chebyshev, BL) - all worse
- Random restart SA - no improvement
- Long-running SA (15 generations) - no improvement
- Basin hopping - no improvement
- Genetic algorithm with crossover - no improvement
- Tree removal technique - tiny improvement (0.000013)
- Cross-N extraction (exhaustive) - same tiny improvement

WHAT WE HAVE NOT FULLY TRIED:
1. Egortrushin tessellation SA with TRANSLATION optimization
   - Optimizes base tree positions, not individual trees
   - Creates fundamentally different configurations
   - Targets specific N values: 72, 100, 110, 144, 156, 196, 200
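
The summary only says that the tessellation approach optimizes base tree positions and translations rather than individual trees. The sketch below shows what such a parametrization could look like, assuming two base trees repeated on a rectangular lattice with steps `dx` and `dy`. The function names, the base placement, and every parameter value are hypothetical placeholders and are not taken from the egortrushin kernel. The sketch does not check for overlaps, so a real search would need a separate validity test.

```python
import math

# Tree outline from ChristmasTree above (unrotated, centred at the origin).
TREE = [(0.0, 0.8), (0.125, 0.5), (0.0625, 0.5), (0.2, 0.25), (0.1, 0.25),
        (0.35, 0.0), (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0),
        (-0.35, 0.0), (-0.1, 0.25), (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5)]

def place(x, y, deg):
    """Rotate the outline counter-clockwise by deg about the origin, then translate."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [(x + px * c - py * s, y + px * s + py * c) for px, py in TREE]

def tessellate(base, dx, dy, cols, rows):
    """Repeat each base tree (x, y, deg) on a cols x rows lattice with steps dx, dy."""
    return [(bx + i * dx, by + j * dy, deg)
            for j in range(rows) for i in range(cols) for bx, by, deg in base]

def side_length(trees):
    """Side of the axis-aligned bounding square over all tree vertices."""
    pts = [p for t in trees for p in place(*t)]
    xs, ys = zip(*pts)
    return max(max(xs) - min(xs), max(ys) - min(ys))

# Placeholder parameters, not validated for overlap.
# A search over this layout would perturb only base, dx and dy.
base = [(0.0, 0.0, 0.0), (0.4, 0.6, 180.0)]
layout = tessellate(base, dx=0.8, dy=1.1, cols=6, rows=6)  # 2 * 6 * 6 = 72 trees

s = side_length(layout)
print(len(layout), s, s ** 2 / len(layout))
```

The point of this design is the small search space: a handful of shared parameters instead of three per tree. That would fit N values such as 72 that factor into a lattice of repeated cells.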