# Loop 20 Analysis: Understanding the Gap

Current best: 70.627634
Target: 68.919154
Gap: 1.708 (2.42%)

After 21 experiments, ALL approaches converge to ~70.63. We need to understand:
1. Where is the score coming from? (which N values contribute most)
2. What is the theoretical minimum?
3. What approaches haven't been tried?

In [1]:
import pandas as pd
import numpy as np
from shapely.geometry import Polygon
from shapely.affinity import rotate, translate
import matplotlib.pyplot as plt

TREE_TEMPLATE = [
    (0.0, 0.8), (0.125, 0.5), (0.0625, 0.5), (0.2, 0.25), (0.1, 0.25),
    (0.35, 0.0), (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0),
    (-0.35, 0.0), (-0.1, 0.25), (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5)
]

def parse_s_value(val):
    # Submission values are stored as strings prefixed with 's'; strip the prefix before converting
    if isinstance(val, str) and val.startswith('s'):
        return float(val[1:])
    return float(val)

def create_tree_polygon(x, y, angle):
    tree = Polygon(TREE_TEMPLATE)
    tree = rotate(tree, angle, origin=(0, 0), use_radians=False)
    tree = translate(tree, x, y)
    return tree

# Load current best
df = pd.read_csv('/home/submission/submission.csv')
df['x'] = df['x'].apply(parse_s_value)
df['y'] = df['y'].apply(parse_s_value)
df['deg'] = df['deg'].apply(parse_s_value)
df['n'] = df['id'].apply(lambda x: int(x.split('_')[0]))  # group size N is the id prefix before '_'

print(f"Loaded {len(df)} trees")

Loaded 20100 trees
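As a sanity check on `TREE_TEMPLATE`, the tree's area can be computed without shapely using the shoelace formula. A minimal sketch (the function name `shoelace_area` is ours, not from the notebook); it should agree with the `single_tree.area` value of 0.245625 that the theoretical-analysis cell reports.

```python
# Tree outline copied from TREE_TEMPLATE (vertices in boundary order)
TREE_TEMPLATE = [
    (0.0, 0.8), (0.125, 0.5), (0.0625, 0.5), (0.2, 0.25), (0.1, 0.25),
    (0.35, 0.0), (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0),
    (-0.35, 0.0), (-0.1, 0.25), (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5)
]

def shoelace_area(vertices):
    """Area of a simple polygon via the shoelace formula.

    The absolute value makes clockwise and counterclockwise orderings give the same result.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

print(f"{shoelace_area(TREE_TEMPLATE):.6f}")  # 0.245625
```

The same value also follows by hand: trunk 0.03, bottom tier 0.1125, middle tier 0.065625, top tier 0.0375.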


In [2]:
# Per-N score: side^2 / N, where side is the longer edge of the bounding box enclosing all N trees
scores = {}
side_lengths = {}
for n in range(1, 201):
    group = df[df['n'] == n]
    trees = [create_tree_polygon(row['x'], row['y'], row['deg']) for _, row in group.iterrows()]
    all_x, all_y = [], []
    for tree in trees:
        minx, miny, maxx, maxy = tree.bounds
        all_x.extend([minx, maxx])
        all_y.extend([miny, maxy])
    side = max(max(all_x) - min(all_x), max(all_y) - min(all_y))
    side_lengths[n] = side
    scores[n] = (side ** 2) / n

total_score = sum(scores.values())
print(f"Total score: {total_score:.6f}")
print(f"Target: 68.919154")
print(f"Gap: {total_score - 68.919154:.6f}")

Total score: 70.627634
Target: 68.919154
Gap: 1.708480
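The per-N score above is the squared side of the smallest axis-aligned square enclosing every tree, divided by N. Because a polygon's bounding box equals the bounding box of its vertices, the same number can be computed without shapely. A minimal sketch, assuming each tree is an `(x, y, deg)` tuple rotated counterclockwise about the template origin and then translated, as `create_tree_polygon` does with shapely's `rotate(..., origin=(0, 0))`; the helper names `tree_vertices` and `group_score` are hypothetical.

```python
import math

TREE_TEMPLATE = [
    (0.0, 0.8), (0.125, 0.5), (0.0625, 0.5), (0.2, 0.25), (0.1, 0.25),
    (0.35, 0.0), (0.075, 0.0), (0.075, -0.2), (-0.075, -0.2), (-0.075, 0.0),
    (-0.35, 0.0), (-0.1, 0.25), (-0.2, 0.25), (-0.0625, 0.5), (-0.125, 0.5)
]

def tree_vertices(x, y, deg):
    """Rotate the template counterclockwise by deg degrees about (0, 0), then shift by (x, y)."""
    theta = math.radians(deg)
    c, s = math.cos(theta), math.sin(theta)
    return [(px * c - py * s + x, px * s + py * c + y) for px, py in TREE_TEMPLATE]

def group_score(trees):
    """Score for one group: side**2 / N, side = longer extent of the combined bounding box."""
    xs, ys = [], []
    for x, y, deg in trees:
        for vx, vy in tree_vertices(x, y, deg):
            xs.append(vx)
            ys.append(vy)
    side = max(max(xs) - min(xs), max(ys) - min(ys))
    return side ** 2 / len(trees)

print(f"{group_score([(0.0, 0.0, 0.0)]):.6f}")   # upright tree, 0.7 wide and 1.0 tall: 1.000000
print(f"{group_score([(0.0, 0.0, 45.0)]):.6f}")  # 45 degrees: 0.661250, the N=1 score above
```

At 45 degrees both extents equal 1.15 / sqrt(2), so the score is 1.15^2 / 2 = 0.66125, which is why the N=1 entry in the table matches the single-tree bounding box exactly.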


In [3]:
# Analyze score distribution
print("\n=== SCORE DISTRIBUTION ===")
for lo, hi in [(1, 10), (11, 50), (51, 100), (101, 200)]:
    bucket_total = sum(scores[n] for n in range(lo, hi + 1))
    print(f"N={lo}-{hi}: {bucket_total:.4f} ({100 * bucket_total / total_score:.1f}%)")

# Top 10 N values by score contribution
print("\n=== TOP 10 N VALUES BY SCORE ===")
sorted_scores = sorted(scores.items(), key=lambda x: x[1], reverse=True)
for n, score in sorted_scores[:10]:
    print(f"N={n}: {score:.6f} (side={side_lengths[n]:.4f})")


=== SCORE DISTRIBUTION ===
N=1-10: 4.3291 (6.1%)
N=11-50: 14.7048 (20.8%)
N=51-100: 17.6144 (24.9%)
N=101-200: 33.9794 (48.1%)

=== TOP 10 N VALUES BY SCORE ===
N=1: 0.661250 (side=0.8132)
N=2: 0.450779 (side=0.9495)
N=3: 0.434745 (side=1.1420)
N=5: 0.416850 (side=1.4437)
N=4: 0.416545 (side=1.2908)
N=7: 0.399897 (side=1.6731)
N=6: 0.399610 (side=1.5484)
N=9: 0.387415 (side=1.8673)
N=8: 0.385407 (side=1.7559)
N=15: 0.376978 (side=2.3780)


In [4]:
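# Area lower bound: N non-overlapping trees of area A need a square of area >= N * A,
# so each per-N score (side^2 / N) is at least A, the single-tree area.
# Single tree at 45 degrees, the orientation treated as optimal here (its square matches the current N=1 score):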
single_tree = create_tree_polygon(0, 0, 45)
minx, miny, maxx, maxy = single_tree.bounds
single_tree_side = max(maxx - minx, maxy - miny)
single_tree_area = single_tree.area

print(f"\n=== THEORETICAL ANALYSIS ===")
print(f"Single tree at 45 deg: side={single_tree_side:.6f}, area={single_tree_area:.6f}")
print(f"Single tree bounding box area: {single_tree_side**2:.6f}")

# Packing efficiency = (N * tree_area) / side^2. At 100% efficiency side = sqrt(N * tree_area),
# so every per-N score equals tree_area. The bound is loose: it ignores the gaps that the
# irregular tree outlines leave, so it is not expected to be reachable.
print(f"\nTheoretical minimum scores (100% packing efficiency):")
theoretical_min = 0
for n in range(1, 201):
    min_side = np.sqrt(n * single_tree_area)
    min_score = (min_side ** 2) / n
    theoretical_min += min_score

print(f"Theoretical minimum total: {theoretical_min:.6f}")
print(f"Current efficiency: {theoretical_min / total_score * 100:.1f}%")
print(f"Gap to theoretical: {total_score - theoretical_min:.6f}")


=== THEORETICAL ANALYSIS ===
Single tree at 45 deg: side=0.813173, area=0.245625
Single tree bounding box area: 0.661250

Theoretical minimum scores (100% packing efficiency):
Theoretical minimum total: 49.125000
Current efficiency: 69.6%
Gap to theoretical: 21.502634
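Written out, the bound this cell computes is as follows, with $A$ the area of one tree and $s_N$ the side of the enclosing square for $N$ trees:

$$
s_N^2 \ge N A \quad\Longrightarrow\quad \frac{s_N^2}{N} \ge A,
\qquad
\sum_{N=1}^{200} \frac{s_N^2}{N} \;\ge\; 200\,A = 200 \times 0.245625 = 49.125 .
$$

The printed efficiency is the ratio of this bound to the current total, $49.125 / 70.627634 \approx 0.696$. The bound is far from tight at small $N$: for $N = 1$ the square must contain the whole tree, and the 45-degree square already has area $0.661250$, so that single term sits $0.661250 - 0.245625 = 0.415625$ above $A$.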


In [5]:
# Analyze efficiency per N
print("\n=== EFFICIENCY ANALYSIS ===")
efficiencies = {}
for n in range(1, 201):
    actual_area = side_lengths[n] ** 2
    theoretical_area = n * single_tree_area
    efficiency = theoretical_area / actual_area * 100
    efficiencies[n] = efficiency

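# Find N values with worst efficiency. "Potential improvement" is score minus tree_area,
# the distance to the area bound: an optimistic ceiling rather than an expected gain.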
print("\nN values with WORST efficiency (most room for improvement):")
sorted_eff = sorted(efficiencies.items(), key=lambda x: x[1])
for n, eff in sorted_eff[:15]:
    potential_improvement = scores[n] * (1 - eff/100)
    print(f"N={n}: efficiency={eff:.1f}%, score={scores[n]:.6f}, potential improvement={potential_improvement:.6f}")


=== EFFICIENCY ANALYSIS ===

N values with WORST efficiency (most room for improvement):
N=1: efficiency=37.1%, score=0.661250, potential improvement=0.415625
N=2: efficiency=54.5%, score=0.450779, potential improvement=0.205154
N=3: efficiency=56.5%, score=0.434745, potential improvement=0.189120
N=5: efficiency=58.9%, score=0.416850, potential improvement=0.171225
N=4: efficiency=59.0%, score=0.416545, potential improvement=0.170920
N=7: efficiency=61.4%, score=0.399897, potential improvement=0.154272
N=6: efficiency=61.5%, score=0.399610, potential improvement=0.153985
N=9: efficiency=63.4%, score=0.387415, potential improvement=0.141790
N=8: efficiency=63.7%, score=0.385407, potential improvement=0.139782
N=15: efficiency=65.2%, score=0.376978, potential improvement=0.131353
N=10: efficiency=65.2%, score=0.376630, potential improvement=0.131005
N=21: efficiency=65.2%, score=0.376451, potential improvement=0.130826
N=20: efficiency=65.3%, score=0.376057, potential improvement=0.130
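Since efficiency is N * A / side^2 and the score is side^2 / N, the two are tied by score = A / efficiency, and the "potential improvement" column is simply score minus A. A minimal sketch of that arithmetic (the function name `efficiency_and_slack` is hypothetical), checked against the N=1 row:

```python
TREE_AREA = 0.245625  # single-tree area from the theoretical-analysis cell

def efficiency_and_slack(score, tree_area=TREE_AREA):
    """Return (packing efficiency in %, distance to the area bound) for one per-N score."""
    efficiency = 100.0 * tree_area / score
    slack = score - tree_area  # identical to score * (1 - efficiency / 100)
    return efficiency, slack

eff, slack = efficiency_and_slack(0.661250)
print(f"efficiency={eff:.1f}%, potential improvement={slack:.6f}")
# efficiency=37.1%, potential improvement=0.415625
```

This is why the efficiency ranking and the score ranking list the same N values in the same order: with A fixed, a lower efficiency always means a higher score.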

In [6]:
# What would it take to reach the target?
target = 68.919154
gap = total_score - target

print(f"\n=== GAP ANALYSIS ===")
print(f"Current: {total_score:.6f}")
print(f"Target: {target:.6f}")
print(f"Gap: {gap:.6f}")
print(f"Gap %: {gap/total_score*100:.2f}%")

# If we improve all N values uniformly
print(f"\nTo reach target with uniform improvement:")
print(f"  Each N needs {gap/200:.6f} improvement")
print(f"  That's {gap/200/np.mean(list(scores.values()))*100:.2f}% per N")

# If we focus on the worst-efficiency N values (optimistic: assumes each one reaches
# the area bound, which is impossible for N=1)
print(f"\nTo reach target by fixing worst efficiency N values:")
cumulative_improvement = 0
selected = []
for n, eff in sorted_eff:
    potential = scores[n] * (1 - eff/100)
    cumulative_improvement += potential
    selected.append(n)
    if cumulative_improvement >= gap:
        # sorted_eff is ordered by efficiency, not by N, so count and list the chosen N values explicitly
        print(f"  Need to fix the worst {len(selected)} N values: {sorted(selected)}")
        break


=== GAP ANALYSIS ===
Current: 70.627634
Target: 68.919154
Gap: 1.708480
Gap %: 2.42%

To reach target with uniform improvement:
  Each N needs 0.008542 improvement
  That's 2.42% per N

To reach target by fixing worst efficiency N values:
  Need to fix the worst 9 N values: [1, 2, 3, 4, 5, 6, 7, 8, 9]
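Another way to read the gap is per bucket: if the whole improvement had to come from one range of N, the scores in that range would need to fall by gap / (bucket total). A small sketch using the bucket totals from the score-distribution cell (the helper `required_reduction` is hypothetical):

```python
GAP = 70.627634 - 68.919154  # current best minus target

# Bucket totals copied from the SCORE DISTRIBUTION output
buckets = {
    "N=1-10": 4.3291,
    "N=11-50": 14.7048,
    "N=51-100": 17.6144,
    "N=101-200": 33.9794,
}

def required_reduction(bucket_total, gap=GAP):
    """Fractional score reduction a bucket needs if it alone must close the gap."""
    return gap / bucket_total

for name, total in buckets.items():
    print(f"{name}: scores must drop by {100 * required_reduction(total):.1f}%")
```

Because N=101-200 carries almost half of the total, closing the gap there alone needs roughly a 5% reduction in that range, whereas N=1-10 alone would need close to 40%.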


In [7]:
# Summary
print("\n" + "="*60)
print("LOOP 20 ANALYSIS SUMMARY")
print("="*60)
print(f"Current best: {total_score:.6f}")
print(f"Target: {target:.6f}")
print(f"Gap: {gap:.6f} ({gap/total_score*100:.2f}%)")
print(f"\nTheoretical minimum: {theoretical_min:.6f}")
print(f"Current efficiency: {theoretical_min/total_score*100:.1f}%")
print(f"\nKey insight: The gap is 2.42% of total score.")
print(f"This requires finding fundamentally better configurations,")
print(f"not just optimizing existing ones.")
print(f"\nAfter 21 experiments, ALL approaches converge to ~70.63.")
print(f"The baseline is at an EXTREMELY STRONG LOCAL OPTIMUM.")
print(f"\nPossible paths forward:")
print(f"1. Find a completely different structural approach")
print(f"2. Focus on specific N values with worst efficiency")
print(f"3. Use exact solvers for small N")
print(f"4. Try asymmetric configurations (per discussion)")
print(f"5. Use corner-rebuild technique from chistyakov kernel")


LOOP 20 ANALYSIS SUMMARY
Current best: 70.627634
Target: 68.919154
Gap: 1.708480 (2.42%)

Theoretical minimum: 49.125000
Current efficiency: 69.6%

Key insight: The gap is 2.42% of total score.
This requires finding fundamentally better configurations,
not just optimizing existing ones.

After 21 experiments, ALL approaches converge to ~70.63.
The baseline is at an EXTREMELY STRONG LOCAL OPTIMUM.

Possible paths forward:
1. Find a completely different structural approach
2. Focus on specific N values with worst efficiency
3. Use exact solvers for small N
4. Try asymmetric configurations (per discussion)
5. Use corner-rebuild technique from chistyakov kernel
