## Metrics implemented (as promised in the project plan)

We quantify structure using:
1. Grid-based density variance (binning)
2. Mean nearest-neighbour distance (PBC)
3. Number of clusters (connected components under distance threshold eps)
4. Largest cluster fraction (LCF; order parameter)
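
As a rough illustration of metrics 1 and 2, the sketch below computes a mean nearest-neighbour distance with the minimum-image convention and a count-based density variance on a square grid. The function names `mean_nn_distance_pbc` and `density_variance` are hypothetical and are not the `src.metrics` implementations. The normalisation (variance of the cell counts divided by the squared mean count) and the 2D square periodic box are assumptions.

```python
import numpy as np

def mean_nn_distance_pbc(positions, box_size):
    """Mean nearest-neighbour distance in a periodic square box (O(N^2) sketch)."""
    # Pairwise displacement vectors, shape (N, N, 2).
    delta = positions[:, None, :] - positions[None, :, :]
    # Minimum-image convention: wrap each component into [-box_size/2, box_size/2].
    delta -= box_size * np.round(delta / box_size)
    dist = np.sqrt((delta ** 2).sum(axis=-1))
    # Exclude self-distances before taking the per-particle minimum.
    np.fill_diagonal(dist, np.inf)
    return dist.min(axis=1).mean()

def density_variance(positions, box_size, bins=20):
    """Variance of per-cell particle counts, divided by the squared mean count."""
    wrapped = np.mod(positions, box_size)  # fold positions back into the box
    counts, _, _ = np.histogram2d(
        wrapped[:, 0], wrapped[:, 1],
        bins=bins, range=[[0.0, box_size], [0.0, box_size]],
    )
    return counts.var() / counts.mean() ** 2

# Usage on a uniform random configuration (seed chosen arbitrarily).
rng = np.random.default_rng(0)
sample = rng.random((100, 2))
nn_value = mean_nn_distance_pbc(sample, box_size=1.0)
var_value = density_variance(sample, box_size=1.0, bins=10)
```

The all-pairs distance matrix keeps the sketch short but scales as N²; for large systems a cell list or k-d tree with periodic support would be the usual replacement.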

In [1]:
import sys
from pathlib import Path

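# Assumes the notebook is run from a directory one level below the project root,
# so that the `src` package becomes importable.
PROJECT_ROOT = Path.cwd().resolve().parent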
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

## Test of clustering metrics

### Sanity check: metrics on random configurations

Before applying the metrics to simulation outputs, we first verify their
behaviour on a random (uniform) particle distribution.

For a random configuration, we expect:
- the mean nearest-neighbour distance to fluctuate around a constant value set by the number density,
- the largest cluster fraction (LCF) to remain small,
- no dominant cluster to appear.

This sanity check ensures that the metrics do not artificially detect structure
where none exists.
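
To make "a constant value" concrete, the expected scale can be estimated analytically. The sketch below uses the standard result for an ideal 2D Poisson point process, where the mean nearest-neighbour distance is 1/(2√ρ) for number density ρ, and counts the average number of neighbours inside a disc of radius `eps`. Both helper names are hypothetical, and the Poisson formula is only an approximation for a finite periodic box.

```python
import math

def expected_nn_distance_uniform(n_particles, box_size):
    """Mean nearest-neighbour distance of an ideal 2D Poisson process."""
    density = n_particles / box_size ** 2
    return 0.5 / math.sqrt(density)

def expected_neighbours_within(n_particles, box_size, eps):
    """Average number of other particles within distance eps (uniform, periodic box)."""
    return (n_particles - 1) * math.pi * eps ** 2 / box_size ** 2

print(expected_nn_distance_uniform(100, 1.0))               # 0.05
print(f"{expected_neighbours_within(100, 1.0, 0.05):.2f}")  # 0.78
```

With N = 100, a unit box, and eps = 0.05, each particle has fewer than one neighbour within eps on average, so links are sparse and only small clusters are expected.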

In [2]:
import numpy as np
from src.metrics import nearest_neighbor_distance, largest_cluster_fraction

box_size = 1.0
eps = 0.05

for i in range(5):
    positions = np.random.rand(100, 2) * box_size
    nn = nearest_neighbor_distance(positions, box_size=box_size)
    lcf = largest_cluster_fraction(positions, eps=eps, box_size=box_size)
    print(f"Run {i}: NN={nn:.3f}, LCF={lcf:.2f}")

Run 0: NN=0.046, LCF=0.10
Run 1: NN=0.048, LCF=0.07
Run 2: NN=0.046, LCF=0.07
Run 3: NN=0.056, LCF=0.05
Run 4: NN=0.049, LCF=0.06


### Repeated check on fresh random configurations

We rerun the same test on five new random (near-uniform) configurations
to check that the observables are stable from sample to sample.


In [3]:
for i in range(5):
    positions = np.random.rand(100, 2) * box_size
    nn = nearest_neighbor_distance(positions, box_size=box_size)
    lc = largest_cluster_fraction(positions, eps=eps, box_size=box_size)
    print(f"Run {i}: NN={nn:.3f}, LCF={lc:.2f}")

Run 0: NN=0.052, LCF=0.06
Run 1: NN=0.048, LCF=0.05
Run 2: NN=0.047, LCF=0.06
Run 3: NN=0.049, LCF=0.05
Run 4: NN=0.050, LCF=0.06


In [4]:
from src.universe_sim import run_simulation

h_cluster = run_simulation(
    N=200,
    steps=800,
    seed=0,
    attraction=0.02,
    noise=0.01,
    interaction_range=0.6,
    repulsion_radius=0.05,
    save_every=20,
)

pos = h_cluster[-1]  # last saved snapshot of the run

### Consistency check of clustering metrics

We evaluate all implemented observables on a clearly clustered configuration
to verify that they provide consistent and physically meaningful results.

In a clustered state, we expect:
- a small nearest-neighbour distance,
- a large largest-cluster fraction (LCF ≈ 1),
- a high density variance,
- and a small number of clusters.

This check confirms that all metrics respond coherently to the same underlying structure.
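
For reference, the two cluster observables can be sketched as a connected-components search: particles closer than `eps` under the minimum-image convention are linked, and linked groups are merged with a union-find structure. This is an illustrative sketch, not the `src.metrics` code. The names `cluster_labels_pbc` and `cluster_summary` are hypothetical, and the reading of `min_size` (clusters with fewer members are not counted) is an assumption.

```python
import numpy as np

def cluster_labels_pbc(positions, eps, box_size):
    """Label connected components of the 'distance <= eps' graph in a periodic box."""
    n = len(positions)
    delta = positions[:, None, :] - positions[None, :, :]
    delta -= box_size * np.round(delta / box_size)  # minimum-image convention
    adjacent = (delta ** 2).sum(axis=-1) <= eps ** 2

    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of every linked pair (upper triangle only).
    rows, cols = np.nonzero(np.triu(adjacent, k=1))
    for i, j in zip(rows.tolist(), cols.tolist()):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i
    return np.array([find(i) for i in range(n)])

def cluster_summary(positions, eps, box_size, min_size=1):
    """Return (largest cluster fraction, number of clusters with >= min_size members)."""
    labels = cluster_labels_pbc(positions, eps, box_size)
    _, sizes = np.unique(labels, return_counts=True)
    lcf = sizes.max() / len(positions)
    n_clusters = int((sizes >= min_size).sum())
    return lcf, n_clusters

# Three particles linked across the periodic boundary plus two isolated ones.
demo = np.array([[0.01, 0.5], [0.99, 0.5], [0.03, 0.5], [0.5, 0.5], [0.5, 0.1]])
demo_lcf, demo_n = cluster_summary(demo, eps=0.05, box_size=1.0, min_size=3)
```

In this picture, a fully clustered state corresponds to a single component containing almost all particles, which is what LCF ≈ 1 together with a cluster count of 1 expresses.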

In [None]:
from src.metrics import (
    nearest_neighbor_distance,
    largest_cluster_fraction,
    density_variance_grid,
    number_of_clusters,
)

pos = h_cluster[-1]
print("nn:", nearest_neighbor_distance(pos, box_size=1.0))
print("LCF:", largest_cluster_fraction(pos, eps=0.06, box_size=1.0))
print("dens_var:", density_variance_grid(pos, 1.0, bins=20, normalized=True))
print("n_clusters:", number_of_clusters(pos, eps=0.06, box_size=1.0, min_size=3))

nn: 0.011046733410187066
LCF: 0.995
dens_var: 14.899999999940402
n_clusters: 1
