# Universe Model — Metrics and Validation

This notebook validates the behaviour of the universe simulation by:

- verifying that all clustering metrics behave as expected,
- checking limiting cases (random vs. clustered configurations),
- analysing how spatial structure emerges as model parameters change.

The goal is to ensure that the simulation produces meaningful and reproducible
results before it is used for systematic parameter sweeps and analysis.

## Metrics implemented (as specified in the project plan)

We quantify spatial structure using the following observables:

1. **Grid-based density variance**  
   Measures spatial inhomogeneity by binning particles on a grid.

2. **Mean nearest-neighbour distance (PBC)**  
   Captures typical inter-particle spacing under periodic boundary conditions.

3. **Number of clusters**  
   Defined as connected components under a distance threshold ε.

4. **Largest cluster fraction (LCF)**  
   Fraction of particles belonging to the largest cluster.  
   This serves as the main *order parameter* for structure formation.
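
The three distance-based observables above can be sketched in plain NumPy. This is an illustrative reimplementation under stated assumptions, not the code in `src.metrics`: the helper names (`pbc_distance_matrix`, `mean_nn_distance`, `cluster_sizes`) are hypothetical, distances use the minimum-image convention, and two particles are linked when their periodic distance is below ε.

```python
import numpy as np

def pbc_distance_matrix(positions, box_size):
    """Pairwise distances under the minimum-image convention (periodic box)."""
    delta = positions[:, None, :] - positions[None, :, :]
    delta -= box_size * np.round(delta / box_size)  # wrap to the nearest image
    return np.sqrt((delta ** 2).sum(axis=-1))

def mean_nn_distance(positions, box_size):
    """Mean distance from each particle to its nearest neighbour."""
    dist = pbc_distance_matrix(positions, box_size)
    np.fill_diagonal(dist, np.inf)  # ignore self-distances
    return dist.min(axis=1).mean()

def cluster_sizes(positions, eps, box_size):
    """Sizes of the connected components of the graph linking pairs closer than eps."""
    adjacent = pbc_distance_matrix(positions, box_size) < eps
    n = len(positions)
    labels = np.full(n, -1)
    sizes = []
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to a component
        labels[start] = len(sizes)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            size += 1
            for j in np.flatnonzero(adjacent[i] & (labels == -1)):
                labels[j] = labels[start]
                stack.append(j)
        sizes.append(size)
    return sizes

# Toy example: the first two points are neighbours only through the periodic boundary.
demo = np.array([[0.01, 0.5], [0.99, 0.5], [0.5, 0.5], [0.5, 0.53]])
demo_sizes = cluster_sizes(demo, eps=0.05, box_size=1.0)
demo_nn = mean_nn_distance(demo, box_size=1.0)
demo_lcf = max(demo_sizes) / len(demo)  # largest cluster fraction
```

The O(N²) distance matrix is adequate for N = 200; a larger system would call for a cell list or a periodic k-d tree.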

In [36]:
from _bootstrap import PROJECT_ROOT, RESULTS_DIR, FIGURES_DIR, DEFAULT_SEEDS

from src.config import SimConfig, MetricsConfig
from src.universe_sim import run_simulation
from src.metrics import (
    nearest_neighbor_distance,
    largest_cluster_fraction,
    density_variance_grid,
    number_of_clusters,
)

import numpy as np

In [37]:
SIM = SimConfig(
    N=200,
    steps=800,
    save_every=20,
    attraction=0.08,
    noise=0.03,
    interaction_range=0.30,
)

MET = MetricsConfig(
    eps=0.06,
    bins=20,
    min_size=3,
    burn_frac=0.5,
)

print("[01] SIM =", SIM)
print("[01] MET =", MET)

[01] SIM = SimConfig(N=200, steps=800, dt=1.0, box_size=1.0, save_every=20, attraction=0.08, interaction_range=0.3, noise=0.03, repulsion=0.02, repulsion_radius=0.05)
[01] MET = MetricsConfig(eps=0.06, bins=20, min_size=3, burn_frac=0.5)


## A) Sanity check on a random configuration

For an approximately uniform random configuration, we expect:
- **LCF**: small (no dominant cluster),
- **NN distance**: relatively large (set by the mean density),
- **#clusters**: comparatively large (many small components under ε),
- **density variance**: relatively small (shot noise only).
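
These expectations can be made quantitative with two standard Poisson baselines. The sketch below uses the values of `N`, `box_size` and `bins` from the configuration above. It assumes that `normalized=True` means variance divided by the squared mean cell count; that is an assumption about `density_variance_grid`, not something confirmed by its source. For a two-dimensional Poisson process of density ρ, the mean nearest-neighbour distance is 1/(2√ρ), and the cell-count variance equals the mean count.

```python
import math
import numpy as np

N, box_size, bins = 200, 1.0, 20  # values taken from SIM and MET

# Analytical baselines for a uniform (Poisson-like) configuration.
density = N / box_size**2
nn_poisson = 1.0 / (2.0 * math.sqrt(density))  # mean NN distance in 2D
mean_count = N / bins**2                       # mean particles per grid cell
densvar_poisson = 1.0 / mean_count             # var/mean^2 when var == mean

print(f"expected NN distance   ~ {nn_poisson:.4f}")
print(f"expected var / mean^2  ~ {densvar_poisson:.2f}")

# Empirical cross-check on one uniform sample (assumed normalization: var / mean^2).
rng = np.random.default_rng(0)
points = rng.random((N, 2)) * box_size
counts, _, _ = np.histogram2d(
    points[:, 0], points[:, 1],
    bins=bins, range=[[0, box_size], [0, box_size]],
)
densvar_empirical = counts.var() / counts.mean() ** 2
print(f"empirical var / mean^2 = {densvar_empirical:.2f}")
```

With 200 particles on a 20 × 20 grid the mean cell count is 0.5, so a normalized variance near 2 is the shot-noise floor rather than a sign of structure.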

In [38]:
box_size = SIM.box_size
eps = MET.eps

np.random.seed(0)

print("Random configuration sanity check:")
for i in range(5):
    positions = np.random.rand(SIM.N, 2) * box_size  # SIM.N = 200 particles
    nn = nearest_neighbor_distance(positions, box_size=box_size)
    lcf = largest_cluster_fraction(positions, eps=eps, box_size=box_size)
    densvar = density_variance_grid(positions, box_size=box_size, bins=MET.bins, normalized=True)
    ncl = number_of_clusters(positions, eps=eps, box_size=box_size, min_size=MET.min_size)

    print(f"Run {i}: NN={nn:.3f}, LCF={lcf:.2f}, dens_var={densvar:.2f}, n_clusters={ncl}")

Random configuration sanity check:
Run 0: NN=0.035, LCF=0.10, dens_var=2.16, n_clusters=25
Run 1: NN=0.034, LCF=0.14, dens_var=2.22, n_clusters=20
Run 2: NN=0.035, LCF=0.12, dens_var=1.70, n_clusters=26
Run 3: NN=0.033, LCF=0.15, dens_var=1.92, n_clusters=21
Run 4: NN=0.035, LCF=0.08, dens_var=2.06, n_clusters=26


## B) Consistency check on a clearly clustered configuration

We simulate a parameter set for which clustering is expected: attraction is raised from 0.08 to 0.10 and noise is lowered from 0.03 to 0.01, with all other settings unchanged.

In a clustered state, we expect:
- **NN distance**: small,
- **LCF ≈ 1** (dominant cluster),
- **density variance**: high,
- **#clusters**: small.

In [39]:
SIM_CLUSTER = SimConfig(**{**SIM.__dict__, "attraction": 0.10, "noise": 0.01, "steps": 800, "save_every": 20})

history = run_simulation(
    N=SIM_CLUSTER.N,
    steps=SIM_CLUSTER.steps,
    dt=SIM_CLUSTER.dt,
    box_size=SIM_CLUSTER.box_size,
    attraction=SIM_CLUSTER.attraction,
    repulsion=SIM_CLUSTER.repulsion,
    repulsion_radius=SIM_CLUSTER.repulsion_radius,
    noise=SIM_CLUSTER.noise,
    interaction_range=SIM_CLUSTER.interaction_range,
    seed=0,
    save_every=SIM_CLUSTER.save_every,
)

pos = history[-1]

nn = nearest_neighbor_distance(pos, box_size=SIM_CLUSTER.box_size)
lcf = largest_cluster_fraction(pos, eps=MET.eps, box_size=SIM_CLUSTER.box_size)
densvar = density_variance_grid(pos, box_size=SIM_CLUSTER.box_size, bins=MET.bins, normalized=True)
ncl = number_of_clusters(pos, eps=MET.eps, box_size=SIM_CLUSTER.box_size, min_size=MET.min_size)

print("Clustered configuration check:")
print("NN      =", nn)
print("LCF     =", lcf)
print("dens_var=", densvar)
print("n_clust =", ncl)

Clustered configuration check:
NN      = 0.006053127676215215
LCF     = 1.0
dens_var= 34.639999999861445
n_clust = 1
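
The check above evaluates only the final snapshot, `history[-1]`. `MetricsConfig` also carries `burn_frac=0.5`, which suggests discarding the first half of the saved frames and averaging over the rest. The sketch below shows one way to do that. It assumes `history` is a sequence of `(N, 2)` position arrays and that `burn_frac` is a burn-in fraction; the helper `time_average` is hypothetical and not part of `src.metrics`.

```python
import numpy as np

def time_average(history, metric, burn_frac=0.5):
    """Mean and standard deviation of metric(frame) over the post-burn-in frames."""
    start = int(len(history) * burn_frac)  # index of the first frame that is kept
    values = [metric(frame) for frame in history[start:]]
    return float(np.mean(values)), float(np.std(values))

# Toy history: frame t is filled with the value t, so the metric is easy to follow.
toy_history = [np.full((4, 2), t, dtype=float) for t in range(10)]
mean_val, std_val = time_average(toy_history, lambda frame: frame.mean(), burn_frac=0.5)
```

For the real run, the call might look like `time_average(history, lambda p: largest_cluster_fraction(p, eps=MET.eps, box_size=SIM_CLUSTER.box_size), MET.burn_frac)`. A small standard deviation would indicate that the final snapshot is representative of the late-time state.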


## Interpretation

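If the metrics are implemented correctly, the random and clustered configurations should show **coherent** behaviour:
- random: low LCF, higher NN, many clusters, low density variance  
- clustered: high LCF (≈ 1), lower NN, fewer clusters, high density variance  

The outputs above match this pattern. Across the five random runs, LCF stays between 0.08 and 0.15, NN between 0.033 and 0.035, the cluster count between 20 and 26, and the normalized density variance between 1.70 and 2.22. The clustered run gives LCF = 1.0, NN ≈ 0.006, a single cluster, and a density variance of ≈ 34.6. All four metrics therefore respond consistently to the same underlying spatial structure.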
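
The qualitative comparison can also be written as an explicit check, so that a regression in any metric fails loudly. The sketch below uses the values printed in sections A and B (rounded as displayed). The helper `is_coherent` is hypothetical and only encodes the four orderings listed above.

```python
# Metric values copied from the printed outputs of sections A and B.
random_runs = [
    {"nn": 0.035, "lcf": 0.10, "dens_var": 2.16, "n_clusters": 25},
    {"nn": 0.034, "lcf": 0.14, "dens_var": 2.22, "n_clusters": 20},
    {"nn": 0.035, "lcf": 0.12, "dens_var": 1.70, "n_clusters": 26},
    {"nn": 0.033, "lcf": 0.15, "dens_var": 1.92, "n_clusters": 21},
    {"nn": 0.035, "lcf": 0.08, "dens_var": 2.06, "n_clusters": 26},
]
clustered = {"nn": 0.00605, "lcf": 1.0, "dens_var": 34.64, "n_clusters": 1}

def is_coherent(random_runs, clustered):
    """True if the clustered state beats every random run on all four orderings."""
    return all(
        clustered["lcf"] > run["lcf"]
        and clustered["nn"] < run["nn"]
        and clustered["dens_var"] > run["dens_var"]
        and clustered["n_clusters"] < run["n_clusters"]
        for run in random_runs
    )

coherent = is_coherent(random_runs, clustered)
print("metrics coherent:", coherent)
```

In the notebook itself the dictionaries would be filled from the computed values rather than typed in by hand.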

## Relation to limiting cases

Extreme-regime sanity checks (e.g., very high noise → disordered state; zero attraction → near-uniform state) are already covered in **00_sanity_checks.ipynb**.

Here we focus on **metric consistency** rather than repeating limiting-case experiments.