# 6 · Spectral Statistics and Scale Regimes

**Observational record associated with the book**  
*Discovering Chaos in Prime Numbers — Computational Investigations through the Euler Mirror*   
© Alvaro Costa, 2025

This notebook is part of a **canonical sequence of computational records**.  
It introduces **no new hypotheses, conjectures, or interpretative models**.  

Its sole purpose is to **record** the behaviour of arithmetic structures under an  
**explicit, deterministic, and reproducible observational regime**.

The complete conceptual discussion is provided in the book.  
This notebook documents **only the corresponding experiment**.

**Licence:** Creative Commons BY–NC–ND 4.0  
Reading, execution, and citation are permitted.  
Modification, adapted redistribution, or independent commercial use are not permitted.


---

## 1. Operational introduction

In the previous chapters, we constructed the deterministic operator $M$ and analysed its visual structure. We now shift the focus from aesthetic inspection to **statistical analysis**.

The aim of this laboratory is to record how the relationships between the eigenvalues of $M$ behave as the **regime of observation** is modified. We will show that the observed order depends not only on the underlying arithmetic object, but also on the way scale enters the construction of the operator.

No hypothesis of universality is assumed at this stage. The tools are introduced so that the reader may **observe the phenomenon** before any theoretical interpretation is attempted.

---

## 2. The spacing measure

The principal tool of this chapter is the **Nearest–Neighbour Spacing Distribution**, denoted by $ P(s) $.

Let $ \{\lambda_i\}_{i=1}^N $ be the real eigenvalues of the operator $ M $, ordered increasingly,

$$
\lambda_1 \le \lambda_2 \le \cdots \le \lambda_N .
$$

We define the spacing between consecutive eigenvalues as

$$
s_i = \lambda_{i+1} - \lambda_i , \qquad i = 1, \dots, N-1 .
$$

These spacings are not analysed in absolute terms. Prior to statistical analysis, a normalisation of the spectral density is applied, a procedure known as  
*unfolding*, whose purpose is to remove variations in the mean level density and to allow comparisons between distinct spectra.

In this notebook, unfolding is implemented in two different ways, depending on the observation regime:

* **locally**, on the linear scale, to compensate for residual density variations;
* **globally**, on the logarithmic scale, where the mean density is already approximately stabilised.

After this normalisation, the histogram of the spacings $ s_i $ empirically defines the distribution $ P(s) $, which will be used as the primary observational  
instrument throughout the experiment.
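
As a minimal sketch of the two normalisations described above, the following applies them to a synthetic spectrum rather than to the eigenvalues of $M$. The helper names `global_unfold` and `local_unfold`, the moving-average width `w = 21`, and the synthetic level set are assumptions made for illustration only.

```python
import numpy as np

def global_unfold(levels):
    """Global unfolding: divide the spacings by their overall mean."""
    s = np.diff(np.sort(levels))
    return s / s.mean()

def local_unfold(levels, w=21):
    """Local unfolding: divide each spacing by a moving average over w neighbours."""
    s = np.diff(np.sort(levels))
    pad = w // 2
    # Reflect the ends so the moving average has the same length as s.
    s_padded = np.pad(s, (pad, pad), mode="reflect")
    local_mean = np.convolve(s_padded, np.ones(w) / w, mode="valid")
    return s / local_mean

# Synthetic levels (assumption): independent points whose density drifts
# smoothly across the window, so the mean spacing is not constant.
rng = np.random.default_rng(0)
levels = rng.uniform(1.0, 2.0, size=2000) ** 2

s_global = global_unfold(levels)
s_local = local_unfold(levels)

print(s_global.mean())  # equal to 1 up to rounding, by construction
print(s_local.mean())   # close to 1, but not exactly 1
```

Global unfolding fixes only the overall mean spacing, whereas the local variant also follows slow drifts in the level density. This is why the local form is the natural choice when the density is not yet stabilised.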

### What to observe

* **Noise regime (Poisson):**  
  The eigenvalues behave approximately independently.  
  The probability of very small spacings is maximal.

* **Correlated regime (level repulsion):**  
  The eigenvalues avoid each other in an organised manner.  
  The probability of zero spacing is strongly suppressed,
  producing a characteristic correlated structure.
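
For orientation, the two regimes listed above are commonly compared against two reference densities: the exponential $e^{-s}$ for independent levels, and the Wigner surmise $\tfrac{\pi}{2}\, s\, e^{-\pi s^2/4}$ for level repulsion. The sketch below checks numerically that both have unit area and unit mean spacing, and that they behave in opposite ways at $s = 0$. The function names are illustrative and do not belong to the notebook's code.

```python
import numpy as np

def p_independent(s):
    """Exponential density: spacing law of independent (Poisson) levels."""
    return np.exp(-s)

def p_repulsion(s):
    """Wigner surmise: a standard reference density exhibiting level repulsion."""
    return (np.pi * s / 2.0) * np.exp(-np.pi * s**2 / 4.0)

# Fine uniform grid on [0, 20]; both tails are negligible beyond this range.
s = np.linspace(0.0, 20.0, 200_001)
ds = s[1] - s[0]

def integrate(y):
    """Trapezoidal rule on the uniform grid defined above."""
    return ds * (y.sum() - 0.5 * (y[0] + y[-1]))

for name, p in [("independent", p_independent), ("repulsion", p_repulsion)]:
    area = integrate(p(s))
    mean = integrate(s * p(s))
    print(name, "area:", round(area, 4), "mean:", round(mean, 4), "P(0):", p(0.0))
```

Because both references have unit mean, an empirical histogram is comparable to them only after the spacings have been normalised to mean one.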

---

## 3. Experimental protocol: the impact of scale

The widget below allows a controlled exploration of the effect of the **observation scale** on the spectral statistics of the operator.

Three parameters can be adjusted:

* **$ N $** — the number of sampled points and, consequently, the dimension of the operator;
* **$ X_0 $** — the central point of the arithmetic region under analysis;
* **`span`** — the extent of the observation window on the logarithmic scale.

For each choice of these parameters, the experiment constructs a **single spectral operator**, evaluated over **two distinct sets of sampling points**,  
corresponding to **two observation metrics** applied to the same region of the number line:

1. **Linear scale**  
   The sampling points are distributed uniformly in $ x $, corresponding to direct observation on the additive number line.

2. **Logarithmic scale**  
   The points are distributed uniformly in $ \ln x $, aligning the observation with the multiplicative structure implicit in the operator.

The results are presented **side by side**, allowing a direct visual comparison between the emerging statistical regimes.

The displayed plots are **observational records**. No theoretical interpretation is imposed at this stage.

The purpose is to allow the reader to **observe directly** how the statistics of spectral spacings respond to a change in the measuring ruler, while keeping  
both the operator and the analysed arithmetic region fixed.

> Although the operator is intrinsically logarithmic, observation on the linear scale is retained as a contrast experiment, allowing one to verify how the loss  
> of phase coherence destroys spectral correlation.
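
The difference between the two rulers can be seen on a toy grid. The sketch below uses small illustrative values (`X0 = 1000`, `N = 8`, assumptions chosen for readability). It builds the linear grid as contiguous integers and the logarithmic grid as equal steps in $\ln x$, in the same way as the code cell that follows.

```python
import numpy as np

X0, span, N = 1000, 2.4, 8  # toy values, for illustration only

# Linear ruler: contiguous integers, i.e. equal additive steps.
x_lin = np.arange(X0, X0 + N)

# Logarithmic ruler: equal steps in ln x, i.e. equal multiplicative ratios.
x_log = np.exp(np.linspace(np.log(X0) - span / 2, np.log(X0) + span / 2, N))

step_lin = np.diff(x_lin)          # constant differences
ratio_log = x_log[1:] / x_log[:-1]  # constant ratios

print(step_lin)
print(ratio_log)
```

On the linear ruler the invariant quantity is the difference between neighbours. On the logarithmic ruler it is their ratio, here `exp(span / (N - 1))`.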


In [1]:
# ==========================================================
# NOTEBOOK 06 — SCALE TRANSITION AND SPECTRAL STRUCTURE
#
# Operator:
#   M_ij = cos(Δπ(x_i) ln x_j) + cos(Δπ(x_j) ln x_i)
#
# Difference between observation modes:
#   - Logarithmic: multiplicative window
#   - Linear:      N contiguous integers starting at X0, observed additively
#
# Requirements: numpy, matplotlib, ipywidgets
# ==========================================================

import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import interact

# --- 1. PRIME GENERATION ---
def generate_primes_upto(n):
    if n < 2:
        return np.array([], dtype=np.int64)
    size = (n - 1) // 2
    sieve = np.ones(size, dtype=bool)
    for i in range(int(np.sqrt(n)) // 2):
        if sieve[i]:
            p = 2*i + 3
            sieve[(p*p-3)//2::p] = False
    return np.concatenate(([2], 2*np.where(sieve)[0] + 3))

# --- 2. COMPUTATION FUNCTIONS ---
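def get_delta_pi(x_points, primes):
    # Δπ(x) = π(x) − 2·π(⌊x/2⌋); π is evaluated by binary search in the prime table.
    x_int = np.floor(x_points).astype(int)
    pi_x = np.searchsorted(primes, x_int, side='right')
    pi_x_div_2 = np.searchsorted(primes, x_int // 2, side='right')
    return pi_x - 2 * pi_x_div_2

def generate_matrix(fx, x_values):
    logx = np.log(x_values.astype(np.float64))
    # C_ij = cos(Δπ(x_i) ln x_j); adding the transpose gives M_ij.
    C = np.cos(np.outer(fx.astype(np.float64), logx))
    M = C + C.T
    # Standardise the entries (zero mean, unit variance); symmetry is preserved.
    M = (M - M.mean()) / M.std()
    # Re-symmetrise to remove any floating-point asymmetry.
    return 0.5 * (M + M.T)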

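def normalize_spacings(lam, local=False):
    # Bulk only: drop the lowest and highest 10% of the sorted eigenvalues.
    lam_bulk = np.sort(lam)[int(0.1*len(lam)):int(0.9*len(lam))]
    s = np.diff(lam_bulk)
    s = s[s > 0]  # discard exactly degenerate levels
    if local:
        # Local unfolding: divide by a 21-point moving average of the spacings.
        w = 21
        pad = w // 2
        s_padded = np.pad(s, (pad, pad), mode='reflect')
        local_mean = np.convolve(s_padded, np.ones(w)/w, mode='valid')
        return s / local_mean
    # Global unfolding: divide by the mean spacing of the bulk.
    return s / s.mean()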

# --- 3. INTERACTIVE FUNCTION ---
def plot_observational(N=2048, log_X0=7, span=2.4):

    X0 = int(10**log_X0)
    max_x = int(X0 * np.exp(span/2)) + N
    primes = generate_primes_upto(max_x)

    # Linear scale
    x_lin = np.arange(X0, X0 + N)
    s_lin = normalize_spacings(
        np.linalg.eigh(
            generate_matrix(get_delta_pi(x_lin, primes), x_lin)
        )[0],
        local=True
    )

    # Logarithmic scale
    x_log = np.exp(np.linspace(np.log(X0) - span/2,
                               np.log(X0) + span/2, N))
    s_log = normalize_spacings(
        np.linalg.eigh(
            generate_matrix(get_delta_pi(x_log, primes), x_log)
        )[0],
        local=False
    )

    # Auxiliary curves (no theoretical labels)
    s_grid = np.linspace(0, 4, 400)
    curve_A = (np.pi * s_grid / 2) * np.exp(-np.pi * s_grid**2 / 4)
    curve_B = np.exp(-s_grid)

    # --- PLOTTING ---
    fig, axes = plt.subplots(1, 2, figsize=(16, 6), sharey=True)

    configs = [
        (s_lin, "a) Linear Scale"),
        (s_log, "b) Logarithmic Scale")
    ]

    for ax, (data, title) in zip(axes, configs):
        ax.hist(
            data[data <= 4],
            bins=50,
            density=True,
            alpha=0.85,
            histtype='stepfilled',
            edgecolor='black',
            linewidth=0.3,
            label='Observed distribution'
        )
        ax.plot(s_grid, curve_A, 'r--', lw=2.2, label='Curve A')
        ax.plot(s_grid, curve_B, 'k:', lw=2.2, label='Curve B')
        ax.set_title(title, fontsize=14)
        ax.set_xlim(0, 4)
        ax.set_xlabel('s')
        ax.legend()

    fig.suptitle(f"N = {N},  X₀ = 10^{log_X0}", fontsize=16)
    plt.tight_layout(rect=[0, 0, 1, 0.95])
    plt.show()

# --- 4. WIDGETS ---
interact(
    plot_observational,
    N=widgets.Dropdown(
        options=[512, 1024, 2048],
        value=2048,
        description='N'
    ),
    log_X0=widgets.IntSlider(
        min=3, max=8, step=1,
        value=7,
        description='X₀ = 10^',
        continuous_update=False
    ),
    span=widgets.FloatSlider(
        min=1.0, max=4.0, step=0.1,
        value=2.4,
        description='Span (log)',
        continuous_update=False
    )
);



---

## 4. Methodological transparency: the computational mechanism

To ensure full reproducibility of the results, the code operates under a strict protocol, divided into four fundamental stages:

1. **Generation and sampling**  
   An odd-only sieve of Eratosthenes identifies the primes, from which the arithmetic signal $\Delta_\pi(x) = \pi(x) - 2\,\pi(\lfloor x/2 \rfloor)$ is computed. The sampling strategy is determined by the scale regime: in **linear** mode, the sampling points are contiguous integers; in **logarithmic** mode, the points are distributed exponentially so as to align with the average density of primes.

2. **Construction of the operator**  
   The operator is constructed as a harmonic correlation matrix, in which each entry corresponds to the projection of the arithmetic signal onto a logarithmic frequency. The entries are standardised to zero mean and unit variance, and the matrix is explicitly symmetrised ($M = M^T$), guaranteeing a well-defined real spectrum.

3. **Spectral filtering (*bulk analysis*)**  
   Only the central 80% of the sorted spectrum is retained: the lowest and highest 10% of the eigenvalues are discarded. These edge regions are more susceptible to numerical truncation effects and would otherwise introduce statistical artefacts.

4. **Unfolding (normalisation)**  
   The spacings are normalised to unit mean: by the global mean spacing on the logarithmic scale, and by a 21-point moving average of the spacings on the linear scale. This procedure neither creates correlations nor induces universality; it merely removes scale effects that could obscure the statistical regime already encoded in the operator.

> **Unfolding does not create GOE behaviour — it merely prevents the mean density from hiding it.**
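
Stage 1 can be checked by hand on small inputs. The sketch below assumes the definition $\Delta_\pi(x) = \pi(x) - 2\,\pi(\lfloor x/2 \rfloor)$ used by `get_delta_pi` in the code cell, and replaces the optimised sieve with a plain one for readability. The names `primes_upto` and `delta_pi` are illustrative.

```python
import numpy as np

def primes_upto(n):
    """Plain sieve of Eratosthenes (illustrative, not the optimised version)."""
    is_prime = np.ones(n + 1, dtype=bool)
    is_prime[:2] = False
    for p in range(2, int(n**0.5) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = False
    return np.flatnonzero(is_prime)

def delta_pi(x, primes):
    """Delta_pi(x) = pi(x) - 2 * pi(x // 2), with pi the prime-counting function."""
    x = np.asarray(x, dtype=np.int64)
    pi_x = np.searchsorted(primes, x, side="right")
    pi_half = np.searchsorted(primes, x // 2, side="right")
    return pi_x - 2 * pi_half

primes = primes_upto(100)
values = delta_pi([10, 20, 100], primes)
print(values)  # values: -2, 0, -5
```

For instance, $\pi(100) = 25$ and $\pi(50) = 15$, so $\Delta_\pi(100) = 25 - 30 = -5$.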

---

## 5. Parameters of the observation protocol

The experimental protocol adopted in this notebook is deliberately simple. The construction of the operator is kept fixed, and only the **observation regime**  
is varied.

No statistical regularisation parameters are introduced, nor are advanced unfolding techniques or adaptive filtering methods employed. The aim is to render the  
phenomenon visible before any refined quantification is attempted.

### The `span` parameter

The parameter `span` controls the extent of the observation window on the logarithmic scale. It fixes the multiplicative ratio between the maximum and minimum values of $x$ included in the analysis, $x_{\max}/x_{\min} = e^{\mathrm{span}}$, thereby determining the effective width of the arithmetic region under examination.

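In the code, the logarithmic mode samples the window $[X_0\, e^{-\mathrm{span}/2},\; X_0\, e^{+\mathrm{span}/2}]$, whereas the linear mode samples $N$ contiguous integers starting at $X_0$. Both modes are therefore anchored at the same point of the number line, but they observe it under different metrics and, in general, over windows of different width.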

The value of `span` is intentionally chosen to be relatively broad in this notebook, so as to allow any structural correlations, if present, to manifest clearly.
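
To make the effect of `span` concrete, the following sketch evaluates the endpoints of the logarithmic window for the widget's default values ($X_0 = 10^7$, `span = 2.4`). It assumes, as in the code cell, that the window is centred on $\ln X_0$ with half-width `span / 2`.

```python
import math

X0 = 10**7   # default centre in the widget
span = 2.4   # default span in the widget

# Endpoints of the logarithmic window: ln x ranges over ln X0 ± span/2.
x_min = X0 * math.exp(-span / 2)
x_max = X0 * math.exp(span / 2)

print(f"x_min ≈ {x_min:.3e}, x_max ≈ {x_max:.3e}")
print(f"x_max / x_min = {x_max / x_min:.4f}  (e**span = {math.exp(span):.4f})")
```

Note that $X_0$ is the geometric centre of this window rather than its arithmetic midpoint.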

### Absence of jitter and additional parameters

No perturbative (*jitter*) parameters are used, and no statistical resampling procedures are applied. The only smoothing in the pipeline is the fixed 21-point moving average used for local unfolding on the linear scale.

The sampling grid is deterministic, and the normalisation applied to the spacings is minimal. This choice avoids the premature introduction of technical hypotheses  
and preserves a direct reading of the effect of scale on the spectrum.

More refined scalar metrics, such as $\langle r \rangle$ or $\Sigma^2(L)$, are deliberately deferred to later stages of the work.

---

## 6. Methodological observation — dependence on the observation ruler

This notebook implements an **elementary observational protocol**, in which the only relevant degree of freedom is the scale of observation.

The operator is kept fixed, and no hypothesis of universality is assumed or tested.

The spectral operator employed is constructed from the projection

$$
M_{ij} =
\cos\,\bigl(\Delta_\pi(x_i)\,\log x_j\bigr)
+
\cos\,\bigl(\Delta_\pi(x_j)\,\log x_i\bigr),
$$

that is, its internal structure is explicitly **logarithmic** with respect to the observation variable $x$.
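
The formula above can be instantiated on a toy sample to confirm that the resulting matrix is symmetric and has a real spectrum. This is a minimal sketch: the prime table is written out by hand, the sample `x = 10, ..., 47` is an arbitrary illustrative choice, and the standardisation step applied in the notebook's code is omitted.

```python
import numpy as np

# Primes below 50, enough to evaluate pi(x) for every x in the toy sample.
primes = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47])

def delta_pi(x):
    """Delta_pi(x) = pi(x) - 2 * pi(x // 2) via binary search in the table."""
    return (np.searchsorted(primes, x, side="right")
            - 2 * np.searchsorted(primes, x // 2, side="right"))

x = np.arange(10, 48)                # toy sample, all within the prime table
f = delta_pi(x).astype(float)
C = np.cos(np.outer(f, np.log(x)))   # C_ij = cos(Delta_pi(x_i) * ln x_j)
M = C + C.T                          # M_ij as in the formula above

eigenvalues = np.linalg.eigvalsh(M)  # real eigenvalues, ascending order
print(np.allclose(M, M.T), eigenvalues[:3])
```

The symmetry follows directly from adding the transpose, so the spectrum is real regardless of how the sampling points are chosen.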

This choice is not neutral.

As a direct consequence, the operator enters into structural resonance only when the sampling points are distributed in a manner coherent with the logarithmic  
scale. When the observation is performed on a linear scale, phase coherence is broken, and the spectrum loses long-range correlation.

In summary:

* the operator does **not** enforce correlated statistics;
* correlated statistics **emerge only** when the observation ruler is compatible with the multiplicative structure implicit in the operator.

The observed distinction between regimes should not be interpreted as an ontological change of the system, but as a direct consequence of the **adequacy (or  
inadequacy) of the observation ruler to the structure of the operator**.

> Statistical universality is not an absolute property of the operator, but of the **relationship between operator and observation**.
