# Clustering Measurements

{{ Triumvirate }} provides the algorithms for computing clustering statistics
in both Fourier and configuration space and in both local and global
plane-parallel approximations (see [Background](../background.rst) for
details).

The measurement functions share a near-identical interface, so we focus on
the bispectrum measurement below as an example and then briefly note the
differences for other measurements.

In [1]:
from triumvirate.threept import compute_bispec

## Ingredients

There are a number of inputs for measurements:

  - catalogue objects (as {py:class}`~triumvirate.catalogue.ParticleCatalogue`;
    see also [Particle Catalogue](./Catalogue.ipynb));
    
  - measurement parameters (as {py:class}`~triumvirate.parameters.ParameterSet`
    or passed/overridden by keyword arguments; see also
    [Parameter Set](./Parameters.ipynb));
    
  - optional logger (as {py:class}`logging.Logger`; see also
    [Customised Logger](./Logger.ipynb)).

We will reuse `trv_logger`, `parameter_set` and `binning` created in the
[Customised Logger](./Logger.ipynb), [Parameter Set](./Parameters.ipynb)
and [Binning Scheme](./Binning.ipynb) tutorials as inputs.

In [2]:
from triumvirate.logger import setup_logger
from triumvirate.parameters import ParameterSet
from triumvirate.dataobjs import Binning

# Demo logger
trv_logger = setup_logger()

# Demo parameter set
try:
    parameter_set = ParameterSet(param_filepath="parameter_template.yml")
except OSError:
    from triumvirate.parameters import fetch_paramset_template

    parameter_dict = fetch_paramset_template('dict')

    for ax_name in ['x', 'y', 'z']:
        parameter_dict['boxsize'][ax_name] = 1000.
        parameter_dict['ngrid'][ax_name] = 64

    parameter_dict.update({
        'catalogue_type': 'sim',
        'statistic_type': 'bispec',
        'degrees'       : {'ell1': 0, 'ell2': 0, 'ELL': 0},
        'range'         : [0.005, 0.105],
        'num_bins'      : 10,
    })

    parameter_set = ParameterSet(param_dict=parameter_dict)

# Demo binning
binning = Binning('fourier', 'lin', bin_min=0.005, bin_max=0.105, num_bins=10)

[2025-04-13 23:44:37 (+00:00:00) STAT C++] Parameters validated.


In addition, we have used ``nbodykit`` to produce three types of
mock catalogues:

- The first is a simulation-like log-normal catalogue `catalogue_sim`
  in a cubic box of size $L = 1000\,h^{-1}\,\mathrm{Mpc}$ with number density
  $\bar{n} = 5 \times 10^{-4} \,h^3\,\mathrm{Mpc}^{-3}$. The input cosmological
  parameters are $h = 0.6736, \Omega_{\mathrm{CDM},0} = 0.2645, 
  \Omega_{\mathrm{b},0} = 0.04930, A_s = 2.083 \times 10^{-9}$ and
  $n_s = 0.9649$, and the linear power spectrum at redshift $z = 1$ with
  linear tracer bias $b_1 = 2$ is used.

- The second is a survey-like catalogue `catalogue_survey` based on
  the simulation-like one, with the catalogue cut to the inscribing sphere of
  radius $L/2$ inside the cubic box.

- The third is a uniform random catalogue `catalogue_rand` with
  number density $5 \bar{n}$ in the same spherical volume as
  the survey-like one.
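As a rough consistency check on the catalogue descriptions above, the expected particle counts follow from the number density and the volumes involved. The sketch below only does this arithmetic; the variable names are illustrative, and the actual counts will scatter around these values because the catalogues are random realisations.

```python
import math

density = 5.e-4   # mean number density in h^3 Mpc^-3
boxsize = 1000.   # box side length in h^-1 Mpc

# Expected count in the full cubic box (simulation-like catalogue).
n_box = density * boxsize**3

# Expected count inside the inscribed sphere of radius L/2 (survey-like catalogue).
sphere_volume = 4. / 3. * math.pi * (boxsize / 2.)**3
n_sphere = density * sphere_volume

# The random catalogue has five times the density in the same sphere.
n_rand = 5 * n_sphere

print(f"box: {n_box:.0f}, sphere: {n_sphere:.0f}, random: {n_rand:.0f}")
```

The particle counts reported in the measurement logs of this tutorial (about 4.99e5, 2.59e5 and 1.31e6 respectively) are close to these expectations.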

Following the [Particle Catalogue](./Catalogue.ipynb) tutorial, these
catalogues are instantiated as
{py:class}`~triumvirate.catalogue.ParticleCatalogue`.

In [3]:
import numpy as np

# Catalogue selectors
def cut_to_sphere(coords, boxsize):
    return np.less_equal(np.sqrt(np.sum(coords**2, axis=-1)), boxsize/2.)

# Catalogue properties
density = 5.e-4
boxsize = 1000.

In [4]:
# Create simulation-like catalogue, or load if existing.
catalogue_sim_filepath = "mock_catalogue_sim.dat"

try:
    catalogue_sim = np.loadtxt(
        catalogue_sim_filepath,
        dtype=[(axis, np.float64) for axis in ['x', 'y', 'z']]
    )
except FileNotFoundError:
    from nbodykit.cosmology import Cosmology, LinearPower
    from nbodykit.lab import LogNormalCatalog

    # Cosmology, matter power spectrum and bias at given redshift
    cosmo = Cosmology(
        h=0.6736, Omega0_b=0.04930, Omega0_cdm=0.2645,
        A_s=2.083e-09, n_s=0.9649
    )
    redshift = 1.
    bias = 2.

    powspec = LinearPower(cosmo, redshift)

    catalogue_sim = LogNormalCatalog(
        powspec, density, boxsize, bias=bias, Nmesh=256, seed=42
    )
    catalogue_sim['Position'] -= boxsize/2.

    np.savetxt(catalogue_sim_filepath, catalogue_sim['Position'].compute())

In [5]:
# Create survey-like catalogue.
try:
    catalogue_survey = catalogue_sim[
        cut_to_sphere(catalogue_sim['Position'], boxsize).compute()
    ]
except (IndexError, ValueError):
    catalogue_survey = catalogue_sim[
        cut_to_sphere(
            catalogue_sim[['x', 'y', 'z']]
            .view(np.float64).reshape(len(catalogue_sim), 3),
            boxsize
        )
    ]
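The fallback branch above turns a structured array with fields `x`, `y`, `z` into a plain `(N, 3)` array through a `view` and `reshape`. As an alternative sketch (not what the tutorial itself uses), NumPy's `structured_to_unstructured` helper performs the same conversion; the toy array below is a stand-in for the loaded catalogue.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Toy structured array standing in for a catalogue loaded with np.loadtxt.
toy = np.zeros(4, dtype=[(axis, np.float64) for axis in ['x', 'y', 'z']])
toy['x'] = [0., 1., 2., 3.]
toy['y'] = [10., 11., 12., 13.]
toy['z'] = [20., 21., 22., 23.]

# Approach used in the cell above: reinterpret the packed fields as floats.
coords_view = toy[['x', 'y', 'z']].view(np.float64).reshape(len(toy), 3)

# Alternative: let NumPy build the plain array from the named fields.
coords_rfn = rfn.structured_to_unstructured(toy[['x', 'y', 'z']])

print(coords_view.shape, np.array_equal(coords_view, coords_rfn))
```

The `view` approach relies on all fields sharing one dtype and being contiguous in memory, whereas the helper function also handles fields that are not packed in that way.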

In [6]:
# Create random catalogue, or load if existing.
catalogue_rand_filepath = "mock_catalogue_rand.dat"

try:
    catalogue_rand = np.loadtxt(
        catalogue_rand_filepath,
        dtype=[(axis, np.float64) for axis in ['x', 'y', 'z']]
    )
except FileNotFoundError:
    from nbodykit.lab import UniformCatalog
    catalogue_rand = UniformCatalog(5*density, boxsize, seed=42)
    catalogue_rand['Position'] -= boxsize/2.
    catalogue_rand = catalogue_rand[
        cut_to_sphere(catalogue_rand['Position'], boxsize).compute()
    ]
    np.savetxt(catalogue_rand_filepath, catalogue_rand['Position'].compute())

In [7]:
import warnings
from triumvirate.catalogue import ParticleCatalogue

warnings.filterwarnings('ignore', message=".*'nz' field.*")

catalogue_sim = ParticleCatalogue(
    *[catalogue_sim[coord_axis] for coord_axis in ['x', 'y', 'z']]
)
catalogue_survey = ParticleCatalogue(
    *[catalogue_survey[coord_axis] for coord_axis in ['x', 'y', 'z']],
    nz=density
)
catalogue_rand = ParticleCatalogue(
    *[catalogue_rand[coord_axis] for coord_axis in ['x', 'y', 'z']],
    nz=density
)

## Measurements

Having specified all the inputs, measurements can be made by simply passing
them as arguments to the relevant function:

In [8]:
results = compute_bispec(
    catalogue_survey, catalogue_rand,
    paramset=parameter_set,
    logger=trv_logger
)

[2025-04-13 23:44:38 (+00:00:01) STAT] Parameter set have been initialised.
[2025-04-13 23:44:38 (+00:00:01) STAT C++] Parameters validated.
[2025-04-13 23:44:38 (+00:00:01) STAT] Binning has been initialised.
[2025-04-13 23:44:38 (+00:00:01) STAT] Lines of sight have been initialised.
[2025-04-13 23:44:38 (+00:00:01) STAT] Catalogues have been aligned.
[2025-04-13 23:44:38 (+00:00:01) STAT] Preparing catalogue for clustering algorithm... (entering C++)
[2025-04-13 23:44:39 (+00:00:01) INFO C++] Catalogue loaded: ntotal = 259444, wtotal = 259444.000, wstotal = 259444.000 (source=extdata).
[2025-04-13 23:44:39 (+00:00:01) INFO C++] Extents of particle coordinates: {'x': (2.438, 998.884 | 996.446), 'y': (0.879, 998.351 | 997.472), 'z': (0.364, 998.843 | 998.478)} (source=extdata).
[2025-04-13 23:44:39 (+00:00:01) INFO C++] Catalogue loaded: ntotal = 1308287, wtotal = 1308287.000, wstotal = 

### Specifying lines of sight

In the case above, the lines of sight are computed automatically, but one could
supply external data arrays as replacements:

In [9]:
# import numpy as np
results = compute_bispec(
    catalogue_survey, catalogue_rand,
    los_data=np.ones((len(catalogue_survey), 3)),
    los_rand=np.ones((len(catalogue_rand), 3)),
    paramset=parameter_set,
    logger=trv_logger
)

[2025-04-13 23:44:42 (+00:00:04) STAT] Parameter set have been initialised.
[2025-04-13 23:44:42 (+00:00:04) STAT C++] Parameters validated.
[2025-04-13 23:44:42 (+00:00:04) STAT] Binning has been initialised.
[2025-04-13 23:44:42 (+00:00:04) STAT] Lines of sight have been initialised.
[2025-04-13 23:44:42 (+00:00:04) STAT] Catalogues have been aligned.
[2025-04-13 23:44:42 (+00:00:04) STAT] Preparing catalogue for clustering algorithm... (entering C++)
[2025-04-13 23:44:42 (+00:00:04) INFO C++] Catalogue loaded: ntotal = 259444, wtotal = 259444.000, wstotal = 259444.000 (source=extdata).
[2025-04-13 23:44:42 (+00:00:04) INFO C++] Extents of particle coordinates: {'x': (2.438, 998.884 | 996.446), 'y': (0.879, 998.351 | 997.472), 'z': (0.364, 998.843 | 998.478)} (source=extdata).
[2025-04-13 23:44:42 (+00:00:05) INFO C++] Catalogue loaded: ntotal = 1308287, wtotal = 1308287.000, wstotal = 
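The `np.ones` arrays used above are placeholders with the right shape, one three-vector per particle. As a hedged sketch of how more meaningful line-of-sight arrays might be built with NumPy alone, the snippet below constructs radial unit vectors (observer assumed at the coordinate origin) and a fixed z-axis direction for a small stand-in position array. Whether the measurement functions expect or enforce unit normalisation is not stated in this tutorial, so treat the normalisation step as an assumption and consult the API reference.

```python
import numpy as np

# Stand-in particle positions with shape (N, 3); not taken from the catalogues above.
rng = np.random.default_rng(0)
positions = rng.uniform(-500., 500., size=(5, 3))

# Radial lines of sight: unit vectors pointing from the origin to each particle.
norms = np.linalg.norm(positions, axis=-1, keepdims=True)
los_radial = positions / norms

# Fixed line of sight along the z-axis, repeated for every particle.
los_fixed = np.tile([0., 0., 1.], (len(positions), 1))

print(los_radial.shape, los_fixed.shape)
```

Arrays of this shape could then be passed as `los_data` and `los_rand` in the same way as the placeholder arrays in the cell above.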

### Substituting for parameter set

One can also override or bypass `paramset` by passing the relevant keyword
arguments directly. In the example below, we set the bispectrum multipole
degrees and form, the binning and the mesh assignment parameters without a
`paramset` argument; had a `paramset` argument also been passed, its entries
would be overridden by these keyword arguments.

In [10]:
# DEMO
# import warnings
warnings.filterwarnings('ignore', message=".*default values are unchanged.*")

results = compute_bispec(
    catalogue_survey, catalogue_rand,
    degrees=(1, 1, 0),
    binning=binning,
    form='row',
    idx_bin=5,
    sampling_params={
        'assignment': 'cic',
        'boxsize': [1000.,]*3,
        'ngrid': [64,]*3
    },
    logger=trv_logger
)

[2025-04-13 23:44:44 (+00:00:06) STAT] Validating parameters... (entering C++)
[2025-04-13 23:44:44 (+00:00:06) STAT] ... validated parameters. (exited C++)
[2025-04-13 23:44:44 (+00:00:06) STAT C++] Parameters validated.
[2025-04-13 23:44:44 (+00:00:06) STAT] Parameter set have been initialised.
[2025-04-13 23:44:44 (+00:00:06) STAT] Binning has been initialised.
[2025-04-13 23:44:44 (+00:00:06) STAT] Lines of sight have been initialised.
[2025-04-13 23:44:44 (+00:00:06) STAT] Catalogues have been aligned.
[2025-04-13 23:44:44 (+00:00:06) STAT] Preparing catalogue for clustering algorithm... (entering C++)
[2025-04-13 23:44:44 (+00:00:06) INFO C++] Catalogue loaded: ntotal = 259444, wtotal = 259444.000, wstotal = 259444.000 (source=extdata).
[2025-04-13 23:44:44 (+00:00:06) STAT] ... prepared catalogue for clustering algorithm. (exited C++)
[2025-04-13 23:44:44 (+00:00:06) [0

### Minor differences

For other measurement algorithms, the syntax is very similar except for a few
minor differences:

- For two-point statistics, the argument corresponding to `degrees`
  above is `degree` as there is only a single multipole degree. The arguments
  `form` and `idx_bin` do not apply.

- For global plane-parallel measurements, no random catalogue is required.

- For window function measurements, only the random catalogue is required.

For full details, please consult the API reference
({py:mod}`~triumvirate.twopt` and {py:mod}`~triumvirate.threept` modules).

## Results

The returned measurement results are dictionaries containing the
raw statistic (key with suffix ``_raw``) without shot noise subtraction,
the shot noise (key with suffix ``_shot``), the bin centres for each
coordinate dimension (keys with suffix ``_bin``), the average/effective bin
coordinates (keys with suffix ``_eff``), and the number of contributing modes
(or analogously pairs) in each bin (key ``'nmodes'``/``'npairs'``).

In [11]:
# DEMO
from pprint import pprint
pprint(results)

{'bk_raw': array([-6.15401315e+08-5.96128196e-08j,  2.40744325e+08-3.20009470e-08j,
       -2.84005114e+08+1.87229060e-09j, -1.06997676e+08+2.23534321e-09j,
        1.94789754e+07-1.04143810e-08j,  8.29519632e+07+7.32134006e-09j,
       -4.53112276e+06+9.17934162e-09j, -1.06546658e+08-7.62472807e-10j,
       -6.84846274e+07-3.75439266e-09j, -1.14891079e+08-2.36752695e-09j]),
 'bk_shot': array([ -5212454.29561975+5.75197972e-10j,
        -8944908.64663568+1.06710274e-09j,
       -15387868.79029874+1.87692160e-09j,
       -17045970.26450608+2.07324903e-09j,
       -21370062.42415106+2.49607964e-09j,
       -13373351.34139535+1.50754375e-09j,
       -19646302.14048065+2.31487097e-09j,
       -16373586.94542564+1.98468654e-09j,
       -14739949.18356972+1.77008608e-09j,
       -12282473.11528281+1.46574486e-09j]),
 'k1_bin': array([0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06]),
 'k1_eff': array([0.06040971, 0.06040971, 0.06040971, 0.06040971, 0.06040971,
       0.06040971, 0
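Since the raw statistic is returned without shot noise subtraction, a subtracted estimate can be formed from the two arrays. The sketch below uses a stand-in dictionary, `results_demo`, holding only the first three bins printed above; the subtraction `raw - shot` is the natural reading of the key descriptions rather than a documented recipe.

```python
import numpy as np

# Stand-in for the returned dictionary, truncated to the first three bins shown above.
results_demo = {
    'bk_raw': np.array([
        -6.15401315e+08-5.96128196e-08j,
        2.40744325e+08-3.20009470e-08j,
        -2.84005114e+08+1.87229060e-09j,
    ]),
    'bk_shot': np.array([
        -5212454.29561975+5.75197972e-10j,
        -8944908.64663568+1.06710274e-09j,
        -15387868.79029874+1.87692160e-09j,
    ]),
}

# Shot-noise-subtracted bispectrum multipole.
bk = results_demo['bk_raw'] - results_demo['bk_shot']

# The imaginary parts are many orders of magnitude below the real parts here.
bk_real = bk.real
print(bk_real)
```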

### Saving to files

For each measurement function, setting ``save='.txt'`` or ``save='.npz'``
automatically saves the results dictionary to a file in the corresponding
``.txt`` or ``.npz`` format.

If the `paramset` argument is set to a
{py:class}`~triumvirate.parameters.ParameterSet` object, the output directory
will be ``paramset['directories']['measurements']`` (an empty output directory
path points to the current working directory), and the string
``paramset['tags']['output']`` will be appended to the file name before
the extension suffix.

This is demonstrated below for a global plane-parallel power spectrum
measurement:

In [12]:
from triumvirate.twopt import compute_powspec_in_gpp_box

# DEMO
parameter_set.update(tags={'output': '_demo'})

results = compute_powspec_in_gpp_box(
    catalogue_sim,
    degree=0, paramset=parameter_set,
    save='.txt', logger=trv_logger
)

[2025-04-13 23:44:45 (+00:00:08) STAT] Parameter set have been initialised.
[2025-04-13 23:44:45 (+00:00:08) STAT C++] Parameters validated.
[2025-04-13 23:44:45 (+00:00:08) STAT C++] Parameters validated.
[2025-04-13 23:44:45 (+00:00:08) STAT] Binning has been initialised.
[2025-04-13 23:44:45 (+00:00:08) STAT] Catalogue box has been periodised.
[2025-04-13 23:44:45 (+00:00:08) INFO] Inserted missing 'nz' field based on particle count and box size.
[2025-04-13 23:44:45 (+00:00:08) STAT] Preparing catalogue for clustering algorithm... (entering C++)
[2025-04-13 23:44:46 (+00:00:08) STAT] ... prepared catalogue for clustering algorithm. (exited C++)
[2025-04-13 23:44:46 (+00:00:08) INFO C++] Catalogue loaded: ntotal = 499214, wtotal = 499214.000, wstotal = 499214.000 (source=extdata).
[2025-04-13 23:44:46 (+00:00:08) INFO C++] Extents of particle coordinates: {'x': (0.002, 999.9

Let's have a look at the output measurement file:

In [13]:
# DEMO
with open("pk0_demo.txt", 'r') as results_file:
    print(results_file.read())

# Catalogue source: extdata:5120690896
# Catalogue size: ntotal = 499214, wtotal = 499214.000, wstotal = 499214.000
# Catalogue particle extents: ([0.002, 999.998], [0.001, 999.999], [0.000, 1000.000])
# Box size: [1000.000, 1000.000, 1000.000]
# Box alignment: centre
# Mesh number: [64, 64, 64]
# Mesh assignment and interlacing: tsc, False
# Normalisation factor: 4.012605716e-03 (particle)
# Normalisation factor alternatives: 4.012605716e-03 (particle), 2.859492535e-03 (mesh), 0.000000000e+00 (mesh-mixed)
# [0] k_cen, [1] k_eff, [2] nmodes, [3] Re{pk0_raw}, [4] Im{pk0_raw}, [5] Re{pk0_shot}, [6] Im{pk0_shot}
1.000000000e-02	1.149964290e-02	        56	 3.160464462e+04	 0.000000000e+00	 2.003148950e+03	 0.000000000e+00
2.000000000e-02	2.047114034e-02	       194	 3.881291755e+04	 0.000000000e+00	 2.003148950e+03	 0.000000000e+00
3.000000000e-02	3.052421724e-02	       488	 2.746437066e+04	 0.000000000e+00	 2.003148950e+03	 0.000000000e+00
4.000000000e-02	4.062536317e-02	       812	 2.2986

We see that a header with summary information about the input parameters and
data as well as some intermediary results has also been included in the
saved file.
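Because the header lines start with `#`, the saved text file can be read back with `np.loadtxt`, which skips comment lines by default. The sketch below reads an in-memory copy of the column header and the first three complete rows shown above instead of the file itself; in the notebook one would pass the file path instead. The column order follows the last header line, and the comparison of the shot noise with the box volume divided by the particle count is a sanity check that assumes unit particle weights.

```python
import io
import numpy as np

# In-memory stand-in for the saved file: column header plus the first three rows.
sample_file = io.StringIO(
    "# [0] k_cen, [1] k_eff, [2] nmodes, [3] Re{pk0_raw}, [4] Im{pk0_raw}, "
    "[5] Re{pk0_shot}, [6] Im{pk0_shot}\n"
    "1.000000000e-02\t1.149964290e-02\t56\t3.160464462e+04\t0.000000000e+00\t"
    "2.003148950e+03\t0.000000000e+00\n"
    "2.000000000e-02\t2.047114034e-02\t194\t3.881291755e+04\t0.000000000e+00\t"
    "2.003148950e+03\t0.000000000e+00\n"
    "3.000000000e-02\t3.052421724e-02\t488\t2.746437066e+04\t0.000000000e+00\t"
    "2.003148950e+03\t0.000000000e+00\n"
)

# One array per column, in the order given by the header.
k_cen, k_eff, nmodes, raw_re, raw_im, shot_re, shot_im = np.loadtxt(
    sample_file, unpack=True
)
nmodes = nmodes.astype(int)

# Shot-noise-subtracted power spectrum monopole.
pk0 = (raw_re - shot_re) + 1j * (raw_im - shot_im)

# Sanity check: box volume over particle count, using values from the file header.
expected_shot = 1000.**3 / 499214
print(pk0.real, np.isclose(shot_re[0], expected_shot, rtol=1e-6))
```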

Analogously, with the ``save='.npz'`` output format, we would have:

In [14]:
results = compute_powspec_in_gpp_box(
    catalogue_sim,
    degree=0, paramset=parameter_set,
    save='.npz', logger=trv_logger
)

[2025-04-13 23:44:46 (+00:00:08) STAT] Parameter set have been initialised.
[2025-04-13 23:44:46 (+00:00:08) STAT C++] Parameters validated.
[2025-04-13 23:44:46 (+00:00:08) STAT] Binning has been initialised.
[2025-04-13 23:44:46 (+00:00:08) STAT] Catalogue box has been periodised.
[2025-04-13 23:44:46 (+00:00:08) STAT] Preparing catalogue for clustering algorithm... (entering C++)
[2025-04-13 23:44:46 (+00:00:08) STAT] ... prepared catalogue for clustering algorithm. (exited C++)
[2025-04-13 23:44:46 (+00:00:08) INFO C++] Catalogue loaded: ntotal = 499214, wtotal = 499214.000, wstotal = 499214.000 (source=extdata).
[2025-04-13 23:44:46 (+00:00:08) INFO C++] Extents of particle coordinates: {'x': (0.002, 999.998 | 999.996), 'y': (0.001, 999.999 | 999.998), 'z': (0.000, 1000.000 | 999.999)} (source=extdata).
[2025-04-13 23:44:46 (+00:00:08) INFO] Normalisation factors: 4.012606e-03 (parti

In [15]:
# DEMO
with np.load("pk0_demo.npz", allow_pickle=True) as results_file:
    print(results_file['header'])

Catalogue source: extdata:5120690896
Catalogue size: ntotal = 499214, wtotal = 499214.000, wstotal = 499214.000
Catalogue particle extents: ([0.002, 999.998], [0.001, 999.999], [0.000, 1000.000])
Box size: [1000.000, 1000.000, 1000.000]
Box alignment: centre
Mesh number: [64, 64, 64]
Mesh assignment and interlacing: tsc, False
Normalisation factor: 4.012605716e-03 (particle)
Normalisation factor alternatives: 4.012605716e-03 (particle), 2.859492535e-03 (mesh), 0.000000000e+00 (mesh-mixed)
[0] k_cen, [1] k_eff, [2] nmodes, [3] Re{pk0_raw}, [4] Im{pk0_raw}, [5] Re{pk0_shot}, [6] Im{pk0_shot}
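Beyond the `'header'` entry read above, an ``.npz`` archive can be inspected through its `files` attribute to find out which arrays it holds. The sketch below writes a small stand-in archive to a temporary directory and lists its contents; the keys `k_cen` and `pk0_raw` and the header text are illustrative assumptions, not a statement of what the saved measurement file contains.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    # Stand-in archive with a text header and two hypothetical arrays.
    archive_path = os.path.join(tmpdir, "stand_in.npz")
    np.savez(
        archive_path,
        header="Box size: [1000.000, 1000.000, 1000.000]",
        k_cen=np.array([0.01, 0.02]),
        pk0_raw=np.array([1. + 0.j, 2. + 0.j]),
    )

    with np.load(archive_path, allow_pickle=True) as archive:
        stored_keys = sorted(archive.files)   # names of the stored arrays
        header_text = str(archive['header'])  # header stored as a 0-d string array

print(stored_keys)
print(header_text)
```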


In [16]:
# Hide cell.
!rm -r pk0_demo.*