# Generate mock data for a cluster

In this example we generate mock data with a variety of systematic effects including photometric redshifts, source galaxy distributions, and shape noise.  We then populate a galaxy cluster object. This notebook is organised as follows:
- Imports and configuration setup
- Generate mock data with different source galaxy options
- Generate mock data with different field-of-view options
- Generate mock data with different galaxy cluster options (only available with the Numcosmo and/or CCL backends). Use the `os.environ['CLMM_MODELING_BACKEND']` line below to select your backend.

In [None]:
import os

## Uncomment the following line if you want to use a specific modeling backend among 'ct' (cluster-toolkit), 'ccl' (CCL) or 'nc' (Numcosmo). Default is 'ccl'
# os.environ['CLMM_MODELING_BACKEND'] = 'nc'

In [None]:
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
import clmm

Make sure we know which version we're using

In [None]:
clmm.__version__

In [None]:
# to limit decimal places in catalog columns display
def _nice_display(cat):
    for col in ("ra", "dec"):
        cat[col].info.format = ".2f"
    for col in ("e1", "e2"):
        cat[col].info.format = ".4f"
    cat["z"].info.format = ".3f"
    return cat

## Import mock data module and setup the configuration 

In [None]:
from clmm.support import mock_data as mock
from clmm import Cosmology

Mock data generation requires a defined cosmology

In [None]:
mock_cosmo = Cosmology(H0=70.0, Omega_dm0=0.27 - 0.045, Omega_b0=0.045, Omega_k0=0.0)

Mock data generation requires some cluster information. The default is to work with the NFW profile, using the "200,mean" mass definition. The Numcosmo and CCL backends allow for more flexibility (see the last section of this notebook).

In [None]:
cosmo = mock_cosmo
cluster_id = "Awesome_cluster"
cluster_m = 1.0e15  # M200,m
cluster_z = 0.3
src_z = 0.8
concentration = 4
ngals = 1000  # number of source galaxies

# Cluster centre coordinates
cluster_ra = 50.0
cluster_dec = 87.0

It is also possible to choose the coordinate system for the generated ellipticities. Possible options are either "celestial" or "euclidean". The default choice is "euclidean". See https://doi.org/10.48550/arXiv.1407.7676 section 5.1 for more details.

In [None]:
coordinate_system = "euclidean"

## Generate the mock catalog with different source galaxy options

- Clean data: no noise, all galaxies at the same redshift

In [None]:
zsrc_min = cluster_z + 0.1

In [None]:
ideal_data = mock.generate_galaxy_catalog(
    cluster_m,
    cluster_z,
    concentration,
    cosmo,
    src_z,
    ngals=ngals,
    cluster_ra=cluster_ra,
    cluster_dec=cluster_dec,
    coordinate_system=coordinate_system,
)

In [None]:
# let's put all these quantities in a single dictionary to facilitate clarity
cluster_kwargs = {
    "cluster_m": cluster_m,
    "cluster_z": cluster_z,
    "cluster_ra": cluster_ra,
    "cluster_dec": cluster_dec,
    "cluster_c": concentration,
    "cosmo": cosmo,
    "coordinate_system": coordinate_system,
}

In [None]:
ideal_data = mock.generate_galaxy_catalog(**cluster_kwargs, zsrc=src_z, ngals=ngals)

- Noisy data: shape noise, all galaxies at the same redshift

In [None]:
noisy_data_src_z = mock.generate_galaxy_catalog(
    **cluster_kwargs, zsrc=src_z, shapenoise=0.05, ngals=ngals
)
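As a rough illustration of what a shape-noise term does, the sketch below combines a Gaussian intrinsic ellipticity with a reduced shear using the standard weak-lensing relation $e_{\rm obs} = (e_{\rm int} + g)/(1 + g^* e_{\rm int})$. It is an assumption that `shapenoise` plays the role of the per-component Gaussian width; the helper `lensed_ellipticity` and the shear value are hypothetical and do not reproduce CLMM's internals.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed


def lensed_ellipticity(e_int, g):
    """Combine a complex intrinsic ellipticity with a complex reduced shear (|g| < 1)."""
    return (e_int + g) / (1.0 + np.conj(g) * e_int)


n_gal = 5
shapenoise = 0.05  # assumed per-component Gaussian width
g = np.full(n_gal, 0.03 + 0.01j)  # hypothetical reduced shear, identical for all galaxies

# Intrinsic ellipticities: independent Gaussian draws for each component
e_int = rng.normal(0.0, shapenoise, n_gal) + 1j * rng.normal(0.0, shapenoise, n_gal)

e_obs = lensed_ellipticity(e_int, g)  # noisy case
e_clean = lensed_ellipticity(np.zeros(n_gal, dtype=complex), g)  # no shape noise

print("noisy e1 :", e_obs.real)
print("clean e1 :", e_clean.real)
```

With zero intrinsic ellipticity the observed ellipticity reduces to the reduced shear itself, which is the "clean data" case above.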

* Noisy data: shape noise plus measurement error, all galaxies at the same redshift

In [None]:
noisy_data_src_z_e_err = mock.generate_galaxy_catalog(
    **cluster_kwargs, zsrc=src_z, shapenoise=0.05, mean_e_err=0.05, ngals=ngals
)

<div class="alert alert-warning">

**WARNING:** Experimental feature. Uncertainties are created by simply drawing random numbers near the value specified by `mean_e_err`. Use at your own risk. This will be improved in future releases.
    
</div>

- Noisy data: photo-z errors (and pdfs!), all galaxies at the same redshift. At present, the observed redshift `z` is drawn from a Normal distribution with mean `ztrue` and width `dz = (1+z)*photoz_sigma_unscaled`; the stored pdf is a Normal distribution of the same width centered on `z` instead of `ztrue`.

In [None]:
np.random.seed(41363)

noisy_data_photoz = mock.generate_galaxy_catalog(
    **cluster_kwargs,
    zsrc=src_z,
    shapenoise=0.05,
    photoz_sigma_unscaled=0.05,
    pz_bins=101,
    ngals=ngals
)
_nice_display(noisy_data_photoz[:5])
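The photo-z recipe described above can be sketched with NumPy alone. This is only an illustration: the redshift grid `zbins` is hypothetical, and scaling the width with `ztrue` is an assumption made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(41363)  # illustrative seed

photoz_sigma_unscaled = 0.05
ztrue = np.full(4, 0.8)  # all sources at the same true redshift

# Assumption: the Gaussian width scales with (1 + ztrue)
sigma_z = (1.0 + ztrue) * photoz_sigma_unscaled

# "Observed" redshift: one Gaussian draw around the true redshift
z_obs = ztrue + sigma_z * rng.standard_normal(ztrue.size)

# Per-galaxy pdf: a Gaussian of the same width, centered on z_obs (not ztrue)
zbins = np.linspace(0.0, 5.0, 101)  # hypothetical shared grid
pzpdf = np.exp(-0.5 * ((zbins[None, :] - z_obs[:, None]) / sigma_z[:, None]) ** 2)
pzpdf /= sigma_z[:, None] * np.sqrt(2.0 * np.pi)

print("z_obs      :", np.round(z_obs, 3))
print("pzpdf shape:", pzpdf.shape)
```

Each row of `pzpdf` peaks at the grid point closest to the corresponding `z_obs`, so the pdf carries the scatter of the observed redshift rather than the true one.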

* Changing ellipticity coordinate system: notice that $e_2$ changes sign!

In [None]:
cluster_kwargs_celestial = cluster_kwargs.copy()
cluster_kwargs_celestial["coordinate_system"] = "celestial"

np.random.seed(41363)

noisy_data_photoz_celestial = mock.generate_galaxy_catalog(
    **cluster_kwargs_celestial,
    zsrc=src_z,
    shapenoise=0.05,
    photoz_sigma_unscaled=0.05,
    pz_bins=101,
    ngals=ngals
)
_nice_display(noisy_data_photoz_celestial[:3])
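The sign change of $e_2$ follows from the spin-2 nature of ellipticity: mirroring one axis maps a position angle $\phi$ to $\pi - \phi$, which leaves $\cos 2\phi$ unchanged and flips $\sin 2\phi$. The sketch below shows this with arbitrary example values; it makes no claim about which axis CLMM mirrors internally.

```python
import numpy as np

# Arbitrary example galaxies: position angle and ellipticity modulus
phi = np.deg2rad(np.array([10.0, 45.0, 120.0]))
e_mod = np.array([0.05, 0.10, 0.02])

# Components in the original frame
e1 = e_mod * np.cos(2.0 * phi)
e2 = e_mod * np.sin(2.0 * phi)

# Mirroring one axis sends phi -> pi - phi
phi_mirror = np.pi - phi
e1_mirror = e_mod * np.cos(2.0 * phi_mirror)
e2_mirror = e_mod * np.sin(2.0 * phi_mirror)

print("e1 unchanged:", np.allclose(e1_mirror, e1))
print("e2 flipped  :", np.allclose(e2_mirror, -e2))
```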

- Clean data: source galaxy redshifts drawn from a redshift distribution instead of fixed `src_z` value. Options are `chang13` for Chang et al. 2013 or `desc_srd` for the distribution given in the DESC Science Requirement Document. No shape noise or photoz errors.

In [None]:
ideal_with_src_dist = mock.generate_galaxy_catalog(
    **cluster_kwargs, zsrc="chang13", zsrc_min=zsrc_min, zsrc_max=7.0, ngals=ngals
)

- Noisy data: galaxies following redshift distribution, redshift error, shape noise

In [None]:
allsystematics = mock.generate_galaxy_catalog(
    **cluster_kwargs,
    zsrc="chang13",
    zsrc_min=zsrc_min,
    shapenoise=0.05,
    photoz_sigma_unscaled=0.05,
    ngals=ngals,
    pz_bins=101,
)

In [None]:
allsystematics2 = mock.generate_galaxy_catalog(
    **cluster_kwargs,
    zsrc="desc_srd",
    zsrc_min=zsrc_min,
    zsrc_max=7.0,
    photoz_sigma_unscaled=0.05,
    ngals=ngals,
    pz_bins=101,
    shapenoise=0.05,
)

Sanity check: checking that no galaxies were originally drawn below zsrc_min, before photoz errors are applied (when relevant)

In [None]:
print(
    f"""Number of galaxies below zsrc_min:
    ideal_data          : {np.sum(ideal_data['ztrue']<zsrc_min):5,}
    noisy_data_src_z    : {np.sum(noisy_data_src_z['ztrue']<zsrc_min):5,}
    noisy_data_photoz   : {np.sum(noisy_data_photoz['ztrue']<zsrc_min):5,}
    ideal_with_src_dist : {np.sum(ideal_with_src_dist['ztrue']<zsrc_min):5,}
    allsystematics      : {np.sum(allsystematics['ztrue']<zsrc_min):5,}
"""
)

### Different ways to store photometric redshift information

- In the default PDF storing (`pzpdf_type='shared_bins'`), the values of the PDF are added to the `pzpdf` column and the binning scheme is stored in the `pzpdf_info` attribute:

In [None]:
_temp_data_sb = mock.generate_galaxy_catalog(
    **cluster_kwargs,
    zsrc=src_z,
    shapenoise=0.05,
    photoz_sigma_unscaled=0.05,
    pz_bins=101,
    ngals=ngals,
)
_nice_display(_temp_data_sb[:5])

In [None]:
_temp_data_sb.pzpdf_info["zbins"][:10]

- It is also possible to generate individual binning of the PDF with `pzpdf_type='individual_bins'`. In this case, both the bin values and the PDF are added to the `pzpdf` column:

In [None]:
_temp_data_ib = mock.generate_galaxy_catalog(
    **cluster_kwargs,
    zsrc=src_z,
    shapenoise=0.05,
    photoz_sigma_unscaled=0.05,
    pz_bins=101,
    ngals=ngals,
    pzpdf_type="individual_bins",
)
_nice_display(_temp_data_ib[:5])

- Another possibility is just storing the quantiles of the PDF (`pzpdf_type='quantiles'`). In this case, the locations of the PDF quantiles are added to the `pzquantiles` column and the quantiles scheme is stored in the `pzpdf_info` attribute:

In [None]:
_temp_data_qt = mock.generate_galaxy_catalog(
    **cluster_kwargs,
    zsrc=src_z,
    shapenoise=0.05,
    photoz_sigma_unscaled=0.05,
    pz_bins=101,
    ngals=ngals,
    pzpdf_type="quantiles",
)
_nice_display(_temp_data_qt[:5])

In [None]:
_temp_data_qt.pzpdf_info["quantiles"][:10]
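To make the quantile representation concrete, the sketch below converts a tabulated Gaussian pdf into quantile locations by inverting its cumulative distribution. The grid and the quantile levels are hypothetical choices for illustration, not the scheme CLMM uses.

```python
import numpy as np

# Tabulated Gaussian pdf for one galaxy (illustrative values)
zgrid = np.linspace(0.0, 2.0, 2001)
z_obs, sigma_z = 0.8, 0.09
pdf = np.exp(-0.5 * ((zgrid - z_obs) / sigma_z) ** 2)

# Cumulative distribution from trapezoidal integration, normalised to end at 1
cdf = np.concatenate([[0.0], np.cumsum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(zgrid))])
cdf /= cdf[-1]

# Hypothetical quantile levels: median and roughly +/- 1 sigma
quantiles = np.array([0.16, 0.5, 0.84])

# Invert the CDF: redshift at which each quantile level is reached
pzquantiles = np.interp(quantiles, cdf, zgrid)

print("quantile levels   :", quantiles)
print("quantile locations:", np.round(pzquantiles, 3))
```

Storing a handful of quantile locations per galaxy is much more compact than storing the pdf on a full grid, at the cost of losing detail in the tails.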

- Note that if `pzpdf_type=None`, the pdf is not stored:

In [None]:
_nice_display(
    mock.generate_galaxy_catalog(
        **cluster_kwargs,
        zsrc=src_z,
        shapenoise=0.05,
        photoz_sigma_unscaled=0.05,
        pz_bins=101,
        ngals=ngals,
        pzpdf_type=None
    )[:5]
)

### Unpacking photometric redshift information
- A unified way to recover the PDF is to use the `get_pzpdfs` method.
When `pzpdf_type='shared_bins'`, a single array of bins and the PDF of each galaxy are returned:

In [None]:
pzbins, pzpdf = _temp_data_sb.get_pzpdfs()
print(
    f"pzbins shape : {pzbins.shape}",
    f"\npzpdf  shape : {pzpdf.shape}",
)

- When `pzpdf_type='individual_bins'`, the individual redshift binning and PDF of each galaxy are returned:

In [None]:
pzbins, pzpdf = _temp_data_ib.get_pzpdfs()
print(
    f"pzbins shape : {pzbins.shape}",
    f"\npzpdf  shape : {pzpdf.shape}",
)

- When `pzpdf_type='quantiles'`, the PDFs are unpacked onto a single shared array of bins, and the output has the same format as for `pzpdf_type='shared_bins'`:

In [None]:
pzbins, pzpdf = _temp_data_qt.get_pzpdfs()
print(
    f"pzbins shape : {pzbins.shape}",
    f"\npzpdf  shape : {pzpdf.shape}",
)

This unpacking of the PDF is done based on the values of `pzpdf_info["unpack_quantile_zbins_limits"]` and can be configured:

In [None]:
_temp_data_qt.pzpdf_info["unpack_quantile_zbins_limits"] = (0, 5, 2000)
pzbins, pzpdf = _temp_data_qt.get_pzpdfs()
print(
    f"pzbins shape : {pzbins.shape}",
    f"\npzpdf  shape : {pzpdf.shape}",
)

### Inspect the catalog data

- Ideal catalog first entries: no noise on the shape measurement, all galaxies at z=0.8, no redshift errors (z = ztrue)

In [None]:
_nice_display(ideal_data[:3])

- With photo-z errors

In [None]:
_nice_display(noisy_data_photoz[:3])

- Histogram of the redshift distribution of background galaxies, for the true (originally drawn) redshift and the redshift once photoz errors have been added, and the stacked pdf. By construction no true redshift occurs below zsrc_min, but some 'observed' redshifts (i.e. including photoz errors) might fall below it.

In [None]:
plt.hist(
    allsystematics["z"],
    bins=50,
    alpha=0.3,
    density=True,
    label="measured z (i.e. including photoz error)",
)
plt.hist(allsystematics["ztrue"], bins=50, alpha=0.3, density=True, label="true z")
stacked_pdf = np.mean(allsystematics["pzpdf"], axis=0)
plt.plot(allsystematics.pzpdf_info["zbins"], stacked_pdf, "C3", label="stacked pdf")
plt.axvline(zsrc_min, color="k", label="requested zmin")
plt.xlabel("Source Redshift")
plt.ylabel("n(z)")
plt.legend()
plt.xlim(0, 5)
plt.show()
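The stacked pdf is comparable to the `density=True` histograms because the mean of normalised pdfs is itself normalised. A minimal check with synthetic Gaussian pdfs follows; the grid and redshift values are made up for illustration.

```python
import numpy as np

# Hypothetical shared grid and three synthetic Gaussian pdfs
zbins = np.linspace(0.0, 5.0, 501)
z_obs = np.array([0.6, 0.9, 1.4])
sigma_z = 0.05 * (1.0 + z_obs)
pzpdf = np.exp(-0.5 * ((zbins[None, :] - z_obs[:, None]) / sigma_z[:, None]) ** 2)
pzpdf /= sigma_z[:, None] * np.sqrt(2.0 * np.pi)

# Stack: average over galaxies, as done for the plot above
stacked_pdf = np.mean(pzpdf, axis=0)


def trapezoid(y, x):
    """Trapezoidal integral of y(x)."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))


norm = trapezoid(stacked_pdf, zbins)
print(f"integral of stacked pdf = {norm:.4f}")
```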

In [None]:
plt.hist(allsystematics["ztrue"], bins=50, alpha=0.3, label="true z (chang13)")
plt.hist(allsystematics2["ztrue"], bins=50, alpha=0.3, label="true z (desc_srd)")
plt.xlabel("Source Redshift")
plt.legend()
plt.show()

## Populate a galaxy cluster object

In [None]:
gc_object = clmm.GalaxyCluster(cluster_id, cluster_ra, cluster_dec, cluster_z, allsystematics)

For a `GalaxyCluster` object that has photoz information, `draw_gal_z_from_pdz` generates `nobj` random redshifts for each galaxy in `galcat`, drawn from its photoz pdf, and stores the result in a new `zcol_out` column.

In [None]:
z_random = gc_object.draw_gal_z_from_pdz(zcol_out="z_random", overwrite=False, nobj=1)
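Drawing a redshift from a tabulated pdf is commonly done by inverse-CDF sampling. The sketch below shows that technique on a synthetic Gaussian pdf; `draw_from_tabulated_pdf` is a hypothetical helper, and whether `draw_gal_z_from_pdz` uses this exact method is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)  # illustrative seed

# Synthetic tabulated pdf for one galaxy (normalisation is handled below)
zbins = np.linspace(0.0, 5.0, 1001)
pdf = np.exp(-0.5 * ((zbins - 0.8) / 0.09) ** 2)


def draw_from_tabulated_pdf(zbins, pdf, nobj, rng):
    """Inverse-CDF sampling: map uniform draws through the inverted CDF."""
    cdf = np.concatenate([[0.0], np.cumsum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(zbins))])
    cdf /= cdf[-1]
    u = rng.random(nobj)  # uniform in [0, 1)
    return np.interp(u, cdf, zbins)


z_random = draw_from_tabulated_pdf(zbins, pdf, 10_000, rng)
print(f"mean = {z_random.mean():.3f}, std = {z_random.std():.3f}")
```

The sample mean and standard deviation should be close to the centre and width of the input pdf.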

The plot below shows the "observed photoz pdf" (blue), centered on the "observed z" (red), the true redshift from which the shear was computed (green), and a random redshift (orange) drawn from the pdf.

In [None]:
# p(z) for one of the galaxies in the catalog
galid = 0
plt.fill(gc_object.galcat.pzpdf_info["zbins"], allsystematics["pzpdf"][galid], alpha=0.3)
plt.plot(gc_object.galcat.pzpdf_info["zbins"], gc_object.galcat["pzpdf"][galid], label="Photoz pdf")

plt.axvline(gc_object.galcat["z"][galid], label="Observed z", color="red")
plt.axvline(gc_object.galcat["ztrue"][galid], label="True z", color="g")
plt.axvline(gc_object.galcat["z_random"][galid], label="Random z from pdf", color="orange")

plt.xlabel("Redshift")
plt.ylabel("Photo-z Probability Distribution")
plt.legend(loc=1)
plt.xlim(gc_object.galcat["z"][galid] - 0.5, gc_object.galcat["z"][galid] + 0.5)
plt.show()

Plot source galaxy ellipticities

In [None]:
plt.scatter(gc_object.galcat["e1"], gc_object.galcat["e2"])

plt.xlim(-0.2, 0.2)
plt.ylim(-0.2, 0.2)
plt.xlabel("Ellipticity 1", fontsize="x-large")
plt.ylabel("Ellipticity 2", fontsize="x-large")
plt.show()

## Generate the mock data catalog with different field-of-view options

In the examples above, `ngals=1000` galaxies were simulated in a field corresponding to an 8 Mpc x 8 Mpc (proper distance) square box at the cluster redshift (this is the default). The user may however vary the field size and/or provide a galaxy density (instead of a number of galaxies). This is exemplified below, using the `allsystematics` example.

- `ngals = 1000` in a 4 x 4 Mpc box. Asking for the same number of galaxies in a smaller field of view yields a higher galaxy density.

In [None]:
allsystematics2 = mock.generate_galaxy_catalog(
    **cluster_kwargs,
    zsrc="chang13",
    zsrc_min=zsrc_min,
    zsrc_max=7.0,
    shapenoise=0.05,
    photoz_sigma_unscaled=0.05,
    field_size=4,
    ngals=ngals
)

In [None]:
plt.scatter(allsystematics["ra"], allsystematics["dec"], marker=".", label="default 8 x 8 Mpc FoV")
plt.scatter(allsystematics2["ra"], allsystematics2["dec"], marker=".", label="user-defined FoV")
plt.legend()
plt.show()

- Alternatively, the user may provide a galaxy density (here ~1 gal/arcmin2 to roughly match 1000 galaxies, given the configuration) and the number of galaxies to draw will automatically be adjusted to the box size.

In [None]:
allsystematics3 = mock.generate_galaxy_catalog(
    **cluster_kwargs,
    zsrc="chang13",
    zsrc_min=zsrc_min,
    zsrc_max=7.0,
    shapenoise=0.05,
    photoz_sigma_unscaled=0.05,
    ngal_density=1.3,
)
print(f"Number of drawn galaxies = {len(allsystematics3)}")
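The link between box size, galaxy density and number of galaxies can be estimated by hand. The sketch below assumes a flat ΛCDM cosmology with the parameters used above (radiation and neutrinos neglected) and assumes the galaxy count is simply density times angular area; it is an order-of-magnitude check, not CLMM's computation.

```python
import numpy as np

# Flat LCDM parameters matching mock_cosmo above (assumption: no radiation/neutrinos)
H0, Omega_m0 = 70.0, 0.27
c_km_s = 299792.458
cluster_z = 0.3
field_size_mpc = 8.0  # proper size of the box side at the cluster redshift
ngal_density = 1.3  # galaxies per arcmin^2

# Comoving distance by trapezoidal integration of c / H(z)
z = np.linspace(0.0, cluster_z, 2001)
inv_E = 1.0 / np.sqrt(Omega_m0 * (1.0 + z) ** 3 + (1.0 - Omega_m0))
d_c = (c_km_s / H0) * np.sum(0.5 * (inv_E[1:] + inv_E[:-1]) * np.diff(z))

# Angular diameter distance converts proper size to angle
d_a = d_c / (1.0 + cluster_z)
side_arcmin = np.rad2deg(field_size_mpc / d_a) * 60.0
area_arcmin2 = side_arcmin**2
expected_ngals = ngal_density * area_arcmin2

print(f"box side   = {side_arcmin:.1f} arcmin")
print(f"box area   = {area_arcmin2:.0f} arcmin^2")
print(f"N galaxies = {expected_ngals:.0f} (density x area)")
```

Under these assumptions the 8 Mpc box spans roughly 30 arcmin on a side, which is why a density close to 1 gal/arcmin2 gives on the order of 1000 galaxies.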

In [None]:
allsystematics4 = mock.generate_galaxy_catalog(
    **cluster_kwargs,
    zsrc="desc_srd",
    zsrc_min=zsrc_min,
    zsrc_max=7.0,
    shapenoise=0.05,
    photoz_sigma_unscaled=0.05,
    ngal_density=1.3,
)
print(f"Number of drawn galaxies = {len(allsystematics4)}")

In [None]:
plt.scatter(allsystematics["ra"], allsystematics["dec"], marker=".", label="ngals = 1000")
plt.scatter(
    allsystematics3["ra"],
    allsystematics3["dec"],
    marker=".",
    label="ngal_density = 1.3 gal / arcmin2",
)
plt.legend()
plt.show()

## Generate mock data with different galaxy cluster options
WARNING: Available options depend on the modeling backend:
- Cluster-toolkit allows for other values of the overdensity parameter, but is restricted to working with the mean mass definition
- Both CCL and Numcosmo allow for different values of the overdensity parameter, and work with both the mean and critical mass definitions
- Numcosmo further allows for the Einasto or Burkert density profiles to be used instead of the NFW profile



### Changing the overdensity parameter (all backends) - `delta_so` keyword (default = 200)

In [None]:
allsystematics_500mean = mock.generate_galaxy_catalog(
    **cluster_kwargs,
    zsrc="chang13",
    delta_so=500,
    zsrc_min=zsrc_min,
    zsrc_max=7.0,
    shapenoise=0.05,
    photoz_sigma_unscaled=0.05,
    ngals=ngals
)

### Using the critical mass definition (Numcosmo and CCL only) - `massdef` keyword (default = 'mean')
WARNING: an error will be raised if using the cluster-toolkit backend

In [None]:
allsystematics_200critical = mock.generate_galaxy_catalog(
    **cluster_kwargs,
    zsrc="chang13",
    massdef="critical",
    zsrc_min=zsrc_min,
    zsrc_max=7.0,
    shapenoise=0.05,
    photoz_sigma_unscaled=0.05,
    ngals=ngals
)
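To see why `delta_so` and `massdef` matter, recall that a spherical-overdensity mass $M_\Delta$ is enclosed in a radius $R_\Delta$ within which the mean density is $\Delta$ times a reference density (mean matter or critical). The sketch below compares the radii implied by the same mass value under different definitions. It assumes a flat ΛCDM cosmology with the parameters used above and masses in solar masses; `r_delta` is a hypothetical helper.

```python
import numpy as np

# Cosmology matching the notebook values (flat LCDM assumed)
h, Omega_m0 = 0.7, 0.27
z = 0.3
mass = 1.0e15  # assumed to be in solar masses

# Critical density today in Msun / Mpc^3
rho_crit0 = 2.775e11 * h**2

# Physical reference densities at the cluster redshift
E2 = Omega_m0 * (1.0 + z) ** 3 + (1.0 - Omega_m0)
rho_crit = rho_crit0 * E2
rho_mean = rho_crit0 * Omega_m0 * (1.0 + z) ** 3


def r_delta(mass, delta, rho_ref):
    """Radius enclosing a mean density of delta * rho_ref, in physical Mpc."""
    return (3.0 * mass / (4.0 * np.pi * delta * rho_ref)) ** (1.0 / 3.0)


r_200m = r_delta(mass, 200, rho_mean)
r_500m = r_delta(mass, 500, rho_mean)
r_200c = r_delta(mass, 200, rho_crit)

print(f"R_200m = {r_200m:.2f} Mpc")
print(f"R_500m = {r_500m:.2f} Mpc")
print(f"R_200c = {r_200c:.2f} Mpc")
```

The same numerical mass therefore describes a more compact halo when interpreted as "200,critical" or "500,mean" than as "200,mean", which changes the resulting shear profile.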

### Changing the halo density profile (Numcosmo only) - `halo_profile_model` keyword (default = 'nfw')
WARNING: an error will be raised if using the cluster-toolkit or CCL backends

In [None]:
allsystematics_200m_einasto = mock.generate_galaxy_catalog(
    **cluster_kwargs,
    zsrc="chang13",
    halo_profile_model="einasto",
    zsrc_min=zsrc_min,
    zsrc_max=7.0,
    shapenoise=0.05,
    photoz_sigma_unscaled=0.05,
    ngals=ngals
)