# ClusterShearCatalogs stage functionalities

This notebook presents the `ClusterShearCatalogs` stage of the TXPipe clusters extension. For each cluster in a cluster catalog, this stage selects background galaxies and computes basic shear-related quantities for each of them (e.g., tangential and cross shear components, weights).
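For orientation, tangential and cross components are usually obtained by rotating the two shear components $(g_1, g_2)$ by the position angle $\phi$ of the galaxy as seen from the cluster center. The sketch below uses one common sign convention (positive $g_t$ for tangential alignment) in a flat-sky approximation. The function name `tangential_and_cross` is hypothetical and this is not the stage's own code, which may use a different convention or geometry.

```python
import math

def tangential_and_cross(g1, g2, phi):
    """Rotate shear components (g1, g2) into tangential and cross parts.

    phi is the position angle (radians) of the source galaxy as seen from
    the cluster center, in the same frame in which g1 and g2 are defined.
    Assumed sign convention: g_t > 0 for tangential alignment.
    """
    cos2, sin2 = math.cos(2.0 * phi), math.sin(2.0 * phi)
    g_t = -(g1 * cos2 + g2 * sin2)
    g_x = g1 * sin2 - g2 * cos2
    return g_t, g_x

# Toy input: a galaxy at phi = 30 deg carrying a purely tangential shear of 0.05
phi = math.radians(30.0)
g1 = -0.05 * math.cos(2.0 * phi)
g2 = -0.05 * math.sin(2.0 * phi)
g_t, g_x = tangential_and_cross(g1, g2, phi)
```

With this toy input all of the signal ends up in `g_t` and `g_x` vanishes, which is the usual sanity check for a rotation of this kind.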

In [None]:
import os
from pprint import pprint
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from IPython.display import Image
import ceci
import h5py
import yaml

In the next cell, set `my_txpipe_dir` to the location of your own TXPipe directory. Examples for IN2P3 and NERSC are given below.

In [None]:
# user specific paths -- IN2P3 example
# my_txpipe_dir = "/pbs/home/m/mricci/throng_mricci/desc/TXPipe"
my_txpipe_dir = "/pbs/throng/lsst/users/ccombet/TXPipe"

# user specific paths -- NERSC example
# my_txpipe_dir = "/pscratch/sd/a/avestruz/TXPipe"

os.chdir(my_txpipe_dir)

import txpipe

# 1 deg$^2$ sample running directly in Jupyter

First we will do some runs on the 1 deg$^2$ example data set, which contains around 80k galaxies. This is small enough that everything can be run directly in Jupyter.

The data set, which is based on CosmoDC2, contains pre-computed photo-z PDFs and a redMaPPer cluster catalog for the field.

## Initiate and run the stage

In [None]:
print("Options for this stage and their defaults:")
print(txpipe.extensions.CLClusterShearCatalogs.config_options)

In [None]:
pipe_stage = txpipe.extensions.CLClusterShearCatalogs.make_stage(

    # catalogs
    shear_catalog = "data/example/inputs/metadetect_shear_catalog.hdf5",
    cluster_catalog = "./data/example/inputs/cluster_catalog.hdf5",
    source_photoz_pdfs = "data/example/inputs/photoz_pdfs.hdf5",    

    # Initial sample selection was performed and output in shear_tomography_catalog
    # by previously running the TXSourceSelectorMetadetect stage
    shear_tomography_catalog = "data/example/outputs_metadetect/shear_tomography_catalog.hdf5",
    
    # Fiducial cosmology: it is needed to get physical distances as we are
    # currently selecting sources based on projected distance (in Mpc) 
    # from cluster center
    fiducial_cosmology = "./data/fiducial_cosmology.yml",
    
    # This is the output for this stage
    cluster_shear_catalogs = "./data/cosmodc2/outputs-1deg2-CL/cluster_shear_catalogs.hdf5",
    
    # This contains all the options for this stage. Default config options will be updated
    config = "./examples/cosmodc2/config-1deg2-CL.yml",
)
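To illustrate why a fiducial cosmology is needed: an angular separation $\theta$ from the cluster center corresponds to a projected physical distance $R = \theta \, D_A(z)$, where $D_A$ is the angular diameter distance to the cluster. The sketch below evaluates this for a flat $\Lambda$CDM model using only the standard library. The function names and the default values `h0=70` and `omega_m=0.3` are illustrative assumptions, not the contents of `fiducial_cosmology.yml`, and the stage itself may compute distances differently.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def angular_diameter_distance(z, h0=70.0, omega_m=0.3, n_steps=2000):
    """Angular diameter distance in Mpc for a flat LCDM cosmology.

    h0 and omega_m are illustrative defaults, NOT the values stored in
    data/fiducial_cosmology.yml. Radiation is neglected.
    """
    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + (1.0 - omega_m))

    # Trapezoidal integration of dz'/E(z') from 0 to z
    dz = z / n_steps
    integral = 0.5 * (inv_e(0.0) + inv_e(z))
    integral += sum(inv_e(i * dz) for i in range(1, n_steps))
    integral *= dz

    comoving = (C_KM_S / h0) * integral  # line-of-sight comoving distance
    return comoving / (1.0 + z)

def projected_radius_mpc(theta_arcmin, z_cluster, **cosmo):
    """Projected physical distance (Mpc) for an angle given in arcmin."""
    theta_rad = math.radians(theta_arcmin / 60.0)
    return theta_rad * angular_diameter_distance(z_cluster, **cosmo)

# Example: 5 arcmin from the center of a cluster at z = 0.5
r_mpc = projected_radius_mpc(5.0, 0.5)
```

Changing the cosmological parameters changes $D_A$ and therefore which galaxies fall inside a given radius in Mpc, which is why the cosmology is an explicit input of the stage.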

In [None]:
# Check the new config options
pipe_stage.config

In [None]:
pipe_stage.run()
pipe_stage.finalize()

## Checking out the output

To avoid making many copies of the data, this stage does not write a full catalog for each cluster. Instead, it stores an index into the existing catalogs, together with only the newly derived quantities.

A helper class, `CombinedClusterCatalog`, matches up the different catalogs that go into this stage and collects the results for each cluster.
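The idea of storing an index rather than copies can be sketched with a toy example. Everything below (the column names, the `cluster_index` layout, the `background_sample` function) is hypothetical and only illustrates the principle; the actual HDF5 layout written by the stage is likely different.

```python
# Toy stand-in for a shear catalog: one entry per galaxy (made-up values)
shear_catalog = {
    "ra":  [10.01, 10.02, 10.50, 10.51, 10.90],
    "dec": [-5.00, -5.01, -5.40, -5.42, -5.80],
    "g1":  [0.010, -0.020, 0.030, 0.000, 0.015],
}

# What an index-based output might store per cluster: row numbers into the
# shear catalog, plus derived quantities aligned with those row numbers.
cluster_index = {
    0: {"rows": [0, 1, 3], "weight": [1.0, 0.8, 0.5]},
    1: {"rows": [2, 4], "weight": [0.9, 0.7]},
}

def background_sample(cluster_id):
    """Join the stored row indices back onto the full catalog columns."""
    entry = cluster_index[cluster_id]
    sample = {name: [col[i] for i in entry["rows"]]
              for name, col in shear_catalog.items()}
    sample["weight"] = list(entry["weight"])
    return sample

sample = background_sample(0)
```

A galaxy that lies behind several clusters then appears once in the shear catalog and is simply referenced from several index lists.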

In [None]:
ccc = txpipe.extensions.CombinedClusterCatalog(
    shear_catalog="./data/example/inputs/metadetect_shear_catalog.hdf5",
    shear_tomography_catalog="./data/example/outputs_metadetect/shear_tomography_catalog.hdf5",
    cluster_catalog="./data/example/inputs/cluster_catalog.hdf5",
    cluster_shear_catalogs="./data/cosmodc2/outputs-1deg2-CL/cluster_shear_catalogs.hdf5",
    photoz_pdfs="./data/example/inputs/photoz_pdfs.hdf5",
)

In [None]:
print(f"We have {ccc.ncluster} clusters")

We can extract the cluster catalog information by index (0 to 74 here):

In [None]:
cluster_info = ccc.get_cluster_info(0)
cluster_info

We can also extract the background shear catalog associated with that cluster, again by index, in the CLMM data format:

In [None]:
bg_cat = ccc.get_background_shear_catalog(0)
bg_cat[0:3]

# 20 deg$^2$ example using the pipeline approach

In [None]:
# Read the appropriate pipeline configuration, and ask for a flow-chart.
pipeline_file = "examples/cosmodc2/Cluster_pipelines/CLClusterShearCat-20deg2-CL.yml"
flowchart_file = "CLClusterShearCat.png"

pipeline_config = ceci.Pipeline.build_config(
    pipeline_file,
    flow_chart=flowchart_file,
    dry_run=True
)

# Because flow_chart is set, this call draws the flow chart rather than running the stages
ceci.run_pipeline(pipeline_config)


In [None]:
Image(flowchart_file)