RAIL Interface demo for p(z) estimation

Author: Eric Charles

Last Successfully Run: July 24, 2024

This notebook demonstrates how to use the rail.interfaces package to construct an object that can estimate p(z) in two ways: for every object in an input catalog file that contains fluxes in various bands, or from a dictionary of numpy arrays holding the fluxes in each band.

In [None]:
# standard utility imports
import matplotlib.pyplot as plt
import numpy as np

In [None]:
# rail-related imports
import qp
from rail.utils.path_utils import find_rail_file

In [None]:
# import the PZFactory that we will use as the interface
from rail.interfaces import PZFactory

In [None]:
# find a catalog file with 10 objects for a quick demonstration
input_file = find_rail_file('examples_data/testdata/validation_10gal.hdf5')

In [None]:
help(PZFactory)

We are going to use a `TrainZEstimator` which just returns the same pdf (typically the overall z distribution of the training sample) every time.

In [None]:
stage = PZFactory.build_cat_estimator_stage(
    stage_name = 'train_z',
    class_name = 'TrainZEstimator',
    module_name = 'rail.estimation.algos.train_z',
    model_path = 'model_inform_trainz.pkl',
    data_path = 'dummy.in',
)

Note that the factory caches the stage object, so we can retrieve it by name instead of recreating it each time we need it.

In [None]:
check_stage = PZFactory.get_cat_estimator_stage('train_z')
assert check_stage == stage
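The caching behaviour checked above can be illustrated with a minimal, self-contained sketch. The `StageRegistry` class and its `build` and `get` methods are hypothetical names invented for this illustration; they are not part of RAIL and say nothing about how `PZFactory` is implemented internally.

```python
# Hypothetical stand-in for a name-keyed cache; this is NOT RAIL's code.
class StageRegistry:
    """Build objects on demand and hand back the same instance by name."""

    def __init__(self):
        self._stages = {}

    def build(self, name, factory):
        # Create the object once and remember it under its name.
        stage = factory()
        self._stages[name] = stage
        return stage

    def get(self, name):
        # Raises KeyError if nothing was built under this name.
        return self._stages[name]


registry = StageRegistry()
built = registry.build("train_z", dict)  # dict() stands in for a real stage
fetched = registry.get("train_z")
same_object = fetched is built  # the lookup returns the cached instance
```

The point of the pattern is that building a stage can be expensive (for example, loading a model file), while a lookup by name is cheap.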

Here we evaluate a single p(z).

In [None]:
out_single = PZFactory.estimate_single_pz(stage, {'d':np.array([1,1])})

Note that the return object is a qp ensemble with a single pdf

In [None]:
out_single

In [None]:
out_single.npdf

We can evaluate the pdf on a grid and plot the values

In [None]:
zgrid = np.linspace(0, 1.5, 151)

In [None]:
_ = plt.plot(zgrid, np.squeeze(out_single.pdf(zgrid)))
_ = plt.xlabel('z')
_ = plt.ylabel('p(z)')
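Once a pdf has been evaluated on a grid, a quick sanity check is to integrate it numerically and to locate its peak. The sketch below uses a Gaussian as a stand-in for the estimated pdf (the values `mu` and `sigma` are assumptions chosen for illustration); with the real output you would use `np.squeeze(out_single.pdf(zgrid))` in place of `pdf_vals`. The area will only be close to 1 if the grid covers essentially all of the pdf's support.

```python
import numpy as np

zgrid = np.linspace(0, 1.5, 151)

# Stand-in density for illustration only: a Gaussian well inside the grid.
mu, sigma = 0.6, 0.1
pdf_vals = np.exp(-0.5 * ((zgrid - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Trapezoid rule written out by hand, so it does not depend on the NumPy version.
dz = np.diff(zgrid)
area = np.sum(0.5 * (pdf_vals[1:] + pdf_vals[:-1]) * dz)

# Grid point with the highest density (the mode, to grid resolution).
z_mode = zgrid[np.argmax(pdf_vals)]
```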

Here we evaluate p(z) for all the objects in a file

In [None]:
out_handle = PZFactory.run_cat_estimator_stage(
    stage,
    input_file,
)

Note that this returns a `DataHandle` that we can use to access the output data.
In this case it has 10 pdfs (one for each input object)

In [None]:
data = qp.read(out_handle.path)

In [None]:
data.npdf

Here we plot the pdf of the first object.  Because train_z returns the same pdf every time, this is identical to the one above.

In [None]:
_ = plt.plot(zgrid, np.squeeze(data[0].pdf(zgrid)))
_ = plt.xlabel('z')
_ = plt.ylabel('p(z)')
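Rather than comparing the two plots by eye, the claim that the pdfs are identical can be checked numerically. The sketch below defines a small helper, `pdfs_match` (a hypothetical name, not part of RAIL or qp), and exercises it on stand-in arrays; with the real outputs you would pass `out_single.pdf(zgrid)` and `data[0].pdf(zgrid)` instead.

```python
import numpy as np


def pdfs_match(pdf_a, pdf_b, rtol=1e-6, atol=1e-12):
    """Return True if two gridded pdfs agree within the given tolerances."""
    # Drop length-1 axes so (1, N) and (N,) arrays can be compared directly.
    a = np.squeeze(np.asarray(pdf_a))
    b = np.squeeze(np.asarray(pdf_b))
    return a.shape == b.shape and bool(np.allclose(a, b, rtol=rtol, atol=atol))


zgrid = np.linspace(0, 1.5, 151)

# Stand-in arrays for illustration only.
single_vals = np.exp(-zgrid)[np.newaxis, :]  # shape (1, 151), like a one-pdf ensemble
catalog_vals = np.exp(-zgrid)                # shape (151,)
shifted_vals = np.exp(-(zgrid + 0.1))        # a deliberately different curve

identical = pdfs_match(single_vals, catalog_vals)
different = pdfs_match(single_vals, shifted_vals)
```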

Let's clean up after ourselves

In [None]:
import os

# remove the output files written by the estimation stage, if they exist
for path in ('inprogress_output_train_z.hdf5', 'output_train_z.hdf5'):
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass