# FlexZBoost PDF Representation Comparison

**Author:** Drew Oldag

**Last Run Successfully:** Dec 18, 2025

This notebook does a quick comparison of storage requirements for Flexcode output using two different storage techniques. We'll compare `qp.interp` (x,y interpolated) output against the native parameterization of `qp_flexzboost`.
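As a rough point of reference, the sketch below estimates raw storage for the 20,449-galaxy test set used later. It rests on our own assumptions, not on a measurement: each `qp.interp` PDF is stored as one float64 value per point of the 301-point redshift grid, each `qp_flexzboost` PDF is stored as at most 35 float64 basis weights, and shared metadata and compression are ignored.

```python
# Back-of-envelope storage estimate.
# Assumptions: float64 values, no compression, shared metadata ignored.
BYTES_PER_FLOAT64 = 8


def estimated_bytes(n_galaxies: int, values_per_galaxy: int) -> int:
    """Raw bytes for `values_per_galaxy` float64 numbers per galaxy."""
    return n_galaxies * values_per_galaxy * BYTES_PER_FLOAT64


n_galaxies = 20_449  # size of the test set evaluated below
n_grid = 301         # qp.interp: one y value per redshift grid point (nzbins)
n_basis = 35         # qp_flexzboost: at most max_basis weights per galaxy

interp_bytes = estimated_bytes(n_galaxies, n_grid)
flexzboost_bytes = estimated_bytes(n_galaxies, n_basis)

print(f"interp:     {interp_bytes / 1e6:.1f} MB")
print(f"flexzboost: {flexzboost_bytes / 1e6:.1f} MB")
print(f"ratio:      {interp_bytes / flexzboost_bytes:.1f}x")
```

Under these assumptions it prints about 49.2 MB for `qp.interp` and 5.7 MB for `qp_flexzboost`, a factor of roughly 8.6. Real file sizes will differ because of metadata, compression, and the number of basis functions Flexcode actually selects.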

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import qp
import rail.interactive as ri
import tables_io
from rail.utils.path_utils import find_rail_file

Locate the training and test data files and read them into memory.

In [None]:
trainFile = find_rail_file("examples_data/testdata/test_dc2_training_9816.hdf5")
testFile = find_rail_file("examples_data/testdata/test_dc2_validation_9816.hdf5")
training_data = tables_io.read(trainFile)
test_data = tables_io.read(testFile)

Define the configuration for the ML model trained by Flexcode. Specifically, we'll use XGBoost with a cosine basis of up to 35 functions (`max_basis=35`).
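To make "cosine basis functions" concrete, here is a minimal pure-Python sketch of a density expanded in an orthonormal cosine basis on the redshift interval, `p(z) = sum_j w_j * phi_j(z)`. We assume Flexcode uses this standard basis up to a rescaling of z to [0, 1]. The function names are hypothetical, and the sketch skips the clipping, renormalisation, bump removal and sharpening that Flexcode applies afterwards.

```python
import math


def cosine_basis(z, n_basis, zmin=0.0, zmax=3.0):
    """Evaluate the first n_basis orthonormal cosine functions on [zmin, zmax] at z."""
    width = zmax - zmin
    t = (z - zmin) / width            # rescale z to [0, 1]
    scale = 1.0 / math.sqrt(width)    # makes each squared function integrate to 1
    values = [scale]                  # phi_0 is constant
    for j in range(1, n_basis):
        values.append(scale * math.sqrt(2.0) * math.cos(math.pi * j * t))
    return values


def density_from_weights(z, weights, zmin=0.0, zmax=3.0):
    """Raw expansion p(z) = sum_j w_j * phi_j(z), before any post-processing."""
    phis = cosine_basis(z, len(weights), zmin, zmax)
    return sum(w * phi for w, phi in zip(weights, phis))


# Usage: setting only the constant term gives the uniform density (about 1/3 on [0, 3]).
weights = [1.0 / math.sqrt(3.0)] + [0.0] * 34
print(density_from_weights(1.2, weights))
```

Each galaxy is then described by its weight vector alone, which is what makes the native parameterization compact.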

In [None]:
fz_dict = dict(
    zmin=0.0,
    zmax=3.0,
    nzbins=301,
    trainfrac=0.75,
    bumpmin=0.02,
    bumpmax=0.35,
    nbump=20,
    sharpmin=0.7,
    sharpmax=2.1,
    nsharp=15,
    max_basis=35,
    basis_system="cosine",
    hdf5_groupname="photometry",
    regression_params={"max_depth": 8, "objective": "reg:squarederror"},
)
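The `nzbins`, `bump*` and `sharp*` entries describe evenly spaced grids. The sketch below assumes the informer builds them the way `np.linspace(bumpmin, bumpmax, nbump)` would (written in pure Python here so it runs without RAIL installed). With this configuration the redshift grid spacing is 0.01, and tuning considers 20 candidate bump thresholds and 15 candidate sharpening exponents.

```python
def linspace(start, stop, num):
    """Pure-Python stand-in for np.linspace with the endpoint included."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


# Values copied from fz_dict above.
config = dict(zmin=0.0, zmax=3.0, nzbins=301,
              bumpmin=0.02, bumpmax=0.35, nbump=20,
              sharpmin=0.7, sharpmax=2.1, nsharp=15)

redshift_grid = linspace(config["zmin"], config["zmax"], config["nzbins"])
bump_grid = linspace(config["bumpmin"], config["bumpmax"], config["nbump"])
sharpen_grid = linspace(config["sharpmin"], config["sharpmax"], config["nsharp"])

print(f"redshift grid spacing: {redshift_grid[1] - redshift_grid[0]:.3f}")
print(f"bump thresholds: {len(bump_grid)}, from {bump_grid[0]} to {bump_grid[-1]:.2f}")
print(f"sharpening exponents: {len(sharpen_grid)}, from {sharpen_grid[0]} to {sharpen_grid[-1]:.2f}")
```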

In [None]:
model = ri.estimation.algos.flexzboost.flex_z_boost_informer(
    input=training_data, **fz_dict
)
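As we understand Flexcode's post-processing, the tuned bump threshold removes small, isolated islands of probability, and the tuned sharpening exponent alpha replaces p(z) with p(z)**alpha before renormalising. We also assume both values are chosen on the held-out `1 - trainfrac` fraction of the training set. The sketch below illustrates the two operations on a tabulated density; the helper names and the simple Riemann-sum normalisation are our own choices, not the library's code.

```python
def normalize(density, dz):
    """Clip negatives and rescale so the density sums to 1 under a Riemann sum."""
    clipped = [max(d, 0.0) for d in density]
    area = sum(clipped) * dz
    return [d / area for d in clipped] if area > 0 else clipped


def sharpen(density, dz, alpha):
    """Raise the density to the power alpha and renormalise (alpha > 1 sharpens peaks)."""
    return normalize([max(d, 0.0) ** alpha for d in density], dz)


def remove_bumps(density, dz, threshold):
    """Zero out contiguous positive regions whose probability mass is below threshold."""
    out = [max(d, 0.0) for d in density]
    i = 0
    while i < len(out):
        if out[i] > 0:
            start = i
            while i < len(out) and out[i] > 0:
                i += 1
            if sum(out[start:i]) * dz < threshold:  # mass of this island
                for k in range(start, i):
                    out[k] = 0.0
        else:
            i += 1
    return normalize(out, dz)


# Usage: the small island at index 4 (mass 0.2 * 0.5 = 0.1) falls below the threshold.
print(remove_bumps([0.0, 1.0, 1.0, 0.0, 0.2, 0.0], dz=0.5, threshold=0.15))
print(sharpen([1.0, 3.0], dz=1.0, alpha=2.0))
```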

Now we evaluate the test data, 20,449 example galaxies, using the trained model.
Note that we specify `qp_representation="flexzboost"` here to instruct `rail_flexzboost` to store the output as model weights using `qp_flexzboost`.

In [None]:
%%time
fzresults_qp_flexzboost = ri.estimation.algos.flexzboost.flex_z_boost_estimator(
    input=test_data,
    model=model["model"],
    qp_representation="flexzboost",
)

Next we calculate the median and mode. Note that we use the `%%timeit` magic command to estimate the time required to calculate the `median`, but `%%time` to estimate the `mode`. This is because `qp` caches the output of the `pdf` function for a given grid: with `%%timeit`, the resulting estimate would average the run time of one uncached calculation and N-1 cached calculations.
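The caching effect can be illustrated with a toy class (hypothetical, not part of `qp`) that memoises `pdf` per grid. Calling `mode` seven times with the same grid, as `%%timeit` might, triggers only one real PDF evaluation, so an averaged timing would mostly measure cache lookups.

```python
import math


class CachedPDF:
    """Toy distribution that memoises pdf() per grid (hypothetical, not qp's class)."""

    def __init__(self, pdf_func):
        self._pdf_func = pdf_func
        self._cache = {}
        self.n_evaluations = 0  # counts how often the uncached path runs

    def pdf(self, grid):
        key = tuple(grid)  # a hashable copy of the grid serves as the cache key
        if key not in self._cache:
            self.n_evaluations += 1
            self._cache[key] = [self._pdf_func(z) for z in grid]
        return self._cache[key]

    def mode(self, grid):
        values = self.pdf(grid)
        best = max(range(len(grid)), key=values.__getitem__)
        return grid[best]


dist = CachedPDF(lambda z: math.exp(-0.5 * ((z - 1.0) / 0.2) ** 2))
grid = [0.01 * i for i in range(301)]
for _ in range(7):  # one cold call followed by six cached calls
    dist.mode(grid)
print(dist.n_evaluations)
```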

In [None]:
zgrid = np.linspace(0, 3.0, 301)

In [None]:
%%timeit
fz_medians_qp_flexzboost = fzresults_qp_flexzboost["output"].median()

In [None]:
%%time
fz_modes_qp_flexzboost = fzresults_qp_flexzboost["output"].mode(grid=zgrid)
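For intuition, here is one grid-based way to compute the two statistics. We assume `qp`'s `mode(grid=...)` returns the grid point where the evaluated PDF is largest, which means the mode can be no more precise than the 0.01 grid spacing. The median function is a simple approximation (trapezoidal CDF plus linear interpolation at 0.5), not `qp`'s own `median` implementation.

```python
import bisect


def mode_on_grid(grid, pdf_values):
    """Grid point at which the tabulated PDF is largest; ties go to the first point."""
    best = max(range(len(grid)), key=pdf_values.__getitem__)
    return grid[best]


def median_on_grid(grid, pdf_values):
    """Approximate median: where the trapezoidal CDF crosses 0.5."""
    cdf = [0.0]
    for i in range(1, len(grid)):
        dz = grid[i] - grid[i - 1]
        cdf.append(cdf[-1] + 0.5 * (pdf_values[i] + pdf_values[i - 1]) * dz)
    total = cdf[-1]
    cdf = [c / total for c in cdf]    # normalise in case the PDF does not integrate to 1
    i = bisect.bisect_left(cdf, 0.5)  # first index where the CDF reaches 0.5
    if i == 0:
        return grid[0]
    frac = (0.5 - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
    return grid[i - 1] + frac * (grid[i] - grid[i - 1])


# Usage: flat on [0, 1], then falling linearly to zero at z = 2.
grid = [0.0, 1.0, 2.0]
pdf_values = [1.0, 1.0, 0.0]
print(mode_on_grid(grid, pdf_values), median_on_grid(grid, pdf_values))
```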

Plotting median values.

In [None]:
fz_medians_qp_flexzboost = fzresults_qp_flexzboost["output"].median()

plt.hist(fz_medians_qp_flexzboost, bins=np.linspace(-0.005, 3.005, 101))
plt.xlabel("redshift")
plt.ylabel("Number")

Convert the output to a `qp.hist` histogram representation.

In [None]:
%%timeit
bins = np.linspace(0, 3, 301)
fzresults_qp_flexzboost["output"].convert_to(qp.hist_gen, bins=bins)
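Conceptually, converting to a histogram means computing the average density in each bin from the CDF at the bin edges. Note that `np.linspace(0, 3, 301)` supplies 301 edges and therefore 300 bins of width 0.01. The sketch below assumes this CDF-differencing approach; `hist_from_cdf` is a hypothetical helper, and `qp`'s converter may differ in details such as normalisation.

```python
def hist_from_cdf(cdf, bin_edges):
    """Average density per bin: (CDF(right edge) - CDF(left edge)) / bin width."""
    cdf_at_edges = [cdf(edge) for edge in bin_edges]
    return [
        (cdf_at_edges[i + 1] - cdf_at_edges[i]) / (bin_edges[i + 1] - bin_edges[i])
        for i in range(len(bin_edges) - 1)
    ]


def uniform_cdf(z, zmin=0.0, zmax=3.0):
    """CDF of a uniform distribution on [zmin, zmax], used only as a test input."""
    return min(max((z - zmin) / (zmax - zmin), 0.0), 1.0)


edges = [0.01 * i for i in range(301)]  # 301 edges -> 300 bins, like np.linspace(0, 3, 301)
heights = hist_from_cdf(uniform_cdf, edges)
print(len(heights), heights[0])
```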

Now we'll repeat the experiment using `qp.interp` storage. Again, we'll define the RAIL stage to evaluate the test data using the saved model, but instruct `rail_flexzboost` to store the output as x,y interpolated values using `qp.interp`.

Finally, we evaluate the test data again using the trained model, this time passing `qp_representation="interp"`. We also set `calculated_point_estimates=[]` so that no point estimates are computed during evaluation.

In [None]:
fzresults_qp_interp = ri.estimation.algos.flexzboost.flex_z_boost_estimator(
    input=test_data,
    model=model["model"],
    qp_representation="interp",
    calculated_point_estimates=[],
)
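With `qp.interp`, each PDF is a set of y values on a shared x grid, and the density between grid points is obtained by linear interpolation, so storage grows with the number of grid points. A minimal sketch of that evaluation, assuming the density is zero outside the tabulated range (`interp_pdf` is a hypothetical helper, not the `qp` API):

```python
import bisect


def interp_pdf(z, xvals, yvals):
    """Linearly interpolate tabulated PDF values; zero outside [xvals[0], xvals[-1]]."""
    if z < xvals[0] or z > xvals[-1]:
        return 0.0
    i = bisect.bisect_right(xvals, z)  # index of the first grid point strictly right of z
    if i == len(xvals):                # z sits exactly on the last grid point
        return yvals[-1]
    x0, x1 = xvals[i - 1], xvals[i]
    y0, y1 = yvals[i - 1], yvals[i]
    return y0 + (y1 - y0) * (z - x0) / (x1 - x0)


# Usage: a triangular PDF tabulated on three points.
xvals = [0.0, 1.0, 2.0]
yvals = [0.0, 1.0, 0.0]
print([interp_pdf(z, xvals, yvals) for z in (-0.5, 0.5, 1.0, 1.5, 2.0)])
```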

As before, we calculate the median and mode, using `%%timeit` for the `median` and `%%time` for the `mode` because `qp` caches the output of the `pdf` function for a given grid.

In [None]:
zgrid = np.linspace(0, 3.0, 301)

In [None]:
%%timeit
fz_medians_qp_interp = fzresults_qp_interp["output"].median()

In [None]:
%%time
fz_modes_qp_interp = fzresults_qp_interp["output"].mode(grid=zgrid)

Plotting median values.

In [None]:
fz_medians_qp_interp = fzresults_qp_interp["output"].median()
plt.hist(fz_medians_qp_interp, bins=np.linspace(-0.005, 3.005, 101))
plt.xlabel("redshift")
plt.ylabel("Number")

Convert the output to a `qp.hist` histogram representation.

In [None]:
%%timeit
bins = np.linspace(0, 3, 301)
fzresults_qp_interp["output"].convert_to(qp.hist_gen, bins=bins)