# RAIL_Lephare example on LSST data

**Author:** Raphael Shirley, Edited by Tianqing Zhang

**Last successfully run:** Dec 18, 2025

This example notebook uses synthetic data produced by PZFlow in combination with several predefined SED templates and filter definition files.

In [None]:
import lephare as lp
import matplotlib.pyplot as plt
import numpy as np
import tables_io
from rail.utils.path_utils import find_rail_file

from rail import interactive as ri

Here we load the previously created synthetic training and test data.

In [None]:
trainFile = find_rail_file("examples_data/testdata/output_table_conv_train.hdf5")
testFile = find_rail_file("examples_data/testdata/output_table_conv_test.hdf5")
traindata_io = tables_io.read(trainFile)
testdata_io = tables_io.read(testFile)

Retrieve all the required filter and template files

One could add or take out bandpasses by editing the configuration file. 

In [None]:
lephare_config_file = find_rail_file("examples_data/estimation_data/data/lsst.para")
lephare_config = lp.read_config(lephare_config_file)

lp.data_retrieval.get_auxiliary_data(keymap=lephare_config)

We use the inform stage to create the library of SEDs with various redshifts, extinction parameters, and reddening values. This typically takes ~3-4 minutes.

In [None]:
lephare_model = ri.estimation.algos.lephare.lephare_informer(
    input=traindata_io,
    name="lephare",
    nondetect_val=np.nan,
    hdf5_groupname="",
    # Use a sparse redshift grid to speed up the notebook
    zmin=0,
    zmax=3,
    nzbins=31,
    # Pass the parsed parameter file; this is important if you want to
    # modify your default setup with the parameter file
    lephare_config=lephare_config,
)
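
As a rough illustration of what "sparse" means here, the sketch below builds the grid implied by `zmin=0`, `zmax=3`, `nzbins=31`. It assumes that `nzbins` counts grid points including both endpoints, which is consistent with the `np.linspace(0, 3, 31)` grid used for plotting later in this notebook; how the informer interprets `nzbins` internally is not confirmed here.

```python
import numpy as np

# Values copied from the informer call above.
zmin, zmax, nzbins = 0.0, 3.0, 31

# Assumption: nzbins counts grid points, endpoints included.
zgrid = np.linspace(zmin, zmax, nzbins)
dz = zgrid[1] - zgrid[0]

print(f"{len(zgrid)} grid points, spacing dz = {dz:.2f}")
```

Under that assumption the grid step is 0.1 in redshift, so point estimates read directly off the grid cannot resolve structure finer than that.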

Now we take the synthetic test data and find the best fits from the library. This yields a PDF, zmode, and zmean for each test object. It takes ~2 minutes to run on 1500 inputs.

In [None]:
lephare_estimated = ri.estimation.algos.lephare.lephare_estimator(
    input=testdata_io,
    nondetect_val=np.nan,
    model=lephare_model["model"],
    hdf5_groupname="",
    aliases=dict(input="test_data", output="lephare_estim"),
)
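
To make the terms zmode and zmean concrete, here is a minimal sketch of how a mode and a mean can be computed from a PDF tabulated on a redshift grid. The Gaussian PDF and the variable names are hypothetical stand-ins; this is not necessarily how lephare or RAIL compute these quantities.

```python
import numpy as np

# Hypothetical gridded PDF: a Gaussian centred at z = 1.0 with width 0.2.
zgrid = np.linspace(0, 3, 301)
dz = zgrid[1] - zgrid[0]
pdf = np.exp(-0.5 * ((zgrid - 1.0) / 0.2) ** 2)

# Normalise so that the Riemann sum of the PDF over the grid is 1.
pdf /= pdf.sum() * dz

# Mode: grid point where the PDF peaks.
zmode = zgrid[np.argmax(pdf)]

# Mean: PDF-weighted average of z over the grid.
zmean = np.sum(zgrid * pdf) * dz

print(f"zmode = {zmode:.3f}, zmean = {zmean:.3f}")
```

For a symmetric, single-peaked PDF the two agree; for skewed or multi-modal PDFs they can differ substantially.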

An example lephare PDF and comparison to the true value

In [None]:
indx = 0
zgrid = np.linspace(0, 3, 31)
plt.plot(
    zgrid,
    np.squeeze(lephare_estimated["output"][indx].pdf(zgrid)),
    label="Estimated PDF",
)
plt.axvline(x=testdata_io["redshift"][indx], color="r", label="True redshift")
plt.legend()
plt.xlabel("z")
plt.show()

More example fits

In [None]:
indxs = [8, 16, 32, 64, 128, 256, 512, 1024]
zgrid = np.linspace(0, 3, 31)
fig, axs = plt.subplots(2, 4, figsize=(20, 6))
for i, indx in enumerate(indxs):
    ax = axs[i // 4, i % 4]
    ax.plot(
        zgrid,
        np.squeeze(lephare_estimated["output"][indx].pdf(zgrid)),
        label="Estimated PDF",
    )
    ax.axvline(x=testdata_io["redshift"][indx], color="r", label="True redshift")
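    ax.set_xlabel("z")
# Show the legend once, since the line labels are the same in every panel
axs[0, 0].legend()
plt.tight_layout()
plt.show()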

Histogram of the absolute difference between the lephare zmode estimate and the true redshift

In [None]:
estimate_diff_from_truth = np.abs(
    lephare_estimated["output"].ancil["zmode"] - testdata_io["redshift"]
)

plt.figure()
plt.hist(estimate_diff_from_truth, 100)
plt.xlabel("abs(z_estimated - z_true)")
plt.show()
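
The histogram can be condensed into a couple of summary numbers. The sketch below uses small hypothetical arrays in place of `zmode` and the true redshifts, and the outlier cut of `0.15 * (1 + z_true)` is an assumed convention chosen for illustration, not something defined by this notebook.

```python
import numpy as np

# Hypothetical stand-ins for the true redshifts and the zmode estimates.
z_true = np.array([0.2, 0.5, 1.0, 1.5, 2.0])
z_est = np.array([0.25, 0.45, 1.0, 2.4, 2.1])

abs_diff = np.abs(z_est - z_true)

# Typical size of the error, robust to a few catastrophic failures.
median_abs_diff = np.median(abs_diff)

# Assumed outlier definition: error larger than 0.15 * (1 + z_true).
is_outlier = abs_diff > 0.15 * (1 + z_true)
outlier_fraction = is_outlier.mean()

print(f"median |dz| = {median_abs_diff:.3f}")
print(f"outlier fraction = {outlier_fraction:.2f}")
```

With the real outputs, `z_est` would be replaced by the `zmode` array and `z_true` by `testdata_io["redshift"]`.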

Let's compare the estimated median redshift vs. the true redshift in a scatter plot. 

In [None]:
truez = testdata_io["redshift"]

plt.figure(figsize=(8, 8))
plt.scatter(truez, lephare_estimated["output"].median(), s=3)
plt.plot([-1, 3], [-1, 3], "k--")
plt.xlim([-0.1, 3])
plt.ylim([-0.1, 3])

plt.xlabel("true redshift", fontsize=12)
# The y-axis shows the median of each PDF, so label it accordingly
plt.ylabel("z median", fontsize=12)
plt.show()
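
For intuition about the median used in this scatter plot, the sketch below locates the redshift at which the cumulative distribution of a gridded PDF reaches 0.5. The PDF is a hypothetical Gaussian, and the cumulative-sum approach is only an illustration; the `median()` method of the output ensemble may be implemented differently.

```python
import numpy as np

# Hypothetical gridded PDF: a Gaussian centred at z = 1.2 with width 0.3.
zgrid = np.linspace(0, 3, 301)
dz = zgrid[1] - zgrid[0]
pdf = np.exp(-0.5 * ((zgrid - 1.2) / 0.3) ** 2)

# Cumulative distribution from a running sum, rescaled to end at 1.
cdf = np.cumsum(pdf) * dz
cdf /= cdf[-1]

# Median: redshift where the CDF crosses 0.5, by linear interpolation.
zmedian = np.interp(0.5, cdf, zgrid)

print(f"zmedian = {zmedian:.3f}")
```

A running sum of this kind is accurate only to about one grid step, so a finer grid gives a more precise median.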