# Demo: RAIL Evaluation 

Contact: _Julia Gschwend_ ([julia@linea.gov.br](mailto:julia@linea.gov.br)), _Sam Schmidt, Alex Malz, Eric Charles_


The purpose of this notebook is to demonstrate the use of the metrics scripts on the photo-z PDF catalogs produced by the PZ working group. The first implementation of the _evaluation_ module is based on a refactoring of the algorithms used in [Schmidt et al. 2020](https://arxiv.org/pdf/2001.03621.pdf), available in the GitHub repository [PZDC1paper](https://github.com/LSSTDESC/PZDC1paper). 


To run this code, you must install qp and have the notebook in the same directory as `metrics.py` and `sample.py`. You must also install some standard Python packages: numpy, scipy, matplotlib, and seaborn.

### Contents

* [Sample](#sample)
  - [Run FZBoost](#fzboost)
* [CDF-based metrics](#cdf_metrics)
  - [PIT](#pit)
  - [QQ plot](#qq)
* [Summary statistics of CDF-based metrics](#summary_stats)
  - [KS](#ks)
  - [CvM](#cvm)
  - [AD](#ad)
* [CDE loss](#cde_loss)
* [Summary](#summary)

In [None]:
#from IPython.display import Markdown
from sample import Sample
from metrics import *
import os
%matplotlib inline
%reload_ext autoreload
%autoreload 2

<a class="anchor" id="sample"></a>

# Sample  


To compute the photo-z metrics of a given test sample, we need to read the output of a photo-z code containing the galaxies' photo-z PDFs. Let's use the toy data available in `tests/data/` (**test_dc2_training_9816.hdf5** and **test_dc2_validation_9816.hdf5**) and the configuration file `examples/configs/FZBoost.yaml` to generate a small sample of photo-z PDFs with the **FZBoost** algorithm available in RAIL's _estimation_ module.

<a class="anchor" id="fzboost"></a>
### Run FZBoost

Go to the directory `<your_path>/RAIL/examples/` and run the command:

`python main.py configs/FZBoost.yaml`

The photo-z output file (the input for this notebook) will be written to: 

`<your_path>/RAIL/examples/results/FZBoost/test_FZBoost.hdf5`. 

In [None]:
my_path = '/Users/julia/github/RAIL'  # replace with the path to your local RAIL repository
pdfs_file = os.path.join(my_path, "examples/results/FZBoost/test_FZBoost.hdf5")
ztrue_file = os.path.join(my_path, "tests/data/test_dc2_validation_9816.hdf5")

Let's create a Sample object containing both the PDFs and true redshifts for each photo-z code.

In [None]:
sample = Sample(pdfs_file, ztrue_file, code="FlexZBoost", name="toy data")
sample

In [None]:
print(sample)

Let's plot the PDFs of 5 galaxies for illustration. The function `plot_pdfs` calls a _qp_ built-in plot function and returns the color codes of the galaxies whose indexes are included in the list `gals`. The galaxies in this example were chosen arbitrarily to cover the sample's redshift range. The dashed lines represent their respective true redshifts.

In [None]:
#gals = np.random.choice(len(ztrue), 5)
gals = [540, 2256, 12175, 17802, 19502]
colors = sample.plot_pdfs(gals)

<a class="anchor" id="cdf_metrics"></a>
# CDF-based metrics

The following metrics are computed from the photo-z PDFs. Let's create a Metrics object to access the basic metrics (e.g., the PIT outlier rate, defined below) and basic plots. It is the parent class of the individual metrics. Instantiating a Metrics object can take a while, depending on the sample size, because it prepares the quantities used to compute all metrics. 

In [None]:
%%time 
metrics = Metrics(sample)

<a class="anchor" id="pit"></a>

## PIT

The first metric available is the Probability Integral Transform (PIT), which is the Cumulative Distribution Function (CDF) of the photo-z PDF 

$$ \mathrm{CDF}(f, q)\ =\ \int_{-\infty}^{q}\ f(z)\ dz $$

evaluated at the galaxy's true redshift for every galaxy $i$ in the catalog.

$$ \mathrm{PIT}(p_{i}(z);\ z_{i})\ =\ \int_{-\infty}^{z^{true}_{i}}\ p_{i}(z)\ dz $$ 
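
For readers who want to see the arithmetic, here is a minimal NumPy sketch of this definition. It assumes each PDF is tabulated on a common, increasing redshift grid and integrates it with the trapezoidal rule; the function name `pit_from_grid` is hypothetical, and RAIL itself obtains the CDFs through _qp_ rather than with this code.

```python
import numpy as np


def pit_from_grid(z_grid, pdfs, z_true):
    """PIT values: each PDF's CDF evaluated at its galaxy's true redshift.

    z_grid : (n_z,) increasing redshift grid shared by all PDFs
    pdfs   : (n_gal, n_z) PDF values on z_grid (need not be normalized)
    z_true : (n_gal,) true redshifts
    """
    # cumulative trapezoidal integral of each PDF along the grid
    increments = 0.5 * (pdfs[:, 1:] + pdfs[:, :-1]) * np.diff(z_grid)
    cdfs = np.concatenate(
        [np.zeros((pdfs.shape[0], 1)), np.cumsum(increments, axis=1)], axis=1
    )
    cdfs /= cdfs[:, -1:]  # normalize so that every CDF ends at 1
    # interpolate each CDF at the true redshift (clamped to 0 or 1 outside the grid)
    return np.array([np.interp(zt, z_grid, cdf) for zt, cdf in zip(z_true, cdfs)])


# Gaussian PDFs centred on the true redshifts give PIT values close to 0.5
z_grid = np.linspace(0.0, 3.0, 3001)
mu = np.array([0.5, 1.0, 1.5])
pdfs = np.exp(-0.5 * ((z_grid[None, :] - mu[:, None]) / 0.1) ** 2)
print(pit_from_grid(z_grid, pdfs, mu))
```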

For instance, the PIT values for the 5 PDFs shown above are:

In [None]:
metrics.pit[gals]

#### PIT outlier rate

The PIT outlier rate is a global metric defined as the fraction of galaxies in the sample with extreme PIT values (PIT $<10^{-4}$ or PIT $>0.9999$). The lower and upper limits for flagging a PIT value as an outlier are optional parameters set when the Metrics object is instantiated. 
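
A sketch of the outlier-rate definition, assuming the PIT values are available as a NumPy array. The function `pit_outlier_rate` is illustrative and not part of RAIL; its default limits mirror the values quoted above.

```python
import numpy as np


def pit_outlier_rate(pit, pit_min=1e-4, pit_max=0.9999):
    """Fraction of galaxies with PIT < pit_min or PIT > pit_max (hypothetical helper)."""
    pit = np.asarray(pit, dtype=float)
    return np.mean((pit < pit_min) | (pit > pit_max))


print(pit_outlier_rate([0.00005, 0.3, 0.5, 0.99995]))  # two of four values are extreme
```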

In [None]:
pit_out_rate = PitOutRate(metrics).pit_out_rate
print(f"PIT outlier rate of this sample: {pit_out_rate:.4f}") 

Instantiating an individual metric object, such as `PitOutRate` above, also updates the parent `metrics` object with the metric value. The same holds for all individual metrics shown below.

In [None]:
metrics.pit_out_rate

<a class="anchor" id="qq"></a>
## PIT-QQ plot

The histogram of PIT values is a useful tool for a qualitative assessment of PDF quality. It shows whether the PDFs are:
* biased (tilted PIT histogram)
* under-dispersed (excess counts close to the boundaries 0 and 1)
* over-dispersed (lack of counts close to the boundaries 0 and 1)
* well-calibrated (flat histogram)
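
The toy simulation below (not part of RAIL; all names are illustrative) makes the dispersion cases concrete under a simple assumption: true redshifts scatter around the PDF centres with a Gaussian of width `sigma_true`, and each photo-z PDF is a Gaussian of some chosen width. PDFs narrower than the true scatter pile PIT values up near 0 and 1, while broader PDFs deplete those edges.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
n_gal = 100_000
sigma_true = 0.05                               # true scatter of z_true around each PDF centre
mu = rng.uniform(0.2, 2.0, n_gal)               # centres of the Gaussian photo-z PDFs
z_true = mu + rng.normal(0.0, sigma_true, n_gal)


def pit_gaussian(width):
    """PIT of Gaussian PDFs N(mu, width) evaluated at the true redshifts."""
    return norm.cdf(z_true, loc=mu, scale=width)


def edge_fraction(pit, edge=0.05):
    """Fraction of PIT values within `edge` of the boundaries 0 and 1."""
    return np.mean((pit < edge) | (pit > 1.0 - edge))


for label, width in [("under-dispersed", 0.5 * sigma_true),
                     ("well-calibrated", sigma_true),
                     ("over-dispersed", 2.0 * sigma_true)]:
    print(f"{label:16s} edge fraction = {edge_fraction(pit_gaussian(width)):.3f}")
```

For a flat histogram the edge fraction is about 0.10 (two bins of width 0.05); narrower PDFs push it well above that, broader PDFs well below.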

Following the standards of the DC1 paper, the PIT histogram is accompanied by the quantile-quantile (QQ) plot, which can be used to qualitatively compare the PIT distribution obtained from the PDFs against the ideal case (a uniform distribution). The closer the QQ plot is to the diagonal, the better calibrated the PDFs are. 
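
A minimal sketch of the quantities behind a QQ plot, assuming the empirical quantiles are taken with `np.quantile` at `n_quant + 1` evenly spaced probability levels. `pit_qq` is a hypothetical helper, and RAIL's own quantile conventions may differ.

```python
import numpy as np


def pit_qq(pit, n_quant=100):
    """Theoretical and empirical quantiles for a PIT QQ plot (hypothetical helper).

    For U(0,1) the theoretical quantile at probability level p is p itself,
    so a well-calibrated sample puts (q_theory, q_data) on the diagonal.
    """
    q_theory = np.linspace(0.0, 1.0, n_quant + 1)
    q_data = np.quantile(np.asarray(pit, dtype=float), q_theory)
    return q_theory, q_data


rng = np.random.default_rng(0)
q_theory, q_data = pit_qq(rng.uniform(size=50_000), n_quant=20)
print(np.max(np.abs(q_data - q_theory)))  # small for a uniform PIT sample
```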

In [None]:
metrics.plot_pit_qq() #savefig=True) 

By default, the function `plot_pit_qq` displays the PIT histogram and the QQ plot together. The title and label are taken from the sample's attributes for the sample name and the photo-z code (unless they are passed as optional parameters). It is also possible to show one plot at a time. 

In [None]:
metrics.plot_pit_qq(show_pit=False, show_pit_out_rate=False, title="QQ only")

It is possible to set the number of bins in the PIT histogram. By default, it uses the number of quantiles, which is an attribute of the metrics object.

In [None]:
metrics.n_quant

In [None]:
metrics.plot_pit_qq(show_qq=False, title="PIT only", bins=30)

The black horizontal line represents the ideal case where the PIT histogram would behave as a uniform distribution U(0,1). 

Let's explore a different number of quantiles. By default, a Metrics object is instantiated with `n_quant=100` (percentiles). To change the number of quantiles, we create a new Metrics object.

In [None]:
metrics_20 = Metrics(sample, n_quant=20)

In [None]:
metrics_20.plot_pit_qq(title="N$_{quantiles}$=20")

<a class="anchor" id="summary_stats"></a>
# Summary statistics of CDF-based metrics

To globally evaluate the quality of the PDF estimates, `rail.evaluation` provides a set of metrics that compare the empirical distribution of PIT values with the reference uniform distribution, U(0,1). The individual metrics shown below are implemented as independent classes, most of which receive the metrics object as input and access the PIT array from it. These metrics can be computed separately, or all at once by one of the functions of the generic Metrics class (`markdown_metrics_table()` or `print_metrics_table()`). 

<a class="anchor" id="ks"></a>
### Kolmogorov-Smirnov  

Let's start with the Kolmogorov-Smirnov (KS) statistic, which is the maximum difference between the empirical and the expected cumulative distributions of PIT values:

$$
\mathrm{KS} \equiv \max_{PIT} \Big( \left| \ \mathrm{CDF} \small[ \hat{f}, z \small] - \mathrm{CDF} \small[ \tilde{f}, z \small] \  \right| \Big)
$$

where $\hat{f}$ is the PIT distribution and $\tilde{f}$ is U(0,1). Therefore, the smaller the KS value, the closer the PIT distribution is to uniform. The CDFs are calculated with the CDF method of `qp.Ensemble`. 
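
The KS statistic can be checked by hand. The sketch below (the helper `ks_uniform` is hypothetical) evaluates the empirical CDF on both sides of each of its jumps and compares the result with `scipy.stats.kstest`; it does not assume anything about how RAIL computes the statistic internally.

```python
import numpy as np
from scipy import stats


def ks_uniform(pit):
    """KS statistic of PIT values against U(0,1), computed from sorted values."""
    u = np.sort(np.asarray(pit, dtype=float))
    n = u.size
    i = np.arange(1, n + 1)
    # the empirical CDF jumps from (i-1)/n to i/n at u_i; check both sides of each jump
    d_plus = np.max(i / n - u)
    d_minus = np.max(u - (i - 1) / n)
    return max(d_plus, d_minus)


rng = np.random.default_rng(0)
pit = rng.uniform(size=5000)
print(ks_uniform(pit), stats.kstest(pit, "uniform").statistic)  # the two values agree
```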

In [None]:
ks = KS(metrics)

In [None]:
ks.stat

In [None]:
ks.pvalue

Visual interpretation of the KS statistic:

In [None]:
ks.plot()

<a class="anchor" id="cvm"></a>
### Cramer-von Mises

Similarly, let's calculate the Cramer-von Mises (CvM) statistic, a variant of the KS statistic defined as the mean-square difference between the CDF of an empirical distribution and that of the reference (true) distribution:

$$ \mathrm{CvM}^2 \equiv \int_{-\infty}^{\infty} \Big( \mathrm{CDF} \small[ \hat{f}, z \small] \ - \ \mathrm{CDF} \small[ \tilde{f}, z \small] \Big)^{2} \mathrm{dCDF}(\tilde{f}, z) $$ 

Here it is applied to the distribution of PIT values, which should be uniform if the PDFs are perfect.
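
A sketch of the CvM statistic using the standard closed form for sorted values, compared against `scipy.stats.cramervonmises` (available in SciPy 1.6 and later). Note that this closed form, like SciPy's statistic, equals $N$ times the integral written above; `cvm_uniform` is a hypothetical helper, and which normalization RAIL reports is not assumed here.

```python
import numpy as np
from scipy import stats


def cvm_uniform(pit):
    """Cramer-von Mises statistic N * integral (F_N - F)^2 dF with F = U(0,1)."""
    u = np.sort(np.asarray(pit, dtype=float))
    n = u.size
    i = np.arange(1, n + 1)
    # standard closed form for sorted values u_1 <= ... <= u_N
    return 1.0 / (12 * n) + np.sum(((2 * i - 1) / (2 * n) - u) ** 2)


rng = np.random.default_rng(1)
pit = rng.uniform(size=2000)
res = stats.cramervonmises(pit, "uniform")  # requires SciPy >= 1.6
print(cvm_uniform(pit), res.statistic, res.pvalue)
# the integral exactly as written above (without the factor N)
print(cvm_uniform(pit) / pit.size)
```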

In [None]:
cvm = CvM(metrics)

In [None]:
cvm.stat

In [None]:
cvm.pvalue

<a class="anchor" id="ad"></a>
### Anderson-Darling 

Another variation of the KS statistic is the Anderson-Darling (AD) test, a weighted mean-squared difference featuring enhanced sensitivity to discrepancies in the tails of the distribution. 

$$ \mathrm{AD}^2 \equiv N_{tot} \int_{-\infty}^{\infty} \frac{\big( \mathrm{CDF} \small[ \hat{f}, z \small] \ - \ \mathrm{CDF} \small[ \tilde{f}, z \small] \big)^{2}}{\mathrm{CDF} \small[ \tilde{f}, z \small] \big( 1 \ - \ \mathrm{CDF} \small[ \tilde{f}, z \small] \big)}\mathrm{dCDF}(\tilde{f}, z) $$ 

In [None]:
ad = AD(metrics)

In [None]:
ad.stat

Catastrophic outliers can be removed before calculating the integral to avoid numerical instability. For instance, Schmidt et al. (2020) computed the Anderson-Darling statistic within the interval (0.01, 0.99).
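
A sketch of the AD statistic via the standard closed form for sorted values, with an optional cut in the spirit of `ad_pit_min`/`ad_pit_max`. How RAIL treats the values after the cut is not shown in this notebook, so the sketch makes an explicit assumption: the retained PIT values are rescaled to (0, 1), which keeps U(0,1) as the reference because a uniform variable restricted to an interval is uniform on that interval. The helper `ad_uniform` is hypothetical.

```python
import numpy as np


def ad_uniform(pit, pit_min=None, pit_max=None):
    """Anderson-Darling A^2 of PIT values against U(0,1) (hypothetical helper).

    ASSUMPTION: with a cut, values inside (pit_min, pit_max) are kept and
    linearly rescaled to (0, 1) before computing the statistic.
    """
    u = np.asarray(pit, dtype=float)
    if pit_min is not None or pit_max is not None:
        lo = 0.0 if pit_min is None else pit_min
        hi = 1.0 if pit_max is None else pit_max
        u = u[(u > lo) & (u < hi)]
        u = (u - lo) / (hi - lo)
    u = np.sort(u)
    n = u.size
    i = np.arange(1, n + 1)
    # closed form: A^2 = -N - (1/N) sum_i (2i - 1) [ln u_i + ln(1 - u_{N+1-i})]
    return -n - np.sum((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1]))) / n


grid = (np.arange(1, 1001) - 0.5) / 1000  # an evenly spread, near-ideal PIT sample
print(ad_uniform(grid), ad_uniform(grid ** 2))
print(ad_uniform([0.0, 0.5, 1.0], pit_min=0.01, pit_max=0.99))
```

Without the cut, a PIT value of exactly 0 or 1 makes one of the logarithms diverge, which illustrates why trimming the extremes helps numerically.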

In [None]:
ad_cut = AD(metrics, ad_pit_min=0.01, ad_pit_max=0.99)

In [None]:
ad_cut.stat

<a class="anchor" id="cde_loss"></a>
# CDE Loss



In the absence of true photo-z posteriors, the metric used to evaluate individual PDFs is the **Conditional Density Estimate (CDE) Loss**, a metric analogous to the root-mean-square error:

$$ L(f, \hat{f}) \equiv  \int \int {\big(f(z | x) - \hat{f}(z | x) \big)}^{2} dzdP(x), $$ 

where $f(z | x)$ is the true photo-z PDF and $\hat{f}(z | x)$ is the estimated PDF given the photometry $x$. Since $f(z | x)$ is unknown, we estimate the **CDE Loss** as described in [Izbicki & Lee, 2017 (arXiv:1704.08095)](https://arxiv.org/abs/1704.08095):

$$ \mathrm{CDE} = \mathbb{E}_{X}\Big( \int \hat{f}(z | X)^{2} \, dz \Big) - 2\,\mathbb{E}_{X, Z}\big( \hat{f}(Z | X) \big) + K_{f}, $$


where the first term is the expectation, with respect to the marginal distribution of the covariates $X$, of the integral of the squared photo-z posterior; the second term is the expectation of the estimated PDF with respect to the joint distribution of the observables $X$ and the redshifts $Z$ (in practice, evaluated at the centroids of the PDF bins); and the third term is a constant that depends on the true conditional densities $f(z | x)$. 
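
A minimal NumPy sketch of this estimator, dropping the constant $K_{f}$ and replacing both expectations by sample means over the test galaxies. It assumes normalized PDFs on a common grid and evaluates each PDF at its true redshift by linear interpolation instead of at bin centroids; the function `cde_loss` is hypothetical and not RAIL's implementation.

```python
import numpy as np


def cde_loss(z_grid, pdfs, z_true):
    """Empirical CDE loss without the constant K_f (hypothetical helper).

    z_grid : (n_z,) increasing redshift grid
    pdfs   : (n_gal, n_z) normalized PDF values on z_grid
    z_true : (n_gal,) true redshifts
    """
    dz = np.diff(z_grid)
    # first term: sample mean of the integral of the squared PDF (trapezoidal rule)
    sq_int = np.sum(0.5 * (pdfs[:, 1:] ** 2 + pdfs[:, :-1] ** 2) * dz, axis=1)
    term1 = np.mean(sq_int)
    # second term: sample mean of each PDF evaluated at its own true redshift
    f_true = np.array([np.interp(zt, z_grid, p) for zt, p in zip(z_true, pdfs)])
    term2 = np.mean(f_true)
    return term1 - 2.0 * term2


# Gaussian PDFs of width 0.1 centred exactly on the true redshifts
z_grid = np.linspace(0.0, 3.0, 3001)
mu = np.array([0.5, 1.0, 1.5])
s = 0.1
pdfs = np.exp(-0.5 * ((z_grid[None, :] - mu[:, None]) / s) ** 2) / (s * np.sqrt(2 * np.pi))
print(cde_loss(z_grid, pdfs, mu))
```

Because $K_{f}$ does not depend on the estimate, lower values of this quantity indicate better PDFs when comparing codes on the same sample.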

In [None]:
cde_loss = CDE(metrics).cde_loss
print(f"CDE loss of this sample: {cde_loss:.2f}") 

<a class="anchor" id="summary"></a>
# Summary

All metrics can be calculated at once and presented in a table by the main `metrics` object. 

In [None]:
metrics.markdown_metrics_table()