# Demo: RAIL Evaluation 

_Sam Schmidt, Alex Malz, Julia Gschwend_ ([julia@linea.gov.br](mailto:julia@linea.gov.br))

This notebook demonstrates the metrics scripts that will be applied to the photo-$z$ PDF catalogs produced by the PZ working group. The first implementation of the _evaluation_ module is a refactoring of the algorithms used in [Schmidt et al. 2020](https://arxiv.org/pdf/2001.03621.pdf), available in the GitHub repository [PZDC1paper](https://github.com/LSSTDESC/PZDC1paper). 

To run this code, you must install `qp` and keep this notebook in the same directory as `sample.py` and `metrics.py`, since both modules are imported by the notebook. You must also install some run-of-the-mill Python packages: matplotlib, numpy, scipy, and skgof.

### Contents

* [Sample](#sample)
 - [Run FZBoost](#fzboost)
 - [Traditional validation plots](#old_valid)  
* [Metrics](#metrics)
 - [PIT](#pit) 
 - [QQ plot](#qq) 
 - [CDE loss](#cde) 
 - [KS](#ks) 
 - [CvM](#cvm) 
 - [AD](#ad) 
* [Summary](#summary)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Markdown

#import warnings
#warnings.filterwarnings('ignore')

from sample import Sample
from metrics import Metrics

%matplotlib inline
%load_ext autoreload
%autoreload 2

<a class="anchor" id="sample"></a>

# Sample  


To compute the photo-z metrics of a given test sample, we first need the output of a photo-z code containing the galaxies' photo-z PDFs. Let's use the toy data available in `tests/data/` (**test_dc2_training_9816.hdf5** and **test_dc2_validation_9816.hdf5**) and the configuration file available in `examples/configs/FZBoost.yaml` to generate a small sample of photo-z PDFs using the **FZBoost** algorithm available in RAIL's _estimation_ module.

<a class="anchor" id="fzboost"></a>
### Run FZBoost

Go to the directory `<your_path>/RAIL/examples/` and run the command:

`python main.py configs/FZBoost.yaml`

The photo-z output file (the input for this notebook) will be written to: 

`<your_path>/RAIL/examples/results/FZBoost/test_FZBoost.hdf5`. 

<font color='red'>The new RAIL's version will produce output of the codes as qp files rather than the old format hdf5 files (Sam's message on Slack about RAIL's issue#33). TO DO: update the read() function of class Sample </font>

In [None]:
my_path = '/Users/julia/github/RAIL' # replace with the path to your RAIL root directory
pdfs_file = my_path + '/examples/results/FZBoost/test_FZBoost.hdf5'
ztrue_file = my_path + '/tests/data/test_dc2_validation_9816.hdf5'

Let's create a Sample object containing both the PDFs and true redshifts for each photo-z code.

In [None]:
sample = Sample(pdfs_file, ztrue_file, code="FZBoost", name="toy data")
sample

In [None]:
print(sample)

Below are the PDFs of 5 galaxies for illustration. The function `plot_pdfs` calls a _qp_ built-in plot function and returns the color codes of the galaxies whose indexes are included in the list `gals`. The galaxies in the example were chosen arbitrarily to cover the sample's redshift space. 

In [None]:
#gals = np.random.choice(len(ztrue), 5)
gals = [540, 2256, 12175, 17802, 19502]
colors = sample.plot_pdfs(gals)

<a class="anchor" id="old_valid"></a>
### Validation plots

These are the traditional validation plots based on point estimates. The point colors (optional) follow the same color code as the PDFs above. 

<font color='red'>TO DO: update the plots below to look like Figure 4 from CHIPPR's paper ([Malz & Hogg 2020](https://arxiv.org/pdf/2007.12178.pdf)). </font>


In [None]:
sample.plot_old_valid(gals=gals, colors=colors)

<a class="anchor" id="metrics"></a>
# Metrics

The following metrics are computed from the photo-z PDFs. Let's create a Metrics object to access the basic metrics (e.g., the PIT outlier rate, defined below) and the basic plots. It is the parent class of the other, more specific metrics. 

Instantiating a Metrics object can take a while, depending on the sample size. 

In [None]:
metrics = Metrics(sample)

<a class="anchor" id="pit"></a>

## PIT

The first metric we calculate is the Probability Integral Transform (PIT), which is the Cumulative Distribution Function (CDF) 

\begin{equation*}
\mathrm{CDF}(f, q)\ =\ \int_{-\infty}^{q}\ f(z)\ dz,
\end{equation*}


evaluated at the galaxy's true redshift, $z_{i} \equiv z^{true}_{i}$, for every galaxy $i$ in the catalog:

\begin{equation*}
\mathrm{PIT}(p_{i}(z);\ z_{i})\ =\ \int_{-\infty}^{z^{true}_{i}}\ p_{i}(z)\ dz.
\end{equation*}


For instance, the PIT values for the 5 PDFs shown above are:

In [None]:
metrics.pit[gals]
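As an illustration of the definition above, here is a minimal sketch of how a PIT value could be computed for a single PDF tabulated on a redshift grid. It is not the implementation used by the `Metrics` class (which relies on _qp_); the helper `pit_from_grid` and the toy Gaussian PDF are assumptions made only for this example.

```python
import numpy as np

def pit_from_grid(z_grid, pdf, z_true):
    """PIT of one gridded PDF: its CDF evaluated at the true redshift.

    Hypothetical helper for illustration, not part of RAIL or qp.
    """
    # Cumulative trapezoidal integral of the PDF along the grid.
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(z_grid)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]  # renormalize so that the CDF ends at exactly 1
    # Linear interpolation of the CDF at z_true (clamped outside the grid).
    return float(np.interp(z_true, z_grid, cdf))

# Toy example: an (unnormalized) Gaussian PDF centred at z = 0.5, sigma = 0.1.
z_grid = np.linspace(0.0, 3.0, 301)
pdf = np.exp(-0.5 * ((z_grid - 0.5) / 0.1) ** 2)

pit_at_mean = pit_from_grid(z_grid, pdf, 0.5)  # true z at the PDF centre
pit_low = pit_from_grid(z_grid, pdf, 0.3)      # true z two sigma below the centre
print(pit_at_mean, pit_low)
```

A true redshift at the centre of a symmetric PDF gives a PIT close to 0.5, while a true redshift in the lower tail gives a PIT close to 0.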

#### PIT outlier rate

The PIT outlier rate is a global metric defined as the fraction of galaxies in the sample with extreme PIT values (PIT $<10^{-4}$ or PIT $>0.9999$). The lower and upper limits for considering a PIT as outlier are optional parameters set at the Metrics instantiation. 

In [None]:
print(f"PIT outlier rate of this sample: {metrics.pit_out_rate:.4f}") 
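The definition of the outlier rate can be written in a few lines of NumPy. This is only a sketch: the function name and the keyword names `pit_min` and `pit_max` are hypothetical and may differ from the optional parameters of the `Metrics` class.

```python
import numpy as np

def pit_outlier_rate(pit, pit_min=1e-4, pit_max=0.9999):
    """Fraction of PIT values below pit_min or above pit_max.

    Hypothetical helper; the threshold names are assumptions.
    """
    pit = np.asarray(pit)
    return float(np.mean((pit < pit_min) | (pit > pit_max)))

# Toy PIT array: four of the ten values fall outside [1e-4, 0.9999].
toy_pit = np.array([0.0, 0.00005, 0.2, 0.5, 0.8, 0.99995, 1.0, 0.3, 0.6, 0.9])
rate = pit_outlier_rate(toy_pit)
print(f"toy PIT outlier rate: {rate:.2f}")
```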

<a class="anchor" id="qq"></a>
## PIT-QQ plot

The histogram of PIT values is a useful tool for a qualitative assessment of PDF quality. It shows whether the PDFs are:
* biased (tilted PIT histogram)
* under-dispersed (excess counts close to the boundaries 0 and 1)
* over-dispersed (lack of counts close to the boundaries 0 and 1)
* well-calibrated (flat histogram)
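
The four cases in the list above can be reproduced with a toy simulation. The sketch below assumes, purely for illustration, that the true values follow a standard normal distribution and that every object is assigned the same Gaussian PDF with a chosen offset and width; none of this comes from the RAIL data.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
z_true = rng.normal(loc=0.0, scale=1.0, size=20000)  # toy "true" values

# Hypothetical predicted PDFs N(loc, scale), identical for every object.
cases = {
    "calibrated": (0.0, 1.0),
    "biased": (0.5, 1.0),
    "under-dispersed": (0.0, 0.5),  # PDFs too narrow
    "over-dispersed": (0.0, 2.0),   # PDFs too broad
}

# PIT = predicted CDF evaluated at the true value.
pit = {name: norm.cdf(z_true, loc=loc, scale=scale)
       for name, (loc, scale) in cases.items()}

# Fraction of PIT values in the outer 20% of the unit interval
# (about 0.2 for a flat histogram).
edge_frac = {name: float(np.mean((p < 0.1) | (p > 0.9)))
             for name, p in pit.items()}

for name in cases:
    print(f"{name:16s} mean PIT = {pit[name].mean():.3f}  "
          f"edge fraction = {edge_frac[name]:.3f}")
```

Under-dispersed PDFs pile up PIT values near 0 and 1, over-dispersed PDFs deplete them, and a bias shifts the mean PIT away from 0.5.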

Following the conventions of the DC1 paper, the PIT histogram is accompanied by the quantile-quantile (QQ) plot, which qualitatively compares the PIT distribution obtained from the PDFs against the ideal case (a uniform distribution). The closer the QQ curve is to the diagonal, the better the calibration of the PDFs. 

In [None]:
metrics.plot_pit_qq(savefig=True)
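
For reference, the QQ curve itself is just the set of empirical PIT quantiles plotted against the quantiles of U(0,1). The sketch below is an assumption about how such a curve could be built with NumPy, not the code behind `plot_pit_qq`; the helper `qq_curve` is hypothetical.

```python
import numpy as np

def qq_curve(pit, n_quant=100):
    """Return (theoretical, empirical) quantiles of a PIT sample.

    Hypothetical helper: the theoretical quantiles of U(0,1) are the
    probabilities themselves, so the ideal curve is the diagonal.
    """
    q_theory = np.linspace(0.0, 1.0, n_quant)
    q_data = np.quantile(pit, q_theory)
    return q_theory, q_data

# A perfectly calibrated toy sample: PIT values drawn from U(0,1).
rng = np.random.default_rng(0)
pit_uniform = rng.uniform(size=10000)

q_theory, q_data = qq_curve(pit_uniform)
max_dev = float(np.max(np.abs(q_data - q_theory)))
print(f"largest departure from the diagonal: {max_dev:.4f}")
```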

By default, the function `plot_pit_qq` displays the PIT histogram and the QQ plot together. Unless passed as optional parameters, the title and label are taken from the sample's attributes for the sample name and the photo-z code. It is also possible to display one plot at a time. 

In [None]:
metrics.plot_pit_qq(show_pit=False, show_pit_out_rate=False, title="QQ only")

It is possible to set the number of bins in the PIT histogram. By default, the number of bins equals the number of quantiles, which is an attribute of the Metrics object.

In [None]:
metrics.n_quant

In [None]:
metrics.plot_pit_qq(show_qq=False, title="PIT only", bins=30)

The black horizontal line represents the ideal case where the PIT histogram would behave as a uniform distribution U(0,1). 

Let's explore a different number of quantiles. By default, a Metrics object is instantiated with `n_quant=100`, i.e., percentiles. 

In [None]:
metrics_20 = Metrics(sample, n_quant=20)

In [None]:
metrics_20.plot_pit_qq(title="N$_{quants}$=20")

<a class="anchor" id="cde"></a>
## CDE Loss

Next we can calculate the CDE loss described in Izbicki & Lee 2017 (arXiv:1704.08095)

$$ \int \int \left(p(z \mid x) - \hat{p}(z \mid x)\right)^{2} dz \, dP(x) $$

which extends L2 density estimation loss to conditional density estimation.  We can estimate this quantity (up to an unknown additive constant which depends on the true conditional densities) from data as

$$ \frac{1}{n} \sum_{i=1}^{n} \int \hat{p}^{2}(z \mid x_{i}) dz - \frac{2}{n} \sum_{i=1}^{n} \hat{p}(z_{i} \mid x_{i}) $$


In [None]:
print(f"CDE loss of this sample: {metrics.cde_loss:.2f}") 
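
The empirical estimate above can be sketched for PDFs tabulated on a common redshift grid. This is an illustration under stated assumptions, not the `Metrics` implementation: the helper names, the grid, and the toy Gaussian PDFs are all hypothetical.

```python
import numpy as np

def cde_loss_from_grid(z_grid, pdfs, z_true):
    """Empirical CDE loss for PDFs of shape (n_gal, n_grid).

    Hypothetical helper for illustration only.
    """
    # First term: mean over galaxies of the integral of p^2 (trapezoid rule).
    sq = pdfs ** 2
    term1 = np.mean(np.sum(0.5 * (sq[:, 1:] + sq[:, :-1]) * np.diff(z_grid), axis=1))
    # Second term: twice the mean of each PDF evaluated at its true redshift.
    p_at_true = np.array([np.interp(zt, z_grid, p) for zt, p in zip(z_true, pdfs)])
    term2 = 2.0 * np.mean(p_at_true)
    return float(term1 - term2)

# Toy data: three Gaussian PDFs centred exactly on the true redshifts.
z_grid = np.linspace(0.0, 2.0, 2001)
centres = np.array([0.5, 1.0, 1.5])
z_true = centres.copy()

def gaussian_pdfs(sigma):
    """Normalized Gaussians of width sigma, one row per centre."""
    return (np.exp(-0.5 * ((z_grid - centres[:, None]) / sigma) ** 2)
            / (sigma * np.sqrt(2.0 * np.pi)))

loss_sharp = cde_loss_from_grid(z_grid, gaussian_pdfs(0.1), z_true)
loss_broad = cde_loss_from_grid(z_grid, gaussian_pdfs(0.3), z_true)
print(f"sigma=0.1: {loss_sharp:.3f}   sigma=0.3: {loss_broad:.3f}")
```

Lower values are better: in this toy setup the sharper PDFs, which put more probability density at the true redshift, obtain the lower loss.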

<a class="anchor" id="ks"></a>
## Kolmogorov-Smirnov  


Next, we calculate the Kolmogorov-Smirnov (KS) test statistic,
\begin{equation*}
\mathrm{KS}(\{p_{i}(z)\}_{N};\ \{z_{i}\}_{N})\ =\ \max_{PIT}\left[ \left| CDF(\{PIT(p_{i}(z);\ z_{i})\}_{N}) - CDF(\{z_{i}\}_{N}) \right| \right],
\end{equation*}
on the distribution of PIT values, which should be uniform if the PDFs are perfect.

<font color='red'>WARNING: error when importing skgof -> No module named 'scipy._lib.six'.    </font>

From [issue#4 of skgof repository](https://github.com/wrwrwr/scikit-gof/issues/4): 
    
"Simply importing skgof gives this error.

Currently ecdfgof.py uses scipy._lib.six module which is not present in newer versions of scipy
(probably because they don't support python 2 anymore).
It should be updated to either use six directly or drop support for python 2 and use str instead of six.string_types." 
    


 <font color='red'>   Temporary solution: I changed the source code of skgof in my computer to use **six** instead of scipy._lib.six. If you downgrade scipy version, you break qp.  </font> 



In [None]:
ks_stat, ks_pval = metrics.KS()
print(ks_stat)
print(ks_pval)
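
If `skgof` cannot be imported, the same kind of test can be cross-checked with SciPy alone. The sketch below uses `scipy.stats.kstest` against the uniform distribution on toy PIT arrays; it is an alternative offered for illustration and is not what `metrics.KS()` calls internally.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Toy PIT samples (assumptions for illustration):
pit_good = rng.uniform(size=5000)        # well-calibrated PDFs
pit_bad = rng.beta(0.5, 0.5, size=5000)  # U-shaped PIT, under-dispersed PDFs

# One-sample KS test against U(0,1).
ks_good = stats.kstest(pit_good, "uniform")
ks_bad = stats.kstest(pit_bad, "uniform")

print(f"good: stat={ks_good.statistic:.4f}, p-value={ks_good.pvalue:.3g}")
print(f"bad:  stat={ks_bad.statistic:.4f}, p-value={ks_bad.pvalue:.3g}")
```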

<a class="anchor" id="cvm"></a>
## Cramer-von Mises

Similarly, we calculate the Cramer-von Mises (CvM) test statistic,
\begin{equation*}
\mathrm{CvM}(\{p_{i}(z)\}_{N};\ \{z_{i}\}_{N})\ =\ \int_{-\infty}^{\infty}\ \left(CDF(\{PIT(p_{i}(z);\ z_{i})\}_{N})\ -\ CDF(\{z_{i}\}_{N})\right)^{2}\ \mathrm{d}CDF(\{z_{i}\}_{N}),
\end{equation*}
on the distribution of PIT values, which should be uniform if the PDFs are perfect.

In [None]:
cvm_stat, cvm_pval = metrics.CvM() 
print(cvm_stat)
print(cvm_pval)
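
As a cross-check that does not depend on `skgof`, recent SciPy releases (1.6 or later, to the best of our knowledge) provide `scipy.stats.cramervonmises`. The toy PIT arrays below are assumptions for illustration, and this is not the code behind `metrics.CvM()`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Toy PIT samples (assumptions for illustration):
pit_good = rng.uniform(size=5000)        # well-calibrated PDFs
pit_bad = rng.beta(0.5, 0.5, size=5000)  # U-shaped PIT, under-dispersed PDFs

# One-sample Cramer-von Mises test against U(0,1).
cvm_good = stats.cramervonmises(pit_good, "uniform")
cvm_bad = stats.cramervonmises(pit_bad, "uniform")

print(f"good: stat={cvm_good.statistic:.4f}, p-value={cvm_good.pvalue:.3g}")
print(f"bad:  stat={cvm_bad.statistic:.4f}, p-value={cvm_bad.pvalue:.3g}")
```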

<a class="anchor" id="ad"></a>
## Anderson-Darling 

And the Anderson-Darling (AD) test statistic,
\begin{equation*}
\mathrm{AD}(\{p_{i}(z)\}_{N};\ \{z_{i}\}_{N})\ =\ \int_{-\infty}^{\infty}\frac{\left(CDF(\{PIT(p_{i}(z);\ z_{i})\}_{N})\ -\ CDF(\{z_{i}\}_{N})\right)^{2}}{CDF(\{z_{i}\}_{N})\ \left(1\ -\ CDF(\{z_{i}\}_{N})\right)}\ \mathrm{d}CDF(\{z_{i}\}_{N}),
\end{equation*}
on the distribution of PIT values, which should be uniform if the PDFs are perfect.  However, for this test, we cut the ends of the distribution, which represent catastrophic outliers.  


In [None]:
ad_stat, ad_critical_values, ad_sign_level = metrics.AD()
print(ad_stat)
print(ad_critical_values)
print(ad_sign_level)
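
To make the trimming step concrete, here is a hand-written sketch of the AD statistic against a uniform distribution, using the standard computational formula $A^{2} = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln u_{(i)} + \ln(1-u_{(n+1-i)})\right]$ on the sorted values $u_{(i)}$. The helper name and the cut values 0.01 and 0.99 are illustrative assumptions, and the function does not reproduce the critical values returned by `metrics.AD()`.

```python
import numpy as np

def ad_uniform(pit, pit_min=0.0, pit_max=1.0):
    """AD statistic of the PIT values inside (pit_min, pit_max) vs. uniform.

    Hypothetical helper; the trimming limits are assumptions.
    """
    pit = np.asarray(pit)
    # Cut the ends of the distribution (strict limits avoid log(0)).
    kept = np.sort(pit[(pit > pit_min) & (pit < pit_max)])
    # Rescale to (0, 1) so the trimmed sample is tested against U(0,1).
    u = (kept - pit_min) / (pit_max - pit_min)
    n = u.size
    i = np.arange(1, n + 1)
    # u[::-1] holds u_(n+1-i) at position i-1.
    return float(-n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1]))))

rng = np.random.default_rng(42)
pit_good = rng.uniform(size=5000)        # toy well-calibrated sample
pit_bad = rng.beta(0.5, 0.5, size=5000)  # toy U-shaped sample

ad_good = ad_uniform(pit_good, pit_min=0.01, pit_max=0.99)
ad_bad = ad_uniform(pit_bad, pit_min=0.01, pit_max=0.99)
print(f"good: {ad_good:.3f}   bad: {ad_bad:.3f}")
```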

<a class="anchor" id="summary"></a>
# Summary

In [None]:
metrics_table = metrics.all() 
Markdown(metrics_table)

<font color='red'> TO DO: IMPLEMENT BOOTSTRAP ERRORS. </font>

<font color='red'> TO DO: IMPLEMENT UNIT TESTS. </font>

<font color='red'> QUESTION: METRICS AS FUNCTIONS OR SUBCLASSES? </font>
