# Demo: RAIL Evaluation 

_Sam Schmidt, Alex Malz, Julia Gschwend_ ([julia@linea.gov.br](mailto:julia@linea.gov.br))

This notebook demonstrates the metrics scripts intended for the photo-$z$ PDF catalogs produced by the PZ working group. The first implementation of the _evaluation_ module is a refactoring of the algorithms used in [Schmidt et al. 2020](https://arxiv.org/pdf/2001.03621.pdf), available in the GitHub repository [PZDC1paper](https://github.com/LSSTDESC/PZDC1paper).



In [None]:
import qp
import h5py
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')


#import os
#import yaml
#from astropy.table import Table, QTable

from rail.evaluation.metrics import Data

%matplotlib inline
%load_ext autoreload
%autoreload 2

# 1. Data

Computing the photo-z metrics of a test sample requires the output of a photo-z code, containing the galaxies' photo-z PDFs. Let's generate a small sample of photo-z PDFs by running the FlexZBoost algorithm available in RAIL's _estimation_ module, using the toy data available in `tests/data/` (**test_dc2_training_9816.hdf5** and **test_dc2_validation_9816.hdf5**) and the config file available in `examples/configs/`.

First, a quick characterization of the validation sample... 

In [None]:
my_path = '/Users/julia/github/' # replace with the path to RAIL's parent directory (keep the trailing slash)
valid_file = my_path + 'RAIL/tests/data/test_dc2_validation_9816.hdf5'
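
Because `my_path` is hard-coded, the cell above has to be edited on every machine. As a minimal sketch of one alternative, the helper below builds the same paths with `pathlib`. The function name `rail_file` and the environment variable `RAIL_PARENT_DIR` are hypothetical, introduced only for illustration; they are not part of RAIL.

```python
import os
from pathlib import Path


def rail_file(relative, parent=None):
    """Build the path of a file inside a RAIL checkout.

    `parent` is the directory that contains the RAIL repository. If it is
    not given, the hypothetical environment variable RAIL_PARENT_DIR is
    used, falling back to the current directory.
    """
    base = Path(parent or os.environ.get("RAIL_PARENT_DIR", "."))
    return base / "RAIL" / relative


# Example with a made-up parent directory.
valid_path = rail_file("tests/data/test_dc2_validation_9816.hdf5",
                       parent="/tmp/demo")
print(valid_path)
print("file found:", valid_path.exists())
```

Checking `valid_path.exists()` before opening the file gives an earlier and clearer failure than the error raised by `h5py.File` on a wrong path.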

In [None]:
with h5py.File(valid_file, 'r') as valid_set:
    print(valid_set['photometry'].keys())

In [None]:
with h5py.File(valid_file, 'r') as valid_set:  # reuse the path defined above
    ztrue = np.array(valid_set['photometry']['redshift'])
    mag_u = np.array(valid_set['photometry']['mag_u_lsst'])
    mag_g = np.array(valid_set['photometry']['mag_g_lsst'])
    mag_r = np.array(valid_set['photometry']['mag_r_lsst'])
    mag_i = np.array(valid_set['photometry']['mag_i_lsst'])
    mag_z = np.array(valid_set['photometry']['mag_z_lsst'])
    mag_y = np.array(valid_set['photometry']['mag_y_lsst'])

In [None]:
valid_df = pd.DataFrame({'z' : ztrue, 'mag u' : mag_u, 'mag g' : mag_g, 'mag r' : mag_r,
                         'mag i' : mag_i, 'mag z' : mag_z, 'mag y' : mag_y}) 
bands = ['u', 'g', 'r', 'i', 'z', 'y']

In [None]:
valid_df.describe()

In [None]:
plt.figure(figsize=(10,4))
plt.subplot(121)
sns.kdeplot(valid_df['z'], fill=True)  # `fill` replaces the deprecated `shade` (seaborn >= 0.11)
plt.xlabel('$z_{true}$')
plt.subplot(122)
for band in bands:
    # keep only magnitudes below 40 so that extreme values do not distort the KDE
    mags = valid_df[f'mag {band}']
    sns.kdeplot(mags[mags < 40.], fill=True, label=band)
plt.xlim(18, 30)
plt.xlabel('mag')
plt.legend()
plt.tight_layout()

### Run FlexZBoost 

Go to the `<your_path>/RAIL/examples/` directory and run the command: `python main.py configs/FZBoost.yaml`.

FlexZBoost's output file (the input file for this notebook) will be written to `<your_path>/RAIL/examples/results/FZBoost/test_FZBoost.hdf5`.

Let's create a Data object containing both the PDFs and true redshifts.

In [None]:
pdfs_file = my_path + 'RAIL/examples/results/FZBoost/test_FZBoost.hdf5'
data = Data(pdfs_file, valid_file)

In [None]:
print(data)

PDFs of 5 galaxies for illustration

In [None]:
# gals = np.random.choice(len(ztrue), 5)  # alternative: pick 5 galaxies at random
gals = [540, 2256, 12175, 17802, 19502]
colors = data.plot_pdfs(gals)

Traditional validation plots (point colors follow the PDFs above)

In [None]:
data.old_valid_plots(gals, colors)
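
Validation plots of this kind usually compare a point estimate of the redshift with the true value. The sketch below shows one way to reduce gridded PDFs to point estimates, assuming the PDF mode is used and that the residual is scaled by $1 + z_{true}$. Both choices are assumptions made for illustration and are not necessarily what `old_valid_plots` does; the grid, peaks, and true redshifts are made-up numbers.

```python
import numpy as np

# Made-up inputs: a common redshift grid and three toy Gaussian PDFs.
zgrid = np.linspace(0.0, 3.0, 301)
z_true = np.array([0.40, 1.10, 2.00])
z_peak = np.array([0.42, 1.05, 2.00])
pdfs = np.exp(-0.5 * ((zgrid[None, :] - z_peak[:, None]) / 0.05) ** 2)

# Point estimate: the grid value where each PDF is largest (the mode).
z_mode = zgrid[np.argmax(pdfs, axis=1)]

# Scaled residual between the point estimate and the true redshift.
ez = (z_mode - z_true) / (1.0 + z_true)
print(z_mode)
print(ez)
```

The mode is only one possible summary; the mean or median of each PDF could be substituted in the same way.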

# 2. QQ plots
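
QQ plots for photo-z PDFs are commonly built from the probability integral transform (PIT): each galaxy's CDF evaluated at its true redshift. For well-calibrated PDFs the PIT values are uniform on $[0, 1]$, so their quantiles fall on the diagonal. The sketch below illustrates the idea with NumPy only, on synthetic Gaussian PDFs defined on a grid. It does not use the `Data` class or RAIL's own implementation, and every array in it is made up.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for a photo-z catalog (made-up numbers, not RAIL output).
n_gal = 2000
zgrid = np.linspace(0.0, 3.0, 301)       # common redshift grid
sigma = 0.05                             # width of every toy PDF
z_peak = rng.uniform(0.5, 2.5, n_gal)    # PDF centres
z_true = rng.normal(z_peak, sigma)       # truths drawn from the PDFs themselves

# Gaussian PDFs evaluated on the grid, shape (n_gal, n_grid).
pdfs = np.exp(-0.5 * ((zgrid[None, :] - z_peak[:, None]) / sigma) ** 2)

# Cumulative trapezoidal integral of each PDF, normalised so each CDF ends at 1.
steps = 0.5 * (pdfs[:, 1:] + pdfs[:, :-1]) * np.diff(zgrid)
cdfs = np.concatenate([np.zeros((n_gal, 1)), np.cumsum(steps, axis=1)], axis=1)
cdfs /= cdfs[:, -1:]

# PIT: each galaxy's CDF evaluated at its true redshift.
pit = np.array([np.interp(zt, zgrid, cdf) for zt, cdf in zip(z_true, cdfs)])

# QQ curve: empirical PIT quantiles against the quantiles of a uniform distribution.
q_theory = np.linspace(0.01, 0.99, 99)
q_data = np.quantile(pit, q_theory)
max_dev = np.max(np.abs(q_data - q_theory))
print(f"largest departure from the diagonal: {max_dev:.3f}")
```

Plotting `q_data` against `q_theory` gives the QQ curve. Because the true redshifts here are drawn from the PDFs themselves, the curve stays close to the diagonal; PDFs that are too narrow or too broad would bend it into an S shape.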

# 3. Metrics