## **04.a Evaluation of FlexZBoost: Testing Trained Model**
#### Authors: **Amanda Farias (afariassantos2@gmail.com), Iago Lopes (iagolops2012@gmail.com)**
#### Creation date: **07/20/2024**  
#### Last Verified to Run: **09/19/2024** (by @iagolops)

This notebook evaluates the performance of the FlexZBoost machine learning model trained in *Notebook 3*. We use the test datasets defined in *Notebook 2* to assess the accuracy and reliability of its photo-z predictions.
$~$
##### Logistics: This notebook is intended to be run through the NERSC JupyterLab interface, available at __[Jupyter NERSC](https://jupyter.nersc.gov/)__, using the **desc-python** kernel.

In [None]:
import h5py
import pickle
import tables_io
import numpy as np
import pandas as pd
from rail.core.stage import RailStage
from rail.core.data import Hdf5Handle
from rail.estimation.algos.flexzboost import FlexZBoostEstimator

## Params

<div class="alert alert-block alert-warning">
<b>ATTENTION:</b> Set <code>nersc_name</code> in the cell below to your own NERSC username. The notebook builds its file paths from this value, so it only works when it points to your account.
</div> 

In [None]:
nersc_name = 'iago'

In [None]:
DS = RailStage.data_store
DS.__class__.allow_overwrite = True

path = "/global/u1/" + nersc_name[0] + "/" + nersc_name
sigma = 10
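
The path above follows the pattern `/global/u1/<first letter>/<username>`. As a minimal sketch, the same construction can be written with `pathlib` and paired with a check that the two input files used later in this notebook are present before loading them. The helper names `nersc_home` and `missing_inputs` are hypothetical and introduced only for illustration.

```python
from pathlib import Path


def nersc_home(username: str) -> Path:
    """Build /global/u1/<first letter>/<username>, like the string concatenation above."""
    return Path("/global/u1") / username[0] / username


def missing_inputs(base: Path, sigma: int) -> list:
    """Return the expected input files (model and test sample) that do not exist under base."""
    names = [
        f"train_a_roman_fzb_y1_{sigma}sig.pkl",     # trained model
        f"roman_rubin_y1_a_test_{sigma}sig.hdf5",   # test sample
    ]
    return [str(base / name) for name in names if not (base / name).exists()]


home = nersc_home("iago")
print(home)
print(missing_inputs(home, sigma=10))
```

An empty list from `missing_inputs` means both files were found, so the loading cells should not fail on a wrong path.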

## Load the trained machine learning model and the test sample

In [None]:
# Load the trained FlexZBoost model from its pickle file
with open(f'{path}/train_a_roman_fzb_y1_{sigma}sig.pkl', 'rb') as f:
    train_file = pickle.load(f)

# Load the test sample into the RAIL data store
test_sample = DS.read_file(path=f'{path}/roman_rubin_y1_a_test_{sigma}sig.hdf5', handle_class=Hdf5Handle, key='test_y1_a')

In [None]:
limits = []  # faintest valid magnitude in each band, used for the FlexZBoost mag_limits setting

bands = [
    "mag_u_lsst", "mag_g_lsst", "mag_r_lsst", 
    "mag_i_lsst", "mag_z_lsst", "mag_y_lsst",
]
print(train_file.z_max)

# Build the DataFrame once instead of rebuilding it on every loop iteration
df = pd.DataFrame(test_sample.data['photometry'])

for band in bands:
    valid = df.loc[df[band] < 99, band]  # exclude the 99 placeholder values
    limits.append(np.round(valid.max(), 2))

print(f'Limits: {limits}')
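
To make the limit selection concrete, here is a minimal, self-contained sketch of the same idea on a toy table. It assumes, as the filter above does, that a magnitude of 99 marks a missing measurement. The function name `magnitude_limits` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

SENTINEL = 99.0  # assumed placeholder for missing magnitudes, matching the `< 99` filter


def magnitude_limits(photometry: pd.DataFrame, bands, sentinel: float = SENTINEL) -> dict:
    """Return the faintest valid magnitude per band, rounded to 2 decimals."""
    limits = {}
    for band in bands:
        valid = photometry.loc[photometry[band] < sentinel, band]
        limits[band] = float(np.round(valid.max(), 2))
    return limits


# Toy photometry: one placeholder value in each band
toy = pd.DataFrame({
    "mag_u_lsst": [24.512, 99.0, 25.137],
    "mag_g_lsst": [23.9, 24.246, 99.0],
})
toy_limits = magnitude_limits(toy, ["mag_u_lsst", "mag_g_lsst"])
print(toy_limits)
```

Returning a dictionary keyed by band name avoids relying on list positions when the limits are later passed as a per-band mapping.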

## Configuring FlexZBoost estimator and testing

In [None]:
estimate_fzb = FlexZBoostEstimator.make_stage(
    name=f'estimate_a_roman_fzb_y1_{sigma}sig',
    hdf5_groupname='photometry',
    bands = ['mag_u_lsst',
             'mag_g_lsst',
             'mag_r_lsst',
             'mag_i_lsst',
             'mag_z_lsst',
             'mag_y_lsst'],
    err_bands = ['mag_err_u_lsst',
                 'mag_err_g_lsst',
                 'mag_err_r_lsst',
                 'mag_err_i_lsst',
                 'mag_err_z_lsst',
                 'mag_err_y_lsst'],
    mag_limits={'mag_u_lsst':limits[0],
                'mag_g_lsst':limits[1],
                'mag_r_lsst':limits[2],
                'mag_i_lsst':limits[3],
                'mag_z_lsst':limits[4],
                'mag_y_lsst':limits[5],},
    model=train_file, # trained FlexZBoost model loaded above
    filters="path",
    zmin=0,
    zmax = train_file.z_max,
    nzbins=301,
    chunk_size=500000,
    calculated_point_estimates=['zmode']
)
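
The `zmin`, `zmax`, and `nzbins` settings describe the redshift grid on which each PDF is evaluated. The sketch below shows what such a grid looks like, assuming it is evenly spaced and includes both endpoints (an assumption about the estimator, not something confirmed in this notebook). It uses an illustrative `zmax = 3.0` rather than the actual `train_file.z_max`, and a toy Gaussian PDF to show a mode-style point estimate taken as the grid point where the PDF peaks.

```python
import numpy as np

# Illustrative settings: zmax = 3.0 is a placeholder, not the trained model's z_max
zmin, zmax, nzbins = 0.0, 3.0, 301

# Assumed grid: evenly spaced, both endpoints included
zgrid = np.linspace(zmin, zmax, nzbins)
dz = zgrid[1] - zgrid[0]

# Toy PDF: a Gaussian centred at z = 0.8, normalized with a simple Riemann sum
pdf = np.exp(-0.5 * ((zgrid - 0.8) / 0.05) ** 2)
pdf /= np.sum(pdf) * dz

# Mode-style point estimate: the grid point where the PDF is largest
zmode = zgrid[np.argmax(pdf)]
print(f"grid spacing: {dz:.4f}, mode: {zmode:.2f}")
```

Under these assumptions the grid spacing sets the resolution of a mode-style estimate, so a larger `zmax` with the same `nzbins` gives a coarser estimate.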

<div class="alert alert-block alert-danger">
<b>ATTENTION:</b> The next cell takes about 10 minutes to run.</div>

In [None]:
%%time
output_fzb = estimate_fzb.estimate(test_sample)  # estimate photo-z PDFs for the test sample
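
Once point estimates such as `zmode` are available, one common way to summarize accuracy is with the normalized residual (z_phot - z_true) / (1 + z_true). The sketch below computes three widely used summaries on synthetic data: the median bias, the normalized median absolute deviation, and the outlier fraction. It does not read `output_fzb`. The function name, the 0.15 outlier cut, and the synthetic redshifts are assumptions chosen for illustration, not values taken from this notebook.

```python
import numpy as np


def point_estimate_metrics(z_phot, z_true, outlier_cut: float = 0.15) -> dict:
    """Summarize photo-z point estimates with bias, sigma_NMAD and outlier fraction."""
    z_phot = np.asarray(z_phot, dtype=float)
    z_true = np.asarray(z_true, dtype=float)
    dz = (z_phot - z_true) / (1.0 + z_true)             # normalized residual
    bias = np.median(dz)
    sigma_nmad = 1.4826 * np.median(np.abs(dz - bias))  # robust scatter estimate
    outlier_frac = np.mean(np.abs(dz) > outlier_cut)    # fraction beyond the assumed cut
    return {
        "bias": float(bias),
        "sigma_nmad": float(sigma_nmad),
        "outlier_frac": float(outlier_frac),
    }


# Synthetic example: true redshifts with 2% normalized Gaussian scatter
rng = np.random.default_rng(0)
z_true = rng.uniform(0.1, 2.0, size=1000)
z_phot = z_true + 0.02 * (1.0 + z_true) * rng.standard_normal(1000)

metrics = point_estimate_metrics(z_phot, z_true)
print(metrics)
```

With real data, `z_phot` would be the estimator's point estimates and `z_true` the reference redshifts from the test sample.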
