# Goldenspike workflow: Iterating over parameters

**Authors:** Jennifer Scora

**Last run successfully:** Jan 16, 2025

This notebook runs through the stages of RAIL (creation, estimation, and evaluation) while looping over a specific parameter, then compares the resulting photometric redshift estimates. It also shows how to use multiprocessing in interactive mode. If you want full MPI, or are running on very large datasets, we recommend running in pipeline mode (link).

## Creation

### Set up 

We first define a few configuration parameters that reconcile differences in column naming between existing photo-z (PZ) codes. We then load the galaxy catalog used to train the flow engine.

In [5]:
import rail.interactive as ri 
import numpy as np
import tables_io
from pzflow.examples import get_galaxy_data

In [6]:
bands = ["u", "g", "r", "i", "z", "y"]
band_dict = {band: f"mag_{band}_lsst" for band in bands}
rename_dict = {f"mag_{band}_lsst_err": f"mag_err_{band}_lsst" for band in bands}
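
To make the two mappings concrete, the snippet below rebuilds them and prints one entry from each. It only restates the cell above; the reading that `rename_dict` targets error columns of the form `mag_{band}_lsst_err` is inferred from its keys, not from the RAIL documentation.

```python
# Rebuild the two mappings from the cell above and inspect one entry of each.
bands = ["u", "g", "r", "i", "z", "y"]

# Short band name -> LSST-style magnitude column name.
band_dict = {band: f"mag_{band}_lsst" for band in bands}

# Error column with an "_err" suffix -> error column with a "mag_err_" prefix.
rename_dict = {f"mag_{band}_lsst_err": f"mag_err_{band}_lsst" for band in bands}

print(band_dict["u"])                 # mag_u_lsst
print(rename_dict["mag_u_lsst_err"])  # mag_err_u_lsst
```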

In [7]:
catalog = get_galaxy_data().rename(band_dict, axis=1)

### Train the model

Next we train the normalizing flow that serves as the engine for creating the input data.

In [9]:
flow_model = ri.creation.engines.flowEngine.flow_modeler(
    input=catalog,
    seed=0,
    phys_cols={"redshift": [0, 3]},
    phot_cols={
        "mag_u_lsst": [17, 35],
        "mag_g_lsst": [16, 32],
        "mag_r_lsst": [15, 30],
        "mag_i_lsst": [15, 30],
        "mag_z_lsst": [14, 29],
        "mag_y_lsst": [14, 28],
    },
    calc_colors={"ref_column_name": "mag_i_lsst"},
)

Inserting handle into data store.  input: None, FlowModeler
Training 30 epochs 
Loss:
(0) 21.3266
(1) 3.9686
(2) 1.9351
(3) 5.2006
(4) -0.3579
(5) 2.2561
(6) 1.5917
(7) 0.3691
(8) -1.0218
(9) inf
Training stopping after epoch 9 because training loss diverged.
Inserting handle into data store.  model: inprogress_model.pkl, FlowModeler
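
The log above reports that training stopped at epoch 9 of the requested 30 because the loss became infinite. As a minimal sketch of how such a condition can be flagged from a list of per-epoch losses, the snippet below scans the values copied from that log. The helper `first_non_finite` is hypothetical and is not how pzflow or RAIL implement their own check.

```python
import math

# Per-epoch training losses copied from the log above.
losses = [21.3266, 3.9686, 1.9351, 5.2006, -0.3579,
          2.2561, 1.5917, 0.3691, -1.0218, math.inf]


def first_non_finite(values):
    """Return the index of the first inf/NaN entry, or None if all are finite.

    Hypothetical helper for illustration only.
    """
    for epoch, loss in enumerate(values):
        if not math.isfinite(loss):
            return epoch
    return None


diverged_at = first_non_finite(losses)
print(diverged_at)  # 9
```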


### Sample the model

Now we use the flow to draw synthetic samples for the training set and, with a different seed, for the test set.

In [10]:
train_data_orig = ri.creation.engines.flowEngine.flow_creator(
    n_samples=150, model=flow_model["model"], seed=1235
)

Inserting handle into data store.  model: <pzflow.flow.Flow object at 0x734b2c4bf6e0>, FlowCreator
Inserting handle into data store.  output: inprogress_output.pq, FlowCreator


In [11]:
test_data_orig = ri.creation.engines.flowEngine.flow_creator(
    model=flow_model["model"], n_samples=150, seed=1234
)

Inserting handle into data store.  model: <pzflow.flow.Flow object at 0x734b2c4bf6e0>, FlowCreator
Inserting handle into data store.  output: inprogress_output.pq, FlowCreator


### Degrade the data 

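We'll start with the training set. We apply LSST photometric errors, redshift incompleteness, emission-line confusion, and an i-band magnitude cut, then rename the error columns and convert the table to a numpy dictionary: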

In [12]:
# training set

train_data_errs = ri.creation.degraders.photometric_errors.lsst_error_model(
    input=train_data_orig["output"], seed=66, renameDict=band_dict, ndFlag=np.nan
)

train_data_inc = (
    ri.creation.degraders.spectroscopic_degraders.inv_redshift_incompleteness(
        input=train_data_errs["output"], pivot_redshift=1.0
    )
)

train_data_conf = ri.creation.degraders.spectroscopic_degraders.line_confusion(
    input=train_data_inc["output"],
    true_wavelen=5007.0,
    wrong_wavelen=3727.0,
    frac_wrong=0.05,
    seed=1337,
)

train_data_cut = ri.creation.degraders.quantityCut.quantity_cut(
    input=train_data_conf["output"], cuts={"mag_i_lsst": 25.0}
)

train_data_pq = ri.tools.table_tools.column_mapper(
    input=train_data_cut["output"], columns=rename_dict
)

train_data = ri.tools.table_tools.table_converter(
    input=train_data_pq["output"], output_format="numpyDict"
)

Inserting handle into data store.  input: None, LSSTErrorModel
Inserting handle into data store.  output: inprogress_output.pq, LSSTErrorModel
Inserting handle into data store.  input: None, InvRedshiftIncompleteness
Inserting handle into data store.  output: inprogress_output.pq, InvRedshiftIncompleteness
Inserting handle into data store.  input: None, LineConfusion
Inserting handle into data store.  output: inprogress_output.pq, LineConfusion
Inserting handle into data store.  input: None, QuantityCut
Inserting handle into data store.  output: inprogress_output.pq, QuantityCut
Inserting handle into data store.  input: None, ColumnMapper
Inserting handle into data store.  output: inprogress_output.pq, ColumnMapper
Inserting handle into data store.  input: None, TableConverter
Inserting handle into data store.  output: inprogress_output.hdf5, TableConverter
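
To give a feel for what three of the degraders above do to the catalog, here is a rough numpy sketch under stated assumptions. It assumes that the incompleteness step keeps a galaxy with probability min(1, pivot_redshift / z), that line confusion reassigns the observed wavelength of the true line to the wrong rest wavelength, and that a single number in `cuts` acts as an upper bound (whether that bound is inclusive is not checked here). The function names are hypothetical and this is not RAIL's implementation; the random selection of which galaxies are dropped or confused is left out.

```python
import numpy as np


def survival_probability(z, pivot_redshift=1.0):
    """Assumed keep-probability: p(z) = min(1, pivot_redshift / z)."""
    z = np.asarray(z, dtype=float)
    with np.errstate(divide="ignore"):  # z = 0 gives inf, which clips to 1
        return np.minimum(1.0, pivot_redshift / z)


def confused_redshift(z, true_wavelen=5007.0, wrong_wavelen=3727.0):
    """Redshift inferred if the true line is mistaken for the wrong one."""
    z = np.asarray(z, dtype=float)
    observed_wavelen = true_wavelen * (1.0 + z)
    return observed_wavelen / wrong_wavelen - 1.0


def apply_max_cut(table, column, max_value):
    """Keep rows of a dict-of-arrays table where column < max_value."""
    mask = table[column] < max_value
    return {name: values[mask] for name, values in table.items()}


# Toy catalog (made-up values, for illustration only).
toy = {
    "redshift": np.array([0.5, 1.0, 2.0]),
    "mag_i_lsst": np.array([24.0, 25.5, 24.9]),
}

p_keep = survival_probability(toy["redshift"])
z_wrong = confused_redshift(toy["redshift"])
bright = apply_max_cut(toy, "mag_i_lsst", 25.0)

print(p_keep)                   # keep-probability per galaxy
print(z_wrong)                  # redshift if the line were misidentified
print(len(bright["redshift"]))  # number of rows surviving the cut
```

With `true_wavelen=5007.0` and `wrong_wavelen=3727.0`, the misidentified redshift is always higher than the true one, because the same observed wavelength is attributed to a bluer rest-frame line.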


In [13]:
train_table = tables_io.convertObj(train_data["output"], tables_io.types.PD_DATAFRAME)
train_table.head()

| | redshift | mag_u_lsst | mag_err_u_lsst | mag_g_lsst | mag_err_g_lsst | mag_r_lsst | mag_err_r_lsst | mag_i_lsst | mag_err_i_lsst | mag_z_lsst | mag_err_z_lsst | mag_y_lsst | mag_err_y_lsst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.855962 | 26.921288 | 0.518191 | 26.715498 | 0.166577 | 25.713875 | 0.061205 | 24.559689 | 0.035924 | 23.853496 | 0.036793 | 23.617346 | 0.067434 |
| 1 | 1.097255 | 25.961281 | 0.244931 | 25.391997 | 0.05237 | 24.562934 | 0.022214 | 23.6528 | 0.016456 | 12.798364 | 0.005 | 11.669353 | 0.005 |
| 2 | 0.675636 | 24.685646 | 0.082276 | 24.305139 | 0.020273 | 23.989574 | 0.013829 | 23.465489 | 0.014151 | 23.486676 | 0.026654 | 23.30277 | 0.051014 |
| 3 | 0.915506 | 27.172901 | 0.620556 | 26.029287 | 0.091936 | 25.455418 | 0.048657 | 24.776863 | 0.043546 | 23.993324 | 0.041643 | 24.004426 | 0.09488 |
| 4 | 0.903248 | 26.182291 | 0.293202 | 26.066274 | 0.094968 | 25.532632 | 0.05211 | 24.961394 | 0.051297 | 24.632329 | 0.073394 | 24.418066 | 0.136038 |


Now the test set, which only receives photometric errors before the same column renaming and conversion:

In [14]:
test_data_errs = ri.creation.degraders.photometric_errors.lsst_error_model(
    input=test_data_orig["output"], seed=58, renameDict=band_dict, ndFlag=np.nan
)

test_data_pq = ri.tools.table_tools.column_mapper(
    input=test_data_errs["output"], columns=rename_dict, hdf5_groupname=""
)

test_data = ri.tools.table_tools.table_converter(
    input=test_data_pq["output"], output_format="numpyDict"
)


Inserting handle into data store.  input: None, LSSTErrorModel
Inserting handle into data store.  output: inprogress_output.pq, LSSTErrorModel
Inserting handle into data store.  input: None, ColumnMapper
Inserting handle into data store.  output: inprogress_output.pq, ColumnMapper
Inserting handle into data store.  input: None, TableConverter
Inserting handle into data store.  output: inprogress_output.hdf5, TableConverter


In [15]:
test_table = tables_io.convertObj(test_data["output"], tables_io.types.PD_DATAFRAME)
test_table.head()

| | redshift | mag_u_lsst | mag_err_u_lsst | mag_g_lsst | mag_err_g_lsst | mag_r_lsst | mag_err_r_lsst | mag_i_lsst | mag_err_i_lsst | mag_z_lsst | mag_err_z_lsst | mag_y_lsst | mag_err_y_lsst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.103872 | 22.917006 | 0.01785 | 22.358142 | 0.006133 | 21.829599 | 0.005361 | 21.567418 | 0.005577 | 21.561119 | 0.006831 | 21.553433 | 0.011617 |
| 1 | 0.456719 | 24.266843 | 0.056936 | 23.647775 | 0.012 | 22.755872 | 0.006585 | 22.372556 | 0.007054 | 22.217685 | 0.009724 | 22.104082 | 0.017938 |
| 2 | 0.525806 | 24.73372 | 0.085816 | 23.612764 | 0.011697 | 22.26153 | 0.005725 | 21.592136 | 0.0056 | 21.274857 | 0.006175 | 20.927122 | 0.007803 |
| 3 | 0.541941 | 27.441969 | 0.745856 | 27.256551 | 0.261787 | 26.055322 | 0.08279 | 25.573101 | 0.08816 | 25.220535 | 0.122955 | 25.805544 | 0.424416 |
| 4 | 2.028277 | 28.468899 | 1.373169 | 26.902849 | 0.195207 | 26.958002 | 0.181059 | 26.737067 | 0.239168 | 25.599105 | 0.170267 | 25.153335 | 0.253024 |


## Estimation

### Train the informer

### Generate redshift estimates

### Plot?

## Evaluation

## Same but with multiprocessing