# Demonstration and Validation

## Random Forest Example

Start by training the standard random forest example classifier from `tomo_challenge`.

In [1]:
from tomo_challenge import load_data, load_redshift
from tomo_challenge.jax_metrics import ell_binning, compute_scores
from tomo_challenge.classifiers.random_forest import RandomForest

Found classifier Random
Found classifier RandomForest
Found classifier IBandOnly


Specify the challenge data to load:

In [2]:
bands='riz'
include_colors=True
include_errors=True
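
With `include_colors=True` the loader presumably appends color features to the per-band magnitudes. The sketch below shows how adjacent-band colors could be formed from a magnitude array. It assumes one column per band in the order given by `bands` and colors defined as differences of adjacent bands, so for `bands='riz'` the colors are r-i and i-z. This is an illustration, not necessarily the exact column layout that `load_data` produces, and `make_colors` is a hypothetical helper.

```python
import numpy as np

def make_colors(mags: np.ndarray) -> np.ndarray:
    """Return adjacent-band colors for an (n_objects, n_bands) magnitude array.

    Hypothetical helper: column k of the result is mags[:, k] - mags[:, k + 1],
    e.g. r-i and i-z for bands='riz'.
    """
    return mags[:, :-1] - mags[:, 1:]

# Two objects with r, i, z magnitudes (illustrative values only).
mags = np.array([[22.0, 21.5, 21.2],
                 [24.1, 23.3, 22.9]])
colors = make_colors(mags)
print(colors)
```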

Load the training data:

In [3]:
training_file='/media/data2/tomo_challenge_data/ugrizy/training.hdf5'
train_data_arr = load_data(training_file, bands,
                           errors=include_errors,
                           colors=include_colors, array=True)
training_z = load_redshift(training_file)
print(f'Loaded {len(train_data_arr)} training rows.')

Loaded 8615613 training rows.


Load the validation data:

In [4]:
validation_file='/media/data2/tomo_challenge_data/ugrizy/validation.hdf5'
valid_data_arr = load_data(validation_file, bands,
                           errors=include_errors,
                           colors=include_colors, array=True)
val_z = load_redshift(validation_file)
print(f'Loaded {len(valid_data_arr)} validation rows.')

Loaded 17228554 validation rows.


Initialize a random forest classifier with 4 bins:

In [6]:
nbins_rf = 4
classifier = RandomForest(bands, {'bins': nbins_rf})
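
The `'bins'` option sets the number of tomographic bins. The training output reports "Finding bins for training data", which suggests that the true redshifts are first split into labeled bins before the forest is fit. The following minimal sketch assumes equal-number-count bins defined by redshift percentiles. That labeling scheme is an assumption on our part, and `equal_count_labels` is a hypothetical helper.

```python
import numpy as np

def equal_count_labels(z, nbins):
    """Assign each redshift to one of `nbins` bins holding roughly equal counts.

    Returns (labels, edges): integer labels in [0, nbins) and the nbins + 1
    percentile edges that define the bins.
    """
    edges = np.percentile(z, np.linspace(0, 100, nbins + 1))
    # Search only the interior edges so the minimum redshift lands in bin 0
    # and the maximum lands in bin nbins - 1.
    labels = np.searchsorted(edges[1:-1], z, side="right")
    return labels, edges

rng = np.random.default_rng(0)
z = rng.uniform(0.0, 3.0, size=10_000)  # synthetic redshifts, for illustration only
labels, edges = equal_count_labels(z, nbins=4)
print(np.bincount(labels, minlength=4))
```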

Train on the first 20,000 rows, a small fraction of the training data:

In [7]:
ntrain_rf = 20000
classifier.train(train_data_arr[:ntrain_rf], training_z[:ntrain_rf])

Finding bins for training data
Fitting classifier
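
Slicing `train_data_arr[:ntrain_rf]` keeps the first rows of the file. That subset is only representative if the rows are not ordered by redshift or any other property. The sketch below draws a uniform random subset instead, using NumPy's `Generator.choice` without replacement. The arrays are synthetic stand-ins for `train_data_arr` and `training_z`, and `random_subset` is a hypothetical helper.

```python
import numpy as np

def random_subset(data, z, n, seed=0):
    """Return n rows of (data, z) drawn uniformly at random without replacement.

    The same indices are applied to both arrays so features and redshifts stay aligned.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(z), size=n, replace=False)
    return data[idx], z[idx]

# Synthetic stand-ins: 1000 objects with 8 feature columns each.
data = np.arange(8000, dtype=float).reshape(1000, 8)
z = np.linspace(0.0, 3.0, 1000)  # deliberately sorted by redshift
sub_data, sub_z = random_subset(data, z, n=200)
```

In the notebook this would replace the slices, for example `classifier.train(*random_subset(train_data_arr, training_z, ntrain_rf))`.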

Assign tomographic bins to the first 50,000 validation rows and keep their true redshifts for scoring:

In [8]:
nvalid_rf = 50000
idx_rf = classifier.apply(valid_data_arr[:nvalid_rf])
z_rf = val_z[:nvalid_rf]
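
`classifier.apply` returns one tomographic bin index per object, and `compute_scores` consumes these indices together with the true redshifts. Before scoring, you can sanity-check the assignment by histogramming the true redshifts separately in each bin. The sketch below does this on toy inputs. `bin_nz` is a hypothetical helper, and the redshift grid is arbitrary.

```python
import numpy as np

def bin_nz(idx, z, nbins, z_edges):
    """Histogram true redshifts separately for each tomographic bin.

    Returns an (nbins, len(z_edges) - 1) array of object counts; objects whose
    index is outside range(nbins) are ignored.
    """
    return np.stack([np.histogram(z[idx == b], bins=z_edges)[0] for b in range(nbins)])

# Toy assignment of 8 objects to 4 bins, with their true redshifts.
idx = np.array([0, 0, 1, 1, 1, 2, 3, 3])
z = np.array([0.2, 0.4, 0.7, 0.9, 1.1, 1.6, 2.2, 2.8])
nz = bin_nz(idx, z, nbins=4, z_edges=np.linspace(0.0, 3.0, 4))  # edges 0, 1, 2, 3
```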

## Jax Cosmo Scores

Calculate scores using the jax_cosmo implementation provided with tomo_challenge:

In [17]:
%time scores_rf = compute_scores(idx_rf, z_rf, metrics=['SNR_3x2', 'FOM_3x2', 'FOM_DETF_3x2'])

CPU times: user 1min 7s, sys: 17.9 s, total: 1min 25s
Wall time: 1min 51s
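
`SNR_3x2` summarizes the total signal-to-noise of the 3x2pt data vector. A common definition, assumed here, is $\mathrm{SNR} = \sqrt{\mu^T C^{-1} \mu}$ for a predicted data vector $\mu$ with covariance $C$. Check `tomo_challenge.jax_metrics` for the exact form it uses. The NumPy sketch below applies this definition to toy numbers.

```python
import numpy as np

def snr(mu, cov):
    """Signal-to-noise ratio sqrt(mu^T C^{-1} mu) of data vector mu with covariance cov.

    Uses a linear solve rather than forming the explicit inverse of cov.
    """
    return float(np.sqrt(mu @ np.linalg.solve(cov, mu)))

mu = np.array([3.0, 4.0])
print(snr(mu, np.eye(2)))  # 5.0
```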


In [18]:
scores_rf

{'SNR_3x2': 1183.4375,
 'FOM_3x2': 2127.108154296875,
 'FOM_DETF_3x2': 45.92446517944336}
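
`FOM_3x2` and `FOM_DETF_3x2` are figures of merit for a pair of cosmological parameters. For the DETF version, the pair is conventionally the dark-energy parameters $w_0$ and $w_a$; the pair used by `FOM_3x2` is not stated here. A standard Fisher-matrix recipe has three steps:

1. Invert the full Fisher matrix to marginalize over all other parameters.
2. Keep the 2x2 covariance block for the pair of interest.
3. Compute $\mathrm{FOM} = 1/\sqrt{\det C_{2\times2}}$.

The sketch below assumes that recipe. For a diagonal Fisher matrix, the result for the first two parameters is simply $\sqrt{4 \cdot 9} = 6$.

```python
import numpy as np

def figure_of_merit(fisher, i, j):
    """Marginalized two-parameter figure of merit 1 / sqrt(det C_ij).

    fisher: (n_params, n_params) Fisher matrix.
    i, j: indices of the two parameters of interest; all other parameters are
    marginalized over by inverting the full Fisher matrix before taking the block.
    """
    cov = np.linalg.inv(fisher)
    block = cov[np.ix_([i, j], [i, j])]
    return float(1.0 / np.sqrt(np.linalg.det(block)))

F = np.diag([4.0, 9.0, 1.0])
fom = figure_of_merit(F, 0, 1)
```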

## Fast Scores

Use the reweighting method from `zotbin` to speed up the score calculation:

In [1]:
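# Import only the two helpers used below instead of a wildcard import.
from zotbin.binned import load_binned, get_binned_scores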

Load the initialization data:

In [4]:
init_data = load_binned('binned_28.npz')

Calculate scores using the fast reweighting method:

In [None]:
%time scores = get_binned_scores(idx_rf, z_rf, *init_data)
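
One plausible reading of "reweighting" is assumed here, not taken from `zotbin`'s documentation. Angular power spectra are bilinear in the redshift distributions. Suppose spectra $C^{ab}$ are precomputed once for a fixed set of fine redshift bins; the `28` in `binned_28.npz` may be that count, but this is a guess. Then the spectra for any tomographic assignment follow approximately from a weight matrix $W$ as $C^{ij} = \sum_{a,b} W_{ia} W_{jb} C^{ab}$, with no new cosmology integrals. The sketch below shows this for a single multipole on toy inputs. `fine_to_tomo_weights` and `reweight_cls` are hypothetical helpers.

```python
import numpy as np

def fine_to_tomo_weights(idx, z, nbins, fine_edges):
    """Weight matrix W[i, a]: fraction of tomographic bin i's objects in fine bin a.

    Rows sum to 1 for non-empty tomographic bins, so n_i(z) is approximated by
    sum_a W[i, a] * n_a(z) for normalized fine-bin distributions n_a(z).
    """
    nfine = len(fine_edges) - 1
    # Fine-bin index per object; redshifts outside fine_edges are clipped to the end bins.
    fine = np.clip(np.digitize(z, fine_edges) - 1, 0, nfine - 1)
    counts = np.zeros((nbins, nfine))
    np.add.at(counts, (idx, fine), 1.0)
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def reweight_cls(W, cl_fine):
    """Tomographic spectra C^{ij} = sum_ab W[i, a] W[j, b] C^{ab} at one multipole."""
    return W @ cl_fine @ W.T

# Toy inputs: 3 fine bins, 2 tomographic bins, 6 objects.
fine_edges = np.array([0.0, 1.0, 2.0, 3.0])
z = np.array([0.5, 0.5, 1.5, 1.5, 2.5, 2.5])
idx = np.array([0, 0, 0, 1, 1, 1])
W = fine_to_tomo_weights(idx, z, nbins=2, fine_edges=fine_edges)
cl_fine = np.array([[1.0, 0.2, 0.1],
                    [0.2, 2.0, 0.3],
                    [0.1, 0.3, 3.0]])
cl_tomo = reweight_cls(W, cl_fine)
```

Under this reading, the expensive integrals are paid once when building the fine-bin spectra. Each new binning then costs only small matrix products.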