# Random Forest Training for QSO target selection

**Author:** Edmond Chaussidon (CEA Saclay) (edmond.chaussidon@cea.fr)

This notebook explains how the random forest files used for the target selection are generated. For a brief overview of the QSO target selection and an interpretation of this selection, see the qso-dr8.ipynb notebook written for the DR8s release (no major changes are expected with DR9).

All the files are written and saved at NERSC in `/global/cfs/cdirs/desi/target/analysis/RF`

**/!\** **WARNING** This notebook generates files at NERSC! **PLEASE** change the path and savename so that you do not overwrite data, or make sure the current files are preserved. **/!\**
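One way to follow the warning above is to build every output path through a small guard that refuses to reuse an existing file name. This is only a sketch: `safe_output_path` is a hypothetical helper, not part of desitarget.

```python
from pathlib import Path


def safe_output_path(directory, filename):
    """Return directory/filename, raising if that file already exists.

    Hypothetical helper (not part of desitarget): it only checks the path,
    it never writes or deletes anything.
    """
    path = Path(directory) / filename
    if path.exists():
        raise FileExistsError(f"{path} already exists; choose another savename")
    return path

# Possible usage, with a directory of your own instead of the shared one:
# out = safe_output_path('/path/to/my/scratch', 'QSO_DR9s.fits')
```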

The training is divided into three parts:

* 1) data_collection : collect data from DR9
* 2) data_preparation : build attributes for the RF
* 3) train_test_RF : training and some tests

**Remark :** The first part is time-consuming and its output is already saved in `/global/cfs/cdirs/desi/target/analysis/RF/`


In [None]:
DIR = '/global/cfs/cdirs/desi/target/analysis/RF/'

In [None]:
from pathlib import Path
path_train = f'{Path().absolute()}/../../py/desitarget/train/'

-------
## 1)  data_collection

**REMARK:** It is not necessary to run this section for the training if the files already exist in DIR.
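To decide whether this section can be skipped, one can check which collected files are missing from the directory. The sketch below assumes the three file names used in the collection cells of this notebook; `missing_collected_files` is a hypothetical helper, not a desitarget function.

```python
from pathlib import Path

# File names as they appear in the collection cells of this notebook.
COLLECTED_FILES = ('QSO_DR9s.fits', 'STARS_DR9s.fits', 'TEST_DR9s.fits')


def missing_collected_files(directory, names=COLLECTED_FILES):
    """Return the names in `names` that are not regular files in `directory`."""
    directory = Path(directory)
    return [name for name in names if not (directory / name).is_file()]

# Possible usage: an empty list suggests data_collection can be skipped.
# print(missing_collected_files(DIR))
```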

In [None]:
from desitarget.train.data_collection.sweep_meta import sweep_meta

sweep_meta('dr9s', f'{DIR}dr9s_sweep_meta.fits')
sweep_meta('dr9n', f'{DIR}dr9n_sweep_meta.fits')

* Set the path to your version of TOPCAT in my_tractor_extract_batch.py :

    `STILTSCMD = 'java -jar -Xmx4096M /global/homes/e/edmondc/Software/topcat/topcat-full.jar -stilts'`

* If you need a copy of TOPCAT, download it [here](http://www.star.bris.ac.uk/~mbt/topcat/).


In [None]:
from desitarget.train.data_collection.my_tractor_extract_batch import my_tractor_extract_batch

# collect QSO sample
#my_tractor_extract_batch(16, f'{DIR}QSO_DR9s.fits', 'dr9s', '0,360,-10,30', 'qso', path_train, DIR)

In [None]:
# collect stars sample
#my_tractor_extract_batch(4, f'{DIR}STARS_DR9s.fits', 'dr9s', '320,340,-1.25,1.25', 'stars', path_train, DIR)

In [None]:
# collect test sample
#my_tractor_extract_batch(4, f'{DIR}TEST_DR9s.fits', 'dr9s', '30,45,-5,5', 'test', path_train, DIR)

--------
## 2) data_preparation 

**Remark :** The test region 30 < RA < 45 and -5 < DEC < 5 is removed from the training data in *data_preparation/Code/make_training_samples.py*; this region is **hard-coded**.

**/!\** **Take CARE** if you do not use this region for the test sample. **/!\**
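To make the effect of this cut concrete, the sketch below builds a boolean mask that keeps only objects outside the test region. It is an illustration with made-up coordinates, not the code in make_training_samples.py; the function name and the use of strict inequalities are assumptions.

```python
import numpy as np


def outside_test_region(ra, dec, ra_min=30.0, ra_max=45.0,
                        dec_min=-5.0, dec_max=5.0):
    """Return True where (ra, dec) falls outside the rectangular test region.

    Hypothetical helper; the default bounds are the region quoted above.
    """
    ra = np.asarray(ra, dtype=float)
    dec = np.asarray(dec, dtype=float)
    in_region = (ra > ra_min) & (ra < ra_max) & (dec > dec_min) & (dec < dec_max)
    return ~in_region


# Made-up coordinates: only the second object lies inside the test region.
ra = np.array([10.0, 35.0, 44.0, 50.0])
dec = np.array([0.0, 0.0, 6.0, 0.0])
keep = outside_test_region(ra, dec)
print(keep)
```

If a different test region is used, the same bounds would have to be changed in both the collection call and the training-sample script.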

In [None]:
from desitarget.train.data_collection.make_training_samples import make_training_samples

make_training_samples(f'{DIR}QSO_DR9s.fits', f'{DIR}STARS_DR9s.fits', f'{DIR}QSO_TrainingSample_DR9s.fits', f'{DIR}STARS_TrainingSample_DR9s.fits')

In [None]:
from desitarget.train.data_collection.make_training_samples import make_test_samples

make_test_samples(f'{DIR}TEST_DR9s.fits', f'{DIR}TestSample_DR9s.fits')

------
## 3) train_test_RF


In [None]:
#Pipeline Configuration (to generate RF with different hyperparameters)
from desitarget.train.train_test_rf.PipelineConfigScript import PipelineConfigScript

fpn_STARS_TrainingSample = f"{DIR}STARS_TrainingSample_DR9s.fits"
fpn_QSO_TrainingSample = f"{DIR}QSO_TrainingSample_DR9s.fits"
fpn_TestSample = f"{DIR}TestSample_DR9s.fits"
fpn_QLF = f"{path_train}data_preparation/ROSS4_tabR"
fpn_config = f"{DIR}config.npz"

PipelineConfigScript(fpn_QSO_TrainingSample, fpn_STARS_TrainingSample, fpn_TestSample, fpn_QLF, fpn_config)
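The training and conversion steps write into the `RFmodel` and `RFmodel_desitarget` directories, which must exist beforehand. A minimal way to create them is sketched here; `make_model_dirs` is a hypothetical helper, and whether the per-model subdirectories also need to be created in advance is not covered.

```python
from pathlib import Path


def make_model_dirs(base_dir, names=('RFmodel', 'RFmodel_desitarget')):
    """Create the output directories under base_dir if they do not exist yet.

    Hypothetical helper; existing directories and their contents are untouched.
    """
    created = []
    for name in names:
        path = Path(base_dir) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created

# Possible usage, with DIR as defined at the top of this notebook:
# make_model_dirs(DIR)
```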

In [None]:
#Random Forest training
from desitarget.train.train_test_rf.train_RF import train_RF

## the RFmodel and RFmodel_desitarget directories must be created first!!

#RF all-z training
train_RF(f'{DIR}config.npz', 'DR9s_LOW', f'{DIR}RFmodel/DR9s_LOW')

In [None]:
#RF Highz training
train_RF(f'{DIR}config.npz', 'DR9s_HighZ', f'{DIR}RFmodel/DR9s_HighZ')

In [None]:
#Sklearn to desitarget format
from desitarget.train.train_test_rf.Convert_to_DESI_RF import convert_and_save_to_desi 

RF_filename_input = f"{DIR}RFmodel/DR9s_LOW/model_DR9s_LOW_z[0.0, 6.0]_MDepth25_MLNodes850_nTrees500.pkl.gz"
RF_filename_output = f"{DIR}RFmodel_desitarget/rf_model_dr9.npz"
convert_and_save_to_desi(RF_filename_input, RF_filename_output)

RF_HighZ_filename_input = f"{DIR}RFmodel/DR9s_HighZ/model_DR9s_HighZ_z[3.2, 6.0]_MDepth25_MLNodes850_nTrees500.pkl.gz"
RF_HighZ_filename_output = f"{DIR}RFmodel_desitarget/rf_model_dr9_HighZ.npz"
convert_and_save_to_desi(RF_HighZ_filename_input, RF_HighZ_filename_output)
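The model file names used above follow a visible pattern, so they can be rebuilt from a few values instead of being typed by hand. The sketch below reproduces that pattern only; `rf_model_filename` is a hypothetical helper, and reading `MDepth`, `MLNodes` and `nTrees` as maximum depth, maximum leaf nodes and number of trees is an assumption based on the abbreviations.

```python
def rf_model_filename(tag, z_min, z_max, max_depth=25,
                      max_leaf_nodes=850, n_trees=500):
    """Rebuild a model file name following the pattern seen in this notebook.

    Hypothetical helper; the parameter names are guesses at what the
    abbreviations in the file name stand for.
    """
    z_range = [float(z_min), float(z_max)]  # formatted like "[0.0, 6.0]"
    return (f"model_{tag}_z{z_range}_MDepth{max_depth}"
            f"_MLNodes{max_leaf_nodes}_nTrees{n_trees}.pkl.gz")


print(rf_model_filename('DR9s_LOW', 0.0, 6.0))
print(rf_model_filename('DR9s_HighZ', 3.2, 6.0))
```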

------------
## 4) Some tests

In [None]:
from desitarget.train.train_test_rf.Some_tests import make_some_tests_and_plots

inputFile = f'{DIR}TestSample_DR9s.fits'
RF_file = f'{DIR}RFmodel/DR9s_LOW/model_DR9s_LOW_z[0.0, 6.0]_MDepth25_MLNodes850_nTrees500.pkl.gz'
RF_Highz_file = f'{DIR}RFmodel/DR9s_HighZ/model_DR9s_HighZ_z[3.2, 6.0]_MDepth25_MLNodes850_nTrees500.pkl.gz'

make_some_tests_and_plots(inputFile, RF_file, RF_Highz_file)