# Step 1: RFs and DL2 (data and MCs) scripts

In this notebook, we will run the magic-cta-pipe (MCP) scripts on a small DL1 data sample. Because of time constraints, it is not feasible to run the pipeline on the full dataset needed to produce meaningful plots, so we have provided a complete dataset to get 'nice' plots, and a few MC and *.h5* data files on which to try the pipeline.


### First we import a few basic modules

In [None]:
import glob
import logging
import os
import subprocess
import sys
log = logging.getLogger()

### Random Forest training

Here we will train the Random Forests (RFs): two regressors, for the reconstructed energy and disp, and one gamma/hadron classifier, which provides a gammaness value for each event.

Events are separated according to their `combo_type` (each event has exactly one combo type: MI+MII, LST1+MI, LST1+MII or LST1+MI+MII) and these subsamples are used to train telescope-wise RFs. So you will get 4 classifier output files (one per combination), and in each file you will find one RF for each telescope.
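As an illustration of this split, the sketch below groups a toy list of events by combo type. The event records and the string labels are made up for the example; the DL1 files store `combo_type` in their own encoding, and this is not the MCP code.

```python
from collections import defaultdict

# Toy events as (event_id, combo_type); the labels are illustrative only
events = [
    (1, "LST1+MI+MII"),
    (2, "MI+MII"),
    (3, "LST1+MI"),
    (4, "LST1+MI+MII"),
    (5, "LST1+MII"),
]


def split_by_combo_type(events):
    """Return a dict mapping each combo type to the list of its event ids."""
    subsamples = defaultdict(list)
    for event_id, combo_type in events:
        subsamples[combo_type].append(event_id)
    return dict(subsamples)


subsamples = split_by_combo_type(events)
for combo_type, ids in sorted(subsamples.items()):
    print(combo_type, ids)
```

Each subsample would then be used to train its own set of telescope-wise RFs.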

The events used to train the RFs are randomly extracted from the files listed in the input folders. Gammas are used to train all three types of RFs (i.e. energy, direction and gammaness), while protons are used only to train the classifiers.

We could run the script directly in the terminal:

>$ python lst1_magic_train_rfs.py -g Path1 -p Path2 -o Path3 -c configfile --train-energy --train-disp --train-classifier --use-unsigned

with the following options:  

-g: MC stereo DL1 gammas directory (diffuse)

-p: MC stereo DL1 protons directory (train sample)

-o: directory to save the output

-c: configuration file

The "--train-*" flags select which RFs to train: the energy regressor, the disp regressor and the gammaness classifier. We can train one, two or all three types in a single launch, but all of them (energy, disp, classifier) are needed for the next step, which is to convert DL1 to DL2.
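To make the choice of flags explicit, here is a minimal sketch of a helper that assembles the argument list for the training script. The helper `build_train_command` and its parameter names are hypothetical (they are not part of MCP); only the script name and the flags come from the command shown above.

```python
def build_train_command(gamma_dir, proton_dir, out_dir, config_file,
                        rf_types=("energy", "disp", "classifier"),
                        use_unsigned=True):
    """Assemble the argument list for lst1_magic_train_rfs.py (hypothetical helper)."""
    allowed = ("energy", "disp", "classifier")
    unknown = [t for t in rf_types if t not in allowed]
    if unknown:
        raise ValueError(f"Unknown RF type(s): {unknown}")
    command = [
        "python", "lst1_magic_train_rfs.py",
        "-g", gamma_dir,
        "-p", proton_dir,
        "-o", out_dir,
        "-c", config_file,
    ]
    # One --train-* flag per requested RF type
    command += [f"--train-{t}" for t in rf_types]
    if use_unsigned:
        command.append("--use-unsigned")
    return command


# Train only the energy regressor; the paths are the placeholders used above
cmd = build_train_command("Path1", "Path2", "Path3", "configfile", rf_types=("energy",))
print(" ".join(cmd))
```

The resulting list can be passed directly to `subprocess.run()`.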

In the MCP configuration file (config.yaml), the option "gamma_offaxis" is used for selecting only MC gammas lying in a predefined ring. Here we use a ring with minimum radius of 0.2$^{\circ}$ and maximum of 0.5$^{\circ}$, while the wobble of our target is 0.4$^{\circ}$. If instead we set the minimum/maximum radius as "null", then we use all the gammas in the field.
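The ring selection can be pictured with the following sketch, which is not the MCP implementation. The function `select_ring` is hypothetical, the off-axis values are toy numbers, and the use of strict inequalities at the ring edges is an assumption. `None` plays the role of the YAML "null".

```python
def select_ring(offaxis_deg, min_deg=None, max_deg=None):
    """Keep off-axis angles inside the ring; None (YAML null) disables that bound."""
    selected = []
    for angle in offaxis_deg:
        if min_deg is not None and angle <= min_deg:
            continue
        if max_deg is not None and angle >= max_deg:
            continue
        selected.append(angle)
    return selected


offaxis_deg = [0.05, 0.25, 0.40, 0.48, 0.70, 1.20]  # toy values in degrees

in_ring = select_ring(offaxis_deg, min_deg=0.2, max_deg=0.5)
all_gammas = select_ring(offaxis_deg)  # both bounds "null": keep everything

print(in_ring)
print(len(all_gammas))
```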

Warning: for a real training you will need many proton and gamma runs for every pointing direction, so it is better to merge the input runs into a single *.h5* file per pointing.

With the option "--use-unsigned", the absolute values of the features are used to train the RFs.
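As a toy illustration of what "unsigned" means here, the sketch below replaces every value in a small feature table by its absolute value. The helper `to_unsigned`, the feature names and the numbers are placeholders chosen for the example, not taken from the MCP code or configuration.

```python
def to_unsigned(features):
    """Return a copy of the feature table with every value replaced by its absolute value."""
    return {name: [abs(value) for value in values] for name, values in features.items()}


# Toy feature table; names and numbers are illustrative only
features = {
    "feature_a": [-0.8, 0.3, -0.1],
    "feature_b": [2.5, -4.0, 0.0],
}

unsigned = to_unsigned(features)
print(unsigned)
```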

In [None]:
%%capture
#os.system("sed -n '80,200p' /fefs/aswg/workspace/2023_joint_analysis_school/RFs_and_DL2/input/config_dyn.yaml")
os.system("sed -n '80,200p' /home/dipierr/data/software-school-2023/RFs_and_DL2/input/config_dyn.yaml")

Let's start by setting up the data paths (e.g.: /fefs/aswg/workspace/federico.dipierro/magic-sw-sc2023-data):

In [None]:
#the output will be the RFs
#dir_rf=('/fefs/aswg/workspace/federico.dipierro/magic-sw-sc2023-data/RF')
#dir_rf=('...../RF/')
dir_rf=('/home/dipierr/data/software-school-2023/RFs_and_DL2/input/RF/')

# create the output directory if it does not exist yet
os.makedirs(dir_rf, exist_ok=True)
f=open(f'{dir_rf}/RF.log','w')

#the input files are the Train samples
#dir_gamma_train=('/fefs/aswg/workspace/2023_joint_analysis_school/RFs_and_DL2/input/input_step_1/DL1_stereo/gamma/train/')
#dir_proton_train=('/fefs/aswg/workspace/2023_joint_analysis_school/RFs_and_DL2/input/input_step_1/DL1_stereo/proton/train/')
dir_gamma_train=('/home/dipierr/data/software-school-2023/RFs_and_DL2/input/input_step_1/DL1_stereo/gamma/train/')
dir_proton_train=('/home/dipierr/data/software-school-2023/RFs_and_DL2/input/input_step_1/DL1_stereo/proton/train/')

#and the configuration file
#config=('/fefs/aswg/workspace/2023_joint_analysis_school/RFs_and_DL2/input/config_dyn.yaml')
config=('/home/dipierr/data/software-school-2023/RFs_and_DL2/input/config_dyn.yaml')

# and the scripts' folders
#scripts=('/fefs/aswg/software/virtual_env/ctasoft/magic-cta-pipe/magicctapipe/scripts/lst1_magic/')
scripts=('/home/dipierr/CTA/magic-cta-pipe/magicctapipe/scripts/lst1_magic/')

Now we go to the scripts directory to launch them:

In [None]:
cd $scripts

Now we use Python's `subprocess.run()` to run the RF script and write its output to a log file:

In [None]:
a=subprocess.run(['python','lst1_magic_train_rfs.py', f'-g{dir_gamma_train}', f'-p{dir_proton_train}',\
    f'-o{dir_rf}', f'-c{config}', '--train-energy', '--train-disp', '--train-classifier',\
        '--use-unsigned'], stdout=f, stderr=f) 
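
The object returned by `subprocess.run()` carries the exit code of the script, which is a quick way to spot a failed run before opening the log. A minimal sketch, using a one-line stand-in command and a temporary log file instead of the MCP script and `RF.log`:

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the MCP script: a one-line Python command that prints a message
command = [sys.executable, "-c", "print('training finished')"]

log_path = os.path.join(tempfile.mkdtemp(), "example.log")

# The with-block closes the log file once the subprocess has finished
with open(log_path, "w") as log_file:
    result = subprocess.run(command, stdout=log_file, stderr=subprocess.STDOUT)

if result.returncode != 0:
    print(f"Command failed with exit code {result.returncode}, see {log_path}")

with open(log_path) as log_file:
    log_text = log_file.read()
print(log_text)
```

A non-zero `returncode` means the script stopped with an error, and the log file then contains the traceback.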

Let's have a look at the output files produced:

In [None]:
ls $dir_rf/

We can inspect the log file directly in Jupyter with:

In [None]:
more $dir_rf/RF.log

It should look like this:

```
Gamma off-axis angles allowed:
    min: 0.2 deg
    max: 0.5 deg 
    [...]
```    

With more input files (here 19 pointing directions, called nodes, each with hundreds of runs, along the Crab declination line), the log looks like this:

In [None]:
more /home/dipierr/data/software-school-2023/RFs_and_DL2/log/RF_Train.log

### Converting DL1 to DL2 

Here we will use the RFs to convert DL1 into DL2 (both MCs and real data). DL2 data contain events whose energy, direction and gammaness are evaluated by the telescope-wise RFs, so that, if all three telescopes see the same event, this event will have three energy/direction/gammaness values, one for each telescope.

Since the events are separated by `combo_type`, the script uses the appropriate RFs (i.e. those for the right telescope and the right `combo_type`).

The arrival direction reconstructed by each individual telescope is determined by the MCP script with the MARS-like DISP method, which looks for the minimum angular distance among all the head and tail candidates, as shown in the figure below:

![head](./figures/head_tail.png)
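
The head/tail choice can be sketched as follows. This is a simplified illustration under stated assumptions, not the MCP implementation: positions are treated as flat 2D coordinates in degrees (Euclidean distances instead of angular separations on the sky), each image is described by a hypothetical tuple (centroid x, centroid y, axis angle psi, disp), and the helper names are made up for the example.

```python
import itertools
import math


def candidates(x, y, psi, disp):
    """Two possible source positions along the image axis, one on each side of the centroid."""
    dx, dy = disp * math.cos(psi), disp * math.sin(psi)
    return [(x + dx, y + dy), (x - dx, y - dy)]


def choose_head_tail(images):
    """Pick, for each telescope, the candidate that minimises the summed pairwise distance."""
    per_tel = [candidates(*image) for image in images]
    best_combo, best_dist = None, math.inf
    # Try every head/tail combination (2 choices per telescope)
    for combo in itertools.product(*per_tel):
        dist = sum(math.dist(p, q) for p, q in itertools.combinations(combo, 2))
        if dist < best_dist:
            best_combo, best_dist = combo, dist
    return best_combo, best_dist


# Toy images: (centroid x, centroid y, psi in rad, disp), positions in degrees
images = [
    (1.0, 0.0, 0.0, 1.0),          # candidates near (2, 0) and (0, 0)
    (0.0, 1.0, math.pi / 2, 1.0),  # candidates near (0, 2) and (0, 0)
    (-1.0, 0.0, 0.0, 1.0),         # candidates near (0, 0) and (-2, 0)
]

best_combo, best_dist = choose_head_tail(images)
print(best_combo, best_dist)
```

In this toy case the three telescopes agree on the candidate at the origin, so the selected combination has a summed distance close to zero.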


We can run the script directly in the terminal, or here in the Jupyter notebook, with:

>$ python lst1_magic_dl1_stereo_to_dl2.py -d Path1 -r Path2 -o Path3

with the following options:  

-d: input file (DL1 stereo, MCs or real data; test sample gammas are ring-wobble ($0.4^{\circ}$), test sample protons are diffuse)

-r: directory where you stored your RFs

-o: output directory, to store DL2 files

#### Let's start with the MC
In case the directories are not already there, create them.

In [None]:
#dir_dl2_g=('/fefs/aswg/workspace/federico.dipierro/magic-sw-sc2023-data/DL2/gamma')
#dir_dl2_p=('/fefs/aswg/workspace/federico.dipierro/magic-sw-sc2023-data/DL2/proton')
#dir_dl2_g=('...../DL2/gamma')
#dir_dl2_p=('...../DL2/proton')
dir_dl2_g=('/home/dipierr/data/software-school-2023/DL2/gamma')
dir_dl2_p=('/home/dipierr/data/software-school-2023/DL2/proton')

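# create the output directories if they do not exist yet
os.makedirs(dir_dl2_g, exist_ok=True)
os.makedirs(dir_dl2_p, exist_ok=True)

f_g=open(f'{dir_dl2_g}/DL2_mc_gamma.log','w')
f_p=open(f'{dir_dl2_p}/DL2_mc_proton.log','w')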

#scripts=('/fefs/aswg/software/virtual_env/ctasoft/magic-cta-pipe/magicctapipe/scripts/lst1_magic/')
scripts=('/home/dipierr/CTA/magic-cta-pipe/magicctapipe/scripts/lst1_magic/')

#dir_dl1_gamma_test=('/fefs/aswg/workspace/2023_joint_analysis_school/RFs_and_DL2/input/input_step_1/DL1_stereo/gamma/test/*.h5')   #test gammas
#dir_dl1_proton_test=('/fefs/aswg/workspace/2023_joint_analysis_school/RFs_and_DL2/input/input_step_1/DL1_stereo/proton/test/*.h5') #test protons
dir_dl1_gamma_test=('/home/dipierr/data/software-school-2023/RFs_and_DL2/input/input_step_1/DL1_stereo/gamma/test/*.h5')   #test gammas
dir_dl1_proton_test=('/home/dipierr/data/software-school-2023/RFs_and_DL2/input/input_step_1/DL1_stereo/proton/test/*.h5') #test protons

We move to the scripts directory to launch them:

In [None]:
cd $scripts

We collect the input files from the gamma and proton folders:

In [None]:
input_file_gamma = glob.glob(dir_dl1_gamma_test)
input_file_gamma.sort()
input_file_proton = glob.glob(dir_dl1_proton_test)
input_file_proton.sort()
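
If the pattern matches nothing, `glob.glob()` returns an empty list and the loops over the files silently do nothing. A small sketch of a guard against this, where `find_inputs` is a hypothetical helper and the temporary directory with empty files stands in for the DL1 folders:

```python
import glob
import os
import tempfile


def find_inputs(pattern):
    """Return the sorted list of files matching pattern, or raise if none are found."""
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"No input files match {pattern}")
    return files


# Toy directory with two empty files standing in for DL1 inputs
tmp_dir = tempfile.mkdtemp()
for name in ("run_b.h5", "run_a.h5"):
    open(os.path.join(tmp_dir, name), "w").close()

files = find_inputs(os.path.join(tmp_dir, "*.h5"))
names = [os.path.basename(path) for path in files]
print(names)
```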

Here we use Python's `subprocess.run()` to run the script on each file and write the output to a log file:

In [None]:
for input_file in input_file_gamma: 
    b=subprocess.run(['python','lst1_magic_dl1_stereo_to_dl2.py', f'-d{input_file}', f'-r{dir_rf}',\
        f'-o{dir_dl2_g}'], stdout=f_g, stderr=f_g)     
        
for input_file in input_file_proton:
    c=subprocess.run(['python','lst1_magic_dl1_stereo_to_dl2.py', f'-d{input_file}', f'-r{dir_rf}',\
        f'-o{dir_dl2_p}'], stdout=f_p, stderr=f_p)

To check the log file we do:

In [None]:
more $dir_dl2_g/DL2_mc_gamma.log

In [None]:
more $dir_dl2_p/DL2_mc_proton.log

This step created the DL2 files for the MC gammas and protons and stored them in the DL2 directories.

#### Real data

Let's repeat the process for the real data:

In [None]:
#dir_dl2_real=('/fefs/aswg/workspace/federico.dipierro/magic-sw-sc2023-data/DL2/real')
#dir_dl2_real=('...../DL2/real')
dir_dl2_real=('/home/dipierr/data/software-school-2023/DL2/real')

# create the output directory if it does not exist yet
os.makedirs(dir_dl2_real, exist_ok=True)
f=open(f'{dir_dl2_real}/DL2_data.log','w')

#scripts=('/fefs/aswg/software/virtual_env/ctasoft/magic-cta-pipe/magicctapipe/scripts/lst1_magic/')
scripts=('/home/dipierr/CTA/magic-cta-pipe/magicctapipe/scripts/lst1_magic/')

#input_data=('/fefs/aswg/workspace/2023_joint_analysis_school/RFs_and_DL2/input/input_step_1/DL1_stereo/real/*.h5')
input_data=('/home/dipierr/data/software-school-2023/RFs_and_DL2/input/input_step_1/DL1_stereo/real/*.h5')

We move to the scripts directory to launch them:

In [None]:
cd $scripts

We collect the input files from the data folder:

In [None]:
input_file_data = glob.glob(input_data)
input_file_data.sort()

Here we use Python's `subprocess.run()` to run the script on each file and write the output to a log file:

In [None]:
for input_file in input_file_data: 
    d=subprocess.run(['python','lst1_magic_dl1_stereo_to_dl2.py', f'-d{input_file}', f'-r{dir_rf}',\
        f'-o{dir_dl2_real}'], stdout=f, stderr=f)     

To check the log file:

In [None]:
more $dir_dl2_real/DL2_data.log

This step created the DL2 files for the real data and stored them in the DL2 directory.