# Step 0: RFs and DL2 (data and MCs) scripts

In this notebook, we will run the magic-cta-pipe (MCP) scripts on a small DL1 data sample. Because of time constraints, it is not feasible to run the pipeline on the full dataset needed to produce the final plots (e.g. the SED). Therefore, we provide you with a complete dataset to get to the 'nice' plots, plus a few MC and real-data *.h5* files to run the pipeline on.
You will have to provide the path to the MCP scripts and the paths (including file names) where you want to save the log files.

### This time we will import only a few basic modules

In [29]:
import glob
import logging
import os
import subprocess
import sys
log = logging.getLogger()

### Random Forest training

Here we will train the Random Forests (RFs): two regressors, for the reconstructed energy and direction (disp), and one gamma/hadron classifier, which assigns a gammaness value to each event.

Events are separated according to their `combo_type` (each event has exactly one combo type: MI+MII, LST1+MI, LST1+MII or LST1+MI+MII), and each subsample is used to train telescope-wise RFs. So you will get 4 classifier output files (one per combination), and each file contains one RF for each telescope in that combination.
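A minimal sketch of this split, using only the standard library and a hypothetical flat list of telescope-wise rows `(event_id, tel_id, combo_type)`. MCP itself works on tables read from the DL1 files; the helper name and the labels below are illustrative only:

```python
from collections import defaultdict

# Hypothetical telescope-wise rows (event_id, tel_id, combo_type); illustrative only
rows = [
    (1, "LST1", "LST1+MI+MII"), (1, "MI", "LST1+MI+MII"), (1, "MII", "LST1+MI+MII"),
    (2, "MI", "MI+MII"), (2, "MII", "MI+MII"),
    (3, "LST1", "LST1+MI"), (3, "MI", "LST1+MI"),
    (4, "LST1", "LST1+MII"), (4, "MII", "LST1+MII"),
]


def split_training_samples(rows):
    """Group the rows into one training subsample per (combo_type, telescope) pair."""
    samples = defaultdict(list)
    for event_id, tel_id, combo in rows:
        samples[combo, tel_id].append(event_id)
    return dict(samples)


samples = split_training_samples(rows)
for (combo, tel_id), event_ids in sorted(samples.items()):
    print(f"{combo:12s} {tel_id:5s} -> events {event_ids}")
```

Each key stands for one telescope-wise RF of a given type, so the four combinations give 2 + 2 + 2 + 3 = 9 RFs per type.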

Events used to train the RFs are randomly extracted from the files in the input folders. Gammas are used to train all three types of RFs (energy, direction and gammaness), while protons are used only to train the classifiers.
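The selection logic above can be sketched as follows, assuming plain lists of event IDs. The helper name is hypothetical, and the seed 42 is reused only because it is the `random_state` value of the RFs in the configuration shown below; how MCP actually draws the training events is not shown here:

```python
import random

# Which particle types feed which RF type, as stated in the text above
TRAINING_PARTICLES = {
    "energy": ("gamma",),
    "disp": ("gamma",),
    "classifier": ("gamma", "proton"),
}


def draw_training_events(event_ids, n_events, seed=42):
    """Randomly draw up to n_events distinct events; the fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    return rng.sample(event_ids, min(n_events, len(event_ids)))


gamma_ids = list(range(1000))  # hypothetical event IDs
train_gammas = draw_training_events(gamma_ids, 100)
print(len(train_gammas), TRAINING_PARTICLES["classifier"])  # → 100 ('gamma', 'proton')
```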

We can run the script directly in a terminal (or here in Jupyter, as we show later) with:

>$ python lst1_magic_train_rfs.py -g Path1 -p Path2 -o Path3 -c configfile --train-energy --train-disp --train-classifier --use-unsigned

with the following options:

-g: MC stereo DL1 gammas directory (diffuse)

-p: MC stereo DL1 protons directory (train sample)

-o: directory to save the output

-c: configuration file

The flags `--train-energy`, `--train-disp` and `--train-classifier` select which RFs are trained (energy regressor, disp regressor and classifier, respectively). You can train one, two or all three types in a single run, but you will need all of them for the next step, the conversion from DL1 to DL2.
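A small hypothetical helper (not part of MCP) that turns a set of RF names into the `--train-X` flags used in the command above and reports which RF types the DL2 step would still be missing:

```python
REQUIRED_RFS = ("energy", "disp", "classifier")  # all three are needed for DL1 to DL2


def train_flags(rf_types):
    """Map RF names to --train-X flags and list the RF types DL2 would still be missing."""
    unknown = set(rf_types) - set(REQUIRED_RFS)
    if unknown:
        raise ValueError(f"Unknown RF type(s): {sorted(unknown)}")
    flags = [f"--train-{rf}" for rf in REQUIRED_RFS if rf in rf_types]
    missing = [rf for rf in REQUIRED_RFS if rf not in rf_types]
    return flags, missing


flags, missing = train_flags({"energy", "disp"})
print(flags)    # → ['--train-energy', '--train-disp']
print(missing)  # → ['classifier']
```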

In the configuration file, the option "gamma_offaxis" selects only the MC gammas lying in a predefined ring of off-axis angles. Here we use a ring with a minimum radius of 0.2$^{\circ}$ and a maximum of 0.5$^{\circ}$, while the wobble offset of our target is 0.4$^{\circ}$. If instead we set the minimum/maximum radius to "null", all the gammas in the field of view are used.
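A sketch of the ring selection, assuming that the off-axis angle is the great-circle distance between the true gamma direction and the telescope pointing (both in alt/az, degrees) and that both ring edges are inclusive. The helper names are hypothetical, and `None` plays the role of "null":

```python
import math


def angular_separation_deg(alt1, az1, alt2, az2):
    """Great-circle distance between two (alt, az) directions; all angles in degrees."""
    a1, z1, a2, z2 = map(math.radians, (alt1, az1, alt2, az2))
    cos_sep = math.sin(a1) * math.sin(a2) + math.cos(a1) * math.cos(a2) * math.cos(z1 - z2)
    return math.degrees(math.acos(min(1.0, max(-1.0, cos_sep))))  # clip rounding errors


def in_offaxis_ring(offaxis_deg, min_deg=0.2, max_deg=0.5):
    """True if the off-axis angle lies inside [min_deg, max_deg]; None disables a bound."""
    if min_deg is not None and offaxis_deg < min_deg:
        return False
    if max_deg is not None and offaxis_deg > max_deg:
        return False
    return True


offaxis = [0.05, 0.25, 0.4, 0.45, 0.7]
print([o for o in offaxis if in_offaxis_ring(o)])              # → [0.25, 0.4, 0.45]
print([o for o in offaxis if in_offaxis_ring(o, None, None)])  # → [0.05, 0.25, 0.4, 0.45, 0.7]
```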

Warning: for a real training you will need many proton and gamma runs for every pointing direction, so it is better to merge the input runs into a single *.h5* file per pointing.
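The grouping step that precedes such a merge can be sketched as below. Only the bookkeeping is shown: reading the pointing from each file and writing the merged *.h5* are left out, the file names and pointings are hypothetical, and rounding pointings to 0.1$^{\circ}$ is an assumption:

```python
from collections import defaultdict


def group_runs_by_pointing(run_pointings, ndigits=1):
    """Group run files by (zenith, azimuth) in degrees, so that each group can be merged."""
    groups = defaultdict(list)
    for path, (zenith, azimuth) in run_pointings.items():
        groups[round(zenith, ndigits), round(azimuth, ndigits)].append(path)
    return {pointing: sorted(paths) for pointing, paths in groups.items()}


# Hypothetical file names and pointings
runs = {
    "gamma_run001.h5": (10.0, 102.2),
    "gamma_run002.h5": (10.0, 102.2),
    "gamma_run003.h5": (23.6, 248.1),
}
for pointing, paths in group_runs_by_pointing(runs).items():
    print(pointing, paths)
```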

The `--use-unsigned` flag trains the RFs on the absolute values of the features ("intensity", "length", etc., as listed in the configuration printed below). Taking the absolute value only changes features that can be negative.
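What "absolute values of the features" means for a single image can be sketched with a plain dictionary. The feature values are hypothetical, and whether MCP applies this to every feature or only to a subset of the signed ones is not stated here:

```python
def to_unsigned(features):
    """Return a copy of an image's features with every value replaced by its absolute value."""
    return {name: abs(value) for name, value in features.items()}


# Hypothetical feature values of one telescope image
event = {"intensity": 250.0, "length": 0.12, "skewness": -0.35, "slope": -1.8}
print(to_unsigned(event))
# → {'intensity': 250.0, 'length': 0.12, 'skewness': 0.35, 'slope': 1.8}
```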

In [30]:
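# Print lines 80-200 of the configuration file, which contain the RF settings
with open('/home/raniere/Documentos/MAGIC/School_notebooks/config_dyn.yaml') as config_file:
    print(''.join(config_file.readlines()[79:200]))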



energy_regressor:
    settings:
        n_estimators: 150
        criterion: "squared_error"
        max_depth: 50
        min_samples_split: 2
        min_samples_leaf: 2
        min_weight_fraction_leaf: 0.0
        max_features: 1.0
        max_leaf_nodes: null
        min_impurity_decrease: 0.0
        bootstrap: true
        oob_score: false
        n_jobs: 5
        random_state: 42
        verbose: 0
        warm_start: false
        ccp_alpha: 0.0
        max_samples: null

    features: [
        "intensity",
        "length",
        "width",
        "skewness",
        "kurtosis",
        "slope",
        "intensity_width_2",
        "h_max",
        "impact",
        "pointing_alt",
        "pointing_az",
    ]

    gamma_offaxis:
        min: 0.2 deg
        max: 0.5 deg


disp_regressor:
    settings:
        n_estimators: 150
        criterion: "squared_error"
        max_depth: 50
        min_samples_split: 2
        min_samples_leaf: 2
        min_weight_fraction_lea

Let's start by setting up the data paths:

In [31]:
# Log file for the RF training output
f = open('/home/raniere/Documentos/MAGIC/School_notebooks/RF.log', 'w')

scripts = '/home/raniere/Documentos/magic-cta-pipe/magicctapipe/scripts/lst1_magic/'  # MCP scripts
input_dir_gamma = '/home/raniere/Documentos/MAGIC/School_notebooks/data/MC/gammadiffuse/'      # diffuse MC gammas
input_dir_proton = '/home/raniere/Documentos/MAGIC/School_notebooks/data/MC/protons/train/'    # MC protons, train sample
output_dir_rf = '/home/raniere/Documentos/MAGIC/School_notebooks/Results_RF/'                 # where the RFs are saved
config = '/home/raniere/Documentos/MAGIC/School_notebooks/config_dyn.yaml'                    # configuration file

Now we go to the scripts directory to launch them:

In [32]:
cd $scripts

/home/raniere/Documentos/magic-cta-pipe/magicctapipe/scripts/lst1_magic


Now we use Python's `subprocess.run()` to run the RF script and write its output to a log file:

In [33]:
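# Train all three RF types; stdout and stderr both go to RF.log
a = subprocess.run([sys.executable, 'lst1_magic_train_rfs.py', '-g', input_dir_gamma, '-p', input_dir_proton,
                    '-o', output_dir_rf, '-c', config, '--train-energy', '--train-disp',
                    '--train-classifier', '--use-unsigned'], stdout=f, stderr=f)
f.close()  # flush and close the log so it can be read below
print('Return code:', a.returncode)  # 0 means the script exited without errors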

We can check the log file here in Jupyter with:

In [34]:
more /home/raniere/Documentos/MAGIC/School_notebooks/RF.log

It should look like:

```
Gamma off-axis angles allowed:
    min: 0.2 deg
    max: 0.5 deg 
    [...]
```    
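A quick way to spot a failed run without reading the whole log is to search it for typical error markers. A sketch, where the helper name and the markers ("Traceback", "Error") are our own heuristic choices, not something MCP defines:

```python
import os
import tempfile


def find_error_lines(log_path, markers=("Traceback", "Error")):
    """Return the lines of a log file that contain any of the given markers."""
    with open(log_path) as log_file:
        return [line.rstrip("\n") for line in log_file if any(m in line for m in markers)]


# Demonstration on a temporary dummy log
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("INFO: training\nTraceback (most recent call last):\nValueError: bad input\n")
demo = find_error_lines(tmp.name)
os.remove(tmp.name)
print(demo)  # → ['Traceback (most recent call last):', 'ValueError: bad input']
```

Calling `find_error_lines('/home/raniere/Documentos/MAGIC/School_notebooks/RF.log')` after the training should return an empty list if the script ran cleanly.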

### Converting DL1 to DL2 

Here we will use the RFs to convert DL1 into DL2 (for both MCs and real data). DL2 files contain events whose energy, direction and gammaness are estimated telescope-wise by the RFs: if all three telescopes see the same event, this event will have three energy/direction/gammaness values, one per telescope.
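To picture this telescope-wise structure, here is a sketch with a hypothetical, flattened DL2 table (column layout and values are invented for illustration). How MCP later combines these values into one estimate per event is not covered here:

```python
from collections import defaultdict

# Hypothetical telescope-wise DL2 rows: (obs_id, event_id, tel_id, reco_energy_tev, gammaness)
dl2_rows = [
    (1, 7, "LST1", 0.52, 0.91),
    (1, 7, "MI", 0.47, 0.88),
    (1, 7, "MII", 0.55, 0.93),
    (1, 8, "MI", 1.30, 0.12),
    (1, 8, "MII", 1.10, 0.20),
]


def estimates_per_event(rows):
    """Collect the telescope-wise (energy, gammaness) estimates of each stereo event."""
    per_event = defaultdict(dict)
    for obs_id, event_id, tel_id, energy, gammaness in rows:
        per_event[obs_id, event_id][tel_id] = (energy, gammaness)
    return dict(per_event)


for event, estimates in estimates_per_event(dl2_rows).items():
    print(event, len(estimates), "telescope-wise estimates:", estimates)
```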

Since the events are separated by `combo_type`, each event has to be processed with the RFs trained on its own combination.
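A sketch of this lookup: the set of telescopes that saw an event determines its combination, and each telescope then gets the RF trained for that combination. The labels and the placeholder "RFs" (plain strings) are hypothetical stand-ins for the trained models:

```python
# Hypothetical labels; each event's telescopes determine its combo_type
COMBO_TYPES = {
    frozenset({"MI", "MII"}): "MI+MII",
    frozenset({"LST1", "MI"}): "LST1+MI",
    frozenset({"LST1", "MII"}): "LST1+MII",
    frozenset({"LST1", "MI", "MII"}): "LST1+MI+MII",
}


def rfs_for_event(telescopes, rfs):
    """Pick, for each telescope of the event, the RF trained on the event's combo_type."""
    combo = COMBO_TYPES.get(frozenset(telescopes))
    if combo is None:
        raise ValueError(f"No RFs trained for telescope combination {sorted(telescopes)}")
    return {tel_id: rfs[combo, tel_id] for tel_id in telescopes}


# Placeholder RFs keyed by (combo_type, telescope)
rfs = {(combo, tel): f"RF[{combo}, {tel}]" for key, combo in COMBO_TYPES.items() for tel in key}
print(rfs_for_event(["LST1", "MII"], rfs))
# → {'LST1': 'RF[LST1+MII, LST1]', 'MII': 'RF[LST1+MII, MII]'}
```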

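To reconstruct the arrival directions, the MCP script uses the MARS-like DISP method. For each telescope, the disp RF estimates the angular distance between the image centroid and the source position along the image major axis, which gives two candidate positions (head and tail); the method then looks for the combination of head and tail candidates with the minimum angular distance between them, as shown in the figure below: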

![head](./figures/head_tail.png)
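The candidate selection can be sketched as a brute-force search over all head/tail choices. This assumes flat camera-frame coordinates in degrees (small-angle approximation), uses the summed pairwise distance as the criterion and a plain mean of the chosen candidates as the final position; MCP may use different weights and conventions, and the function names are hypothetical:

```python
import itertools
import math


def disp_candidates(cog_x, cog_y, psi, disp):
    """Head and tail candidates: the two points at distance `disp` from the
    image centroid along the major axis (angle `psi`, radians)."""
    dx, dy = disp * math.cos(psi), disp * math.sin(psi)
    return [(cog_x + dx, cog_y + dy), (cog_x - dx, cog_y - dy)]


def disp_stereo_direction(images):
    """images: one (cog_x, cog_y, psi, disp) tuple per telescope (degrees, psi in radians).
    Picks one candidate per telescope so that the summed pairwise distance is minimal,
    and returns the chosen candidates and their plain mean."""
    per_telescope = [disp_candidates(*image) for image in images]
    best, best_dist = None, math.inf
    for combo in itertools.product(*per_telescope):  # 2**N combinations
        dist = sum(math.dist(p, q) for p, q in itertools.combinations(combo, 2))
        if dist < best_dist:
            best, best_dist = combo, dist
    mean_x = sum(x for x, _ in best) / len(best)
    mean_y = sum(y for _, y in best) / len(best)
    return best, (mean_x, mean_y)


# Two hypothetical images of a source at (0, 0) deg in the camera frame
images = [(0.5, 0.0, 0.0, 0.5), (0.0, 0.3, math.pi / 2, 0.3)]
chosen, position = disp_stereo_direction(images)
print(position)  # should be close to (0, 0)
```

With at most three telescopes there are only 2³ = 8 combinations, so the brute-force search is cheap.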


We can run the script directly in a terminal (or here in Jupyter, as we show below) with:

>$ python lst1_magic_dl1_stereo_to_dl2.py -d Path1 -r Path2 -o Path3

with the following options:

-d: input file (DL1 stereo, MCs or real data; test sample gammas are ring-wobble ($0.4^{\circ}$), test sample protons are diffuse)

-r: directory where you stored your RFs

-o: output directory, to store DL2 files

#### Let's start with the MC

In [35]:
# Log file for the MC DL1 to DL2 conversion
f = open('/home/raniere/Documentos/MAGIC/School_notebooks/DL2_mc.log', 'w')

scripts = '/home/raniere/Documentos/magic-cta-pipe/magicctapipe/scripts/lst1_magic/'
input_gamma = '/home/raniere/Documentos/MAGIC/School_notebooks/data/MC/gammas/test/*.h5'    # test gammas
input_proton = '/home/raniere/Documentos/MAGIC/School_notebooks/data/MC/protons/test/*.h5'  # test protons
input_dir_rf = '/home/raniere/Documentos/MAGIC/School_notebooks/Results_RF/'                # trained RFs
output_dir_dl2 = '/home/raniere/Documentos/MAGIC/School_notebooks/DL2/'                     # DL2 output

Go to the scripts directory to launch them:

In [36]:
cd $scripts

/home/raniere/Documentos/magic-cta-pipe/magicctapipe/scripts/lst1_magic


Collect the input files from the gamma and proton folders:

In [37]:
input_file_gamma = sorted(glob.glob(input_gamma))    # test gamma files, in a reproducible order
input_file_proton = sorted(glob.glob(input_proton))  # test proton files

Here we use Python's `subprocess.run()` to run the script and write its output to a log file:

In [38]:
# Convert each MC test file (gammas first, then protons) from DL1 stereo to DL2
for input_file in input_file_gamma + input_file_proton:
    b = subprocess.run([sys.executable, 'lst1_magic_dl1_stereo_to_dl2.py', '-d', input_file,
                        '-r', input_dir_rf, '-o', output_dir_dl2], stdout=f, stderr=f)
    if b.returncode != 0:  # non-zero means the script failed for this file; see the log
        print('Failed:', input_file)
f.close()  # flush and close DL2_mc.log so it can be read below

To check the log file we do:

In [39]:
more /home/raniere/Documentos/MAGIC/School_notebooks/DL2_mc.log

This process just created DL2 files for the MC gammas and protons and put them in the DL2 directory.
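To confirm that the conversion actually produced output, we can list the files written to the DL2 directory after a given time. A minimal sketch; the helper name is hypothetical, and it assumes the DL2 files have the *.h5* extension:

```python
import glob
import os
import tempfile


def new_files(directory, pattern="*.h5", since=0.0):
    """Sorted files in `directory` matching `pattern`, modified at or after `since` (Unix time)."""
    paths = glob.glob(os.path.join(directory, pattern))
    return sorted(p for p in paths if os.path.getmtime(p) >= since)


# Demonstration on a temporary directory with one dummy .h5 file and one unrelated file
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("dl2_demo.h5", "notes.txt"):
        open(os.path.join(tmp_dir, name), "w").close()
    demo = [os.path.basename(p) for p in new_files(tmp_dir)]
print(demo)  # → ['dl2_demo.h5']
```

In the notebook, one could store `t0 = time.time()` (after `import time`) right before the conversion cell and then call `new_files(output_dir_dl2, since=t0)`.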

### Real data

Let's repeat the process for the real data:

In [40]:
# Log file for the real-data DL1 to DL2 conversion
f = open('/home/raniere/Documentos/MAGIC/School_notebooks/DL2_data.log', 'w')

input_data = '/home/raniere/Documentos/MAGIC/School_notebooks/data/*.h5'  # real DL1 stereo data
input_dir_rf = '/home/raniere/Documentos/MAGIC/School_notebooks/Results_RF/'
output_dir_dl2 = '/home/raniere/Documentos/MAGIC/School_notebooks/DL2/'

Go to the scripts directory to launch them:

In [41]:
cd $scripts

/home/raniere/Documentos/magic-cta-pipe/magicctapipe/scripts/lst1_magic


Collect the input files from the data folder:

In [42]:
input_file_data = sorted(glob.glob(input_data))  # real-data files, in a reproducible order

Here we use Python's `subprocess.run()` to run the script and write its output to a log file:

In [43]:
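for input_file in input_file_data:  # convert each real-data file from DL1 stereo to DL2
    d = subprocess.run([sys.executable, 'lst1_magic_dl1_stereo_to_dl2.py', '-d', input_file,
                        '-r', input_dir_rf, '-o', output_dir_dl2], stdout=f, stderr=f)
    if d.returncode != 0:  # non-zero means the script failed for this file; see the log
        print('Failed:', input_file)
f.close()  # flush and close DL2_data.log so it can be read below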

To check the log file:

In [45]:
more /home/raniere/Documentos/MAGIC/School_notebooks/DL2_data.log

This process just created DL2 files for the real data and put them in the DL2 directory.