# CERN GitLab Project's instructions (AE in AD)

## 1. Setup Environment for the AD project

### 1.1. Welcome to CERN lxplus

lxplus is a gateway service provided by CERN for accessing the Linux computing resources of the Worldwide LHC Computing Grid (WLCG).

It is part of CERN's centralized computing infrastructure, which supports the massive data processing and analysis needs of the LHC experiments.

<span style="color: red;"> Key question: What is the WLCG? </span>

It consists of a cluster of Linux servers running a standard CERN Linux distribution, which users can access via ssh.

<span style="color: red;"> Key question: What is a standard CERN Linux distribution? </span>

**Why is running on CERN lxplus important?**

Using CERN lxplus is important because of the computational resources, the environment, and the data access it provides.

* *You can access HPC Resources!*
    * This allows you to do **Heavy Computation**. CERN lxplus gives you access to powerful computing clusters that can handle the large-scale computations common in DL projects.
    * You can do **Parallel Computing** (many calculations are carried out simultaneously; i.e., large problems are divided into smaller ones that are solved at the same time). CERN lxplus HPC resources can speed up the training and testing of DL models.

* *Proximity to Data*
    * CERN lxplus is directly connected to the storage systems where **CMS experiment data** is stored. This allows faster data retrieval and processing.
    * CERN lxplus lets you avoid transferring **large datasets** to a local machine.

* *Consistency, Reproducibility and Collaboration*
    * The lxplus environment provides a **standardized setup** that ensures consistency when running experiments, which makes scientific results reproducible.
    * CERN lxplus is a **shared environment**. You can share code, data and results with a community of experts.

### 1.1. Install Miniconda

* *Accessing lxplus*
    * First, we need to log in to the lxplus servers. To do this, we open a terminal and connect over SSH (replace `spaucarm` with your own CERN username):

        `ssh -XY spaucarm@lxplus.cern.ch`

* *Install Miniconda3 on lxplus*
    * Miniconda is a minimal installer for Conda, a package and environment management system. To install Miniconda on lxplus, we run the following in the same terminal:

        <span style="color: green;"> # Download the Miniconda installer</span>
        
        `wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.10.3-Linux-x86_64.sh`

        <span style="color: green;"> # Run the installer</span>

        `bash ./Miniconda3-py39_4.10.3-Linux-x86_64.sh`

* *Clone the repository*
    * Next, clone the L1T AD GitLab repository, which contains the DNN and CNN autoencoder code.

        <span style="color: green;"> # Clone the repository</span>

        `git clone https://gitlab.cern.ch/l1a/l1_anomaly_ae.git`

        <span style="color: green;"> # Change directory to the repository (that is, move into the local copy of the L1T AD repository on lxplus)</span>

        `cd l1_anomaly_ae`

* *Setup the Conda environment*
    * Conda environments allow you to manage dependencies and versions for each project. We can set up the environment using the `environment.yml` file included in the repository:

        <span style="color: green;"> # Create the conda environment (the .yml file will be read and a new environment with all the specified dependencies will be created)</span>

        `conda env create --file=environment.yml`

* *Activate the Conda Environment*
    * Every time we log in to lxplus and want to work with this project, we will need to activate the environment:

        `conda activate l1ad`

        <span style="color: green;"> # We now have an environment on CERN lxplus that can train and test AEs with CMS L1T emulator inputs</span>
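
Before launching any script it can help to confirm that the right environment is active. The snippet below is a minimal sketch, not part of the repository: it relies on the fact that `conda activate` exports the `CONDA_DEFAULT_ENV` variable, and the helper names `active_conda_env` and `check_environment` are hypothetical.

```python
import os

EXPECTED_ENV = "l1ad"  # environment name used by `conda activate l1ad` above


def active_conda_env(environ=os.environ):
    """Return the name of the active Conda environment, or None if there is none."""
    # `conda activate` exports CONDA_DEFAULT_ENV with the environment's name.
    return environ.get("CONDA_DEFAULT_ENV")


def check_environment(expected=EXPECTED_ENV, environ=os.environ):
    """Return a short human-readable status message (hypothetical helper)."""
    current = active_conda_env(environ)
    if current == expected:
        return f"OK: Conda environment '{expected}' is active."
    return (
        f"Active environment is {current!r}, expected '{expected}'. "
        f"Run: conda activate {expected}"
    )


if __name__ == "__main__":
    print(check_environment())
```

Passing the environment mapping as an argument keeps the check easy to test without touching the real shell session.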
        
### 1.2. Training and testing DNN models
Let's break down the detailed instructions for running the ML code base for AD.

* *Train and evaluate DNN models*
    * The script `end2end.py` (included in the already cloned repository) handles the training and evaluation of DNN models using GT L1 information stored in HDF5 (`.h5`) files.

    *Key questions: Why are .h5 files important for this project? What is HDF5? What is the GT L1 information about?*

* *Running the Full Workflow*
    * To run the full training-testing process, we use the following command (still in the terminal, with the `l1ad` Conda environment active):

    `python dnn/end2end.py --run all --config dnn/config-PU65.yml`

    <span style="color: green;"> # python dnn/end2end.py: Executes the end2end.py script located in the 'dnn' directory. </span>

    <span style="color: green;"> # --run all: Specifies that both the training and testing steps should be executed. </span>

    <span style="color: green;"> # --config dnn/config-PU65.yml: Points to the YAML configuration file that contains various settings for the workflow (file locations, model parameters and training options). </span>

    <span style="color: red;"> Key question: What is a YAML file? Why is it important? </span>

* *Script Outputs*
    * The script performs the following actions: it trains the DNN model using the specified dataset, evaluates the model's performance, and saves the trained model and the evaluation results/plots in the output directory defined in the YAML configuration file.
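
The way a driver script can turn `--run` and `--config` into a sequence of steps can be sketched with `argparse`. This is an illustrative sketch under stated assumptions, not the actual contents of `end2end.py`: the function names `build_parser` and `steps_for` are hypothetical, and only the option names and the values `all`, `train` and `eval` come from these instructions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the two options used in the command above."""
    parser = argparse.ArgumentParser(description="Sketch of a train/eval driver")
    parser.add_argument("--run", choices=["all", "train", "eval"], default="all",
                        help="which steps of the workflow to execute")
    parser.add_argument("--config", required=True,
                        help="path to the YAML configuration file")
    return parser


def steps_for(run_mode):
    """Expand the --run value into the ordered list of steps to execute."""
    return ["train", "eval"] if run_mode == "all" else [run_mode]


if __name__ == "__main__":
    # An explicit argument list is parsed here; omit it to read the real command line.
    args = build_parser().parse_args(["--run", "all", "--config", "dnn/config-PU65.yml"])
    for step in steps_for(args.run):
        print(f"would run step '{step}' with config {args.config}")
```

Restricting `--run` with `choices` makes a mistyped step name fail immediately instead of silently doing nothing.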

### 1.3. Training and testing DNN models (Script Options)

* *Using QKeras*
    * To use QKeras (a quantization extension of Keras for implementing quantized neural networks), add the `--model_quantize` flag to the full-workflow command:

    `python dnn/end2end.py --run all --config dnn/config-PU65.yml --model_quantize`

* *Running Specific Steps*
    * If you want to run only the training or the evaluation step, set the `--run` option to `train` or `eval`:

    <span style="color: green;"> # Only training. </span>

    `python dnn/end2end.py --run train --config dnn/config-PU65.yml --model_quantize`

    <span style="color: green;"> # Only evaluation. </span>
    
    `python dnn/end2end.py --run eval --config dnn/config-PU65.yml --model_quantize`

* *Using Pre-dumped Data (Pickle File)*
    * During the training step, the script can dump data into a pickle file (the file is specified in the YAML configuration file and is saved in the directory from which the script is run). If this file already exists and we want to skip this step, we use the `--load_pickle` option:

    `python dnn/end2end.py --run all --config dnn/config-PU65.yml --load_pickle`

* *Using Pre-dumped Predictions (Results File)*
    * During the evaluation step, the script can dump predictions into a results file, `results.h5`. If you want to skip this step in subsequent runs (assuming no changes in the trained model), use the `--load_results` option:

    `python dnn/end2end.py --run all --config dnn/config-PU65.yml --load_results`
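
The idea behind `--load_pickle` and `--load_results` is a load-or-build cache: reuse a dump if it exists, otherwise compute the data and write the dump. The sketch below illustrates that pattern with the standard `pickle` module. It is not the code in `end2end.py`; `load_or_build` and `prepare_dataset` are hypothetical names, and the toy dictionary stands in for the real data.

```python
import os
import pickle
import tempfile


def load_or_build(path, build, load_cached):
    """Return cached data from `path` if requested and present, else build and dump it."""
    if load_cached and os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    data = build()
    with open(path, "wb") as handle:
        pickle.dump(data, handle)
    return data


def prepare_dataset():
    """Hypothetical stand-in for the expensive data-preparation step."""
    return {"features": [[0.1, 0.2], [0.3, 0.4]], "labels": [0, 1]}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = os.path.join(tmp, "data.pickle")
        first = load_or_build(cache, prepare_dataset, load_cached=True)   # builds and dumps
        second = load_or_build(cache, prepare_dataset, load_cached=True)  # reads the dump
        print(first == second)
```

A cached file is only valid while its inputs are unchanged, which is why the results cache above assumes no changes in the trained model. Pickle files should also only be loaded from trusted locations.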

### 1.4 Interactive Demo

An interactive Jupyter Notebook is also available; it provides an interface for running the training and evaluation steps:

`jupyter notebook dnn/End2End_demo.ipynb`

### 1.5 Firmware

The workflow also involves generating high-level synthesis (HLS) code from the neural network using `hls4ml`, which is then compiled into firmware. The firmware repository is `https://gitlab.cern.ch/ssummers/run3_ugt_ml`.

The compiled firmware is used for validation and is packaged into a Python module called `ugt_hls_emulator`.

<span style="color: red;"> Key question: What is meant by firmware? What is HLS? Why is it important?</span>

### 1.6. Training and testing DNN models (Deprecated. Why?)

Even though this method is marked as deprecated, understanding its process can provide useful insights. Why might this process have been deprecated?

* *Prepare the Data*

    To prepare the data, the following command was run:

    `python prepare_data.py --input-file QCD_preprocessed.h5 --input-bsm BSM_preprocessed.h5 --output-file data.pickle`

    <span style="color: green;"> python prepare_data.py: Executes the prepare_data.py script. </span>

    <span style="color: green;"> --input-file QCD_preprocessed.h5: Specifies the input file for preprocessed QCD data.</span>

    <span style="color: green;"> --input-bsm BSM_preprocessed.h5: Specifies the input file for preprocessed BSM data.</span>

    <span style="color: green;"> --output-file data.pickle: Specifies the output file where the processed data will be saved in pickle format.</span>

    * Reason for deprecation: The script has a hardcoded list of BSM signals, so adding new signals or removing unused ones means manually modifying the script. Hardcoding signals is neither flexible nor scalable; modern scripts might handle data inputs more dynamically.

* *Training* 

    For training, the following command was run:

    `python train.py --latent-dim 4 --output-model-h5 output_model.h5 --output-model-json output_model.json --input-data data.pickle --output-history history.h5 --batch-size 256 --n-epochs 30`

    <span style="color: green;"> python train.py: Executes the train.py script. </span>

    <span style="color: green;"> --latent-dim 4: Sets the latent dimension of the model. </span>

    <span style="color: green;"> --output-model-h5 output_model.h5: Specifies the filename for the trained model in HDF5 format. </span>

    <span style="color: green;"> --output-model-json output_model.json: Specifies the filename for the model architecture in JSON format. </span>

    <span style="color: green;"> --input-data data.pickle: Specifies the input data file in pickle format. </span>

    <span style="color: green;"> --output-history history.h5: Specifies the filename for saving the training history in HDF5 format. </span>

    <span style="color: green;"> --batch-size 256: Sets the batch size for training. </span>

    <span style="color: green;"> --n-epochs 30: Sets the number of epochs for training. </span>

    <span style="color: red;"> Key question: What is the latent dimension? </span>

    * Note that you can choose between two models: Convolutional VAE or Convolutional AE using the `--model-type` flag.

    * Reason for deprecation: The use of older model types and static configurations may have led to the development of more advanced or flexible training scripts. Current practices might prefer integration with newer libraries and tools.

* *Performance Evaluation*

    To evaluate the performance, the following command was run:

    `python evaluate.py --input-h5 output_model.h5 --input-json output_model.json --input-history history.h5 --output-result result.h5 --input-file data.pickle`

    <span style="color: green;"> python evaluate.py: Executes the evaluate.py script. </span>

    <span style="color: green;"> --input-h5 output_model.h5: Specifies the trained model file in HDF5 format. </span>

    <span style="color: green;"> --input-json output_model.json: Specifies the model architecture in JSON format. </span>

    <span style="color: green;"> --input-history history.h5: Specifies the history file in HDF5 format. </span>

    <span style="color: green;"> --output-result result.h5: Specifies the output file for saving evaluation results. </span>

    <span style="color: green;"> --input-file data.pickle: Specifies the input data file in pickle format. </span>

    * Reason for deprecation: The method may lack integration with newer evaluation metrics and tools. Modern workflows might incorporate more streamlined and robust evaluation processes.

        * *Making ROC Curves, History, and Loss Distributions*

            For this, the following command was run:

            `python plot.py --coord cyl --model vae --loss-type mse_kl --output-dir ./ --input-dir ./ --label my_vae`

            <span style="color: green;"> python plot.py: Executes the plot.py script. </span>

            <span style="color: green;"> --coord cyl: Sets the coordinate type (e.g., cylindrical). </span>

            <span style="color: green;"> --model vae: Specifies the model type (e.g. VAE). </span>

            <span style="color: green;"> --loss-type mse_kl: Sets the loss type (e.g., MSE and Kullback-Leibler divergence). </span>

            <span style="color: green;"> --output-dir ./: Specifies the output directory for the plots. </span>

            <span style="color: green;"> --input-dir ./: Specifies the input directory containing the required files. </span>

            <span style="color: green;"> --label my_vae: Sets a label for the plots. </span>

            * Reason for deprecation: Plotting methods might be outdated or less customizable compared to newer visualization libraries. Modern plotting tools offer more flexibility and integration.
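
To make the ROC step concrete, the sketch below computes a ROC curve and its area from per-event anomaly scores in pure Python. It is an illustration under stated assumptions, not the implementation in `plot.py`: it assumes the anomaly score is something like the reconstruction loss (higher means more anomalous), labels signal events 1 and background events 0, and uses toy numbers that are not project results. The names `roc_curve` and `auc` are hypothetical helpers defined here.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) lists for thresholds swept from high to low score.

    scores: anomaly scores, e.g. per-event reconstruction loss (higher = more anomalous)
    labels: 1 for signal (anomalous) events, 0 for background events
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one signal and one background event")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last event of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
               for i in range(1, len(fpr)))


if __name__ == "__main__":
    # Toy values, not taken from the project: four background and four signal events.
    scores = [0.1, 0.2, 0.3, 0.8, 0.4, 0.7, 0.9, 1.2]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    fpr, tpr = roc_curve(scores, labels)
    print(f"AUC = {auc(fpr, tpr):.3f}")
```

Each point on the curve corresponds to one threshold on the anomaly score: events above the threshold are flagged, and the curve shows the fraction of signal kept against the fraction of background accepted.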