<img src="./images/DLI_Header.png" style="width: 400px;">


# 5. Introduction to NeMo Framework Launcher

In this notebook, we will learn how to use [NeMo Framework Launcher](https://github.com/NVIDIA/NeMo-Megatron-Launcher) to conveniently generate configuration files and SLURM scripts for NeMo jobs.

## The goals

The goals of this notebook are to:
* Learn how to use NeMo Framework Launcher to speed up launching end-to-end NeMo Framework training jobs
* Understand how to examine intermediate configuration files and scripts

---
# NeMo Framework Launcher

The NeMo Framework Launcher is designed to be a simple, easy-to-use tool for launching NeMo Framework training jobs on cloud service providers (CSPs) or on-premises clusters.

It takes care of generating and launching job submission scripts, as well as storing job results. It also comes packaged with tested configuration files, which can be easily modified by the user.

<img src="images/nemo_launcher.png" width="800"/>

The most convenient way to use NeMo Framework Launcher is with the [NeMo FW Container](https://registry.ngc.nvidia.com/orgs/ea-bignlp/teams/ga-participants/containers/nemofw-training). You can apply for access [here](https://developer.nvidia.com/nemo-framework).

Let's get started with NeMo Framework Launcher. We will begin by examining an [example configuration file for GPT3 126m](./code/NeMo-Megatron-Launcher/launcher_scripts/conf/training/gpt3/126m.yaml):

In [None]:
!head -n 25 /dli/code/NeMo-Megatron-Launcher/launcher_scripts/conf/training/gpt3/126m.yaml

Note the similarity with some of the arguments we modified in the previous notebooks, in particular in the `trainer` section. Just as before, we will need to override some of them directly from the command line using [Hydra](https://hydra.cc/docs/intro/). Note that we can omit the section with the SLURM configuration.
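
To make the override mechanism concrete, here is a minimal sketch of how a dotted `key.path=value` argument can be applied to a nested configuration. It is not Hydra's implementation: `apply_override` is a hypothetical helper, the base values are made up rather than taken from `126m.yaml`, and values are kept as plain strings. It only mimics the two forms used in this notebook, a plain override of an existing key and a `+` prefix that adds a new key.

```python
from copy import deepcopy


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of `config` with one `key.path=value` override applied.

    A leading `+` allows adding a key that is not in the config yet;
    without it, the key must already exist. Values stay plain strings.
    """
    key_path, _, value = override.partition("=")
    adding = key_path.startswith("+")
    keys = key_path.lstrip("+").split(".")

    result = deepcopy(config)
    node = result
    # Walk down to the dictionary that holds the final key.
    for key in keys[:-1]:
        if key not in node:
            if not adding:
                raise KeyError(f"'{key}' not found; use '+' to add new keys")
            node[key] = {}
        node = node[key]

    if keys[-1] not in node and not adding:
        raise KeyError(f"'{keys[-1]}' not found; use '+' to add new keys")
    node[keys[-1]] = value
    return result


# Made-up base values, for illustration only.
base = {"trainer": {"devices": "8", "num_nodes": "1"}}

cfg = apply_override(base, "trainer.devices=2")
cfg = apply_override(cfg, "+trainer.enable_model_summary=true")
print(cfg["trainer"])
```

Real Hydra has more rules (typed values, `++`, `~`, config groups), which this sketch leaves out on purpose.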

Because NeMo Framework Launcher supports different types of jobs (training, fine-tuning, data preparation), we need to list the desired model development stages in `stages=[]` and prefix each job argument with the name of the stage it belongs to. For us, that is `training`.
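
The prefixing rule can be sketched as a small helper. `prefix_overrides` is a hypothetical function written for this illustration, not part of NeMo Framework Launcher; it assumes that a leading `+` has to stay in front of the stage name, which is how the arguments are written in the script in the next cell.

```python
def prefix_overrides(stage: str, overrides: list[str]) -> list[str]:
    """Prefix each override with the stage name, keeping a leading '+' first."""
    prefixed = []
    for override in overrides:
        plus = "+" if override.startswith("+") else ""
        prefixed.append(f"{plus}{stage}.{override.lstrip('+')}")
    return prefixed


stage = "training"
job_args = ["trainer.devices=2", "+trainer.use_profiler=true"]

launcher_args = [f"stages=[{stage}]"] + prefix_overrides(stage, job_args)
print(" ".join(launcher_args))
# stages=[training] training.trainer.devices=2 +training.trainer.use_profiler=true
```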

In [None]:
%%writefile /dli/code/pretrain_gpt_126m_nemo_fw.sh   

# Distributed training args
NNODES=2
GPUS_PER_NODE=2
TP_SIZE=1
PP_SIZE=1

# Batch sizes
MICRO_BATCH_SIZE=4
GLOBAL_BATCH_SIZE=64

# Data Paths
VOCAB_FILE=/dli/data/GPT-2_assets/gpt2-vocab.json
MERGE_FILE=/dli/data/GPT-2_assets/gpt2-merges.txt
DATA_PATH=[1.0,/dli/data/GPT-2_assets/my-gpt2_text_document]

OUTPUT_PATH=/dli/nemo
LOGS_PATH=/dli/nemo/logs
NAME="GPT_126m_NeMo_FW"      


OPTIMIZER_ARGS=" \
            training.model.optim.name=fused_adam \
            training.model.optim.betas=[0.9,0.95] \
            training.model.optim.lr=6e-5 \
            training.model.optim.sched.min_lr=6e-6 \
            training.model.optim.sched.name=CosineAnnealing \
            +training.model.optim.sched.max_steps=800 \
            training.model.optim.sched.warmup_steps=80 \
            training.model.optim.weight_decay=1e-1 \
        "

# NeMo Framework Launcher arguments
LAUNCHER_ARGS=" \
            cluster_type=bcm \
            stages=[training] \
            training=gpt3/126m \
            training_config=gpt3/126m \
            launcher_scripts_path=/dli/code/NeMo-Megatron-Launcher/launcher_scripts \
            "

# Search path for NeMo example configs
HYDRA_ARGS=" \
            training.hydra.searchpath=[file:///dli/code/NeMo/examples/nlp/language_modeling/conf] \
        "

# Trainer arguments
TRAINER_ARGS=" \
            training.trainer.devices=$GPUS_PER_NODE \
            training.trainer.num_nodes=$NNODES \
            training.trainer.max_steps=1000 \
            +training.trainer.enable_model_summary=true \
            training.trainer.log_every_n_steps=10 \
            training.trainer.val_check_interval=20 \
            training.trainer.limit_val_batches=10 \
            +training.trainer.use_profiler=true \
        "

GPT_ARGS=" \
            training.model.micro_batch_size=$MICRO_BATCH_SIZE \
            training.model.global_batch_size=$GLOBAL_BATCH_SIZE \
            training.model.tokenizer.vocab_file=$VOCAB_FILE \
            training.model.tokenizer.merge_file=$MERGE_FILE \
            $OPTIMIZER_ARGS \
        "

OUTPUT_ARGS=" \
            training.run.results_dir=$OUTPUT_PATH/$NAME \
            training.exp_manager.explicit_log_dir=$OUTPUT_PATH/$NAME \
            training.exp_manager.resume_if_exists=false \
            training.exp_manager.name=$NAME \
        "

PARALLEL_ARGS=" \
            training.model.tensor_model_parallel_size=$TP_SIZE \
            training.model.pipeline_model_parallel_size=$PP_SIZE \
        "

CMD=" \
            python /dli/code/NeMo-Megatron-Launcher/launcher_scripts/main.py \
            $LAUNCHER_ARGS \
            $HYDRA_ARGS \
            $TRAINER_ARGS \
            $GPT_ARGS \
            $OUTPUT_ARGS \
            $PARALLEL_ARGS \
            training.model.data.data_prefix=$DATA_PATH \
            training.model.data.data_impl=mmap \
            training.model.data.splits_string=\"949,50,1\" \
        "

$CMD

Let's run the NeMo Framework Launcher script. It generates the submission script and then attempts to run it.

**Special Warning:** running the generated script will fail, because it is expected to run in a different environment from the one used in this course. We are only interested in examining the resulting script.
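
Before launching, it can help to check that the batch and parallelism settings in the script are consistent. The sketch below assumes the usual Megatron-style relationship, global batch size = micro batch size × data-parallel size × gradient accumulation steps, where the data-parallel size is the total GPU count divided by the tensor- and pipeline-parallel sizes. The function name is hypothetical; the numbers are the ones set in the script above.

```python
def gradient_accumulation_steps(
    nodes: int,
    gpus_per_node: int,
    tp_size: int,
    pp_size: int,
    micro_batch_size: int,
    global_batch_size: int,
) -> int:
    """Return the accumulation steps implied by the batch and parallelism settings."""
    world_size = nodes * gpus_per_node
    model_parallel_size = tp_size * pp_size
    if world_size % model_parallel_size != 0:
        raise ValueError("GPU count must be divisible by TP size x PP size")
    data_parallel_size = world_size // model_parallel_size

    samples_per_pass = micro_batch_size * data_parallel_size
    if global_batch_size % samples_per_pass != 0:
        raise ValueError(
            "global batch size must be divisible by micro batch size x data-parallel size"
        )
    return global_batch_size // samples_per_pass


# Values from the script: NNODES=2, GPUS_PER_NODE=2, TP_SIZE=1, PP_SIZE=1,
# MICRO_BATCH_SIZE=4, GLOBAL_BATCH_SIZE=64.
steps = gradient_accumulation_steps(2, 2, 1, 1, 4, 64)
print(steps)  # 4
```

With 4 GPUs in pure data parallelism, each optimizer step accumulates 4 micro-batches of 4 samples per GPU to reach the global batch of 64.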

In [None]:
!bash /dli/code/pretrain_gpt_126m_nemo_fw.sh

Let's examine the generated script. 

In [None]:
!cat /dli/nemo/GPT_126m_NeMo_FW/nemo-megatron-gpt3_126m_submission.sh

As you can see, the script already contains the configured SLURM parameters (the lines starting with `#SBATCH`), the updated environment variables, and the command to run the job (starting with `srun`). NeMo Framework Launcher also generated a YAML config file for the job, named `gpt3_126m_hydra.yaml`, in `/dli/nemo/GPT_126m_NeMo_FW`.
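
If you want to inspect the SLURM settings programmatically, the `#SBATCH` lines are easy to pull out of the script text. The sketch below is a hypothetical helper that only handles the long `--name=value` form, and the sample script is made up for illustration; it is not the output of NeMo Framework Launcher.

```python
def sbatch_directives(script_text: str) -> dict[str, str]:
    """Collect `#SBATCH --name=value` lines into a {name: value} dictionary."""
    directives = {}
    for line in script_text.splitlines():
        line = line.strip()
        if not line.startswith("#SBATCH"):
            continue
        option = line[len("#SBATCH"):].strip()
        name, _, value = option.partition("=")
        directives[name.lstrip("-")] = value
    return directives


# Made-up example script, for illustration only.
sample_script = """#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --job-name=example_job
srun bash -c "echo hello"
"""

for name, value in sbatch_directives(sample_script).items():
    print(f"{name}: {value}")
```

To apply it to the generated file, read `/dli/nemo/GPT_126m_NeMo_FW/nemo-megatron-gpt3_126m_submission.sh` with `open(...).read()` and pass the text to the function.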

Note the generated run command, starting with `srun`. It has the arguments `container-image` and `container-mounts`, because every node in the cluster is expected to start the necessary container (in this case, `ga-participants/nemofw-training`) and run the job inside it. Running containerized workloads requires installing [enroot](https://github.com/NVIDIA/enroot) and [Pyxis](https://github.com/NVIDIA/pyxis) on your SLURM cluster. In this course, we are not going to run the generated script, both because the course environment differs from the NeMo Framework Launcher containers and because we want to avoid running Docker containers inside Docker containers.
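
As a rough illustration of how such a command is assembled, the sketch below builds an `srun` argument list with the two Pyxis options, where mounts are written as comma-separated `source:destination` pairs. The helper name, the image name, and the training command are placeholders, and the real generated command contains more arguments than shown here.

```python
import shlex


def build_srun_command(
    image: str, mounts: list[tuple[str, str]], command: str
) -> list[str]:
    """Build an srun argument list that runs `command` inside a container."""
    mount_arg = ",".join(f"{src}:{dst}" for src, dst in mounts)
    return [
        "srun",
        f"--container-image={image}",
        f"--container-mounts={mount_arg}",
        "bash",
        "-c",
        command,
    ]


# Placeholder image name and command, for illustration only.
argv = build_srun_command(
    image="example/registry/image:tag",
    mounts=[("/dli/data", "/dli/data"), ("/dli/nemo", "/dli/nemo")],
    command="python train.py",
)
print(shlex.join(argv))
```

Keeping the command as a list and joining it with `shlex.join` only for display avoids quoting mistakes when arguments contain spaces.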

Finally, let's examine the [generated config file](./nemo/GPT_126m_NeMo_FW/gpt3_126m_hydra.yaml):

In [None]:
!head -n 25 /dli/nemo/GPT_126m_NeMo_FW/gpt3_126m_hydra.yaml

As you can see, apart from some slight differences, it has the same structure as the configuration files we used earlier and is intended to run in much the same way as our previous NeMo Framework jobs.

---
<h2 style="color:green;">Congratulations!</h2>

In the next lab, we will experiment with other techniques used for training large-scale neural networks and demonstrate their usage for Computer Vision. Move on to [06_Multi-Nodes_Distributed_Training_for_Computer_Vision.ipynb](06_Multi-Nodes_Distributed_Training_for_Computer_Vision.ipynb).