# Supervised Fine-Tuning (SFT)

Author: Leo Du

Email: ldu@nvidia.com

**GOAL**

Given a foundation model (in this case Llama-2-7B) that was pretrained on a broad, general-purpose corpus, our goal is to fine-tune it on a specific task through supervised learning. SFT is a general-purpose technique for improving model performance on a specific downstream task, which is usually domain-specific. It can be applied directly to a foundation model or to a domain-adapted pretrained model.

Here we use an open-source Verilog code dataset in which each example pairs a natural-language description of the code (the input) with the actual Verilog code (the output). We demonstrate that a model fine-tuned on this dataset can generate domain-specific code from an input prompt, which is useful for building coding copilot applications.
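To make the input/output pairing concrete: Step 4 scores predictions against the `output` field of `test.jsonl`, so each line of the split files is a JSON object. The sketch below assumes an `input` key for the description alongside that `output` key; the `input` key name and the sample record are illustrative assumptions, not taken from the actual dataset.

```python
import json

# Hypothetical record: "input" holds the natural-language description (assumed key),
# "output" holds the reference Verilog (the field Step 4 scores against).
record = {
    "input": "Write a Verilog module that implements a 2-to-1 multiplexer.",
    "output": (
        "module mux2(input a, input b, input sel, output y);\n"
        "  assign y = sel ? b : a;\n"
        "endmodule"
    ),
}

# JSONL stores one JSON object per line; json.dumps escapes the embedded newlines,
# so a multi-line Verilog module still occupies a single line of the file.
line = json.dumps(record)
assert "\n" not in line

parsed = json.loads(line)
print(parsed["output"].splitlines()[0])  # module mux2(input a, input b, input sel, output y);
```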

**Software Requirements**

1. Access to the latest NeMo Framework containers on NGC.
2. This playbook has been tested on `nvcr.io/nvidia/nemo:25.02`. It is expected to work similarly in other environments.

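In your terminal, launch the NeMo Framework container (the image tag below is the tested one; with `--network host`, the container's ports are reachable directly, so no `-p` mappings are needed):

```bash
docker run -it --rm --gpus all --ipc=host --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:25.02
```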

From the container's shell, launch Jupyter Notebook:

```bash
jupyter notebook --allow-root --ip 0.0.0.0 --port 8088 --no-browser --NotebookApp.token=''
```

**Hardware Requirements**

This playbook has been tested on 2x A100 80GB GPUs, but it can be scaled to more GPUs and to multiple nodes by adjusting the corresponding parameters (number of devices, parallelism strategy, and batch sizes).
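Scaling mostly means keeping the parallelism and batch-size settings consistent with each other. With Megatron-style parallelism, the data-parallel size is the total GPU count divided by the tensor-, pipeline-, and context-parallel sizes, and the global batch is reached by accumulating gradients over micro-batches. The `ParallelLayout` helper below is a hypothetical illustration of that arithmetic (it does not call NeMo), using the values from the Step 2 configuration: `devices=2`, `tensor_model_parallel_size=2`, `micro_batch_size=2`, `global_batch_size=8`.

```python
from dataclasses import dataclass


@dataclass
class ParallelLayout:
    """Hypothetical helper: derives data parallelism and gradient accumulation."""

    nodes: int
    devices_per_node: int
    tensor_parallel: int = 1
    pipeline_parallel: int = 1
    context_parallel: int = 1

    @property
    def world_size(self) -> int:
        return self.nodes * self.devices_per_node

    @property
    def data_parallel(self) -> int:
        # GPUs not used for model sharding hold data-parallel replicas
        model_parallel = self.tensor_parallel * self.pipeline_parallel * self.context_parallel
        if self.world_size % model_parallel != 0:
            raise ValueError("GPU count must be divisible by TP * PP * CP")
        return self.world_size // model_parallel

    def grad_accum_steps(self, micro_batch_size: int, global_batch_size: int) -> int:
        # one optimizer step consumes global_batch_size samples:
        # micro_batch_size per data-parallel rank, repeated over accumulation steps
        per_pass = micro_batch_size * self.data_parallel
        if global_batch_size % per_pass != 0:
            raise ValueError("global batch must be divisible by micro batch * DP size")
        return global_batch_size // per_pass


# Step 2 settings: 1 node x 2 GPUs, TP=2 -> DP=1, so 8 / (2 * 1) = 4 accumulation steps
current = ParallelLayout(nodes=1, devices_per_node=2, tensor_parallel=2)
print(current.data_parallel, current.grad_accum_steps(2, 8))  # 1 4

# Hypothetical scale-out: 2 nodes x 8 GPUs with TP=2 -> DP=8
scaled = ParallelLayout(nodes=2, devices_per_node=8, tensor_parallel=2)
print(scaled.data_parallel, scaled.grad_accum_steps(2, 16))  # 8 1
```

In the scaled-out case, the original `global_batch_size=8` is not divisible by `micro_batch_size * DP = 16`, which is why the batch sizes passed to the data module have to change together with the device count and strategy.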

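**Step 1**

Download the Llama-2-7B model from Hugging Face, convert it to the .nemo format, and remove the original download once the conversion is complete. The `meta-llama/Llama-2-7b-hf` repository is gated, so you need to accept Meta's license on Hugging Face and authenticate with an access token before cloning. The conversion is done by `convert.py`, which is not included in this playbook; Step 2 expects the converted model at `/root/.cache/nemo/models/Llama-2-7b-hf`.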

```
!git lfs install
!git clone https://huggingface.co/meta-llama/Llama-2-7b-hf
```

```
# each "!" line runs in its own shell, so the cd and the conversion must share one line
!cd Llama-2-7b-hf && python3 ../convert.py
!rm -rf Llama-2-7b-hf/
```
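In a notebook, each `!` line is handed to a fresh shell, so a `cd` on one line is gone by the next; running `../convert.py` from inside `Llama-2-7b-hf` therefore requires chaining both commands with `&&` on one line. The sketch below reproduces that behavior with Python's `subprocess` module standing in for the notebook's `!` prefix, using a temporary directory in place of `Llama-2-7b-hf`.

```python
import os
import shlex
import subprocess
import tempfile

workdir = tempfile.mkdtemp()
start = os.getcwd()

# Two separate shell invocations, like two "!" lines: the cd dies with its shell.
subprocess.run(f"cd {shlex.quote(workdir)}", shell=True, check=True)
separate = subprocess.run("pwd -P", shell=True, check=True, capture_output=True, text=True).stdout.strip()

# One invocation joined with &&, like "!cd dir && cmd": the cd applies to the next command.
chained = subprocess.run(
    f"cd {shlex.quote(workdir)} && pwd -P", shell=True, check=True, capture_output=True, text=True
).stdout.strip()

print(separate == os.path.realpath(start))   # True: still in the starting directory
print(chained == os.path.realpath(workdir))  # True: the cd took effect
os.rmdir(workdir)
```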

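**Step 2**

Download the Verilog dataset, preprocess it into train, validation, and test splits, and then run supervised fine-tuning. The code below is a standalone script: it depends on `verilog_data_module.py` (which defines `VerilogDataModule`), and Step 3 imports it as `run_sft`, so save it as `run_sft.py` in the same directory and launch it with `python3 run_sft.py`.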
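`VerilogDataModule` performs the download and split itself, and its logic is not reproduced in this playbook. As an illustration of what such a split typically involves, here is a minimal sketch that shuffles records with a fixed seed and writes one JSONL file per split. The helper `split_to_jsonl`, the 80/10/10 ratios, the seed, and the `train`/`validation` file names are assumptions; only `test.jsonl` appears elsewhere in this playbook.

```python
import json
import random
import tempfile
from pathlib import Path


def split_to_jsonl(records, out_dir, ratios=(0.8, 0.1, 0.1), seed=1234):
    """Hypothetical helper: shuffle deterministically, then write three JSONL splits."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> the same split on every run

    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    splits = {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train : n_train + n_val],
        "test": shuffled[n_train + n_val :],  # the remainder goes to test
    }

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, rows in splits.items():
        with open(out / f"{name}.jsonl", "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
    return {name: len(rows) for name, rows in splits.items()}


# Toy records standing in for the downloaded Verilog dataset.
toy = [{"input": f"description {i}", "output": f"module m{i}; endmodule"} for i in range(20)]
counts = split_to_jsonl(toy, tempfile.mkdtemp())
print(counts)  # {'train': 16, 'validation': 2, 'test': 2}
```

Fixing the seed matters here because Step 3 and Step 4 both read `test.jsonl`; a reproducible split keeps evaluation examples out of training across reruns.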


```python
import nemo_run as run
import pytorch_lightning as pl
from megatron.core.optimizer import OptimizerConfig
from verilog_data_module import VerilogDataModule

from nemo import lightning as nl
from nemo.collections import llm
from nemo.collections.llm.recipes.precision.mixed_precision import bf16_mixed


# configure the custom Verilog dataset
def verilog() -> run.Config[pl.LightningDataModule]:
    return run.Config(VerilogDataModule, seq_length=1024, micro_batch_size=2, global_batch_size=8, num_workers=8)


# configure the trainer, analogous to a PyTorch Lightning Trainer
def trainer() -> run.Config[nl.Trainer]:
    # tensor parallelism of 2 shards each layer across the 2 GPUs
    strategy = run.Config(nl.MegatronStrategy, tensor_model_parallel_size=2)
    trainer = run.Config(
        nl.Trainer,
        devices=2,
        max_steps=200,
        accelerator="gpu",
        strategy=strategy,
        plugins=bf16_mixed(),
        log_every_n_steps=40,
        limit_val_batches=2,
        val_check_interval=20,
        num_sanity_val_steps=0,
    )
    return trainer


# configure checkpointing and logging
def logger() -> run.Config[nl.NeMoLogger]:
    # keep the best checkpoint by val_loss plus the last one ("-last"), which Step 3 loads
    ckpt = run.Config(
        nl.ModelCheckpoint,
        save_last=True,
        every_n_train_steps=40,
        monitor="val_loss",
        save_top_k=1,
        save_on_train_epoch_end=True,
        save_optim_on_train_end=True,
    )

    # logs and checkpoints go to /workspace/sft_log (checkpoints under sft_log/checkpoints)
    return run.Config(
        nl.NeMoLogger,
        name="sft_log",
        log_dir="/workspace",
        use_datetime_version=False,
        ckpt=ckpt,
        wandb=None,
    )


# configure the Adam optimizer; note that no lr_scheduler is passed to
# MegatronOptimizerModule, so the learning rate is not actually cosine-annealed
def adam_with_cosine_annealing() -> run.Config[nl.OptimizerModule]:
    opt_cfg = run.Config(
        OptimizerConfig,
        optimizer="adam",
        lr=5e-5,
        adam_beta2=0.98,
        use_distributed_optimizer=True,
        clip_grad=1.0,
        bf16=True,
    )
    return run.Config(nl.MegatronOptimizerModule, config=opt_cfg)


# configure the base model architecture; weights are restored by resume() below
def llama2_7b() -> run.Config[pl.LightningModule]:
    return run.Config(llm.LlamaModel, config=run.Config(llm.Llama2Config7B))


# configure auto-resume: start fine-tuning from the converted Llama-2-7B weights
def resume() -> run.Config[nl.AutoResume]:
    return run.Config(
        nl.AutoResume,
        restore_config=run.Config(
            nl.RestoreConfig,
            # default path of the model converted from Hugging Face in Step 1
            path="/root/.cache/nemo/models/Llama-2-7b-hf",
        ),
        # resuming requires a completely saved checkpoint from a previous run
        resume_if_exists=False,
    )


# with all the components above defined, call the NeMo 2.0 finetune API
def configure_finetuning_recipe():
    return run.Partial(
        llm.finetune,
        model=llama2_7b(),
        trainer=trainer(),
        data=verilog(),
        log=logger(),
        optim=adam_with_cosine_annealing(),
        resume=resume(),
    )


def local_executor_torchrun(nodes: int = 1, devices: int = 2) -> run.LocalExecutor:
    # environment variables for the launched jobs are configured here
    env_vars = {
        "TORCH_NCCL_AVOID_RECORD_STREAMS": "1",
        "NCCL_NVLS_ENABLE": "0",
    }

    executor = run.LocalExecutor(ntasks_per_node=devices, launcher="torchrun", env_vars=env_vars)
    return executor


def main():
    print("preprocessing data")
    data_module = VerilogDataModule()
    verilog_data = data_module._download_data()
    data_module._preprocess_and_split_data(verilog_data)
    print("running supervised fine-tuning")
    run.run(configure_finetuning_recipe(), executor=local_executor_torchrun())


if __name__ == "__main__":
    main()
```
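`adam_with_cosine_annealing()` passes only an `OptimizerConfig` to `MegatronOptimizerModule`, so no learning-rate schedule is applied during the run. For reference, a common formulation of cosine annealing with linear warmup is sketched below in plain Python. `max_steps` and `max_lr` match the trainer and optimizer above; the warmup length and `min_lr` are illustrative assumptions. In NeMo, a schedule would be attached through the `lr_scheduler` argument of `MegatronOptimizerModule` rather than computed by hand.

```python
import math


def cosine_lr(step: int, max_steps: int, max_lr: float, min_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup up to max_lr, then cosine decay down to min_lr at max_steps."""
    if warmup_steps > 0 and step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    decay_steps = max(1, max_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)  # 0 -> 1 over the decay phase
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# max_steps and max_lr come from Step 2; warmup and min_lr are assumptions.
MAX_STEPS, MAX_LR, MIN_LR, WARMUP = 200, 5e-5, 5e-6, 20
for step in (0, 19, 20, 110, 200):
    print(step, f"{cosine_lr(step, MAX_STEPS, MAX_LR, MIN_LR, WARMUP):.2e}")
```

Halfway through the decay phase (step 110 here) the cosine term is zero, so the rate sits exactly midway between `max_lr` and `min_lr`.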

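**Step 3**

Once the SFT step is complete, run inference on the test split to generate predictions from both the base model and the SFT model. This script reuses `trainer()` and `local_executor_torchrun()` from `run_sft.py`, so run it from the same directory.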

```python
import os
from pathlib import Path

import nemo_run as run
from megatron.core.inference.common_inference_params import CommonInferenceParams
from run_sft import local_executor_torchrun, trainer

from nemo.collections import llm

input_data = "/workspace/data/verilog/test.jsonl"
base_llama_path = "/root/.cache/nemo/models/Llama-2-7b-hf"

# pick the "-last" checkpoint written by the SFT run (save_last=True in Step 2)
ckpt_dir = Path("/workspace/sft_log/checkpoints")
last_ckpt = next((d for d in ckpt_dir.iterdir() if d.is_dir() and d.name.endswith("-last")), None)
if last_ckpt is None:
    raise FileNotFoundError(f"no '*-last' checkpoint found in {ckpt_dir}; did Step 2 finish?")
sft_ckpt_path = str(last_ckpt)

os.makedirs("/workspace/inference", exist_ok=True)
output_path_base = "/workspace/inference/base_llama_prediction.jsonl"
output_path_sft = "/workspace/inference/sft_prediction.jsonl"


# configure inference on the base model checkpoint
def configure_inference_base():
    return run.Partial(
        llm.generate,
        path=base_llama_path,
        trainer=trainer(),
        input_dataset=input_data,
        # top_k=1 makes decoding greedy, so predictions are deterministic
        inference_params=CommonInferenceParams(num_tokens_to_generate=50, top_k=1),
        output_path=output_path_base,
    )


# configure inference on the fine-tuned SFT checkpoint
def configure_inference_sft():
    return run.Partial(
        llm.generate,
        path=sft_ckpt_path,
        trainer=trainer(),
        input_dataset=input_data,
        inference_params=CommonInferenceParams(num_tokens_to_generate=50, top_k=1),
        output_path=output_path_sft,
    )


if __name__ == "__main__":
    print("running inference on the base model")
    run.run(configure_inference_base(), executor=local_executor_torchrun())
    print("running inference on the supervised fine-tuned model")
    run.run(configure_inference_sft(), executor=local_executor_torchrun())
```
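Both inference runs set `top_k=1`. Under top-k sampling, the next token is drawn only from the k highest-scoring candidates, so k=1 always picks the single most likely token: decoding becomes greedy and repeatable, which keeps sampling noise out of the base-versus-SFT comparison. The toy sketch below illustrates this with a made-up logit vector; `top_k_sample` is a hypothetical helper, not Megatron's sampling code.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample a token index from the k largest logits (softmax over that subset only)."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]  # subtract the max for numerical stability
    return rng.choices(top, weights=weights, k=1)[0]


logits = [1.2, 3.5, 0.3, 3.4]  # toy scores for a 4-token vocabulary
rng = random.Random(0)

greedy = {top_k_sample(logits, 1, rng) for _ in range(100)}
sampled = {top_k_sample(logits, 2, rng) for _ in range(100)}
print(greedy)   # {1}: with k=1 the argmax is always chosen
print(sampled)  # drawn only from the two best tokens, 1 and 3
```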

**Step 4**

Once the predictions are written, compute their ROUGE scores against the reference `output` field of the test set. The SFT model's ROUGE scores should be substantially higher than the base model's.

```
!python3 /opt/NeMo/scripts/metric_calculation/compute_rouge.py --ground-truth /workspace/data/verilog/test.jsonl --preds /workspace/inference/base_llama_prediction.jsonl --answer-field "output"
!python3 /opt/NeMo/scripts/metric_calculation/compute_rouge.py --ground-truth /workspace/data/verilog/test.jsonl --preds /workspace/inference/sft_prediction.jsonl --answer-field "output"
```
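The exact tokenization and ROUGE variants reported are defined by `compute_rouge.py`. For intuition, ROUGE-L compares a prediction with its reference through their longest common subsequence (LCS) of tokens: precision is LCS / len(prediction), recall is LCS / len(reference), and the commonly reported score is their F1. The sketch below uses plain whitespace tokenization, a simplifying assumption, so its numbers will not exactly match the script's; the example strings are made up.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence via O(len(a) * len(b)) dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # extend the diagonal on a match, otherwise carry the best neighbor forward
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    pred, ref = prediction.split(), reference.split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "module mux2 ( input a , input b , input sel , output y ) ; assign y = sel ? b : a ; endmodule"
close_match = reference
unrelated = "A multiplexer selects one of two inputs based on a select signal ."
print(round(rouge_l_f1(close_match, reference), 3))  # 1.0
print(round(rouge_l_f1(unrelated, reference), 3))    # 0.051
```

A prose description that shares almost no tokens with the reference code scores near zero, which is the kind of gap Step 4 is expected to show between the base and SFT models.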