# Parameter-Efficient Fine-Tuning (PEFT) with NeMo

In this example, we use NeMo's [PEFT](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/nemo_megatron/peft/landing_page.html)
methods to show how to adapt a large language model (LLM) to
a downstream task, such as financial sentiment prediction.

With a one-line configuration change, you can try different PEFT techniques such as [p-tuning](https://arxiv.org/abs/2103.10385), [adapters](https://proceedings.mlr.press/v97/houlsby19a.html), or [LoRA](https://arxiv.org/abs/2106.09685), which add a small number of trainable parameters to the LLM
that condition the model to produce the desired output for the downstream task.

For more details, see the [PEFT script](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py) in NeMo, which we adapt using NVFlare's Lightning client API to run in a federated scenario.

## Dependencies
We assume you followed the instructions [here](../../README.md#requirements) 
to install the NeMo framework and the NeMo-NVFlare package. 

## Download the pre-trained LLM
In this example, we use a `MegatronGPTModel`, a transformer-based language model based on the GPT architecture.

In [None]:
# Check what GPT .nemo models we have available on NGC
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
MegatronGPTModel.list_available_models()

In [None]:
# Download the model from NGC
import os
model_file = "megatron_gpt_345m.nemo"
if not os.path.isfile(model_file):
    !wget "https://api.ngc.nvidia.com/v2/models/nvidia/nemo/megatron_gpt_345m/versions/1/files/$model_file"
else:
    print(f"{model_file} already downloaded.")

## Data preprocessing
As our downstream task, we will use the [Financial PhraseBank dataset](https://huggingface.co/datasets/financial_phrasebank) for sentiment analysis.

The Financial PhraseBank dataset contains the sentiments for financial news headlines from a retail investor's perspective. Further details about the dataset can be found in Malo et al.'s ["Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts"](https://arxiv.org/abs/1307.5336).

We can configure the prompt template used by NeMo to solve this downstream task by setting `prompt_template: "{sentence} sentiment: {label}"` in [megatron_gpt_peft_tuning_config.yaml](./code/megatron_gpt_peft_tuning_config.yaml) accordingly.
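As a rough illustration of what this template does, the sketch below fills it with plain Python string formatting. This is not NeMo's internal implementation; it only shows how the `sentence` and `label` fields named in the template combine into a training string, and how the same template yields the label-free prompt used at inference time. The example sentence and label are taken from the inference section of this notebook.

```python
# Template as configured in megatron_gpt_peft_tuning_config.yaml
prompt_template = "{sentence} sentiment: {label}"

# One record in the style of the preprocessed dataset
record = {"sentence": "The agreement is valid for four years .", "label": "neutral"}

# During training, the label is part of the text the model learns to complete
training_text = prompt_template.format(**record)

# At inference time, the label is left out and the model has to generate it
inference_prompt = prompt_template.format(sentence=record["sentence"], label="").rstrip()

print(training_text)
print(inference_prompt)
```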

#### 1. Download the preprocessing scripts
We use the preprocessing script provided by NeMo, which can be downloaded from GitHub.

In [None]:
script_name = "prompt_learning_financial_phrase_bank_preprocessing.py"
if not os.path.isfile(script_name):
    !wget -N "https://raw.githubusercontent.com/NVIDIA/NeMo/main/scripts/dataset_processing/nlp/financial_phrase_bank/$script_name"
else:
    print(f"{script_name} already downloaded.")

#### 2. Download the Financial PhraseBank Dataset

Download the `FinancialPhraseBank-v1.0.zip` dataset from [here](https://www.researchgate.net/profile/Pekka_Malo/publication/251231364_FinancialPhraseBank-v1.0/data/0c96051eee4fb1d56e000000/FinancialPhraseBank-v1.0.zip).

Then extract it under `./data`.

#### 3. Preprocess the dataset

In [None]:
!python3 prompt_learning_financial_phrase_bank_preprocessing.py

#### 4. Split the dataset to simulate clients
Next, we use three clients to simulate federated learning for running PEFT with NeMo. 
We use a [Dirichlet sampling](https://arxiv.org/abs/2002.06440) strategy for creating a heterogeneous partition. Smaller values of `alpha` cause higher heterogeneity.

In [None]:
from data.split_financial_phrase_data import clean_memmap

# Clean NeMo memmap data before running a new data split
clean_memmap("./data")

# Split the data
alpha = 10.0
assert isinstance(alpha, float), "Expecting float value in filepath names used below."
!python3 data/split_financial_phrase_data.py --alpha={alpha} --data_path=data/FinancialPhraseBank-v1.0/financial_phrase_bank_train.jsonl --num_clients=3 --out_dir=data/FinancialPhraseBank-v1.0_split

Below are some examples of how the training data is distributed among the three clients when using different values of `alpha`.
<div>
<img src="./figs/summary_alpha1.0.svg" alt="Label distribution with alpha=1.0" style="width: 400px;"/>
<img src="./figs/summary_alpha5.0.svg" alt="Label distribution with alpha=5.0" style="width: 400px;"/>
<img src="./figs/summary_alpha10.0.svg" alt="Label distribution with alpha=10.0" style="width: 400px;"/>
</div>
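To make the effect of `alpha` more concrete, here is a minimal NumPy sketch of a commonly used per-label Dirichlet partition. The actual logic lives in `data/split_financial_phrase_data.py` and may differ in its details; the function name `dirichlet_split` and its arguments are hypothetical and used only for illustration.

```python
import numpy as np


def dirichlet_split(labels, num_clients, alpha, seed=0):
    """Assign sample indices to clients, drawing per-label client shares from Dir(alpha)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]

    for cls in np.unique(labels):
        # Indices of all samples with this label, in random order
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)

        # Share of this label that each client receives
        proportions = rng.dirichlet(alpha * np.ones(num_clients))

        # Turn the shares into split positions and distribute the indices
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, cut_points)):
            client_indices[client_id].extend(part.tolist())

    return client_indices


# Toy labels for illustration only (not the real dataset)
toy_labels = ["neutral"] * 60 + ["positive"] * 30 + ["negative"] * 10
toy_parts = dirichlet_split(toy_labels, num_clients=3, alpha=1.0)
print([len(part) for part in toy_parts])
```

With a small `alpha`, the sampled shares tend to be very uneven, so a client may receive most of one label and little of another. With a large `alpha`, the shares approach a uniform split.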

## Federated learning simulations
Next, we use NVFlare's [simulator](https://nvflare.readthedocs.io/en/latest/user_guide/fl_simulator.html) to simulate two settings: each client training locally on its own dataset, and all three clients training together using the [FedAvg](https://arxiv.org/abs/1602.05629) algorithm implemented in NVFlare.

With this setting, we require a GPU with at least 24GB of memory to run all clients in parallel on the same GPU. 
If you have multiple GPUs in your system, you can use the `gpu` argument to assign one GPU for each client, e.g., `gpu="0,1"`.

We will use NVFlare's job command for each setting to create the configurations needed to train the models based on the [sag_nemo](https://github.com/NVIDIA/NVFlare/blob/main/job_templates/sag_nemo/info.md) job template. This template allows the definition of different configurations for each client, which we will use to assign their local training data file to each of them.

#### 1. Convert NeMo PEFT script to FL

To run NeMo in an FL scenario, we convert the NeMo [PEFT script](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py) using NVFlare's Lightning client API.

This conversion can be done with only a few lines of code changes, as highlighted in the figure below:

1. Import the NVFlare Lightning client API
2. Patch your Lightning trainer
3. (Optionally) validate the current global model
4. Train as usual

<div>
<img src="./figs/lightning_client_api.png" alt="Drawing" style="width: 600px;"/>
</div>
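In code, the four steps could look roughly like the sketch below. It assumes the `nvflare.client.lightning` module with its `patch` and `is_running` functions as described in the NVFlare documentation, and a Lightning `trainer` and `model` that have already been built as in the NeMo script. The wrapper function `run_federated_training` is hypothetical and only used here to keep the sketch self-contained; refer to the adapted script in `./code` for the actual changes.

```python
def run_federated_training(trainer, model):
    """Hypothetical wrapper showing the four conversion steps in order."""
    # (1) Import the NVFlare Lightning client API
    #     (done inside the function so this sketch can be loaded without NVFlare installed)
    import nvflare.client.lightning as flare

    # (2) Patch the Lightning trainer so it exchanges model weights with the FL server
    flare.patch(trainer)

    # Repeat for as long as the FL job is running (one iteration per FL round)
    while flare.is_running():
        # (3) Optionally validate the current global model received from the server
        trainer.validate(model)

        # (4) Train as usual; the local update is sent back to the server afterwards
        trainer.fit(model)
```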

You can directly use all the PEFT methods implemented in the NeMo script by changing the value of [peft_scheme](./code/megatron_gpt_peft_tuning_config.yaml) in the client configuration shown below:
* p-tuning
* adapter + p-tuning
* adapter
* LoRA
* ia3

<div>
<img src="./figs/peft_config.png" alt="PEFT config" style="width: 700px;"/>
</div>

In this example, we will use LoRA to run the following experiments.

#### 1. Local training
First, we create the job files and modify them to include the data paths for each client and the pre-trained LLM using the `-f` option.
Note, the `app_config` options are specific to the app script (`megatron_gpt_peft_tuning.py`) and modify variables in the NeMo config file (`megatron_gpt_peft_tuning_config.yaml`) directly on execution.
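To illustrate how dotted options such as `trainer.max_steps=1000` map onto a nested configuration, here is a simplified pure-Python stand-in. As far as we understand, NeMo scripts rely on Hydra/OmegaConf for this, which also handle type conversion and list syntax; the helper `apply_overrides` below is hypothetical and ignores shell escaping and typing.

```python
def apply_overrides(config, overrides):
    """Apply 'a.b.c=value' style overrides to a nested dict (in place)."""
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")

        # Walk down to the parent node, creating levels that do not exist yet
        node = config
        for name in parents:
            node = node.setdefault(name, {})

        # Values are kept as strings in this sketch
        node[leaf] = value
    return config


# Toy configuration for illustration only (not the real NeMo config file)
toy_config = {
    "trainer": {"max_steps": "-1"},
    "model": {"peft": {"peft_scheme": "ptuning"}},
}
apply_overrides(toy_config, ["trainer.max_steps=1000", "model.peft.peft_scheme=lora"])
print(toy_config)
```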

At this point, we also set the number of clients, local steps, and FL rounds to simulate local training. The PEFT method is [LoRA](https://arxiv.org/abs/2106.09685).

First, we set the location of the NVFlare job templates directory.

In [None]:
!nvflare config -jt ../../../../job_templates

Then, create the job and configure it for simulating local training.

In [None]:
import os
peft_scheme="lora" # can be either ptuning, adapter, lora, or ia3

# Common configs
peft_scheme_arg=f"model.peft.peft_scheme\={peft_scheme}" 
app_script="megatron_gpt_peft_tuning.py"
restore_from_path=f"{os.getcwd()}/megatron_gpt_345m.nemo"
val_files=f"model.data.validation_ds.file_names\=\[{os.getcwd()}/data/FinancialPhraseBank-v1.0/financial_phrase_bank_val.jsonl\]"
train_files_prefix=f"model.data.train_ds.file_names\=\[{os.getcwd()}/data/FinancialPhraseBank-v1.0_split/alpha{alpha}_site"

# Simulate local training
num_rounds=1
trainer_config="trainer.max_steps\=1000 trainer.val_check_interval\=100"

!nvflare job create -force -j "./jobs/peft_{peft_scheme}_local_345M" -w "sag_nemo" -sd "code" \
   -f app_1/config_fed_client.conf app_script={app_script} app_config="{peft_scheme_arg} model.restore_from_path\={restore_from_path} {trainer_config} {val_files} {train_files_prefix}-1.jsonl\]" \
   -f app_2/config_fed_client.conf app_script={app_script} app_config="{peft_scheme_arg} model.restore_from_path\={restore_from_path} {trainer_config} {val_files} {train_files_prefix}-2.jsonl\]" \
   -f app_3/config_fed_client.conf app_script={app_script} app_config="{peft_scheme_arg} model.restore_from_path\={restore_from_path} {trainer_config} {val_files} {train_files_prefix}-3.jsonl\]" \
   -f app_server/config_fed_server.conf num_rounds={num_rounds} restore_from_path={restore_from_path}

Next, simulate each client training on their local dataset using the FL simulator. To do this, we only run 1 round of FL, with each client running 1000 steps on their local dataset.

In [None]:
from nvflare import SimulatorRunner    

simulator = SimulatorRunner(
    job_folder=f"jobs/peft_{peft_scheme}_local_345M",
    workspace=f"/tmp/nvflare/nemo/peft_{peft_scheme}_local_345M_alpha{alpha}",
    n_clients=3,
    threads=3
)
run_status = simulator.run()
print("Simulator finished with run_status", run_status)

#### 2. Federated training
Next, we use the [FedAvg](https://arxiv.org/abs/1602.05629) algorithm to adapt the model in a federated scenario. First, create and modify the configuration files again. 
This time, we increase the number of FL rounds and decrease the number of local steps per round to match the federated scenario. 

Here, each client runs LoRA for 200 steps before sending their local model updates to the server for aggregation. This is repeated for 5 FL rounds. All the other parameters are the same as above.
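The aggregation step can be pictured with the following minimal NumPy sketch of the weighted averaging at the core of FedAvg. It is not NVFlare's implementation, which handles additional details; the function name `fedavg` and the parameter name `lora_A` are hypothetical, and weighting by the number of local samples follows the formulation in the FedAvg paper.

```python
import numpy as np


def fedavg(client_weights, client_num_samples):
    """Average client parameters, weighted by each client's number of training samples."""
    total = float(sum(client_num_samples))
    averaged = {}
    for name in client_weights[0]:
        averaged[name] = sum(
            weights[name] * (num_samples / total)
            for weights, num_samples in zip(client_weights, client_num_samples)
        )
    return averaged


# Toy updates for illustration only: two clients, one trainable parameter tensor
toy_updates = [
    {"lora_A": np.array([1.0, 1.0])},
    {"lora_A": np.array([3.0, 3.0])},
]
toy_global = fedavg(toy_updates, client_num_samples=[1, 3])
print(toy_global["lora_A"])
```

Since only the PEFT parameters are trainable, only this small set of weights needs to be exchanged and averaged in each round, rather than the full LLM.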

In [None]:
# FedAvg setting
num_rounds=5
trainer_config="trainer.max_steps\=200 trainer.val_check_interval\=100"

!nvflare job create -force -j "./jobs/peft_{peft_scheme}_fedavg_345M" -w "sag_nemo" -sd "code" \
   -f app_1/config_fed_client.conf app_script={app_script} app_config="{peft_scheme_arg} model.restore_from_path\={restore_from_path} {trainer_config} {val_files} {train_files_prefix}-1.jsonl\]" \
   -f app_2/config_fed_client.conf app_script={app_script} app_config="{peft_scheme_arg} model.restore_from_path\={restore_from_path} {trainer_config} {val_files} {train_files_prefix}-2.jsonl\]" \
   -f app_3/config_fed_client.conf app_script={app_script} app_config="{peft_scheme_arg} model.restore_from_path\={restore_from_path} {trainer_config} {val_files} {train_files_prefix}-3.jsonl\]" \
   -f app_server/config_fed_server.conf num_rounds={num_rounds} restore_from_path={restore_from_path}

Next, simulate the federated training using FedAvg. 

In [None]:
from nvflare import SimulatorRunner    

simulator = SimulatorRunner(
    job_folder=f"jobs/peft_{peft_scheme}_fedavg_345M",
    workspace=f"/tmp/nvflare/nemo/peft_{peft_scheme}_fedavg_345M_alpha{alpha}",
    n_clients=3,
    threads=3
)
run_status = simulator.run()
print("Simulator finished with run_status", run_status)

You can visualize the training process using TensorBoard.

In [None]:
!tensorboard --logdir /tmp/nvflare/nemo

## Results
In this scenario, all clients use the same validation set, which allows a direct comparison between the locally tuned and federated global models. As anticipated, the FedAvg-trained models achieve a higher overall mean accuracy than those trained solely on their local datasets for different values of `alpha`. This is because the global model aggregates what was learned from all client datasets and can, consequently, generalize better, especially in settings of higher client data heterogeneity.

The figures below show the validation accuracy during training for different values of `alpha`. The lines show the mean accuracy of local models during training and shaded areas indicate the 95% confidence interval.
<div>
<img src="./figs/val_accuracy_alpha1.0.svg" alt="Validation accuracy with alpha=1.0" style="width: 400px;"/>
<img src="./figs/val_accuracy_alpha5.0.svg" alt="Validation accuracy with alpha=5.0" style="width: 400px;"/>
<img src="./figs/val_accuracy_alpha10.0.svg" alt="Validation accuracy with alpha=10.0" style="width: 400px;"/>
</div>
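For readers who want to produce a similar summary from their own runs, the sketch below shows one common way to compute a mean curve with a 95% confidence band across clients, using a normal approximation. How the figures above were actually generated is not specified here, so this is an assumption; the function name `mean_and_ci95` is hypothetical and the accuracy values are placeholders, not results from this example.

```python
import numpy as np


def mean_and_ci95(values_per_client):
    """Return mean, lower, and upper 95% bounds per step (normal approximation)."""
    values = np.asarray(values_per_client, dtype=float)  # shape: (clients, steps)
    mean = values.mean(axis=0)

    # Standard error of the mean across clients
    sem = values.std(axis=0, ddof=1) / np.sqrt(values.shape[0])
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


# Placeholder accuracies for three clients at two validation steps
placeholder_acc = [[0.5, 0.6], [0.7, 0.8], [0.6, 0.7]]
mean_acc, lower, upper = mean_and_ci95(placeholder_acc)
print(mean_acc, lower, upper)
```

With only three clients, the normal approximation is rough; a t-distribution or bootstrap interval would be wider.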

## Inference

We can use `model.generate()` to run inference after adapting the model. 
Let's define some test examples to feed to the tuned model to see its predictions.

In [None]:
prompt = " sentiment:"
test_examples = [f"The products have a low salt and fat content .{prompt}",
    f"The agreement is valid for four years .{prompt}",
    f"Diluted EPS rose to EUR3 .68 from EUR0 .50 .{prompt}",
    f"The company is well positioned in Brazil and Uruguay .{prompt}",
    f"Profit before taxes decreased by 9 % to EUR 187.8 mn in the first nine months of 2008 , compared to EUR 207.1 mn a year earlier .{prompt}",
]

First, we need to convert the best global PEFT model into a NeMo checkpoint (`.ckpt`).

In [None]:
import os
from nemo_nvflare.utils import convert_global_to_ckpt
server_workspace = f"/tmp/nvflare/nemo/peft_{peft_scheme}_fedavg_345M_alpha{alpha}/simulate_job/app_server"
global_model_filepath = os.path.join(server_workspace, "best_FL_global_model.pt")
assert global_model_filepath.endswith(".pt")
ckpt_path = global_model_filepath.replace(".pt", ".ckpt")
convert_global_to_ckpt(global_model_filepath, ckpt_path)

Next, we will load the global model.

In [None]:
from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel
from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronLMPPTrainerBuilder
from nemo.collections.nlp.parts.peft_config import PEFT_CONFIG_MAP
from omegaconf import OmegaConf

# Load the model configuration for inference with the global model
cfg = OmegaConf.load("code/megatron_gpt_peft_fl_eval_config.yaml")

# Build trainer
trainer = MegatronLMPPTrainerBuilder(cfg).create_trainer()

# Set restore from paths with pre-trained model(s)
cfg.model.restore_from_path = "megatron_gpt_345m.nemo"

# Set the global peft weights
cfg.model.peft.restore_from_path = ckpt_path

model_cfg = MegatronGPTSFTModel.merge_cfg_with(cfg.model.restore_from_path, cfg)
model = MegatronGPTSFTModel.restore_from(cfg.model.restore_from_path, model_cfg, trainer=trainer)
peft_cfg_cls = PEFT_CONFIG_MAP[cfg.model.peft.peft_scheme]

print("PEFT Weights will be loaded from", cfg.model.peft.restore_from_path)
model.load_adapters(cfg.model.peft.restore_from_path, peft_cfg_cls(model_cfg))
model.freeze()

print("Model initialized", type(model))

Finally, run the model.

In [None]:
# Adjust the sampling parameters as needed
sampling_params = {
    "use_greedy": True,
    "temperature": 1.0,
    "top_k": 0,
    "top_p": 0.9,
    "repetition_penalty": 1.2,
    "add_BOS": False,
    "all_probs": False,
    "compute_logprob": False,
    "end_strings": ["<|endoftext|>", "<extra_id_1>"],
}

response = model.generate(inputs=test_examples, length_params=None, sampling_params=sampling_params)

print('The prediction results of some sample queries with the trained model:')
for result in response['sentences']:
    print("-" * 30)
    print(result)

The expected output of a well-trained model looks something like this. Note, the test sentences do not include ground truth labels.

>      The products have a low salt and fat content . sentiment: neutral
>      ------------------------------
>      The agreement is valid for four years . sentiment: neutral
>      ------------------------------
>      Diluted EPS rose to EUR3 .68 from EUR0 .50 . sentiment: positive
>      ------------------------------
>      The company is well positioned in Brazil and Uruguay . sentiment: positive
>      ------------------------------
>      Profit before taxes decreased by 9 % to EUR 187.8 mn in the first nine months of 2008 , compared to EUR 207.1 mn a year earlier . sentiment: negative
>      ------------------------------
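To compare predictions against known labels, the predicted sentiment can be pulled out of each generated sentence. The helper below is a minimal sketch that assumes the output follows the `... sentiment: <label>` pattern shown above; the function name `extract_label` is hypothetical and not part of NeMo.

```python
def extract_label(generated_sentence, prompt="sentiment:"):
    """Return the first word generated after the last occurrence of the prompt, or None."""
    _, found, completion = generated_sentence.rpartition(prompt)
    if not found:
        return None
    tokens = completion.split()
    return tokens[0] if tokens else None


example_output = "The agreement is valid for four years . sentiment: neutral"
print(extract_label(example_output))
```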