# Parameter-Efficient Fine-Tuning (PEFT) with NeMo

In this example, we use NeMo's [PEFT](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/prompt_learning.html)
methods to show how to adapt a large language model (LLM) to
a downstream task, such as financial sentiment prediction.

With a one-line configuration change, you can try different PEFT techniques such as [p-tuning](https://arxiv.org/abs/2103.10385), [adapters](https://proceedings.mlr.press/v97/houlsby19a.html), or [LoRA](https://arxiv.org/abs/2106.09685). Each of them adds a small number of trainable parameters to the LLM
that condition the model to produce the desired output for the downstream task.
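
To give a feeling for what "a small number of trainable parameters" means, the following minimal sketch compares the parameter count of one fully fine-tuned linear layer with the count of a LoRA update of the same layer, which trains two low-rank factors instead of the full weight matrix. The layer size and the rank are hypothetical values picked for illustration; they are not read from the NeMo configuration.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    full = d_in * d_out           # fine-tuning the whole weight matrix W
    lora = rank * (d_in + d_out)  # training only A (rank x d_in) and B (d_out x rank)
    return full, lora


# Hypothetical layer size and rank, chosen only for illustration.
full, lora = lora_param_counts(d_in=1024, d_out=1024, rank=8)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")
```

The same reasoning applies to p-tuning and adapters: the pre-trained weights stay frozen and only the small added modules are trained and, in the federated setting, exchanged.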

For more details, see the [PEFT script](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py) in NeMo, which we adapt using NVFlare's Lightning client API to run in a federated scenario.

## Dependencies
We assume you followed the instructions [here](../../README.md#requirements) 
to install the NeMo framework and the NeMo-NVFlare package. 

## Download the pre-trained LLM
In this example, we use a `MegatronGPTModel`, a transformer-based language model based on the GPT architecture.

In [None]:
# Check what GPT .nemo models we have available on NGC
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
MegatronGPTModel.list_available_models()

In [None]:
# Download the model from NGC
import os
model_file = "megatron_gpt_345m.nemo"
if not os.path.isfile(model_file):
    !wget "https://api.ngc.nvidia.com/v2/models/nvidia/nemo/megatron_gpt_345m/versions/1/files/$model_file"
else:
    print(f"{model_file} already downloaded.")

## Data preprocessing
As our downstream task, we will use the [Financial PhraseBank dataset](https://huggingface.co/datasets/financial_phrasebank) for sentiment analysis.

The Financial PhraseBank dataset contains the sentiments for financial news headlines from a retail investor's perspective. Further details about the dataset can be found in Malo et al.'s ["Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts"](https://arxiv.org/abs/1307.5336).


#### 1. Download the preprocessing scripts
We use the preprocessing script provided by NeMo, which can be downloaded from GitHub.

In [None]:
script_name = "prompt_learning_financial_phrase_bank_preprocessing.py"
if not os.path.isfile(script_name):
    !wget -N "https://raw.githubusercontent.com/NVIDIA/NeMo/main/scripts/dataset_processing/nlp/financial_phrase_bank/$script_name"
else:
    print(f"{script_name} already downloaded.")

#### 2. Download the Financial PhraseBank Dataset

Download the `FinancialPhraseBank-v1.0.zip` dataset from [here](https://www.researchgate.net/profile/Pekka_Malo/publication/251231364_FinancialPhraseBank-v1.0/data/0c96051eee4fb1d56e000000/FinancialPhraseBank-v1.0.zip).

Then extract it under `./data`.

#### 3. Preprocess the dataset

In [None]:
!python3 prompt_learning_financial_phrase_bank_preprocessing.py
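
As a rough illustration of what this preprocessing step produces, the sketch below converts raw lines into JSON Lines records. It is not the NeMo script itself. It assumes that each raw line has the form `sentence@label` and that the label is stored under a `label` key; the `taskname` and `sentence` keys match the test examples used in the inference section of this example.

```python
import json


def to_jsonl_record(line: str, taskname: str = "sentiment") -> str:
    """Turn one raw 'sentence@label' line into a JSON Lines record."""
    # Split on the last '@' so that '@' characters inside the sentence survive.
    sentence, label = line.strip().rsplit("@", 1)
    return json.dumps({"taskname": taskname, "sentence": sentence, "label": label})


# Two example lines in the assumed raw format.
raw_lines = [
    "The agreement is valid for four years .@neutral",
    "Diluted EPS rose to EUR3 .68 from EUR0 .50 .@positive",
]
records = [to_jsonl_record(line) for line in raw_lines]
print("\n".join(records))
```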

#### 4. Split the dataset to simulate clients
Next, we use three clients to simulate federated learning for p-tuning with NeMo.

In [None]:
!python3 data/split_financial_phrase_data.py --data_path data/FinancialPhraseBank-v1.0/financial_phrase_bank_train.jsonl --num_clients 3 --out_dir data/FinancialPhraseBank-v1.0_split
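
The idea behind the split can be sketched as follows. This is a simplified stand-in, not the `split_financial_phrase_data.py` script: it assumes a plain round-robin split into nearly equal shards, and it writes files named `site-1.jsonl`, `site-2.jsonl`, and so on, which is the naming the job configurations in this example expect. The helper names and the demo data are hypothetical.

```python
import json
import os
import tempfile


def split_jsonl(data_path: str, num_clients: int, out_dir: str) -> list[str]:
    """Split a JSONL file into num_clients nearly equal shards (site-1.jsonl, ...)."""
    with open(data_path, encoding="utf-8") as f:
        lines = [line for line in f if line.strip()]
    os.makedirs(out_dir, exist_ok=True)
    out_paths = []
    for i in range(num_clients):
        # Round-robin assignment: client i gets lines i, i + num_clients, ...
        shard = lines[i::num_clients]
        out_path = os.path.join(out_dir, f"site-{i + 1}.jsonl")
        with open(out_path, "w", encoding="utf-8") as f:
            f.writelines(shard)
        out_paths.append(out_path)
    return out_paths


def count_lines(path: str) -> int:
    """Count the records in one shard."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


# Demo on a small temporary file with ten made-up records.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "train.jsonl")
with open(demo_path, "w", encoding="utf-8") as f:
    for n in range(10):
        f.write(json.dumps({"taskname": "sentiment", "sentence": f"example {n}"}) + "\n")

shards = split_jsonl(demo_path, num_clients=3, out_dir=os.path.join(tmp_dir, "split"))
shard_sizes = [count_lines(p) for p in shards]
print(shard_sizes)
```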

## Federated learning simulations
Next, we use NVFlare's [simulator](https://nvflare.readthedocs.io/en/latest/user_guide/fl_simulator.html) to simulate two settings: each client training locally on its own dataset, and all three clients training together using the [FedAvg](https://arxiv.org/abs/1602.05629) algorithm implemented in NVFlare.
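
The core of FedAvg is a weighted average of the client parameters. The toy sketch below uses plain Python lists and weights each client by its number of training samples. It only illustrates the aggregation rule; NVFlare's implementation works on real model tensors, and the parameter name and sample counts here are made up.

```python
def fed_avg(
    client_weights: list[dict[str, list[float]]], num_samples: list[int]
) -> dict[str, list[float]]:
    """Average client parameters, weighting each client by its sample count."""
    total = sum(num_samples)
    averaged = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][j] * n for w, n in zip(client_weights, num_samples)) / total
            for j in range(length)
        ]
    return averaged


# Three hypothetical clients, each holding one small parameter vector.
clients = [
    {"prompt_encoder.weight": [1.0, 2.0]},
    {"prompt_encoder.weight": [3.0, 4.0]},
    {"prompt_encoder.weight": [5.0, 6.0]},
]
global_weights = fed_avg(clients, num_samples=[100, 100, 200])
print(global_weights)
```

Because only the PEFT parameters are trainable, only those parameters need to be averaged, which keeps the exchanged messages small.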

With this setting, running all clients in parallel on the same GPU requires a GPU with at least 16 GB of memory.
If you have multiple GPUs in your system, you can use the `gpu` argument to assign one GPU for each client, e.g., `gpu="0,1"`.

We will use NVFlare's job command for each setting to create the configurations needed to train the models based on the [sag_nemo](https://github.com/NVIDIA/NVFlare/blob/main/job_templates/sag_pt_deploy_map/info.md) job template. This template allows the definition of different configurations for each client, which we will use to assign their local training data file to each of them.

#### 1. Convert NeMo PEFT script to FL

To run NeMo in an FL scenario, we convert the NeMo [PEFT script](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py) using NVFlare's Lightning client API.

This conversion can be done with only a few lines of code changes, as highlighted in the figure below:

1. Import the NVFlare Lightning API
2. Patch your Lightning trainer
3. (Optionally) validate the current global model
4. Train as usual

<div>
<img src="./figs/lightning_client_api.png" alt="Drawing" style="width: 600px;"/>
</div>
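
In code, the four steps could look roughly like the sketch below. It wraps them in a hypothetical helper, `run_federated_training`, that takes an existing Lightning `trainer` and `model`; the actual script in `./code` may organize this differently. The NVFlare import is placed inside the function so that the sketch can be defined without NVFlare installed, and it assumes that the model provides its own dataloaders.

```python
def run_federated_training(trainer, model):
    """Sketch of the four conversion steps for an existing Lightning trainer and model."""
    # 1. Import the NVFlare Lightning client API (requires nvflare to be installed).
    import nvflare.client.lightning as flare

    # 2. Patch the trainer so that it can exchange models with the FL server.
    flare.patch(trainer)

    # Repeat for every FL round while the job is running.
    while flare.is_running():
        # Receive the current global model; useful for metadata such as the round number.
        input_model = flare.receive()
        print(f"current_round={input_model.current_round}")

        # 3. (Optionally) validate the current global model.
        trainer.validate(model)

        # 4. Train as usual; the patched trainer returns the update to the server.
        trainer.fit(model)
```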

You can directly use all the PEFT methods implemented in the NeMo script by changing the value of [peft_scheme](./code/megatron_gpt_peft_tuning_config.yaml) in the client configuration shown below:
* p-tuning
* adapter + p-tuning
* adapter
* LoRA
* ia3

<div>
<img src="./figs/peft_config.png" alt="PEFT config" style="width: 400px;"/>
</div>

In this example, we will use p-tuning to run the following experiments.

#### 1. Local P-Tuning
First, we create the job files and modify them to include the data paths for each client and the pre-trained LLM using the `-f` option.
Note that the `app_config` options are specific to the app script (`megatron_gpt_peft_tuning.py`) and override variables in the NeMo config file (`megatron_gpt_peft_tuning_config.yaml`) directly on execution.
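
Conceptually, each `key=value` entry in `app_config` addresses one nested field of the config file by its dotted path. The sketch below mimics this with a plain dictionary. It is a simplified stand-in for the actual override mechanism used by NeMo, and the default values in the example config are hypothetical.

```python
def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Apply 'a.b.c=value' style overrides to a nested dict in place."""
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})  # descend, creating levels as needed
        # Keep integers as integers; everything else stays a string.
        node[leaf] = int(value) if value.isdigit() else value
    return config


# Hypothetical defaults standing in for the values in the YAML file.
config = {"trainer": {"max_steps": 1000}, "model": {"peft": {"peft_scheme": "adapter"}}}
apply_overrides(config, ["model.peft.peft_scheme=ptuning", "trainer.max_steps=2000"])
print(config)
```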

At this point, we also set the number of local training steps and the number of FL rounds to simulate local training.

The PEFT method is p-tuning.

In [None]:
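# NOTE: these are machine-specific paths; change them to the location of your own NVFlare checkout.
%env NVFLARE_HOME=/home/hroth/Code2/nvflare/nemo_peft_example
!python3 -m pip install -e /home/hroth/Code2/nvflare/nemo_peft_example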

import os
peft_scheme = "model.peft.peft_scheme\=ptuning"  # can be either ptuning, adapter, lora, or ia3
app_script = "megatron_gpt_peft_tuning.py"
restore_from_path = f"model.restore_from_path\={os.getcwd()}/megatron_gpt_345m.nemo"
trainer_config = "trainer.max_steps\=2000 trainer.val_check_interval\=10"
val_files = f"model.data.validation_ds.file_names\=\[{os.getcwd()}/data/FinancialPhraseBank-v1.0/financial_phrase_bank_val.jsonl\]"
train_files_prefix = f"model.data.train_ds.file_names\=\[{os.getcwd()}/data/FinancialPhraseBank-v1.0_split/site"

!nvflare job create -force -j "./jobs/peft_p-tuning_local_345M" -w "sag_nemo" -sd "code" \
   -f app_1/config_fed_client.conf app_script={app_script} app_config="{peft_scheme} {restore_from_path} {trainer_config} {val_files} {train_files_prefix}-1.jsonl\]" \
   -f app_2/config_fed_client.conf app_script={app_script} app_config="{peft_scheme} {restore_from_path} {trainer_config} {val_files} {train_files_prefix}-2.jsonl\]" \
   -f app_3/config_fed_client.conf app_script={app_script} app_config="{peft_scheme} {restore_from_path} {trainer_config} {val_files} {train_files_prefix}-3.jsonl\]" \
   -f app_server/config_fed_server.conf num_rounds=1

Next, simulate each client p-tuning on its local dataset using the FL simulator. To do this, we run only 1 round of FL, with each client running 2000 p-tuning steps (`trainer.max_steps`) on its local dataset.

In [None]:
from nvflare import SimulatorRunner    

simulator = SimulatorRunner(
    job_folder="jobs/peft_p-tuning_local_345M",
    workspace="/tmp/nvflare/nemo/peft_p-tuning_local_345M",
    n_clients=3,
    threads=3
)
run_status = simulator.run()
print("Simulator finished with run_status", run_status)

#### 2. Federated P-Tuning
Next, we use the [FedAvg](https://arxiv.org/abs/1602.05629) algorithm to p-tune the model in a federated scenario. First, create and modify the configuration files again. 
This time, we increase the number of FL rounds and decrease the number of local training steps per round to match the federated scenario.

In [None]:
import os
peft_scheme = "model.peft.peft_scheme\=ptuning"  # can be either ptuning, adapter, lora, or ia3
app_script = "megatron_gpt_peft_tuning.py"
restore_from_path = f"model.restore_from_path\={os.getcwd()}/megatron_gpt_345m.nemo"
trainer_config = "trainer.max_steps\=200 trainer.val_check_interval\=10"
val_files = f"model.data.validation_ds.file_names\=\[{os.getcwd()}/data/FinancialPhraseBank-v1.0/financial_phrase_bank_val.jsonl\]"
train_files_prefix = f"model.data.train_ds.file_names\=\[{os.getcwd()}/data/FinancialPhraseBank-v1.0_split/site"

!nvflare job create -force -j "./jobs/peft_p-tuning_fedavg_345M" -w "sag_nemo" -sd "code" \
   -f app_1/config_fed_client.conf app_script={app_script} app_config="{peft_scheme} {restore_from_path} {trainer_config} {val_files} {train_files_prefix}-1.jsonl\]" \
   -f app_2/config_fed_client.conf app_script={app_script} app_config="{peft_scheme} {restore_from_path} {trainer_config} {val_files} {train_files_prefix}-2.jsonl\]" \
   -f app_3/config_fed_client.conf app_script={app_script} app_config="{peft_scheme} {restore_from_path} {trainer_config} {val_files} {train_files_prefix}-3.jsonl\]" \
   -f app_server/config_fed_server.conf num_rounds=10
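
The two job configurations in this example are chosen so that each client performs the same total amount of training in both settings. A quick check, under the assumption that `trainer.max_steps` steps are executed in every FL round (the helper name is hypothetical):

```python
def total_local_steps(num_rounds: int, max_steps: int) -> int:
    """Total training steps per client, assuming max_steps are run in every round."""
    return num_rounds * max_steps


# Values taken from the two `nvflare job create` commands in this example.
local_only = total_local_steps(num_rounds=1, max_steps=2000)
federated = total_local_steps(num_rounds=10, max_steps=200)
print(local_only, federated)
```

With equal training budgets, differences in validation accuracy can be attributed to the aggregation across clients rather than to longer training.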

Next, simulate the federated p-tuning using FedAvg. Here, each client p-tunes for 200 local steps before sending its local model updates to the server for aggregation. This is repeated for 10 FL rounds.

In [None]:
from nvflare import SimulatorRunner    

simulator = SimulatorRunner(
    job_folder="jobs/peft_p-tuning_fedavg_345M",
    workspace="/tmp/nvflare/nemo/peft_p-tuning_fedavg_345M",
    n_clients=3,
    threads=3
)
run_status = simulator.run()
print("Simulator finished with run_status", run_status)

You can visualize the training process using TensorBoard:

In [None]:
!tensorboard --logdir /tmp/nvflare/nemo

## Results
In this scenario, all clients use the same validation set, allowing for a direct comparison between the locally p-tuned and federated global models. As anticipated, the FedAvg-trained models achieve a higher overall mean accuracy than those trained solely on their local datasets. This is because the global model aggregates updates learned from all client datasets and can therefore generalize better.

![validation accuracy](./figs/val_accuracy.svg)

## Inference

We can use `model.generate()` to run inference after p-tuning the model. 
Let's define some test examples to feed to the p-tuned model to see its predictions.

In [None]:
test_examples = [
    {"taskname": "sentiment", "sentence": "The products have a low salt and fat content ."},
    {"taskname": "sentiment", "sentence": "The agreement is valid for four years ."},
    {"taskname": "sentiment", "sentence": "Diluted EPS rose to EUR3 .68 from EUR0 .50 ."},
    {"taskname": "sentiment", "sentence": "The company is well positioned in Brazil and Uruguay ."},
    {"taskname": "sentiment", "sentence": "Profit before taxes decreased by 9 % to EUR 187.8 mn in the first nine months of 2008 , compared to EUR 207.1 mn a year earlier ."},
]

Next, we will load the global model.

In [None]:
import logging
import os

import torch
from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel
from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronLMPPTrainerBuilder
from nemo.collections.nlp.parts.peft_config import PEFT_CONFIG_MAP
from omegaconf import OmegaConf

# Paths used below (assumed to be relative to the current working directory; adjust as needed)
app_root = os.getcwd()
config_path = "code/megatron_gpt_peft_tuning_config.yaml"  # NeMo PEFT config used by the clients
restore_from_path = "megatron_gpt_345m.nemo"  # pre-trained GPT model downloaded above
peft_restore_from_path = None  # optional PEFT checkpoint; None adds freshly initialized PEFT weights

# Load model configuration
cfg = OmegaConf.load(os.path.join(app_root, config_path))

# Use the same PEFT scheme as during training
cfg.model.peft.peft_scheme = "ptuning"

# Build trainer
trainer = MegatronLMPPTrainerBuilder(cfg).create_trainer()

# Set restore from paths with pre-trained model(s)
cfg.model.restore_from_path = os.path.join(app_root, restore_from_path)
if peft_restore_from_path is not None:
    cfg.model.peft.restore_from_path = os.path.join(app_root, peft_restore_from_path)

# Set some dummy data file names (which will not be used and do not need to exist)
cfg.model.data.train_ds.file_names = ["dummy.jsonl"]
cfg.model.data.validation_ds.file_names = ["dummy.jsonl"]

model_cfg = MegatronGPTSFTModel.merge_cfg_with(cfg.model.restore_from_path, cfg)
model = MegatronGPTSFTModel.restore_from(cfg.model.restore_from_path, model_cfg, trainer=trainer)
peft_cfg_cls = PEFT_CONFIG_MAP[cfg.model.peft.peft_scheme]

if cfg.model.peft.restore_from_path is not None:
    # Initialize PEFT weights from a checkpoint instead of randomly.
    # This is not the same as resuming training because optimizer states are not restored.
    logging.info(f"PEFT weights will be loaded from {cfg.model.peft.restore_from_path}")
    model.load_adapters(cfg.model.peft.restore_from_path, peft_cfg_cls(model_cfg))
elif peft_cfg_cls is not None:
    logging.info("Adding adapter weights to the model for PEFT")
    model.add_adapter(peft_cfg_cls(model_cfg))
else:
    logging.info(f"Running full finetuning since no peft scheme is given.\n{model.summarize()}")

print("Model initialized", type(model))

Overwrite the prompt encoder with the best global model.

In [None]:
#ckpt = torch.load("/tmp/nvflare/nemo/peft_p-tuning_fedavg_345M/simulate_job/app_server/best_FL_global_model.pt")
#global_weights = ckpt["model"]

#n_loaded = load_weights(model, global_weights, device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
#print(f"Loaded {n_loaded} of {len(global_weights)} weights")

Run the model

In [None]:
response = model.generate(inputs=test_examples, length_params=None)

print('The prediction results of some sample queries with the trained model:')
for result in response['sentences']:
    print(result)
    print("-" * 30)

The expected output predictions look something like this:

>      The products have a low salt and fat content . sentiment: neutral
>      ------------------------------
>      The agreement is valid for four years . sentiment: neutral
>      ------------------------------
>      Diluted EPS rose to EUR3 .68 from EUR0 .50 . sentiment: positive
>      ------------------------------
>      The company is well positioned in Brazil and Uruguay . sentiment: positive
>      ------------------------------
>      Profit before taxes decreased by 9 % to EUR 187.8 mn in the first nine months of 2008 , compared to EUR 207.1 mn a year earlier . sentiment: negative
>      ------------------------------
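
To evaluate such predictions programmatically, the generated strings can be split into the input sentence and the predicted label. The sketch below assumes the `sentence sentiment: label` format shown in the expected output; `parse_prediction` is a hypothetical helper and not part of NeMo.

```python
def parse_prediction(text: str, marker: str = "sentiment:") -> tuple[str, str]:
    """Split a generated string into (sentence, predicted label)."""
    # Use the last occurrence of the marker in case the sentence contains it too.
    sentence, _, label = text.rpartition(marker)
    return sentence.strip(), label.strip()


# Two strings in the format of the expected output shown in this example.
generated = [
    "The agreement is valid for four years . sentiment: neutral",
    "Diluted EPS rose to EUR3 .68 from EUR0 .50 . sentiment: positive",
]
for sentence, label in map(parse_prediction, generated):
    print(f"{label:>8} | {sentence}")
```

If the marker is missing from a generated string, the helper returns an empty sentence and the whole string as the label, so such cases are easy to spot.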