# Federated GPT-2 Tuning with Parameter-Efficient Methods in FATE-LLM

In this tutorial, we demonstrate how to efficiently train federated large language models using the FATE-LLM framework. FATE-LLM introduces the "pellm" (Parameter Efficient Large Language Model) module, which is designed specifically for federated learning with large language models. It supports parameter-efficient methods in federated learning, reducing communication overhead while maintaining model performance. This tutorial focuses on GPT-2 and on fine-tuning it with the Adapter mechanism, which reduces communication volume and improves overall efficiency.

By following this tutorial, you will learn how to leverage the FATE-LLM framework to rapidly fine-tune federated large language models, such as GPT-2, with ease and efficiency.

## GPT2

GPT-2 is a transformer-based language model whose largest version has 1.5 billion parameters, trained on a dataset of 8 million web pages. It is trained with a causal language modeling (CLM) objective, conditioning on a left-to-right context window of 1024 tokens. In this tutorial we use the smallest version of GPT-2. You can download the pretrained model from [here](https://huggingface.co/gpt2), or let the program download it automatically when it is first used.
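To make the CLM objective concrete, the sketch below builds (context, next token) pairs from a token sequence. It is a conceptual illustration only: the function name `clm_pairs` and the toy token ids are our own, and real training computes all positions in parallel with a causal attention mask rather than looping.

```python
def clm_pairs(token_ids, context_window=1024):
    """Build (context, next_token) pairs for causal language modeling.

    Each target token is predicted only from the tokens to its left, and the
    context is clipped to the most recent `context_window` tokens.
    """
    pairs = []
    for i in range(1, len(token_ids)):
        context = token_ids[max(0, i - context_window):i]
        pairs.append((context, token_ids[i]))
    return pairs


# Toy ids and a tiny window so the clipping is visible.
pairs = clm_pairs([10, 11, 12, 13], context_window=2)
for context, target in pairs:
    print(context, "->", target)
```

With a window of 2, the last pair uses only `[11, 12]` as context, which mirrors how GPT-2 drops tokens that fall outside its 1024-token window.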

## Dataset: IMDB Sentiment

In this section, we prepare the IMDB dataset for our federated learning task. We use our tokenizer dataset (based on the HuggingFace tokenizer) to preprocess the text data.

About the IMDB sentiment dataset:

This is a binary classification dataset. Download our processed dataset from the link below and place it in the examples/data folder:
- https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/fate/examples/data/IMDB.csv

The original data is from:
- https://ai.stanford.edu/~amaas/data/sentiment/

### Check Dataset

In [1]:
import pandas as pd
df = pd.read_csv('../../../examples/data/IMDB.csv')

In [2]:
df

Unnamed: 0,id,text,label
0,0,One of the other reviewers has mentioned that ...,1
1,1,A wonderful little production. <br /><br />The...,1
2,2,I thought this was a wonderful way to spend ti...,1
3,3,Basically there's a family where a little boy ...,0
4,4,"Petter Mattei's ""Love in the Time of Money"" is...",1
...,...,...,...
1996,1996,THE CELL (2000) Rating: 8/10<br /><br />The Ce...,1
1997,1997,"This movie, despite its list of B, C, and D li...",0
1998,1998,I loved this movie! It was all I could do not ...,1
1999,1999,This was the worst movie I have ever seen Bill...,0


In [3]:
from fate_llm.dataset.nlp_tokenizer import TokenizerDataset

ds = TokenizerDataset(tokenizer_name_or_path="your model path", text_max_length=128, 
                      padding_side="left", return_input_ids=False, 
                      pad_token='<|endoftext|>')  # load tokenizer config from local pretrained tokenizer

ds.load('../../../examples/data/IMDB.csv')

In [4]:
ds[0]

({'input_ids': tensor([ 3198,   286,   262,   584, 30702,   468,  4750,   326,   706,  4964,
            655,   352, 18024,  4471,   345,  1183,   307, 23373,    13,  1119,
            389,   826,    11,   355,   428,   318,  3446,   644,  3022,   351,
            502, 29847,  1671,  1220,  6927,  1671, 11037,   464,   717,  1517,
            326,  7425,   502,   546, 18024,   373,   663, 24557,   290, 42880,
           8589,   278,  8188,   286,  3685,    11,   543,   900,   287,   826,
            422,   262,  1573, 10351,    13,  9870,   502,    11,   428,   318,
            407,   257,   905,   329,   262, 18107,  2612,   276,   393, 44295,
             13,   770,   905, 16194,   645, 25495,   351, 13957,   284,  5010,
             11,  1714,   393,  3685,    13,  6363,   318, 22823,    11,   287,
            262,  6833,   779,   286,   262,  1573, 29847,  1671,  1220,  6927,
           1671, 11037,  1026,   318,  1444,   440,    57,   355,   326,   318,
            262, 21814,  18

For more details on FATE-LLM dataset settings, we recommend that you read through these tutorials first: [NN Dataset Customization](https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/pipeline/nn_tutorial/Homo-NN-Customize-your-Dataset.ipynb) and [Some Built-In Dataset](https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/pipeline/nn_tutorial/Introduce-Built-In-Dataset.ipynb).
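The `text_max_length`, `padding_side`, and `pad_token` arguments passed to `TokenizerDataset` above control how each review is brought to a fixed length. The sketch below shows what these settings conventionally mean. It is not the `TokenizerDataset` implementation: the helper `pad_or_truncate` is hypothetical, and we assume truncation keeps the leading tokens.

```python
def pad_or_truncate(token_ids, max_length, pad_id, padding_side="left"):
    """Return (input_ids, attention_mask), both of length `max_length`."""
    ids = list(token_ids[:max_length])  # assumption: keep the leading tokens
    n_pad = max_length - len(ids)
    if padding_side == "left":
        input_ids = [pad_id] * n_pad + ids
        attention_mask = [0] * n_pad + [1] * len(ids)
    elif padding_side == "right":
        input_ids = ids + [pad_id] * n_pad
        attention_mask = [1] * len(ids) + [0] * n_pad
    else:
        raise ValueError("padding_side must be 'left' or 'right'")
    return input_ids, attention_mask


EOS_ID = 50256  # id of '<|endoftext|>' in the GPT-2 vocabulary, reused as pad token
short = [3198, 286, 262]  # first three ids of the tokenized review shown above

ids, mask = pad_or_truncate(short, max_length=6, pad_id=EOS_ID)
print(ids)   # [50256, 50256, 50256, 3198, 286, 262]
print(mask)  # [0, 0, 0, 1, 1, 1]
```

Left padding keeps the final position of every sequence occupied by a real token, which is convenient for decoder-only models such as GPT-2. The review printed above is longer than 128 tokens, so it is truncated and no padding ids appear in it.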

## PELLM Model with Adapter

In this section, we guide you through building a parameter-efficient language model with the FATE-LLM framework. We focus on the PELLM model and the integration of the Adapter mechanism, which enables efficient fine-tuning and reduces communication overhead in federated learning settings. Taking GPT-2 as an example, you will learn how to develop and deploy a parameter-efficient language model using FATE-LLM built-in classes. Before starting this section, we recommend that you read through this tutorial first: [Model Customization](https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/pipeline/nn_tutorial/Homo-NN-Customize-Model.ipynb).

### PELLM Models

In this section we introduce the PELLM models, parameter-efficient language models that can be used in federated learning settings. They are designed to be compatible with the FATE-LLM framework to enable federated model tuning/training.

PELLM models are located at fate_llm.model_zoo.pellm (fate_llm/model_zoo/pellm):

In [5]:
! ls ../../../fate/python/fate_llm/model_zoo/pellm

albert.py  bert.py     deberta.py     gpt2.py			  __pycache__
bart.py    chatglm.py  distilbert.py  parameter_efficient_llm.py  roberta.py


You can initialize your GPT2 model by loading the pretrained model from a local model folder or by downloading it from Hugging Face. Here we initialize the GPT2 model with the LoRA adapter; adapters are introduced in the following subsection.

#### Adapters

We can directly use adapters from peft. See [Adapter Methods](https://huggingface.co/docs/peft/index#supported-methods) for details on the available adapters. By specifying the adapter name and the adapter config dict, we can insert adapters into our language models:

In [13]:
from peft import LoraConfig, TaskType

# define lora config
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1,
    target_modules=['c_attn'],
)
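To get a feel for how small this adapter is, the following back-of-the-envelope sketch counts the parameters that the config above adds to the smallest GPT-2 checkpoint. We assume the standard GPT-2 small dimensions (12 transformer blocks, hidden size 768, `c_attn` projecting to three times the hidden size). The helper `lora_param_count` is our own and not part of peft; the classifier head is not counted here.

```python
# Assumed GPT-2 small dimensions (the "gpt2" checkpoint).
N_LAYER = 12
N_EMBD = 768
C_ATTN_OUT = 3 * N_EMBD  # c_attn produces concatenated query, key and value


def lora_param_count(in_features, out_features, r):
    """Parameters added by one LoRA pair: A is (r x in), B is (out x r)."""
    return r * in_features + out_features * r


r, lora_alpha = 8, 32  # values from the LoraConfig above

per_layer = lora_param_count(N_EMBD, C_ATTN_OUT, r)
total_lora = N_LAYER * per_layer
frozen_c_attn_weight = N_EMBD * C_ATTN_OUT  # dense weight that stays frozen (bias excluded)
scaling = lora_alpha / r  # factor applied to the low-rank update

print(per_layer)             # 24576
print(total_lora)            # 294912
print(frozen_c_attn_weight)  # 1769472
print(scaling)               # 4.0
```

Under these assumptions, the LoRA matrices across all 12 blocks are far smaller than even a single frozen `c_attn` weight matrix, which is why exchanging only adapter weights keeps communication cheap.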

#### Init PELLM Model

In [None]:
from fate_llm.model_zoo.pellm.gpt2 import GPT2

# case 1 load pretrained weights from local pretrained weights, same as using the huggingface pretrained model
path_to_pretrained_folder = 'your model path'
gpt2 = GPT2(pretrained_path=path_to_pretrained_folder, 
            peft_type="LoraConfig", peft_config=lora_config.to_dict(), 
            num_labels=1, pad_token_id=50256)

# case 2: download the model directly from huggingface
# gpt2 = GPT2(pretrained_path="gpt2", 
#             peft_type="LoraConfig", peft_config=lora_config.to_dict(), 
#             num_labels=1, pad_token_id=50256)

This version currently supports the following language models for federated training:
- ChatGLM
- Bert
- ALBert
- RoBerta
- GPT-2
- Bart
- DeBerta
- DistilBert

**During training, all weights of the pretrained language model except the classifier head's weights are frozen, and the adapter weights are trainable. Thus, FATE-LLM trains only the adapter weights and the classifier head's weights (if present) during local training, and aggregates only these weights during federation.**

For the currently available adapters, see [Adapters Overview](https://huggingface.co/docs/peft/index).
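The rule above, that frozen weights stay local and only trainable weights are exchanged, can be illustrated with a small sketch. The `Param` record, the parameter names, and the sizes are illustrative assumptions for a single GPT-2 small block with a one-output head. This is not how FATE-LLM represents parameters internally.

```python
from dataclasses import dataclass


@dataclass
class Param:
    name: str
    numel: int
    requires_grad: bool


def trainable_only(params):
    """Keep only the parameters that would be trained and exchanged."""
    return [p for p in params if p.requires_grad]


# Hypothetical parameter listing for one attention block plus a classifier head.
params = [
    Param("h.0.attn.c_attn.weight", 768 * 2304, False),    # frozen pretrained weight
    Param("h.0.attn.c_attn.lora_A.weight", 8 * 768, True),  # adapter
    Param("h.0.attn.c_attn.lora_B.weight", 2304 * 8, True), # adapter
    Param("score.weight", 768, True),                        # classifier head
]

to_send = trainable_only(params)
sent = sum(p.numel for p in to_send)
total = sum(p.numel for p in params)

print([p.name for p in to_send])
print(f"{sent} of {total} values would be exchanged")
```

In a PyTorch model the same filter is usually expressed as a comprehension over `model.named_parameters()` that checks `requires_grad`.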


### Use PELLM Model in FATE with CustModel

In the [Model Customization](https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/pipeline/nn_tutorial/Homo-NN-Customize-Model.ipynb) tutorial, we demonstrate how to use the t.nn.CustModel class in fate_torch to parse a model's structure and submit it to a federated learning task. CustModel automatically imports the model class from the model_zoo and initializes the model with the parameters provided. Since these language models are built in, we can use them directly in CustModel and easily add a classifier head for the classification task at hand:

In [None]:
import torch as t
from pipeline import fate_torch_hook
from pipeline.component.nn import save_to_fate_llm
fate_torch_hook(t)

In [9]:
%%save_to_fate_llm model sigmoid.py

import torch as t

class Sigmoid(t.nn.Module):
    
    def __init__(self):
        super().__init__()
        self.sigmoid = t.nn.Sigmoid()
        
    def forward(self, x):
        return self.sigmoid(x.logits)

In [10]:
# build CustModel with PELLM, and add a classifier head
from transformers import GPT2Config

checkpoint_path = "your model path"
model = t.nn.Sequential(
    t.nn.CustModel(module_name='pellm.gpt2', class_name='GPT2', 
                   pretrained_path=checkpoint_path, 
                   peft_config=lora_config.to_dict(), peft_type="LoraConfig", 
                   num_labels=1,  pad_token_id=50256),
    t.nn.CustModel(module_name='sigmoid', class_name='Sigmoid')
)
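The idea behind `CustModel`, describing a model by a module name, a class name, and keyword arguments, and importing it only when needed, can be sketched with the standard library. This is a hypothetical illustration of the mechanism, not the `CustModel` source. The helper `build_from_spec` is our own, and a standard-library class stands in for `GPT2` so the sketch runs without FATE installed.

```python
import importlib


def build_from_spec(module_name, class_name, package=None, **kwargs):
    """Import `module_name`, look up `class_name`, and instantiate it."""
    full_name = f"{package}.{module_name}" if package else module_name
    module = importlib.import_module(full_name)
    cls = getattr(module, class_name)
    return cls(**kwargs)


# Stand-in for CustModel(module_name='pellm.gpt2', class_name='GPT2', ...):
# the spec is just strings plus keyword arguments, so it can be serialized
# and sent to every party, which then builds the object locally.
frac = build_from_spec("fractions", "Fraction", numerator=3, denominator=4)
print(frac)  # 3/4
```

Because the spec contains only names and plain arguments, each party must have the referenced module available locally, which is why the `Sigmoid` class above is first saved into the model zoo.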


Please note that during the training process, only trainable parameters will participate in the federated learning process.
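As a rough illustration of what aggregating the trainable parameters means, the sketch below performs a weighted average of per-client updates. We weight clients by sample count, which is the conventional FedAvg choice. The function `fedavg`, the parameter names, and the numbers are illustrative assumptions; the exact weighting and any secure aggregation used by the FATE trainer are not shown in this tutorial.

```python
def fedavg(client_updates, client_weights):
    """Weighted average of per-client parameter dicts (name -> list of floats)."""
    total = float(sum(client_weights))
    averaged = {}
    for name in client_updates[0]:
        length = len(client_updates[0][name])
        averaged[name] = [
            sum(w * upd[name][i] for upd, w in zip(client_updates, client_weights)) / total
            for i in range(length)
        ]
    return averaged


# Only trainable entries (adapter and head) appear in the updates.
guest = {"lora_A": [1.0, 2.0], "score": [0.0]}
host = {"lora_A": [3.0, 6.0], "score": [4.0]}

global_update = fedavg([guest, host], client_weights=[1000, 3000])
print(global_update)  # {'lora_A': [2.5, 5.0], 'score': [3.0]}
```

The host holds three times as many samples in this toy setup, so the averaged values sit closer to the host's update.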

## Local Test

Before submitting a federated learning task, we demonstrate how to test locally that your custom dataset and model work correctly. We use the local mode of our FedAVGTrainer to check that our settings run correctly.

In [None]:
from fate_llm.model_zoo.pellm.gpt2 import GPT2
from fate_llm.model_zoo.sigmoid import Sigmoid
from federatedml.nn.homo.trainer.fedavg_trainer import FedAVGTrainer
from transformers import GPT2Config
from fate_llm.dataset.nlp_tokenizer import TokenizerDataset

# load dataset
ds = TokenizerDataset(tokenizer_name_or_path="your model path", text_max_length=128, 
                      padding_side="left", return_input_ids=False, pad_token='<|endoftext|>')  # you can load tokenizer config from local pretrained tokenizer

ds.load('../../../examples/data/IMDB.csv')

checkpoint_path = "your model path"
model = t.nn.Sequential(
    GPT2(pretrained_path=checkpoint_path, peft_config=lora_config.to_dict(), peft_type="LoraConfig", num_labels=1,  pad_token_id=50256),
    Sigmoid()
)

trainer = FedAVGTrainer(epochs=1, batch_size=8, shuffle=True, data_loader_worker=8)
trainer.local_mode()
trainer.set_model(model)

In [12]:
opt = t.optim.Adam(model.parameters(), lr=0.001)
loss = t.nn.BCELoss()
# local test, here we only use CPU for training
trainer.train(ds, None, opt, loss)

epoch is 0
100%|██████████| 251/251 [04:39<00:00,  1.11s/it]
epoch loss is 0.5148034488660345
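The local test pairs a single-logit model (`num_labels=1`), the `Sigmoid` wrapper, and `BCELoss`. To help interpret the reported loss, the sketch below recomputes binary cross-entropy in plain Python. The helper names and the toy logits are our own; `t.nn.BCELoss` performs the same computation on tensors.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(prob, label):
    """Binary cross-entropy for one sample; `prob` is P(label == 1)."""
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def mean_bce(logits, labels):
    return sum(bce(sigmoid(z), y) for z, y in zip(logits, labels)) / len(labels)


# An uninformative model outputs logit 0, i.e. probability 0.5, for every review.
chance = mean_bce([0.0, 0.0], [1, 0])
print(round(chance, 4))  # 0.6931, which is ln 2

# Logits that lean toward the correct label give a lower loss.
confident = mean_bce([2.0, -2.0], [1, 0])
print(confident < chance)  # True
```

The epoch loss of about 0.515 reported above is below the ln 2 ≈ 0.693 that a constant 0.5 prediction would give, which suggests the adapter and head are learning from the data after one epoch.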


## Submit Federated Task
Once you have successfully completed local testing, you can submit a task to FATE. **Please note that this tutorial runs on a standalone version. If you are using a cluster version, you need to bind the data with the corresponding name and namespace on each machine.**

In this example we load pretrained weights for gpt2 model.

In [None]:
import torch as t
import os
from pipeline import fate_torch_hook
from pipeline.component import HomoNN
from pipeline.component.homo_nn import DatasetParam, TrainerParam
from pipeline.backend.pipeline import PipeLine
from pipeline.component import Reader
from pipeline.interface import Data
from transformers import GPT2Config


fate_torch_hook(t)


fate_project_path = "your FATE project path"
guest_0 = 9999
host_1 = 9999
pipeline = PipeLine().set_initiator(role='guest', party_id=guest_0).set_roles(guest=guest_0, host=host_1,
                                                                              arbiter=guest_0)
data_0 = {"name": "imdb", "namespace": "experiment"}
data_path = fate_project_path + '/examples/data/IMDB.csv'
pipeline.bind_table(name=data_0['name'], namespace=data_0['namespace'], path=data_path)
reader_0 = Reader(name="reader_0")
reader_0.get_party_instance(role='guest', party_id=guest_0).component_param(table=data_0)
reader_0.get_party_instance(role='host', party_id=host_1).component_param(table=data_0)

reader_1 = Reader(name="reader_1")
reader_1.get_party_instance(role='guest', party_id=guest_0).component_param(table=data_0)
reader_1.get_party_instance(role='host', party_id=host_1).component_param(table=data_0)


# Set model_path below to your pretrained model path; the model and tokenizer are loaded from it


# LoraConfig
from peft import LoraConfig, TaskType
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1,
    target_modules=['c_attn']
)


model_path = 'your model path'
model = t.nn.Sequential(
    t.nn.CustModel(module_name='pellm.gpt2', class_name='GPT2', pretrained_path=model_path,
                   peft_config=lora_config.to_dict(), peft_type="LoraConfig", num_labels=1,  pad_token_id=50256),
    t.nn.CustModel(module_name='sigmoid', class_name='Sigmoid')
)

# DatasetParam
dataset_param = DatasetParam(dataset_name='nlp_tokenizer',text_max_length=128, tokenizer_name_or_path=model_path, 
                             padding_side="left", return_input_ids=False, pad_token='<|endoftext|>')
# TrainerParam
trainer_param = TrainerParam(trainer_name='fedavg_trainer', epochs=1, batch_size=8,
                             data_loader_worker=8)


nn_component = HomoNN(name='nn_0', model=model)

# set parameter for client 1
nn_component.get_party_instance(role='guest', party_id=guest_0).component_param(
    loss=t.nn.BCELoss(),
    optimizer = t.optim.Adam(lr=0.0001, eps=1e-8),
    dataset=dataset_param,       
    trainer=trainer_param,
    torch_seed=100 
)

# set parameter for client 2
nn_component.get_party_instance(role='host', party_id=host_1).component_param(
    loss=t.nn.BCELoss(),
    optimizer = t.optim.Adam(lr=0.0001, eps=1e-8),
    dataset=dataset_param,       
    trainer=trainer_param,
    torch_seed=100 
)

# set parameter for server
nn_component.get_party_instance(role='arbiter', party_id=guest_0).component_param(    
    trainer=trainer_param
)

pipeline.add_component(reader_0)
pipeline.add_component(nn_component, data=Data(train_data=reader_0.output.data))
pipeline.compile()

pipeline.fit()

You can use this script to submit the task, but training takes a long time and generates a long log, so we do not run it here.

## Training with CUDA

You can use GPU by setting the cuda parameter of the FedAVGTrainer:

In [None]:
trainer_param = TrainerParam(trainer_name='fedavg_trainer', epochs=1, batch_size=8, 
                             data_loader_worker=8, cuda=0)

The cuda parameter here accepts an integer value that corresponds to the index of the GPU you want to use for training. 
In the example above, the value is set to 0, which means that on every client the first available GPU in the system will be used. 
If you have multiple GPUs and would like to use a specific one, simply change the value of the cuda parameter to the appropriate index.
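The mapping from the `cuda` index to a device can be pictured as follows. This is a hypothetical helper, not the trainer's implementation. We assume that leaving `cuda` unset means CPU training, as in the local test above, and that the number of visible GPUs is passed in explicitly. In practice that count would come from `torch.cuda.device_count()`.

```python
def resolve_device(cuda=None, gpu_count=0):
    """Map a `cuda` setting to a PyTorch-style device string."""
    if cuda is None:
        return "cpu"  # assumption: no index means CPU training
    if not isinstance(cuda, int) or cuda < 0:
        raise ValueError("cuda must be a non-negative integer GPU index")
    if cuda >= gpu_count:
        raise ValueError(
            f"cuda={cuda} requested but only {gpu_count} GPU(s) are visible"
        )
    return f"cuda:{cuda}"


print(resolve_device())                 # cpu
print(resolve_device(0, gpu_count=2))   # cuda:0
print(resolve_device(1, gpu_count=2))   # cuda:1
```

Checking the index against the visible GPU count up front gives a clearer error than a failure deep inside training, which matters when clients have different hardware.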