#  Federated GPT-2 Tuning with Parameter Efficient methods in FATE-LLM

In this tutorial, we will demonstrate how to efficiently train federated large language models using the FATE-LLM framework. In FATE-LLM, we introduce the "pellm"(Parameter Efficient Large Language Model) module, specifically designed for federated learning with large language models. We enable the implementation of parameter-efficient methods in federated learning, reducing communication overhead while maintaining model performance. In this tutorial we particularlly focus on GPT-2, and we will also emphasize the use of the Adapter mechanism for fine-tuning GPT-2, which enables us to effectively reduce communication volume and improve overall efficiency.

By following this tutorial, you will learn how to leverage the FATE-LLM framework to rapidly fine-tune federated large language models, such as GPT-2, with ease and efficiency.

## GPT2

GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages. GPT-2 is trained with a causal language modeling (CLM) objective, conditioning on a left-to-right context window of 1024 tokens. In this tutorial, we will use GPT2, you can download the pretrained model from [here](https://huggingface.co/gpt2) (We choose the smallest version for this tutorial), or let the program automatically download it when you use it later.

## Dataset: IMDB Sentimental

In this section, we will introduce the process of preparing the IMDB dataset for use in our federated learning task. We use our tokenizer dataset(based on HuggingFace tokenizer) to preprocess the text data.

About IMDB Sentimental Dataset:

This is an binary classification dataset, you can download our processed dataset from here: 
- https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/fate/examples/data/IMDB.csv
and place it in the examples/data folder. 

The orgin data is from: 
- https://ai.stanford.edu/~amaas/data/sentiment/

### Check Dataset

For more details of FATE-LLM dataset setting, we recommend that you read through these tutorials first: [NN Dataset Customization](https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/pipeline/nn_tutorial/Homo-NN-Customize-your-Dataset.ipynb), [Some Built-In Dataset](https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/pipeline/nn_tutorial/Introduce-Built-In-Dataset.ipynb),

## PELLM Model with Adapter

In this section, we will guide you through the process of building a parameter-efficient language model using the FATE-LLM framework. We will focus on the implementation of the PELLM model and the integration of the Adapter mechanism, which enables efficient fine-tuning and reduces communication overhead in federated learning settings. Take GPT-2 as example you will learn how to leverage the FATE-LLM framework to rapidly develop and deploy a parameter-efficient language model using FATE-LLM built-in classes. Before starting this section, we recommend that you read through this tutorial first: [Model Customization](https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/pipeline/nn_tutorial/Homo-NN-Customize-Model.ipynb).

### PELLM Models

In this section we introduce the PELLM model, which is a parameter-efficient language model that can be used in federated learning settings. They are designed to be compatible with the FATE-LLM framework to enable federated model tuning/training.

PELLM models are located at federatedml.nn.model_zoo.pellm(federatedml/nn/model_zoo/pellm):

In [6]:
! ls ../../../fate/python/fate_llm/model_zoo/pellm

albert.py  bloom.py    distilbert.py  parameter_efficient_llm.py
bart.py    chatglm.py  gpt2.py	      roberta.py
bert.py    deberta.py  llama.py


You can initialize your GPT2 model by loading the pretrained model from the model folder, or downloading the pretrained model from the Huggingface,
here we initialize the GPT2 model with the Lora Adapter, we will introduce Adapters in the following sub

#### Adapters

We can directly use adapters from the peft. See details for adapters on this page [Adapter Methods](https://huggingface.co/docs/peft/index#supported-methods) for more details. By specifying the adapter name and the adapter
config dict we can insert adapters into our language models:

**During the training process, all weights of the pretrained language model exclusive classifier head's weihgts will be frozen, and weights of adapters are traininable. Thus, FATE-LLM only train in the local training and aggregate adapters' weights and classifier head's weights(If has) in the fedederation process**

Now available adapters are [Adapters Overview](https://huggingface.co/docs/peft/index) for details.


### Use PELLM Model in FATE with CustModel

In this [Model Customization](https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/pipeline/nn_tutorial/Homo-NN-Customize-Model.ipynb) tutorial, we demonstrate how to employ the t.nn.CustomModel class in fate_torch to parse a model's structure and submit it to a federated learning task. The CustomModel automatically imports the model class from the model_zoo and initializes the models with the parameters provided. Since these language models are built-in, we can directly use them in the CustomModel and easily add a classifier head to address the classification task at hand：

In [6]:
import torch as t
from pipeline import fate_torch_hook
from pipeline.component.nn import save_to_fate_llm
fate_torch_hook(t)

<module 'torch' from '/data/projects/fate/env/python/venv/lib/python3.8/site-packages/torch/__init__.py'>

In [7]:
%%save_to_fate_llm model sigmoid.py

import torch as t

class Sigmoid(t.nn.Module):
    
    def __init__(self):
        super().__init__()
        self.sigmoid = t.nn.Sigmoid()
        
    def forward(self, x):
        return self.sigmoid(x.logits)

## Submit Federated Task
Once you have successfully completed local testing, We can submit a task to FATE. Please notice that this tutorial is ran on a standalone version. **Please notice that in this tutorial we are using a standalone version, if you are using a cluster version, you need to bind the data with the corresponding name&namespace on each machine.**

In this example we load pretrained weights for gpt2 model.

In [11]:
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained(
    'google/vit-base-patch16-224',
    num_labels=1000)

In [12]:
print(model)

ViTForImageClassification(
  (vit): ViTModel(
    (embeddings): ViTEmbeddings(
      (patch_embeddings): ViTPatchEmbeddings(
        (projection): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))
      )
      (dropout): Dropout(p=0.0, inplace=False)
    )
    (encoder): ViTEncoder(
      (layer): ModuleList(
        (0): ViTLayer(
          (attention): ViTAttention(
            (attention): ViTSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.0, inplace=False)
            )
            (output): ViTSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.0, inplace=False)
            )
          )
          (intermediate): ViTIntermediate(
            (dense): Linear(in_features=768, out_

In [8]:
import torch as t
import os
from pipeline import fate_torch_hook
from pipeline.component import HomoNN
from pipeline.component.homo_nn import DatasetParam, TrainerParam
from pipeline.backend.pipeline import PipeLine
from pipeline.component import Reader
from pipeline.interface import Data
from transformers import ViTConfig


fate_torch_hook(t)


import os
fate_project_path = '/data/projects/fate'
guest = 9999
host = 9999

pipeline = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, host=host,
                                                                            arbiter=host)
data_0 = {"name": "cifar10", "namespace": "experiment"}
data_path = fate_project_path + '/examples/data/cifar10/train'
pipeline.bind_table(name=data_0['name'], namespace=data_0['namespace'], path=data_path)
pipeline.bind_table(name=data_0['name'], namespace=data_0['namespace'], path=data_path)

reader_0 = Reader(name="reader_0")
reader_0.get_party_instance(role='guest', party_id=guest).component_param(table=data_0)
reader_0.get_party_instance(role='host', party_id=host).component_param(table=data_0)

#reader_1 = Reader(name="reader_1")
#reader_1.get_party_instance(role='guest', party_id=guest).component_param(table=data_0)
#reader_1.get_party_instance(role='host', party_id=host).component_param(table=data_0)
## Add your pretriained model path here, will load model&tokenizer from this path


## LoraConfig
from peft import LoraConfig, TaskType
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["classifier"],
)
"""
    LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1,
    #target_modules=['c_attn']
)
"""


model_path = 'google/vit-base-patch16-224'
model = t.nn.Sequential(
    t.nn.CustModel(module_name='pellm.vit', class_name='vit', pretrained_path=model_path,
                   peft_config=lora_config.to_dict(), peft_type="LoraConfig", num_labels=1000,  pad_token_id=50256),
    t.nn.CustModel(module_name='sigmoid', class_name='Sigmoid')
)

# DatasetParam
dataset_param = DatasetParam(dataset_name='image') #
#DatasetParam(dataset_name='nlp_tokenizer',text_max_length=128, tokenizer_name_or_path=model_path, 
#                             padding_side="left", return_input_ids=False, pad_token='<|endoftext|>')
# TrainerParam
trainer_param = TrainerParam(trainer_name='fedavg_trainer', epochs=1, batch_size=8,
                             data_loader_worker=8)


nn_component = HomoNN(name='nn_0', model=model)

# set parameter for client 1
nn_component.get_party_instance(role='guest', party_id=guest).component_param(
    loss=t.nn.BCELoss(),
    optimizer = t.optim.Adam(lr=0.0001, eps=1e-8),
    dataset=dataset_param,       
    trainer=trainer_param,
    torch_seed=100 
)

# set parameter for client 2
nn_component.get_party_instance(role='host', party_id=host).component_param(
    loss=t.nn.BCELoss(),
    optimizer = t.optim.Adam(lr=0.0001, eps=1e-8),
    dataset=dataset_param,       
    trainer=trainer_param,
    torch_seed=100 
)

# set parameter for server
nn_component.get_party_instance(role='arbiter', party_id=guest).component_param(    
    trainer=trainer_param
)

pipeline.add_component(reader_0)
pipeline.add_component(nn_component, data=Data(train_data=reader_0.output.data))
pipeline.compile()

pipeline.fit()

[32m2023-12-01 15:02:01.897[0m | [1mINFO    [0m | [36mpipeline.utils.invoker.job_submitter[0m:[36mmonitor_job_status[0m:[36m83[0m - [1mJob id is 202312011502015721950
[0m
[32m2023-12-01 15:02:01.905[0m | [1mINFO    [0m | [36mpipeline.utils.invoker.job_submitter[0m:[36mmonitor_job_status[0m:[36m98[0m - [1m[80D[1A[KJob is still waiting, time elapse: 0:00:00[0m
[0mm2023-12-01 15:02:02.920[0m | [1mINFO    [0m | [36mpipeline.utils.invoker.job_submitter[0m:[36mmonitor_job_status[0m:[36m125[0m - [1m
[32m2023-12-01 15:02:02.921[0m | [1mINFO    [0m | [36mpipeline.utils.invoker.job_submitter[0m:[36mmonitor_job_status[0m:[36m127[0m - [1m[80D[1A[KRunning component reader_0, time elapse: 0:00:01[0m
[32m2023-12-01 15:02:03.943[0m | [1mINFO    [0m | [36mpipeline.utils.invoker.job_submitter[0m:[36mmonitor_job_status[0m:[36m127[0m - [1m[80D[1A[KRunning component reader_0, time elapse: 0:00:02[0m
[32m2023-12-01 15:02:04.979[0m | [1mI

ValueError: Job is failed, please check out job 202312011502015721950 by fate board or fate_flow cli

You can use this script to submit the model, but submitting the model will take a long time to train and generate a long log, so we won't do it here.

## Training with CUDA

You can use GPU by setting the cuda parameter of the FedAVGTrainer:

In [None]:
trainer_param = TrainerParam(trainer_name='fedavg_trainer', epochs=1, batch_size=8, 
                             data_loader_worker=8, cuda=0)

The cuda parameter here accepts an integer value that corresponds to the index of the GPU you want to use for training. 
In the example above, the value is set to 0, which means that on every client the first available GPU in the system will be used. 
If you have multiple GPUs and would like to use a specific one, simply change the value of the cuda parameter to the appropriate index.