# Offsite-tuning Tutorial

In this tutorial, we'll focus on how to leverage the Offsite-Tuning framework in FATE to fine-tune your LLM. You'll learn how to:

1. Define models, including main models (which live on the server side and provide adapters and emulators) and sub models (which live on the client side and load adapters and emulators for local fine-tuning), compatible with the Offsite-Tuning framework.
2. Get hands-on experience with the Offsite-Tuning trainer.
3. Define configurations for advanced setups (using DeepSpeed, Offsite-Tuning + federation) through FATE-pipeline.

## Introduction of Offsite-tuning

Offsite-Tuning is a novel approach designed for the efficient and privacy-preserving adaptation of large foundational models for specific downstream tasks. The framework allows data owners to fine-tune models locally without uploading sensitive data to the LLM owner's servers. Specifically, the LLM owner sends a lightweight "Adapter" and a lossy compressed "Emulator" to the data owner. Using these smaller components, the data owner can then fine-tune the model solely on their private data. The Adapter, once fine-tuned, is returned to the model owner and integrated back into the large model to enhance its performance on the specific dataset.

Offsite-Tuning addresses the challenge of unequal distribution of computational power and data. It allows the model owner to enhance the model's capabilities without direct access to private data, while also enabling data owners who may not have the resources to train a full-scale model to fine-tune a portion of it using less computational power. This mutually beneficial arrangement accommodates both parties involved.

Beyond the standard two-party setup involving the model owner and the data owner, the Offsite-Tuning framework in FATE-LLM is also extendable to scenarios with multiple data owners. FATE supports multi-party Offsite-Tuning, allowing multiple data owners to fine-tune and aggregate their Adapters locally, further enhancing the flexibility and applicability of this framework.

For more details of Offsite-tuning, please refer to the [original paper](https://arxiv.org/pdf/2302.04870.pdf).







## Preliminary

We strongly recommend reading our NN tutorial first to get familiar with Model and Dataset customization: [NN Tutorials](https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/pipeline/nn_tutorial/README.md)

Add the FATE Python path so that you can run the code in this notebook.

In [1]:
import sys
your_path_to_fate_python = '/data/projects/fate/fate/python'
sys.path.append(your_path_to_fate_python)

## Define Main Model and Sub Model

Main Models live on the server side and provide the weights of adapters and emulators to the clients, while Sub Models live on the client side and load adapters and emulators for local fine-tuning. In this chapter we take a standard GPT-2 as the example and show how to quickly develop a main model class and a sub model class for Offsite-Tuning.

### Base Classes and Interfaces

The base classes for the Main and Sub Models are OffsiteTuningMainModel and OffsiteTuningSubModel, respectively. To build your own models upon these base classes, you need to:

1. Implement three key interfaces: get_base_model, get_model_transformer_blocks, and forward. The get_base_model interface should return the full Main or Sub Model. Meanwhile, the get_model_transformer_blocks function should return a ModuleList of all transformer blocks present in your language model, enabling the extraction of emulators and adapters from these blocks. Finally, you're required to implement the forward process for model inference.

2. Supply the parameters emulator_layer_num, adapter_top_layer_num, and adapter_bottom_layer_num to the parent class. This allows the framework to automatically generate the top and bottom adapters as well as the dropout emulator for you. Specifically, the top adapters are taken from the top of the transformer blocks, while the bottom adapters are taken from the bottom. The emulator is a dropout emulator consistent with the paper's specifications: once the adapter layers are removed, the emulator is formed by selecting transformer blocks at fixed intervals from the remaining blocks and stacking them.

Our framework automatically detects the emulator and adapters of a main model and sends them to the clients. The clients' models then load the weights of the emulator and adapters to obtain trainable models.
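
To make the layer bookkeeping concrete, here is a minimal sketch of how block indices could be divided into bottom adapter, emulator, and top adapter. It is an illustration only, not FATE's implementation: the function name `split_block_indices` is hypothetical, "bottom" is assumed to mean the blocks closest to the input, and the even-spacing rule is just one plausible reading of "fixed intervals".

```python
def split_block_indices(num_blocks, emulator_layer_num,
                        adapter_top_layer_num, adapter_bottom_layer_num):
    """Return (bottom_adapter, emulator, top_adapter) block indices."""
    indices = list(range(num_blocks))
    top_start = num_blocks - adapter_top_layer_num
    bottom = indices[:adapter_bottom_layer_num]   # assumed: blocks nearest the input
    top = indices[top_start:]                     # assumed: blocks nearest the output
    middle = indices[adapter_bottom_layer_num:top_start]

    if emulator_layer_num <= 0:
        return bottom, [], top
    if emulator_layer_num >= len(middle):
        return bottom, middle, top

    # Keep emulator_layer_num blocks spread evenly over the middle section.
    stride = len(middle) / emulator_layer_num
    emulator = [middle[int(i * stride)] for i in range(emulator_layer_num)]
    return bottom, emulator, top


bottom, emulator, top = split_block_indices(12, 4, 2, 2)
print(bottom, emulator, top)
```

With 12 transformer blocks (as in the standard GPT-2) and the settings used later in this tutorial (4, 2, 2), the client-side model would hold 8 blocks in total under these assumptions.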

### Example

Let us take a look at our built-in GPT-2 model. It is easy to build main models and sub models based on the framework. Please note that the GPT2LMHeadSubModel's base model is initialized from a GPT2Config; that is to say, its weights are random, and it needs to load pretrained weights from the server.

In [None]:
from fate_llm.model_zoo.offsite_tuning.offsite_tuning_model import OffsiteTuningSubModel, OffsiteTuningMainModel
from transformers import GPT2LMHeadModel, GPT2Config
from torch import nn
import torch as t


class GPT2LMHeadMainModel(OffsiteTuningMainModel):

    def __init__(
            self,
            model_name_or_path,
            emulator_layer_num: int,
            adapter_top_layer_num: int = 2,
            adapter_bottom_layer_num: int = 2):

        self.model_name_or_path = model_name_or_path
        super().__init__(
            emulator_layer_num,
            adapter_top_layer_num,
            adapter_bottom_layer_num)

    def get_base_model(self):
        return GPT2LMHeadModel.from_pretrained(self.model_name_or_path)

    def get_model_transformer_blocks(self, model: GPT2LMHeadModel):
        return model.transformer.h

    def forward(self, x):
        return self.model(**x)

class GPT2LMHeadSubModel(OffsiteTuningSubModel):

    def __init__(
            self,
            model_name_or_path,
            emulator_layer_num: int,
            adapter_top_layer_num: int = 2,
            adapter_bottom_layer_num: int = 2,
            fp16_mix_precision=False,
            partial_weight_decay=None):

        self.model_name_or_path = model_name_or_path
        self.emulator_layer_num = emulator_layer_num
        self.adapter_top_layer_num = adapter_top_layer_num
        self.adapter_bottom_layer_num = adapter_bottom_layer_num
        super().__init__(
            emulator_layer_num,
            adapter_top_layer_num,
            adapter_bottom_layer_num,
            fp16_mix_precision)
        self.partial_weight_decay = partial_weight_decay

    def get_base_model(self):
        total_layer_num = self.emulator_layer_num + \
            self.adapter_top_layer_num + self.adapter_bottom_layer_num
        config = GPT2Config.from_pretrained(self.model_name_or_path)
        config.num_hidden_layers = total_layer_num
        # initialize a model without pretrained weights
        return GPT2LMHeadModel(config)

    def get_model_transformer_blocks(self, model: GPT2LMHeadModel):
        return model.transformer.h
        
    def forward(self, x):
        return self.model(**x)


We can define a server side model and a client side model that can work together in the offsite-tuning:

In [None]:
model_main = GPT2LMHeadMainModel('gpt2', 4, 2, 2)
model_sub = GPT2LMHeadSubModel('gpt2', 4, 2, 2)

### Share additional parameters with clients

Additionally, beyond the weights of emulators and adapters, you may also want to share other model parameters, such as embedding weights, with your client partners. To achieve this, you'll need to implement two more interfaces: get_additional_param_state_dict and load_additional_param_state_dict for both the Main and Sub Models.

### Special Attention for Large Objects

Please note that special attention is required when you need to share large objects, any object potentially exceeding 2GB, such as embedding weights. You should slice these large objects to manage them more efficiently. Below is a code snippet demonstrating this practice, taken directly from FATE's native GPT-2 implementation:

In [None]:
def get_additional_param_state_dict(self):
    # get parameter of additional parameter
    model = self.model
    param_dict = {
        'wte': model.transformer.wte,
        'wpe': model.transformer.wpe,
        'last_ln_f': model.transformer.ln_f
    }

    addition_weights = self.get_numpy_state_dict(param_dict)

    wte = addition_weights.pop('wte')
    wte_dict = split_numpy_array(wte, 10, 'wte')
    wpe = addition_weights.pop('wpe')
    wpe_dict = split_numpy_array(wpe, 10, 'wpe')
    addition_weights.update(wte_dict)
    addition_weights.update(wpe_dict)
    return addition_weights

def load_additional_param_state_dict(self, submodel_weights: dict):
    # load additional weights:
    model = self.model
    param_dict = {
        'wte': model.transformer.wte,
        'wpe': model.transformer.wpe,
        'last_ln_f': model.transformer.ln_f
    }

    new_submodel_weight = {}
    new_submodel_weight['last_ln_f'] = submodel_weights['last_ln_f']
    wte_dict, wpe_dict = {}, {}
    for k, v in submodel_weights.items():
        if 'wte' in k:
            wte_dict[k] = v
        if 'wpe' in k:
            wpe_dict[k] = v
    wte = recover_numpy_array(wte_dict, 'wte')
    wpe = recover_numpy_array(wpe_dict, 'wpe')
    new_submodel_weight['wte'] = wte
    new_submodel_weight['wpe'] = wpe

    self.load_numpy_state_dict(param_dict, new_submodel_weight)

From this code we can see that split_numpy_array and recover_numpy_array are used to cut the embedding weights into pieces and to recover them.
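
The exact signatures of `split_numpy_array` and `recover_numpy_array` are not shown in this tutorial. As a rough sketch of the idea, assuming the helpers cut an array into a fixed number of pieces stored under indexed keys, the two functions below round-trip a toy embedding table. The names `split_array_sketch` and `recover_array_sketch` and the `<prefix>_<i>` key format are hypothetical and chosen only for illustration.

```python
import numpy as np


def split_array_sketch(array, num_pieces, prefix):
    """Cut `array` along axis 0 into `num_pieces` chunks keyed '<prefix>_<i>'."""
    chunks = np.array_split(array, num_pieces, axis=0)
    return {f"{prefix}_{i}": chunk for i, chunk in enumerate(chunks)}


def recover_array_sketch(pieces, prefix):
    """Reassemble chunks produced by split_array_sketch in index order."""
    keys = sorted(
        (k for k in pieces if k.startswith(prefix + "_")),
        key=lambda k: int(k.rsplit("_", 1)[1]),  # numeric order, not string order
    )
    return np.concatenate([pieces[k] for k in keys], axis=0)


wte = np.arange(50 * 8, dtype=np.float32).reshape(50, 8)  # toy embedding table
pieces = split_array_sketch(wte, 10, "wte")
restored = recover_array_sketch(pieces, "wte")
print(len(pieces), np.array_equal(wte, restored))
```

Sorting by the numeric index rather than by the raw key string keeps the chunks in order once there are more than ten pieces.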

## Submit an Offsite-tuning Task - A QA Task Sample with GPT2

Now we are going to show you how to run a two-party (server and client) Offsite-Tuning task using the GPT-2 model defined above. Before we submit the task, we need to prepare the QA dataset.

### Prepare QA Dataset - Sciq

In this example, we use the sciq dataset. You can use the tools provided in our qa_dataset.py to tokenize the sciq dataset and save the tokenized result.

In [10]:
from fate_llm.dataset.qa_dataset import tokenize_qa_dataset
from transformers import AutoTokenizer
# path (or Hugging Face name) of the tokenizer; the tokenizer itself is created below
tokenizer_name_or_path = '/data/projects/fate/cwj/gpt2'

if 'llama' in tokenizer_name_or_path:
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name_or_path, unk_token="<unk>",  bos_token="<s>", eos_token="</s>", add_eos_token=True)   
    tokenizer.pad_token = tokenizer.eos_token
else:
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name_or_path)
if 'gpt2' in tokenizer_name_or_path:
    tokenizer.pad_token = tokenizer.eos_token

import os
# save the tokenized (cached) dataset under the FATE root folder
fate_project_path = os.path.abspath('../../../')
rs = tokenize_qa_dataset('sciq', tokenizer, fate_project_path + '/sciq/', seq_max_len=600)


We can use our built-in QaDataset to load the tokenized dataset and check that everything is working correctly.

In [12]:
from fate_llm.dataset.qa_dataset import QaDataset

ds = QaDataset(tokenizer_name_or_path=tokenizer_name_or_path)
ds.load(fate_project_path + '/sciq/')

In [13]:
print(len(ds))  # train set length
print(len(ds[0]['input_ids']))  # first sample length

11679
600
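
The first sample's length equals `seq_max_len=600`, which is consistent with every sample being padded or truncated to a fixed length. How qa_dataset.py does this internally is not shown in this tutorial; the helper below (`pad_or_truncate`, a hypothetical name, with right-padding as an assumption) only illustrates the general idea.

```python
def pad_or_truncate(token_ids, max_len, pad_id):
    """Return a copy of token_ids that is exactly max_len long (right-padded)."""
    clipped = list(token_ids[:max_len])
    return clipped + [pad_id] * (max_len - len(clipped))


GPT2_EOS_ID = 50256  # GPT-2's end-of-text id; the tokenizer above reuses EOS as the pad token
short = pad_or_truncate([10, 11, 12], 600, GPT2_EOS_ID)
long_ = pad_or_truncate(list(range(1000)), 600, GPT2_EOS_ID)
print(len(short), len(long_))
```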


### Submit a Task

Now the model and the dataset are prepared, and we can submit a training task.

After we submit the task below, the following process will occur: The server and client each initialize their respective models. The server extracts shared parameters and sends them to the client. The client then loads these parameters and trains a miniaturized GPT-2 model, composed of an emulator and adapters, on Sciq. Upon completion of the training, the client sends the adapter parameters back to the server. Since we are directly using Hugging Face's GPT2LMHeadModel, there's no need to supply a loss function. Simply feeding the preprocessed data and labels into the model will calculate the correct loss and proceed with gradient descent.

We specify the OffsiteTuningTrainer via TrainerParam. If you are not familiar with trainer configuration, please refer to the [FATE-NN Tutorial](https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/pipeline/nn_tutorial/README.md).
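
The server/client round trip described above (extract, send, fine-tune, return, plug back in) can be pictured with a toy model in which every transformer block is a single number. This is a conceptual sketch only: the function names, the dictionary layout, and the hard-coded block indices are all hypothetical, and real training is replaced by adding a constant to the adapter "weights".

```python
def extract_sub_model(main_blocks, bottom_idx, emulator_idx, top_idx):
    """Server side: copy the blocks the client needs."""
    return {
        "adapter_bottom": [main_blocks[i] for i in bottom_idx],
        "emulator": [main_blocks[i] for i in emulator_idx],
        "adapter_top": [main_blocks[i] for i in top_idx],
    }


def client_fine_tune(sub_model, delta):
    """Client side: update the adapters only; the emulator stays frozen."""
    return {
        "adapter_bottom": [w + delta for w in sub_model["adapter_bottom"]],
        "adapter_top": [w + delta for w in sub_model["adapter_top"]],
    }


def plug_in_adapters(main_blocks, tuned, bottom_idx, top_idx):
    """Server side: write the returned adapters back into the full model."""
    updated = list(main_blocks)
    for i, w in zip(bottom_idx, tuned["adapter_bottom"]):
        updated[i] = w
    for i, w in zip(top_idx, tuned["adapter_top"]):
        updated[i] = w
    return updated


main_blocks = [float(i) for i in range(12)]  # one toy "weight" per block
bottom_idx, emulator_idx, top_idx = [0, 1], [2, 4, 6, 8], [10, 11]  # illustrative split

sub_model = extract_sub_model(main_blocks, bottom_idx, emulator_idx, top_idx)
tuned = client_fine_tune(sub_model, delta=0.5)
final_blocks = plug_in_adapters(main_blocks, tuned, bottom_idx, top_idx)
print(final_blocks)
```

Only the adapter entries change on the server; the middle blocks of the full model, including those the emulator was built from, keep their original values.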

One thing to pay special attention to is that Offsite-Tuning differs from FedAvg within FATE. In Offsite-Tuning, the server (the arbiter role) needs to initialize the model. Therefore, please refer to the example below and set the 'nn_component' parameters separately for the client and the server. Also, don't forget to add the 'server_init=True' parameter to the server; otherwise, the arbiter side will not initialize the model.

To make this a quick demo, we only select 100 samples from the original QA dataset; see 'select_num=100' in the DatasetParam.

In [1]:
import torch as t
from torch import nn
from pipeline import fate_torch_hook
from pipeline.component import HomoNN
from pipeline.backend.pipeline import PipeLine
from pipeline.component import Reader, Evaluation, DataTransform
from pipeline.interface import Data, Model

t = fate_torch_hook(t)

import os
# bind data path to name & namespace
fate_project_path = os.path.abspath('../../../')
guest = 9997
arbiter = 9997
pipeline = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, arbiter=arbiter)

# bind data path with name & namespace
data_0 = {"name": "sciq", "namespace": "experiment"}
data_path_0 = fate_project_path + '/sciq/'
pipeline.bind_table(name=data_0['name'], namespace=data_0['namespace'], path=data_path_0)

reader_0 = Reader(name="reader_0")
reader_0.get_party_instance(role='guest', party_id=guest).component_param(table=data_0)

gpt2_type = '/data/projects/fate/cwj/gpt2/'

from pipeline.component.nn import DatasetParam
dataset_param = DatasetParam(dataset_name='qa_dataset', tokenizer_name_or_path=gpt2_type, select_num=100)

from pipeline.component.homo_nn import TrainerParam  # Interface
sub_model_client = t.nn.CustModel(module_name='offsite_tuning.gpt2_ot', class_name='GPT2LMHeadSubModel', model_name_or_path=gpt2_type \
                                  ,emulator_layer_num=4, adapter_top_layer_num=2, adapter_bottom_layer_num=2)
main_model_server = t.nn.CustModel(module_name='offsite_tuning.gpt2_ot', class_name='GPT2LMHeadMainModel', model_name_or_path=gpt2_type \
                                  ,emulator_layer_num=4, adapter_top_layer_num=2, adapter_bottom_layer_num=2)

nn_component = HomoNN(name='nn_0')

nn_component.get_party_instance(role='guest', party_id=guest).component_param(model=sub_model_client, dataset=dataset_param,  # dataset
                                                                              trainer=TrainerParam(trainer_name='offsite_tuning_trainer', epochs=3, batch_size=4, collate_fn='DataCollatorForTokenClassification', task_type='causal_ml', \
                                                                                                   save_to_local_dir=True, cuda=0),
                                                                             optimizer=t.optim.Adam(lr=5e-5)
                                                                             )
nn_component.get_party_instance(role='arbiter', party_id=arbiter).component_param(model=main_model_server, 
                                                                                  trainer=TrainerParam(trainer_name='offsite_tuning_trainer', collate_fn='DataCollatorForTokenClassification', save_to_local_dir=True),
                                                                                  # Attention here
                                                                                  server_init=True # This parameter must be set True !!!!!!!!!!!
                                                                                )
pipeline.add_component(reader_0)
pipeline.add_component(nn_component, data=Data(train_data=reader_0.output.data))
pipeline.compile()

<pipeline.backend.pipeline.PipeLine at 0x7f81000ec850>

In [2]:
pipeline.fit()

## Add Deepspeed Setting

By simply adding a ds_config, we can run our task with a deepspeed backend:

In [5]:
import torch as t
from torch import nn
from pipeline import fate_torch_hook
from pipeline.component import HomoNN
from pipeline.backend.pipeline import PipeLine
from pipeline.component import Reader, Evaluation, DataTransform
from pipeline.interface import Data, Model

t = fate_torch_hook(t)

import os
# bind data path to name & namespace
fate_project_path = os.path.abspath('../../../')
guest = 9997
arbiter = 9997
pipeline = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, arbiter=arbiter)

# bind data path with name & namespace
data_0 = {"name": "sciq", "namespace": "experiment"}
data_path_0 = fate_project_path + '/sciq/'
pipeline.bind_table(name=data_0['name'], namespace=data_0['namespace'], path=data_path_0)

reader_0 = Reader(name="reader_0")
reader_0.get_party_instance(role='guest', party_id=guest).component_param(table=data_0)

# deepspeed config
ds_config = {
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 2,
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 5e-5
        }
    }
    ,
    "fp16": {
        "enabled": False
    }
    ,
    "zero_optimization": {
        "stage": 1,
        "offload_optimizer": {
            "device": "cpu"
        },
        "contiguous_gradients": True,
        "overlap_comm": True
    }
}

gpt2_type = '/data/projects/fate/cwj/gpt2/'

from pipeline.component.nn import DatasetParam
dataset_param = DatasetParam(dataset_name='qa_dataset', tokenizer_name_or_path=gpt2_type, select_num=100)

from pipeline.component.homo_nn import TrainerParam  # Interface
sub_model_client = t.nn.CustModel(module_name='offsite_tuning.gpt2_ot', class_name='GPT2LMHeadSubModel', model_name_or_path=gpt2_type \
                                  ,emulator_layer_num=4, adapter_top_layer_num=2, adapter_bottom_layer_num=2)
main_model_server = t.nn.CustModel(module_name='offsite_tuning.gpt2_ot', class_name='GPT2LMHeadMainModel', model_name_or_path=gpt2_type \
                                  ,emulator_layer_num=4, adapter_top_layer_num=2, adapter_bottom_layer_num=2)

nn_component = HomoNN(name='nn_0')

nn_component.get_party_instance(role='guest', party_id=guest).component_param(model=sub_model_client, dataset=dataset_param,  # dataset
                                                                              trainer=TrainerParam(trainer_name='offsite_tuning_trainer', epochs=3, batch_size=4, collate_fn='DataCollatorForTokenClassification', task_type='causal_ml', \
                                                                                                   save_to_local_dir=True),
                                                                              # pass the DeepSpeed config defined above; it already specifies the optimizer
                                                                              ds_config=ds_config
                                                                             )
nn_component.get_party_instance(role='arbiter', party_id=arbiter).component_param(model=main_model_server, 
                                                                                  trainer=TrainerParam(trainer_name='offsite_tuning_trainer', collate_fn='DataCollatorForTokenClassification', save_to_local_dir=True),
                                                                                  # Attention here
                                                                                  server_init=True # This parameter must be set True !!!!!!!!!!!
                                                                                )
pipeline.add_component(reader_0)
pipeline.add_component(nn_component, data=Data(train_data=reader_0.output.data))
pipeline.compile()

<pipeline.backend.pipeline.PipeLine at 0x7f8002385e50>

In [8]:
from pipeline.runtime.entity import JobParameters
pipeline.fit(JobParameters(task_conf={
    "nn_0": {
        "launcher": "deepspeed",
        "world_size": 4
    }
}))
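
As a quick sanity check on these settings: DeepSpeed's effective (global) batch size is the per-GPU micro batch size times the gradient accumulation steps times the number of GPUs. The sketch below applies that formula to the values used above, assuming that `world_size` equals the number of DeepSpeed worker processes; the helper name `effective_batch_size` is hypothetical.

```python
def effective_batch_size(config, world_size):
    """Global batch size = micro batch per GPU * accumulation steps * GPUs."""
    return (
        config["train_micro_batch_size_per_gpu"]
        * config["gradient_accumulation_steps"]
        * world_size
    )


ds_settings = {"train_micro_batch_size_per_gpu": 2, "gradient_accumulation_steps": 2}
world_size = 4  # assumed: one DeepSpeed worker per GPU

print(effective_batch_size(ds_settings, world_size))  # 16
```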

## Offsite-tuning + Multi Client Federation


The Offsite-Tuning + FedAVG federation is configured based on the standard Offsite-Tuning. The setup is a bit more complex, but we will walk you through it step by step. The pipeline code below contains detailed comments. When reading, please pay attention to the following points:

1. In a multi-party scenario, please fill in different party_ids based on your deployment.
2. The operation to bind the data path with the name & namespace needs to be run on the machines of all parties. For convenience, we've placed the code in one location.
3. When configuring Trainer parameters, make sure to add the 'need_aggregate=True' parameter to the OffsiteTuningTrainer for each client and for the server, so that adapters are aggregated during training.
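
For intuition about what aggregating adapters means, the following is a minimal sketch of standard FedAvg: a sample-count-weighted average of each client's adapter parameters. It is not FATE's aggregator (secure aggregation and communication are not modeled), and the function name `fedavg`, the parameter name `adapter_top.weight`, and the use of plain Python lists for weights are assumptions made for illustration.

```python
def fedavg(adapter_states, sample_counts):
    """Weighted average of per-client adapter weights.

    adapter_states: list of dicts mapping parameter name -> list of floats.
    sample_counts: number of training samples each client used.
    """
    total = sum(sample_counts)
    averaged = {}
    for name in adapter_states[0]:
        length = len(adapter_states[0][name])
        averaged[name] = [
            sum(state[name][j] * n for state, n in zip(adapter_states, sample_counts)) / total
            for j in range(length)
        ]
    return averaged


clients = [
    {"adapter_top.weight": [1.0, 2.0]},
    {"adapter_top.weight": [3.0, 4.0]},
    {"adapter_top.weight": [5.0, 6.0]},
]
print(fedavg(clients, [3893, 3893, 3893]))
```

When all clients hold the same number of samples, as in the demo below, the weighted average reduces to a plain mean.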

In [None]:
import torch as t
from torch import nn
from pipeline import fate_torch_hook
from pipeline.component import HomoNN
from pipeline.backend.pipeline import PipeLine
from pipeline.component import Reader, Evaluation, DataTransform
from pipeline.interface import Data, Model

t = fate_torch_hook(t)

import os
# bind data path to name & namespace
fate_project_path = os.path.abspath('../../../')
guest = 9997
hosts = [9999, 10000]
arbiter = 9997
pipeline = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, arbiter=arbiter, host=hosts)

data_9997 = {"name": "sciq-9997-gpt2", "namespace": "experiment"}
data_9999 = {"name": "sciq-9999-gpt2", "namespace": "experiment"}
data_10000 = {"name": "sciq-10000-gpt2", "namespace": "experiment"}

# run the binding codes on 9997
data_path_9997 = fate_project_path + '/sciq/'
pipeline.bind_table(name=data_9997['name'], namespace=data_9997['namespace'], path=data_path_9997)

# run the binding codes on 9999
data_path_9999 = fate_project_path + '/sciq/'
pipeline.bind_table(name=data_9999['name'], namespace=data_9999['namespace'], path=data_path_9999)

# run the binding codes on 10000
data_path_10000 = fate_project_path + '/sciq/'
pipeline.bind_table(name=data_10000['name'], namespace=data_10000['namespace'], path=data_path_10000)

In [None]:
# deepspeed config
ds_config = {
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 2,
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 5e-5
        }
    }
    ,
    "fp16": {
        "enabled": False
    }
    ,
    "zero_optimization": {
        "stage": 1,
        "offload_optimizer": {
            "device": "cpu"
        },
        "contiguous_gradients": True,
        "overlap_comm": True
    }
}

In [None]:
model_path = 'gpt2'

In [None]:
reader_0 = Reader(name="reader_0")
reader_0.get_party_instance(role='guest', party_id=guest).component_param(table=data_9997)
reader_0.get_party_instance(role='host', party_id=hosts[0]).component_param(table=data_9999)
reader_0.get_party_instance(role='host', party_id=hosts[1]).component_param(table=data_10000)

In [None]:
from pipeline.component.nn import DatasetParam

# This demo utilizes the same dataset but selects distinct segments to mimic an equal data distribution across different parties. 
# We adopt this strategy for the sake of convenience.
dataset_param_0 = DatasetParam(dataset_name='qa_dataset', tokenizer_name_or_path=model_path, start_idx=0, select_num=3893)
dataset_param_1 = DatasetParam(dataset_name='qa_dataset', tokenizer_name_or_path=model_path, start_idx=3893, select_num=3893)
dataset_param_2 = DatasetParam(dataset_name='qa_dataset', tokenizer_name_or_path=model_path, start_idx=7786, select_num=3893)
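
The `start_idx` and `select_num` values above come from dividing the 11679 training samples evenly among three parties. A small helper like the one below (the name `partition` is hypothetical) reproduces them and also handles totals that do not divide evenly:

```python
def partition(total, num_parties):
    """Split `total` samples into contiguous, near-equal (start_idx, select_num) slices."""
    base, extra = divmod(total, num_parties)
    slices, start = [], 0
    for i in range(num_parties):
        size = base + (1 if i < extra else 0)  # the first `extra` parties get one more sample
        slices.append((start, size))
        start += size
    return slices


print(partition(11679, 3))  # [(0, 3893), (3893, 3893), (7786, 3893)]
```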


In [None]:
from pipeline.component.homo_nn import TrainerParam  # Interface

# define model structure
sub_model_client = t.nn.CustModel(module_name='offsite_tuning.gpt2_ot', class_name='GPT2LMHeadSubModel', model_name_or_path=model_path \
                                  ,emulator_layer_num=4, adapter_top_layer_num=2, adapter_bottom_layer_num=2)
main_model_server = t.nn.CustModel(module_name='offsite_tuning.gpt2_ot', class_name='GPT2LMHeadMainModel', model_name_or_path=model_path \
                                  ,emulator_layer_num=4, adapter_top_layer_num=2, adapter_bottom_layer_num=2)

In [None]:
nn_component = HomoNN(name='nn_0')

In [None]:
epochs = 8
# We have 4 parties to configure
# Please make sure that need_aggregate is True and that the epochs parameter is the same for all parties
nn_component.get_party_instance(role='guest', party_id=guest).component_param(model=sub_model_client, dataset=dataset_param_0,  # dataset
                                                                              trainer=TrainerParam(trainer_name='offsite_tuning_trainer', epochs=epochs, batch_size=4, collate_fn='DataCollatorForTokenClassification', task_type='causal_ml', \
                                                                                                   save_to_local_dir=True, need_aggregate=True), ds_config=ds_config)

nn_component.get_party_instance(role='host', party_id=hosts[0]).component_param(model=sub_model_client, dataset=dataset_param_1,  # dataset
                                                                              trainer=TrainerParam(trainer_name='offsite_tuning_trainer', epochs=epochs, batch_size=4, collate_fn='DataCollatorForTokenClassification', task_type='causal_ml', \
                                                                                                   save_to_local_dir=True, need_aggregate=True), ds_config=ds_config)

nn_component.get_party_instance(role='host', party_id=hosts[1]).component_param(model=sub_model_client, dataset=dataset_param_2,  # dataset
                                                                              trainer=TrainerParam(trainer_name='offsite_tuning_trainer', epochs=epochs, batch_size=4, collate_fn='DataCollatorForTokenClassification', task_type='causal_ml', \
                                                                                                   save_to_local_dir=True, need_aggregate=True), ds_config=ds_config)


nn_component.get_party_instance(role='arbiter', party_id=arbiter).component_param(model=main_model_server,
                                                                                  trainer=TrainerParam(trainer_name='offsite_tuning_trainer', epochs=epochs, save_to_local_dir=True,
                                                                                                       need_aggregate=True),
                                                                                  server_init=True
                                                                                )

In [None]:
pipeline.add_component(reader_0)
pipeline.add_component(nn_component, data=Data(train_data=reader_0.output.data))
pipeline.compile()

In [None]:
from pipeline.runtime.entity import JobParameters
pipeline.fit(JobParameters(task_conf={
    "nn_0": {
        "launcher": "deepspeed",
        "world_size": 4
    }
}))