# Federated Tuning with the FedKSeed Method in FATE-LLM

In this tutorial, we demonstrate how to efficiently train large language models in a federated setting using the FATE-LLM framework. FATE-LLM provides the "FedKSeed" module, which is designed specifically for federated learning with large language models. The idea of FedKSeed is to use a zeroth-order optimizer that updates the model along random directions, each of which is generated from a random seed. Because a direction can be regenerated from its seed, this method can train large language models in a federated learning setting with extremely low communication cost.
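
To make the idea concrete, here is a minimal, framework-free sketch of a seed-based zeroth-order step. It is not the FATE-LLM implementation: the helper names (`perturbation`, `zo_step`, `replay`), the toy quadratic loss, the pool of eight seeds, and the hyperparameters are all assumptions chosen for illustration. It shows that a list of `(seed, scalar gradient)` pairs is enough to reproduce the trained weights.

```python
import random


def perturbation(seed, dim):
    """Regenerate the random direction that belongs to a seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def zo_step(weights, loss_fn, seed, lr=0.02, eps=1e-3):
    """One zeroth-order step: two loss evaluations, no backpropagation."""
    z = perturbation(seed, len(weights))
    plus = [w + eps * zi for w, zi in zip(weights, z)]
    minus = [w - eps * zi for w, zi in zip(weights, z)]
    # Central difference: a single scalar that estimates the slope along z.
    scalar_grad = (loss_fn(plus) - loss_fn(minus)) / (2 * eps)
    new_weights = [w - lr * scalar_grad * zi for w, zi in zip(weights, z)]
    return new_weights, scalar_grad


def replay(weights, history, lr=0.02):
    """Rebuild the trained weights from (seed, scalar_grad) pairs only."""
    for seed, scalar_grad in history:
        z = perturbation(seed, len(weights))
        weights = [w - lr * scalar_grad * zi for w, zi in zip(weights, z)]
    return weights


def quadratic_loss(w):
    """Toy loss (an assumption for this sketch) with its minimum at w = 1."""
    return sum((wi - 1.0) ** 2 for wi in w)


candidate_seeds = list(range(8))  # hypothetical pool of candidate seeds
init = [0.0] * 4
weights, history = list(init), []
for step in range(200):
    seed = candidate_seeds[step % len(candidate_seeds)]
    weights, scalar_grad = zo_step(weights, quadratic_loss, seed)
    history.append((seed, scalar_grad))

# Another party that knows the initial weights can replay the same updates.
replayed = replay(list(init), history)
print(quadratic_loss(init), quadratic_loss(weights))
```

In this sketch only the seed and one scalar per step would ever need to leave a client, independent of the number of model parameters.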

The algorithm is based on the paper [Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes](https://arxiv.org/pdf/2312.06353.pdf), and the code is adapted from https://github.com/alibaba/FederatedScope/tree/FedKSeed. We refactored the code to make it more compatible with the transformers/PyTorch ecosystem and integrated it into the FATE-LLM framework.

The main components include:
1. A `KSeedZerothOrderOptimizer` class that optimizes the model along directions generated from random seeds.
2. A `KSeedZOExtendedTrainer` class, a subclass of the transformers `Trainer`, that trains large language models with `KSeedZerothOrderOptimizer`.
3. Trainers for federated learning with large language models.

The rest of this tutorial walks through the model, the dataset, a local training check, and the submission of a federated task.

## Model: datajuicer/LLaMA-1B-dj-refine-150B

This is the introduction from the Hugging Face model hub: [datajuicer/LLaMA-1B-dj-refine-150B](https://huggingface.co/datajuicer/LLaMA-1B-dj-refine-150B)

> The model architecture is LLaMA-1.3B and we adopt the OpenLLaMA implementation. The model is pre-trained on 150B tokens of Data-Juicer's refined RedPajama and Pile. It achieves an average score of 34.21 over 16 HELM tasks, beating Falcon-1.3B (trained on 350B tokens from RefinedWeb), Pythia-1.4B (trained on 300B tokens from original Pile) and Open-LLaMA-1.3B (trained on 150B tokens from original RedPajama and Pile).

> For more details, please refer to our [paper](https://arxiv.org/abs/2309.02033).


In [1]:
# model_name_or_path = "datajuicer/LLaMA-1B-dj-refine-150B"
model_name_or_path = "gpt2"

## Dataset: databricks/databricks-dolly-15k

This is the introduction from the Hugging Face dataset hub: [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k)

> databricks-dolly-15k is a corpus of more than 15,000 records generated by thousands of Databricks employees to enable large language models to exhibit the magical interactivity of ChatGPT. Databricks employees were invited to create prompt / response pairs in each of eight different instruction categories, including the seven outlined in the InstructGPT paper, as well as an open-ended free-form category. The contributors were instructed to avoid using information from any source on the web with the exception of Wikipedia (for particular subsets of instruction categories), and explicitly instructed to avoid using generative AI in formulating instructions or responses. Examples of each behavior were provided to motivate the types of questions and instructions appropriate to each category.

To use this dataset, you first need to download it from the Hugging Face dataset hub:

```bash
mkdir -p ../../../examples/data/dolly && cd ../../../examples/data/dolly && wget https://huggingface.co/datasets/databricks/databricks-dolly-15k/resolve/main/databricks-dolly-15k.jsonl\?download\=true -O databricks-dolly-15k.jsonl
```
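
After the download, a quick sanity check of the file can catch a truncated or failed transfer before any tokenization happens. The sketch below uses only the Python standard library; `summarize_jsonl` is a hypothetical helper written for this tutorial, not part of FATE-LLM, and the path assumes the notebook runs from the same working directory as the command above.

```python
import json
import os
from collections import Counter


def summarize_jsonl(path):
    """Count records and instruction categories in a JSON Lines file."""
    categories = Counter()
    num_records = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)  # raises if a line is not valid JSON
            num_records += 1
            categories[record.get("category", "<missing>")] += 1
    return num_records, categories


# Assumed location, matching the download command above.
path = "../../../examples/data/dolly/databricks-dolly-15k.jsonl"
if os.path.exists(path):
    num_records, categories = summarize_jsonl(path)
    print(num_records, "records")
    for name, count in categories.most_common():
        print(f"{name}: {count}")
```

The record count printed here should match the `num_rows` value reported when the dataset is loaded.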

### Check Dataset

In [2]:
from fate_llm.dataset.hf_dataset import Dolly15K
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name_or_path)
special_tokens = tokenizer.special_tokens_map
if "pad_token" not in tokenizer.special_tokens_map:
    special_tokens["pad_token"] = special_tokens["eos_token"]

tokenizer.pad_token = tokenizer.eos_token
ds = Dolly15K(split="train", tokenizer_params={"pretrained_model_name_or_path": model_name_or_path, **special_tokens},
              tokenizer_apply_params=dict(truncation=True, max_length=tokenizer.model_max_length, padding="max_length", return_tensors="pt"))
ds = ds.load('../../../examples/data/dolly')

In [3]:
ds

Dataset({
    features: ['instruction', 'context', 'response', 'category', 'text', 'input_ids', 'attention_mask'],
    num_rows: 15011
})

For more details on FATE-LLM dataset settings, we recommend reading these tutorials first: [NN Dataset Customization](https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/pipeline/nn_tutorial/Homo-NN-Customize-your-Dataset.ipynb) and [Some Built-In Dataset](https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/pipeline/nn_tutorial/Introduce-Built-In-Dataset.ipynb).

### Check local training

Before submitting a federated learning task, we demonstrate how to run a local test to make sure that your custom dataset and model work properly.

In [16]:
from transformers import AutoModelForCausalLM, TrainingArguments, DataCollatorForLanguageModeling
from fate_llm.algo.fedkseed.trainer import KSeedZOExtendedTrainer, KSeedTrainingArguments
from fate_llm.algo.fedkseed.zo_utils import build_seed_candidates, get_even_seed_probabilities

def test_training(zo_mode=True):
    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name_or_path, **special_tokens)
    # Causal LM collator: with mlm=False the labels are built from input_ids.
    data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
    model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=model_name_or_path)

    # A very short run (1% of one epoch) that only checks the training loop end to end.
    training_args = TrainingArguments(output_dir='./',
                                      dataloader_num_workers=1,
                                      dataloader_prefetch_factor=1,
                                      remove_unused_columns=True,
                                      learning_rate=1e-5,
                                      per_device_train_batch_size=1,
                                      num_train_epochs=0.01,
                                      )
    # zo_optim=True selects the zeroth-order optimizer; False uses regular gradient-based training.
    kseed_args = KSeedTrainingArguments(zo_optim=zo_mode)
    trainer = KSeedZOExtendedTrainer(model=model, train_dataset=ds, training_args=training_args, kseed_args=kseed_args,
                                     tokenizer=tokenizer, data_collator=data_collator)
    if zo_mode:
        seed_candidates = build_seed_candidates(k=kseed_args.k)
        seed_probabilities = get_even_seed_probabilities(k=kseed_args.k)
        trainer.configure_seed_candidates(seed_candidates, seed_probabilities)
    return trainer.train()
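
For intuition about the seed pool configured in `test_training`, the sketch below shows what a pool of `k` candidate seeds with even sampling probabilities could look like. The functions `sketch_seed_candidates`, `sketch_even_probabilities`, and `sample_seed` are hypothetical stand-ins written for this tutorial; they are not the `fate_llm` helpers, and the seed range is an assumption.

```python
import random


def sketch_seed_candidates(k, low=1, high=10**11, rng=None):
    """Draw k integer seeds; the range is an assumption for illustration."""
    rng = rng or random.Random()
    return [rng.randrange(low, high) for _ in range(k)]


def sketch_even_probabilities(k):
    """Every candidate seed is equally likely to be picked."""
    return [1.0 / k] * k


def sample_seed(candidates, probabilities, rng):
    """Pick the seed whose direction is used for one optimization step."""
    return rng.choices(candidates, weights=probabilities, k=1)[0]


rng = random.Random(0)
candidates = sketch_seed_candidates(8, rng=rng)
probabilities = sketch_even_probabilities(8)
picked = [sample_seed(candidates, probabilities, rng) for _ in range(5)]
print(candidates)
print(picked)
```

Restricting every step to a fixed pool means that the training history can be summarized per candidate seed instead of per step.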

In [17]:
test_training(zo_mode=True)

TrainOutput(global_step=151, training_loss=1.2660519429390005, metrics={'train_runtime': 61.8249, 'train_samples_per_second': 2.428, 'train_steps_per_second': 2.442, 'total_flos': 78910193664000.0, 'train_loss': 1.2660519429390005, 'epoch': 0.01})

In [18]:
test_training(zo_mode=False)

TrainOutput(global_step=151, training_loss=0.6093456950408733, metrics={'train_runtime': 92.6158, 'train_samples_per_second': 1.621, 'train_steps_per_second': 1.63, 'total_flos': 78910193664000.0, 'train_loss': 0.6093456950408733, 'epoch': 0.01})

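After the same 151 steps, the zeroth-order optimizer reaches a training loss of about 1.27, while AdamW reaches about 0.61. The zeroth-order optimizer converges much more slowly than AdamW; that is the price we pay for the low communication cost.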
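
To put the communication saving in perspective, here is a back-of-the-envelope comparison. It rests on several assumptions that are not taken from the FATE-LLM code: one float32 scalar is exchanged per candidate seed, the pool size is `k = 4096`, and gpt2 has roughly 124 million parameters stored as float32.

```python
BYTES_PER_FLOAT32 = 4


def full_model_payload_bytes(num_params):
    """Payload when every parameter is sent as float32."""
    return num_params * BYTES_PER_FLOAT32


def kseed_payload_bytes(k):
    """Payload when one float32 scalar is sent per candidate seed (assumption)."""
    return k * BYTES_PER_FLOAT32


gpt2_params = 124_000_000  # approximate parameter count of gpt2
k = 4096                   # assumed pool size, for illustration only

full = full_model_payload_bytes(gpt2_params)
kseed = kseed_payload_bytes(k)
print(f"full model: {full / 1e6:.0f} MB per exchange")
print(f"k scalars:  {kseed / 1024:.0f} KiB per exchange")
print(f"reduction:  about {full // kseed}x")
```

Under these assumptions the scalar payload stays below the 18 kilobytes named in the paper title, and it does not grow with the size of the model.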

## Submit Federated Task
Once you have successfully completed local testing, you can submit a task to FATE. **Please note that this tutorial runs on a standalone deployment; if you are using a cluster deployment, you need to bind the data with the corresponding name and namespace on each machine.**

In this example, we load the pretrained weights of the gpt2 model.

In [None]:
import time
from fate_client.pipeline.components.fate.reader import Reader
from fate_client.pipeline import FateFlowPipeline
from fate_client.pipeline.components.fate.homo_nn import HomoNN, get_config_of_seq2seq_runner
from fate_client.pipeline.components.fate.nn.algo_params import TrainingArguments, FedAVGArguments
from fate_client.pipeline.components.fate.nn.loader import LLMModelLoader, LLMDatasetLoader, LLMDataFuncLoader

guest = '10000'
host = '10000'
arbiter = '10000'

epochs = 0.01
batch_size = 1
lr = 1e-5

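# Register the host as well, since reader_0 below is configured for both guest and host.
pipeline = FateFlowPipeline().set_parties(guest=guest, host=host, arbiter=arbiter)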
pipeline.bind_local_path(path="/data/projects/fate/examples/data/dolly", namespace="experiment",
                         name="dolly")
time.sleep(5)

reader_0 = Reader("reader_0", runtime_parties=dict(guest=guest, host=host))
reader_0.guest.task_parameters(
    namespace="experiment",
    name="dolly"
)
reader_0.hosts[0].task_parameters(
    namespace="experiment",
    name="dolly"
)

tokenizer_params = dict(
    pretrained_model_name_or_path="gpt2",
    trust_remote_code=True,
)
conf = get_config_of_seq2seq_runner(
    algo='fedkseed',
    model=LLMModelLoader(
        "hf_model",
        "HFAutoModelForCausalLM",
        # pretrained_model_name_or_path="datajuicer/LLaMA-1B-dj-refine-150B",
        pretrained_model_name_or_path="gpt2",
        trust_remote_code=True
    ),
    dataset=LLMDatasetLoader(
        "hf_dataset",
        "Dolly15K",
        split="train",
        tokenizer_params=tokenizer_params,
        tokenizer_apply_params=dict(
            truncation=True,
            max_length=1024,
        )),
    data_collator=LLMDataFuncLoader(
        "cust_func.cust_data_collator",
        "get_seq2seq_tokenizer",
        tokenizer_params=tokenizer_params,
    ),
    training_args=TrainingArguments(
        num_train_epochs=epochs,
        per_device_train_batch_size=batch_size,
        remove_unused_columns=True,
        learning_rate=lr,
        fp16=False,
        use_cpu=False,
        disable_tqdm=False,
        use_mps_device=True,
    ),
    fed_args=FedAVGArguments(),
    task_type='causal_lm',
    save_trainable_weights_only=True,
)

conf["fed_args_conf"] = {}

homo_nn_0 = HomoNN(
    'nn_0',
    runner_conf=conf,
    train_data=reader_0.outputs["output_data"],
    runner_module="fedkseed_runner",
    runner_class="FedKSeedRunner",
)

pipeline.add_tasks([reader_0, homo_nn_0])
pipeline.conf.set("task", dict(engine_run={"cores": 1}))

pipeline.compile()
pipeline.fit()

You can use this script to submit the task. Training takes a long time and produces a long log, so we do not run it here.