# InferDPT Tutorial

## Introduction to InferDPT

InferDPT is an algorithm framework for efficient, privacy-preserving text generation with large language models (LLMs). The framework addresses privacy concerns related to data leakage and unauthorized information collection in LLMs. InferDPT applies differential privacy mechanisms to protect sensitive information during inference with black-box LLMs.

InferDPT comprises two key modules: the perturbation module and the extraction module. The perturbation module uses a differentially private (DP) mechanism to generate a perturbed prompt from the raw document, which enables privacy-preserving inference with black-box LLMs. The extraction module, inspired by knowledge distillation and retrieval-augmented generation, processes the perturbed text to produce coherent and consistent output. As a result, the text generation quality of InferDPT is comparable to that of non-private LLMs, maintaining high utility while providing strong privacy guarantees.

To further enhance privacy protection, InferDPT integrates a mechanism called RANTEXT. RANTEXT introduces the concept of a random adjacency list for token-level perturbation, which addresses the vulnerability of existing differentially private mechanisms to embedding inversion attacks.

For more details on InferDPT, please refer to the [original paper](https://arxiv.org/pdf/2310.12214.pdf).

## Use InferDPT

In this section, we will guide you through the process of:
- Setting up the InferDPT toolkit with an existing language model.
- Creating a model inference tool using the built-in class.
- Walking through an inference instance step by step: using InferDPT to generate rationale responses for question-answering tasks.

### Create InferDPT Kit

Following the original paper, InferDPT implements differential privacy by randomly substituting tokens in the original text with semantically similar ones. This requires precomputing the similarities between a subset of tokens from the vocabulary of the remote LLM. In this tutorial, we use Mistral-7B as the remote LLM and Qwen1.5-0.5B as the local decoding model. For computational efficiency, we select a subset of 11,400 tokens from the Mistral-7B vocabulary for the similarity calculations, then use the built-in function to build the InferDPT kit.
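
To make the idea of similarity-based substitution concrete, the sketch below samples a replacement token with the exponential mechanism over a toy embedding table. It is an illustrative simplification, not the RANTEXT mechanism and not the `InferDPTKit` implementation; the function names, the four-word vocabulary, and the two-dimensional vectors are all made up for this example.

```python
import math
import random

# Toy 2-D "embeddings" (hypothetical values); real token embeddings are far larger.
TOY_EMBEDDINGS = {
    "river": (0.9, 0.1),
    "stream": (0.8, 0.2),
    "ocean": (0.7, 0.4),
    "tree": (0.1, 0.9),
}


def replacement_probabilities(token, embeddings, epsilon):
    """Exponential mechanism: P(candidate) is proportional to exp(-eps * d / (2 * d_max))."""
    # d_max normalises distances to [0, 1], so the utility has sensitivity at most 1.
    d_max = max(math.dist(a, b) for a in embeddings.values() for b in embeddings.values())
    weights = {
        cand: math.exp(-epsilon * math.dist(embeddings[token], vec) / (2 * d_max))
        for cand, vec in embeddings.items()
    }
    total = sum(weights.values())
    return {cand: w / total for cand, w in weights.items()}


def perturb_token(token, embeddings, epsilon, rng):
    """Draw one replacement token according to the probabilities above."""
    probs = replacement_probabilities(token, embeddings, epsilon)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


rng = random.Random(0)
for eps in (1.0, 10.0):
    probs = replacement_probabilities("river", TOY_EMBEDDINGS, eps)
    print(eps, {tok: round(p, 3) for tok, p in probs.items()})
print(perturb_token("river", TOY_EMBEDDINGS, epsilon=3.0, rng=rng))
```

A smaller ε flattens the distribution, so substitutions become more random; a larger ε concentrates probability on the original token and its nearest neighbours.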

First, we load the Mistral model to obtain its vocabulary and embedding weights:

In [1]:
# load embeddings from mistral model
import numpy as np
from transformers import AutoModelForCausalLM, AutoTokenizer
model_path = '/data/cephfs/llm/models/Mistral-7B-Instruct-v0.2/'
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
embeddings = tokenizer.get_vocab()  # token -> index mapping; the embedding weights are extracted in the next cell

In [13]:
# Get the embedding layer weights
dtype = np.float32
embedding_weights = model.get_input_embeddings().weight
# Convert the embedding layer weights to numpy
embedding_weights_np = embedding_weights.detach().numpy().astype(dtype)

Then we select English tokens from the vocabulary. This gives us an embedding matrix and a corresponding token list.

In [26]:
import tqdm
import re

def contains_english_chars(string):
    pattern = r'[a-zA-Z]'
    match = re.search(pattern, string)
    return bool(match)

def contains_non_english_chars(string):
    pattern = r'[^a-zA-Z]'
    match = re.search(pattern, string)
    return bool(match)

def filter_tokens(token2index):
    filtered_index2token = {}
    for key, idx in tqdm.tqdm(token2index.items()):
        if key.startswith('<'):
            continue
        if not key.startswith('▁'):
            continue
        val_ = key.replace("▁", "")
        if val_ == val_.upper():
            continue
        if contains_non_english_chars(val_):
            continue
        if 3 < len(val_) < 16 and contains_english_chars(val_):
            filtered_index2token[idx] = key

    return filtered_index2token

filtered_index2token = filter_tokens(embeddings)
used_num_tokens = len(filtered_index2token)
print(used_num_tokens)
# map each selected token to its embedding vector
token_2_embedding = {}
for idx, token in filtered_index2token.items():
    token_2_embedding[token] = embedding_weights_np[idx].tolist()
token_list = list(token_2_embedding.keys())
embedding_matrix = np.array(list(token_2_embedding.values()), dtype=dtype)

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32000/32000 [00:00<00:00, 663000.04it/s]


11400
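
To see which tokens survive the filter, the sketch below applies the same rules to a tiny hand-made vocabulary. `keep_token` is a condensed, hypothetical restatement of `filter_tokens` above (same checks, folded into one predicate), and the toy vocabulary and its indices are invented for illustration.

```python
import re


def keep_token(token):
    """Return True if the token passes the same checks as filter_tokens."""
    # Skip special tokens and tokens that do not start a new word.
    if token.startswith('<') or not token.startswith('▁'):
        return False
    word = token.replace('▁', '')
    # Skip all-uppercase words (this also drops digit-only and empty strings).
    if word == word.upper():
        return False
    # Skip anything containing a character outside a-z / A-Z.
    if re.search(r'[^a-zA-Z]', word):
        return False
    # Keep only words with 4 to 15 letters.
    return 3 < len(word) < 16


toy_vocab = {
    '<s>': 1, '▁river': 10, 'river': 11, '▁USA': 12,
    '▁the': 13, "▁don't": 14, '▁ocean': 15, '▁2024': 16,
}
kept = {idx: tok for tok, idx in toy_vocab.items() if keep_token(tok)}
print(kept)
```

Note that short function words such as `▁the` are dropped by the length check, so they are never candidates for substitution in this setup.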


In [27]:
print('we got the embedding matrix:')
print(embedding_matrix)

we got the embedding matrix:
[[-6.1035156e-04 -4.5471191e-03 -5.2795410e-03 ... -1.3656616e-03
   4.2419434e-03 -8.1634521e-04]
 [ 4.8522949e-03  5.9814453e-03  1.1596680e-03 ... -2.6702881e-03
  -1.7471313e-03  9.9182129e-04]
 [-2.7465820e-03  4.3029785e-03  3.3874512e-03 ... -2.6092529e-03
  -1.2397766e-05 -3.4027100e-03]
 ...
 [-6.1340332e-03 -5.3405762e-03 -1.0910034e-03 ... -9.3841553e-04
  -7.4005127e-04 -7.3852539e-03]
 [-4.5166016e-03  8.2015991e-04  4.8217773e-03 ... -1.1978149e-03
  -1.0528564e-03 -2.1362305e-03]
 [ 1.2054443e-03  1.9836426e-03 -2.8419495e-04 ... -1.5792847e-03
  -2.8381348e-03 -7.1716309e-04]]


We can easily prepare the pre-computed data needed for InferDPT by using the built-in function of the InferDPTKit class:

In [28]:
from fate_llm.algo.inferdpt.utils import InferDPTKit
param = InferDPTKit.make_inferdpt_kit_param(embedding_matrix, token_list)

11400it [00:37, 300.99it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4096/4096 [00:03<00:00, 1147.93it/s]


Great, the computation is complete! Now, let’s perturb a sentence using InferDPT with ε (epsilon) set to 3.0. We will also save the kit to a designated folder for later use.

In [33]:
inferdpt_kit = InferDPTKit(*param, tokenizer)

In [96]:
inferdpt_kit.perturb('From the river to the ocean', epsilon=3.0)

'into the tree to the woods'

In [97]:
save_kit_path = 'your path'
inferdpt_kit.save_to_path(save_kit_path)

### Go through InferDPT Step by Step

Next, we will walk through InferDPT step by step, simulating the interaction between the client and server locally. Before we begin, a note on model inference: fate-llm's inferdpt module provides three types of model inference classes: vLLM, vLLM server, and Hugging Face native. You can explore these classes in the [code files](../../../python/fate_llm/algo/inferdpt/inference/) or develop your own inference tool based on your needs. We highly recommend using the vLLM server. In this example, we use the following two commands to launch two model services: the server’s LLM and the small local decoding model.

For this example, we have executed the process on a machine equipped with four V100-32G GPUs. We advise you to adjust the model path and GPU settings as necessary to accommodate the specifications of your own machine.

Start the vLLM servers using the commands below:

In [None]:
! python -m vllm.entrypoints.openai.api_server --host 127.0.0.1 --port 8888 --model ./Mistral-7B-Instruct-v0.2  --dtype=half --enforce-eager --tensor-parallel-size 4 --gpu-memory-utilization 0.6

In [None]:
! python -m vllm.entrypoints.openai.api_server --host 127.0.0.1 --port 8887 --model ./Qwen1.5-0.5B  --dtype=half --enforce-eager --tensor-parallel-size 4 --gpu-memory-utilization 0.2

Next, we initialize the inference instances for the InferDPT client and server. This requires the IP address, port, and model name of each service that has been started.

In [130]:
from fate_llm.algo.inferdpt.inference.api import APICompletionInference
# for client
inference_client = APICompletionInference(api_url="http://127.0.0.1:8887/v1", model_name='./Qwen1.5-0.5B', api_key='EMPTY')
# for server
inference_server = APICompletionInference(api_url="http://127.0.0.1:8888/v1", model_name='./Mistral-7B-Instruct-v0.2', api_key='EMPTY')

In [135]:
ret = inference_client.inference(['Hello how are you?'], inference_kwargs={
    'stop': ['<|im_end|>', '\n'],
    'temperature': 0.01,
    'max_tokens': 16
})
print(ret[0])

 I am a new user of this forum. I am a 20 year


In [138]:
ret = inference_server.inference(['<s>[INST]Who are u?[/INST]'], inference_kwargs={
    'stop': ['</s>'],
    'temperature': 0.01,
    'max_tokens': 128
})
print(ret[0])

 I am an artificial intelligence designed to assist with information and answer questions to the best of my ability. I don't have the ability to have a personal identity or emotions. I'm here to help you with any inquiries you may have. How can I assist you today?
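
For orientation, the sketch below builds (but does not send) the kind of HTTP request that an OpenAI-compatible completion endpoint, such as the one served by vLLM, accepts. It assumes the `inference_kwargs` shown above map directly onto the request fields `stop`, `temperature`, and `max_tokens`; how `APICompletionInference` actually issues its requests is not shown in this tutorial, and `build_completion_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_completion_request(api_url, model_name, prompt, api_key="EMPTY", **inference_kwargs):
    """Build a POST request for an OpenAI-style /completions endpoint (not sent here)."""
    payload = {"model": model_name, "prompt": prompt, **inference_kwargs}
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_completion_request(
    "http://127.0.0.1:8887/v1",
    "./Qwen1.5-0.5B",
    "Hello how are you?",
    stop=["<|im_end|>", "\n"],
    temperature=0.01,
    max_tokens=16,
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it to the running service; that step is left out so the snippet works without a server.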


In this tutorial, we use a question-answering (QA) task as our example. The following sample is taken from the ARC-E dataset:

In [100]:
test_example = {'id': 'Mercury_7220990',
'question': 'Which factor will most likely cause a person to develop a fever?',
'choices': {'text': ['a leg muscle relaxing after exercise',
'a bacterial population in the bloodstream',
'several viral particles on the skin',
'carbohydrates being digested in the stomach'],
'label': ['A', 'B', 'C', 'D']},
'answerKey': 'B'}

Before running the inference, it's important to understand the sequence of steps involved. We use the Jinja2 template engine to structure the prompts as follows:

1. **Document Template Organization**: The initial step is to organize the document dictionary using the DOC TEMPLATE. This template will provide the structure for the input document.

2. **Differential Privacy Perturbation**: Apply Differential Privacy (DP) to perturb the structured document string. This will result in a perturbed document. The perturbed document is then added to the original document under the key 'perturbed_doc'. Note that you can modify this key according to your parameter settings.

3. **Instruction Addition**: Use the INSTRUCTION TEMPLATE to add instructions (or few-shot examples) to the perturbed document. This modified document is then sent to the server side for processing. The server's response is captured, and this perturbed response is appended to the original document under the key 'perturbed_response'. As before, this key can be adjusted as needed.

4. **Decode Template Formatting**: Finally, employ the decode template to format the decode prompt. The resulting inference is then added to the original dictionary under the key 'inferdpt_result'. This key, like the others, can be customized to fit your specific parameters.

By following these steps, the InferDPT framework enables a structured and privacy-preserving inference process, leading to a final output that incorporates the perturbed data and the model's response.
For more details, refer to the source code.
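
The four steps above can be sketched with plain Jinja2 and stand-in functions. This is a minimal illustration of the data flow only: `fake_perturb`, `fake_remote_llm`, and `fake_local_llm` are hypothetical placeholders for the DP perturbation, the server-side LLM, and the local decoding model, and the shortened templates are not the ones used later in this tutorial.

```python
from jinja2 import Template

doc_template = "{{question}}\nChoices:{{choices.text}}"
instruction_template = "Explain briefly.\nQuestion:{{perturbed_doc}}\nRationale:"
decode_template = (
    "Question:{{perturbed_doc}}\n"
    "Rationale:{{perturbed_response | replace('\\n', '')}}<end>\n"
    "Question:{{question}}\nChoices:{{choices.text}}\nRationale:"
)


def fake_perturb(text):
    # Placeholder for the differentially private token substitution.
    return text.replace("fever", "cough")


def fake_remote_llm(prompt):
    # Placeholder for the remote LLM; it only ever sees the perturbed prompt.
    return "Bacteria trigger\nan immune response."


def fake_local_llm(prompt):
    # Placeholder for the local decoding model.
    return "Therefore, the answer is 'a bacterial population in the bloodstream'."


def run_steps(example):
    doc = dict(example)
    plain_doc = Template(doc_template).render(**doc)            # step 1
    doc["perturbed_doc"] = fake_perturb(plain_doc)              # step 2
    instruction = Template(instruction_template).render(**doc)  # step 3
    doc["perturbed_response"] = fake_remote_llm(instruction)
    decode_prompt = Template(decode_template).render(**doc)     # step 4
    doc["inferdpt_result"] = fake_local_llm(decode_prompt)
    return doc, instruction, decode_prompt


example = {
    "question": "Which factor will most likely cause a person to develop a fever?",
    "choices": {"text": ["a leg muscle relaxing after exercise",
                         "a bacterial population in the bloodstream"]},
}
doc, instruction, decode_prompt = run_steps(example)
print(instruction)
print(decode_prompt)
```

Only `instruction` would leave the client; the original question appears solely in `decode_prompt`, which is consumed by the local model.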


The templates for this example are defined on the client side. Below are the Jinja templates we use:

In [141]:
doc_template = """{{question}} 
Choices:{{choices.text}}
"""

instruction_template="""
<s>[INST]
Select Answer from Choices and explain it in "Rationale" with few words. Please refer to the example to write the rationale.
Use <end> to finish your rationale.

Example(s):
Question:George wants to warm his hands quickly by rubbing them. Which skin surface will produce the most heat?
Choices:['dry palms', 'wet palms', 'palms covered with oil', 'palms covered with lotion']
Rationale:Friction between two surfaces generates heat due to the conversion of kinetic energy into thermal energy. Dry palms produce the most heat when rubbed together as they create higher friction compared to wet or lubricated palms, which reduce friction.  Therefore, the answer is 'dry palms'.<end>

Please explain:
Question:{{perturbed_doc}}
Rationale:
[/INST]
"""

decode_template = """Select Answer from Choices and explain it in "Rationale" with few words. Please refer to the example to write the rationale.
Use <end> to finish your rationale.

Example(s):
Question:George wants to warm his hands quickly by rubbing them. Which skin surface will produce the most heat?
Choices:['dry palms', 'wet palms', 'palms covered with oil', 'palms covered with lotion']
Rationale:Friction between two surfaces generates heat due to the conversion of kinetic energy into thermal energy. Dry palms produce the most heat when rubbed together as they create higher friction compared to wet or lubricated palms, which reduce friction.  Therefore, the answer is 'dry palms'.<end>

Question:{{perturbed_doc}}
Rationale:{{perturbed_response | replace('\n', '')}}<end>

Please explain:
Question:{{question}} 
Choices:{{choices.text}}
Rationale:
"""

Please be aware that we have included a one-shot example in the prompt to ensure that the Large Language Model (LLM) responds as anticipated.

Now we create two scripts:
- inferdpt_client.py
- inferdpt_server.py

and run the code provided below:

#### Client Side: inferdpt_client.py

In [None]:
from fate_llm.algo.inferdpt.inference.api import APICompletionInference
from fate_llm.algo.inferdpt import inferdpt
from fate_llm.algo.inferdpt.utils import InferDPTKit
from fate_llm.algo.inferdpt.inferdpt import InferDPTClient, InferDPTServer
from jinja2 import Template
from fate.arch import Context
import sys


arbiter = ("arbiter", 10000)
guest = ("guest", 10000)
host = ("host", 9999)
name = "fed1"


def create_ctx(local):
    from fate.arch import Context
    from fate.arch.computing.backends.standalone import CSession
    from fate.arch.federation.backends.standalone import StandaloneFederation
    import logging

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.INFO)

    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    console_handler.setFormatter(formatter)

    logger.addHandler(console_handler)
    computing = CSession(data_dir="./session_dir")
    return Context(computing=computing, federation=StandaloneFederation(computing, name, local, [guest, host, arbiter]))


ctx = create_ctx(guest)
save_kit_path = 'your path'
kit = InferDPTKit.load_from_path(save_kit_path)
inference = APICompletionInference(api_url="http://127.0.0.1:8887/v1", model_name='./Qwen1.5-0.5B', api_key='EMPTY')

test_example = {'id': 'Mercury_7220990',
'question': 'Which factor will most likely cause a person to develop a fever?',
'choices': {'text': ['a leg muscle relaxing after exercise',
'a bacterial population in the bloodstream',
'several viral particles on the skin',
'carbohydrates being digested in the stomach'],
'label': ['A', 'B', 'C', 'D']},
'answerKey': 'B'}


doc_template = """{{question}} 
Choices:{{choices.text}}
"""

instruction_template="""
<s>[INST]
Select Answer from Choices and explain it in "Rationale" with few words. Please refer to the example to write the rationale.
Use <end> to finish your rationale.

Example(s):
Question:George wants to warm his hands quickly by rubbing them. Which skin surface will produce the most heat?
Choices:['dry palms', 'wet palms', 'palms covered with oil', 'palms covered with lotion']
Rationale:Friction between two surfaces generates heat due to the conversion of kinetic energy into thermal energy. Dry palms produce the most heat when rubbed together as they create higher friction compared to wet or lubricated palms, which reduce friction.  Therefore, the answer is 'dry palms'.<end>

Please explain:
Question:{{perturbed_doc}}
Rationale:
[/INST]
"""

decode_template = """Select Answer from Choices and explain it in "Rationale" with few words. Please refer to the example to write the rationale.
Use <end> to finish your rationale.

Example(s):
Question:George wants to warm his hands quickly by rubbing them. Which skin surface will produce the most heat?
Choices:['dry palms', 'wet palms', 'palms covered with oil', 'palms covered with lotion']
Rationale:Friction between two surfaces generates heat due to the conversion of kinetic energy into thermal energy. Dry palms produce the most heat when rubbed together as they create higher friction compared to wet or lubricated palms, which reduce friction.  Therefore, the answer is 'dry palms'.<end>

Question:{{perturbed_doc}}
Rationale:{{perturbed_response | replace('\n', '')}}<end>

Please explain:
Question:{{question}} 
Choices:{{choices.text}}
Rationale:
"""

inferdpt_client = inferdpt.InferDPTClient(ctx, kit, inference, epsilon=3.0)
result = inferdpt_client.inference([test_example], doc_template, instruction_template, decode_template, \
                                 remote_inference_kwargs={
                                    'stop': ['</s>'],
                                    'temperature': 0.01,
                                    'max_tokens': 256
                                 },
                                 local_inference_kwargs={
                                    'stop': ['<|im_end|>', '<end>', '<end>\n', '<end>\n\n', '.\n\n\n\n\n', '<|end_of_text|>', '>\n\n\n'],
                                    'temperature': 0.01,
                                    'max_tokens': 256
                                 })
print('result is {}'.format(result[0]['inferdpt_result']))

#### Server Side: inferdpt_server.py

In [None]:
from fate_llm.algo.inferdpt.utils import InferDPTKit
from fate_llm.algo.inferdpt.inferdpt import InferDPTClient, InferDPTServer
from jinja2 import Template
from fate.arch import Context
import sys
from fate_llm.algo.inferdpt.inference.api import APICompletionInference


arbiter = ("arbiter", 10000)
guest = ("guest", 10000)
host = ("host", 9999)
name = "fed1"


def create_ctx(local):
    from fate.arch import Context
    from fate.arch.computing.backends.standalone import CSession
    from fate.arch.federation.backends.standalone import StandaloneFederation
    import logging

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.INFO)

    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    console_handler.setFormatter(formatter)

    logger.addHandler(console_handler)
    computing = CSession(data_dir="./session_dir")
    return Context(computing=computing, federation=StandaloneFederation(computing, name, local, [guest, host, arbiter]))


ctx = create_ctx(arbiter)
inference_server = APICompletionInference(api_url="http://127.0.0.1:8888/v1", model_name='./Mistral-7B-Instruct-v0.2', api_key='EMPTY')
inferdpt_server = InferDPTServer(ctx, inference_server)
inferdpt_server.inference()

Open two terminals and launch the client and server scripts at the same time.
On the client side, we get the answer:

```
The given question asks which factor will most likely cause a person to develop a fever. The factors mentioned are a leg muscle relaxing after exercise, a bacterial population in the bloodstream, several viral particles on the skin, and carbohydrates being digested in the stomach. The question is asking which factor is most likely to cause a person to develop a fever. The factors are all related to the body's internal environment, but the most likely factor is a bacterial population in the bloodstream. This is because bacteria can cause a fever, and the body's immune system responds to the infection by producing antibodies that can fight off the bacteria. Therefore, the answer is 'a bacterial population in the bloodstream'
```

## Use InferDPT in FATE Pipeline

We can leverage the FATE pipeline to submit inference tasks for industrial applications. In pipeline mode, to avoid leaking sensitive settings such as API keys or server paths, it is crucial to create initialization scripts that set up the InferDPT client and server instances. Alternatively, you can modify the provided scripts in the fate_llm/algo/inferdpt/init folder.

Below, we provide an overview of the default_init.py script, which serves as an example of how to create an [initialization class](../../../python/fate_llm/algo/inferdpt/init/default_init.py). By customizing the static variables within this class, you can configure the client and server to interact with the Large Language Model (LLM) interfaces as intended.

In [None]:
from fate_llm.algo.inferdpt.init._init import InferDPTInit
from fate_llm.algo.inferdpt.inference.api import APICompletionInference
from fate_llm.algo.inferdpt import inferdpt
from fate_llm.algo.inferdpt.utils import InferDPTKit


class InferDPTAPIClientInit(InferDPTInit):

    api_url = ''
    api_model_name = ''
    api_key = 'EMPTY'
    inferdpt_kit_path = ''
    eps = 3.0

    def __init__(self, ctx):
        super().__init__(ctx)
        self.ctx = ctx

    def get_inferdpt_inst(self):
        inference = APICompletionInference(api_url=self.api_url, model_name=self.api_model_name, api_key=self.api_key)
        kit = InferDPTKit.load_from_path(self.inferdpt_kit_path)
        inferdpt_client = inferdpt.InferDPTClient(self.ctx, kit, inference, epsilon=self.eps)
        return inferdpt_client


class InferDPTAPIServerInit(InferDPTInit):

    api_url = ''
    api_model_name = ''
    api_key = 'EMPTY'

    def __init__(self, ctx):
        super().__init__(ctx)
        self.ctx = ctx

    def get_inferdpt_inst(self):
        inference = APICompletionInference(api_url=self.api_url, model_name=self.api_model_name, api_key=self.api_key)
        inferdpt_server = inferdpt.InferDPTServer(self.ctx,inference_inst=inference)
        return inferdpt_server


In the pipeline example, we use the arc_easy dataset together with our built-in HuggingfaceDataset class. Only HuggingfaceDataset is supported in pipeline mode:

In [1]:
from fate_llm.dataset.hf_dataset import HuggingfaceDataset

In [3]:
from datasets import load_dataset
dataset = load_dataset('arc_easy')
dataset.save_to_disk('your_path/arc_easy')

In [6]:
ds = HuggingfaceDataset(load_from_disk=True, data_split_key='train')
ds.load('your_path/arc_easy')

In [7]:
print(ds[0])

{'id': 'Mercury_7220990', 'question': 'Which factor will most likely cause a person to develop a fever?', 'choices': {'text': ['a leg muscle relaxing after exercise', 'a bacterial population in the bloodstream', 'several viral particles on the skin', 'carbohydrates being digested in the stomach'], 'label': ['A', 'B', 'C', 'D']}, 'answerKey': 'B'}


After that, we can associate the dataset path with a name and namespace. By specifying the dataset configuration, the HuggingfaceDataset will be initialized and the dataset will be loaded from the specified path. 
```
flow table bind --namespace experiment --name arc_e --path 'your_path/arc_easy'
```
Once these initialization scripts are in place, you can submit a pipeline task by specifying the initialization class in the configuration files. For more information, refer to the script provided below:

In [None]:
import argparse
from fate_client.pipeline.utils import test_utils
from fate_client.pipeline.components.fate.evaluation import Evaluation
from fate_client.pipeline.components.fate.reader import Reader
from fate_client.pipeline import FateFlowPipeline
from fate_client.pipeline.components.fate.nn.torch import nn, optim
from fate_client.pipeline.components.fate.nn.torch.base import Sequential
from fate_client.pipeline.components.fate.homo_nn import HomoNN, get_config_of_default_runner
from fate_client.pipeline.components.fate.nn.algo_params import TrainingArguments, FedAVGArguments


def main(config="../../config.yaml", namespace=""):
    # obtain config
    if isinstance(config, str):
        config = test_utils.load_job_config(config)
    parties = config.parties
    guest = parties.guest[0]
    arbiter = parties.arbiter[0]

    pipeline = FateFlowPipeline().set_parties(guest=guest, arbiter=arbiter)

    reader_0 = Reader("reader_0", runtime_parties=dict(guest=guest))
    reader_0.guest.task_parameters(
        namespace=f"experiment{namespace}",
        name="arc_e"
    )

    inferdpt_init_conf_client = {
        'module_name': 'fate_llm.algo.inferdpt.init.default_init',
        'item_name': 'InferDPTAPIClientInit'
    }

    dataset_conf = {
        'module_name': 'fate_llm.dataset.hf_dataset',
        'item_name': 'HuggingfaceDataset',
        'kwargs':{
            'load_from_disk': True,
            'data_split_key': 'train'
        }
    }

    doc_template = """{{question}} 
    Choices:{{choices.text}}
    """

    instruction_template="""
    <|im_start|>system
    You are a helpful assistant.<|im_end|>
    <|im_start|>user
    Select Answer from Choices and explain it in "Rationale" with few words. Please refer to the example to write the rationale.
    Use <end> to finish your rationale.

    Example(s):
    Question:George wants to warm his hands quickly by rubbing them. Which skin surface will produce the most heat?
    Choices:['dry palms', 'wet palms', 'palms covered with oil', 'palms covered with lotion']
    Rationale:Friction between two surfaces generates heat due to the conversion of kinetic energy into thermal energy. Dry palms produce the most heat when rubbed together as they create higher friction compared to wet or lubricated palms, which reduce friction.  Therefore, the answer is 'dry palms'.<end>

    Please explain:
    Question:{{perturbed_doc}}
    Rationale:
    <|im_end|>
    <|im_start|>assistant
    """

    decode_template = """Select Answer from Choices and explain it in "Rationale" with few words. Please refer to the example to write the rationale.
    Use <end> to finish your rationale.

    Example(s):
    Question:George wants to warm his hands quickly by rubbing them. Which skin surface will produce the most heat?
    Choices:['dry palms', 'wet palms', 'palms covered with oil', 'palms covered with lotion']
    Rationale:Friction between two surfaces generates heat due to the conversion of kinetic energy into thermal energy. Dry palms produce the most heat when rubbed together as they create higher friction compared to wet or lubricated palms, which reduce friction.  Therefore, the answer is 'dry palms'.<end>

    Question:{{perturbed_doc}}
    Rationale:{{perturbed_response | replace('\n', '')}}<end>

    Please explain:
    Question:{{question}} 
    Choices:{{choices.text}}
    Rationale:
    """

    remote_inference_kwargs={
        'stop': ['</s>'],
        'temperature': 0.01,
        'max_tokens': 256
    }

    local_inference_kwargs={
        'stop': ['<|im_end|>', '<end>', '<end>\n', '<end>\n\n', '.\n\n\n\n\n', '<|end_of_text|>', '>\n\n\n'],
        'temperature': 0.01,
        'max_tokens': 256
    }

    inferdpt_client_conf = {
        'inferdpt_init_conf': inferdpt_init_conf_client,
        'dataset_conf': dataset_conf,
        'doc_template': doc_template,
        'instruction_template': instruction_template,
        'decode_template': decode_template,
        'remote_inference_kwargs': remote_inference_kwargs,
        'local_inference_kwargs': local_inference_kwargs
    }

    inferdpt_init_conf_server = {
        'module_name': 'fate_llm.algo.inferdpt.init.default_init',
        'item_name': 'InferDPTAPIServerInit'
    }

    inferdpt_server_conf = {
        'inferdpt_init_conf': inferdpt_init_conf_server
    }

    homo_nn_0 = HomoNN(
        'nn_0',
        runner_module='inferdpt_runner',
        runner_class='InferDPTRunner',
        train_data=reader_0.outputs["output_data"]
    )

    homo_nn_0.guest.task_parameters(runner_conf=inferdpt_client_conf)
    homo_nn_0.arbiter.task_parameters(runner_conf=inferdpt_server_conf)
    pipeline.add_tasks([reader_0, homo_nn_0])
    pipeline.compile()
    pipeline.fit()


if __name__ == "__main__":
    parser = argparse.ArgumentParser("PIPELINE DEMO")
    parser.add_argument("--config", type=str, default="../config.yaml",
                        help="config file")
    parser.add_argument("--namespace", type=str, default="",
                        help="namespace for data stored in FATE")
    args = parser.parse_args()
    main(config=args.config, namespace=args.namespace)
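
The `module_name` / `item_name` pairs in the configuration above identify a class by its import path. The sketch below shows one way such a pair can be resolved, assuming ordinary dynamic import; whether the FATE runner resolves it exactly this way is not confirmed here, and `load_item` is a hypothetical helper. A standard-library class stands in for the InferDPT init class so the snippet runs without FATE installed.

```python
import importlib


def load_item(conf):
    """Resolve {'module_name': ..., 'item_name': ...} to the object it names."""
    module = importlib.import_module(conf["module_name"])
    return getattr(module, conf["item_name"])


# Stand-in config: same shape as dataset_conf, but pointing at the standard library.
conf = {
    "module_name": "collections",
    "item_name": "OrderedDict",
    "kwargs": {"a": 1},
}
cls = load_item(conf)
instance = cls(**conf.get("kwargs", {}))
print(cls.__name__, dict(instance))
```

Keeping only the import path in the job configuration means secrets such as API keys stay in the init class on each party's machine rather than in the submitted pipeline.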
