## PAPILLON Tutorial
In this notebook, we walk step by step through setting up your own PAPILLON pipeline locally on a GPU server.

### What is PAPILLON?
PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles (PAPILLON) is a framework where trusted but weaker models can use untrusted but more powerful models as tools in order to preserve user inference-time privacy.

You can refer to the original paper [here](https://arxiv.org/abs/2410.17127) for how we constructed the benchmark for our task.

![Overview of the PAPILLON pipeline](figs/1.png)

For this tutorial, we will use **GPT-4o-mini** as the untrusted model and **Llama-3.1-8B-Instruct** as the trusted, locally-hosted model.

### Install Dependencies

In [None]:
%pip install dspy-ai==2.5.41 openai pandas sglang[all]

### Launch Llama-3.1-8B-Instruct
We will host this model using SGLang. If you already host the model somewhere else, that works too: just adjust the `local_lm` variable accordingly in the following sections.

In [None]:
%pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ 

PORT_NUMBER = 7501 # You can change the port number here

!CUDA_VISIBLE_DEVICES=0 python -m sglang.launch_server --port $PORT_NUMBER --model-path meta-llama/Llama-3.1-8B-Instruct

### Initialize Local LM and Remote LLM
The Local LM is the trusted (but usually weaker) model. Ideally, it is the only component of the pipeline that handles your private information. The Remote LM is the untrusted (but usually more capable) model. The goal of the PAPILLON pipeline is to produce high-quality outputs while leaking as little of your private information as possible to the Remote LM.

In [None]:
import dspy
import os

os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"

local_lm = dspy.LM('openai/default', api_base=f"http://127.0.0.1:{PORT_NUMBER}/v1", api_key="", max_tokens=4000)
dspy.configure(lm=local_lm)

openai_lm = dspy.LM(model="openai/gpt-4o-mini", max_tokens=4000)

### Define the PAPILLON DSPy Module
We will now define the Prompt Creator and Information Aggregator modules according to the diagram earlier in this notebook. 

After defining the module, we can then optimize the prompts for these modules using the MIPRO v2 DSPy prompt optimizer, so that you can keep creating new PAPILLON pipelines for your specific needs.

In [None]:

class CreateOnePrompt(dspy.Signature):
    """
    You are a helpful assistant that is very mindful of user privacy. You have access to a powerful large language model that you can query. Given a user request, create a prompt for your large language model that preserves user privacy, so that this model can help you complete the user request. Provide the prompt directly without any preamble. DO NOT COMPLETE THE USER QUERY, ONLY GENERATE A PROMPT.
    """
    userQuery = dspy.InputField(desc="The user's request to be fulfilled.")
    createdPrompt = dspy.OutputField()

class InfoAggregator(dspy.Signature):
    """
    You are a helpful assistant. Respond to queries from the user.
    """

    userQuery = dspy.InputField(desc="The user's request to be fulfilled.")
    modelExampleResponses = dspy.InputField(desc="Information from a more powerful language model responding to related queries. Complete the user query by referencing this information. Only you have access to this information.")
    finalOutput = dspy.OutputField()


class PAPILLON(dspy.Module):
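    def __init__(self, untrusted_model):
        super().__init__()
        # The prompt creator and aggregator run on the configured (local) LM
        self.prompt_creater = dspy.ChainOfThought(CreateOnePrompt)
        self.info_aggregator = dspy.Predict(InfoAggregator)
        # The untrusted model only ever receives the created prompt
        self.untrusted_model = untrusted_model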
    
    def forward(self, user_query):
        try:
            prompt = self.prompt_creater(userQuery=user_query).createdPrompt
            response = self.untrusted_model(prompt)[0]
            output = self.info_aggregator(userQuery=user_query, modelExampleResponses=response)
        except Exception:
            return dspy.Prediction(prompt="", output="", gptResponse="")
        
        return dspy.Prediction(prompt=prompt, output=output.finalOutput, gptResponse=response)
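
To see the data flow without any model calls, here is a toy sketch that replaces all three stages with plain Python functions. The stand-ins (`toy_prompt_creator`, `toy_untrusted_model`, `toy_aggregator`) and the redaction by string replacement are illustrative assumptions, not how PAPILLON actually creates prompts. The point is only that the untrusted model sees the created prompt and never the raw user query.

```python
sent_to_untrusted = []  # records everything the untrusted stand-in receives


def toy_prompt_creator(user_query: str, private_terms: list[str]) -> str:
    """Stand-in for the Prompt Creator: naively mask known private terms."""
    prompt = user_query
    for term in private_terms:
        prompt = prompt.replace(term, "[REDACTED]")
    return prompt


def toy_untrusted_model(prompt: str) -> str:
    """Stand-in for the remote model: logs its input and returns a canned reply."""
    sent_to_untrusted.append(prompt)
    return f"Generic advice for: {prompt}"


def toy_aggregator(user_query: str, remote_response: str) -> str:
    """Stand-in for the Information Aggregator: runs locally and sees both inputs."""
    return f"Answer to '{user_query}' using [{remote_response}]"


def toy_papillon(user_query: str, private_terms: list[str]) -> dict:
    prompt = toy_prompt_creator(user_query, private_terms)
    response = toy_untrusted_model(prompt)
    output = toy_aggregator(user_query, response)
    return {"prompt": prompt, "gptResponse": response, "output": output}


result = toy_papillon("Write a cover letter for Jane Doe", ["Jane Doe"])
print(result["prompt"])  # Write a cover letter for [REDACTED]
```

The private name reaches only the local aggregator stand-in; the list `sent_to_untrusted` holds exactly what left the trusted side.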

### Optimize Your PAPILLON Pipeline
#### Let's First Load Some Data

In our paper, we proposed the PUPA benchmark, available both on Hugging Face (TBD) and in this repository (`pupa/*.csv`).

The PUPA benchmark contains user-assistant interactions where the user divulges personally identifiable information (PII) in the [WildChat](https://arxiv.org/abs/2405.01470) dataset. We use annotation schemas detailed in the [Trust No Bot](https://arxiv.org/abs/2407.11438) paper to determine whether an interaction contains private information of specific types. PUPA consists of two parts:
1. `PUPA-TNB` is data derived from the original annotations from the Trust No Bot paper.
2. `PUPA-New` is data derived from the rest of the WildChat dataset that is not annotated in the Trust No Bot paper.

We will use PUPA-New to optimize our PAPILLON program and for evaluation before and after the optimization. In our paper, we used PUPA-TNB for standardized evaluation across different models, but we are just using PUPA-New here for demonstration purposes.

In [10]:
# Let's first take a look at some data examples
import pandas
from IPython.display import display

pupa_tnb = pandas.read_csv("pupa/PUPA_TNB.csv")
pupa_tnb = pupa_tnb[['user_query', 'pii_units', 'target_response']].copy()

display(pupa_tnb.head(5))


| | user_query | pii_units | target_response |
|---|---|---|---|
| 0 | Rachel Zheng is a clerk in the Administrative ... | rachel\|\|johnny\|\|univeristy of westminster\|\|wes... | Rachel Zheng 是 H&R Technology 行政部门的职员。当她回到办公室时... |
| 1 | Please describe the job knowledge requirements... | university | The International Advising Assistant position ... |
| 2 | i want to write a letter to the royal college ... | Sudan | [Your Name] [Your Address] [City, State, ZIP C... |
| 3 | critically analyze the film You Shall Not Kill... | krzysztof kieślowski | As an AI language model, I am not programmed t... |
| 4 | list all job offering from SPARC companies wit... | sparc | I apologize, but I am not able to provide spec... |


In the output above:
* `user_query` is the original user query containing private information.
* `pii_units` is the PII extracted by GPT-4o-mini, with individual units separated by `||`; as the examples show, there are instances of over-redaction.
* `target_response` is the original GPT-3.5 or GPT-4 response recorded in WildChat.
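
To make the `pii_units` format concrete, here is a minimal sketch of splitting one such string into individual units. It assumes the `||` separator that the judge code later in this notebook relies on. The helper name `split_pii_units` is hypothetical, and the example string uses only the pieces that are fully visible in the first row above.

```python
def split_pii_units(pii_units: str) -> list[str]:
    """Split a `||`-separated PII string into unique, non-empty units (order preserved)."""
    seen = []
    for piece in pii_units.split("||"):
        piece = piece.strip()
        if piece and piece not in seen:
            seen.append(piece)
    return seen


units = split_pii_units("rachel||johnny||univeristy of westminster")
print(units)  # ['rachel', 'johnny', 'univeristy of westminster']
```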

In [None]:
# Data Processing function
from dspy import Example

def synthesize_tvt(data_file):
    df = pandas.read_csv(data_file, index_col=False)
    train, val, test = [], [], []
    for i, row in df.iterrows():
        if pandas.isna(row["pii_units"]) or not isinstance(row["pii_units"], str) or len(row["pii_units"]) == 0:
            continue
        new_dp = Example({"target_response": row["target_response"],
                          "user_query": row["user_query"],
                          "pii_str": row["pii_units"]}).with_inputs("user_query")
        if i < 150:
            train.append(new_dp)
        elif 150 <= i < 300:
            val.append(new_dp)
        else:
            test.append(new_dp)
    return train, val, test
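
Note that rows without PII are skipped but still keep their original row index, so each split can end up with fewer examples than its index range suggests. A rough sketch of the same index-based bucketing on plain dictionaries, using hypothetical thresholds of 2 and 4 in place of 150 and 300, and made-up rows:

```python
def bucket_by_index(rows, train_end=2, val_end=4):
    """Assign rows to train/val/test by original position, skipping rows with no PII."""
    train, val, test = [], [], []
    for i, row in enumerate(rows):
        pii = row.get("pii_units")
        if not isinstance(pii, str) or len(pii) == 0:
            continue  # skipped rows still consume an index
        if i < train_end:
            train.append(row)
        elif i < val_end:
            val.append(row)
        else:
            test.append(row)
    return train, val, test


rows = [
    {"user_query": "q0", "pii_units": "alice"},
    {"user_query": "q1", "pii_units": None},  # skipped: no PII
    {"user_query": "q2", "pii_units": "bob||carol"},
    {"user_query": "q3", "pii_units": ""},  # skipped: empty PII
    {"user_query": "q4", "pii_units": "dave"},
]
train, val, test = bucket_by_index(rows)
print(len(train), len(val), len(test))  # 1 1 1
```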


The goal of the optimization process is to:
1. Make sure the PAPILLON responses are at least on par with the `target_response`
2. Leak as few `pii_units` to the Remote LLM as possible

#### Define the LLM Judge
For simplicity, the LLM judge defined here differs from the one used in our paper. If you are interested in the slightly more complex construction, please refer to `papillon/llm_judge.py`.

In [None]:
class JudgeQuality(dspy.Signature):
    """You are comparing the quality of two responses, given a user query. Is response A at least as good as response B? Respond with yes or no."""

    user_query = dspy.InputField(desc="The user's request to be fulfilled.")
    response_A = dspy.InputField()
    response_B = dspy.InputField()
    output = dspy.OutputField()

class JudgeLeakage(dspy.Signature):
    """Determine whether a piece of information is present in the prompt given. Respond with yes or no."""

    information_piece = dspy.InputField(desc="The piece of information that we wish to check. If this piece of information is a celebrity name, respond with no.")
    prompt = dspy.InputField()
    output = dspy.OutputField()

class JudgePromptQual(dspy.Signature):
    """Determine whether a piece of text is a valid prompt for a language model. Respond with yes or no."""
    prompt_input = dspy.InputField()
    output = dspy.OutputField()


class LLMJudge(dspy.Module):
    def __init__(self):
        super().__init__()
        self.quality_judge = dspy.ChainOfThought(JudgeQuality)
        self.fact_checker = dspy.Predict(JudgeLeakage)
        self.prompt_qual = dspy.ChainOfThought(JudgePromptQual)

    def forward(self, user_query, og_resp, new_resp=None, updated_query=None, pii_str=None):
        if new_resp:
            answer = self.quality_judge(user_query=user_query, response_A=new_resp, response_B=og_resp)
            judgment = int((answer.output.lower()).startswith("yes"))
        else:
            judgment = -1
        if updated_query:
            if isinstance(pii_str, str):
                # The pii_str field must be separated by ||
                all_pii_pieces = set(pii_str.split("||"))
                pii_score = 0
                for p in all_pii_pieces:
                    answer = self.fact_checker(information_piece=p, prompt=updated_query)
                    pii_score += int((answer.output.lower()).startswith("yes"))
                pii_score = pii_score / len(all_pii_pieces)
            else:
                pii_score = -1
        else:
            pii_score = -1
        
        return dspy.Prediction(
            quality=judgment,
            leakage=pii_score
        )


llm_judge = LLMJudge()
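
The leakage score is the fraction of PII units that the judge finds in the created prompt. The sketch below reproduces that arithmetic with a case-insensitive substring check standing in for the `JudgeLeakage` LLM call. That substitution is an assumption for illustration only: it will miss paraphrased leaks that an LLM judge could catch, and the example strings are made up.

```python
def leakage_fraction(pii_str: str, prompt: str) -> float:
    """Fraction of unique `||`-separated PII units that appear verbatim in the prompt."""
    pieces = set(pii_str.split("||"))
    leaked = sum(1 for p in pieces if p.lower() in prompt.lower())
    return leaked / len(pieces)


score = leakage_fraction(
    "jane doe||acme corp",
    "Write a cover letter for a role at Acme Corp",
)
print(score)  # 0.5
```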

#### Define the Metric
This will guide the prompt optimization process.

In [None]:
def metric(gold, pred, trace=None):
    og_model_output, og_user_query, og_pii = gold.target_response, gold.user_query, gold.pii_str
    pred_prompt, pred_out = pred.prompt, pred.output
    # An empty prompt means the pipeline failed on this example
    if len(pred_prompt) == 0:
        return 0
    # Run the judge with openai_lm instead of the default local LM
    with dspy.context(lm=openai_lm):
        score_dict = llm_judge(user_query=og_user_query, new_resp=pred_out, og_resp=og_model_output,
                               updated_query=pred_prompt, pii_str=og_pii)
    final_quality_score = score_dict.quality
    leakage_sc = score_dict.leakage
    # A leakage of -1 means the judge could not score this example
    if leakage_sc == -1:
        return 0
    # Want to maximize quality and minimize percentage of leakage
    final_total_score = final_quality_score - leakage_sc
    # When a trace is passed, return a pass/fail decision instead of a score
    if trace is not None:
        return final_total_score >= 1
    return final_total_score
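
As a worked example of the arithmetic: quality is 0 or 1 and leakage is a fraction between 0 and 1, so the score ranges from -1 to 1, and the `trace` branch passes only when quality is 1 and nothing leaked. A small sketch with a hypothetical helper, `combine_scores`, that mirrors the last lines of `metric`:

```python
def combine_scores(quality: int, leakage: float, trace=None):
    """Mirror of the final step of `metric`: quality minus leaked fraction."""
    total = quality - leakage
    if trace is not None:
        return total >= 1
    return total


print(combine_scores(1, 0.0))            # 1.0
print(combine_scores(1, 0.5))            # 0.5
print(combine_scores(0, 0.5))            # -0.5
print(combine_scores(1, 0.5, trace=[]))  # False
print(combine_scores(1, 0.0, trace=[]))  # True
```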


#### Evaluate Zeroshot PAPILLON
For this section, we will first synthesize the train, validation, and test data from the `PUPA-New` split of PUPA. We will then evaluate the performance of the zero-shot PAPILLON module using the metrics and LLM judge we just defined.

In [None]:
from dspy.evaluate.evaluate import Evaluate
from dspy.teleprompt import MIPROv2
import json

DATA_PATH = "pupa/PUPA_New.csv"

train, val, test = synthesize_tvt(DATA_PATH)
zeroshot = PAPILLON(openai_lm)
INCOMPLIANCE = 0
evaluate = Evaluate(metric=metric, devset=val, num_threads=8, display_progress=True, display_table=5, max_errors=100)
eval_score = None  # Stays None if the evaluation fails
try:
    eval_score = evaluate(zeroshot)
except Exception as e:
    INCOMPLIANCE += 1
    print(f"Evaluation failed: {e}")
eval_scores = {"before_optimization": eval_score}
print(eval_score)

#### Optimize with MIPRO v2

In [None]:
# Where you want to store the optimized prompts
PROMPT_OUTPUT_FILE = "output_prompt.json" 
# You can choose whichever prompt model you like; we are sticking with GPT-4o-mini since it is the cheapest
# It is important that your task_model is your trusted model (local_lm)
teleprompter = MIPROv2(prompt_model=openai_lm, task_model=local_lm, metric=metric, num_candidates=10, init_temperature=1.0)
kwargs = dict(num_threads=8, display_progress=True, display_table=0)
compiled_prompt_opt = teleprompter.compile(zeroshot, trainset=train, num_batches=200, max_bootstrapped_demos=0, max_labeled_demos=0, eval_kwargs=kwargs)
compiled_prompt_opt.save(PROMPT_OUTPUT_FILE)

In [None]:
try:
    eval_score = evaluate(compiled_prompt_opt, devset=val, **kwargs)
    print(eval_score)
    eval_scores.update({"after_optimization": eval_score})
    
except ValueError as e:
    print(e)
    local_lm.inspect_history()

In [None]:
EVAL_FILE = PROMPT_OUTPUT_FILE.replace(".json", "_eval_scores.json")
# Use a context manager so the file is flushed and closed
with open(EVAL_FILE, "w") as f:
    json.dump(eval_scores, f)
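
Once both scores are saved, comparing them takes only a few lines. A minimal sketch, assuming the scores are plain numbers; the values and the file name `example_eval_scores.json` below are made up for illustration and are not results from the paper or from this notebook.

```python
import json
import os
import tempfile

# Made-up scores, for illustration only
example_scores = {"before_optimization": 40.0, "after_optimization": 55.0}

# Write to a temporary directory so the sketch does not touch real output files
path = os.path.join(tempfile.mkdtemp(), "example_eval_scores.json")
with open(path, "w") as f:
    json.dump(example_scores, f)

with open(path) as f:
    loaded = json.load(f)

delta = loaded["after_optimization"] - loaded["before_optimization"]
print(f"Change after optimization: {delta:+.1f}")  # Change after optimization: +15.0
```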

### Trying Your Optimized PAPILLON Module

You have finished optimizing your PAPILLON module! Huzzah!! Now you can just load the newly optimized prompt and use it on user queries similar to those in PUPA.

In [None]:
priv_prompt = PAPILLON(openai_lm)
    
priv_prompt.load(PROMPT_OUTPUT_FILE)

while True:
    user_query = input("Your Query > ")
    pred = priv_prompt(user_query)
    print("PAPILLON PROMPT > ", pred.prompt)
    print("PAPILLON OUTPUT > ", pred.output)
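
The loop above runs until the kernel is interrupted. If you prefer an explicit exit, one option is a small wrapper like the sketch below. `run_repl`, its `exit` keyword, and the injectable `read`/`write` parameters are hypothetical conveniences, and `predict` stands for any callable that returns an object with `prompt` and `output` attributes, such as the loaded `priv_prompt`.

```python
from types import SimpleNamespace


def run_repl(predict, read=input, write=print):
    """Query `predict` until the user types 'exit' or input ends."""
    while True:
        try:
            user_query = read("Your Query > ")
        except (EOFError, KeyboardInterrupt):
            break
        if user_query.strip().lower() == "exit":
            break
        pred = predict(user_query)
        write("PAPILLON PROMPT > ", pred.prompt)
        write("PAPILLON OUTPUT > ", pred.output)


# Demo with a stub predictor and scripted input instead of a real model
scripted = iter(["hello", "exit"])
lines = []
stub = lambda q: SimpleNamespace(prompt=f"prompt for {q}", output=f"output for {q}")
run_repl(stub, read=lambda _: next(scripted), write=lambda *a: lines.append(" ".join(a)))
print(len(lines))  # 2
```

In the notebook, this would be called as `run_repl(priv_prompt)`.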
