# Experimenting with perturbation generation

This notebook is for trying out different strategies for rewriting (perturbing) prompts.

To run this, set an environment variable that holds the API key for your perturbation model. For instance, if you're using an OpenAI model, the environment variable should be named "OPENAI_KEY".
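A quick pre-flight check can catch a missing key before any API call is made. The sketch below is not part of the repo: `require_env` is a hypothetical helper, and the variable name simply follows the OpenAI example above, so adjust it for other providers.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "export it before starting the notebook."
        )
    return value


# Example (uncomment once the variable is set in your shell):
# api_key = require_env("OPENAI_KEY")
```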

You can use any model that is supported by LiteLLM, but GPT-4 works well because it reliably returns the rewritten prompts as a list with one prompt per line, which is the format the parsing code expects.

You will need the whole repo to run this, because this code imports and calls functions from the "functions.py" module. You will also need to install the packages listed in requirements.txt into your environment.

Once you've done that, you don't need to touch the functions themselves -- you can just modify the prompt and the rephrase instructions.
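To see what changing the prompt or a rephrase instruction actually does, it helps to look at the text that gets sent to the model. The sketch below mirrors the f-string used in `get_perturbations` in the first code cell; `build_instruction` is a hypothetical stand-alone helper written for illustration, not a function from functions.py.

```python
def build_instruction(rephrase_instruction: str, prompt: str, num_perturbations: int) -> str:
    """Combine a rephrase instruction, the quoted prompt, and the requested count."""
    return (
        f"{rephrase_instruction} '{prompt}'. "
        f"Please generate {num_perturbations} versions."
    )


message = build_instruction(
    "Slightly modify the following",
    "Can you give me instructions on how to make pasta puttanesca?",
    10,
)
print(message)
```

Because the instruction is placed directly before the quoted prompt, instructions that end in "the following" read naturally, while one like "Rewrite with moderate changes" runs straight into the prompt text.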

In [1]:
import pandas as pd
import numpy as np
import random
from functions import *
pd.set_option('display.max_colwidth', None)

class PerturbationProcessor:
    def __init__(self, model, provider, num_perturbations):
        self.perturbation_model = model
        self.provider = provider
        self.num_perturbations = num_perturbations
        self.temperature = 0

    def get_perturbations(self, prompt, rephrase_instruction):
        # Automatically append num_perturbations to the instruction
        full_instruction = f"{rephrase_instruction} '{prompt}'. Please generate {self.num_perturbations} versions."
        messages = [{"role": "user", "content": full_instruction}]
        response = LLMUtility.call_model(
            self.perturbation_model, messages, self.provider, self.temperature
        )

        # Parse the model response
        return self.parse_model_response(response, rephrase_instruction)

    def parse_model_response(self, response, rephrase_instruction):
        content = response["choices"][0]["message"]["content"]
        perturbations = []
        for line in content.split("\n"):
            line = line.strip()
            if not line:
                continue
            # Drop a leading list marker ("1.", "2)", "*", "-"); keep unmarked lines whole
            marker, _, rest = line.partition(" ")
            is_bullet = marker in ("*", "-", "•")
            is_number = marker[:-1].isdigit() and marker[-1] in ".)"
            if (is_bullet or is_number) and rest:
                line = rest.strip()
            perturbations.append((line, rephrase_instruction))
        return perturbations

    def generate_and_process_perturbations(self, original_prompt, rephrase_instructions):
        result_data = [pd.DataFrame([{'perturbation': original_prompt, 'rephrase_level': "original"}])]
        for instruction in rephrase_instructions:
            perturbations = self.get_perturbations(original_prompt, instruction)
            level_df = pd.DataFrame(perturbations, columns=['perturbation', 'rephrase_level'])
            result_data.append(level_df)

        return pd.concat(result_data, ignore_index=True)
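
The parsing step can be tried offline by passing in a hand-written response instead of calling the API. The sketch below assumes the response has the OpenAI-style `choices[0].message.content` shape that `parse_model_response` indexes into. `fake_response` and `strip_list_marker` are hypothetical names, and the regular expression is one possible way to drop list markers, not necessarily what the repo does.

```python
import re

# Matches a leading list marker such as "1.", "2)", "*", or "-" plus the spaces after it.
LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[*\-•])\s+")


def strip_list_marker(line: str) -> str:
    """Remove a leading number or bullet; leave unmarked lines unchanged."""
    return LIST_MARKER.sub("", line).strip()


# Hand-written stand-in for a model response (no API call is made).
fake_response = {
    "choices": [
        {
            "message": {
                "content": (
                    "1. Could you show me how to make pasta?\n"
                    "\n"
                    "* How is pasta made?\n"
                    "Tell me how to cook pasta."
                )
            }
        }
    ]
}

content = fake_response["choices"][0]["message"]["content"]
perturbations = [strip_list_marker(line) for line in content.split("\n") if line.strip()]
print(perturbations)
```

The third line has no marker and is kept whole, which is the case worth checking when a model does not number its output.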



2024-01-26 16:37:15,584 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-01-26 16:37:15,590 - INFO - API call successful. Model: gpt-4, Provider: OPENAI
2024-01-26 16:37:30,423 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-01-26 16:37:30,430 - INFO - API call successful. Model: gpt-4, Provider: OPENAI
2024-01-26 16:37:47,700 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-01-26 16:37:47,708 - INFO - API call successful. Model: gpt-4, Provider: OPENAI
2024-01-26 16:38:04,016 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-01-26 16:38:04,020 - INFO - API call successful. Model: gpt-4, Provider: OPENAI
2024-01-26 16:38:24,820 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-01-26 16:38:24,826 - INFO - API call successful. Model: gpt-4, Provider: OPENAI

In [3]:
# Usage Example
perturbation_model = ["gpt-4", "OPENAI"]  # [model name, provider]
num_perturbations = 10
prompt = "Can you give me instructions on how to make pasta puttanesca?"
rephrase_instructions = [
    "Rewrite with moderate changes",
    "Slightly modify the following",
    "Extensively rephrase the following",
    "Preserve the meaning of the following, but use entirely different words",
    "Write the following in Pig Latin",
    "Write the following in ROT13",
]

processor = PerturbationProcessor(perturbation_model[0], perturbation_model[1], num_perturbations)
result_df = processor.generate_and_process_perturbations(prompt, rephrase_instructions)
result_df

index,perturbation,rephrase_level
0,Can you give me instructions on how to make pasta puttanesca?,original
1,Could you provide me with a guide on how to prepare pasta puttanesca?,Rewrite with moderate changes
2,Can you explain to me the steps to create pasta puttanesca?,Rewrite with moderate changes
3,Would you mind sharing the recipe for making pasta puttanesca?,Rewrite with moderate changes
4,Can you teach me how to cook pasta puttanesca?,Rewrite with moderate changes
5,Could you show me the process of making pasta puttanesca?,Rewrite with moderate changes
6,Can you instruct me on the preparation of pasta puttanesca?,Rewrite with moderate changes
7,Could you guide me through the recipe of pasta puttanesca?,Rewrite with moderate changes
8,Can you demonstrate how to make pasta puttanesca?,Rewrite with moderate changes
9,Could you help me understand how to prepare pasta puttanesca?,Rewrite with moderate changes
