# EvoPrompt Demo
> In this notebook we will implement EvoPrompt(GA) with GPT-3.5 using OpenAI's API. This implementation leverages a Genetic Algorithm as the evolutionary operator.

## Setup:

In [1]:
import transformers
import openai
import rouge_score
import pandas as pd
import random
import numpy as np

In [2]:
from googleapiclient import discovery
with open("C:/Users/danie/OneDrive/Desktop/openai_youtube_api_key.txt") as f:
    api_key = f.readline().strip()  # strip the trailing newline so the key is sent cleanly

openai.api_key = api_key

#### Let's look at a sample of how to use OpenAI's ChatCompletion endpoint so we know what to give our algorithm.

In [14]:
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[
        {"role": "system", "content": "You are a first grade english teacher."},
        {"role": "user", "content": "Explain in as few words as possible how to structure a paragraph to a seven year old."}
    ]
)
print(response['choices'][0]['message']['content'])

A paragraph is like a sandwich. It has a topic sentence at the beginning, details in the middle, and a closing sentence at the end.


It looks like each request needs two messages: a system role that sets the model's persona and a user query that carries the task.
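As a minimal sketch of that message format, the helper below builds the two-entry list from a role and a query. The name `build_messages` is hypothetical and not part of the OpenAI library; it only assembles plain dictionaries and makes no API call.

```python
def build_messages(system_role, user_query):
    """Return the [system, user] message list used throughout this notebook."""
    return [
        {"role": "system", "content": system_role},
        {"role": "user", "content": user_query},
    ]


messages = build_messages(
    "You are a first grade english teacher.",
    "Explain in as few words as possible how to structure a paragraph to a seven year old.",
)
print(messages)
```

A list built this way can be passed as the `messages` argument in the call shown above.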

## Implementation:

Let's make our implementation match our pseudocode as closely as possible:
![EvoPrompt(GA)_pseudocode.png](attachment:EvoPrompt(GA)_pseudocode.png)
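Before wiring in the API, it may help to see the shape of the loop on its own. The following is a rough, offline sketch of a GA loop over prompts, not the notebook's implementation: `evolve`, `toy_score`, and `toy_crossover` are hypothetical names, scores are assumed to be non-negative, and the sketch draws a fresh parent pair for every child, which is one reading of the loop (the implementation further down reuses a single pair per generation).

```python
import numpy as np


def evolve(population, score_fn, crossover_fn, mutate_fn, num_iterations, rng):
    """Run a simple GA loop and return the final population, best first."""
    n = len(population)
    for _ in range(num_iterations):
        scores = np.array([score_fn(p) for p in population], dtype=float)
        # Roulette-wheel selection: probability proportional to score.
        # Fall back to uniform when fewer than two prompts have a non-zero score.
        if np.count_nonzero(scores) >= 2:
            probs = scores / scores.sum()
        else:
            probs = np.full(n, 1.0 / n)
        children = []
        for _ in range(n):
            i, j = rng.choice(n, size=2, replace=False, p=probs)
            children.append(mutate_fn(crossover_fn(population[i], population[j])))
        # Survivor selection: keep the n best of parents plus children.
        pool = population + children
        pool.sort(key=score_fn, reverse=True)
        population = pool[:n]
    return population


# Toy stand-ins for the LLM calls: prompts are plain strings here.
target = {"answer", "yes", "or", "no", "then", "explain"}


def toy_score(prompt):
    # Fraction of target words that appear in the prompt
    return len(set(prompt.split()) & target) / len(target)


def toy_crossover(a, b):
    # First half of one parent joined with the second half of the other
    wa, wb = a.split(), b.split()
    return " ".join(wa[: len(wa) // 2] + wb[len(wb) // 2:])


start = ["answer yes or no", "then explain why", "say yes then explain"]
rng = np.random.default_rng(0)
final = evolve(start, toy_score, toy_crossover, lambda p: p, 3, rng)
print(final[0], toy_score(final[0]))
```

Because parents compete with their children for survival, the best score in this sketch can never decrease from one generation to the next.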

In [4]:
def AddHumanEngineeredPrompts():
    messages = []
    role = "You are an AI assistant. You will be given a task. Complete the task and explain in detail how you came to your conclusion."
    user_query = """Your task is to determine if the following two sentences convey the same meaning. First, answer either with yes or no, then explain your reasoning. 
Sentence 1: The dog eagerly dug up the ground to find the bone it had buried yesterday.
Sentence 2: Excited to retrieve what it hid, the dog unearthed the bone it stashed away the previous day."""

    messages.append({"role": "system", "content": role})
    messages.append({"role": "user", "content": user_query})
    return messages

In [5]:
def GenerateRandomPromptsLLM(N, user_prompt):
    num_generated = N-1
    prompts = []
    i = 0
    while i < num_generated:
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613",
            messages=[
                {"role": "system", "content": "You're an AI assistant who completes its given task as closely to the prompt as possible."}, 
                {"role": "user", "content": f"""Your task is to generate a response which is in the same format as the following python list: \n {user_prompt}
                The content sections in the dictionaries within your list should convey the same idea as those in the original, but make sure your response is in some way different from the original list.
                Finally and most importantly, if the example mentions comparing sentences make sure to use the same sentences from the example in your response."""}
            ]
        )
        # Compare against the string form of the prompt; a str never equals a list
        if response['choices'][0]['message']['content'] != str(user_prompt):
            prompts.append(eval(response['choices'][0]['message']['content']))
            i += 1
        
    return prompts
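Calling `eval` on model output will execute whatever the model returns. One safer option, sketched here under the assumption that every prompt is a two-item `[system, user]` list, is to parse with `ast.literal_eval` from the standard library and validate the structure. The name `parse_prompt` is hypothetical.

```python
import ast


def parse_prompt(text):
    """Parse model output into a [system, user] message list, or return None."""
    try:
        # literal_eval only accepts Python literals, so no code is executed
        candidate = ast.literal_eval(text.strip())
    except (ValueError, SyntaxError, TypeError):
        return None
    if not isinstance(candidate, list) or len(candidate) != 2:
        return None
    for message, role in zip(candidate, ("system", "user")):
        if not isinstance(message, dict):
            return None
        if message.get("role") != role or not isinstance(message.get("content"), str):
            return None
    return candidate


good = '[{"role": "system", "content": "a"}, {"role": "user", "content": "b"}]'
print(parse_prompt(good))
print(parse_prompt("__import__('os').getcwd()"))  # rejected: not a literal
```

A caller could treat a `None` result the same way the notebook treats a `SyntaxError`, by requesting a new response.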

In [6]:
def print_starting_generation(prompts):
    print("Generation 0 (Initialization Prompts):")
    for i, p in enumerate(prompts):
        print(f"Prompt {i}:")
        print("System Role:", p[0]['content'])
        print("User Prompt:", p[1]['content'])
        print()

In [7]:
def print_generation_best(gen_num, prompts, argmax):
    content = prompts[argmax]
    print(f"Generation {gen_num} Best Candidate:")
    print(f"Prompt {argmax}:")  # zero-based, matching print_starting_generation
    print("System Role:", content[0]['content'])
    print("User Prompt:", content[1]['content'])
    print()

In [8]:
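def roulette_wheel_selection(scores):
    scores = np.asarray(scores, dtype=float)

    # Min-Max normalization
    min_val = np.min(scores)
    max_val = np.max(scores)
    if max_val == min_val:
        # All scores tie: fall back to uniform probabilities instead of dividing by zero
        probabilities = np.full(len(scores), 1.0 / len(scores))
    else:
        normalized_scores = (scores - min_val) / (max_val - min_val)
        # Make sure they sum to 1 for probabilities
        probabilities = normalized_scores / normalized_scores.sum()

    # Randomly select two distinct indices based on their probabilities
    selected_indices = np.random.choice(len(scores), 2, replace=False, p=probabilities)

    return selected_indices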
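To make the min-max normalization in `roulette_wheel_selection` concrete, here is a small worked example that computes only the selection probabilities. The helper name `selection_probabilities` is hypothetical, and the uniform fallback for tied scores is an assumption of this sketch.

```python
import numpy as np


def selection_probabilities(scores):
    """Min-max normalize scores, then rescale them to sum to 1."""
    scores = np.asarray(scores, dtype=float)
    spread = scores.max() - scores.min()
    if spread == 0:
        # Assumption: treat tied scores as equally likely
        return np.full(len(scores), 1.0 / len(scores))
    shifted = (scores - scores.min()) / spread
    return shifted / shifted.sum()


probs = selection_probabilities([0.2, 0.5, 0.8])
print(probs)  # approximately [0, 1/3, 2/3]
```

Note that min-max normalization always assigns probability zero to the lowest-scoring prompt, so with a population of three and no ties the two best prompts are always the parents.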

In [9]:
import evaluate

rouge_score = evaluate.load("rouge")
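For intuition about what the `rouge1` value measures, the sketch below computes a simplified unigram-overlap F1 in plain Python. It is an illustration only: the function name `rouge1_f1` is hypothetical, and it splits on whitespace without the tokenization or stemming options the library applies, so its numbers may differ from those returned by `evaluate`.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Simplified ROUGE-1: F1 over clipped unigram overlap."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


print(rouge1_f1("the dog", "the dog dug up"))
```

Because the score rewards shared words rather than shared meaning, a prompt that elicits the target's vocabulary will score higher even if its reasoning is no better.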

In [10]:
def evoprompt_ga(num_prompts, num_iterations, target_response):
    
    #Step 1: Initialize Populations
    P = [] # Prompt Population
    failed = True
    while failed:
        try:
            human_prompt = AddHumanEngineeredPrompts()
            hp_response = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo-0613",
                    messages=human_prompt
            )
            hp_pred = hp_response['choices'][0]['message']['content']
            hp_score = rouge_score.compute(
                predictions=[hp_pred],
                references=[target_response]
            )['rouge1']
            P.append(human_prompt)
            failed=False
        except Exception as e:
            # Avoid a bare except so KeyboardInterrupt can still stop the loop
            print("Baseline prompt evaluation failed, retrying:", e)
            continue
    
    failed = True
    while failed:
        try:
            model_prompts = GenerateRandomPromptsLLM(num_prompts, P[0])
            failed = False
        except SyntaxError:
            continue
        
    for prompt in model_prompts:
        P.append(prompt)
    print_starting_generation(P)
    print()
    print('---------------------------------------------------------------------')
    
    #Step 2: Evolutionary Loop
    for t in range(num_iterations):
        failed = True
        while failed:
            try:
                incomplete = True
                while incomplete:
                    try:
                        scores = []
                        for p in P:  # Calculate ROUGE scores for every prompt's response
                            response = openai.ChatCompletion.create(
                                model="gpt-3.5-turbo-0613",
                                messages=p
                            )
                            pred = response['choices'][0]['message']['content']
                            p_i_score = rouge_score.compute(
                                predictions=[pred],
                                references=[target_response]
                            )  
                            scores.append(p_i_score['rouge1'])
                        print_generation_best(t, P, argmax=np.argmax(scores))
                        #perform roulette wheel selection 
                        parents = [P[i] for i in roulette_wheel_selection(scores)]
                        p1 = str(parents[0])
                        p2 = str(parents[1])
                        incomplete = False
                    except ValueError:
                        continue
                print(f"Generation {t}: Selection Stage Complete")
                #Crossover
                SYS_ROLE = "You are an AI assistant who completes its given task as closely as possible to the prompt."
                CROSSOVER_PROMPT = f"""Given the following two parent prompts:
                
                Prompt 1: {p1}
                
                Prompt 2: {p2}
                
                Your task is to create a new prompt which exactly matches the dictionary format of the originals but changes the content sections.
                Change the content sections of your response by combining portions of the original two prompts' content sections.
                Importantly, if the example mentions comparing sentences make sure to use the same sentences from the examples in your response.
                Do not add new line characters.
                """
                crossover_prompts = []
                i = 0
                while i < num_prompts:
                    try:
                        new_prompt = openai.ChatCompletion.create(
                            model="gpt-3.5-turbo-0613",
                            messages=[
                                {"role": "system", "content": SYS_ROLE}, 
                                {"role": "user", "content": CROSSOVER_PROMPT}
                            ],
                            temperature = .99
                        )
                        content = new_prompt['choices'][0]['message']['content']
                        if content != p1 and content != p2:
                            crossover_prompts.append(eval(content))
                            i += 1
                    except Exception as e:
                        print("Uninterpretable response generated. Retrying!")
                        continue
                
                print(f"Generation {t}: Crossover Stage Complete")
                #print(crossover_prompts)
                #Mutate
                mutated_prompts = []
                for co_prompt in crossover_prompts:
                    MUTATE_PROMPT = f"""Given the following prompt:
                    {co_prompt}
                    
                    Without changing the structure of the dictionary, mutate the following prompt's content sections.
                    Importantly, if the prompt mentions comparing sentences or statements make sure to use the original sentences or statements in your response.
                    Change these sections by replacing the words with synonyms or rephrasing ideas.
                    """
                    incomplete = True
                    while incomplete:
                        try:
                            new_prompt = openai.ChatCompletion.create(
                                model="gpt-3.5-turbo-0613",
                                messages=[
                                    {"role": "system", "content": SYS_ROLE}, 
                                    {"role": "user", "content": MUTATE_PROMPT}
                                ],
                                temperature = .99
                            )
                            content = new_prompt['choices'][0]['message']['content']
                            if content != str(co_prompt):
                                mutated_prompts.append(eval(content))
                                incomplete = False
                        except Exception as e:
                            print("Uninterpretable response generated. Retrying!")
                            continue
                
                print(f"Generation {t}: Mutation Stage Complete")
                #Evaluation:
                for m_prompt in mutated_prompts:
                    P.append(m_prompt)
                
                survival_scores = []
                for p in P: #Calculate rouge scores of all prompt responses  
                    print("Calculating Scores.")
                    response = openai.ChatCompletion.create(
                        model="gpt-3.5-turbo-0613",
                        messages=p
                    )
                    pred = response['choices'][0]['message']['content']
                    p_i_score = rouge_score.compute(
                        predictions=[pred],
                        references=[target_response]
                    )  
                    survival_scores.append(p_i_score['rouge1'])
                    
                print(f"Generation {t}: Evaluation Stage Complete")
            
                #Select next generation:
                # Keep the num_prompts highest-scoring prompts (argsort is ascending)
                sorted_indices = np.argsort(survival_scores)
                P = [P[i] for i in sorted_indices[-num_prompts:]]
                print(f"Generation {t}: Generation Complete. Offspring Selected!")
                failed=False
            except Exception as e:
                print("Generation Failed due to:", e) 
                continue       
        
    final_scores = []
    final_preds = []
    for p in P:  # Calculate ROUGE scores for every prompt's response
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613",
            messages=p
        )
        pred = response['choices'][0]['message']['content']
        final_preds.append(pred)
        p_i_score = rouge_score.compute(
            predictions=[pred],
            references=[target_response]
        )  
        final_scores.append(p_i_score['rouge1'])
        
    print()
    print('---------------------------------------------------------------------')
    print(f"Original Prompt:")
    print("System Role:", human_prompt[0]['content'])
    print("User Prompt:", human_prompt[1]['content'])
    print()
    print("Response:", f"\n{hp_pred}")
    print("Rouge1 Score:", hp_score)
    
    final_prompt_idx = np.argmax(final_scores)
    content = P[final_prompt_idx]
    final_response = final_preds[final_prompt_idx]
    print()
    print('---------------------------------------------------------------------')
    print(f"Final Prompt:")
    print("System Role:", content[0]['content'])
    print("User Prompt:", content[1]['content'])
    print()
    print("Response:", f"\n{final_response}")
    print("Rouge1 Score:", final_scores[final_prompt_idx])
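The survivor-selection step at the end of each generation can be illustrated in isolation. This is a minimal sketch assuming `keep` is at least 1; the helper name `select_survivors` is hypothetical.

```python
import numpy as np


def select_survivors(population, scores, keep):
    """Return the `keep` highest-scoring members, best first."""
    order = np.argsort(scores)      # indices sorted by ascending score
    best = order[-keep:][::-1]      # take the last `keep`, then reverse
    return [population[i] for i in best]


survivors = select_survivors(["a", "b", "c", "d"], [0.1, 0.7, 0.3, 0.9], keep=2)
print(survivors)  # ['d', 'b']
```

Slicing from the end with `-keep` returns a fixed number of survivors regardless of how large the combined pool has grown.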
        
        
        

In [12]:
evoprompt_ga(3, 4, "Yes, the two sentences convey the same meaning.\n\nIn both sentences, the subject is a dog that is actively searching for a buried bone. In both sentences, the dog is described as being eager, excited, and motivated to find the bone. Additionally, both sentences indicate that the bone was buried by the dog and that it had been buried the previous day.\n\nAlthough the wording and phrasing differ slightly between the two sentences, the overall message and intent of both sentences are the same. The use of synonyms like \"dug up\" and \"unearthed,\" \"find\" and \"retrieve,\" and \"buried\" and \"stashed away\" adds some variation in phrasing, but the essential meaning remains consistent.\n\nTherefore, after analyzing the context and synonyms used, it can be concluded that the two sentences convey the same meaning.")

Generation 0 (Initialization Prompts):
Prompt 0:
System Role: You are an AI assistant who accomplishes its task as best as possible.
User Prompt: Do the following two sentences mean the same thing.  Sentence 1: The dog eagerly dug up the ground to find the bone it had buried yesterday.  Sentence 2: Excited to retrieve what it hid, the dog unearthed the bone it stashed away the previous day  Explain how you came to your conclusion.

Prompt 1:
System Role: I am an AI assistant that strives to perform tasks to the best of my ability.
User Prompt: Could you clarify if Sentence 1, "The dog eagerly dug up the ground to find the bone it had buried yesterday," and Sentence 2, "Excited to retrieve what it hid, the dog unearthed the bone it stashed away the previous day," convey the same meaning? Please explain your reasoning.

Prompt 2:
System Role: I am an AI assistant who strives to complete my given task to the best of my abilities.
User Prompt: Can you please explain if Sentence 1: The dog 

KeyboardInterrupt: 

You are an AI assistant who accomplishes its task as best as possible.

Do the following two sentences mean the same thing.

Sentence 1: The dog eagerly dug up the ground to find the bone it had buried yesterday.

Sentence 2: Excited to retrieve what it hid, the dog unearthed the bone it stashed away the previous day

Explain how you came to your conclusion.

Here is the task we are going to try to build a better prompt for:

The following are two sentences:

- Sentence 1: The dog eagerly dug up the ground to find the bone it had buried yesterday.
- Sentence 2: Excited to retrieve what it hid, the dog unearthed the bone it stashed away the previous day.

We need to construct a prompt that accomplishes two things: first, determine whether the sentences convey the same meaning, and second, explain why or why not.

You are an AI assistant. You will be given a task. Complete the task and explain in detail how you came to your conclusion.

Your task is to determine if the following two sentences convey the same meaning. First, answer either with yes or no, then explain your reasoning. 
Sentence 1: The dog eagerly dug up the ground to find the bone it had buried yesterday.
Sentence 2: Excited to retrieve what it hid, the dog unearthed the bone it stashed away the previous day.