# OpenChat Model
- The model is opensource and has an "apache-2.0" licence
- The model requires a CUDA architecture to function, so it will be run using an inference endpoint hosted on hugging face, => AWS
- Ideally the model would run locally, but for demonstration purposes an API call is used. The actual size of the model is around 15GB

In [1]:
import requests
import random
import re
from datasets import load_dataset
import pandas as pd
import json

In [2]:
key = ""

In [8]:
# endpoint and query function

API_URL = "https://vrv92f7slj7x8qqc.us-east-1.aws.endpoints.huggingface.cloud"
headers = {
	"Authorization": key,
	"Content-Type": "application/json"
}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

Example query

In [27]:
output = query({
	"inputs": "Answer the following question: teachers wants to buy a pair of boots online. On Amazon.com, the boots cost $16 and shipping costs $4. On eBay, those same pair of shoes are only $13, but shipping costs twice as much as it does on Amazon. How much more expensive do the boots work out to be on eBay?",
    "parameters": {
    "max_new_tokens": 150
  }
})

In [28]:
print(output)

[{'generated_text': '\n\nThe answer is: $2\n\nHere is the explanation:\n\nOn Amazon, the total cost of the boots and shipping is $16 + $4 = $20.\nOn eBay, the total cost of the boots and shipping is $13 + ($4 x 2) = $13 + $8 = $21.\nThe boots are $21 - $20 = $1 more expensive on eBay.\n\nThe answer is: $1\n\nHere is the explanation:\n\nOn Amazon, the total cost of the boots and shipping is $16 + $4 = $20.\nOn eBay, the total cost of the boots and shipping is'}]


In [6]:
dataset = load_dataset("gsm8k", "main")

### Functions

In [3]:
def get_random_test_object(dataset):
    if "test" in dataset and len(dataset["test"]) > 0:
        random_index = random.randint(0, len(dataset["test"]) - 1)
        test_object = dataset["test"][random_index]
        
        # Assuming each object in the dataset has 'question' and 'answer' keys
        question = test_object.get("question", "No question found")
        answer = test_object.get("answer", "No answer found")
        
        return question, answer
    else:
        return None, None
    
def get_multiple_test_objects(dataset, num_objects):
    qa_pairs = []

    for _ in range(num_objects):
        question, answer = get_random_test_object(dataset)
        if question and answer:
            qa_pairs.append({'question': question, 'answer': answer})
    
    return qa_pairs

In [8]:
def generate_initial_population(dataset, num_of_population):
    num_pairs = num_of_population
    qa_pairs = get_multiple_test_objects(dataset, num_pairs)
    return qa_pairs

In [15]:
qa_pairs = generate_initial_population(dataset, 2)

In [16]:
len(qa_pairs)

2

In [25]:
responses = []
for pair in qa_pairs:
    question = pair['question']
    response = query({
         "inputs": f"Answer the following question: {question}",
        "parameters": {
        "repetition_penalty": 1.0,
        "max_new_tokens": 180
        }
    })
    print(response)
    responses.append(response)

[{'generated_text': '\n\nThe answer is: $3\n\nHere’s how to solve the problem:\n\nOn Amazon, the boots cost $16 and shipping costs $4, so the total cost is $16 + $4 = $20.\n\nOn eBay, the boots cost $13, but shipping costs twice as much as it does on Amazon, so shipping costs $4 x 2 = $8.\n\nThe total cost of the boots on eBay is $13 + $8 = $21.\n\nThe boots are $21 – $20 = $1 more expensive on eBay.\n\nThe answer is: $1\n\nHere’s how to solve the problem:\n\nOn Amazon, the boots cost $16 and shipping costs $4, so the total cost is $16 + $4'}]
[{'generated_text': "\n\nStep-by-step solution:\nStep 1: Find the number of boys at the school. Since there are twice as many boys as girls, and there are 60 girls, there are 2 * 60 = 120 boys.\nStep 2: Find the total number of students at the school. Add the number of girls and boys together: 60 girls + 120 boys = 180 students.\nStep 3: Find the number of teachers at the school. If there are 5 students for every teacher, then divide the total nu

In [26]:
import spacy

nlp = spacy.load("en_core_web_sm")

def crossover_prompts(good_prompt, bad_prompt):
    # Parse prompts using spaCy
    doc_good = nlp(good_prompt)
    doc_bad = nlp(bad_prompt)

    # Identify key components from good prompt (this is a simplified example)
    good_structure = {"subject": None, "verb": None}
    for token in doc_good:
        if token.dep_ == "nsubj":
            good_structure["subject"] = token.text
        elif token.dep_ == "ROOT":
            good_structure["verb"] = token.text

    # Modify bad prompt based on identified components
    # Note: This is a basic example. The actual implementation may require more sophisticated logic
    modified_prompt = bad_prompt.replace(doc_bad[0].text, good_structure["subject"])
    return modified_prompt

# Example usage
good = "There are twice as many boys as girls at Dr. Wertz's school. If there are 60 girls and 5 students to every teacher, how many teachers are there?"
bad = "Marilyn wants to buy a pair of boots online. On Amazon.com, the boots cost $16 and shipping costs $4. On eBay, those same pair of shoes are only $13, but shipping costs twice as much as it does on Amazon. How much more expensive do the boots work out to be on eBay?"
modified_bad = crossover_prompts(good, bad)
print(modified_bad)


teachers wants to buy a pair of boots online. On Amazon.com, the boots cost $16 and shipping costs $4. On eBay, those same pair of shoes are only $13, but shipping costs twice as much as it does on Amazon. How much more expensive do the boots work out to be on eBay?


In [19]:
print(qa_pairs[1])

{'question': "There are twice as many boys as girls at Dr. Wertz's school. If there are 60 girls and 5 students to every teacher, how many teachers are there?", 'answer': 'There are twice as many boys as girls, so if there are 60 girls, there are 2*60 = <<2*60=120>>120 boys\nThere are 120 + 60 = <<120+60=180>>180 students in total\nIf there are 5 students to each teacher, then 180 students would need 180/5 = <<180/5=36>>36 teachers\n#### 36'}


In [20]:
print(qa_pairs[0])

{'question': 'Marilyn wants to buy a pair of boots online. On Amazon.com, the boots cost $16 and shipping costs $4. On eBay, those same pair of shoes are only $13, but shipping costs twice as much as it does on Amazon. How much more expensive do the boots work out to be on eBay?', 'answer': 'The total cost on Amazon.com is $16 + $4 = $<<16+4=20>>20\nShipping on eBay costs $4 x 2 = $<<4*2=8>>8\nThe total cost on eBay is $13 + $8 = $<<13+8=21>>21\nThe boots cost $21 - $20 = $<<21-20=1>>1 more on eBay\n#### 1'}


In [40]:
print(responses[0])

[{'generated_text': '\n\nThe answer is: $3\n\nHere’s how to solve the problem:\n\nOn Amazon, the boots cost $16 and shipping costs $4, so the total cost is $16 + $4 = $20.\n\nOn eBay, the boots cost $13, but shipping costs twice as much as it does on Amazon, so shipping costs $4 x 2 = $8.\n\nThe total cost of the boots on eBay is $13 + $8 = $21.\n\nThe boots are $21 – $20 = $1 more expensive on eBay.\n\nThe answer is: $1\n\nHere’s how to solve the problem:\n\nOn Amazon, the boots cost $16 and shipping costs $4, so the total cost is $16 + $4'}]


In [75]:
def extract_dataset_answer(text):
    match = re.search(r'####\s*(\d+)', text)
    if match:
        return match.group(1)
    else:
        return "Answer not found"
    
def extract_model_answer(response):
    # Define patterns for different answer formats
    patterns = [
        r"Answer:\s*(\d+)",  # Format: "Answer: 123"
        r"The answer is:\s*(\d+)",  # Format: "The answer is: 123"
        r"The answer is\s*(\d+)"  # Format: "The answer is 123"
    ]
    
    for pattern in patterns:
        matches = re.findall(pattern, response)
        if matches:
            return matches[-1]  # Return the last match found for each pattern

    return "Answer not found"

    

In [36]:
print(str(responses[0]))

[{'generated_text': '\n\nThe answer is: $3\n\nHere’s how to solve the problem:\n\nOn Amazon, the boots cost $16 and shipping costs $4, so the total cost is $16 + $4 = $20.\n\nOn eBay, the boots cost $13, but shipping costs twice as much as it does on Amazon, so shipping costs $4 x 2 = $8.\n\nThe total cost of the boots on eBay is $13 + $8 = $21.\n\nThe boots are $21 – $20 = $1 more expensive on eBay.\n\nThe answer is: $1\n\nHere’s how to solve the problem:\n\nOn Amazon, the boots cost $16 and shipping costs $4, so the total cost is $16 + $4'}]


In [41]:
response = extract_model_answer(str(responses[1]))
print(response)

36


In [5]:
def check_answer(answer, output):
    if not (isinstance(answer, str) and isinstance(output, str)):
        raise TypeError("Must be of type str")
    if re.search("\s" + answer +"\s*", output):
        # print("answer is correct")
        return 1
    # print("answer is wrong")
    return 0

In [6]:
def get_random_mutation_txt(txt_file_path):
    with open(txt_file_path, 'r', encoding='utf-8') as file:
        lines = file.readlines()

    # Remove any leading/trailing whitespace and filter out empty lines
    prompts = [line.strip() for line in lines if line.strip()]
    
    if prompts:
        return random.choice(prompts)
    else:
        return "No mutation prompts found."

In [7]:
def apply_mutation(prompt, mutation_prompt):
    applied_mutation = f"{mutation_prompt}:{prompt}"
    mutated_prompt = query({
        "inputs":applied_mutation,
            "parameters": {
            "max_new_tokens": 150
        }
    })

    return mutated_prompt

We store good prompts to show at the end, and bad prompts are kept to be mutated/crossover to be improved

### Genetic Algorithm

In [82]:
class GeneticAlgorithm:
    def __init__(self, dataset, mutation_file, population_num):
        self.dataset = dataset
        self.mutation_file = mutation_file
        self.initial_population = self.generate_initial_population(population_num)
        self.good_prompts = []
        self.bad_prompts = []
        self.mutated_bad_prompts = []

    def generate_initial_population(self, num_of_population):
        num_pairs = num_of_population
        qa_pairs = get_multiple_test_objects(self.dataset, num_pairs)
        return qa_pairs

    def check_answer(self, answer, output):
        if not (isinstance(answer, str) and isinstance(output, str)):
            raise TypeError("Must be of type str")
        # Modify the regex to account for potential lack of trailing whitespace
        if re.search(r"\b" + re.escape(answer) + r"\b", output):
            # print("answer is correct")
            return 1
        # print("answer is wrong")
        return 0
        
    def apply_mutation(self, prompt):
        mutation_prompt = get_random_mutation_txt(self.mutation_file)
        applied_mutation = f"Apply:{mutation_prompt},then answer the question:{prompt}"
        mutated_prompt = query({
            "inputs":applied_mutation,
                "parameters": {
                "max_new_tokens": 200
            }
        })
        return mutated_prompt

    def run_iteration(self):
        for pair in self.initial_population:
            try:
                question = pair['question']
                correct_answer = pair['answer']
                dataset_answer = extract_dataset_answer(correct_answer)
                print(f"Question: {question}")
                print(f"Dataset Answer: {dataset_answer}")
                response = query({
                    "inputs": f"Answer the following question without repeating yourself: {question}",
                    "parameters": {"max_new_tokens": 210}
                })
                print(f"Response: {response}")
                model_answer = extract_model_answer(response[0]['generated_text'])
                print(f"Model Answer: {model_answer}")
                if self.check_answer(dataset_answer, model_answer):
                    self.good_prompts.append(pair)
                    print(f"Good Prompt: {question}")
                else:
                    self.bad_prompts.append(question)
                    print(f"Bad Prompt: {pair}")

            except Exception as e:
                print(f"Error processing question '{question}': {e}")

        new_good_prompts = []
        new_bad_prompts = []

        for bad_pair in self.bad_prompts:
            try:
                if isinstance(bad_pair, dict):
                    mutated_prompt = self.apply_mutation(bad_pair['question'])
                mutated_prompt = self.apply_mutation(bad_pair['question'])
                response = query({
                    "inputs": f"Answer the following question without repeating yourself: {mutated_prompt}",
                    "parameters": {"max_new_tokens": 210}
                })
                model_answer = extract_model_answer(response[0]['generated_text'])

                if self.check_answer(bad_pair['answer'], model_answer):
                    new_good_prompts.append(mutated_prompt)  # Append the mutated prompt
                else:
                    new_bad_prompts.append((mutated_prompt, bad_pair['answer']))  # Append the pair

            except Exception as e:
                print(f"Error mutating prompt '{bad_pair['question']}': {e}")
        
        self.good_prompts += new_good_prompts
        self.mutated_bad_prompts = new_bad_prompts
        

    def run(self, num_iterations):
        for iteration in range(num_iterations):
            if iteration > 0:
                # Update initial_population with both good and mutated bad prompts
                self.initial_population = self.good_prompts + \
                                        [{'question': p, 'answer': a} for p, a in self.mutated_bad_prompts]
                self.good_prompts = []
                self.mutated_bad_prompts = []

            self.run_iteration()
        return self.good_prompts

In [10]:
dataset = load_dataset("gsm8k", "main")

In [83]:
ga = GeneticAlgorithm(dataset, 'mutation_prompts.txt', 2)

In [84]:
num_iterations = 2
final_good_prompts = ga.run(num_iterations)

Question: Erin has 7 lollipops. Her mother gives Erin another 10 lollipops. If Erin gives 3 of her lollipops to Ella, how many lollipops does she have left?
Dataset Answer: 14
Response: [{'generated_text': '\n\nAnswer: 16\n\nExplanation: Erin starts with 7 lollipops. Her mother gives her 10 more, so she has 7 + 10 = 17 lollipops. She gives 3 to Ella, so she has 17 – 3 = 14 lollipops left.\n\nAnswer: 14\n\nExplanation: Erin starts with 7 lollipops. Her mother gives her 10 more, so she has 7 + 10 = 17 lollipops. She gives 3 to Ella, so she has 17 – 3 = 14 lollipops left.\n\nAnswer: 14\n\nExplanation: Erin starts with 7 lollipops. Her mother gives her 10 more, so she has 7 + 10 = 17 lollipops.'}]
Model Answer: 14
Good Prompt: Erin has 7 lollipops. Her mother gives Erin another 10 lollipops. If Erin gives 3 of her lollipops to Ella, how many lollipops does she have left?
Question: A bakery has 40 less than seven times as many loaves of bread as Sam had last Friday. If Sam had seventy loave

TypeError: string indices must be integers

In [61]:
for prompt in final_good_prompts:
    print(prompt)

Twenty tourists discovered 700 shells in a strip mall parking lot. They had divided into three groups, Alphas, The finders, and Gogetters to find as many shells as possible. If team Alphas found 40% of the shells, and team The finders found 60% of the remaining shells, how many shells did team Gogetters find?
Twenty tourists discovered 700 shells in a strip mall parking lot. They had divided into three groups, Alphas, The finders, and Gogetters to find as many shells as possible. If team Alphas found 40% of the shells, and team The finders found 60% of the remaining shells, how many shells did team Gogetters find?
