# Prompt Testing

This notebook showcases my work on finding the best prompt for generating the content in the [`gen_course.ipynb`](gen_course.ipynb) file. The process is iterative: I start by copying the prompts from the first part of the pipeline, which can be found in the [Fetching Data](../FetchingData/) repository.

## Techniques

The techniques I want to try out in the beginning are the following:

- Prompt Chaining
- Meta Prompting
- Tree of Thought Prompting
- Chain of Thought Prompting

I'm thinking of combining multiple techniques to be more efficient. This applies especially to **Meta Prompting**, as it will help me generate many prompts in a short amount of time. All generated and manually created prompts can be found in the [prompts](./prompts) folder, paired with their inputs and results.

In [1]:
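# Dependencies of the pipeline
import os
from openai import OpenAI
from github import Github
from github import Auth
from dotenv import load_dotenv
from pydantic import BaseModel
import pandas as pd

# Load environment variables (e.g. GITHUB_TOKEN, OPENAI_API_KEY) from a local .env file, if present
load_dotenv()

github_token = os.getenv('GITHUB_TOKEN')
# Authenticate with the token when available; otherwise fall back to anonymous access
github_obj = Github(auth=Auth.Token(github_token)) if github_token else Github()
client = OpenAI()

# Collects one dict per tested prompt (input, prompt, and generated output)
theory_lesson_prompts = []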

In [2]:
def viewPromptsAsText(df, file_name):
    """
    Generate a .txt file to visualize the text and tasks created by each prompt.
    The input is an entire pandas DataFrame. Each row is written to the .txt
    file in the following format:
    1. Input
    2. Prompt
    3. Output of the prompt, with each task displayed sequentially
    """

    file_path = os.path.join("prompts", file_name)
    with open(file_path, "w", encoding="utf-8") as f:
        for _, row in df.iterrows():
            # Positional access; assumes column 1 holds the input and column 2 the prompt
            input_txt = f"INPUT\n\n{row.iloc[1]}\n\n"
            prompt_txt = f"PROMPT\n\n{row.iloc[2]}\n\n"
            output_title = "OUTPUT\n\n"
            end_txt = "----------------\n\n"

            f.write(input_txt + prompt_txt + output_title)
            # The remaining columns hold the generated output
            for j in range(3, len(row)):
                f.write(f"{row.iloc[j]}\n\n")

            f.write(end_txt)

## Prompts for Theory Lessons

In this section, I will showcase my work on creating and selecting a good prompt to generate the content for lessons of type **theory** in the pipeline. The output of the prompts will be of the Lesson type.

In [6]:
class TheoryTaskContent(BaseModel):
    title: str
    description: str
    additional_readings: list[str]

class TheoryLessonContent(BaseModel):
    title: str
    tasks: list[TheoryTaskContent]

### Copy-Pasted from the First Half of the Pipeline

This prompt follows the same style as the prompts used in the first half of the pipeline. One issue that has occurred is that only the first task has a title, while the rest have none.
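A missing title like this can be caught automatically by scanning the tasks of a generated lesson. The following is a minimal sketch, assuming the parsed response has been converted to a plain dictionary (for example with Pydantic's `model_dump()`). The helper name `find_untitled_tasks` and the example lesson are hypothetical and not part of the pipeline.

```python
def find_untitled_tasks(lesson: dict) -> list[int]:
    """Return the 1-based positions of tasks whose title is missing or blank."""
    untitled = []
    for position, task in enumerate(lesson.get("tasks", []), start=1):
        title = task.get("title") or ""
        if not title.strip():
            untitled.append(position)
    return untitled


# Hypothetical lesson mirroring the TheoryLessonContent shape.
example_lesson = {
    "title": "Example lesson",
    "tasks": [
        {"title": "First concept", "description": "...", "additional_readings": []},
        {"title": "", "description": "...", "additional_readings": []},
        {"description": "...", "additional_readings": []},
    ],
}

print(find_untitled_tasks(example_lesson))  # [2, 3]
```

A non-empty result could be used to flag a prompt as unreliable before its output is stored.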

In [13]:
input_data = {
    "issue": "https://github.com/mattermost/mattermost/issues/29338",
    "pr": "https://github.com/mattermost/mattermost/pull/29341"
}
prompt = """You are an instructor creating programming exercises from closed GitHub issues.

            Input:
            $ISSUE, a github url linking to the GitHub Issue
            $PR, a link to the pull request solving the said issue.

            Output: A lesson teaching the student the necessary theoretical background to solve the issue.
            Each task of the lesson must be concise and relevant.
            Each task should only explain ONE concept.
            For each task, provide extra reading for the student in form of links.
            Make sure that the links work.
            Each task should NOT divulge any concrete solutions to solve the issue.
            The format of the output should be as follows:
            {
                "title": the title of the lesson
                "tasks":  a list of each task
            }
            The elements of the "tasks" field should be in the following format:
            {
                "title": the title of the task
                "description": the content of the task, explaining the theoretical concept
                "additional_readings": the list of urls of the additional readings as a list of strings
            }

            You will now be given a pair of ($ISSUE, $PR). Generate the output following the instructions as closely as possible.
"""
# Single quotes inside the f-string keep this valid on Python versions before 3.12
input = f"$ISSUE = {input_data['issue']}, $PR = {input_data['pr']}"

output = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    response_format=TheoryLessonContent,
    messages = [{"role": "system", "content": prompt}, {"role": "user", "content": input}]
)

print(output.choices[0].message.parsed)

ParsedChatCompletion[TheoryLessonContent](id='chatcmpl-BsAC9LTCPvuLepv6o7L4zVGDXHpx1', choices=[ParsedChoice[TheoryLessonContent](finish_reason='stop', index=0, logprobs=None, message=ParsedChatCompletionMessage[TheoryLessonContent](content='{\n  "title": "Understanding API Pagination in Web Applications",\n  "tasks": [\n    {\n      "title": "What is API Pagination?",\n      "description": "API pagination is a technique used to divide a large dataset into smaller, manageable chunks, or \'pages\'. Each page contains a limited number of results, which helps in improving performance and user experience by reducing load times. When working with RESTful APIs, pagination is necessary to retrieve large datasets in multiple requests.",\n      "additional_readings": [\n        "https://www.restapitutorial.com/lessons/advanced.html#pagination",\n        "https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#pagination"\n      ]\n    },\n    {\n      "title": "Common Pagination Strategies",\n

In [40]:
response = output.choices[0].message.parsed

# Flatten the lesson into one row: input, prompt, title, then one column pair per task
parsed_output = {"input": input, "prompt": prompt, "title": response.title}
for index, task in enumerate(response.tasks, start=1):
    task_key = f"task{index}"
    parsed_output[task_key] = task.description
    parsed_output[f"{task_key}_links"] = task.additional_readings

theory_lesson_prompts.append(parsed_output)
theory_lesson_Df = pd.DataFrame(theory_lesson_prompts)
theory_lesson_Df.to_csv(path_or_buf="prompts/theory_prompts.csv")

In [38]:
theory_lesson_Df.head()

                                               input  \
0  $ISSUE = \n$https://github.com/mattermost/matt...   
1  $ISSUE = \n$https://github.com/mattermost/matt...   
2  $ISSUE = \n$https://github.com/mattermost/matt...   

                                              prompt  \
0  You are an instructor creating programming exe...   
1  You are an instructor creating programming exe...   
2  You are an instructor creating programming exe...   

                                              title  \
0  Understanding API Pagination in Web Applications   
1  Understanding API Pagination in Web Applications   
2  Understanding API Pagination in Web Applications   

                             What is API Pagination?  \
0  API pagination is a technique used to divide a...   
1                                                NaN   
2                                                NaN   

                                         task1_links  \
0  [https://www.re

### Prompt Chaining

In this part, I'll try to chain different prompts together to get a more refined and correct output. This method could be extended to accept user input as well, making the course generation more responsive to the instructor's wishes.
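To make the idea concrete, here is a minimal sketch of a prompt chain in which each step's output is inserted into the next prompt. The function `run_prompt_chain`, the `{previous}` placeholder, and the two templates are assumptions for illustration, and `fake_model` stands in for a real model call so the example runs offline.

```python
from typing import Callable


def run_prompt_chain(
    templates: list[str],
    first_input: str,
    call_model: Callable[[str], str],
) -> list[str]:
    """Run prompts in sequence, feeding each output into the next template."""
    outputs = []
    current = first_input
    for template in templates:
        # Plain replacement avoids problems with braces in model output.
        prompt = template.replace("{previous}", current)
        current = call_model(prompt)
        outputs.append(current)
    return outputs


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call (hypothetical)."""
    return f"<answer to: {prompt}>"


chain = [
    "List the concepts needed to understand: {previous}",
    "Write one short task per concept in: {previous}",
]
results = run_prompt_chain(chain, "issue #29338", fake_model)
print(results[-1])
```

Because every intermediate output is kept in `results`, an instructor could review or edit a step's result before it is passed on, which is one way the user input mentioned above might be supported.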

### Meta Prompting

In this part, I will give the AI the role of creating and reviewing the prompts.
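As a rough illustration of the creation half, the sketch below asks a model to write several candidate prompts for a goal and splits the reply into a list. `META_TEMPLATE`, `generate_candidate_prompts`, and the canned reply of `fake_model` are all hypothetical; a real run would replace `fake_model` with an actual model call.

```python
from typing import Callable

# Hypothetical meta prompt: the model is asked to write prompts, not lessons.
META_TEMPLATE = (
    "You are a prompt engineer. Write {n} distinct prompts for this goal:\n"
    "{goal}\n"
    "Return one prompt per line, with no numbering."
)


def generate_candidate_prompts(
    goal: str, n: int, call_model: Callable[[str], str]
) -> list[str]:
    """Ask the model for candidate prompts and return at most n non-empty lines."""
    meta_prompt = META_TEMPLATE.format(n=n, goal=goal)
    raw = call_model(meta_prompt)
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    return lines[:n]


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call, returning a canned reply (hypothetical)."""
    return (
        "Explain one concept per task.\n"
        "\n"
        "List prerequisite topics before writing tasks.\n"
    )


candidates = generate_candidate_prompts(
    "Generate a theory lesson from a GitHub issue", 2, fake_model
)
for candidate in candidates:
    print(candidate)
```

The reviewing half could follow the same pattern, with a second call that receives each candidate and returns feedback or a score.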

## Prompts for the Introduction Lessons

## Prompts for the Debrief Lesson