<a href="https://colab.research.google.com/github/mshumer/gpt-prompt-engineer/blob/main/gpt_prompt_engineer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Auto Prompt Engineering 
Original Notebook By Matt Shumer (https://twitter.com/mattshumer_)

Original Github repo: https://github.com/mshumer/gpt-prompt-engineer

In this notebook, we walk through how to automatically generate and select a strong prompt/instruction for a given task.

## Installations

In [None]:
!pip install -r ../requirements.txt -qq

## Import Libraries

In [None]:
from datasets import load_dataset, Dataset
import pandas as pd
from ast import literal_eval
import sys
from pathlib import Path
import os

In [None]:
path_to_src_code = '../'
sys.path.append(path_to_src_code)

In [None]:
from src.utils.functions import *
from src.run_autoprompt import run_autoprompts

## Import Example Dataset

We will load an example dataset from the datasets library that includes a question, a ground truth, an answer, and contexts for each row. Make sure that the contexts field is a list of the contexts retrieved for the specific question. Answer "Yes" when prompted to run the dataset's custom code.

In [None]:
amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")
data = amnesty_qa['eval']

In [None]:
data.to_pandas().head()
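
The shape requirement described above can be checked before the data reaches Ragas. The following is a minimal sketch, not part of the repository: `check_rows` and `is_list_of_strings` are hypothetical helpers, and the column names in `REQUIRED_COLUMNS` are assumptions that should be adjusted to match the columns shown by `head()`.

```python
from typing import Iterable, Mapping

# Column names are assumptions about the dataset; adjust them to match yours.
REQUIRED_COLUMNS = ("question", "ground_truth", "answer", "contexts")


def is_list_of_strings(value) -> bool:
    """True for a non-string iterable whose items are all strings."""
    if isinstance(value, (str, bytes)) or not hasattr(value, "__iter__"):
        return False
    return all(isinstance(item, str) for item in value)


def check_rows(rows: Iterable[Mapping], required=REQUIRED_COLUMNS) -> list:
    """Hypothetical helper: return a list of problems, empty if none were found."""
    problems = []
    for i, row in enumerate(rows):
        for column in required:
            if column not in row:
                problems.append(f"row {i}: missing column {column!r}")
        if "contexts" in row and not is_list_of_strings(row["contexts"]):
            problems.append(f"row {i}: contexts is not a list of strings")
    return problems


# Made-up rows for illustration only.
good_row = {
    "question": "q",
    "ground_truth": "g",
    "answer": "a",
    "contexts": ["context 1", "context 2"],
}
bad_row = {"question": "q", "answer": "a", "contexts": "one long string"}
problems = check_rows([good_row, bad_row])
```

In the notebook, `check_rows(data)` should work directly, since iterating over a `datasets.Dataset` yields one dict per row.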

### Ragas

This section shows how to run the Ragas library on the input dataset. The approach comes directly from the Ragas library: https://github.com/explodinggradients/ragas

In [None]:
import boto3  # needed to create the Bedrock runtime client below
from botocore.client import Config
from ragas import evaluate
from ragas.metrics import faithfulness, answer_correctness, context_precision, answer_similarity
from langchain_aws import ChatBedrock
from langchain_community.embeddings import BedrockEmbeddings

First, initialize the Bedrock chat model and the Bedrock embeddings (the models Ragas uses to compute the evaluation metrics).

In [None]:
config = {
    "region_name": "us-east-1",  # e.g. "us-east-1"
    "model_id": "anthropic.claude-3-haiku-20240307-v1:0",  # e.g. "anthropic.claude-v2"
    "model_kwargs": {
        "max_tokens": 1000,
        "temperature": 0.4,
        "stop_sequences": ["Question"],
    },
}
# Generous timeouts and a small retry budget, applied to every Bedrock call
bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={"max_attempts": 2})
bedrock_client = boto3.client(
    "bedrock-runtime", region_name=config["region_name"], config=bedrock_config
)

bedrock_model = ChatBedrock(region_name=config['region_name'], model_id=config['model_id'],
                            client=bedrock_client, model_kwargs=config['model_kwargs'])
# init the embeddings
bedrock_embeddings = BedrockEmbeddings(
    region_name=config["region_name"],
)

In this example, we will generate the faithfulness, answer correctness, context precision, and answer similarity metrics. Some metrics may fail to parse the model output for certain rows. That is acceptable here, since the goal is only to showcase running Ragas on a sample dataset. If necessary, experiment with different Bedrock models to get better outputs.

In [None]:
score = evaluate(
    data,
    metrics=[faithfulness, answer_correctness, context_precision, answer_similarity],
    llm=bedrock_model,
    embeddings=bedrock_embeddings,
)

In [None]:
score.to_pandas().head()

In [None]:
score.to_pandas().to_csv("../data/ragas_output.csv", index=False)
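
To see how many rows were affected by the parsing failures mentioned above, the scores can be summarized per metric. This is a small sketch under the assumption that a failed row appears as NaN in the metric column; `summarize_metrics` is a hypothetical helper and the example values are made up.

```python
import pandas as pd


def summarize_metrics(scores: pd.DataFrame, metric_columns: list) -> pd.DataFrame:
    """Per metric: number of missing scores and the mean over rows that parsed."""
    present = [c for c in metric_columns if c in scores.columns]
    return pd.DataFrame({
        "failed_rows": scores[present].isna().sum(),
        "mean_of_parsed": scores[present].mean(skipna=True),
    })


# Made-up scores standing in for score.to_pandas().
example = pd.DataFrame({
    "faithfulness": [1.0, float("nan"), 0.5],
    "answer_correctness": [0.5, 0.75, 1.0],
})
summary = summarize_metrics(example, ["faithfulness", "answer_correctness"])
```

In the notebook, the same call could be made with `score.to_pandas()` and the names of the four metrics used above.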

## Automatic Prompt Generation

Automatic prompt generation uses instruction templates stored in the instruction_generation_templates folder. To use additional templates, add the names of their .txt files from that folder to the list below.

In [None]:
files = [
    "one-paragraph_instruction_with_demo.txt", 
    "one-sentence_instruction_with_demo.txt", 
    "step-by-step_instruction_with_demo.txt"
]
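
Rather than typing the template names by hand, the list can be built from the folder contents. The sketch below assumes only that the templates are .txt files sitting directly inside one directory; `list_template_files` is a hypothetical helper, and the demo uses a temporary directory with placeholder files because the folder's actual location is not shown here.

```python
import tempfile
from pathlib import Path


def list_template_files(template_dir: str) -> list:
    """Return the sorted names of all .txt files directly inside template_dir."""
    return sorted(p.name for p in Path(template_dir).glob("*.txt"))


# Demo on a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b_template.txt", "a_template.txt", "notes.md"):
        (Path(tmp) / name).write_text("placeholder")
    demo_files = list_template_files(tmp)
```

Passing the path of the instruction_generation_templates folder would return every template name, which can then be trimmed to the ones you want to try.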

Run the following cell to generate the instructions. For a sample dataset of size 20 with 10 candidate instructions generated for each sample (NUMBER_OF_PROMPTS=5 in ../src/utils/config.py), it takes about 3 minutes to run. Adjust NUMBER_OF_PROMPTS as needed.

In [None]:
%%time
# Use the CSV file saved from the Ragas step above as the input to autoprompt.
run_autoprompts(True, files, path_to_ragas_outputs="../data/ragas_output.csv");

The output of the autoprompt run above is stored in the data folder of this repository as 'new_prompts.csv'. In addition, the list of prompts is written to the data folder as 'prompt_ids.csv'. Below are the best three instructions for each question in the dataset.

In [None]:
new_prompts = pd.read_csv('../data/new_prompts.csv')

In [None]:
new_prompts.head()
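
If you want to re-derive a "best three per question" selection yourself, one common pandas pattern is to sort by score and keep the first rows of each group. This is only a sketch: the layout of new_prompts.csv is not shown here, so the column names `question`, `prompt_id`, and `score` are hypothetical, the data is made up, and this is not necessarily how `run_autoprompts` ranks instructions.

```python
import pandas as pd


def top_k_per_question(df: pd.DataFrame, question_col: str = "question",
                       score_col: str = "score", k: int = 3) -> pd.DataFrame:
    """Keep the k highest-scoring rows for each question."""
    ranked = df.sort_values(score_col, ascending=False)
    return ranked.groupby(question_col, sort=False).head(k).reset_index(drop=True)


# Made-up candidates: four for q1, two for q2.
candidates = pd.DataFrame({
    "question": ["q1", "q1", "q1", "q1", "q2", "q2"],
    "prompt_id": ["p1", "p2", "p3", "p4", "p1", "p2"],
    "score": [0.2, 0.9, 0.5, 0.7, 0.6, 0.8],
})
top = top_k_per_question(candidates)
q1_ids = set(top.loc[top["question"] == "q1", "prompt_id"])
```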

Below are the prompts/instructions mapped to their IDs.

In [None]:
prompt_ids = pd.read_csv('../data/prompt_ids.csv')

In [None]:
prompt_ids.head()