<a href="https://colab.research.google.com/github/cobusgreyling/AutomaticPromptEngineering/blob/main/simple_ape_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
#@title Install Dependencies
! pip install git+https://github.com/keirp/automatic_prompt_engineer

Collecting git+https://github.com/keirp/automatic_prompt_engineer
  Cloning https://github.com/keirp/automatic_prompt_engineer to /tmp/pip-req-build-w2fr9h58
  Running command git clone --filter=blob:none --quiet https://github.com/keirp/automatic_prompt_engineer /tmp/pip-req-build-w2fr9h58
  Resolved https://github.com/keirp/automatic_prompt_engineer to commit eac521c79a78965245ce7745dcc9f6b0792c7ec7
  Preparing metadata (setup.py) ... done


In [2]:
pip install openai



In [4]:
import openai
# Placeholder: replace with your own OpenAI API key before running.
openai.api_key = 'xxxxxxxxxxxxxxxxx'

# Optimizing Prompts with **Automatic Prompt Engineer** (APE)

This notebook demonstrates how to use Automatic Prompt Engineer (APE) (arxiv link) to optimize prompts for text generation. In its simplest form, APE takes as input a dataset (a list of inputs and a list of outputs) and a prompt template, and optimizes the prompt inserted into that template so that the model generates the given outputs from the given inputs.

APE accomplishes this in two steps. First, it uses a language model to generate a set of candidate prompts. Then, it uses a prompt evaluation function to score each candidate. It returns the prompt with the highest evaluation score.
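
The generate-then-score loop described above can be sketched in a few lines. This is a minimal illustration, not APE's actual implementation: `rank_prompts` and `toy_scores` are hypothetical names, the candidate list is hard-coded rather than generated by a language model, and the scores are copied from the results printed later in this notebook instead of being computed.

```python
from typing import Callable, List, Tuple


def rank_prompts(
    candidates: List[str],
    score_fn: Callable[[str], float],
) -> List[Tuple[float, str]]:
    """Deduplicate candidates, score each one, and rank them best-first."""
    unique = list(dict.fromkeys(candidates))  # drop duplicates, keep order
    scored = [(score_fn(prompt), prompt) for prompt in unique]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical stand-ins: in APE, both generation and scoring call a language model.
candidates = [
    "give the opposite of the word provided.",
    "make a list of antonyms.",
    "give the opposite of the word provided.",  # duplicate, removed before scoring
]
toy_scores = {
    "give the opposite of the word provided.": -0.24,
    "make a list of antonyms.": -0.63,
}

ranked = rank_prompts(candidates, toy_scores.__getitem__)
best_score, best_prompt = ranked[0]
print(f"{best_score:.2f}: {best_prompt}")
```

Passing the scorer in as a function keeps the ranking logic independent of how a score is obtained, so a real model-based evaluation could replace the lookup table without changing `rank_prompts`.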

In [5]:
# First, let's define a simple dataset consisting of words and their antonyms.
words = ["sane", "direct", "informally", "unpopular", "subtractive", "nonresidential",
    "inexact", "uptown", "incomparable", "powerful", "gaseous", "evenly", "formality",
    "deliberately", "off"]
antonyms = ["insane", "indirect", "formally", "popular", "additive", "residential",
    "exact", "downtown", "comparable", "powerless", "solid", "unevenly", "informality",
    "accidentally", "on"]

In [6]:
# Now, we need to define the format of the prompt that we are using.

eval_template = \
"""Instruction: [PROMPT]
Input: [INPUT]
Output: [OUTPUT]"""
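
To make the role of the placeholders concrete, the sketch below fills the template for one example from the dataset. It assumes the placeholders are substituted by plain string replacement; `fill_template` is a hypothetical helper written for illustration and is not part of the `automatic_prompt_engineer` API.

```python
def fill_template(template: str, prompt: str, input_: str, output: str = "") -> str:
    """Substitute the three placeholders with literal string replacement."""
    return (
        template.replace("[PROMPT]", prompt)
        .replace("[INPUT]", input_)
        .replace("[OUTPUT]", output)
    )


# Same template as in the cell above, repeated so this sketch runs on its own.
eval_template = """Instruction: [PROMPT]
Input: [INPUT]
Output: [OUTPUT]"""

filled = fill_template(
    eval_template,
    prompt="Write an antonym to the following word.",
    input_="sane",
    output="insane",
)
print(filled)
```

Each candidate prompt is evaluated against text of this shape: the instruction and input are fixed, and the question is how well the model predicts the output.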

In [7]:
# Now, let's use APE to find prompts that generate antonyms for each word.
from automatic_prompt_engineer import ape

result, demo_fn = ape.simple_ape(
    dataset=(words, antonyms),
    eval_template=eval_template,
)

Generating prompts...
[GPT_forward] Generating 50 completions, split into 1 batches of size 2000


100%|██████████| 1/1 [00:00<00:00,  1.25it/s]


Model returned 50 prompts. Deduplicating...
Deduplicated to 26 prompts.
Evaluating prompts...


Evaluating prompts: 100%|██████████| 20/20 [00:09<00:00,  2.00it/s]

Finished evaluating.





In [9]:
# Let's see the results.
print(result)

score: prompt
----------------
-0.24:  give the opposite of the word provided.
-0.25:  produce an antonym for each word provided.
-0.27:  produce an antonym (opposite) for each word given.
-0.28:  "change all adjectives to their antonyms."
-0.29:  produce an antonym for each word given.
-0.31:  produce an input-output pair in which the output is the opposite of the input.
-0.33:  use an online thesaurus to find a word with the opposite meaning.
-0.51:  produce the opposite of the input.
-0.63:  make a list of antonyms.
-0.85:  "find the opposite of each word."



Let's compare with a prompt written by a human:

"*Write an antonym to the following word.*"

In [10]:
from automatic_prompt_engineer import ape

manual_prompt = "Write an antonym to the following word."

human_result = ape.simple_eval(
    dataset=(words, antonyms),
    eval_template=eval_template,
    prompts=[manual_prompt],
)

In [11]:
print(human_result)

log(p): prompt
----------------
-0.24: Write an antonym to the following word.
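
The `log(p)` header above suggests that the scores are log-probabilities, so values closer to zero are better. Assuming that reading is correct (whether the value is averaged per token or per example is not shown in this notebook), exponentiating a score gives a rough probability that is easier to compare. The scores below are copied from the two outputs above.

```python
import math

# Scores copied from the printed results; they are rounded to two decimals.
scores = {
    "give the opposite of the word provided.": -0.24,  # best APE prompt
    "find the opposite of each word.": -0.85,          # lowest-ranked APE prompt shown
    "Write an antonym to the following word.": -0.24,  # human-written prompt
}

# Assumption: each score is a log-probability, so exp(score) is a probability.
probs = {prompt: math.exp(log_p) for prompt, log_p in scores.items()}

for prompt, log_p in scores.items():
    print(f"{log_p:.2f} -> {probs[prompt]:.2f}  {prompt}")
```

Under this assumption, the human-written prompt and the best APE prompt both correspond to a probability of about 0.79, while the lowest-ranked prompt shown corresponds to about 0.43. Because the printed scores are rounded, the apparent tie may hide a small difference.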

