In [None]:
import os
import openai
import pandas as pd
from pprint import pprint

In [None]:
openai.api_key = os.getenv("OPENAI_API_KEY")

In [None]:
items = pd.read_csv('../data/item_list.txt', sep="\t")

The item list is a list of everything that The Dram Shop has had on tap since inception. Run the next cell a few times to get a sense for the chaos of their naming conventions.

In [None]:
items.sample(10)

In [None]:
items.iloc[531]

## Few Shot Learning

In few-shot learning, we give the model some examples to follow. The idea is to "seed" the prompt with worked examples that the model can imitate. The performance improvements can be large, as you can see in this [article](https://cameronrwolfe.substack.com/p/practical-prompt-engineering-part#%C2%A7few-shot-learning). In the code below, I'd like you to modify the (not-great) prompts I'm giving you to incorporate what you did in the zero-shot learning exercise.
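
To make the contrast with zero-shot concrete, here is a minimal sketch. The instruction text, the helper names `zero_shot_prompt` and `few_shot_prompt`, and the "Example Pale Ale" item are made up for illustration; the one example pair is the Thor Double IPA item that appears later in this notebook.

```python
# Hypothetical instruction text, shortened for illustration.
INSTRUCTIONS = "Extract the beer name from this item description."


def zero_shot_prompt(item):
    """Zero-shot: instructions plus the new item, with no worked examples."""
    return f"{INSTRUCTIONS}\n\nItem: {item}\nBeer name:"


def few_shot_prompt(item, examples):
    """Few-shot: the same instructions, plus (raw, clean) example pairs."""
    shots = "".join(
        f"Item: {raw}\nBeer name: {clean}\n\n" for raw, clean in examples
    )
    return f"{INSTRUCTIONS}\n\n{shots}Item: {item}\nBeer name:"


examples = [("17G 2 x Thor Double IPA - Melvin", "2 x Thor Double IPA")]
new_item = "04 Example Pale Ale - Example Brewing"  # made-up item

print(few_shot_prompt(new_item, examples))
```

The only difference between the two prompts is the block of solved examples placed ahead of the new item.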

In [None]:
# Incorporate the prompt engineering you did in Zero Shot Learning.ipynb into
# the prompts below. Note that Python joins adjacent string literals, so a
# long string can be split across lines inside parentheses.
 
system_prompt = """You are a world-class Cicerone, trained in beers throughout the world."""

user_prompt_stub = (
  "I have access to the sales data from a growler fill station. Our data is messy. "
  "This item description contains the name of a beer, but "
  "it also contains other information like the tap number, "
  "the brewery, and maybe other things. I want your help "
  "extracting the beer names.\n\n"
  "I'm going to give you several examples of what I'm looking for. "
  "After these examples, I'll give you an item description. Return "
  "a line with the same formatting as the examples. The first column "
  "is the raw item, the second is the cleaned beer name.\n\n"
)
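
The cell above relies on Python's implicit concatenation of adjacent string literals. A small sketch (the variable names and sentences are illustrative) shows the behavior and the most common pitfall, a missing space at the end of a piece:

```python
# Adjacent string literals are joined into one string; the parentheses
# only let the expression span several lines.
glued = (
    "Our data is messy. "
    "Help me clean it.\n\n"
)
assert glued == "Our data is messy. Help me clean it.\n\n"

# Nothing is inserted between the pieces, so a forgotten trailing
# space runs two sentences together.
missing_space = (
    "Our data is messy."
    "Help me clean it."
)
print(repr(missing_space))
```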


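We'll look at this prompt in two ways. `pprint` breaks the long string into shorter quoted chunks and shows each line break as a literal `\n`, so it's
easier to read in the notebook. `print` renders the line breaks and leaves each paragraph as one long line, which is closer to
how the prompt is sent to ChatGPT.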
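
A short string makes the difference easy to see. This sketch uses `pformat`, which returns the text that `pprint` would display; the sample string is made up.

```python
from pprint import pformat

text = "First paragraph.\n\nSecond paragraph.\n"

# pformat returns what pprint would show: a quoted Python literal in
# which each newline appears as the two characters backslash and n.
as_literal = pformat(text)

print(as_literal)  # escaped view, handy for spotting stray whitespace
print(text)        # rendered view, as the model receives it
```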

In [None]:
pprint(user_prompt_stub)

In [None]:
print(user_prompt_stub)

Now we'll make some training examples. Run the cell below until you get at least 10 items whose beer names you can confidently extract.

In [None]:
items.sample(10)

In [None]:
num_items = 10

random_items = items.sample(num_items)['Item'].tolist()

random_items

Now, I'm going to set up a place for you to put your training data. Once you've filled this in, you'll be able to make the few-shot learner. 

In [None]:
raw_items = random_items # store these so we can refer to them. 

clean_items = [""] * len(raw_items)

# fill in the above list with your cleaned items. 
# For instance, if the item was '17G 2 x Thor Double IPA - Melvin', you 
# might put '2 x Thor Double IPA' in the list. Feel free to shrink raw_items
# so that it only has beer in it.
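
Before building the prompt, it may help to confirm that every raw item has a cleaned counterpart, since a blank entry would show the model an example with no answer. The helper `check_examples` and the second demo item are hypothetical, not part of the original notebook.

```python
def check_examples(raw, clean):
    """Return a list of problems found in the paired example lists."""
    problems = []
    if len(raw) != len(clean):
        problems.append(f"{len(raw)} raw items but {len(clean)} cleaned items")
    for position, (r, c) in enumerate(zip(raw, clean)):
        if not c.strip():
            problems.append(f"item {position} ({r!r}) has no cleaned name yet")
    return problems


# Hypothetical data: one finished pair and one still blank.
demo_raw = [
    "17G 2 x Thor Double IPA - Melvin",
    "04 Example Pale Ale - Example Brewing",
]
demo_clean = ["2 x Thor Double IPA", ""]

for problem in check_examples(demo_raw, demo_clean):
    print(problem)
```

Calling `check_examples(raw_items, clean_items)` after filling in the list should return an empty list when the training data is ready.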
 

Now we're ready for our examples. We'll add them on to `user_prompt_stub`.

In [None]:
for raw, clean in zip(raw_items, clean_items):
    user_prompt_stub += f"| {raw} | {clean} |\n"


user_prompt_stub += "Here is the new item: "
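
One caveat with `+=` in a notebook: rerunning the cell appends the examples a second time. A possible alternative, sketched here with hypothetical helper names (`format_row`, `build_prompt`, `parse_row`), assembles the full prompt from its parts on each call and leaves the stub untouched. `parse_row` is the inverse step, pulling the cleaned name out of a reply that follows the same two-column format.

```python
def format_row(raw, clean):
    """Format one example in the two-column layout used by the prompt."""
    return f"| {raw} | {clean} |\n"


def build_prompt(stub, raw_items, clean_items, new_item):
    """Assemble the prompt from its parts without modifying `stub`."""
    rows = "".join(format_row(r, c) for r, c in zip(raw_items, clean_items))
    return f"{stub}{rows}Here is the new item: | {new_item} |"


def parse_row(line):
    """Return the last non-empty cell of a '| raw | clean |' row, or None."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    cells = [cell for cell in cells if cell]
    return cells[-1] if cells else None


stub = "Examples follow.\n\n"  # stand-in for user_prompt_stub
prompt = build_prompt(
    stub,
    ["17G 2 x Thor Double IPA - Melvin"],
    ["2 x Thor Double IPA"],
    "04 Example Pale Ale - Example Brewing",  # made-up item
)
print(prompt)
print(parse_row("| 17G 2 x Thor Double IPA - Melvin | 2 x Thor Double IPA |"))
```

Because `build_prompt` returns a new string each time, the cell can be rerun safely. Note that `parse_row` simply takes the last cell, so a reply that echoes only the raw item would be returned unchanged.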

In [None]:
pprint(user_prompt_stub)

Now we're ready to append the new random items and test the efficacy of this method. 

In [None]:
num_items = 10

random_items = items.sample(num_items)

total_tokens = 0 

for this_item in random_items["Item"]:
    user_prompt = user_prompt_stub + f" |{this_item}|"

    chat_response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0,
        max_tokens=50,
    )

    print("-"*30)
    print(f"The item was {this_item}.")
    print("----- The full response is on the next line. -----")
    print(chat_response.choices[0].message.content)

    total_tokens += chat_response.usage["total_tokens"]


# Rough estimate: applies a flat rate of $0.0015 per 1,000 tokens to every token.
print(f"\n\nThis cost ${total_tokens/1000*0.0015:.4f}.")
print(f"If you did this 1000 times it'd be ${total_tokens*0.0015:.2f}")
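
The loop above uses `openai.ChatCompletion.create`, the interface from versions of the `openai` package before 1.0. With version 1.0 or later, the same request goes through a client object. The sketch below assumes that newer interface; the function names `extract_beer_name` and `estimate_cost` are hypothetical, and the price is passed in as an argument because rates change.

```python
def extract_beer_name(client, system_prompt, user_prompt, model="gpt-3.5-turbo"):
    """Send one prompt and return (reply_text, total_tokens).

    `client` is assumed to be an `openai.OpenAI()` instance (openai >= 1.0).
    """
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        temperature=0,
        max_tokens=50,
    )
    return response.choices[0].message.content, response.usage.total_tokens


def estimate_cost(total_tokens, usd_per_1k_tokens):
    """Convert a token count to dollars at a flat per-1,000-token rate."""
    return total_tokens / 1000 * usd_per_1k_tokens
```

With the package installed, `from openai import OpenAI` followed by `client = OpenAI()` creates a client that reads `OPENAI_API_KEY` from the environment, and `extract_beer_name(client, system_prompt, user_prompt)` would stand in for the call inside the loop. `estimate_cost(total_tokens, 0.0015)` reproduces the arithmetic in the cell above.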