In [1]:
import os
import openai
import pandas as pd
import random
from pprint import pprint

In [2]:
openai.api_key = os.getenv("OPENAI_API_KEY")

In [3]:
items = pd.read_csv('../data/item_list.txt',sep="\t")

The item list is a list of everything that The Dram Shop has had on tap since inception. Run the next cell a few times to get a sense for the chaos of their naming conventions.

In [4]:
items.sample(10)

Unnamed: 0,Item
1840,16 Pineapple Sculpin IPA - Ballast Point
3081,AG Rosé Cider - Anthem
8877,"25G (12oz pour) Ginger, Lemon & Hibiscus Hard ..."
4254,23P Hoegaarden - Whitbier
7774,10P Amora IPA - Überbrew
3828,Cirasa Nero d'Avola
6353,F13P Fresh Squeezed IPA - Deschutes
10858,13C Grazing Clouds Hazy IPA - Mountains Walking
2917,25S Sour Brown - Teton
9716,09M Last Best Pale Ale - Blackfoot


In [5]:
items.iloc[531]

Item    16 Go To Session IPA - Stone
Name: 531, dtype: object

## Few Shot Learning

In "few-shot" learning, we include a handful of worked examples in the prompt for the model to follow. The idea is to "seed" the model with input/output pairs that show the format and the judgment calls we want. The performance improvements can be large, as you can see in this [article](https://cameronrwolfe.substack.com/p/practical-prompt-engineering-part#%C2%A7few-shot-learning). In the code below, I'd like you to modify the (not-great) prompts I'm giving you to incorporate what you did in the zero-shot learning exercise.
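
As an alternative to packing every example into one user message (the approach this notebook takes), few-shot examples can also be sent as alternating `user`/`assistant` turns. The following is a minimal sketch, assuming a hypothetical helper `build_few_shot_messages` and hypothetical hand-labeled pairs. It only builds the message list and makes no API call.

```python
def build_few_shot_messages(system_prompt, examples, new_item):
    """Build a chat message list with few-shot examples as prior turns.

    `examples` is a list of (raw_item, cleaned_name) pairs.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for raw, clean in examples:
        # Each example becomes a question (user) and its ideal answer (assistant).
        messages.append({"role": "user", "content": raw})
        messages.append({"role": "assistant", "content": clean})
    # The item we actually want cleaned goes last.
    messages.append({"role": "user", "content": new_item})
    return messages


# Hypothetical hand-labeled pairs, for illustration only.
example_pairs = [
    ("17G 2 x Thor Double IPA - Melvin", "2 x Thor Double IPA"),
    ("16 Go To Session IPA - Stone", "Go To Session IPA"),
]

messages = build_few_shot_messages(
    "You are a world-class Cicerone, trained in beers throughout the world.",
    example_pairs,
    "10P Amora IPA - Überbrew",
)
```

Whether this works better than a single-message table for this data is something to test rather than assume.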

In [6]:
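# Incorporate the prompt engineering you did in Zero Shot Learning.ipynb into
# the prompts below. Note that I'm making use of the fact that
# Python concatenates adjacent string literals placed within parentheses.
# The stub says "eight examples"; update that number to match however many
# examples you end up supplying.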
 
system_prompt = """You are a world-class Cicerone, trained in beers throughout the world."""

user_prompt_stub = (
  "I have access to the sales data from a growler fill station. Our data is messy. "
  "This item description contains the name of a beer, but "
  "it also contains other information like the tap number, "
  "the brewery, and maybe other things. I want your help "
  "extracting the beer names.\n\n"
  "I'm going to give you eight examples of what I'm looking for. "
  "After these examples, I'll give you an item description. Return "
  "a line with the same formatting as the examples. The first column "
  "is the raw item, the second is the cleaned beer name.\n\n"
)
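
The parenthesized form works because Python joins adjacent string literals into one string. A small sketch of that behavior, using made-up strings, with a reminder that nothing is inserted between the pieces, so each one needs its own trailing space or `\n`:

```python
demo = (
    "first piece, "
    "second piece.\n"
    "third piece"
)
print(demo)

# Adjacent literals are joined with nothing in between, so a forgotten
# trailing space silently glues two words together.
missing_space = ("no trailing space" "here")
print(missing_space)
```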


We'll look at this prompt in two ways. `pprint` wraps the string across several lines and shows the newline characters as `\n`, so it's
easier to check the structure. `print` renders the newlines and shows each paragraph as one long line, which is closer to
how the text looks when you send it to ChatGPT.

In [7]:
pprint(user_prompt_stub)

('I have access to the sales data from a growler fill station. Our data is '
 'messy. This item description contains the name of a beer, but it also '
 'contains other information like the tap number, the brewery, and maybe other '
 'things. I want your help extracting the beer names.\n'
 '\n'
 "I'm going to give you eight examples of what I'm looking for. After these "
 "examples, I'll give you an item description. Return a line with the same "
 'formatting as the examples. The first column is the raw item, the second is '
 'the cleaned beer name.\n'
 '\n')


In [8]:
print(user_prompt_stub)

I have access to the sales data from a growler fill station. Our data is messy. This item description contains the name of a beer, but it also contains other information like the tap number, the brewery, and maybe other things. I want your help extracting the beer names.

I'm going to give you eight examples of what I'm looking for. After these examples, I'll give you an item description. Return a line with the same formatting as the examples. The first column is the raw item, the second is the cleaned beer name.




Now we'll make some training examples. Run the cell below until you get at least 10 items for which you can work out the beer name.

In [9]:
items.sample(10)

Unnamed: 0,Item
751,36 Rickshaw Pinot Noir
3212,7M Superfuzz - Elysian
8326,19P Hard Wired NITRO Cof. Porter - Left hand
8565,25M Twisted Karma - Mountains Walking
5276,02 Fonsainte Gris de Gris Magnum
1091,9 Punkuccino - Elysian
25,29 Hops - Anthem
8768,Jauma - Danby Grenache - 2020
4210,Oyster River Dry Cider
3054,14S Dark Theory - Odell


In [10]:
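seed = 20231020 # fix the seed so we get the same results every time
# pandas draws from NumPy's random state, not Python's `random` module,
# so the seed has to be passed to sample() through random_state.
# Feel free to modify the seed to get an interesting list of items.
num_items = 10
random_items = items.sample(num_items, random_state=seed)['Item'].tolist()
random_items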

['11P Katabatic Airwaves and Evergreens IPA',
 'Les Lunes Astral Blend',
 'Meinklang - Burgenland Osterreich Red 2019',
 '01 STEIN!',
 '21P Oops! All Beer - Gild',
 '01G Odell Oktoberfest',
 '23S Not The Stoic - Deschutes',
 '8M Nelson Pale - Butte Brewing',
 '06P Yard Sale Amber - Tamarackw',
 '20 Stone Bitter Chocolate STout']

Now, I'm going to set up a place for you to put your training data. Once you've filled this in, you'll be able to make the few-shot learner. 

In [None]:
raw_items = random_items # store these so we can refer to them. 

clean_items = ["",
"",
"",
"",
"",
"",
"",
"",
"",
""]



# Fill in the list above with your cleaned items, in the same order as raw_items.
# For instance, if the item was '17G 2 x Thor Double IPA - Melvin', you
# might put '2 x Thor Double IPA' in the list. Feel free to shrink raw_items
# so that it contains only beers; if you do, shrink clean_items to match.
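
Before building the prompt, it can help to confirm that the two lists line up. A minimal sketch, assuming a hypothetical helper `check_examples`; the short lists below are stand-ins, not the notebook's data.

```python
def check_examples(raw_items, clean_items):
    """Return a list of problems found in the example lists (empty if none)."""
    problems = []
    if len(raw_items) != len(clean_items):
        problems.append(
            f"length mismatch: {len(raw_items)} raw vs {len(clean_items)} clean"
        )
    for idx, (raw, clean) in enumerate(zip(raw_items, clean_items)):
        if not clean.strip():
            problems.append(f"example {idx} ({raw!r}) has no cleaned name")
        elif clean not in raw:
            # Not necessarily wrong (you may have fixed a typo), but worth a look.
            problems.append(f"example {idx}: {clean!r} is not a substring of {raw!r}")
    return problems


# Stand-in data: the second cleaned name was left blank on purpose.
demo_raw = ["17G 2 x Thor Double IPA - Melvin", "01G Odell Oktoberfest"]
demo_clean = ["2 x Thor Double IPA", ""]

for problem in check_examples(demo_raw, demo_clean):
    print(problem)
```

The substring check is only a heuristic: a cleaned name that corrects a typo in the raw item will be flagged even though it is fine.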
 

Now we're ready for our examples. We'll add them on to `user_prompt_stub`.

In [None]:
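# Note: this appends to user_prompt_stub in place. Re-run the cell that defines
# the stub before re-running this one, or the examples will be duplicated.
for raw_item, clean_item in zip(raw_items, clean_items):
    user_prompt_stub += f"| {raw_item} | {clean_item} |\n"

user_prompt_stub += "Here is the new item: "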
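
Because the cell above appends to `user_prompt_stub` in place, running it twice duplicates the examples. One way around that is to build the prompt in a function that leaves the stub untouched. A sketch, assuming a hypothetical helper `build_prompt` that uses the same `| raw | clean |` row format; the stub and items below are placeholders.

```python
def build_prompt(stub, raw_items, clean_items, new_item):
    """Return the full few-shot prompt without modifying `stub`."""
    rows = [f"| {raw} | {clean} |" for raw, clean in zip(raw_items, clean_items)]
    return stub + "\n".join(rows) + "\nHere is the new item: " + f"| {new_item} |"


# Placeholder stub and examples, for illustration only.
demo_stub = "Examples:\n"
prompt = build_prompt(
    demo_stub,
    ["17G 2 x Thor Double IPA - Melvin"],
    ["2 x Thor Double IPA"],
    "01 STEIN!",
)
print(prompt)
```

Since strings are immutable and the function only returns a new one, it can be called once per item inside the test loop without the prompt growing between calls.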

In [None]:
pprint(user_prompt_stub)

Now we're ready to append the new random items and test the efficacy of this method. 

In [None]:
num_items = 10

random_items = items.sample(num_items)

total_tokens = 0 

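for this_item in random_items["Item"]:
    # Format the new item like the first column of the example rows.
    user_prompt = user_prompt_stub + f"| {this_item} |"

    # openai.ChatCompletion is the pre-1.0 interface of the openai package;
    # it was removed in openai 1.0, so this cell needs openai<1.0.
    chat_response = openai.ChatCompletion.create(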
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0,
        max_tokens=50,
    )

    print("-"*30)
    print(f"The item was {this_item}.")
    print("----- The full response is on the next line. -----")
    print(chat_response.choices[0].message.content)

    total_tokens += chat_response.usage["total_tokens"]


# Rough estimate: applies a flat rate of $0.0015 per 1,000 tokens to every token.
print(f"\n\nThis cost ${total_tokens/1000*0.0015:.4f}.")
print(f"If you did this 1000 times it'd be ${total_tokens*0.0015:.2f}.")
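
The prompt asks the model to return a line in the same `| raw | clean |` format as the examples, so the cleaned name still has to be pulled out of the response text. A minimal sketch, assuming a hypothetical helper `parse_clean_name` and that the reply really does follow that format; it returns `None` when it does not.

```python
def parse_clean_name(response_text):
    """Extract the second column from a '| raw | clean |' line, or None."""
    for line in response_text.splitlines():
        # Drop the outer pipes, then split the remaining text into cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[1]:
            return cells[1]
    return None


# Hypothetical replies, for illustration only.
good_reply = "| 01G Odell Oktoberfest | Odell Oktoberfest |"
bad_reply = "Sorry, I can't tell which part is the beer name."

print(parse_clean_name(good_reply))
print(parse_clean_name(bad_reply))
```

Counting how often the helper returns `None` gives a quick measure of how reliably the model sticks to the requested format.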

Again, tweak the prompt a bit, then run 50 items through it and see how it does. Note that you can keep your training examples exactly the same or add more examples to the prompt.