# Lab 5 Description

**Note this notebook should be run in Google Colab**

Lab 5 will give you practice with prompt engineering to solicit an appropriate response from large language models.  


You will use OpenAI's API to answer some questions.  For each question, show code and prompt used to extract the correct response from the indicated OpenAI model.  Use the provided example code below to guide you.

Note: This lab requires you to iteratively refine your prompts so that the total cost of each API call is mimimized.  (ie use the fewest number of tokens in your questioning AND have the model use the fewest tokens in its response for all questions below)

1. Show your prompts and code used to answer the following questions with OpenAI using the 'gpt-3.5-turbo-0125' model.  
  1. What were the top 5 most impactful discoveries in phyics ?
  1. Explain step-by-step how to integrate f(x)=x^2 + x
  1. Show python code to calculate the first 20 digits of Pi, ensure the code is efficient
  1. Calculate the cost of your API calls in the above three prompts
1. Now use the 'gpt-4-1106-preview' to answer the same three questions above
  1. What were the top 5 most impactful discoveries in phyics ?
  1. Explain step-by-step how to integrate f(x)=x^2 + x
  1. Show python code to calculate the first 20 digits of Pi, ensure the code is efficient
  1. Calculate the cost of your API calls in the above three prompts
1. Describe differences in performance between the two models in questions 1 and 2 above
1. Select an OpenAI image generation model of your choice and show your prompt and code to generate the below images.  Expect to iterate through different prompts to before you get to the ideal result.  As you iterate, download and display the created image along with the prompt used in this notebook.  You will need to download the generated image locally and then upload into your Colab instance and use your favourite image display tool
  1. An image of cars in a race
  1. An image of a character who is happily assembling a chair using a hammer and nail
1. Compare how easy or hard it was to get the model to generate the intended image
1. Calculate the total cost of your API calls for image generation
1. Comment on what differences you see in cost to generate responses for Q1, Q2, Q4

Note bonus marks for those who had the lowest cost in each question (3 bonuses to be given out total)

===============

# Setting up with OpenAI
1. Create an [OpenAI account](https://platform.openai.com/signup)
1. Sign up for Open API pay-as-you-go (ChatGPT Plus monthly subscription is optional, not needed for course)
1. [Create OpenAI API Key](https://platform.openai.com/account/api-keys)
1. copy the key and save to text file `openai.txt`
1. Go to Google Drive and create `API_Keys` folder under `My Drive/Colab Notebooks`
1. Upload file `openai.txt` to `My Drive/Colab Notebooks/API_Keys`


# Large Language Model Concepts

1. Costs of using GenAI tools
  1. Computation Costs - building the model
  1. Computation Costs - API calls to a model
1. Estimating cost of API calls
  1. [OpenAI pricing details](https://openai.com/pricing)
  1. [Tokens](https://platform.openai.com/tokenizer)


# References

[Get up and running with the OpenAI API](https://platform.openai.com/docs/quickstart?context=python)

[Open AI Costs and Best Practices](https://platform.openai.com/docs/guides/production-best-practices)

[Interactive API Playground](https://platform.openai.com/playground?mode=chat)

# Install libraries

In [None]:
try:
  import openai
  import tiktoken
except ModuleNotFoundError:
  !pip install openai # install openAI library to the notebook instance
  !pip install tiktoken # install tokenizer library
finally:
  import openai
  import tiktoken

Collecting openai
  Downloading openai-1.13.3-py3-none-any.whl (227 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m227.4/227.4 kB[0m [31m1.0 MB/s[0m eta [36m0:00:00[0m
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m75.6/75.6 kB[0m [31m6.3 MB/s[0m eta [36m0:00:00[0m
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.4-py3-none-any.whl (77 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m77.8/77.8 kB[0m [31m7.8 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m58.3/58.3 kB[0m [31m5.3 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: h11, httpcore, httpx, openai
Successfully installed h11-0.14.0 httpcore-1.0.4 ht

# Load Libraries

In [None]:
import matplotlib.pyplot as plt

import openai
import tiktoken


import pandas as pd
import os
from PIL import Image

import json
import requests

import seaborn as sns
import numpy as np
import re
from scipy.io.arff import loadarff

os.getcwd()

'/content'

# Setup Google Drive access to API keys

In [None]:
import google.colab.auth
google.colab.auth.authenticate_user()

In [None]:
#@markdown connect to drive for API keys stored in <br> `My Drive/Colab Notebooks/API_Keys`
import os, sys
from google.colab import drive
drive.mount('/content/mnt')
nb_path = '/content/notebooks'
os.symlink('/content/mnt/My Drive/Colab Notebooks', nb_path)
sys.path.insert(0, nb_path)  # or append(nb_path)


Mounted at /content/mnt


In [None]:
!ls -al mnt
# !ls -al notebooks/API_Keys/openai.txt

total 20
dr-x------  2 root root 4096 Mar  5 21:39 .file-revisions-by-id
drwx------ 22 root root 4096 Mar  5 21:39 MyDrive
drwx------  2 root root 4096 Mar  5 21:39 Othercomputers
dr-x------  2 root root 4096 Mar  5 21:39 .shortcut-targets-by-id
drwx------  5 root root 4096 Mar  5 21:39 .Trash-0


# Setup Colab and Kaggle API Keys

In [None]:
# this cell is for using Colab with Kaggle
# 1) create and save your kaggle API key as a json file on your Google Drive Colab Notebooks folder here:  `Colab Notebooks/API_Keys/kaggle.json`
# 2) mount google drive and copy API key file into ~/.kaggle/kaggle.json

# import os, sys
# from google.colab import drive
# drive.mount('/content/mnt')
# nb_path = '/content/notebooks'
# os.symlink('/content/mnt/My Drive/Colab Notebooks', nb_path)
# sys.path.insert(0, nb_path)  # or append(nb_path)

!mkdir ~/.kaggle
!cp /content/notebooks/API_Keys/kaggle.json ~/.kaggle/
!ls -al ~/.kaggle/

total 16
drwxr-xr-x 2 root root 4096 Mar  5 21:39 .
drwx------ 1 root root 4096 Mar  5 21:39 ..
-rw------- 1 root root   63 Mar  5 21:39 kaggle.json


# Setup OpenAPI key

In [None]:
#@markdown read api key file to: 1) set environment variable<br> 2) set api key for openai library
my_api_key_path = '/content/notebooks/API_Keys/openai.txt'
with open(my_api_key_path,'r') as f:
  openai_key = f.readline()

os.environ['OPENAI_API_KEY'] = openai_key

from openai import OpenAI
client = OpenAI(
  api_key=os.environ['OPENAI_API_KEY'],  # this is also the default, it can be omitted
)

In [None]:
#@markdown show list of models accessible with our API key
model_list = client.models.list()
print([m['id'] for m in model_list.dict()['data'] if 'gpt' in m['id']])


['gpt-3.5-turbo-16k-0613', 'gpt-4-vision-preview', 'gpt-3.5-turbo-0125', 'gpt-4-0613', 'gpt-4', 'gpt-3.5-turbo', 'gpt-3.5-turbo-0613', 'gpt-3.5-turbo-0301', 'gpt-3.5-turbo-instruct-0914', 'gpt-4-1106-preview', 'gpt-3.5-turbo-instruct', 'gpt-4-turbo-preview', 'gpt-3.5-turbo-1106', 'gpt-3.5-turbo-16k', 'gpt-4-0125-preview']


# Q1

In [29]:
# your code here

# Q1

In [None]:
# your code here

# Q2

In [None]:
# your code here

# Q3

In [None]:
# your code here

# Q4

In [None]:
# your code here

# Reference materials below
[Chat Completion API Reference](https://platform.openai.com/docs/api-reference/chat)

In [None]:
completion = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."},
    {"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."}
  ]
)

print(completion.choices[0].message)

{
  "role": "assistant",
  "content": "In the realms of code, where logic unfurls,\nThere lies a concept that captures all pearls.\nRecursion, dear friend, let me be your guide,\nTo unravel the beauty you can't deny.\n\nImagine a function, calling itself anew,\nLike an echo in time, it rings through and through.\nA task at hand, its duty to fulfill,\nBut within its heart, a secret, oh so still.\n\nWith every call, a problem's broken down,\nInto smaller parts, like a puzzle, unbound.\nBound by a condition, a base case, you see,\nTo end the cycle, unlocking destiny.\n\nA function, like a mirror, reflects its own face,\nAs it spirals deeper, in infinite space.\nThrough layers and layers, it travels with grace,\nUntil the base case reveals its true place.\n\nBut remember, my friend, it's a dance to take care,\nFor without a base, the cycle ensnares.\nInto the depths of chaos, it will persist,\nWith no end in sight, it will cease to assist.\n\nYet when used with wisdom, in a programmer's ha

In [None]:
completion['usage']['prompt_tokens'], completion['usage']['completion_tokens'], completion['usage']['total_tokens']

(39, 337, 376)

# Tokenization

Human readable text is made usable for LLMs by first transforming it into tokens

OpenAI uses the open-source `tiktoken` tokenizer

In [None]:
def tokenize_text(text, use_encoding="cl100k_base", show_tokenize_info=False):

  encoding = tiktoken.get_encoding(use_encoding)
  tokenized_text = encoding.encode(text)
  if show_tokenize_info:
    text_tokens = [ enc.decode_single_token_bytes(token).decode('utf-8') for token in tokenized_text]
    print(f"the text: {text}\n is represented by {len(tokenized_text)} tokens: {text_tokens}")
  return tokenized_text

In [None]:

enc = tiktoken.get_encoding("cl100k_base") # choose tokenizer to use (cl100k_base, p50k_base, r50k_base)
assert enc.decode(enc.encode("hello world")) == "hello world"

# To get the tokeniser corresponding to a specific model in the OpenAI API:
enc = tiktoken.encoding_for_model("gpt-4")

In [None]:
text_to_llm = "ecommerce platforms make shopping easier"

tokenized_text = tokenize_text(text_to_llm, show_tokenize_info=True)


the text: ecommerce platforms make shopping easier
 is represented by 6 tokens: ['ecom', 'merce', ' platforms', ' make', ' shopping', ' easier']


## Example output token count from [Shop.ai](shop.ai)
Prompt:
```
recommend a database schema to support an ecommerce platform
```

Output:
```
To support an ecommerce platform, a common database schema includes the following tables:

Users: This table stores information about the users of the platform, such as their name, email, password, and other relevant details.

Products: This table contains information about the products available for sale, including attributes like product name, description, price, stock quantity, and any other relevant details.

Categories: This table stores the different categories or types of products available. Each product can be associated with one or more categories.

Orders: This table tracks the orders placed by users. It includes information such as the user who placed the order, the products ordered, quantity, total price, shipping address, and payment details.

Order Items: This table stores the individual items within an order. It includes information such as the order ID, product ID, quantity, and price at the time of purchase.

Reviews: This table allows users to leave reviews for products. It includes information such as the user who left the review, the product being reviewed, the rating, and the review text.

Cart: This table tracks the items added to a user's shopping cart before they proceed to checkout. It includes information such as the user who added the item, the product ID, quantity, and any other relevant details.

These are just some of the essential tables in an ecommerce database schema. Depending on the specific requirements of your ecommerce platform, you may need to add additional tables or modify the existing ones.
```

In [None]:
prompt = "recommend a database schema to support an ecommerce platform"
prompt_output = """
To support an ecommerce platform, a common database schema includes the following tables:

Users: This table stores information about the users of the platform, such as their name, email, password, and other relevant details.

Products: This table contains information about the products available for sale, including attributes like product name, description, price, stock quantity, and any other relevant details.

Categories: This table stores the different categories or types of products available. Each product can be associated with one or more categories.

Orders: This table tracks the orders placed by users. It includes information such as the user who placed the order, the products ordered, quantity, total price, shipping address, and payment details.

Order Items: This table stores the individual items within an order. It includes information such as the order ID, product ID, quantity, and price at the time of purchase.

Reviews: This table allows users to leave reviews for products. It includes information such as the user who left the review, the product being reviewed, the rating, and the review text.

Cart: This table tracks the items added to a user's shopping cart before they proceed to checkout. It includes information such as the user who added the item, the product ID, quantity, and any other relevant details.

These are just some of the essential tables in an ecommerce database schema. Depending on the specific requirements of your ecommerce platform, you may need to add additional tables or modify the existing ones.
"""

In [None]:
encoding = tiktoken.get_encoding("cl100k_base")
tokenized_text = encoding.encode(prompt_output)
print(f"the prompt output: {prompt_output}\n is represented by {len(tokenized_text)} tokens")


the prompt output: 
To support an ecommerce platform, a common database schema includes the following tables:

Users: This table stores information about the users of the platform, such as their name, email, password, and other relevant details.

Products: This table contains information about the products available for sale, including attributes like product name, description, price, stock quantity, and any other relevant details.

Categories: This table stores the different categories or types of products available. Each product can be associated with one or more categories.

Orders: This table tracks the orders placed by users. It includes information such as the user who placed the order, the products ordered, quantity, total price, shipping address, and payment details.

Order Items: This table stores the individual items within an order. It includes information such as the order ID, product ID, quantity, and price at the time of purchase.

Reviews: This table allows users to leav

In [None]:
encoding = tiktoken.get_encoding("cl100k_base")
tokenized_text = encoding.encode(prompt)
print(f"the prompt output: {prompt}\n is represented by {len(tokenized_text)} tokens")


the prompt output: recommend a database schema to support an ecommerce platform
 is represented by 9 tokens


In [None]:
print(prompt)
print(tokenized_text)

recommend a database schema to support an ecommerce platform
[67689, 264, 4729, 11036, 311, 1862, 459, 85243, 5452]


In [None]:
len(prompt), len(prompt_output)

(60, 1559)

In [None]:
input_text="ecommerce"
encoding = tiktoken.get_encoding("cl100k_base")
tokenized_text = encoding.encode(input_text)
print(f"the prompt output: {input_text}\n is represented by {len(tokenized_text)} tokens")


the prompt output: ecommerce
 is represented by 2 tokens


# Test Chat Completion API Request

[Chat Completion API Reference](https://platform.openai.com/docs/api-reference/chat)

In [None]:
completion = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."},
    {"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."}
  ]
)

print(completion.choices[0].message)

{
  "role": "assistant",
  "content": "In the realms of code, where logic unfurls,\nThere lies a concept that captures all pearls.\nRecursion, dear friend, let me be your guide,\nTo unravel the beauty you can't deny.\n\nImagine a function, calling itself anew,\nLike an echo in time, it rings through and through.\nA task at hand, its duty to fulfill,\nBut within its heart, a secret, oh so still.\n\nWith every call, a problem's broken down,\nInto smaller parts, like a puzzle, unbound.\nBound by a condition, a base case, you see,\nTo end the cycle, unlocking destiny.\n\nA function, like a mirror, reflects its own face,\nAs it spirals deeper, in infinite space.\nThrough layers and layers, it travels with grace,\nUntil the base case reveals its true place.\n\nBut remember, my friend, it's a dance to take care,\nFor without a base, the cycle ensnares.\nInto the depths of chaos, it will persist,\nWith no end in sight, it will cease to assist.\n\nYet when used with wisdom, in a programmer's ha

In [None]:
completion['usage']['prompt_tokens'], completion['usage']['completion_tokens'], completion['usage']['total_tokens']

(39, 337, 376)

# Test Image Generation API Request

[Image Generation API Reference](https://platform.openai.com/docs/api-reference/images)

In [None]:
image_filename = 'generated_image.png'

In [None]:
response = openai.Image.create(
  prompt="A cute baby sea otter",
  n=2,
  size="1024x1024"
)
print(response)


{
  "created": 1698179865,
  "data": [
    {
      "url": "https://oaidalleapiprodscus.blob.core.windows.net/private/org-hArYDQ8I4jiqkKUTPPHRimVz/user-BadAFzKOV0j53ennI8rm2gjA/img-lpNPbyDloKJb3CYERylyW2qw.png?st=2023-10-24T19%3A37%3A45Z&se=2023-10-24T21%3A37%3A45Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-10-24T17%3A26%3A17Z&ske=2023-10-25T17%3A26%3A17Z&sks=b&skv=2021-08-06&sig=JwJ3mbOlMWs5KTjNMIze2pofzwThDJTPr9Q4GuQLvOA%3D"
    },
    {
      "url": "https://oaidalleapiprodscus.blob.core.windows.net/private/org-hArYDQ8I4jiqkKUTPPHRimVz/user-BadAFzKOV0j53ennI8rm2gjA/img-wg0zOQLVtyLLoiqgsqxJnnn5.png?st=2023-10-24T19%3A37%3A44Z&se=2023-10-24T21%3A37%3A44Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-10-24T17%3A26%3A17Z&ske=2023-10-25T17%3A26%3A17Z&sks=b&skv=2021-08-06&sig=tt3adUNLO4f7rN

In [None]:
!curl "{response['data'][0]['url']}" --output {image_filename}

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3074k  100 3074k    0     0  3531k      0 --:--:-- --:--:-- --:--:-- 3529k


# Test Embeddings API Request

Convert text to a vector representation

[Embeddings API Reference](https://platform.openai.com/docs/api-reference/embeddings)

In [None]:
response = openai.Embedding.create(
  model="text-embedding-ada-002",
  input="The food was delicious and the waiter..."
)
# print(response)

In [None]:
show_num_elements = 5
print(f"first {show_num_elements} elements of embedding {response['data'][0]['embedding'][:show_num_elements]}")
print(f"length of embedding {len(response['data'][0]['embedding'])}")


first 5 elements of embedding [0.002253931947052479, -0.00933318305760622, 0.015745779499411583, -0.007790350820869207, -0.004711035173386335]
length of embedding 1536


In [None]:
response['model'], response['usage']['prompt_tokens'], response['usage']['total_tokens']

('text-embedding-ada-002-v2', 8, 8)

In [None]:
input_prompt = "show me running shoes"
response = openai.Embedding.create(
  model="text-embedding-ada-002",
  input=input_prompt
)
embedding1 = response['data'][0]['embedding']

print(response['usage']['prompt_tokens'], response['usage']['total_tokens'], embedding1)

4 4 [-0.020932648330926895, -0.023470353335142136, -0.014101416803896427, -0.011886067688465118, -0.007194740232080221, 0.008635059930384159, -0.015335976146161556, -0.020658301189541817, -0.010006792843341827, -0.013827069662511349, 0.01838122494518757, 0.004060330335050821, 0.003648810088634491, -0.019163111224770546, 0.0089985691010952, -0.006354553624987602, 0.008772233501076698, -0.00536347646266222, 0.023854440078139305, -0.03056221455335617, 0.0012671385193243623, 0.0056241056881845, 0.0049691032618284225, -0.029053308069705963, 0.01262680348008871, -0.00814809463918209, 0.015143933705985546, -0.014348329044878483, -0.009938206523656845, -0.05478702113032341, 4.5867327571613714e-05, 0.006388847250491381, -0.017777660861611366, -0.03176933899521828, -0.0031909942626953125, -0.0039060101844370365, -0.00023876730119809508, 0.006388847250491381, 0.004464991390705109, -0.01537712849676609, 0.0046021644957363605, 0.01746216230094433, -0.004382687620818615, -0.011844915337860584, -0.00

In [None]:
input_prompt = "show me sneakers"
response = openai.Embedding.create(
  model="text-embedding-ada-002",
  input=input_prompt
)
embedding2 = response['data'][0]['embedding']

print(response['usage']['prompt_tokens'], response['usage']['total_tokens'], embedding2)

3 3 [-0.006860113702714443, -0.025054888799786568, -0.004522903822362423, -0.013043242506682873, 0.0009622856741771102, 0.0071373553946614265, -0.02527410350739956, 0.001403937698341906, -0.015938159078359604, -0.03605428338050842, 0.02341723069548607, -0.00016914548177737743, -0.00986463762819767, -0.015796314924955368, 0.01584789529442787, -0.017253443598747253, 0.026112275198101997, -0.0025354695972055197, 0.03497110679745674, -0.03267580643296242, -0.0022453332785516977, -0.014983932487666607, -0.0022098722402006388, -0.03058682382106781, 0.00821408350020647, -0.014971037395298481, 0.004755013156682253, -0.01767897792160511, 0.0011814997997134924, -0.050264518707990646, 0.010096746496856213, 0.013062585145235062, -0.01940690167248249, -0.038865379989147186, -0.013023899868130684, -0.006334644742310047, -0.004961332306265831, 0.001576407696120441, -0.008562247268855572, -0.010328855365514755, 0.00872988160699606, 0.010502937249839306, -0.0044261920265853405, -0.004045790992677212, 0