<div style="width: 100%; overflow: hidden;">
    <div style="width: 150px; float: left;"> <img src="https://raw.githubusercontent.com/DataForScience/Networks/master/data/D4Sci_logo_ball.png" alt="Data For Science, Inc" align="left" border="0" width=150px> </div>
    <div style="float: left; margin-left: 10px;"> <h1>Generative AI with OpenAI API</h1>
<h1>GPT Models</h1>
        <p>Bruno Gonçalves<br/>
        <a href="http://www.data4sci.com/">www.data4sci.com</a><br/>
            @bgoncalves, @data4sci</p></div>
</div>

In [1]:
from collections import Counter
from pprint import pprint
from datetime import datetime
import json

import pandas as pd
import numpy as np

import matplotlib
import matplotlib.pyplot as plt 

import openai
from openai import OpenAI

import termcolor
from termcolor import colored

import os
import gzip

import tqdm as tq
from tqdm.notebook import tqdm

import watermark

%load_ext watermark
%matplotlib inline

We start by printing the versions of the libraries we're using, for future reference.

In [2]:
%watermark -n -v -m -g -iv

Python implementation: CPython
Python version       : 3.11.7
IPython version      : 8.12.3

Compiler    : Clang 14.0.6 
OS          : Darwin
Release     : 24.3.0
Machine     : arm64
Processor   : arm
CPU cores   : 16
Architecture: 64bit

Git hash: 3a7a9a8b6856eb5855cd2ac76a384e203382ab54

numpy     : 1.26.4
matplotlib: 3.8.0
pandas    : 2.2.3
openai    : 1.30.5
tqdm      : 4.66.4
json      : 2.0.9
termcolor : 2.4.0
watermark : 2.4.3



Load default figure style

In [3]:
plt.style.use('d4sci.mplstyle')
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

# Basic Usage

The first step is to generate an API key on the OpenAI website and store it in the `OPENAI_API_KEY` variable of your local environment. Without it, none of the API calls below will work. You can find your API key in your user settings: https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key

Then we are ready to instantiate the client

In [4]:
client = OpenAI()
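By default, `OpenAI()` picks the key up from the environment, so it can be handy to check for the variable ourselves and fail with a message of our own choosing. A minimal sketch, where `require_api_key` is a hypothetical helper name and not part of the `openai` package:

```python
import os

def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key stored in `env`, or raise a readable error."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook.")
    return key
```

Calling `require_api_key()` just before instantiating the client makes a missing or empty key obvious.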

We start by getting a list of supported models.

In [5]:
model_list = json.loads(client.models.list().model_dump_json())["data"]

In total we have 49 models

In [7]:
len(model_list)

49

Along with some information about each model...

In [8]:
model_list[:5]

[{'id': 'dall-e-3',
  'created': 1698785189,
  'object': 'model',
  'owned_by': 'system'},
 {'id': 'dall-e-2',
  'created': 1698798177,
  'object': 'model',
  'owned_by': 'system'},
 {'id': 'o1-mini-2024-09-12',
  'created': 1725648979,
  'object': 'model',
  'owned_by': 'system'},
 {'id': 'gpt-4o-mini-realtime-preview-2024-12-17',
  'created': 1734112601,
  'object': 'model',
  'owned_by': 'system'},
 {'id': 'o1-preview-2024-09-12',
  'created': 1725648865,
  'object': 'model',
  'owned_by': 'system'}]
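The `created` values look like Unix timestamps in seconds (an assumption based on their magnitude), so they can be converted to dates, and the `id` strings can be grouped into rough families. A small sketch using the five entries shown above; `created_date` and `family` are illustrative helper names:

```python
from collections import Counter
from datetime import datetime, timezone

# Sample entries copied from the output above.
sample_models = [
    {"id": "dall-e-3", "created": 1698785189},
    {"id": "dall-e-2", "created": 1698798177},
    {"id": "o1-mini-2024-09-12", "created": 1725648979},
    {"id": "gpt-4o-mini-realtime-preview-2024-12-17", "created": 1734112601},
    {"id": "o1-preview-2024-09-12", "created": 1725648865},
]

def created_date(model):
    """Convert the Unix timestamp in `created` to a UTC date."""
    return datetime.fromtimestamp(model["created"], tz=timezone.utc).date()

def family(model):
    """Crude family label: the text before the first hyphen in the id."""
    return model["id"].split("-")[0]

release_years = {m["id"]: created_date(m).year for m in sample_models}
family_counts = Counter(family(m) for m in sample_models)
```

The same two helpers should work unchanged on the full `model_list`, since every entry carries the `id` and `created` keys.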

But let's just get a list of model names

In [9]:
print("\n".join(sorted([model["id"] for model in model_list])))

babbage-002
chatgpt-4o-latest
dall-e-2
dall-e-3
davinci-002
gpt-3.5-turbo
gpt-3.5-turbo-0125
gpt-3.5-turbo-1106
gpt-3.5-turbo-16k
gpt-3.5-turbo-16k-0613
gpt-3.5-turbo-instruct
gpt-3.5-turbo-instruct-0914
gpt-4
gpt-4-0125-preview
gpt-4-0613
gpt-4-1106-preview
gpt-4-turbo
gpt-4-turbo-2024-04-09
gpt-4-turbo-preview
gpt-4o
gpt-4o-2024-05-13
gpt-4o-2024-08-06
gpt-4o-2024-11-20
gpt-4o-audio-preview
gpt-4o-audio-preview-2024-10-01
gpt-4o-audio-preview-2024-12-17
gpt-4o-mini
gpt-4o-mini-2024-07-18
gpt-4o-mini-audio-preview
gpt-4o-mini-audio-preview-2024-12-17
gpt-4o-mini-realtime-preview
gpt-4o-mini-realtime-preview-2024-12-17
gpt-4o-realtime-preview
gpt-4o-realtime-preview-2024-10-01
gpt-4o-realtime-preview-2024-12-17
o1-mini
o1-mini-2024-09-12
o1-preview
o1-preview-2024-09-12
omni-moderation-2024-09-26
omni-moderation-latest
text-embedding-3-large
text-embedding-3-small
text-embedding-ada-002
tts-1
tts-1-1106
tts-1-hd
tts-1-hd-1106
whisper-1


## Basic Prompt

The recommended model for exploration is `gpt-3.5-turbo`, so we'll stick with it for now. The basic setup is relatively straightforward:

In [10]:
%%time
response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
        {
            "role": "user", 
            "content": "What was Superman's weakness?"
        },
    ]
)

CPU times: user 11.9 ms, sys: 2.83 ms, total: 14.8 ms
Wall time: 925 ms


Which produces a response object

In [11]:
type(response)

openai.types.chat.chat_completion.ChatCompletion

The response is a Pydantic model, so we can access its fields as attributes, much like a named tuple.

The model's answer can be found in the `message` field of the first entry in the `choices` list

In [12]:
response.choices[0]

Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Superman's weakness is Kryptonite, a mineral from his home planet Krypton that emits radiation harmful to Kryptonians like himself. Exposure to Kryptonite weakens Superman and can even be fatal if he is exposed to a large amount of it.", role='assistant', function_call=None, tool_calls=None, refusal=None))

In [13]:
response.choices[0].message.content

"Superman's weakness is Kryptonite, a mineral from his home planet Krypton that emits radiation harmful to Kryptonians like himself. Exposure to Kryptonite weakens Superman and can even be fatal if he is exposed to a large amount of it."

To request multiple answers, we must include the `n` parameter with the number of answers we want

In [14]:
%%time
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user", 
            "content": "What are the different kinds of Kryptonite?"
        },
    ],
    n=3
)

CPU times: user 9.11 ms, sys: 2.54 ms, total: 11.6 ms
Wall time: 4.32 s


And we can access each of the answers individually in the `choices` list

In [15]:
for output in response.choices:
    print("==========")
    print(output.message.role.title()) 
    print("==========")
    print(output.message.content)
    print("==========\n")

Assistant
There are several different types of Kryptonite in DC Comics, each with unique effects on Superman and other Kryptonians:

1. Green Kryptonite: The most common form of Kryptonite, green Kryptonite is deadly to Kryptonians and weakens them upon exposure. Prolonged exposure can be fatal.

2. Red Kryptonite: Red Kryptonite has unpredictable effects on Kryptonians, causing temporary changes in their personalities, powers, or physical appearance.

3. Blue Kryptonite: Blue Kryptonite has similar effects to green Kryptonite but is specifically harmful to Bizarro, a flawed duplicate of Superman.

4. Gold Kryptonite: Gold Kryptonite permanently removes a Kryptonian's powers, rendering them completely human.

5. Black Kryptonite: Black Kryptonite can split a Kryptonian into two separate beings, one good and one evil, or merge two separate beings into one.

6. White Kryptonite: White Kryptonite is toxic to plant life, causing plants to wither and die upon exposure.

7. X-Kryptonite: X-K
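When only the text of each answer is needed, the loop can be reduced to a small helper. The sketch below assumes nothing beyond the attribute layout visible in the `Choice` object printed earlier (`index`, `message.content`). `collect_answers` is a hypothetical name, and `fake_response` is a stand-in so the helper can be tried without calling the API:

```python
from types import SimpleNamespace

def collect_answers(response):
    """Return the text of every choice in a chat completion, in index order."""
    ordered = sorted(response.choices, key=lambda choice: choice.index)
    return [choice.message.content for choice in ordered]

# Stand-in object with the same attribute layout as a real response.
fake_response = SimpleNamespace(choices=[
    SimpleNamespace(index=1, message=SimpleNamespace(content="second")),
    SimpleNamespace(index=0, message=SimpleNamespace(content="first")),
])
answers = collect_answers(fake_response)
```

With a real response from a request made with `n=3`, `collect_answers(response)` would return a list of three strings.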

In [16]:
response.usage

CompletionUsage(completion_tokens=854, prompt_tokens=17, total_tokens=871, prompt_tokens_details={'cached_tokens': 0, 'audio_tokens': 0}, completion_tokens_details={'reasoning_tokens': 0, 'audio_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0})
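The `usage` field reports how many tokens the request consumed. Here `completion_tokens=854` appears to cover all three answers combined. Token counts can be turned into a rough cost estimate, as sketched below. The prices in the example are placeholders chosen for illustration, not actual OpenAI rates, and `estimate_cost` is a hypothetical helper:

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  usd_per_million_input, usd_per_million_output):
    """Estimate the cost of one request in USD from its token counts."""
    return (prompt_tokens * usd_per_million_input
            + completion_tokens * usd_per_million_output) / 1_000_000

# Token counts from the usage object above; the prices are placeholders,
# NOT actual OpenAI rates.
cost = estimate_cost(17, 854, usd_per_million_input=0.50, usd_per_million_output=1.50)
```

Because completion tokens are counted for every choice, asking for a larger `n` increases the output part of the bill roughly in proportion.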

# Temperature

In [17]:
%%time
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Tell me a dad joke"},
    ],
    temperature=1.8
)

CPU times: user 8.31 ms, sys: 2.53 ms, total: 10.8 ms
Wall time: 583 ms


In [18]:
print(response.choices[0].message.content)

Why couldn't the bicycle stand up by itself?

Because it was two tired. 😝


In [19]:
%%time
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Tell me a dad joke"},
    ],
    temperature=0
)

CPU times: user 9.18 ms, sys: 2.33 ms, total: 11.5 ms
Wall time: 538 ms


In [20]:
print(response.choices[0].message.content)

Why couldn't the bicycle stand up by itself?

Because it was two tired!
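The two calls above differ only in `temperature`, and in this run they happened to return nearly the same joke. The usual interpretation of this parameter is that the model's raw scores (logits) are divided by the temperature before being turned into probabilities: low values concentrate probability on the top candidate, high values flatten the distribution. The sketch below illustrates that idea with made-up scores. It is not a description of OpenAI's internal implementation:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Limit case: all probability mass on the highest-scoring candidate.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
greedy = softmax_with_temperature(toy_logits, 0)
default = softmax_with_temperature(toy_logits, 1.0)
hot = softmax_with_temperature(toy_logits, 1.8)
```

Comparing `default` and `hot` shows the top candidate losing probability to the others as the temperature rises, which is why higher temperatures tend to produce more varied text.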


# Function Calls

In [21]:
def chat(messages, functions):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        # Define the functions the model is allowed to use
        functions=functions
    )
    
    return response

In [22]:
def pretty_print_conversation(messages):
    role_to_color = {
        "system": "red",
        "user": "green",
        "assistant": "blue",
        "function": "magenta",
    }
    
    for message in messages:
#         print(message)
        if message["role"] == "system":
            print(colored(f"system: {message['content']}\n", role_to_color[message['role']]))
        elif message["role"] == "user":
            print(colored(f"user: {message['content']}\n", role_to_color[message['role']]))
        elif message["role"] == "assistant" and message.get("function_call"):
            print(colored(f"assistant: {message['function_call']}\n", role_to_color[message['role']]))
        elif message["role"] == "assistant" and not message.get("function_call"):
            print(colored(f"assistant: {message['content']}\n", role_to_color[message['role']]))
        elif message["role"] == "function":
            # Messages are plain dicts, so use key access rather than attributes
            print(colored(f"function ({message['name']}): {message['content']}\n", role_to_color[message['role']]))


Let's create some function specifications to interface with a hypothetical weather API. We'll pass these function specifications to the Chat Completions API in order to generate function arguments that adhere to them.

In [45]:
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use. Infer this from the users location.",
                },
            },
            "required": ["location", "format"],
        },
    },
]

If we prompt the model about the current weather, it will respond with some clarifying questions.

In [48]:
messages = []

messages.append(
    {"role": "system", 
     "content": "Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous."
    })

messages.append(
    {"role": "user", 
     "content": "What's the weather like today"
    })

In [49]:
chat_response = chat(messages, functions=functions)

assistant_message = chat_response.choices[0].message

messages.append({
 "role":  assistant_message.role,
 "content":  assistant_message.content,
 "function_call":  assistant_message.function_call,
})

In [51]:
pretty_print_conversation(messages)

system: Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous.

user: What's the weather like today

assistant: Sure, I can help with that. Could you please provide me with your current location or the city and state you are in?


Once we provide the missing information, it will generate the appropriate function arguments for us.

In [52]:
messages.append(
    {"role": "user", 
     "content": "I'm in New York, NY."
    })

In [53]:
chat_response = chat(messages, functions=functions)

assistant_message = chat_response.choices[0].message

messages.append({
 "role":  assistant_message.role,
 "content":  assistant_message.content,
 "function_call":  assistant_message.function_call,
})

In [54]:
pretty_print_conversation(messages)

system: Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous.

user: What's the weather like today

assistant: Sure, I can help with that. Could you please provide me with your current location or the city and state you are in?

user: I'm in New York, NY.

assistant: FunctionCall(arguments='{"location":"New York, NY","format":"celsius"}', name='get_current_weather')

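The model only proposes a function name and a JSON string of arguments; running the function is up to us. The sketch below shows one way to dispatch such a call. `get_current_weather` is a hypothetical stand-in that returns canned data rather than querying a real weather service, and `run_function_call` and `AVAILABLE_FUNCTIONS` are illustrative names:

```python
import json

def get_current_weather(location, format):
    """Hypothetical stand-in for a real weather lookup; returns canned data.

    The parameter names mirror the keys of the function specification.
    """
    return {"location": location, "temperature": 22, "unit": format}

# Map the names advertised in `functions` to local Python callables.
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}

def run_function_call(name, arguments):
    """Decode the JSON argument string and call the matching local function."""
    if name not in AVAILABLE_FUNCTIONS:
        raise ValueError(f"Model requested an unknown function: {name}")
    kwargs = json.loads(arguments)
    result = AVAILABLE_FUNCTIONS[name](**kwargs)
    # Shape of the message that would report the result back to the model.
    return {"role": "function", "name": name, "content": json.dumps(result)}

function_message = run_function_call(
    "get_current_weather",
    '{"location":"New York, NY","format":"celsius"}',
)
```

Appending `function_message` to `messages` and calling `chat` again should let the model turn the result into a natural-language reply. Note that newer versions of the API favor the `tools` parameter over `functions`, so this pattern may need adapting.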

# Prompt-Engineering

## Few-shot prompting

We can also provide several examples of mappings between input and output.

In [31]:
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful, pattern-following assistant."},
        {"role": "user", "content": "Help me translate the following corporate jargon into plain English."},
        {"role": "assistant", "content": "Sure, I'd be happy to!"},
        {"role": "user", "content": "New synergies will help drive top-line growth."},
        {"role": "assistant", "content": "Things working well together will increase revenue."},
        {"role": "user", "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage."},
        {"role": "assistant", "content": "Let's talk later when we're less busy about how to do better."},
        {"role": "user", "content": "This late pivot means we don't have time to boil the ocean for the client deliverable."},
    ],
    temperature=0,
)

print(response.choices[0].message.content)

This last-minute change means we can't spend excessive time on the client's project.
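Writing out every example by hand gets tedious as the number of examples grows. A minimal sketch of a helper that assembles the same message structure from a list of (input, output) pairs; `build_few_shot_messages` is a hypothetical name, and the fixed acknowledgement turn simply mirrors the one used above:

```python
def build_few_shot_messages(system_prompt, instruction, examples, query):
    """Assemble a few-shot chat prompt from (input, output) example pairs."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": instruction},
        {"role": "assistant", "content": "Sure, I'd be happy to!"},
    ]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    # The final user turn is the one the model is expected to complete.
    messages.append({"role": "user", "content": query})
    return messages

few_shot = build_few_shot_messages(
    "You are a helpful, pattern-following assistant.",
    "Help me translate the following corporate jargon into plain English.",
    [
        ("New synergies will help drive top-line growth.",
         "Things working well together will increase revenue."),
        ("Let's circle back when we have more bandwidth to touch base on "
         "opportunities for increased leverage.",
         "Let's talk later when we're less busy about how to do better."),
    ],
    "This late pivot means we don't have time to boil the ocean for the client deliverable.",
)
```

The resulting `few_shot` list can be passed directly as the `messages` argument of `client.chat.completions.create`.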


## Formatted output

In [32]:
%%time
userInput = "blueberry pancakes"

prompt = """return a recipe for %s.
        Provide your response as a JSON object with the following schema:
        {"dish": "%s", "ingredients": ["", "", ...],
        "instructions": ["", "", ... ]}""" % (userInput, userInput)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful recipe assistant."},
        {"role": "user", "content": prompt},
    ],
)

CPU times: user 9.19 ms, sys: 2.19 ms, total: 11.4 ms
Wall time: 3.17 s


In [33]:
json_output = response.choices[0].message.content

In [34]:
output = json.loads(json_output)

In [35]:
pprint(output)

{'dish': 'blueberry pancakes',
 'ingredients': ['1 cup all-purpose flour',
                 '2 tbsp sugar',
                 '1 tsp baking powder',
                 '1/2 tsp baking soda',
                 '1/4 tsp salt',
                 '1 cup buttermilk',
                 '1 large egg',
                 '2 tbsp melted butter',
                 '1 cup fresh blueberries',
                 'Butter or oil for cooking'],
 'instructions': ['In a large bowl, whisk together flour, sugar, baking '
                  'powder, baking soda, and salt.',
                  'In a separate bowl, mix buttermilk, egg, and melted butter.',
                  'Pour the wet ingredients into the dry ingredients and stir '
                  "until just combined. Be careful not to overmix; it's okay "
                  'if the batter is a bit lumpy.',
                  'Gently fold in the fresh blueberries.',
                  'Heat a non-stick skillet or griddle over medium heat and '
                  'add a

In [36]:
output["ingredients"]

['1 cup all-purpose flour',
 '2 tbsp sugar',
 '1 tsp baking powder',
 '1/2 tsp baking soda',
 '1/4 tsp salt',
 '1 cup buttermilk',
 '1 large egg',
 '2 tbsp melted butter',
 '1 cup fresh blueberries',
 'Butter or oil for cooking']

In [37]:
output["instructions"]

['In a large bowl, whisk together flour, sugar, baking powder, baking soda, and salt.',
 'In a separate bowl, mix buttermilk, egg, and melted butter.',
 "Pour the wet ingredients into the dry ingredients and stir until just combined. Be careful not to overmix; it's okay if the batter is a bit lumpy.",
 'Gently fold in the fresh blueberries.',
 'Heat a non-stick skillet or griddle over medium heat and add a small amount of butter or oil.',
 'Pour about 1/4 cup of batter onto the skillet for each pancake.',
 'Cook until bubbles form on the surface of the pancake, then flip and cook for another 1-2 minutes until golden brown.',
 'Repeat with the remaining batter, adding more butter or oil to the skillet as needed.',
 'Serve the blueberry pancakes warm with maple syrup, additional blueberries, and a pat of butter.']
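Asking for JSON in the prompt does not guarantee that the reply is valid JSON: the model may wrap it in a Markdown code fence or omit a key. A defensive parsing sketch, assuming the three-key schema requested in the prompt above; `parse_recipe` is a hypothetical helper:

```python
import json

REQUIRED_KEYS = {"dish", "ingredients", "instructions"}
FENCE = "`" * 3  # Markdown code-fence marker

def parse_recipe(text):
    """Parse a model reply into a dict, tolerating a Markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (possibly tagged "json") and the closing fence.
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    recipe = json.loads(cleaned)  # raises ValueError on malformed JSON
    if not isinstance(recipe, dict):
        raise ValueError("Expected a JSON object at the top level")
    missing = REQUIRED_KEYS - set(recipe)
    if missing:
        raise ValueError(f"Reply is missing keys: {sorted(missing)}")
    return recipe
```

Replacing the bare `json.loads(json_output)` call with `parse_recipe(json_output)` would surface a malformed reply as a single `ValueError` that can be caught and retried.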

# Translation

In [38]:
response = client.chat.completions.create(
    model='gpt-3.5-turbo',
    messages=[{"role": "system", "content": "You're a professional English-Italian translator."}, 
              {"role": "user", "content": "Translate 'Be the change that you wish to see in the world.' into Italian"}],
    temperature=0,
)

In [39]:
response.choices[0].message.content

'"Sii il cambiamento che desideri vedere nel mondo."'

# Process unstructured information

Inspired by https://platform.openai.com/examples/default-parse-data

In [40]:
prompt = """There are many fruits that were found on the recently discovered planet Goocrux. 
There are neoskizzles that grow there, which are purple and taste like candy. There are also 
loheckles, which are a grayish blue fruit and are very tart, a little bit like a lemon. Pounits 
are a bright green color and are more savory than sweet. There are also plenty of loopnovas which 
are a neon pink flavor and taste like cotton candy. Finally, there are fruits called glowls, which 
have a very sour and bitter taste which is acidic and caustic, and a pale orange tinge to them."""

In [41]:
response = client.chat.completions.create(
    model='gpt-3.5-turbo',
    messages=[
        {"role": "system", 
         "content": "You will be provided with unstructured data, and your task is to parse it into CSV format."}, 
        {"role": "user", 
         "content": prompt}],
    temperature=0,
)

In [42]:
print(response.choices[0].message.content)

Fruit,Color,Flavor
neoskizzles,Purple,Candy
loheckles,Grayish blue,Tart
pounits,Bright green,Savory
loopnovas,Neon pink,Cotton candy
glowls,Pale orange,Sour and bitter
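Since the reply is plain CSV text, it can be loaded into Python structures for further processing. A minimal sketch using the standard library, with the reply above copied in as `csv_text`; `parse_csv_reply` is a hypothetical helper:

```python
import csv
import io

# The model reply shown above, copied verbatim.
csv_text = """Fruit,Color,Flavor
neoskizzles,Purple,Candy
loheckles,Grayish blue,Tart
pounits,Bright green,Savory
loopnovas,Neon pink,Cotton candy
glowls,Pale orange,Sour and bitter
"""

def parse_csv_reply(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text.strip())))

rows = parse_csv_reply(csv_text)
flavor_by_fruit = {row["Fruit"]: row["Flavor"] for row in rows}
```

With pandas already imported, `pd.read_csv(io.StringIO(csv_text))` would give the same table as a DataFrame.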


In [43]:
response = client.chat.completions.create(
    model='gpt-3.5-turbo',
    messages=[{
        "role": "system", 
        "content": """
            Read this paragraph 
            
            `%s` 
            
            and use it to answer some questions.""" % prompt
                }, 
                {
        "role": "user", 
        "content": "What are pounits?"
    }],
    temperature=0,
)

In [44]:
print(response.choices[0].message.content)

Pounits are bright green fruits found on the planet Goocrux. They are described as more savory than sweet.


<center>
     <img src="https://raw.githubusercontent.com/DataForScience/Networks/master/data/D4Sci_logo_full.png" alt="Data For Science, Inc" align="center" border="0" width=300px> 
</center>