Using GPT-3.5 and GPT-4 via the OpenAI API in Python

When Should You Use the API Rather Than the Web Interface?

The ChatGPT web application is a great interface to GPT models. However, if you want to incorporate AI into a data pipeline or into software, the API is more appropriate. Some possible use cases for data practitioners include:

Pulling in data from a database or another API, then asking GPT to summarize it or generate a report about it
Embedding GPT functionality in a dashboard to automatically provide a text summary of the results
Providing a natural language interface to your data mart
Performing research by pulling in journal papers with the scholarly package (available on PyPI and Conda) and getting GPT to summarize the results
Set Up an OpenAI Developer Account

To use the API, you need to create a developer account with OpenAI. You'll need to have your email address, phone number, and debit or credit card details handy.

Follow these steps:

Go to the API signup page.
Create your account (you'll need to provide your email address and your phone number).
Go to the API keys page.
Create a new secret key.
Copy this key and store it somewhere safe; OpenAI displays it only once. (If you lose it, delete the key and create a new one.)
Go to the Payment Methods page.
Click 'Add payment method' and fill in your card details.
Securely Store Your Account Credentials

The secret key needs to be kept secret! Otherwise, other people can use it to access the API, and you will be billed for their usage. The following steps describe how to securely store your key using DataCamp Workspace. If you are using a different platform, check that platform's documentation. You can also ask ChatGPT for advice. Here's a suggested prompt:

> You are an IT security expert. You are speaking to a data scientist. Explain the best practices for securely storing private keys used for API access.

In your workspace, click on Integrations
Click on the "Create integration" plus button
Select an "Environment Variables" integration
In the "Name" field, type "OPENAI". In the "Value" field, paste in your secret key
Click "Create", and connect the new integration
Setting up Python

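To use GPT via the API, you need to import the os module (part of the Python standard library) and the openai package. The code in this tutorial uses the interface from openai versions before 1.0 (openai.ChatCompletion.create), which was removed in version 1.0. To run the code as written, install an older version, for example with pip install "openai<1.0".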

If you are using a Jupyter Notebook (like DataCamp Workspace), it's also helpful to import some functions from IPython.display.

One example also uses the yfinance package to retrieve stock prices.

In [None]:
# Import the os package
import os

# Import the openai package
import openai

# From the IPython.display package, import display and Markdown
from IPython.display import display, Markdown

# Import yfinance as yf
import yfinance as yf

In [None]:
# Set openai.api_key to the OPENAI environment variable
openai.api_key = os.environ["OPENAI"]
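If the OPENAI environment variable has not been created, os.environ["OPENAI"] fails with a bare KeyError that doesn't say what to do next. A minimal sketch of a friendlier check follows. The helper name get_api_key is hypothetical (it is not part of the openai package), and the default variable name OPENAI matches the integration created earlier.

```python
import os


def get_api_key(var_name="OPENAI"):
    """Return the API key stored in an environment variable.

    get_api_key is a hypothetical helper, not part of the openai package.
    """
    key = os.environ.get(var_name)
    if not key:
        # Missing or empty: explain the fix instead of raising a bare KeyError
        raise RuntimeError(
            f"Environment variable {var_name!r} is not set; "
            "create it before calling the API."
        )
    return key
```

You would then write openai.api_key = get_api_key(), which keeps the key itself out of the notebook.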

In [None]:
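# General form of a chat completion request: replace the placeholder strings
response = openai.ChatCompletion.create(
    model="MODEL_NAME",  # for example "gpt-3.5-turbo" or "gpt-4"
    messages=[
        {"role": "system", "content": 'SPECIFY HOW THE AI ASSISTANT SHOULD BEHAVE'},
        {"role": "user", "content": 'SPECIFY WHAT YOU WANT THE AI ASSISTANT TO SAY'},
    ],
)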
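In the template above, messages is a list of dictionaries, each holding a role and a content string. A typo in a role name only surfaces as an API error after a network round trip, so a quick local check can help. The sketch below is a hypothetical helper (check_messages is not part of the openai package), and it only accepts the three roles used in this tutorial; the API itself may accept others.

```python
# Roles used in this tutorial (the API may accept others)
VALID_ROLES = {"system", "user", "assistant"}


def check_messages(messages):
    """Raise ValueError if `messages` is not a non-empty list of role/content dicts.

    check_messages is a hypothetical helper for catching mistakes before an API call.
    """
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict) or not {"role", "content"} <= set(msg):
            raise ValueError(f"message {i} needs 'role' and 'content' keys")
        if msg["role"] not in VALID_ROLES:
            raise ValueError(f"message {i} has unknown role {msg['role']!r}")
        if not isinstance(msg["content"], str):
            raise ValueError(f"message {i} content must be a string")
    return messages


ok = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize last month's sales."},
]
check_messages(ok)  # passes silently and returns the list
```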

In [None]:
# Define the system message
system_msg = 'You are a helpful assistant who understands data science.'

# Define the user message
user_msg = 'Create a small dataset about total sales over the last year. The format of the dataset should be a data frame with 12 rows and 2 columns. The columns should be called "month" and "total_sales_usd". The "month" column should contain the shortened forms of month names from "Jan" to "Dec". The "total_sales_usd" column should contain random numeric values taken from a normal distribution with mean 100000 and standard deviation 5000. Provide Python code to generate the dataset, then provide the output in the format of a markdown table.'

# Create a dataset using GPT
response = openai.ChatCompletion.create(model="gpt-3.5-turbo",
                                        messages=[{"role": "system", "content": system_msg},
                                                  {"role": "user", "content": user_msg}])

In [None]:
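# Check why generation stopped: "stop" means the model finished its reply normally
response["choices"][0]["finish_reason"]

In [None]:
# Extract the text of the reply
response["choices"][0]["message"]["content"]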
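The finish_reason field tells you why generation ended: "stop" means the model finished naturally, while "length" means it hit the token limit and the text is cut off. The sketch below checks both fields together. It uses a hand-written mock dictionary containing only the fields accessed above (it is not a real API response), and extract_reply is a hypothetical helper name.

```python
def extract_reply(response):
    """Return the reply text, or raise if the model did not finish normally."""
    choice = response["choices"][0]
    reason = choice["finish_reason"]
    if reason != "stop":
        # For example "length": the reply was truncated by the token limit
        raise ValueError(f"Generation did not finish normally: {reason!r}")
    return choice["message"]["content"]


# A mock response with only the fields used in this tutorial
mock_response = {
    "choices": [
        {
            "finish_reason": "stop",
            "message": {"role": "assistant", "content": "Here's the Python code..."},
        }
    ]
}

print(extract_reply(mock_response))
```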

GPT's reply differs from run to run. Here is one example reply, rendered as Markdown:

Here's the Python code to generate the dataset:

import numpy as np
import pandas as pd
# Set random seed for reproducibility
np.random.seed(42)
# Generate random sales data
sales_data = np.random.normal(loc=100000, scale=5000, size=12)
# Create month abbreviation list
month_abbr = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
# Create dataframe
sales_df = pd.DataFrame({'month': month_abbr, 'total_sales_usd': sales_data})
# Print dataframe
print(sales_df)

And here's the output in markdown format:

| month | total_sales_usd |
|-------|-----------------|
| Jan | 98728.961189 |
| Feb | 106931.030292 |
| Mar | 101599.514152 |
| Apr | 97644.534384 |
| May | 103013.191014 |
| Jun | 102781.514665 |
| Jul | 100157.741173 |
| Aug | 104849.281004 |
| Sep | 100007.081991 |
| Oct | 94080.272682 |
| Nov | 96240.993328 |
| Dec | 104719.371461 |
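GPT returns the table as plain text, so it has to be parsed before you can use it as data. A minimal sketch, assuming a simple pipe-delimited table with a header row, a dashed separator row, and one data row per line, as in the reply above; parse_markdown_table is a hypothetical helper, and it keeps every cell as a string.

```python
def parse_markdown_table(text):
    """Parse a simple pipe-delimited markdown table into a list of dicts.

    Blank lines and lines that do not start with '|' are ignored.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the separator row made of dashes (and optional colons)
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    if not rows:
        raise ValueError("no table found")
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


table_text = """| month | total_sales_usd |
|-------|-----------------|
| Jan | 98728.961189 |
| Feb | 106931.030292 |"""

rows = parse_markdown_table(table_text)
sales = [float(row["total_sales_usd"]) for row in rows]
print(rows[0])
```

Converting the numeric column with float() is left to the caller, because a hand-rolled parser cannot know which columns GPT intended to be numbers.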

In [None]:
def chat(system, user_assistant):
    """Send a system message plus alternating user/assistant messages to GPT
    and return the content of the reply."""
    assert isinstance(system, str), "`system` should be a string"
    assert isinstance(user_assistant, list), "`user_assistant` should be a list"
    system_msg = [{"role": "system", "content": system}]
    # Even positions (0, 2, ...) are user messages; odd positions are assistant replies
    user_assistant_msgs = [
        {"role": "assistant" if i % 2 else "user", "content": msg}
        for i, msg in enumerate(user_assistant)
    ]
    msgs = system_msg + user_assistant_msgs
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=msgs)
    # Fail loudly unless the model finished normally (e.g. not cut off by a token limit)
    status_code = response["choices"][0]["finish_reason"]
    assert status_code == "stop", f"The status code was {status_code}."
    return response["choices"][0]["message"]["content"]
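The list comprehension inside chat() assigns roles purely by position. To see that mapping without spending API credits, here is a sketch of just the message-building step; build_messages is a hypothetical name, and no request is sent.

```python
def build_messages(system, user_assistant):
    """Build the messages list that chat() sends, without calling the API."""
    messages = [{"role": "system", "content": system}]
    for i, content in enumerate(user_assistant):
        # Index 0, 2, 4, ... -> user; index 1, 3, 5, ... -> assistant
        role = "assistant" if i % 2 else "user"
        messages.append({"role": role, "content": content})
    return messages


msgs = build_messages("You are a helpful assistant.", ["Question 1", "Answer 1", "Question 2"])
print([m["role"] for m in msgs])
# ['system', 'user', 'assistant', 'user']
```

Because the roles alternate starting with the user, a conversation that asks GPT a new question normally passes a list with an odd number of entries, so that it ends on a user message.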

In [None]:
response_fn_test = chat("You are a machine learning expert.", ["Explain what a neural network is."])

display(Markdown(response_fn_test))

In [None]:
# Assign the content of the earlier dataset-generating response to assistant_msg
assistant_msg = response["choices"][0]["message"]["content"]

# Define a new user message
user_msg2 = 'Using the dataset you just created, write code to calculate the mean of the `total_sales_usd` column. Also include the result of the calculation.'

# Create a list of alternating user and assistant messages
user_assistant_msgs = [user_msg, assistant_msg, user_msg2]

# Get GPT to perform the request
response_calc = chat(system_msg, user_assistant_msgs)

# Display the generated content
display(Markdown(response_calc))
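GPT does not execute the code it writes, so the numbers in its table, and any mean it reports, may not match what its code actually produces. Running the generated code yourself is a cheap check: with np.random.seed(42), the January value comes out at about 102,483.57 rather than the 98,728.96 shown in GPT's table, and the mean is about 101,479.78. The sketch below re-creates the dataset exactly as in GPT's code and computes the mean locally with pandas.

```python
import numpy as np
import pandas as pd

# Re-create the dataset exactly as in GPT's code (seed 42, mean 100000, sd 5000)
np.random.seed(42)
sales_data = np.random.normal(loc=100000, scale=5000, size=12)
month_abbr = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
sales_df = pd.DataFrame({'month': month_abbr, 'total_sales_usd': sales_data})

# Compute the mean locally to compare against GPT's answer
mean_sales = sales_df['total_sales_usd'].mean()
print(sales_df.head(1))
print(f"Mean total sales: {mean_sales:.2f}")
```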


Using GPT in a Pipeline

In [None]:
# Import the weather module (installed from the weather2 package)
import weather

# Get the forecast for Miami
location = "Miami"
forecast = weather.forecast(location)

# Pull out forecast data for midday tomorrow
fcast = forecast.tomorrow["12:00"]

# Create a prompt
user_msg_weather = f"In {location} at midday tomorrow, the temperature is forecast to be {fcast.temp}, the wind speed is forecast to be {fcast.wind.speed} m/s, and the amount of precipitation is forecast to be {fcast.precip}. Make a list of suitable leisure activities."

# Call GPT
response_activities = chat("You are a travel guide.", [user_msg_weather])

display(Markdown(response_activities))
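The pipeline pattern is: fetch data, format it into a prompt, then send the prompt to GPT. The formatting step can be tested without network access or API credits. The sketch below uses a hypothetical Forecast dataclass in place of the weather package's forecast object; its fields mirror the attributes used above (temp, wind speed, precip), but the class, the helper weather_prompt, and the input numbers are illustrative assumptions, not real forecast data.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    """Hypothetical stand-in for one forecast entry from a weather API."""
    temp: float        # temperature (units depend on the data source)
    wind_speed: float  # wind speed in m/s
    precip: float      # amount of precipitation


def weather_prompt(location, fcast):
    """Format forecast data into the user message sent to GPT."""
    return (
        f"In {location} at midday tomorrow, the temperature is forecast to be "
        f"{fcast.temp}, the wind speed is forecast to be {fcast.wind_speed} m/s, "
        f"and the amount of precipitation is forecast to be {fcast.precip}. "
        "Make a list of suitable leisure activities."
    )


prompt = weather_prompt("Miami", Forecast(temp=28.0, wind_speed=4.5, precip=0.0))
print(prompt)
```

The resulting string can then be passed to chat("You are a travel guide.", [prompt]), exactly as in the cell above.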