<a href="https://colab.research.google.com/github/wandb/edu/blob/main/llm-apps-course/notebooks/01.%20Using_APIs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
<!--- @wandbcode{llmapps-intro} -->

# Understanding LLM APIs

We will explore the OpenAI model APIs to generate text.


### Setup

In [3]:
! pip install --upgrade openai tiktoken wandb -qq

In [4]:
import os
import openai
import tiktoken
import wandb
from pprint import pprint
from getpass import getpass
from wandb.integration.openai import autolog
# OPENAPI

You will need an OpenAI API key to run this notebook. You can get one [here](https://platform.openai.com/account/api-keys).

In [5]:
if os.getenv("OPENAI_API_KEY") is None:
  if any(['VSCODE' in x for x in os.environ.keys()]):
    print('Please enter password in the VS Code prompt at the top of your VS Code window!')
  os.environ["OPENAI_API_KEY"] = getpass("Paste your OpenAI key from: https://platform.openai.com/account/api-keys\n")
  openai.api_key = os.getenv("OPENAI_API_KEY", "")

assert os.getenv("OPENAI_API_KEY", "").startswith("sk-"), "This doesn't look like a valid OpenAI API key"
print("OpenAI API key configured")

Paste your OpenAI key from: https://platform.openai.com/account/api-keys
········
OpenAI API key configured


Let's enable W&B autologging to track our experiments.

In [6]:
# start logging to W&B
autolog({"project":"llmapps", "job_type": "introduction"})

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /Users/pepo_abdo/.netrc


# Tokenization

In [8]:
encoding = tiktoken.encoding_for_model("text-davinci-003")
enc = encoding.encode("Weights & Biases is awesome!")
print(enc)
print(encoding.decode(enc))

[1135, 2337, 1222, 8436, 1386, 318, 7427, 0]
Weights & Biases is awesome!


We can decode the tokens one by one:

In [9]:
for token_id in enc:
    print(f"{token_id}\t{encoding.decode([token_id])}")

1135	We
2337	ights
1222	 &
8436	 Bi
1386	ases
318	 is
7427	 awesome
0	!


> Note how tokens that start a new word include the leading space.
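
To make the point concrete, here is a minimal sketch that rebuilds the string from the per-token pieces printed above. The `pieces` dictionary is copied by hand from that output and is an illustration only, not tiktoken's actual vocabulary table.

```python
# Token id -> text piece, copied by hand from the output above (illustration only).
pieces = {
    1135: "We", 2337: "ights", 1222: " &", 8436: " Bi",
    1386: "ases", 318: " is", 7427: " awesome", 0: "!",
}
token_ids = [1135, 2337, 1222, 8436, 1386, 318, 7427, 0]

# Decoding is plain concatenation: no separator is inserted between pieces,
# because any space is already stored at the start of a piece.
text = "".join(pieces[t] for t in token_ids)
print(text)

# Pieces that start a new word carry their own leading space.
with_space = [pieces[t] for t in token_ids if pieces[t].startswith(" ")]
print(with_space)
```

Because the space belongs to the token, `" is"` and `"is"` would be different tokens, which is one reason token counts can change when whitespace in a prompt changes.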

# Sampling

Let's sample some text from the model. For this, let's create a wrapper function around the `temperature` parameter.
Higher temperatures produce more random samples.
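
As a rough intuition for why this happens, temperature is commonly described as a divisor applied to the model's raw scores (logits) before they are turned into probabilities. The sketch below uses made-up logits for three candidate tokens and treats temperature 0 as greedy decoding. Both are assumptions for illustration, since the source does not describe how the API implements this internally.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Assumption: treat temperature 0 as greedy decoding (all mass on the top score).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
for temp in [0, 0.5, 1, 2]:
    probs = softmax_with_temperature(logits, temp)
    print(temp, [round(p, 3) for p in probs])

# Sampling then draws one token index according to those probabilities.
rng = random.Random(0)
sample = rng.choices(range(len(logits)), weights=softmax_with_temperature(logits, 1.0))[0]
```

Low temperatures concentrate probability on the top-scoring token, while high temperatures spread it more evenly, so less likely tokens get sampled more often.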

In [10]:
def generate_with_temperature(temp):
  "Generate text with a given temperature, higher temperature means more randomness"
  response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Say something me",
    max_tokens=50,
    temperature=temp,
  )
  return response.choices[0].text

In [11]:
for temp in [0, 0.5, 1, 1.5, 2]:
  pprint(f'TEMP: {temp}, GENERATION: {generate_with_temperature(temp)}')

'TEMP: 0, GENERATION: \n\nHi there! How can I help you?'
'TEMP: 0.5, GENERATION: \n\nHi there! How can I help you?'
'TEMP: 1, GENERATION: \n\nHey there! How are you doing?'


RateLimitError: Rate limit reached for default-text-davinci-003 in organization org-3bDGZSe15HEdD9LnQ9cySNbc on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method.
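
The loop above stopped because the account's limit of 3 requests per minute was exceeded. One common way to cope is to retry with exponential backoff. The following is a minimal sketch, not part of the original notebook: `call_with_backoff`, its parameters, and the `flaky` stand-in function are all hypothetical names introduced for illustration.

```python
import random
import time

def call_with_backoff(fn, retry_on=Exception, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retry_on error, wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the original error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)

# Stand-in for an API call that is rate limited twice before succeeding.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

waits = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky, retry_on=RuntimeError, sleep=waits.append)
print(result, len(waits))
```

In this notebook, one could wrap a call as `call_with_backoff(lambda: generate_with_temperature(temp), retry_on=openai.error.RateLimitError, base_delay=20)`, assuming the installed `openai` version still exposes `openai.error.RateLimitError` (pre-1.0 releases do).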

You can also use the [`top_p` parameter](https://platform.openai.com/docs/api-reference/completions/create#completions/create-top_p) to control the diversity of the generated text. It enables nucleus sampling: at each step the model samples only from the smallest set of most likely tokens whose cumulative probability reaches `top_p`. For example, with `top_p=0.9` the model samples from the tokens that together make up the top 90% of the probability mass. The higher the `top_p`, the more low-probability tokens stay eligible, so the output becomes more varied. OpenAI's documentation recommends altering `temperature` or `top_p`, but not both.
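
A minimal sketch of the filtering step behind `top_p` (nucleus sampling) is shown below: keep the most likely tokens until their cumulative probability reaches `top_p`, then renormalize. The probabilities are made up and `top_p_filter` is a hypothetical helper, not a function of the `openai` library.

```python
def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {index: p / total for index, p in kept}

# Hypothetical next-token distribution over four tokens.
probs = [0.5, 0.3, 0.15, 0.05]
narrow = top_p_filter(probs, 0.01)  # only the single most likely token survives
wide = top_p_filter(probs, 0.9)     # the three most likely tokens survive
print(narrow)
print(wide)
```

With a very small `top_p`, only the single most likely token remains, so sampling becomes effectively deterministic. This may be why very small values such as 0.01 and 0.1 can produce identical generations.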

In [12]:
def generate_with_topp(topp):
  "Generate text with a given top-p, higher top-p means more randomness"
  response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Say something about Weights & Biases",
    max_tokens=50,
    top_p=topp,
  )
  return response.choices[0].text

In [13]:
for topp in [0.01, 0.1, 0.5, 1]:
  pprint(f'TOP_P: {topp}, GENERATION: {generate_with_topp(topp)}')

('TOP_P: 0.01, GENERATION: \n'
 '\n'
 'Weights & Biases is an amazing tool for tracking and analyzing machine '
 'learning experiments. It provides powerful visualizations and insights into '
 'model performance, enabling data scientists to quickly identify areas of '
 'improvement and optimize their models.')
('TOP_P: 0.1, GENERATION: \n'
 '\n'
 'Weights & Biases is an amazing tool for tracking and analyzing machine '
 'learning experiments. It provides powerful visualizations and insights into '
 'model performance, enabling data scientists to quickly identify areas of '
 'improvement and optimize their models.')


RateLimitError: Rate limit reached for default-text-davinci-003 in organization org-3bDGZSe15HEdD9LnQ9cySNbc on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method.

# Chat API

Let's switch to chat mode and see how the model responds to our messages. We have some control over the model's response by passing a message with the `system` role, which lets us steer the model to adhere to a certain behaviour.

> We are using `gpt-3.5-turbo`, which is faster and cheaper than `text-davinci-003`.
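
The chat endpoint takes a list of messages, each with a `role` (`system`, `user`, or `assistant`) and a `content` string. As a small sketch of how such a list might be assembled for a multi-turn conversation, consider the helper below. `build_messages` and the sample history are hypothetical and only illustrate the structure.

```python
def build_messages(system_prompt, user_prompt, history=None):
    """Assemble a chat messages list: system first, then prior turns, then the new user turn."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_prompt})
    return messages

# Hypothetical earlier turns of the conversation.
history = [
    {"role": "user", "content": "What is W&B?"},
    {"role": "assistant", "content": "A tool for tracking ML experiments."},
]
messages = build_messages("You are a helpful assistant.", "Say something about me", history)
for m in messages:
    print(m["role"], "->", m["content"])
```

The API itself is stateless, so any earlier turns the model should take into account have to be sent again with each request.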

In [14]:
MODEL = "gpt-3.5-turbo"
response = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say something about me"},
    ],
    temperature=0,
)

response

<OpenAIObject chat.completion id=chatcmpl-7cgOgUvm9FpCzuRRpghU5o0owpdtS at 0x11750e9f0> JSON: {
  "id": "chatcmpl-7cgOgUvm9FpCzuRRpghU5o0owpdtS",
  "object": "chat.completion",
  "created": 1689453794,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Weights & Biases is a powerful tool for machine learning experimentation and collaboration. It provides a seamless way to track and visualize your machine learning experiments, making it easier to understand and iterate on your models. With features like experiment tracking, hyperparameter optimization, and model versioning, Weights & Biases helps streamline the machine learning workflow and improve productivity. Whether you are a researcher, data scientist, or machine learning engineer, Weights & Biases can be a valuable addition to your toolkit."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion

As you can see above, the response is a JSON object that contains the generated message along with metadata about the request, such as the model version, the finish reason, and token usage.
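
As a sketch of how these fields can be pulled out, the snippet below works on a plain dictionary shaped like the response above. The content and usage numbers are illustrative placeholders rather than values from the real response, and `summarize` is a hypothetical helper.

```python
# A plain dict shaped like a chat completion response (values are placeholders).
response_example = {
    "model": "gpt-3.5-turbo-0613",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 10, "completion_tokens": 2, "total_tokens": 12},
}

def summarize(resp):
    """Extract the reply text, whether it was cut off, and the total token count."""
    choice = resp["choices"][0]
    return {
        "text": choice["message"]["content"],
        # finish_reason == "length" means generation stopped at the token limit.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": resp["usage"]["total_tokens"],
    }

summary = summarize(response_example)
print(summary)
```

Checking `finish_reason` is a cheap way to detect replies that were cut off by `max_tokens`, and the `usage` block is what billing and the W&B usage metrics are based on.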

In [15]:
pprint(response.choices[0].message.content)

('Weights & Biases is a powerful tool for machine learning experimentation and '
 'collaboration. It provides a seamless way to track and visualize your '
 'machine learning experiments, making it easier to understand and iterate on '
 'your models. With features like experiment tracking, hyperparameter '
 'optimization, and model versioning, Weights & Biases helps streamline the '
 'machine learning workflow and improve productivity. Whether you are a '
 'researcher, data scientist, or machine learning engineer, Weights & Biases '
 'can be a valuable addition to your toolkit.')


In [16]:
wandb.finish()


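Run history:

| Metric | Values |
|---|---|
| usage/completion_tokens | ▁▁▁▄▄█ |
| usage/elapsed_time | ▂▁▁▃▂█ |
| usage/prompt_tokens | ▁▁▁▃▃█ |
| usage/total_tokens | ▁▁▁▃▃█ |

Run summary:

| Metric | Value |
|---|---|
| usage/completion_tokens | 98.0 |
| usage/elapsed_time | 4.08892 |
| usage/prompt_tokens | 25.0 |
| usage/total_tokens | 123.0 |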
