<a href="https://colab.research.google.com/github/aaalexlit/wandb_notebooks/blob/main/llm-apps-course/notebooks/01.Using_APIs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://colab.research.google.com/github/wandb/edu/blob/main/llm-apps-course/notebooks/01.%20Using_APIs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
<!--- @wandbcode{llmapps-intro} -->

# Understanding LLM APIs

We will explore the OpenAI API and use its models to generate text.


### Setup

In [1]:
# The code below uses the legacy `openai.ChatCompletion` interface, which was removed in openai 1.0
%pip install --upgrade "openai<1" tiktoken wandb -qq


In [2]:
import os
import openai
import tiktoken
import wandb
from pprint import pprint
from getpass import getpass
from wandb.integration.openai import autolog

You will need an OpenAI API key to run this notebook. You can get one [here](https://platform.openai.com/account/api-keys).

In [3]:
if os.getenv("OPENAI_API_KEY") is None:
  if any(['VSCODE' in x for x in os.environ.keys()]):
    print('Please enter password in the VS Code prompt at the top of your VS Code window!')
  os.environ["OPENAI_API_KEY"] = getpass("Paste your OpenAI key from: https://platform.openai.com/account/api-keys\n")
  openai.api_key = os.getenv("OPENAI_API_KEY", "")

assert os.getenv("OPENAI_API_KEY", "").startswith("sk-"), "This doesn't look like a valid OpenAI API key"
print("OpenAI API key configured")

Paste your OpenAI key from: https://platform.openai.com/account/api-keys
··········
OpenAI API key configured


Let's enable W&B autologging to track our experiments.

In [4]:
# start logging to W&B
autolog({"project":"llmapps", "job_type": "introduction"})


# Tokenization

In [5]:
encoding = tiktoken.encoding_for_model("text-davinci-003")
enc = encoding.encode("Weights & Biases is awesome!")
print(enc)
print(encoding.decode(enc))

[1135, 2337, 1222, 8436, 1386, 318, 7427, 0]
Weights & Biases is awesome!


We can also decode the tokens one by one:

In [6]:
for token_id in enc:
    print(f"{token_id}\t{encoding.decode([token_id])}")

1135	We
2337	ights
1222	 &
8436	 Bi
1386	ases
318	 is
7427	 awesome
0	!


> Note that the space before a word belongs to the token that follows it, for example `" is"` and `" awesome"`.
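To make the note above concrete, here is a small sketch that uses only the token IDs and decoded pieces printed in the previous cell. It does not call `tiktoken`; the `tokens` list is copied by hand from that output.

```python
# Token IDs and their decoded text, copied from the output above.
tokens = [
    (1135, "We"),
    (2337, "ights"),
    (1222, " &"),
    (8436, " Bi"),
    (1386, "ases"),
    (318, " is"),
    (7427, " awesome"),
    (0, "!"),
]

# Joining the pieces with no separator restores the original text,
# because the spaces live inside the tokens themselves.
pieces = [text for _, text in tokens]
reconstructed = "".join(pieces)
print(reconstructed)

# Tokens that begin with a space mark the start of a new word.
word_starts = [text for text in pieces if text.startswith(" ")]
print(word_starts)
```

A practical consequence is that `" awesome"` and `"awesome"` are different strings to the tokenizer, so they may map to different token IDs.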

# Sampling

Let's sample some text from the model. To do so, we wrap the API call in a function that takes the `temperature` parameter as its argument.
A higher temperature produces more random samples.
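The API does not show how temperature is applied internally. The sketch below assumes the usual textbook formulation, in which the model's raw scores (logits) are divided by the temperature before a softmax. The function name `softmax_with_temperature` and the `toy_logits` values are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after rescaling by a temperature."""
    # This sketch does not handle temperature=0; conceptually that case
    # corresponds to always picking the highest-scoring token.
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
toy_logits = [2.0, 1.0, 0.1]

for t in [0.5, 1.0, 2.0]:
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature most of the probability mass sits on the top-scoring token, while at a high temperature the distribution flattens and less likely tokens get sampled more often.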

In [19]:
def generate_with_temperature(temp):
  "Generate text with a given temperature, higher temperature means more randomness"
  response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
      "role": "user",
      "content": "Say something about Weights & Biases"
    }],
    max_tokens=50,
    temperature=temp,
  )
  return response.choices[0].message.content

In [21]:
import time

In [22]:
for temp in [0, 0.5, 1, 1.5, 2]:
  pprint(f'TEMP: {temp}, GENERATION: {generate_with_temperature(temp)}')
  time.sleep(21)

('TEMP: 0, GENERATION: Weights & Biases is a powerful and user-friendly '
 'platform for machine learning experiment tracking and visualization. It '
 'provides a seamless way to log and analyze model performance, '
 'hyperparameters, and other experiment details. With its intuitive interface '
 'and extensive integrations, it simpl')
('TEMP: 0.5, GENERATION: Weights & Biases is a powerful platform that provides '
 'tools and infrastructure to help researchers and data scientists track, '
 'visualize, and analyze their machine learning experiments. It offers '
 'features like experiment tracking, hyperparameter optimization, and model '
 'visualization, making it easier to understand')
('TEMP: 1, GENERATION: Weights & Biases is a platform that makes it easy to '
 'track, visualize, and understand your machine learning models. It provides '
 'powerful tools for experiment management, allowing you to organize, compare, '
 'and share your model training runs. With its integration with pop

You can also use the [`top_p` parameter](https://platform.openai.com/docs/api-reference/completions/create#completions/create-top_p) (nucleus sampling) to control the diversity of the generated text. This parameter sets a threshold on cumulative probability: the model samples the next token only from the smallest set of most likely tokens whose probabilities add up to `top_p`. For example, if `top_p=0.9`, the tokens in the least likely 10% tail of the probability mass are never picked. The higher the `top_p`, the more low-probability tokens become eligible and the more varied the output. OpenAI recommends adjusting either `temperature` or `top_p`, but not both.
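As a rough illustration of nucleus (top-p) filtering, the sketch below keeps the most likely tokens until their cumulative probability reaches `top_p` and then renormalises. It is a toy model of the idea, not the API's actual implementation; the `nucleus` helper and the `toy_probs` distribution are hypothetical.

```python
def nucleus(probs, top_p):
    """Keep the smallest set of most likely tokens whose probabilities
    sum to at least top_p, then renormalise them to sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Hypothetical next-token distribution.
toy_probs = {" tool": 0.5, " platform": 0.3, " library": 0.15, " banana": 0.05}

for top_p in [0.1, 0.6, 0.9, 1.0]:
    print(top_p, list(nucleus(toy_probs, top_p)))
```

With a very small `top_p` only the single most likely token survives, which is consistent with the `0.01` and `0.1` runs below producing the same text as `temperature=0`.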

In [23]:
def generate_with_topp(topp):
  "Generate text with a given top-p, higher top-p means more randomness"
  response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
      "role": "user",
      "content": "Say something about Weights & Biases"
    }],
    max_tokens=50,
    top_p=topp,
  )
  return response.choices[0].message.content

In [24]:
for topp in [0.01, 0.1, 0.5, 1]:
  pprint(f'TOP_P: {topp}, GENERATION: {generate_with_topp(topp)}')
  time.sleep(21)

('TOP_P: 0.01, GENERATION: Weights & Biases is a powerful and user-friendly '
 'platform for machine learning experiment tracking and visualization. It '
 'provides a seamless way to log and analyze model performance, '
 'hyperparameters, and other experiment details. With its intuitive interface '
 'and extensive integrations, it simpl')
('TOP_P: 0.1, GENERATION: Weights & Biases is a powerful and user-friendly '
 'platform for machine learning experiment tracking and visualization. It '
 'provides a seamless way to log and analyze model performance, '
 'hyperparameters, and other experiment details. With its intuitive interface '
 'and extensive integrations, it simpl')
('TOP_P: 0.5, GENERATION: Weights & Biases is a powerful platform for machine '
 'learning experimentation and collaboration. It provides a comprehensive '
 'suite of tools and features that enable researchers and data scientists to '
 'track, visualize, and analyze their models and experiments. With its '
 'seamless 

# Chat API

Let's take a closer look at the Chat API and see how the model responds to our messages. We have some control over the model's response through a message with the `system` role, which lets us steer the model toward a certain behaviour.

> We are using `gpt-3.5-turbo`, which is faster and cheaper than `text-davinci-003`.
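The chat endpoint takes a list of messages, each with a `role` and a `content`. The helper below is a hypothetical convenience function (not part of the `openai` library) that sketches how such a list could be assembled from an optional system prompt, earlier turns, and the new user message.

```python
def build_messages(user_prompt, system_prompt=None, history=None):
    """Assemble a chat message list: system prompt first, then earlier
    turns in order, then the new user message."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    # Earlier turns are dicts with role "user" or "assistant".
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_prompt})
    return messages

messages = build_messages(
    "Say something about Weights & Biases",
    system_prompt="You are a helpful assistant.",
)
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list has the same shape as the `messages` argument passed in the next cell, so changing only the system prompt is an easy way to compare how it affects the reply.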

In [25]:
MODEL = "gpt-3.5-turbo"
response = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say something about Weights & Biases"},
    ],
    temperature=0,
)

response

<OpenAIObject chat.completion id=chatcmpl-7tU3rUk6KeHPFYjxX9NJCf7NkM3bm at 0x7a4ab739d800> JSON: {
  "id": "chatcmpl-7tU3rUk6KeHPFYjxX9NJCf7NkM3bm",
  "object": "chat.completion",
  "created": 1693457951,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Weights & Biases is a powerful tool for machine learning experimentation and collaboration. It provides a seamless way to track and visualize your machine learning experiments, making it easier to understand and iterate on your models. With features like experiment tracking, hyperparameter optimization, and model versioning, Weights & Biases helps streamline the machine learning workflow and improve productivity. Whether you are a researcher, data scientist, or machine learning engineer, Weights & Biases can be a valuable addition to your toolkit."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "complet

As you can see above, the response is a JSON object that contains the generated message along with metadata such as the model name, the finish reason, and the token usage.
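To show how the fields fit together without calling the API, the sketch below parses a hand-written JSON string that mimics the shape of the response above. The `id` is a placeholder, the content is shortened, the token counts are illustrative, and `summarize` is a hypothetical helper.

```python
import json

# Hand-written sample that mimics the shape of the response printed above.
# The id is a placeholder and the token counts are illustrative.
raw = """
{
  "id": "chatcmpl-example",
  "object": "chat.completion",
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Weights & Biases is a powerful tool for machine learning experimentation and collaboration."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 98, "total_tokens": 123}
}
"""

def summarize(response):
    """Extract the reply text, finish reason, and token usage from a response dict."""
    choice = response["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": response["usage"]["total_tokens"],
    }

summary = summarize(json.loads(raw))
print(summary["finish_reason"], summary["total_tokens"])
```

A `finish_reason` of `"stop"` means the model ended the reply on its own, whereas the earlier sampling cells set `max_tokens=50`, which is why their outputs are cut off mid-sentence.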

In [26]:
pprint(response.choices[0].message.content)

('Weights & Biases is a powerful tool for machine learning experimentation and '
 'collaboration. It provides a seamless way to track and visualize your '
 'machine learning experiments, making it easier to understand and iterate on '
 'your models. With features like experiment tracking, hyperparameter '
 'optimization, and model versioning, Weights & Biases helps streamline the '
 'machine learning workflow and improve productivity. Whether you are a '
 'researcher, data scientist, or machine learning engineer, Weights & Biases '
 'can be a valuable addition to your toolkit.')


In [27]:
wandb.finish()


Run history:

| Metric | History |
| --- | --- |
| usage/completion_tokens | ▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂█ |
| usage/elapsed_time | ▁▁▁▄▃▄▄▅▃▃▄▄▃▄▄▃▄█ |
| usage/prompt_tokens | ▁▁▁▄▄▄▄▄▄▄▄▄▄▄▄▄▄█ |
| usage/total_tokens | ▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂█ |

Run summary:

| Metric | Value |
| --- | --- |
| usage/completion_tokens | 98.0 |
| usage/elapsed_time | 8.38431 |
| usage/prompt_tokens | 25.0 |
| usage/total_tokens | 123.0 |
