# Using `prompto` with Azure OpenAI

In [1]:
from prompto.settings import Settings
from prompto.experiment import Experiment
from dotenv import load_dotenv
import warnings
import os

When using `prompto` to query models from the Azure OpenAI API, lines in our experiment `.jsonl` files must have `"api": "azure-openai"` in the prompt dict.
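
As a minimal sketch, one line of such an input file can be produced by serialising a prompt dict with the standard `json` module. The `id`, `prompt` and `parameters` values here are illustrative only:

```python
import json

# An illustrative prompt dict; "api" must be "azure-openai" for this API
prompt_dict = {
    "id": 0,
    "api": "azure-openai",
    "prompt": "How does technology impact us?",
    "parameters": {"n": 1, "temperature": 1, "max_tokens": 1000},
}

# Each line of a .jsonl file is a single JSON object
jsonl_line = json.dumps(prompt_dict)
print(jsonl_line)
```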

## Environment variables

For the [AzureOpenAI API](../../docs/models.md#azure-openai), there are four environment variables that could be set:
- `AZURE_OPENAI_API_KEY`: the API key for the Azure OpenAI API
- `AZURE_OPENAI_API_ENDPOINT`: the endpoint for the Azure OpenAI API
- `AZURE_OPENAI_API_VERSION`: the version of the Azure OpenAI API (optional)
- `AZURE_OPENAI_MODEL_NAME`: the default model name for the Azure OpenAI API (optional)

As mentioned in the [model docs](../../docs/models.md#model-specific-environment-variables), there are also model-specific environment variables which can be used. In particular, if you specify a `model_name` key in a prompt dict, you can also set an `AZURE_OPENAI_API_KEY_model_name` environment variable to indicate the API key used for that particular model (where "model_name" is replaced with the corresponding value of the `model_name` key). We will see a concrete example of this later. The same applies to the `AZURE_OPENAI_API_ENDPOINT_model_name` and `AZURE_OPENAI_API_VERSION_model_name` environment variables.
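
The lookup order described above can be sketched as follows. Both functions are hypothetical helpers written for illustration, not part of the `prompto` API. The sketch assumes that characters in the model name which are not letters, digits or underscores are replaced by underscores, which matches the `AZURE_OPENAI_API_KEY_gpt_3_5_turbo` name used later in this notebook for the model name "gpt-3.5-turbo":

```python
import os
import re
from typing import Optional


def model_specific_env_var(base_name: str, model_name: str) -> str:
    """Hypothetical helper: build a model-specific variable name.

    Assumes any character that is not a letter, digit or underscore
    is replaced by an underscore.
    """
    identifier = re.sub(r"[^0-9a-zA-Z_]", "_", model_name)
    return f"{base_name}_{identifier}"


def lookup_with_fallback(base_name: str, model_name: str) -> Optional[str]:
    """Prefer the model-specific variable, else fall back to the default."""
    specific = os.environ.get(model_specific_env_var(base_name, model_name))
    if specific is not None:
        return specific
    return os.environ.get(base_name)


print(model_specific_env_var("AZURE_OPENAI_API_KEY", "gpt-3.5-turbo"))
# prints: AZURE_OPENAI_API_KEY_gpt_3_5_turbo
```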

Note that `AZURE_OPENAI_MODEL_NAME` is optional since you can simply specify `model_name` in each prompt dict that has `"api": "azure-openai"`.

To set these environment variables, you can put them in a `.env` file as key-value pairs:
```
AZURE_OPENAI_API_KEY = <YOUR-AZURE-OPENAI-KEY>
AZURE_OPENAI_API_ENDPOINT = <YOUR-AZURE-OPENAI-ENDPOINT>
AZURE_OPENAI_API_VERSION = <DEFAULT-AZURE-OPENAI-API-VERSION>
AZURE_OPENAI_MODEL_NAME = <DEFAULT-AZURE-OPENAI-MODEL>
```

If you create this file, you can run the following cell, which should return `True` if the file was found and loaded, or `False` otherwise:

In [2]:
load_dotenv(dotenv_path=".env")

True

Now, we obtain those values. We raise an error if the `AZURE_OPENAI_API_KEY` or `AZURE_OPENAI_API_ENDPOINT` environment variable hasn't been set:

In [3]:
AZURE_OPENAI_API_KEY = os.environ.get("AZURE_OPENAI_API_KEY")
if AZURE_OPENAI_API_KEY is None:
    raise ValueError("AZURE_OPENAI_API_KEY is not set")

In [4]:
AZURE_OPENAI_API_ENDPOINT = os.environ.get("AZURE_OPENAI_API_ENDPOINT")
if AZURE_OPENAI_API_ENDPOINT is None:
    raise ValueError("AZURE_OPENAI_API_ENDPOINT is not set")

We will only raise a warning if `AZURE_OPENAI_API_VERSION` or `AZURE_OPENAI_MODEL_NAME` hasn't been set:

In [5]:
AZURE_OPENAI_API_VERSION = os.environ.get("AZURE_OPENAI_API_VERSION")
if AZURE_OPENAI_API_VERSION is None:
    warnings.warn("AZURE_OPENAI_API_VERSION is not set")
else:
    print(f"Default AzureOpenAI version: {AZURE_OPENAI_API_VERSION}")

Default AzureOpenAI version: 2024-02-01


Note the model name for the AzureOpenAI API is actually the _deployment name_ of the model that you have chosen in your Azure subscription. For us, we set this to be "reginald-gpt4", but you should replace this with your own deployment name of whatever model you have chosen.

In [6]:
AZURE_OPENAI_MODEL_NAME = os.environ.get("AZURE_OPENAI_MODEL_NAME")
if AZURE_OPENAI_MODEL_NAME is None:
    warnings.warn("AZURE_OPENAI_MODEL_NAME is not set")
else:
    print(f"Default AzureOpenAI model: {AZURE_OPENAI_MODEL_NAME}")

Default AzureOpenAI model: reginald-gpt4


If you get any errors or warnings in the cells above, update your `.env` file to follow the example shown earlier so that these variables are set.

## Types of prompts

With the Azure OpenAI API, the prompt (given via the `"prompt"` key in the prompt dict) can take several forms:
- a string: a single prompt to obtain a response for
- a list of strings: a sequence of prompts to send to the model
    - this is useful in the use case of simulating a conversation with the model by defining the user prompts sequentially
- a list of dictionaries with keys "role" and "content", where "role" is one of "user", "assistant", or "system" and "content" is the message
    - this is useful in the case of passing in some conversation history or to pass in a system prompt to the model

We have created an input file in [data/input/azure-openai-example.jsonl](./data/input/azure-openai-example.jsonl) with an example of each of these cases as an illustration.

In [7]:
settings = Settings(data_folder="./data", max_queries=30)
experiment = Experiment(file_name="azure-openai-example.jsonl", settings=settings)

We set `max_queries` to 30, so queries are sent at a rate of 30 per minute (one query every 2 seconds).
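
The arithmetic behind this interval, and the resulting time needed to send the five prompts in this example file, is simply:

```python
max_queries = 30  # queries per minute, as passed to Settings above
n_prompts = 5     # number of prompts in the example input file

# Interval between consecutive queries, in seconds
seconds_between_queries = 60 / max_queries
print(seconds_between_queries)  # 2.0

# Approximate time to send all prompts at that rate, in seconds
approx_send_time = n_prompts * seconds_between_queries
print(approx_send_time)  # 10.0
```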

In [8]:
print(settings)

Settings: data_folder=./data, max_queries=30, max_attempts=3, parallel=False
Subfolders: input_folder=./data/input, output_folder=./data/output, media_folder=./data/media


In [9]:
len(experiment.experiment_prompts)

5

We can see the prompts that we have in the `experiment_prompts` attribute:

In [10]:
experiment.experiment_prompts

[{'id': 0,
  'api': 'azure-openai',
  'prompt': 'How does technology impact us?',
  'parameters': {'n': 1, 'temperature': 1, 'max_tokens': 1000}},
 {'id': 1,
  'api': 'azure-openai',
  'model_name': 'gpt-3.5-turbo',
  'prompt': 'How does technology impact us?',
  'parameters': {'n': 1, 'temperature': 1, 'max_tokens': 1000}},
 {'id': 2,
  'api': 'azure-openai',
  'prompt': ['How does international trade create jobs?',
   'I want a joke about that'],
  'parameters': {'n': 1, 'temperature': 1, 'max_tokens': 1000}},
 {'id': 3,
  'api': 'azure-openai',
  'prompt': [{'role': 'system',
    'content': 'You are a helpful assistant designed to answer questions briefly.'},
   {'role': 'user',
    'content': 'What efforts are being made to keep the hakka language alive?'}],
  'parameters': {'n': 1, 'temperature': 1, 'max_tokens': 1000}},
 {'id': 4,
  'api': 'azure-openai',
  'prompt': [{'role': 'system',
    'content': 'You are a helpful assistant designed to answer questions briefly.'},
   {'role

- In the first prompt (`"id": 0`), we have a `"prompt"` key which is a string and we do not specify a `"model_name"` key, hence we will use the model specified by the `AZURE_OPENAI_MODEL_NAME` environment variable.
- In the second prompt (`"id": 1`), the `"prompt"` key is also a string, but we set the `"model_name"` key to "gpt-3.5-turbo", which overrides the default model specified by the `AZURE_OPENAI_MODEL_NAME` environment variable. Because a `model_name` key is specified, we first look for model-specific environment variables such as `AZURE_OPENAI_API_KEY_gpt_3_5_turbo`. We have not set one here, so the `AZURE_OPENAI_API_KEY` environment variable is used.
- In the third prompt (`"id": 2`), we have a `"prompt"` key which is a list of strings. Like the first prompt, we also do not specify a `"model_name"` key, so we will use the model specified by the `AZURE_OPENAI_MODEL_NAME` environment variable.
- In the fourth prompt (`"id": 3`), we have a `"prompt"` key which is a list of dictionaries. These dictionaries have a "role" and "content" key. This acts as passing in a system prompt. Here, we just have a system prompt before a user prompt. No `"model_name"` key is specified.
- In the fifth prompt (`"id": 4`), we have a `"prompt"` key which is a list of dictionaries. These dictionaries have a "role" and "content" key. Here, we have a system prompt and a series of user/assistant interactions before finally having a user prompt. This acts as passing in a system prompt and conversation history. No `"model_name"` key is specified.

In [11]:
experiment.experiment_prompts[3]["prompt"]

[{'role': 'system',
  'content': 'You are a helpful assistant designed to answer questions briefly.'},
 {'role': 'user',
  'content': 'What efforts are being made to keep the hakka language alive?'}]

## Running the experiment

We can now run the experiment using the async method `process`, which processes the prompts in the input file asynchronously. Note that a new folder named `timestamp-azure-openai-example` (where "timestamp" is replaced with the actual date and time of processing) will be created in the output directory, and the input file will be moved to the output directory. As the responses come in, they are written to the output file, and logs are printed to the console as well as written to a log file in the output directory.

In [12]:
responses = await experiment.process()

Sending 5 queries  (attempt 1/3): 100%|██████████| 5/5 [00:10<00:00,  2.00s/query]
Waiting for responses  (attempt 1/3): 100%|██████████| 5/5 [01:00<00:00, 12.17s/query]
Sending 1 queries  (attempt 2/3): 100%|██████████| 1/1 [00:02<00:00,  2.00s/query]
Waiting for responses  (attempt 2/3): 100%|██████████| 1/1 [00:00<00:00, 12.49query/s]
Sending 1 queries  (attempt 3/3): 100%|██████████| 1/1 [00:02<00:00,  2.01s/query]
Waiting for responses  (attempt 3/3): 100%|██████████| 1/1 [00:00<00:00, 12.76query/s]


We can see that the responses are written to the output file, and we can also see them as the returned object. From running the experiment, we obtain prompt dicts where there is now a `"response"` key which contains the response(s) from the model.

For the case where the prompt is a list of strings, we see that the response is a list of strings where each string is the response to the corresponding prompt.

Note here, for our specific Azure subscription, we haven't got a model with deployment name "gpt-3.5-turbo" and hence, we actually receive an error message in the response for the second prompt.

In [13]:
responses

([{'id': 4,
   'api': 'azure-openai',
   'prompt': [{'role': 'system',
     'content': 'You are a helpful assistant designed to answer questions briefly.'},
    {'role': 'user', 'content': "Hello, I'm Bob and I'm 6 years old"},
    {'role': 'assistant', 'content': 'Hi Bob, how may I assist you?'},
    {'role': 'user', 'content': 'How old will I be next year?'}],
   'parameters': {'n': 1, 'temperature': 1, 'max_tokens': 1000},
   'response': "If you're 6 years old now, you'll be 7 years old next year."},
  {'id': 3,
   'api': 'azure-openai',
   'prompt': [{'role': 'system',
     'content': 'You are a helpful assistant designed to answer questions briefly.'},
    {'role': 'user',
     'content': 'What efforts are being made to keep the hakka language alive?'}],
   'parameters': {'n': 1, 'temperature': 1, 'max_tokens': 1000},
   'response': 'Efforts to keep the Hakka language alive include a range of cultural, educational, and technological initiatives:\n\n1. **Language Education**: Cours
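
The responses above come back in completion order rather than input order (`"id": 4` appears before `"id": 3`). A minimal sketch for looking results up by `id` follows. It assumes, as the output above suggests, that the first element of the returned tuple is a list of completed prompt dicts. The `completed` list below is a hypothetical stand-in with placeholder response text, not real model output:

```python
# Hypothetical stand-in for `responses[0]`, with placeholder response text
completed = [
    {"id": 4, "api": "azure-openai", "response": "<response to prompt 4>"},
    {"id": 3, "api": "azure-openai", "response": "<response to prompt 3>"},
]

# Index the completed prompt dicts by their "id" for direct lookup
by_id = {prompt_dict["id"]: prompt_dict for prompt_dict in completed}

# Iterate in input order regardless of completion order
for prompt_id in sorted(by_id):
    print(prompt_id, by_id[prompt_id]["response"])
```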

## Running the experiment via the command line

We can also run the experiment via the command line. The command is as follows (assuming that your working directory is the current directory of this notebook, i.e. `examples/azure-openai`):
```bash
prompto_run_experiment --file data/input/azure-openai-example.jsonl --max_queries 30
```
