# Module 1: Introduction to OpenAI API & Prompting

This module introduces participants to the motivation for working with the OpenAI API and the fundamentals for doing so. Participants will learn about the OpenAI Platform and types of prompting, and will be introduced to advanced topics covered in later modules (e.g. RAG, Agents).

## ChatGPT -> API 
Let's begin by exploring how we could use ChatGPT for a data validation task and compare the responses from different models.

Below we have some sample data for concomitant medications and medical history events. Let's ask two different models in ChatGPT to identify medications that do not have an associated condition (a likely data quality issue). The optimal answer is that the model identifies that Advil (Patient 1) and Creatine (Patient 3) have no associated patient condition.

We'll start by passing the following prompt to 4o-mini and to o3-mini (with both the default and high reasoning effort):
```
# Task
Go through the following medication and condition lists and return medications for which the patient does not have an associated condition.

# Input

## Medications:
[
    ("Patient 1", "Metformin"),
    ("Patient 1", "Advil"),
    ("Patient 2", "Humira"),
    ("Patient 3", "Dupixent"),
    ("Patient 3", "Creatine"),
]

## Conditions:
[
    ("Patient 1", "Diabetes"),
    ("Patient 1", "Chronic Asthma"),
    ("Patient 2", "Rheumatoid Arthritis"),
    ("Patient 3", "Eczema"),
]
```
Links to the shared responses:
- 4o mini: https://chatgpt.com/share/e/67d3152b-028c-8005-a131-e31a60dc07d0
- o3-mini: https://chatgpt.com/share/e/67d314f1-9020-8005-bce1-94909e307bc8
- o3-mini-high: https://chatgpt.com/share/e/67d31566-486c-8005-8269-ac25065cd7af


## Introduction to OpenAI API

In the example above, it would quickly become tedious to update this prompt and paste it into ChatGPT every time the data changes (e.g. for patients at multiple trial sites or at various points in the trial), and copy-pasting does not scale to larger datasets. Using the OpenAI API to automate this process is easier and more scalable.
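A first step toward that automation is to stop hand-editing the prompt and generate it from the data instead. The sketch below assumes the data arrives as lists of `(patient, item)` tuples like the ones above; `build_prompt` and the per-site datasets are hypothetical, for illustration only.

```python
# Hypothetical helper: rebuild the data-validation prompt from any
# medication/condition lists, e.g. one call per trial site or data cut.
def build_prompt(medications, conditions):
    """Return the task prompt with the given (patient, item) tuples embedded."""
    med_lines = "\n".join(f"    {pair!r}," for pair in medications)
    cond_lines = "\n".join(f"    {pair!r}," for pair in conditions)
    return (
        "# Task\n"
        "Go through the following medication and condition lists and return "
        "medications for which the patient does not have an associated condition.\n\n"
        "# Input\n\n"
        f"## Medications:\n[\n{med_lines}\n]\n\n"
        f"## Conditions:\n[\n{cond_lines}\n]\n"
    )


# Hypothetical per-site datasets; each one produces its own prompt.
sites = {
    "site_A": ([("Patient 1", "Metformin"), ("Patient 1", "Advil")],
               [("Patient 1", "Diabetes")]),
    "site_B": ([("Patient 3", "Creatine")], [("Patient 3", "Eczema")]),
}
prompts = {site: build_prompt(meds, conds) for site, (meds, conds) in sites.items()}
print(prompts["site_A"])
```

Each prompt can then be sent to the API in a loop, so a new site or data cut only means new input lists, not a new prompt.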

### API Advantages
- Automated Programmatic Workflows
   - Enables automation of repetitive tasks, allowing you to run the same processes multiple times with different inputs efficiently (as well as batch processing!)
- Enhanced Control Over Model Configuration
   - Greater flexibility in choosing specific models, adjusting model parameters, and customizing system prompts
- Production-Ready Capabilities
   - Facilitates embedding AI capabilities directly into applications that can scale and handle large volumes of requests
   - Developers can select and lock specific model versions and integrate with monitoring tools to track usage, performance, and error rates

### API Keys
OpenAI API keys are used to securely access the API and should be set as an environment variable when interacting with the OpenAI models.

To see your individual API key(s) go to https://platform.openai.com/settings/project/api-keys. Here you can also create new API keys if you want to separately track your usage for various projects/use cases.

At Genmab we use "Project Keys" for personal development and non-production use cases. If your use case evolves and needs a Service Key, submit a consultation request via [this link](https://teams.microsoft.com/l/entity/81fef3a6-72aa-4648-a763-de824aeafb7d/_djb2_msteams_prefix_1776558767?context=%7B%22channelId%22%3A%2219%3Af44efea3d91e4a1c9da746d204a90ff3%40thread.tacv2%22%7D&tenantId=9a88a419-24b9-401f-8557-e155db7ae966) and Farhat will get back to you. Service accounts are tied to a "bot" individual and should be used to provision access for production systems.
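Since keys are meant to live in an environment variable, it is safer to read the key from there than to paste it into a notebook cell. A minimal sketch; `get_api_key` is a hypothetical helper that fails fast with a readable message when the variable is missing.

```python
import os


def get_api_key(var_name="OPENAI_API_KEY"):
    """Read the API key from the environment instead of hard-coding it in a notebook."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell (or load it from a .env file) "
            "before creating the OpenAI client."
        )
    return key
```

The result can be passed explicitly, e.g. `openai.OpenAI(api_key=get_api_key())`, although the client also reads `OPENAI_API_KEY` on its own when no key is given.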

In [1]:
%pip install -r requirements1.txt

Note: you may need to restart the kernel to use updated packages.


In [1]:
# Import required libraries
import openai
import os
import pandas as pd

In [2]:
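# Add the key for the AI Course below (never commit a real key to version control).
# openai.OpenAI() reads the OPENAI_API_KEY environment variable automatically.
os.environ["OPENAI_API_KEY"] = "your_api_key_here"
openai.api_key = os.getenv("OPENAI_API_KEY")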

### Chat Completions API

OpenAI provides simple APIs for using an LLM to generate text from a prompt, much as you would with ChatGPT. The [Chat Completions API](https://platform.openai.com/docs/api-reference/chat) is a powerful tool for building conversational agents and generating dynamic responses. Because it supports multi-turn dialogues, it is well suited to applications like customer service bots and virtual assistants.

The message format of the Chat Completions API is widely used and highly useful for building agents (which we'll see in a later session). Structuring a conversation as a list of messages, each with a role (user, system, assistant) and content, keeps the context of the interaction clear, which is essential when building multi-turn dialogues.

Each Chat Completions request must specify at least two parameters:
1. `model`: the model ID to use
2. `messages`: a list of dictionaries where each dictionary represents a message in the conversation. Each message must have:
   - `role`: such as [`system`](https://platform.openai.com/docs/guides/text-generation#system-messages), [`user`](https://platform.openai.com/docs/guides/text-generation#user-messages), or [`assistant`](https://platform.openai.com/docs/guides/text-generation#assistant-messages)
   - `content`: which is the main body of the message that conveys information or questions
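The two required keys per message can be enforced with a small builder. This is a sketch, not part of the SDK: `make_message` is hypothetical, and it deliberately accepts only the three roles listed above, even though the live API supports additional roles.

```python
# Roles described above; the live API also accepts others, which this sketch leaves out.
ALLOWED_ROLES = {"system", "user", "assistant"}


def make_message(role, content):
    """Build one Chat Completions message dict, checking the two required keys."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    if not isinstance(content, (str, list)):
        # content is either plain text or a list of typed content parts
        raise TypeError("content must be a string or a list of content parts")
    return {"role": role, "content": content}


messages = [
    make_message("system", "You are a helpful assistant."),
    make_message("user", "What is the capital of France?"),
]
print(messages)
```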


To get a better feel for the Chat Completions API, let's try to recreate a chatbot!

In [3]:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]

# See possible models here: https://platform.openai.com/docs/models#model-endpoint-compatibility
client = openai.OpenAI()
response = client.chat.completions.create(
        model= "gpt-4o-mini",
        messages=messages
    )

# Structure of response from model described here: https://platform.openai.com/docs/api-reference/chat/object
print(response.choices[0].message)

ChatCompletionMessage(content='The capital of France is Paris.', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None)


In [4]:
# Let's add this message to our dialogue and ask a follow-up question
messages.append({"role": "assistant", "content": response.choices[0].message.content})
messages.append({"role": "user", "content": "And what is the population of that city?"})
response = client.chat.completions.create(
        model= "gpt-4o-mini",
        messages=messages
    )
print(response.choices[0].message.content)


As of my last update in 2023, the population of Paris is approximately 2.1 million residents within the city limits. However, the larger metropolitan area, known as the Île-de-France region, has a population of around 12 million people. For the most current population figures, it’s always a good idea to consult the latest census data or official statistics.
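Appending every reply and question grows the message list on each turn, and the whole list is sent (and billed as input tokens) with every request. A minimal sketch of one way to bound it, assuming a simple "keep the system message plus the last N turns" policy; `add_turn` is a hypothetical helper, not an SDK function.

```python
def add_turn(messages, assistant_reply, next_user_text, max_turns=10):
    """Append the assistant reply and the next user question, then trim old turns.

    A leading system message is always kept; besides the newest user message,
    at most `max_turns` earlier user/assistant pairs are retained.
    """
    messages = messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_text},
    ]
    system = messages[:1] if messages[0]["role"] == "system" else []
    rest = messages[len(system):]
    keep = 2 * max_turns + 1  # pairs before the newest user message, plus that message
    return system + rest[-keep:]


history = [{"role": "system", "content": "You are a helpful assistant."},
           {"role": "user", "content": "What is the capital of France?"}]
history = add_turn(history, "The capital of France is Paris.",
                   "And what is the population of that city?", max_turns=1)
print([m["role"] for m in history])
```

Dropping old turns means the model forgets them; summarizing older turns into the system message is a common alternative when that matters.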


Above we passed content that was just a string of text, but with a compatible multimodal model like 4o-mini we can also pass content that is not text. Let's pass the model [this image of the Eiffel Tower](https://upload.wikimedia.org/wikipedia/commons/thumb/8/85/Tour_Eiffel_Wikimedia_Commons_%28cropped%29.jpg/250px-Tour_Eiffel_Wikimedia_Commons_%28cropped%29.jpg).

In [5]:
# Content can also be passed as a list of content parts, each with a defined type.
new_message = {"role": "user",
               "content": [
                {"type": "text", "text": "Is the item in this image within that city?"},
                {"type": "image_url",  "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/8/85/Tour_Eiffel_Wikimedia_Commons_%28cropped%29.jpg/500px-Tour_Eiffel_Wikimedia_Commons_%28cropped%29.jpg"}}
                ]
                }

messages.append(new_message)
response = client.chat.completions.create(
        model= "gpt-4o-mini",
        messages=messages
    )
print(response.choices[0].message.content)

Yes, the item in the image is located in Paris, France. It is the Eiffel Tower.
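The example uses a publicly hosted image. For images that are not reachable by URL (e.g. a local file), the `image_url` field also accepts a base64-encoded data URL. A minimal sketch; `image_part_from_bytes` and the file name in the comment are hypothetical.

```python
import base64


def image_part_from_bytes(image_bytes, mime_type="image/jpeg"):
    """Wrap raw image bytes as an `image_url` content part using a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


# With a real local file (hypothetical path):
# with open("eiffel_tower.jpg", "rb") as f:
#     part = image_part_from_bytes(f.read())
part = image_part_from_bytes(b"\xff\xd8\xff")  # first bytes of a JPEG header, for illustration
print(part["image_url"]["url"])
```

The resulting dict goes into a message's `content` list exactly where the URL-based `image_url` part went above.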


Check out [the chat completion docs](https://platform.openai.com/docs/api-reference/chat/create) to learn more about other parameters that can be specified in chat requests. Some particularly useful ones are:
- `temperature`: controls the randomness of the model’s output. Higher values make responses more creative and diverse, while lower values make them more focused and deterministic. Default: 1.
- `max_completion_tokens`: sets the upper limit on the number of tokens the model can generate for a response, including visible text and internal reasoning. This helps manage response length and token costs. Default: None.
- `response_format`: specifies the output format (e.g. {"type": "json_object"} ensures the output is valid JSON). Default: None.
- `n`: how many responses the model generates for each input. Default: 1.
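These parameters are plain keyword arguments to `client.chat.completions.create`. The sketch below assembles them in one place; `build_request` is a hypothetical helper, and the range checks mirror the documented temperature range of 0 to 2. When JSON mode is on, the messages themselves must also instruct the model to produce JSON.

```python
def build_request(messages, model="gpt-4o-mini", temperature=0.0,
                  max_completion_tokens=None, json_output=False, n=1):
    """Assemble keyword arguments for client.chat.completions.create(**kwargs)."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if n < 1:
        raise ValueError("n must be at least 1")
    kwargs = {"model": model, "messages": messages, "temperature": temperature, "n": n}
    if max_completion_tokens is not None:
        kwargs["max_completion_tokens"] = max_completion_tokens
    if json_output:
        # JSON mode: the prompt must also ask for JSON explicitly.
        kwargs["response_format"] = {"type": "json_object"}
    return kwargs


req = build_request(
    [{"role": "user", "content": "Return the capital of France as JSON with key 'capital'."}],
    max_completion_tokens=50,
    json_output=True,
)
# response = client.chat.completions.create(**req)
# data = json.loads(response.choices[0].message.content)
print(sorted(req))
```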

#### Additional References
- [OpenAI Text Generation Guide](https://platform.openai.com/docs/guides/text-generation?lang=python)

# Introduction to Prompting

Thus far we have directly asked the model to do tasks without giving it any examples; this is often called *"zero-shot prompting"*. The generalization capabilities of LLMs allow zero-shot prompting to work for a large and diverse set of tasks, but in more complex cases the model can fall short. *"Few-shot prompting"* is a technique where you provide examples in the prompt to steer the model toward better performance.
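Examples can be written inline in a single prompt (as the one-shot example later in this section does) or, as a common alternative, supplied as earlier user/assistant turns so the model sees prior answers in exactly the desired format. A sketch of the second style, using the Simvastatin example from this section; `few_shot_messages` is a hypothetical helper.

```python
def few_shot_messages(task, examples, query):
    """Encode few-shot examples as prior user/assistant turns, then append the real query."""
    messages = [{"role": "system", "content": task}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": query})
    return messages


msgs = few_shot_messages(
    task="Return medications for which the patient does not have an associated condition.",
    examples=[(
        'Medications: [("Patient 1", "Atorvastatin"), ("Patient 2", "Methotrexate"), '
        '("Patient 2", "Simvastatin")]\n'
        'Conditions: [("Patient 1", "Hyperlipidemia"), ("Patient 2", "Psoriasis")]',
        '[("Patient 2", "Simvastatin")]',
    )],
    query='Medications: [("Patient 3", "Creatine")]\nConditions: [("Patient 3", "Eczema")]',
)
print([m["role"] for m in msgs])
```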

Let's go back to our medication and condition dataset and try asking 4o-mini to solve this problem for us with zero-shot prompting through the API.

In [6]:
cm_data = [
    ("Patient 1", "Metformin"),
    ("Patient 1", "Advil"),
    ("Patient 2", "Humira"),
    ("Patient 3", "Dupixent"),
    ("Patient 3", "Creatine"),
]

mh_data = [
    ("Patient 1", "Diabetes"),
    ("Patient 1", "Chronic Asthma"),
    ("Patient 2", "Rheumatoid Arthritis"),
    ("Patient 3", "Eczema"),
]
content = f"""
        Task: Go through the following medication and condition lists and return medications for which the patient does not have an associated condition.
        
        ## Input

        Medications:
        {cm_data}

        Conditions:
        {mh_data}

        ## Output:
        """

client = openai.OpenAI()

response = client.chat.completions.create(
        model= "gpt-4o-mini",
        messages=[{"role": "user", "content": content}],
        temperature=0,
    )
print(response.choices[0].message.content)

To find the medications for which the patient does not have an associated condition, we can compare the lists of medications and conditions for each patient.

### Medications:
- Patient 1: Metformin, Advil
- Patient 2: Humira
- Patient 3: Dupixent, Creatine

### Conditions:
- Patient 1: Diabetes, Chronic Asthma
- Patient 2: Rheumatoid Arthritis
- Patient 3: Eczema

### Analysis:
- **Patient 1**:
  - Medications: Metformin, Advil
  - Conditions: Diabetes, Chronic Asthma
  - Both medications are associated with conditions.

- **Patient 2**:
  - Medications: Humira
  - Conditions: Rheumatoid Arthritis
  - Humira is associated with a condition.

- **Patient 3**:
  - Medications: Dupixent, Creatine
  - Conditions: Eczema
  - Dupixent is associated with a condition, but Creatine is not associated with any condition.

### Result:
The only medication without an associated condition is:
- Patient 3: Creatine

### Output:
[('Patient 3', 'Creatine')]


Interestingly, this answer is quite different from what we got with 4o-mini through ChatGPT earlier. This is probably because ChatGPT has its own system prompt (see https://x.com/krishnanrohit/status/1755122786014724125) and may be updated more frequently than the API model.

Regardless, the answer is excessively wordy and identifies only one of the two correct answers. Let's try *"one-shot prompting"* and give the model a single example to see if that helps.

In [7]:
content = f"""
        Task: Go through the following medication and condition lists and return medications for which the patient does not have an associated condition.

        ## Example:
        Medications:
        [("Patient 1", "Atorvastatin"), ("Patient 2", "Methotrexate"), ("Patient 2", "Simvastatin")]

        Conditions:
        [("Patient 1", "Hyperlipidemia"), ("Patient 2", "Psoriasis")]

        Expected Output:
        [("Patient 2", "Simvastatin")]

        ## Input

        Medications:
        {cm_data}

        Conditions:
        {mh_data}

        ## Output:
        """

response = client.chat.completions.create(
        model= "gpt-4o-mini",
        messages=[{"role": "user", "content": content}],
        temperature=0,
    )
print(response.choices[0].message.content)

[('Patient 1', 'Advil'), ('Patient 3', 'Creatine')]


Much better! It is generally best practice to include at least one example in your prompt for any complex task. Some best practices for adding examples to your prompt are:
- Use a clear and specific prompt format.
- Use high quality and diverse examples (e.g. if you are asking for classification use positive and negative examples).
- Iterate! See where zero-shot fails, then add examples iteratively based on failure modes.
- There is some [research](https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf) to suggest using at least 2 examples, but that in-context learning may plateau after that.
- Use tools to help you create and test prompts (e.g. Anthropic Workbench or Bedrock Prompt Management) or ask ChatGPT for help refining your prompt.
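Iterating works best when each prompt variant is scored against a known answer instead of eyeballed. A minimal sketch, assuming the reply ends with a Python-style list literal like the outputs above; `score_prediction` is a hypothetical helper that parses the last line with `ast.literal_eval` and reports precision and recall.

```python
import ast


def score_prediction(model_text, expected):
    """Return (precision, recall) of the (patient, drug) pairs listed on the reply's last line."""
    last_line = model_text.strip().splitlines()[-1]
    predicted = {tuple(item) for item in ast.literal_eval(last_line)}
    expected = set(expected)
    hits = len(predicted & expected)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(expected) if expected else 0.0
    return precision, recall


expected = {("Patient 1", "Advil"), ("Patient 3", "Creatine")}
print(score_prediction("### Output:\n[('Patient 3', 'Creatine')]", expected))  # zero-shot reply
print(score_prediction("[('Patient 1', 'Advil'), ('Patient 3', 'Creatine')]", expected))  # one-shot
```

With a handful of such labeled cases, every prompt change can be rerun and compared on the same numbers.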

### Beyond Few-Shot

#### Files & Knowledge Bases
Sometimes even high-quality examples can't get you all the way, and you might need to share additional knowledge with the model. In ChatGPT this is done through [File Uploads](https://help.openai.com/en/articles/8555545-file-uploads-faq), which allow you to upload up to 20 files (maximum size of 512 MB each) to be used for synthesis, transformation, or extraction.

In the OpenAI API you can use the [Files API](https://platform.openai.com/docs/api-reference/files) to upload and use files in [Assistants](https://platform.openai.com/docs/api-reference/assistants), [Fine-tuning](https://platform.openai.com/docs/api-reference/fine-tuning), or the Batch API. Support for files in Chat Completions was recently added, but only for [PDF files](https://platform.openai.com/docs/guides/pdf-files?api-mode=chat).

Let's look at medical coding to the MedDRA dictionary and consider some other ways we could give the model extra knowledge outside of few-shot learning.

In [8]:
# Import a small version (~30K of the ~80K total codes) of the most recent MedDRA dictionary
df = pd.read_csv("meddra_27_1_llt_small.txt", sep=" ", header=None, dtype=str)
df.columns = ["llt_code", "llt_name", "pt_code", "llt_currency"]
df.head()

|   | llt_code | llt_name | pt_code | llt_currency |
|---|---|---|---|---|
| 0 | 10000001 | Ventilation pneumonitis | 10081988 | N |
| 1 | 10000002 | 11-beta-hydroxylase deficiency | 10000002 | Y |
| 2 | 10000003 | 11-oxysteroid activity incr | 10033315 | N |
| 3 | 10000004 | 11-oxysteroid activity increased | 10033315 | Y |
| 4 | 10000005 | 17 ketosteroids urine | 10000005 | Y |


In [9]:
# Let's try to get GPT to help us find the appropriate code for recurrent cervical cancer
condition = "recurrent cervical cancer"
df[df["llt_name"].str.contains("cervical cancer", case=False)]

|   | llt_code | llt_name | pt_code | llt_currency |
|---|---|---|---|---|
| 7918 | 10008229 | Cervical cancer | 10008342 | Y |
| 7919 | 10008231 | Cervical cancer recurrent | 10008344 | Y |
| 7920 | 10008232 | Cervical cancer stage 0 | 10061809 | Y |
| 7921 | 10008233 | Cervical cancer stage I | 10008345 | Y |
| 7922 | 10008234 | Cervical cancer stage II | 10008346 | Y |
| 7923 | 10008235 | Cervical cancer stage III | 10008347 | Y |
| 7924 | 10008236 | Cervical cancer stage IV | 10008348 | Y |


In [10]:
system_message = ("You are a skilled medical coder translating natural language medical history conditions and adverse events "
                  "to codes from the MedDRA dictionary.")
input_message = f"""Determine the appropriate MedDRA LLT code for the following medical condition.
                # Example:
                ## Input: "ventilation pneumonitis"
                ## Output: "10000001"
                
                # Input:
                {condition}
                """

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": input_message}
]

response = client.chat.completions.create(
        model= "gpt-4o-mini",
        temperature=0,
        messages=messages
    )

print(response.choices[0].message.content)

The appropriate MedDRA LLT code for "recurrent cervical cancer" is "10012400".


In [11]:
# Let's check to see if that was correct
output_code = "10012400"
df[df["llt_code"] == output_code]

|   | llt_code | llt_name | pt_code | llt_currency |
|---|---|---|---|---|
| 11935 | 10012400 | Depressive disorder, not elsewhere classified | 10012378 | N |
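Checking replies by hand works for one condition, but at scale each reply should be parsed and validated automatically. A minimal sketch, assuming the answer contains a single 8-digit code (MedDRA codes are eight digits) and that the dictionary has been turned into a plain `llt_code -> llt_name` dict, e.g. `dict(zip(df["llt_code"], df["llt_name"]))`; `extract_llt_code` and `check_code` are hypothetical helpers.

```python
import re


def extract_llt_code(reply):
    """Pull the first standalone 8-digit code out of a free-text model reply."""
    match = re.search(r"\b(\d{8})\b", reply)
    return match.group(1) if match else None


def check_code(reply, llt_names):
    """Return (code, name); name is None when the code is missing from the dictionary."""
    code = extract_llt_code(reply)
    return code, llt_names.get(code)


# Tiny excerpt of the dictionary loaded above (llt_code -> llt_name).
llt_names = {
    "10008231": "Cervical cancer recurrent",
    "10012400": "Depressive disorder, not elsewhere classified",
}
reply = 'The appropriate MedDRA LLT code for "recurrent cervical cancer" is "10012400".'
print(check_code(reply, llt_names))
```

As this example shows, a code that exists in the dictionary can still be wrong, so the returned name (not just the existence of the code) needs to be compared with the input condition.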


Since the model returned the wrong code, let's instead give it the dictionary file as a source of knowledge and ask again. We could do this in ChatGPT with a file upload or, with the OpenAI API, we can use the new [Responses API](https://platform.openai.com/docs/guides/responses-vs-chat-completions) with its file search capability.

The Responses API offers a new way to interact with OpenAI models. It is similar to Chat Completions but built with agents in mind, which means it gives users access to tools that are common in agentic tasks like web search, file search, and computer use. The Responses API replaces OpenAI's previous agentic framework, [Assistants](https://platform.openai.com/docs/assistants/overview), which will be deprecated in 2026. If you are new to the OpenAI platform, it is probably wise to start with Responses instead of Chat Completions as your API for interacting with OpenAI models.

For our MedDRA problem let's use the file search tool in the Responses API, which enables models to retrieve information from a knowledge base of uploaded files. We begin by uploading a file to the Files API.

In [12]:
# Generic helper for uploading a file (from a URL or a local file path) to the Files API
import requests
from io import BytesIO

def create_file(client, file_path):
    if file_path.startswith("http://") or file_path.startswith("https://"):
        # Download the file content from the URL, failing loudly on HTTP errors
        response = requests.get(file_path, timeout=60)
        response.raise_for_status()
        file_content = BytesIO(response.content)
        file_name = file_path.split("/")[-1]
        file_tuple = (file_name, file_content)
        result = client.files.create(
            file=file_tuple,
            purpose="assistants"
        )
    else:
        # Handle local file path
        with open(file_path, "rb") as file_content:
            result = client.files.create(
                file=file_content,
                purpose="assistants"
            )
    print(result.id)
    return result.id

# Replace with your own file path or URL
# file_id = create_file(client, "meddra_27_1_llt_small.txt")

# To keep costs down, the line above has been run once and we will all use the resulting file_id below
file_id = "file-Bkod9mSgqMezv6E2eJNaeA"

<!-- The file_search tool uses the Vector Store object for storing and searching file content. Adding a file to a vector store automatically parses, chunks, embeds and stores the file in a vector database that's capable of both keyword and semantic search (more on this to come in the RAG section later). Each vector store can hold up to 10,000 files. 

For our simple example we will create a vector store, upload and add our file to the vector store, then give the vector store to the assistant. -->

In [13]:
# Create a vector store
# vector_store = client.vector_stores.create(name="knowledge_base")

# To keep costs down, the line above has been run once and we will all use the resulting vector store
vector_store_id = "vs_67d9f5c8516c8191a000cd88e1284edb"

# Add a file to a vector store
client.vector_stores.files.create(
    vector_store_id=vector_store_id,
    file_id=file_id
)

VectorStoreFile(id='file-Bkod9mSgqMezv6E2eJNaeA', created_at=1744585172, last_error=None, object='vector_store.file', status='in_progress', usage_bytes=0, vector_store_id='vs_67d9f5c8516c8191a000cd88e1284edb', attributes={}, chunking_strategy=StaticFileChunkingStrategyObject(static=StaticFileChunkingStrategy(chunk_overlap_tokens=400, max_chunk_size_tokens=800), type='static'))
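Note the `status='in_progress'` above: the file is parsed, chunked, and embedded asynchronously, and searches may miss it until processing finishes. A generic polling sketch; `wait_for_status` is a hypothetical helper, and the commented usage assumes the SDK's `client.vector_stores.files.retrieve(...)` call, whose exact signature is worth checking against your installed `openai` version.

```python
import time


def wait_for_status(fetch_status, done=("completed",), failed=("failed", "cancelled"),
                    interval=1.0, timeout=120.0):
    """Poll `fetch_status()` until it returns a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"processing ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} s")
        time.sleep(interval)


# Assumed usage with the OpenAI client:
# wait_for_status(lambda: client.vector_stores.files.retrieve(
#     file_id=file_id, vector_store_id=vector_store_id).status)
```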

In [14]:
content = f"What is the MedDRA LLT code for the following medical condition: {condition}"
response = client.responses.create(
    model="gpt-4o-mini",
    input=content,
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store_id]
    }])

print(response)

Response(id='resp_67fc4335332c81918ad11e4c951ecaea0632bb47f11cd048', created_at=1744585525.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-4o-mini-2024-07-18', object='response', output=[ResponseFileSearchToolCall(id='fs_67fc43362e788191abab9295aba6d8180632bb47f11cd048', queries=['MedDRA LLT code for recurrent cervical cancer', 'recurrent cervical cancer', 'cervical cancer MedDRA LLT code'], status='completed', type='file_search_call', results=None), ResponseOutputMessage(id='msg_67fc43393868819185ecac37b96afe3d0632bb47f11cd048', content=[ResponseOutputText(annotations=[AnnotationFileCitation(file_id='file-Bkod9mSgqMezv6E2eJNaeA', index=67, type='file_citation', filename='meddra_27_1_llt_small.txt')], text='The MedDRA LLT code for "recurrent cervical cancer" is **10008344**.', type='output_text')], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FileSearchTool(type='file_sea

When the model calls the file search tool, the response contains multiple output items:
1. A `file_search_call` output item, which contains the ID of the file search call (and the search queries it ran).
2. A `message` output item, which contains the model's response along with file citations.


In [15]:
# Text of the message output item (the second output, after the file search call)
print(response.output[1].content[0].text)

The MedDRA LLT code for "recurrent cervical cancer" is **10008344**.
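Indexing `response.output[1]` assumes the file search call always comes first and happens exactly once. A sketch that instead filters output items by their `type` field (the values `file_search_call`, `message`, and `output_text` are taken from the response printed above). The offline stand-in only mimics that shape; the openai Python SDK also offers a `response.output_text` convenience property that joins the text parts.

```python
from types import SimpleNamespace


def output_texts(response):
    """Collect text from every `message` output item, skipping tool-call items."""
    texts = []
    for item in response.output:
        if getattr(item, "type", None) != "message":
            continue  # e.g. the file_search_call item
        for part in item.content:
            if getattr(part, "type", None) == "output_text":
                texts.append(part.text)
    return texts


# Offline stand-in shaped like the response printed above (ids and other fields omitted).
fake_response = SimpleNamespace(output=[
    SimpleNamespace(type="file_search_call", queries=["recurrent cervical cancer"]),
    SimpleNamespace(type="message", content=[
        SimpleNamespace(type="output_text",
                        text='The MedDRA LLT code for "recurrent cervical cancer" is **10008344**.'),
    ]),
])
print(output_texts(fake_response))
```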


In [16]:
# Let's check to see if that was correct
output_code = "10008344"
df[df["llt_code"] == output_code]

|   | llt_code | llt_name | pt_code | llt_currency |
|---|---|---|---|---|
| 8029 | 10008344 | Cervix carcinoma recurrent | 10008344 | Y |


Using file search in the Responses API is a great precursor to understanding Retrieval-Augmented Generation (RAG) systems, a topic covered in a later module. With file search, information from your files is stored in a vector store, from which relevant passages are dynamically retrieved and fed to the model as context. This approach allows the LLM to ground its responses in factual data stored externally.

As with other RAG systems, you can tune some parameters to optimize search precision and relevance. For example, you can customize the number of results retrieved from the vector store and add metadata filtering to improve the quality of retrieved content. To explore these concepts further, check out the [OpenAI File Search Guide](https://platform.openai.com/docs/guides/tools-file-search#retrieval-customization).
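Both customizations are set on the tool definition itself. A sketch under stated assumptions: the parameter names `max_num_results`, `filters`, and the `include=["file_search_call.results"]` option follow the OpenAI File Search guide at the time of writing and should be verified against the current docs; `file_search_tool` is a hypothetical helper, and the `dictionary` attribute is invented (the file added above has `attributes={}`, so this filter would match nothing until such an attribute is set).

```python
def file_search_tool(vector_store_ids, max_num_results=None, filters=None):
    """Build a `file_search` tool definition for client.responses.create(tools=[...])."""
    tool = {"type": "file_search", "vector_store_ids": list(vector_store_ids)}
    if max_num_results is not None:
        tool["max_num_results"] = max_num_results  # fewer chunks means a smaller, cheaper context
    if filters is not None:
        tool["filters"] = filters  # filter on attributes attached to files in the store
    return tool


tool = file_search_tool(
    ["vs_67d9f5c8516c8191a000cd88e1284edb"],
    max_num_results=5,
    # Hypothetical attribute: only matches files uploaded with {"dictionary": "meddra"}.
    filters={"type": "eq", "key": "dictionary", "value": "meddra"},
)
# response = client.responses.create(
#     model="gpt-4o-mini",
#     input=content,
#     tools=[tool],
#     include=["file_search_call.results"],  # also return the retrieved chunks
# )
print(tool)
```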

#### Fine-tuning
Sometimes just passing a file or knowledge base to the model may not be sufficient, and in *some* of these cases fine-tuning can help. Fine-tuning is the process of adapting an existing model to perform better for a specific task by training it on a task-specific dataset. This process requires a large, well-structured dataset and usually a **significant amount of time, effort, and cost**. It is always recommended to first explore prompt engineering, integrating knowledge bases, and/or leveraging function calling to achieve desired results before considering fine-tuning.

Some common use cases where fine-tuning can improve results: 
- Handling many edge cases in specific ways
- Setting the style, tone, format, or other qualitative aspects in ways that are hard to articulate in a prompt
- Improving reliability at producing a desired output

For some long-term projects using LLMs, fine-tuning can reduce cost and/or latency associated with repeatedly sending lengthy prompts. In these cases the upfront cost of fine-tuning and hosting a fine-tuned model can pay off over time. An increasingly common trend is to replace usage of larger LLMs (like gpt-4o) with a fine-tuned smaller model (like gpt-4o-mini) to achieve cost savings and maintain or improve performance.

Returning to our medical coding example, consider that companies often accumulate large datasets of historical coding decisions. These datasets often contain the most challenging cases for medical coding (e.g. where conditions are represented differently than in the dictionary and may contain abbreviations and/or specialized terminology). In this context, it might be useful to fine-tune a smaller model on the historical coding dataset to create a company-specific medical coding model that makes decisions similar to past coding decisions.
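Chat-model fine-tuning data is a JSONL file where each line is a JSON object with a `messages` list in the same format as a Chat Completions request, ending with the desired assistant answer. A sketch of converting coding history into that shape; the verbatim terms are hypothetical examples (the codes and names come from the dictionary excerpts above), and `to_chat_example` is a hypothetical helper.

```python
import json

# Hypothetical historical coding decisions: (verbatim term as recorded, LLT code, LLT name).
coding_history = [
    ("recurrent cervical ca", "10008231", "Cervical cancer recurrent"),
    ("11b-hydroxylase def", "10000002", "11-beta-hydroxylase deficiency"),
]

SYSTEM = ("You are a skilled medical coder translating natural language medical history "
          "conditions and adverse events to codes from the MedDRA dictionary.")


def to_chat_example(verbatim, code, name):
    """One training example in chat fine-tuning format: a JSON object with a `messages` list."""
    return {"messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": verbatim},
        {"role": "assistant", "content": f"{code} ({name})"},
    ]}


jsonl = "\n".join(json.dumps(to_chat_example(*row)) for row in coding_history)
# with open("coding_train.jsonl", "w") as f:  # hypothetical output file
#     f.write(jsonl + "\n")
print(jsonl.splitlines()[0])
```

A real training set would need far more (and more varied) examples than this, plus a held-out set for checking the fine-tuned model against past decisions.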

Because most of us are using LLMs for proof-of-concept and shorter-term projects, we won't cover the specifics of fine-tuning with the OpenAI API here, but refer to the [OpenAI fine-tuning docs](https://platform.openai.com/docs/guides/fine-tuning) to learn more.