<a href="https://colab.research.google.com/github/statgen/EPID731_2024/blob/main/EPID731_Medication_Classification_with_OpenAI_GPT_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Harmonizing Medication Data with OpenAI GPT: Workshop

Welcome to the workshop on using OpenAI GPT for harmonizing medication data. In this workshop, we will:
1. Set up the environment in Google Colab so we can use the OpenAI API.
2. Develop a powerful prompt to classify medications.
3. Explore various parameters of the API that influence the model's performance.


# Before you start: Create a copy of this notebook

The original notebook is read-only, so please follow these steps to get a copy you can modify and interact with:

1. Go to the **File** menu in the top left corner and select **Save a copy in Drive**.  
(If you can't see the File menu, click the **^** button in the top right corner.)
2. Close the tab with the original file `EPID731_Medication_Classification_with_OpenAI_GPT_API.ipynb`.
3. Open and follow the exercise in your new copy, named  
`Copy of EPID731_Medication_Classification_with_OpenAI_GPT_API.ipynb`, which is now saved in your Google Drive.


# Setup

## Setting Up the Environment

1. Configure our OpenAI API key
2. Install the necessary libraries in Python & download the main function and examples



### 1. Configure API Key

We will use the Secrets feature in Google Colab to securely store and access our API key.

1. Click on the **key icon &#128273;** on the left sidebar of the Colab notebook.
2. Add a new secret named `openai_api_key` with your OpenAI API key that you received by email as the value.
3. Toggle the Notebook access from &#10008; to &#10004;
4. Use the following code to access the secret in your notebook.

In [None]:
from google.colab import userdata

# Access the OpenAI API key from secrets
api_key = userdata.get('openai_api_key')

# Verify if the API key is fetched correctly
if api_key:
    print("API key fetched successfully!")
else:
    print("Failed to fetch API key. Please ensure it is set correctly.")

### 2. Install the necessary libraries in Python & download the main script and the prepared examples

In [None]:
# Install necessary python libraries
!pip install openai pandas configparser
!pip install tiktoken --only-binary :all:

import configparser
import openai
import os
from tiktoken import get_encoding

# Initialize the tokenizer for GPT-2
tokenizer = get_encoding("gpt2")

# Step 1: Clone the repository unless it already exists
if not os.path.exists('/content/EPID731_2024'):
    !git clone https://github.com/statgen/EPID731_2024.git
    print("Repository cloned.")

# Step 2: Change directory to the cloned repository if not already in it
target_directory = '/content/EPID731_2024/Day4'
if os.getcwd() != target_directory:
    os.chdir(target_directory)
    print(f"Changed directory to {target_directory}")

# Step 3: Add the scripts directory to the Python path
import sys
sys.path.append('/content/EPID731_2024/Day4/scripts')

# Step 4: Execute the script to load its functions so we can use them below
exec(open("/content/EPID731_2024/Day4/scripts/gpt_line_processor.py").read())


# Using the OpenAI API

Now, we should have everything in place to interact with the OpenAI GPT.

## Check the connection to the OpenAI API

In this example, we will check if we can access the OpenAI API and list the available models.

In [None]:
from openai import OpenAI
import os

client = OpenAI(
    api_key = api_key
)

# Obtain list of available models
models = client.models.list()

# Print the list of models in a nicer format
print("Available Models:")
for model in models.data:
    print(f"Model ID: {model.id}")



The table below compares the prices of four different models (in USD per token):

| Model                  | Input Price per Token | Output Price per Token |
|------------------------|-----------------------|------------------------|
| gpt-3.5-turbo-0125     | 0.0000005             | 0.0000015              |
| gpt-4-turbo-2024-04-09 | 0.00001               | 0.00003                |
| gpt-4o-2024-05-13      | 0.000005              | 0.000015               |
| gpt-4o-mini-2024-07-18 | 0.00000015            | 0.0000006              |
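
To make the table concrete, the per-token prices can be turned into a rough cost estimate. The sketch below is only an illustration: the helper `estimate_cost` and the token counts are assumptions, and the prices are copied from the table above and may be out of date.

```python
# Per-token prices in USD, copied from the table above (may be outdated).
PRICES = {
    "gpt-3.5-turbo-0125": {"input": 0.0000005, "output": 0.0000015},
    "gpt-4-turbo-2024-04-09": {"input": 0.00001, "output": 0.00003},
    "gpt-4o-2024-05-13": {"input": 0.000005, "output": 0.000015},
    "gpt-4o-mini-2024-07-18": {"input": 0.00000015, "output": 0.0000006},
}

def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD of one request (hypothetical helper)."""
    price = PRICES[model]
    return input_tokens * price["input"] + output_tokens * price["output"]

# Assumed workload: 1,000 requests with 200 input and 100 output tokens each.
for model in PRICES:
    total = 1000 * estimate_cost(model, input_tokens=200, output_tokens=100)
    print(f"{model}: ${total:.4f}")
```

The actual token counts of a request are reported in the API response, so real costs can differ from such an estimate.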


## Check the Chat Completion function

Next, we will try the Chat Completion function of the OpenAI API.

In [None]:
# Reuse the `client` created in the previous cell; it already holds the API key

# Define the system and user messages
system_message = "You are an exceptionally funny stand-up comedian."
user_message = "Tell me a new joke that is based on observational humor."

# Call the OpenAI GPT model
response = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message}
    ],
    temperature=0.7,
    max_tokens=150,
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0
)

# Print the response from the model
output = response.choices[0].message.content.strip()
print("Response:", output)

## Prompt Engineering Exercise

In this exercise, you will explore how to use and modify system prompts and parameters to interact with OpenAI's GPT model. The goal is to understand how different prompts and settings influence the responses generated by the model.

### System Messages:

#### Basic Prompt (`system_prompt_1.txt`)
```
Be a helpful assistant.
```

#### Simple Prompt with Instructions (`system_prompt_2.txt`)
```
Classify the medication by providing its generic name, primary use, and its pharmacological classification.
```

#### Simple Prompt with Instructions and additional Context (`system_prompt_3.txt`)
```
Classify the medication by providing its generic name, primary use, and its pharmacological classification. Ensure to align the classification with the following blood pressure medication classes: Diuretic, Beta-blocker, ACE inhibitor, Angiotensin II receptor blocker, Calcium channel blocker, Alpha-blocker, Alpha-2 receptor agonist, Combined alpha and beta-blocker, and Vasodilator.
```

#### Further Prompts:
You can find the additional prompt examples  (`system_prompt_4.txt` to  `system_prompt_9.txt`) here: `/content/EPID731_2024/Day4/prompts`


### Try these system prompts in the code cell below.

You can try various "medications":
- Hydrochlorothiazide
- Hygroton
- Diuril
- Hyzaar
- Augmentin  

Or even try nonsensical input like:
- Cardioventrix
- Nutella

### Instructions for Experimentation:

1. **Modify the System Message:**
   - Change `system_message = "Be a helpful assistant."` to any of the provided examples or create your own.
   - Example: `system_message = "Classify the medication by providing its generic name, primary use, and its pharmacological classification."`

2. **Experiment with Different Parameters:**
   - Adjust the `temperature` parameter to see how it affects the creativity of the responses.
     - Higher values (closer to 1) make the output more random.
     - Lower values (closer to 0) make the output more focused.
   - Change the `max_tokens` to control the length of the response.
   - Modify `top_p`, `frequency_penalty`, and `presence_penalty` to see their effects on the response.

3. **Try Different User Messages:**
   - Replace `user_message` with different medications from the list or create your own.
   - Example: `user_message = "Classify the medication Hydrochlorothiazide"`

Feel free to explore and modify these parameters to see how the model's responses change. Happy experimenting!
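
To build intuition for `temperature`, the sketch below shows the general idea behind temperature sampling: the model's raw scores (logits) are divided by the temperature before they are converted to probabilities. The three scores are made-up numbers, and `softmax_with_temperature` is an illustrative helper, not part of the OpenAI API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next tokens (assumption for illustration).
logits = [2.0, 1.0, 0.5]

for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Lower temperatures concentrate the probability on the highest-scoring token, while higher temperatures spread it across the alternatives. A temperature of exactly 0 cannot be plugged into this formula; it is usually handled as a special case that picks the highest-scoring token.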


In [None]:
# Reuse the `client` created earlier; it already holds the API key

# Define the system and user messages
system_message = "Be a helpful assistant."
user_message = "Classify the medication Hydrochlorothiazide"
# Call the OpenAI GPT model
response = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message}
    ],
    temperature=0.7,
    max_tokens=150,
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0
)

# Print the response from the model
output = response.choices[0].message.content.strip()
print("Response:", output)

## The Main Script `gpt_line_processor.py`

To make things easier and more accessible, this script automates the interaction with OpenAI's GPT models. It is designed to be user-friendly, even for those who may not be familiar with Python, coding, or APIs.

### Overview

This notebook demonstrates how to use a script for processing lines using various models from OpenAI, as specified in configuration files. The script, `gpt_line_processor.py`, simplifies the process by handling all the technical details for you.

### What Does the Script Do?

The script sets up and runs a main function called `process_lines(config_file, system_prompt_file, user_prompt_file, input_file, output_location, file_prefix)`. This function does the following:

- **Reads the Configuration File:** This file contains settings for different OpenAI models.
- **Reads the Input Files:** These files contain the lines of text (e.g., medication names) you want to process.
- **Reads the Prompt Files:** These files contain instructions for the GPT models on how to respond.
- **Submits API Requests:** For each line in the input file, the script submits an API request to the specified OpenAI models. This means if you have 10 lines and 4 models, the script will make 40 API requests.
- **Processes the Responses:** It processes the responses from the GPT models and organizes them.
- **Saves the Results:** The results are saved to a specified output file in either text or CSV format.
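
The request count follows from a nested loop over input lines and models. The sketch below is not the actual code of `gpt_line_processor.py`; it only illustrates the idea. The helper `build_requests` and the placeholder input lines are assumptions, and the user prompt template is the one quoted in Example 1.

```python
def build_requests(input_lines, models, system_prompt, user_template):
    """Return one chat request (as a dict) per input line and model."""
    requests = []
    for line in input_lines:
        for model in models:
            requests.append({
                "model": model,
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_template.format(input_line=line)},
                ],
            })
    return requests

models = ["gpt-3.5-turbo-0125", "gpt-4-turbo-2024-04-09",
          "gpt-4o-2024-05-13", "gpt-4o-mini-2024-07-18"]
lines = [f"medication_{i}" for i in range(10)]  # placeholder input lines

requests = build_requests(
    lines, models,
    system_prompt="Be a helpful assistant.",
    user_template="Please classify the following medication: {input_line}",
)
print(len(requests))  # 10 lines x 4 models = 40 requests
```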

### How to Use the Script

This script is designed to be as simple as possible. You don't need to write any code or understand how the API works. Just follow these steps:

1. **Ensure All Files Are in Place:**
   - **Configuration Files:** These files have settings for different OpenAI models. These should be in the `configs/` folder. Example paths:
     - `configs/config_example_4models.ini`
   - **Input Files:** These files have the data (lines of text) that you want to process. These should be in the `inputs/` folder. Example path:
     - `inputs/medication_example_5meds.txt`
   - **Prompt Files:** These files contain instructions for the GPT models to guide their responses. These should be in the `prompts/` folder. Example paths:
     - `prompts/system_prompt_1.txt`
     - `prompts/user_prompt.txt`

2. **Run the Script:**
   - The script will read your configuration, input, and prompt files.
   - It will send the input lines to the GPT models, using the prompts to guide their responses.
   - The responses will be saved in an output file, either as text or CSV.

3. **View Your Results:**
   - Open the output file to see the results. The script organizes everything neatly, so it's easy to read and analyze.

## Example 1: Classify a Medication

You can call the main function from the script, with example files that are already provided in this Colab notebook.

Here we classify a single medication, `chlorothiazide [Diuril]`, using four models (`gpt-3.5-turbo-1106`, `gpt-4-turbo-2024-04-09`, `gpt-4o-2024-05-13`, and `gpt-4o-mini-2024-07-18`) with default settings.  
The user prompt is `Please classify the following medication: {input_line}` and the system prompt is `Classify the medication.`  
The results are written to a text file.

In [None]:
process_lines(
    config_file='/content/EPID731_2024/Day4/configs/config_example_4models.ini',
    system_prompt_file='/content/EPID731_2024/Day4/prompts/system_prompt_1.txt',
    user_prompt_file='/content/EPID731_2024/Day4/prompts/user_prompt.txt',
    input_file='/content/EPID731_2024/Day4/inputs/medication_example_1med.txt',
    output_location='/content/GPT_Outputs',
    file_prefix='Example1_prompt1_1med'
)


### Finding the Output File

After running the script, you can find the output file as follows:

1. **Open the File Explorer in Colab**: Click the folder icon on the left sidebar.
2. **Refresh the File List**: Click the refresh icon at the top of the file explorer to update the file list.
3. **Navigate to `GPT_Outputs`**: Open the `GPT_Outputs` folder (`/content/GPT_Outputs`).
4. **Locate the Output File**: Look for the file named `Example1_prompt1_1med_results.txt` (`/content/GPT_Outputs/Example1_prompt1_1med_results.txt`).

This file contains the classification results for the medication "chlorothiazide [Diuril]" using the specified models. Open this file to view the detailed results.

We can also change the output format to CSV by specifying it in the config file:
`/content/EPID731_2024/Day4/configs/config_example_4models.ini`

Find the section:
```
[output_format]
# Specify the format as 'text' or 'csv'
format = text
```

and change it to:
```
[output_format]
# Specify the format as 'text' or 'csv'
format = csv
```

Then rerun the code cell above. There should be a new CSV file:
`/content/GPT_Outputs/Example1_prompt1_1med_results.csv`

## Example 2: Turning up the temperature

The `process_lines()` function works and creates easy-to-parse output.
Let's use it to see what happens if we modify the temperature.

We will use a different medication input file which lists the medication `Diuril` five times, so that we'll have 5 API requests.
We'll only specify one model `gpt-4o-2024-05-13` and set the temperature to `0`.
We will use the CSV output for increased readability.

The temperature is set via the config file in the section `response_settings`:

```
[response_settings]
temperature = 0
max_tokens = 120
top_p = 1.0
frequency_penalty = 0.0
presence_penalty = 0.0
```
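
The setup cell imports `configparser`, so the script presumably reads these settings with it; that is an assumption, not something confirmed here. As a minimal sketch, the section above can be parsed from a string like this:

```python
import configparser

# Same content as the [response_settings] section shown above.
config_text = """
[response_settings]
temperature = 0
max_tokens = 120
top_p = 1.0
frequency_penalty = 0.0
presence_penalty = 0.0
"""

config = configparser.ConfigParser()
config.read_string(config_text)

# Values are stored as strings; getfloat/getint convert them.
settings = config["response_settings"]
temperature = settings.getfloat("temperature")
max_tokens = settings.getint("max_tokens")
print(temperature, max_tokens)
```

To read an actual file instead of a string, `config.read(path)` takes the path to the `.ini` file.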

In [None]:
process_lines(
    config_file='/content/EPID731_2024/Day4/configs/config_example_low_temperature.ini',
    system_prompt_file='/content/EPID731_2024/Day4/prompts/system_prompt_1.txt',
    user_prompt_file='/content/EPID731_2024/Day4/prompts/user_prompt.txt',
    input_file='/content/EPID731_2024/Day4/inputs/medication_example_1med_x5.txt',
    output_location='/content/GPT_Outputs',
    file_prefix='Example2_prompt1_1med_lowtemp'
)

You can find the results here: `/content/GPT_Outputs/Example2_prompt1_1med_lowtemp_results.csv`

The output from each of the 5 API calls is very similar, if not identical.




Let's increase the temperature to 1.

In [None]:
process_lines(
    config_file='/content/EPID731_2024/Day4/configs/config_example_high_temperature.ini',
    system_prompt_file='/content/EPID731_2024/Day4/prompts/system_prompt_1.txt',
    user_prompt_file='/content/EPID731_2024/Day4/prompts/user_prompt.txt',
    input_file='/content/EPID731_2024/Day4/inputs/medication_example_1med_x5.txt',
    output_location='/content/GPT_Outputs',
    file_prefix='Example2_prompt1_1med_hightemp'
)

You can find the results here: `/content/GPT_Outputs/Example2_prompt1_1med_hightemp_results.csv`

How does the output compare across API calls?
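
One way to answer this question is to count how many distinct response texts each run produced. The sketch below assumes the responses have already been copied into a Python list; the strings are placeholders rather than real model output, and `summarize_variation` is a hypothetical helper.

```python
from collections import Counter

def summarize_variation(responses):
    """Count how often each distinct response text occurs."""
    # Normalize whitespace and case so trivial differences are ignored.
    normalized = [" ".join(r.split()).lower() for r in responses]
    return Counter(normalized)

# Placeholder strings standing in for the five responses of one run.
responses = ["Response A", "response a", "Response B", "Response A", "Response C"]

counts = summarize_variation(responses)
print(f"{len(counts)} distinct responses out of {len(responses)}")
for text, n in counts.most_common():
    print(n, text)
```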

## Example 3: Prompt Development

We will use a different medication input file that lists 10 lines: 5 medications, each with one line for the generic name and one line for the brand name:

```
Amiodarone Hydrochloride
Pacerone
Disopyramide
Norpace
Chlorothiazide
Diuril
Propafenone Hydrochloride
Rythmol
Losartan and Hydrochlorothiazide
Hyzaar
```

The model of choice is `gpt-4o-2024-05-13`, the temperature is set to `0`, and the output format is `csv` for increased readability.

### Example 3.1: Simple Prompt

```
Classify the medication.
```


In [None]:
process_lines(
    config_file='/content/EPID731_2024/Day4/configs/config_example_low_temperature.ini',
    system_prompt_file='/content/EPID731_2024/Day4/prompts/system_prompt_1.txt',
    user_prompt_file='/content/EPID731_2024/Day4/prompts/user_prompt.txt',
    input_file='/content/EPID731_2024/Day4/inputs/medication_example_5meds.txt',
    output_location='/content/GPT_Outputs',
    file_prefix='Example3_prompt1_5meds'
)


Output: `/content/GPT_Outputs/Example3_prompt1_5meds_results.csv`

### Example 3.2: Prompt with **Instructions**

```
Classify the medication by providing its generic name, primary use, and its pharmacological classification.
```


In [None]:
process_lines(
    config_file='/content/EPID731_2024/Day4/configs/config_example_low_temperature.ini',
    system_prompt_file='/content/EPID731_2024/Day4/prompts/system_prompt_2.txt',
    user_prompt_file='/content/EPID731_2024/Day4/prompts/user_prompt.txt',
    input_file='/content/EPID731_2024/Day4/inputs/medication_example_5meds.txt',
    output_location='/content/GPT_Outputs',
    file_prefix='Example3_prompt2_5meds'
)

Output: `/content/GPT_Outputs/Example3_prompt2_5meds_results.csv`

### Example 3.3: Additional Context
Simple Prompt with
- Instructions
- **Additional Context**

```
Classify the medication by providing its generic name, primary use, and its pharmacological classification.
Ensure to align the classification with the following blood pressure medication classes:
Diuretic, Beta-blocker, ACE inhibitor, Angiotensin II receptor blocker, Calcium channel blocker, Alpha-blocker,
Alpha-2 receptor agonist, Combined alpha and beta-blocker, and Vasodilator.
```

In [None]:
process_lines(
    config_file='/content/EPID731_2024/Day4/configs/config_example_low_temperature.ini',
    system_prompt_file='/content/EPID731_2024/Day4/prompts/system_prompt_3.txt',
    user_prompt_file='/content/EPID731_2024/Day4/prompts/user_prompt.txt',
    input_file='/content/EPID731_2024/Day4/inputs/medication_example_5meds.txt',
    output_location='/content/GPT_Outputs',
    file_prefix='Example3_prompt3_5meds'
)

Output: `/content/GPT_Outputs/Example3_prompt3_5meds_results.csv`

### Example 3.4: Chain-of-Thought
Structured Prompt with
- **Step-by-Step Instructions**
- Additional Context

```
Classify the medication by following these steps:

1. Identify and provide its generic name.
2. Determine its primary use.
3. Classify it according to the following blood pressure medication classes:
- Diuretic
- Beta-blocker
- ACE inhibitor
- Angiotensin II receptor blocker
- Calcium channel blocker
- Alpha-blocker
- Alpha-2 receptor agonist
- Combined alpha and beta-blocker
- Vasodilator
4. Review the classification to ensure it is accurate and complete.
5. Provide the final, refined answer concisely without detailing the step-by-step process.

```

In [None]:
process_lines(
    config_file='/content/EPID731_2024/Day4/configs/config_example_low_temperature.ini',
    system_prompt_file='/content/EPID731_2024/Day4/prompts/system_prompt_4.txt',
    user_prompt_file='/content/EPID731_2024/Day4/prompts/user_prompt.txt',
    input_file='/content/EPID731_2024/Day4/inputs/medication_example_5meds.txt',
    output_location='/content/GPT_Outputs',
    file_prefix='Example3_prompt4_5meds'
)

Output: `/content/GPT_Outputs/Example3_prompt4_5meds_results.csv`

### Example 3.5: Example Output Format
Structured Prompt with
- Step-by-Step Instructions
- Additional Context
- **Example Output Format**

```
Classify the medication by following these steps:

1. Identify and provide its generic name.
2. Determine its primary use.
3. Classify it according to the following blood pressure medication classes:
- Diuretic
- Beta-blocker
- ACE inhibitor
- Angiotensin II receptor blocker
- Calcium channel blocker
- Alpha-blocker
- Alpha-2 receptor agonist
- Combined alpha and beta-blocker
- Vasodilator
4. Review the classification to ensure it is accurate and complete.
5. Provide the final, refined answer concisely without detailing the step-by-step process.

Present the results in the following text format for easy readability and parsability:

Generic Name: [Generic Name]
Primary Use: [Primary Use]
Classification: [Classification]
```

In [None]:
process_lines(
    config_file='/content/EPID731_2024/Day4/configs/config_example_low_temperature.ini',
    system_prompt_file='/content/EPID731_2024/Day4/prompts/system_prompt_5.txt',
    user_prompt_file='/content/EPID731_2024/Day4/prompts/user_prompt.txt',
    input_file='/content/EPID731_2024/Day4/inputs/medication_example_5meds.txt',
    output_location='/content/GPT_Outputs',
    file_prefix='Example3_prompt5_5meds'
)

Output: `/content/GPT_Outputs/Example3_prompt5_5meds_results.csv`
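
A fixed output format pays off when the responses are processed further. As a minimal sketch, assuming the model follows the requested three-line format, the response can be split into fields as shown below. The helper `parse_classification` is hypothetical, and the example text is hand-written rather than real model output.

```python
def parse_classification(response_text):
    """Parse 'Key: value' lines into a dict; missing fields stay None."""
    fields = {"Generic Name": None, "Primary Use": None, "Classification": None}
    for line in response_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in fields:
            fields[key] = value.strip()
    return fields

# Hand-written example in the requested format (not real model output).
example = """Generic Name: Chlorothiazide
Primary Use: Hypertension
Classification: Diuretic"""

print(parse_classification(example))
```

Fields that remain `None` indicate a response that did not follow the format and may need manual review.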

### Example 3.6: Fallback Options
Structured Prompt with
- Step-by-Step Instructions
- Additional Context
- Example Output Format
- **Fallback Classification**

```
Classify the medication by following these steps:

1. Identify and provide its generic name.
2. Determine its primary use.
3. Classify it according to the following blood pressure medication classes:
- Diuretic
- Beta-blocker
- ACE inhibitor
- Angiotensin II receptor blocker
- Calcium channel blocker
- Alpha-blocker
- Alpha-2 receptor agonist
- Combined alpha and beta-blocker
- Vasodilator
4. If the medication does not fit the listed categories, classify it as Other.
5. If the medication is unknown, classify it as Unknown.
6. Review the classification to ensure it is accurate and complete.
7. Provide the final, refined answer concisely without detailing the step-by-step process.

Present the results in the following text format for easy readability and parsability:

Generic Name: [Generic Name or Unknown]
Primary Use: [Primary Use or Unknown]
Classification: [Classification or Other or Unknown]

```


In [None]:
process_lines(
    config_file='/content/EPID731_2024/Day4/configs/config_example_low_temperature.ini',
    system_prompt_file='/content/EPID731_2024/Day4/prompts/system_prompt_6.txt',
    user_prompt_file='/content/EPID731_2024/Day4/prompts/user_prompt.txt',
    input_file='/content/EPID731_2024/Day4/inputs/medication_example_5meds.txt',
    output_location='/content/GPT_Outputs',
    file_prefix='Example3_prompt6_5meds'
)

Output: `/content/GPT_Outputs/Example3_prompt6_5meds_results.csv`

### Example 3.7: Multiple Outcomes
Structured Prompt with
- Step-by-Step Instructions
- Additional Context
- Example Output Format
- Fallback Classification
- **Allow Multiple Outcomes**

```
Classify the medication by following these steps:

1. Identify and provide its generic name(s), comma-separated.
2. Determine its primary use(s), comma-separated.
3. Classify it according to the following blood pressure medication classes:
- Diuretic
- Beta-blocker
- ACE inhibitor
- Angiotensin II receptor blocker
- Calcium channel blocker
- Alpha-blocker
- Alpha-2 receptor agonist
- Combined alpha and beta-blocker
- Vasodilator
4. If the medication does not fit the listed categories, classify it as Other.
5. If the medication is unknown, classify it as Unknown.
6. If the medication fits into multiple classes, report all comma-separated.
7. Review the classification to ensure it is accurate and complete.
8. Provide the final, refined answer concisely without detailing the step-by-step process.

Present the results in the following text format for easy readability and parsability:

Generic Name: [Generic Name(s) or Unknown]
Primary Use: [Primary Use(s) or Unknown]
Classification: [Classification(s) or Other or Unknown]

```

In [None]:
process_lines(
    config_file='/content/EPID731_2024/Day4/configs/config_example_low_temperature.ini',
    system_prompt_file='/content/EPID731_2024/Day4/prompts/system_prompt_7.txt',
    user_prompt_file='/content/EPID731_2024/Day4/prompts/user_prompt.txt',
    input_file='/content/EPID731_2024/Day4/inputs/medication_example_5meds.txt',
    output_location='/content/GPT_Outputs',
    file_prefix='Example3_prompt7_5meds'
)

Output: `/content/GPT_Outputs/Example3_prompt7_5meds_results.csv`
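
Comma-separated classifications can be split and checked against the classes listed in the prompt. The sketch below is an illustration under the assumption that the model reports the labels exactly as spelled in the prompt; `split_classes` is a hypothetical helper and the input string is hand-written.

```python
# Classes and fallback labels taken from the prompt above.
ALLOWED_CLASSES = {
    "Diuretic", "Beta-blocker", "ACE inhibitor",
    "Angiotensin II receptor blocker", "Calcium channel blocker",
    "Alpha-blocker", "Alpha-2 receptor agonist",
    "Combined alpha and beta-blocker", "Vasodilator",
    "Other", "Unknown",
}

def split_classes(classification):
    """Split a comma-separated classification into known and unexpected labels."""
    labels = [part.strip() for part in classification.split(",") if part.strip()]
    known = [label for label in labels if label in ALLOWED_CLASSES]
    unexpected = [label for label in labels if label not in ALLOWED_CLASSES]
    return known, unexpected

# Hand-written example value for a combination product (not real model output).
known, unexpected = split_classes("Angiotensin II receptor blocker, Diuretic")
print(known, unexpected)
```

Labels that end up in `unexpected` show where the model deviated from the allowed list.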

### Example 3.8: Defined Role
Structured Prompt with
- Step-by-Step Instructions
- Additional Context
- Example Output Format
- Fallback Classification
- Allow Multiple Outcomes
- **Defined Role**

```
As a pharmaceutical expert with extensive knowledge in medication classification and pharmacology,
classify the medication by following these steps:

1. Identify and provide its generic name(s), comma-separated.
2. Determine its primary use(s), comma-separated.
3. Classify it according to the following blood pressure medication classes:
- Diuretic
- Beta-blocker
- ACE inhibitor
- Angiotensin II receptor blocker
- Calcium channel blocker
- Alpha-blocker
- Alpha-2 receptor agonist
- Combined alpha and beta-blocker
- Vasodilator
4. If the medication does not fit the listed categories, classify it as Other.
5. If the medication is unknown, classify it as Unknown.
6. If the medication fits into multiple classes, report all comma-separated.
7. Review the classification to ensure it is accurate and complete.
8. Provide the final, refined answer concisely without detailing the step-by-step process.

Present the results in the following text format for easy readability and parsability:

Generic Name: [Generic Name(s) or Unknown]
Primary Use: [Primary Use(s) or Unknown]
Classification: [Classification(s) or Other or Unknown]

```

In [None]:
process_lines(
    config_file='/content/EPID731_2024/Day4/configs/config_example_low_temperature.ini',
    system_prompt_file='/content/EPID731_2024/Day4/prompts/system_prompt_8.txt',
    user_prompt_file='/content/EPID731_2024/Day4/prompts/user_prompt.txt',
    input_file='/content/EPID731_2024/Day4/inputs/medication_example_5meds.txt',
    output_location='/content/GPT_Outputs',
    file_prefix='Example3_prompt8_5meds'
)

Output: `/content/GPT_Outputs/Example3_prompt8_5meds_results.csv`

### Example 3.9: Formatted
Structured Prompt with
- Step-by-Step Instructions
- Additional Context
- Example Output Format
- Fallback Classification
- Allow Multiple Outcomes
- Defined Role
- **Formatted**

```
## Role
Pharmaceutical Expert with extensive knowledge in medication classification and pharmacology

## Objective
Classify a medication by identifying and providing its generic name(s), determining its primary use(s), and classifying it according to the listed blood pressure medication classes.

## Chain-of-Thought
1. Identify and provide its generic name(s), comma-separated.
2. Determine its primary use(s), comma-separated.
3. Classify it according to the following blood pressure medication classes:
   - Diuretic
   - Beta-blocker
   - ACE inhibitor
   - Angiotensin II receptor blocker
   - Calcium channel blocker
   - Alpha-blocker
   - Alpha-2 receptor agonist
   - Combined alpha and beta-blocker
   - Vasodilator
4. If the medication does not fit the listed categories, classify it as Other.
5. If the medication is unknown, classify it as Unknown.
6. If the medication fits into multiple classes, report all comma-separated.
7. Review the classification to ensure it is accurate and complete.
8. Provide the final, refined answer concisely without detailing the step-by-step process.

## Output Format
Generic Name: [Generic Name(s) or Unknown]  
Primary Use: [Primary Use(s) or Unknown]  
Classification: [Classification(s) or Other or Unknown]

```


We'll also measure the time the function takes to process the 10 input lines (5 medications):

In [None]:
exec(open("/content/EPID731_2024/Day4/scripts/gpt_line_processor.py").read())


In [None]:
import time

# Record the start time
start_time = time.time()

process_lines(
    config_file='/content/EPID731_2024/Day4/configs/config_example_low_temperature.ini',
    system_prompt_file='/content/EPID731_2024/Day4/prompts/system_prompt_9.txt',
    user_prompt_file='/content/EPID731_2024/Day4/prompts/user_prompt.txt',
    input_file='/content/EPID731_2024/Day4/inputs/medication_example_5meds.txt',
    output_location='/content/GPT_Outputs',
    file_prefix='Example3_prompt9_5meds'
)

# Calculate and display the elapsed time
elapsed_time = time.time() - start_time
print(f"Total time taken for processing: {elapsed_time:.2f} seconds.")

Output: `/content/GPT_Outputs/Example3_prompt9_5meds_results.csv`

# Example 4: Asynchronous API Calls for Medication Classification

In this advanced exercise, we will classify 158 known blood pressure medications using asynchronous API calls. Asynchronous calls let multiple requests be processed concurrently rather than one after another, which shortens the total processing time for large datasets.


## Script Overview

The script (`gpt_process_batches.py`) sends requests in batches. Unlike the synchronous processing used so far, it sends multiple requests at the same time, which shortens the total processing time.
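
The general pattern can be sketched with Python's `asyncio`. This is not the code of `gpt_process_batches.py`: the API call is replaced by a stand-in that only sleeps, the names `fake_api_call`, `chunked`, and `classify_all` are hypothetical, and `chunk_size` is assumed to mean the number of requests sent concurrently.

```python
import asyncio

async def fake_api_call(medication):
    """Stand-in for one API request; sleeps instead of calling OpenAI."""
    await asyncio.sleep(0.01)
    return f"classified: {medication}"

def chunked(items, chunk_size):
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]

async def classify_all(medications, chunk_size):
    """Run the requests of each chunk concurrently, one chunk after another."""
    results = []
    for chunk in chunked(medications, chunk_size):
        results.extend(await asyncio.gather(*(fake_api_call(m) for m in chunk)))
    return results

medications = [f"medication_{i}" for i in range(25)]  # placeholder inputs

# Works in a plain Python script. Inside Colab, use instead:
#   results = await classify_all(medications, chunk_size=10)
results = asyncio.run(classify_all(medications, chunk_size=10))
print(len(results))
```

Processing chunk by chunk limits how many requests are in flight at once, which helps to stay within API rate limits.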

**Step 1: Load and Execute the Script**

First, we load the necessary functions from an external Python script, then define an asynchronous function to manage our batch processing:


In [None]:
# Import the external script containing batch processing functions
exec(open("/content/EPID731_2024/Day4/scripts/gpt_process_batches.py").read())

# Define the asynchronous function to handle batch processing
async def run():
    await process_batches(
        config_file='/content/EPID731_2024/Day4/configs/config_example_low_temperature.ini',
        system_prompt_file='/content/EPID731_2024/Day4/prompts/system_prompt_9.txt',
        user_prompt_file='/content/EPID731_2024/Day4/prompts/user_prompt.txt',
        input_file='/content/EPID731_2024/Day4/inputs/medication_example_aha.txt',
        output_location='/content/GPT_Outputs',
        file_prefix='Example4_prompt9_aha_meds',
        chunk_size=100  # Adjust based on API rate limits and performance needs
    )

**Step 2: Running the Classification**

Run the following cell to start processing the medication data asynchronously with the settings defined above. We'll also measure the time taken to process the batch requests:


In [None]:
import time

# Record the start time
start_time = time.time()

# Execute the asynchronous batch processing
await run()

# Calculate and display the elapsed time
elapsed_time = time.time() - start_time
print(f"Total time taken for processing: {elapsed_time:.2f} seconds.")

# Reload the original script to avoid conflicts when exploring the other examples that use a loop:
exec(open("/content/EPID731_2024/Day4/scripts/gpt_line_processor.py").read())


## Output and Analysis

The output can be found here:
`/content/GPT_Outputs/Example4_prompt9_aha_meds_gpt-4o-2024-05-13_results.csv`

Because the requests are processed concurrently, the order of the outputs may not match the order of the inputs. The completion order depends on factors such as API load and network conditions.
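
If the input order matters, one common remedy is to carry each input's index along with its result and sort afterwards. The sketch below simulates this with a stand-in for the API call; `fake_api_call`, the placeholder inputs, and the delays are assumptions and do not reflect how `gpt_process_batches.py` writes its output.

```python
import asyncio

async def fake_api_call(index, medication, delay):
    """Stand-in for one API request; returns the input index with the result."""
    await asyncio.sleep(delay)
    return index, f"classified: {medication}"

async def classify_in_completion_order(medications, delays):
    """Collect results in the order in which the requests finish."""
    tasks = [
        asyncio.create_task(fake_api_call(i, m, d))
        for i, (m, d) in enumerate(zip(medications, delays))
    ]
    return [await finished for finished in asyncio.as_completed(tasks)]

medications = ["med_a", "med_b", "med_c"]  # placeholder inputs
delays = [0.03, 0.01, 0.02]                # simulated response times

# Works in a plain Python script. Inside Colab, use `await` instead of asyncio.run(...).
completed = asyncio.run(classify_in_completion_order(medications, delays))

# Sorting by the stored index restores the input order.
ordered = [text for _, text in sorted(completed)]
print(ordered)
```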

### Key Points to Consider

- **Efficiency**: Asynchronous processing reduces the total run time for large datasets because requests no longer wait for one another.
- **Scalability**: The approach accommodates larger datasets and higher request volumes, but costs rise with every additional request.


# Example 5: Asynchronous API Calls for ATC Medication Classification

In this advanced exercise, we will classify all the `drug_concept_names` from the Day 2 synthetic EHR data example.

In [None]:
# Import the external script containing batch processing functions
exec(open("/content/EPID731_2024/Day4/scripts/gpt_process_batches.py").read())

# Define the asynchronous function to handle batch processing
async def run():
    await process_batches(
        config_file='/content/EPID731_2024/Day4/configs/config_example_low_temperature.ini',
        system_prompt_file='/content/EPID731_2024/Day4/prompts/system_prompt_9.txt',
        user_prompt_file='/content/EPID731_2024/Day4/prompts/user_prompt.txt',
        input_file='/content/EPID731_2024/Day4/inputs/unique_drug_concept_names.txt',
        output_location='/content/GPT_Outputs',
        file_prefix='Example5_prompt9_rxcorm_concept_names',
        chunk_size=100  # Adjust based on API rate limits and performance needs
    )


In [None]:
import time

# Record the start time
start_time = time.time()

# Execute the asynchronous batch processing
await run()

# Calculate and display the elapsed time
elapsed_time = time.time() - start_time
print(f"Total time taken for processing: {elapsed_time:.2f} seconds.")

# Reload the original script to avoid conflicts when exploring the other examples that use a loop:
exec(open("/content/EPID731_2024/Day4/scripts/gpt_line_processor.py").read())

# Reflection: AI-Assisted Classification of Medications

## Evaluate AI Classifications
**Activity**: Compare the AI-generated classifications of 158 blood pressure medications with the categories and treatments listed on the American Heart Association's [Types of Blood Pressure Medications](https://www.heart.org/en/health-topics/high-blood-pressure/changes-you-can-make-to-manage-high-blood-pressure/types-of-blood-pressure-medications) page.

### Discussion Points
- **Accuracy**: Assess the precision of AI classifications. Are the results consistent with established medical standards?
- **Discrepancies**: Identify any discrepancies between AI results and AHA information. Discuss possible reasons for these differences.
- **Cost Analysis**: Reflect on the cost-effectiveness of using AI for this classification task. Was the financial investment justified by the outcomes?
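
The accuracy and discrepancy questions can be approached with a simple label comparison. The sketch below assumes both the AI classifications and the reference classes are available as dictionaries keyed by medication name; `compare_labels` is a hypothetical helper, and the labels are placeholders, not real results or AHA data.

```python
def compare_labels(predicted, reference):
    """Return the share of matching labels and the list of mismatches."""
    mismatches = []
    for medication, expected in reference.items():
        got = predicted.get(medication, "Missing")
        # Compare case-insensitively so 'beta-blocker' matches 'Beta-blocker'.
        if got.strip().lower() != expected.strip().lower():
            mismatches.append((medication, expected, got))
    accuracy = 1 - len(mismatches) / len(reference)
    return accuracy, mismatches

# Placeholder labels for illustration only, not real results or AHA data.
reference = {"med_a": "Diuretic", "med_b": "Beta-blocker", "med_c": "Vasodilator"}
predicted = {"med_a": "Diuretic", "med_b": "beta-blocker", "med_c": "Other"}

accuracy, mismatches = compare_labels(predicted, reference)
print(f"Agreement: {accuracy:.0%}")
for medication, expected, got in mismatches:
    print(f"{medication}: expected {expected}, got {got}")
```

The list of mismatches is a natural starting point for discussing why the AI results and the reference differ.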

## Understand AI Training and Limitations
Critically discuss the capabilities and limitations of the GPT model used in this experiment.

### Questions for Reflection
- **Training Data Scope**: Consider whether the GPT model had exposure to specific medical content like that on the AHA page. How does the scope of training data influence the model's performance?
- **Handling Novel Data**: Discuss the model's potential performance with less common or newly introduced hypertension medications. What challenges might arise?
- **Human Oversight**: Explore strategies for integrating human judgment in evaluating AI performance. How can medical professionals remain actively involved to ensure the accuracy and reliability of AI applications?