## Detecting Adverse Drug Reactions Using LLMs

OpenAI's GPT-4o model was fine-tuned on the training and test datasets, following the process documented on OpenAI's developer platform. Fine-tuning adapts the model to a specific task, here flagging adverse drug reactions in patient reports, while keeping the capabilities of the base GPT-4o model.

The CSV datasets were first converted to JSONL (JSON Lines), the input format that OpenAI's fine-tuning pipeline requires. In JSONL, each training example is a complete JSON object on its own line. The conversion mapped the original CSV columns onto the chat schema the fine-tuning system expects, so that every example carries a list of messages with the correct roles. This preprocessing keeps the data intact and compatible with the fine-tuning pipeline.
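The CSV-to-JSONL conversion described above could look roughly like the sketch below. It is not the notebook's original conversion code: the column names `text` and `label`, the helper names `row_to_example` and `csv_to_jsonl`, and the shortened system prompt are assumptions made for illustration. Each output line is one chat-format example with `system`, `user`, and `assistant` messages.

```python
import csv
import json

# Assumption: a shortened stand-in for the system prompt used at inference time.
SYSTEM_PROMPT = (
    "You are a medical practitioner specialized in pharmacovigilance. "
    "Respond with 1 for an adverse drug reaction, or 0 otherwise."
)


def row_to_example(text, label, system_prompt=SYSTEM_PROMPT):
    """Build one chat-format training example as a plain dict."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Patient report:\n\n{text}\n\nLabel (0 or 1):"},
            {"role": "assistant", "content": str(label)},
        ]
    }


def csv_to_jsonl(csv_path, jsonl_path, text_col="text", label_col="label"):
    """Write one JSON object per line and return the number of examples written.

    The column names are hypothetical defaults; adjust them to the real CSV header.
    """
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            example = row_to_example(row[text_col], row[label_col])
            # ensure_ascii=False keeps non-ASCII characters in reports readable.
            dst.write(json.dumps(example, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Using the same user-message template for training and inference keeps the fine-tuned model's inputs consistent between the two stages.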

In [1]:
# Import packages:

import os
import pandas as pd
import numpy as np
import seaborn as sns
# os.chdir('/project/mousavi_lab/msba24/class_shared')

In [11]:
kaggle = pd.read_csv("val_DrugExp.csv")

In [4]:
%pip install openai

Note: you may need to restart the kernel to use updated packages.


In [None]:
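from tqdm.auto import tqdm
from openai import OpenAI

# Register DataFrame/Series.progress_apply so long labelling runs show a progress bar.
tqdm.pandas()

# "OpenAI_Key" is a placeholder; supply a real API key here.
client = OpenAI(api_key="OpenAI_Key")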

def predict_label_gpt4o(text, model_id):
    """Ask the fine-tuned model for a label and return its reply as a string ("0" or "1")."""
    response = client.chat.completions.create(
        model=model_id,
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a medical practitioner specialized in pharmacovigilance. "
                    "When provided with a patient’s drug experience, respond with “1” if it "
                    "describes an adverse or severe drug reaction, or “0” otherwise. "
                    "DO NOT output any additional text or punctuation."
                )
            },
            {
                "role": "user",
                "content": f"Patient report:\n\n{text}\n\nLabel (0 or 1):"
            }
        ],
        temperature=0,    # minimise sampling randomness for repeatable labels
        max_tokens=1,     # the reply should be a single token: 0 or 1
        stop=["\n"]
    )
    return response.choices[0].message.content.strip()

model_id = "ft:gpt-4o-2024-08-06:personal:adr:BSqDpkP9"
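# progress_apply (enabled by tqdm.pandas() above) behaves like apply but shows a progress bar.
kaggle["label"] = kaggle["text"].progress_apply(lambda x: predict_label_gpt4o(x, model_id))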
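
The function above returns the model's raw reply as a string, so it may be worth checking the replies before writing a submission. The helper below is a minimal sketch of such a check, not part of the original notebook; the name `normalize_label` and its `default` parameter are hypothetical. It maps a reply to the integer 0 or 1 and returns a fallback for anything unexpected, such as an empty reply or stray quotation marks.

```python
def normalize_label(raw, default=None):
    """Map a raw model reply to 0 or 1; return `default` for anything else."""
    # Treat None as an empty reply, then drop whitespace, quotes, and periods.
    cleaned = (raw or "").strip().strip('"“”\'.')
    if cleaned in {"0", "1"}:
        return int(cleaned)
    return default
```

Assuming the `kaggle` DataFrame from the cell above, it could be applied as `kaggle["label"].map(normalize_label)`; rows that come back as `None` can then be inspected or re-queried.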

In [17]:
submission_df = kaggle[["text", "label"]]
submission_df.to_csv("submission11.csv", index=False)