![electronic_medical_records](electronic_medical_records.png)

Medical professionals often summarize patient encounters in transcripts written in natural language, which include details about symptoms, diagnoses, and treatments. These transcripts can be reused for other medical documentation, such as insurance claims, but because they are densely packed with medical information, extracting the key data accurately can be challenging.

You and your team at Lakeside Healthcare Network have decided to use the OpenAI API to extract medical information from these transcripts automatically and match it with the appropriate ICD-10 codes. ICD-10 is a standardized coding system used worldwide for recording diagnoses and for billing purposes, such as insurance claims processing.

## The Data
The dataset contains anonymized medical transcriptions categorized by specialty.

## transcriptions.csv
| Column     | Description              |
|------------|--------------------------|
| `"medical_specialty"` | The medical specialty associated with each transcription.  |
| `"transcription"` | Detailed medical transcription texts, with insights into the medical case. |


## Before you start

In order to complete the project you will need to create a developer account with OpenAI and store your API key as a secure environment variable. Instructions for these steps are outlined below.

### Create a developer account with OpenAI

1. Go to the [API signup page](https://platform.openai.com/signup). 

2. Create your account (you'll need to provide your email address and your phone number).

3. Go to the [API keys page](https://platform.openai.com/account/api-keys). 

4. Create a new secret key.

<img src="images/openai-new-secret-key.png" width="200">

5. **Take a copy of it**. (If you lose it, delete the key and create a new one.)

### Add a payment method

OpenAI sometimes provides free credits for the API, but this can vary depending on geography. You may need to add debit/credit card details. 

**This project should cost less than 1 US cent with GPT-3.5-Turbo (but if you rerun tasks, you will be charged every time).**

1. Go to the [Payment Methods page](https://platform.openai.com/account/billing/payment-methods).

2. Click Add payment method.

<img src="images/openai-add-payment-method.png" width="200">

3. Fill in your card details.

### Setting an Environment Variable for Your OpenAI Key

Follow these steps to add an environment variable for your OpenAI API key using `dotenv` in Python:

1. **Install python-dotenv** (if not already installed)
   ```sh
   pip install python-dotenv
   ```

2. **Create a `.env` File**
   - In your project directory, create a new file named `.env`.
   - Add the following line to the file:
     ```
     OPENAI_API_KEY=your_secret_key_here
     ```

3. **Load and Use the Environment Variable in Python**
   - In your Python script, use the following code:
     ```python
     import os
     from dotenv import load_dotenv
     
     load_dotenv()
     
     openai_api_key = os.getenv("OPENAI_API_KEY")
     # Confirm the key loaded without printing the secret itself
     print("OpenAI API key loaded:", openai_api_key is not None)
     ```

4. **Ensure the `.env` File is Not Committed**
   - Add the following line to your `.gitignore` file to prevent accidental exposure of sensitive information:
     ```
     # Ignore environment variables file
     .env
     ```

Your OpenAI API key is now securely stored and accessible in your scripts!
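
Because anything printed in a notebook is saved with its output, it can help to confirm that the key loaded without ever displaying it in full. The following is a minimal sketch under that assumption; `mask_key` and `require_env` are hypothetical helper names introduced here for illustration and are not part of `python-dotenv` or the OpenAI SDK.

```python
import os


def mask_key(key, visible=4):
    """Return a masked form of a secret, showing only its last few characters."""
    if not key:
        return "<missing>"
    if len(key) <= visible:
        # Too short to reveal anything safely, so hide every character
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


def require_env(name):
    """Fetch an environment variable, raising a clear error if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return value
```

After `load_dotenv()`, calling `print(mask_key(require_env("OPENAI_API_KEY")))` would fail fast if the variable is missing and otherwise show only the trailing characters.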


In [1]:
# Import the necessary libraries
import pandas as pd
from openai import OpenAI
import json
import os
from dotenv import load_dotenv
     
load_dotenv()

openai_api_key = os.getenv("OPENAI_API_KEY")

In [2]:
# Load the data
df = pd.read_csv("data/transcriptions.csv")
df.head()

Unnamed: 0,medical_specialty,transcription
0,Allergy / Immunology,"SUBJECTIVE:, This 23-year-old white female pr..."
1,Orthopedic,"CHIEF COMPLAINT:, Achilles ruptured tendon.,H..."
2,Bariatrics,"PREOPERATIVE DIAGNOSIS: , Morbid obesity.,POST..."
3,Cardiovascular / Pulmonary,"PREOPERATIVE DIAGNOSES,Airway obstruction seco..."
4,Urology,"CHIEF COMPLAINT:, Urinary retention.,HISTORY ..."


In [3]:
# Initialize the OpenAI client with the API key loaded from .env
client = OpenAI(api_key=openai_api_key)

In [4]:
# Define function to extract age and recommended treatment/procedure
def extract_info_with_openai(transcription):
    """Extracts age and recommended treatment or procedure from a transcription using OpenAI."""
    # The system and user messages must be separate dicts: repeating the
    # "role"/"content" keys in one dict silently discards the system prompt.
    messages = [
        {
            "role": "system",
            "content": "You are a healthcare professional and need to get the age and recommended treatment or procedure from a medical record transcript. Always return both age and recommended treatment or procedure: if any of the fields is missing in the transcript, return Not Found."
        },
        {
            "role": "user",
            "content": f"Return the age and recommended treatment or procedure for the patient from the body of the following transcription: {transcription}"
        }
    ]
    function_definition = [
        {
            'type': 'function',
            'function': {
                'name': 'extract_medical_data',
                'description': 'Get the age and recommended treatment or procedure from the input text. Always return both age and recommended treatment or procedure: if any of the fields is missing in the transcript, return Not Found.',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'Age': {
                            'type': 'integer',
                            'description': 'Age of the patient'
                        },
                        'Recommended Treatment/Procedure': {
                            'type': 'string',
                            'description': 'Recommended treatment or procedure for the patient'
                        }
                    }
                }
            }
        }
    ]
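    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        tools=function_definition,
        # Force the tool call so that tool_calls is populated on the response
        tool_choice={"type": "function", "function": {"name": "extract_medical_data"}}
    )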
    return json.loads(response.choices[0].message.tool_calls[0].function.arguments)
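
The `return` line above assumes the model always issues a tool call whose arguments are valid JSON containing both fields. A minimal sketch of a more defensive parser is shown below; `parse_tool_arguments` and `REQUIRED_FIELDS` are hypothetical names introduced here, and the stand-in message built with `SimpleNamespace` only mimics the attribute layout used in the cell above (`tool_calls[0].function.arguments`).

```python
import json
from types import SimpleNamespace

# Field names copied from the function definition in the cell above
REQUIRED_FIELDS = ("Age", "Recommended Treatment/Procedure")


def parse_tool_arguments(message, required=REQUIRED_FIELDS, default="Not Found"):
    """Return the first tool call's arguments as a dict, filling gaps with a default."""
    data = {}
    tool_calls = getattr(message, "tool_calls", None)
    if tool_calls:
        try:
            data = json.loads(tool_calls[0].function.arguments)
        except (json.JSONDecodeError, TypeError):
            data = {}
    if not isinstance(data, dict):
        data = {}
    # Keep any extra fields, but guarantee that every required field is present
    return {**data, **{field: data.get(field, default) for field in required}}


# Stand-in for response.choices[0].message, used only to exercise the parser offline
fake_message = SimpleNamespace(
    tool_calls=[SimpleNamespace(function=SimpleNamespace(arguments='{"Age": 23}'))]
)
parsed = parse_tool_arguments(fake_message)
```

With this in place, a row whose transcript lacks a treatment would yield `"Not Found"` for that field instead of raising a `KeyError` later in the loop.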

In [5]:
# Define function to retrieve ICD codes for a treatment or procedure
def get_icd_codes(treatment):
    """Retrieves ICD codes for a given treatment using OpenAI."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Provide the ICD codes for the following treatment or procedure: {treatment}. Return the answer as a list of codes with their corresponding definitions."
        }],
        temperature=0.3
    )
    return response.choices[0].message.content
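
Because `get_icd_codes` returns free text, the codes are mixed in with numbering and definitions. One possible way to pull out code-shaped tokens is a regular expression, sketched below. The pattern is an assumption based on the general ICD-10-CM shape (a letter, a digit, an alphanumeric character, then an optional dot and up to four more characters); it checks format only, does not confirm that a code exists or is correct, and can also match unrelated tokens of the same shape. `find_icd10_cm_like` is a hypothetical helper name.

```python
import re

# Format check only: letter, digit, alphanumeric, then optionally "." plus 1-4 alphanumerics
ICD10_CM_PATTERN = re.compile(r"\b[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")


def find_icd10_cm_like(text):
    """Return every token in text that has the shape of an ICD-10-CM code."""
    return ICD10_CM_PATTERN.findall(text)


# Sample strings taken from the visible parts of the outputs in this notebook
sample = "1. J30.1 - Allergic rhinitis due to pollen\n1. Flomax - ICD-10 code: Z79.891"
codes = find_icd10_cm_like(sample)
```

Seven-character procedure codes such as those labelled ICD-10-PCS in the outputs do not fit this pattern and would need a separate check.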

In [6]:
# Start an empty list to store processed data
processed_data = []

# Process each row in the DataFrame
for index, row in df.iterrows():
    transcription = row['transcription']
    medical_specialty = row['medical_specialty']
    extracted_data = extract_info_with_openai(transcription)
    icd_code = get_icd_codes(extracted_data["Recommended Treatment/Procedure"])
    extracted_data["Medical Specialty"] = medical_specialty
    extracted_data["ICD Code"] = icd_code

    # Append the extracted information as a new row in the list
    processed_data.append(extracted_data)

# Convert the list to a DataFrame
df_structured = pd.DataFrame(processed_data)
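
The loop above makes two API calls per row, so a single failed request or a missing field stops the whole run and discards the rows already processed. A minimal sketch of a more tolerant loop follows. `process_records` is a hypothetical helper: it takes plain dictionaries (for example from `df.to_dict("records")`) and receives the two functions as arguments, so it can be tried with stand-ins before spending API credits.

```python
def process_records(records, extract_fn, icd_fn):
    """Process each record, collecting failures instead of stopping at the first error."""
    processed, failures = [], []
    for position, record in enumerate(records):
        try:
            extracted = dict(extract_fn(record["transcription"]))
            treatment = extracted.get("Recommended Treatment/Procedure", "Not Found")
            extracted["Medical Specialty"] = record["medical_specialty"]
            # Skip the second API call when there is no treatment to look up
            extracted["ICD Code"] = icd_fn(treatment) if treatment != "Not Found" else "Not Found"
            processed.append(extracted)
        except Exception as exc:
            # Remember which record failed and why, then carry on with the rest
            failures.append((position, repr(exc)))
    return processed, failures
```

In the notebook this could be called as `process_records(df.to_dict("records"), extract_info_with_openai, get_icd_codes)`, with the first returned list passed to `pd.DataFrame` and the second inspected for rows to retry.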

In [7]:
# Display the DataFrame
df_structured.head()

Unnamed: 0,Age,Recommended Treatment/Procedure,Medical Specialty,ICD Code
0,23,Zyrtec,Allergy / Immunology,1. J30.1 - Allergic rhinitis due to pollen\n2....
1,41,Operative fixation,Orthopedic,1. ICD-10-PCS code: 0SG00ZZ - Insertion of int...
2,30,Laparoscopic antecolic antegastric Roux-en-Y g...,Bariatrics,ICD-10 Procedure Code: 0DBJ8ZZ\nDefinition: By...
3,50,Neck exploration; tracheostomy; urgent flexibl...,Cardiovascular / Pulmonary,1. Neck exploration - ICD-10-PCS code: 0JH60ZZ...
4,66,Flomax and Proscar,Urology,1. Flomax - ICD-10 code: Z79.891 - Long term (...


In [8]:
# Save the DataFrame as a CSV file
csv_filename = "data/structured_data.csv"
df_structured.to_csv(csv_filename, index=False)

In [9]:
# Load the structured CSV file
csv_filename = "data/structured_data.csv"

# Read the CSV file into a DataFrame
df_loaded = pd.read_csv(csv_filename)

In [10]:
# Display the DataFrame
df_loaded.head()

Unnamed: 0,Age,Recommended Treatment/Procedure,Medical Specialty,ICD Code
0,23,Zyrtec,Allergy / Immunology,1. J30.1 - Allergic rhinitis due to pollen\n2....
1,41,Operative fixation,Orthopedic,1. ICD-10-PCS code: 0SG00ZZ - Insertion of int...
2,30,Laparoscopic antecolic antegastric Roux-en-Y g...,Bariatrics,ICD-10 Procedure Code: 0DBJ8ZZ\nDefinition: By...
3,50,Neck exploration; tracheostomy; urgent flexibl...,Cardiovascular / Pulmonary,1. Neck exploration - ICD-10-PCS code: 0JH60ZZ...
4,66,Flomax and Proscar,Urology,1. Flomax - ICD-10 code: Z79.891 - Long term (...
