![electronic_medical_records](electronic_medical_records.png)

Medical professionals often summarize patient encounters in natural-language transcripts that include details about symptoms, diagnoses, and treatments. These transcripts can feed other medical documentation, such as insurance paperwork, but because they are densely packed with medical information, extracting the key data accurately can be challenging.

You and your team at Lakeside Healthcare Network have decided to use the OpenAI API to extract medical information from these transcripts automatically and match it to the appropriate ICD-10 codes. ICD-10 is a standardized classification system used worldwide to code diagnoses for purposes such as billing and insurance claims processing.

## The Data
The dataset contains anonymized medical transcriptions categorized by specialty.

## transcriptions.csv
| Column     | Description              |
|------------|--------------------------|
| `"medical_specialty"` | The medical specialty associated with each transcription.  |
| `"transcription"` | Detailed medical transcription texts, with insights into the medical case. |

In [93]:
# Import the necessary libraries
import pandas as pd
from openai import OpenAI
import json

In [94]:
# Load the data
df = pd.read_csv("data/transcriptions.csv")
df.head()

Unnamed: 0,medical_specialty,transcription
0,Allergy / Immunology,"SUBJECTIVE:, This 23-year-old white female pr..."
1,Orthopedic,"CHIEF COMPLAINT:, Achilles ruptured tendon.,H..."
2,Bariatrics,"PREOPERATIVE DIAGNOSIS: , Morbid obesity.,POST..."
3,Cardiovascular / Pulmonary,"PREOPERATIVE DIAGNOSES,Airway obstruction seco..."
4,Urology,"CHIEF COMPLAINT:, Urinary retention.,HISTORY ..."
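
The extraction cell that follows indexes `tool_calls[0]` and parses its arguments without any checks, so a reply with no tool call, malformed JSON, or a missing field would raise partway through the loop. A minimal sketch of a more defensive parser is shown here. The helper name `parse_tool_arguments` and its `expected_keys` parameter are assumptions for illustration, and the `SimpleNamespace` objects only mimic the shape of `response.choices[0].message` so the example runs without calling the API.

```python
import json
from types import SimpleNamespace


def parse_tool_arguments(message, expected_keys=("age", "recommended_treatment")):
    """Return the first tool call's arguments with every expected key present.

    Hypothetical helper: a missing tool call, malformed JSON, or an omitted
    field yields None values instead of raising.
    """
    result = {key: None for key in expected_keys}
    tool_calls = getattr(message, "tool_calls", None)
    if not tool_calls:
        # The model answered in plain text, so there is nothing to parse
        return result
    try:
        arguments = json.loads(tool_calls[0].function.arguments)
    except (json.JSONDecodeError, TypeError):
        return result
    if isinstance(arguments, dict):
        result.update(arguments)
    return result


# Stand-ins that mimic the shape of response.choices[0].message
with_call = SimpleNamespace(tool_calls=[SimpleNamespace(
    function=SimpleNamespace(arguments='{"age": "23"}'))])
without_call = SimpleNamespace(tool_calls=None)

print(parse_tool_arguments(with_call))
print(parse_tool_arguments(without_call))
```

With a helper like this, a row whose extraction fails still produces a record, and the `None` values make it easy to filter such rows before requesting ICD codes.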


In [95]:
# Initialize the OpenAI client
client = OpenAI()


# Define a function to extract the patient's age and the recommended treatment
def get_age_rt(record):
    # Send the record (one DataFrame row) to the model as a JSON string
    messages = [{"role": "user",
                 "content": record.to_json()}]
    # Describe the fields to extract as a function (tool) definition
    function_definition = [{
        'type': 'function',
        'function': {
            'name': 'extract_age_treatment',
            'description': 'Extract the patient age and the recommended treatment from the medical record',
            'parameters': {
                'type': 'object',
                'properties': {
                    'age': {'type': 'string', 'description': 'Age'},
                    'recommended_treatment': {'type': 'string',
                                              'description': 'Recommended treatment or procedure'},
                },
                'required': ['age', 'recommended_treatment'],
            },
        },
    }]

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        tools=function_definition,
        # Force the tool call so tool_calls is populated instead of a plain-text reply
        tool_choice={'type': 'function', 'function': {'name': 'extract_age_treatment'}})
    # Parse the JSON string of tool-call arguments into a dict
    return json.loads(response.choices[0].message.tool_calls[0].function.arguments)

def get_codes(rec):
    # Format the message for instructions and input
    msg = f'''Provide the ICD codes for the following treatment or procedure: {rec}. Return response as a list. Return response in json format with field name as icd_code. Only include the codes and nothing else in your response.'''
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": msg}],
        response_format={"type": "json_object"},
        temperature=0.3)
    # Parse the JSON response into a dict
    return json.loads(response.choices[0].message.content)



return_data_list = []

# Process each row in the df
for _, row in df.iterrows():
    # Get the age and recommended treatment
    extracted = get_age_rt(row)

    # Add the medical specialty
    extracted['medical_specialty'] = row['medical_specialty']

    # Get the ICD codes for the recommended treatment
    codes = get_codes(extracted['recommended_treatment'])

    # Merge the ICD codes into the extracted record
    extracted.update(codes)
    print(extracted)
    return_data_list.append(extracted)
    
df_structured = pd.DataFrame(return_data_list)
df_structured
   

{'age': '23', 'recommended_treatment': 'Zyrtec, Nasonex', 'medical_specialty': 'Allergy / Immunology', 'icd_code': ['J30.1', 'J30.2']}
{'age': '41', 'recommended_treatment': 'operative fixation', 'medical_specialty': 'Orthopedic', 'icd_code': ['S02.1', 'S02.2', 'S02.3', 'S02.4', 'S02.5', 'S02.6', 'S02.7', 'S02.8', 'S02.9', 'S22.0', 'S22.1', 'S22.2', 'S22.3', 'S22.4', 'S22.5', 'S22.6', 'S22.7', 'S22.8', 'S22.9', 'S42.1', 'S42.2', 'S42.3', 'S42.4', 'S42.5', 'S42.6', 'S42.7', 'S42.8', 'S42.9', 'S52.1', 'S52.2', 'S52.3', 'S52.4', 'S52.5', 'S52.6', 'S52.7', 'S52.8', 'S52.9', 'S72.1', 'S72.2', 'S72.3', 'S72.4', 'S72.5', 'S72.6', 'S72.7', 'S72.8', 'S72.9', 'S82.1', 'S82.2', 'S82.3', 'S82.4', 'S82.5', 'S82.6', 'S82.7', 'S82.8', 'S82.9']}
{'age': '30', 'recommended_treatment': 'Laparoscopic antecolic antegastric Roux-en-Y gastric bypass', 'medical_specialty': 'Bariatrics', 'icd_code': ['Z98.84', 'E66.01', 'E66.9']}
{'age': '50', 'recommended_treatment': 'tracheostomy, urgent flexible bronchosco

Unnamed: 0,age,recommended_treatment,medical_specialty,icd_code
0,23,"Zyrtec, Nasonex",Allergy / Immunology,"[J30.1, J30.2]"
1,41,operative fixation,Orthopedic,"[S02.1, S02.2, S02.3, S02.4, S02.5, S02.6, S02..."
2,30,Laparoscopic antecolic antegastric Roux-en-Y g...,Bariatrics,"[Z98.84, E66.01, E66.9]"
3,50,"tracheostomy, urgent flexible bronchoscopy, re...",Cardiovascular / Pulmonary,"[0B110F4, 0B110FZ, 0B110F0, 0B110F1, 0B110F2]"
4,66,Flomax and Proscar,Urology,"[N40.0, N40.1, N40.9, N41.0, N41.1, N41.9]"
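
The codes in the output above are not all of one kind. Most rows contain dotted codes in the style of ICD-10-CM diagnosis codes, while row 3 contains seven-character codes without a decimal point, which is the shape of ICD-10-PCS procedure codes. A rough sketch of a format check that separates the two is shown here. The function name `classify_code` and both patterns are assumptions for illustration: they test only the shape of a code, not whether it exists in the official code sets or fits the treatment.

```python
import re

# Assumed shapes (format checks only, not a lookup against the official code sets).
# ICD-10-CM as written in the output above: letter, digit, alphanumeric,
# then an optional "." followed by one to four alphanumerics.
ICD10_CM = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")
# ICD-10-PCS: exactly seven characters, digits or letters other than I and O.
ICD10_PCS = re.compile(r"^[0-9A-HJ-NP-Z]{7}$")


def classify_code(code):
    """Label a code string by its shape: 'ICD-10-CM', 'ICD-10-PCS', or 'unknown'."""
    code = code.strip().upper()
    if ICD10_CM.match(code):
        return "ICD-10-CM"
    if ICD10_PCS.match(code):
        return "ICD-10-PCS"
    return "unknown"


# Codes taken from the output above, plus one deliberately malformed value
for code in ["J30.1", "E66.01", "0B110F4", "abc"]:
    print(code, classify_code(code))
```

A check like this could be applied to the `icd_code` column to flag rows that mix code systems or contain malformed values before the results are used downstream.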
