
<h1><center>READ MEDICAL TRANSCRIPTIONS</center></h1>
<h2><center>Function Calling</center></h2>
<h3><center>Build AI Apps - Beginner Level</center></h3>

## Before you start

To complete this project, you need an OpenAI developer account and an API key stored as an environment variable. The steps below walk through both.

### Create a developer account with OpenAI

1. Go to the [API signup page](https://platform.openai.com/signup). 

2. Create your account (you'll need to provide your email address and your phone number).

3. Go to the [API keys page](https://platform.openai.com/account/api-keys). 

4. Create a new secret key.


5. **Copy the key and store it somewhere safe.** (If you lose it, delete the key and create a new one.)

### Add a payment method

OpenAI sometimes provides free credits for the API, but this can vary based on geography. You may need to add debit/credit card details. 

**With the `gpt-4o-mini` model used in this project, the total cost should be less than 1 US cent, but you are charged again every time you rerun a task.** For more information on pricing, see [OpenAI's pricing page](https://openai.com/pricing).

1. Go to the [Payment Methods page](https://platform.openai.com/account/billing/payment-methods).

2. Click **Add payment method**.

3. Fill in your card details.

### Install the OpenAI library

In [44]:
# Uncomment to install the packages this notebook imports
# !pip install openai python-dotenv pandas

### Load the OpenAI API key

In [45]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

OPEN_API_KEY=os.getenv('OPENAI_API_KEY')
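
`os.getenv` returns `None` when the variable is missing, and that only surfaces later as an authentication error. The sketch below shows one way to fail early and to log the key without exposing it. The helper names `require_env` and `mask_secret` are hypothetical and not part of the original notebook.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.getenv(name)
    if value is None or not value.strip():
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value


def mask_secret(value, visible=4):
    """Hide all but the last `visible` characters so the key can be printed safely."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

In this notebook, `require_env('OPENAI_API_KEY')` could take the place of the plain `os.getenv` call, and `mask_secret` could be used whenever the key needs to appear in a log line.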



### Create Client

In [46]:

from openai import OpenAI

# Model used for every request in this notebook
model = "gpt-4o-mini"

# Create the API client with the key loaded above
client = OpenAI(api_key=OPEN_API_KEY)

## Make Your First Call

In [47]:
# response=client.chat.completions.create(model=model,
#                               messages=[{"role":"user","content":"Tell me more about gravity like I am a 5 year old kid in 2 lines"}
#                                        ]
#                               )

# print(response.choices[0].message.content)

# READ MEDICAL TRANSCRIPTIONS

**Data Source**
- https://mtsamples.com/

- https://www.kaggle.com/datasets/tboyle10/medicaltranscriptions/data

## Read data from the source file

In [60]:
import pandas as pd

In [61]:
df = pd.read_csv("transcriptions.csv")

In [62]:
df.head()

Unnamed: 0,medical_specialty,transcription
0,Allergy / Immunology,"SUBJECTIVE:, This 23-year-old white female pr..."
1,Bariatrics,"PAST MEDICAL HISTORY:, He has difficulty climb..."
2,Bariatrics,"HISTORY OF PRESENT ILLNESS: , I have seen ABC ..."
3,Cardiovascular / Pulmonary,"2-D M-MODE: , ,1. Left atrial enlargement wit..."
4,Cardiovascular / Pulmonary,1. The left ventricular cavity size and wall ...


In [59]:
transcription=df['transcription'][0]
transcription

'SUBJECTIVE:,  This 23-year-old white female presents with complaint of allergies.  She used to have allergies when she lived in Seattle but she thinks they are worse here.  In the past, she has tried Claritin, and Zyrtec.  Both worked for short time but then seemed to lose effectiveness.  She has used Allegra also.  She used that last summer and she began using it again two weeks ago.  It does not appear to be working very well.  She has used over-the-counter sprays but no prescription nasal sprays.  She does have asthma but doest not require daily medication for this and does not think it is flaring up.,MEDICATIONS: , Her only medication currently is Ortho Tri-Cyclen and the Allegra.,ALLERGIES: , She has no known medicine allergies.,OBJECTIVE:,Vitals:  Weight was 130 pounds and blood pressure 124/78.,HEENT:  Her throat was mildly erythematous without exudate.  Nasal mucosa was erythematous and swollen.  Only clear drainage was seen.  TMs were clear.,Neck:  Supple without adenopathy.,

In [52]:
import json

def extract_info_with_openai(transcription):
    # Prompt: the system message defines the task, the user message carries the text
    messages = [
        {
            "role": "system",
            "content": "You are a healthcare professional extracting structured patient data. Always return all of these fields: Age, Gender, Symptoms, Diagnoses, Medications, Tests, and Recommended Treatment/Procedure. If information is not found, return 'Unknown' for strings or an empty list for list-type fields."
        },
        {
            "role": "user",
            "content": f"Extract structured data from the following transcription:\n{transcription}"
        }
    ]

    # Tool (function) schema describing the fields the model should fill in
    function_definition = [
        {
            'type': 'function',
            'function': {
                'name': 'extract_medical_data',
                'description': 'Extracts structured medical information from a clinical transcription.',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'Age': {'type': 'integer'},
                        'Gender': {'type': 'string'},
                        'Symptoms': {'type': 'array', 'items': {'type': 'string'}},
                        'Diagnoses': {'type': 'array', 'items': {'type': 'string'}},
                        'Medications': {'type': 'array', 'items': {'type': 'string'}},
                        'Tests': {'type': 'array', 'items': {'type': 'string'}},
                        'Recommended Treatment/Procedure': {'type': 'string'}
                    },
                }
            }
        }
    ]

    # Send the prompt together with the tool schema
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        tools=function_definition
    )

    # The tool-call arguments arrive as a JSON string; parse them into a dict
    extracted_data = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
    return extracted_data
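
The function above indexes `tool_calls[0]` and parses its arguments directly, so a response without a tool call, or with malformed JSON arguments, would raise an exception. Below is a minimal sketch of a more defensive parser. The helper `parse_tool_arguments` is hypothetical, and the `SimpleNamespace` object only imitates the shape of `response.choices[0].message` so the example runs without calling the API.

```python
import json
from types import SimpleNamespace


def parse_tool_arguments(message):
    """Return the first tool call's arguments as a dict, or {} if they are unusable."""
    tool_calls = getattr(message, "tool_calls", None)
    if not tool_calls:
        # The model answered in plain text instead of calling the function
        return {}
    try:
        parsed = json.loads(tool_calls[0].function.arguments)
    except (TypeError, json.JSONDecodeError):
        # Arguments were missing or not valid JSON
        return {}
    return parsed if isinstance(parsed, dict) else {}


# Stand-in for response.choices[0].message (assumption: same attribute layout)
fake_message = SimpleNamespace(
    tool_calls=[
        SimpleNamespace(
            function=SimpleNamespace(arguments='{"Age": 23, "Gender": "Female"}')
        )
    ]
)
parsed_example = parse_tool_arguments(fake_message)
```

Forcing the call through the `tool_choice` parameter of the Chat Completions API makes a missing tool call less likely, but it does not rule out truncated or malformed arguments, so a fallback of this kind may still be useful.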

## Extract Data From Dataframe

In [53]:
processed_data = []

# Extract structured fields from each transcription and keep its specialty
for index, row in df.iterrows():
    medical_specialty = row['medical_specialty']
    extracted_data = extract_info_with_openai(row['transcription'])
    extracted_data["Medical Specialty"] = medical_specialty

    processed_data.append(extracted_data)
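
The loop above makes one API call per row, so a single failed request stops the whole run and loses the results collected so far. A minimal sketch of a more tolerant loop follows. The function `extract_all` and the stub `fake_extract` are hypothetical, and the stub replaces the real API call so the example runs offline.

```python
def extract_all(rows, extract_fn):
    """Apply extract_fn to each (specialty, transcription) pair.

    Returns the successful records and a list of (position, error) for failures.
    """
    processed, failed = [], []
    for position, (specialty, transcription) in enumerate(rows):
        try:
            record = dict(extract_fn(transcription))
        except Exception as exc:  # broad on purpose: API, network and parsing errors
            failed.append((position, repr(exc)))
            continue
        record["Medical Specialty"] = specialty
        processed.append(record)
    return processed, failed


def fake_extract(text):
    """Stub extractor for demonstration only; it never contacts the API."""
    if not text:
        raise ValueError("empty transcription")
    return {"Age": 23}


demo_rows = [("Allergy / Immunology", "SUBJECTIVE: ..."), ("Bariatrics", "")]
demo_processed, demo_failed = extract_all(demo_rows, fake_extract)
```

With the notebook's data, `rows` could be `zip(df['medical_specialty'], df['transcription'])` and `extract_fn` could be `extract_info_with_openai`. The `failed` list then shows which rows to retry.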


In [54]:
# convert list to a dataframe
df_structured=pd.DataFrame(processed_data)

In [55]:
df_structured

Unnamed: 0,Age,Gender,Symptoms,Diagnoses,Medications,Tests,Recommended Treatment/Procedure,Medical Specialty
0,23,Female,"[allergies, nasal mucosa erythematous and swol...",[Allergic rhinitis],"[Ortho Tri-Cyclen, Allegra]",[],Try Zyrtec instead of Allegra. Use loratadine ...,Allergy / Immunology
1,,Unknown,"[difficulty climbing stairs, difficulty with a...",[gastroesophageal reflux disease],[None],[],Unknown,Bariatrics
2,42,Unknown,"[Sluggishness, Gets tired quickly, Difficulty ...","[Obesity, High cholesterol, High blood pressur...","[Diovan, Crestor, Tricor, Chantix]","[Upper endoscopy, H. pylori testing, Thyroid f...",Laparoscopic Roux-en-Y gastric bypass,Bariatrics
3,,Unknown,[],"[Left atrial enlargement, Mild mitral regurgit...",[],"[2-D M-MODE, DOPPLER]",Unknown,Cardiovascular / Pulmonary
4,,Unknown,"[mild aortic valve stenosis, mitral regurgitat...",[hyperdynamic left ventricular systolic functi...,[],[echocardiogram],monitoring and possibly further evaluation,Cardiovascular / Pulmonary
5,30,Female,[],[Morbid obesity],[],[],Laparoscopic antecolic antegastric Roux-en-Y g...,Bariatrics
6,31,Female,[],"[Deformity, right breast reconstruction, Exces...",[],[],"Revision of right breast reconstruction, excis...",Bariatrics
7,,Unknown,[],[],[],[2-D Echocardiogram],Unknown,Cardiovascular / Pulmonary
8,Unknown,Unknown,[],[Lipodystrophy of the abdomen and thighs],[Lactated Ringers],[],Suction-assisted lipectomy,Bariatrics
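
The table above represents missing values in several ways: blank ages, the string `Unknown` in the `Age` column, and `[None]` in a list column. One possible cleanup, applied to each record before building the DataFrame, is sketched below. The function `normalize_record` is hypothetical, and the field names are taken from the schema used earlier.

```python
EXPECTED_LIST_FIELDS = ("Symptoms", "Diagnoses", "Medications", "Tests")
EXPECTED_TEXT_FIELDS = ("Gender", "Recommended Treatment/Procedure")
PLACEHOLDERS = ("none", "unknown")


def normalize_record(record):
    """Return a copy of `record` with consistent types and missing-value markers."""
    clean = dict(record)

    # Age: an integer when it can be parsed, otherwise None
    try:
        clean["Age"] = int(clean.get("Age"))
    except (TypeError, ValueError):
        clean["Age"] = None

    # List fields: always a list of non-empty strings, with placeholders dropped
    for field in EXPECTED_LIST_FIELDS:
        value = clean.get(field)
        if not isinstance(value, list):
            value = []
        clean[field] = [
            item for item in value
            if isinstance(item, str)
            and item.strip()
            and item.strip().lower() not in PLACEHOLDERS
        ]

    # Text fields: 'Unknown' whenever the value is missing or blank
    for field in EXPECTED_TEXT_FIELDS:
        value = clean.get(field)
        clean[field] = value if isinstance(value, str) and value.strip() else "Unknown"

    return clean


demo_record = normalize_record(
    {"Age": "Unknown", "Medications": [None], "Symptoms": ["allergies"]}
)
```

Applied to the notebook's results, this could look like `pd.DataFrame([normalize_record(r) for r in processed_data])`, which gives the `Age` column a single missing-value marker.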
