# Reasoning for Validation

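* Generate a synthetic medical dataset that contains deliberate, labeled inconsistencies.
* Define a function that sends a single row to the model and returns whether the row is valid and, if not, what the issue is.
* Run the validation over every row and compute precision, recall, and F1 against the labels.
* Check whether the model's explanations match the labeled issues, then analyze and interpret the results.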

```python
from openai import OpenAI
import json
from IPython.display import display, HTML
from sklearn.metrics import precision_score, recall_score, f1_score
from concurrent.futures import ThreadPoolExecutor, as_completed
import csv
import pandas as pd
from dotenv import load_dotenv

load_dotenv()

client = OpenAI()
MODEL = "gpt-4o-mini"
```

# Synthetic Data Generation

```python
def generate_data():
    messages = [
        {
            "role": "user",
            "content": """You are a helpful assistant designed to generate data. You will be given a format for the data to generate and some examples of the data.

            When generating Patient IDs, use the format 'P' followed by a three-digit number (e.g., P006, P941, P319).

            Intentionally make some mistakes in the data generation and document them in the appropriate columns ('Is Valid' and 'Issue') if the row of data is invalid.

            The types of mistakes to include are:

            - **Allergy Contradictions**: Prescribing a medication that the patient is allergic to (e.g., prescribing Penicillin to a patient allergic to Penicillin).
            - **Medical History and Medication Mismatch**: A patient with a medical condition not receiving appropriate medication (e.g., a diabetic patient not prescribed any diabetes medication).
            - **Lab Results and Diagnosis Mismatch**: Lab results that do not support the diagnosis (e.g., normal glucose levels but diagnosed with Diabetes Type 2).
            - **Other Plausible Mistakes**: Any other realistic errors that could occur in medical records, such as incorrect gender entries, impossible dates of birth, or inconsistent treatment plans.

            Ensure that when 'Is Valid' is 'False', the 'Issue' column clearly explains the problem.

            Return 20 rows of data for the user. Your response should strictly be in the format of a valid CSV.

            Generate Synthetic Medical Records Dataset with the following columns:
                - Patient ID: A randomly generated patient id
                - Date of Birth: Date of birth of the patient
                - Gender: M/F
                - Medical History: Past diagnoses
                - Current Medications: Medication the patient is taking
                - Allergies: Identified allergies
                - Lab Results (Glucose mg/dL)
                - Diagnoses: Current diagnosis
                - Treatment Plan: Current treatment plan
                - Is Valid: Whether or not the current row of data is valid (True/False)
                - Issue: If the row of data is not valid, what the issue is

            Patient ID,Date of Birth,Gender,Medical History,Current Medications,Allergies,Lab Results (Glucose mg/dL),Diagnoses,Treatment Plan,Is Valid,Issue
            P001,1980-05-14,M,Hypertension,Lisinopril,None,110,Hypertension,Continue Lisinopril,True,
            P002,1975-11-30,F,Diabetes Type 2,Metformin,Penicillin,90,Diabetes Type 2,Continue Metformin,True,
            P003,1990-07-22,F,Asthma,Albuterol,Aspirin,85,Asthma,Prescribe Albuterol,True,
            P004,2000-03-10,M,None,Amoxicillin,Penicillin,95,Infection,Prescribe Amoxicillin,False,Prescribed Amoxicillin despite Penicillin allergy
            P005,1985-09-18,F,Hyperlipidemia,Atorvastatin,None,200,Hyperlipidemia,Continue Atorvastatin,True,
            P006,1978-12-05,M,Hypertension; Diabetes Type 2,Lisinopril; Insulin,None,55,Diabetes Type 2,Adjust insulin dosage,False,Low glucose level not properly addressed
            """
        }
    ]
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
    )

    return response.choices[0].message.content.replace('```csv', '').replace('```', '')
```

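The chained `.replace()` calls remove every run of three backticks in the reply, including any that might appear inside a field. A slightly stricter option, sketched below under the assumption that the model wraps its CSV in at most one fenced block, is to extract only the body of the first fence and fall back to the raw text when there is none. `strip_code_fence` is a hypothetical helper, not part of the OpenAI SDK; the fence marker is built programmatically (`FENCE`) so the snippet contains no literal triple backticks.

```python
import re

FENCE = "`" * 3  # three backticks

# Opening fence with an optional language tag (e.g. csv, json), the body, then the closing fence
_FENCE_RE = re.compile(FENCE + r"[A-Za-z]*[ \t]*\n(.*?)\n?" + FENCE, re.DOTALL)

def strip_code_fence(text: str) -> str:
    """Return the body of the first fenced block in text, or the stripped text if there is none."""
    match = _FENCE_RE.search(text)
    return (match.group(1) if match else text).strip()

reply = "Here is the data:\n" + FENCE + "csv\nPatient ID,Gender\nP001,M\n" + FENCE
print(strip_code_fence(reply))
```

`generate_data` could return `strip_code_fence(response.choices[0].message.content)` instead of the chained replacements; any chatter the model adds before or after the fence is then dropped too.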

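```python
import io
import os

csv_path = 'data/medicalData.csv'
data = generate_data()

# Parse with the csv module so quoted fields that contain commas stay intact
rows = [row for row in csv.reader(io.StringIO(data.strip())) if row]

# Keep the model's header row only when starting a new file, so repeated runs don't append extra headers
file_is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
generated_data = [row for row in rows if file_is_new or row[0].strip() != 'Patient ID']

# Append the generated rows to the CSV file
os.makedirs(os.path.dirname(csv_path), exist_ok=True)
with open(csv_path, 'a', newline='') as csvfile:
    csv.writer(csvfile).writerows(generated_data)
print("Synthetic data generated and appending completed.")
```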

```text
Synthetic data generated and appending completed.
```

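Splitting a CSV line with `str.split(',')` only works while no field contains a comma. A generated issue such as `Amoxicillin, a penicillin, prescribed despite allergy` is quoted in valid CSV and would be torn into three pieces by a plain split. The standard-library sketch below shows the difference on a made-up sample and adds two further checks: it drops header rows the model echoes back and sets aside rows with the wrong number of columns instead of writing them. `parse_generated_rows` and `EXPECTED_COLUMNS` are illustrative names, not part of the original notebook.

```python
import csv
import io

EXPECTED_COLUMNS = 11  # Patient ID ... Issue, as listed in the generation prompt

def parse_generated_rows(text, expected_columns=EXPECTED_COLUMNS):
    """Split LLM CSV output into (good_rows, rejected_rows), skipping blank lines and header rows."""
    good, rejected = [], []
    for row in csv.reader(io.StringIO(text.strip())):
        if not row or row[0].strip() == "Patient ID":
            continue  # blank line or a header the model echoed back
        (good if len(row) == expected_columns else rejected).append(row)
    return good, rejected

# Made-up sample: a header, one row whose Issue contains commas, and one truncated row
sample = (
    "Patient ID,Date of Birth,Gender,Medical History,Current Medications,Allergies,"
    "Lab Results (Glucose mg/dL),Diagnoses,Treatment Plan,Is Valid,Issue\n"
    "P007,1992-01-15,F,None,Amoxicillin,Penicillin,92,Infection,Prescribe Amoxicillin,False,"
    '"Amoxicillin, a penicillin, prescribed despite allergy"\n'
    "P008,1988-04-02,M,Asthma\n"
)

naive_fields = sample.splitlines()[1].split(",")  # str.split yields 13 fields here, not 11
good, rejected = parse_generated_rows(sample)
```

Rejected rows could be logged or sent back to the model for regeneration; silently writing them would shift the `Is Valid` and `Issue` columns that the evaluation later reads by position.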

# Data Validation

```python
def validate_data(input_data):
    messages = [
        {
            "role": "user",
            "content": f"""
You are a helpful assistant designed to validate the quality of medical datasets. You will be given a single row of medical data, and your task is to determine whether the data is valid.

- Carefully analyze the data for any inconsistencies, contradictions, missing values, or implausible information.
- Consider the logical relationships between different fields (e.g., treatments should be appropriate for the diagnoses, medications should not conflict with allergies, lab results should be consistent with diagnoses, etc.).
- Use your general medical knowledge to assess the validity of the data.
- Focus solely on the information provided without making assumptions beyond the given data.

**Return only a JSON object** with the following two properties:

- `"is_valid"`: a boolean (`true` or `false`) indicating whether the data is valid.
- `"issue"`: if `"is_valid"` is `false`, provide a brief explanation of the issue; if `"is_valid"` is `true`, set `"issue"` to `null`.

Both JSON properties must always be present.

Do not include any additional text or explanations outside the JSON object.

MEDICAL DATA:
{input_data}
"""
        }
    ]
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
        # JSON mode: asks the API to return a single syntactically valid JSON object
        response_format={"type": "json_object"},
    )

    # Strip Markdown code fences in case the model wraps the JSON in them anyway
    response_content = response.choices[0].message.content.replace('```json', '').replace('```', '').strip()

    try:
        return json.loads(response_content)
    except json.JSONDecodeError:
        print(f"Failed to decode JSON response: {response_content}")
        raise
```

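The validator's reply is only useful if both keys are present with the right types. The helper below is a hedged sketch of a stricter check using only the standard library: it rejects replies that lack either key, coerces string booleans such as `"false"`, and enforces the prompt's contract that valid rows carry no issue text. `parse_validation_result` is a hypothetical name, not part of the original notebook.

```python
import json

def parse_validation_result(raw):
    """Parse the validator's JSON reply into {'is_valid': bool, 'issue': str or None}."""
    result = json.loads(raw)
    if not isinstance(result, dict) or not {"is_valid", "issue"}.issubset(result):
        raise ValueError(f"Expected an object with 'is_valid' and 'issue', got: {raw!r}")
    is_valid = result["is_valid"]
    if not isinstance(is_valid, bool):
        # Accept string forms such as "true" or "False"
        is_valid = str(is_valid).strip().lower() == "true"
    # Follow the prompt's contract: no issue text for valid rows, a string for invalid ones
    issue = None if is_valid else (result["issue"] or "")
    return {"is_valid": is_valid, "issue": issue}
```

Inside `validate_data`, returning `parse_validation_result(response_content)` in place of the bare `json.loads` call would surface malformed replies as a `ValueError` (of which `json.JSONDecodeError` is a subclass).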
```python
# Read the CSV once: the first nine columns are the model input,
# the last two ("Is Valid" and "Issue") are the ground-truth labels
input_data = []
true_is_valid = []
true_issues = []

with open('data/medicalData.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    headers = next(reader)  # skip the header row
    for row in reader:
        if not row:
            continue  # skip blank lines
        input_data.append(row[:-2])  # exclude "Is Valid" and "Issue" columns
        true_is_valid.append(row[-2].strip() == 'True')
        true_issues.append(row[-1])

# Validate a single row of data with the model
def validate_row(row):
    input_str = ','.join(row)
    return validate_data(input_str)

# Validate rows concurrently; results are stored by index so they line up with the labels
pred_is_valid = [False] * len(input_data)
pred_issues = [''] * len(input_data)

with ThreadPoolExecutor() as executor:
    futures = {executor.submit(validate_row, row): i for i, row in enumerate(input_data)}

    for future in as_completed(futures):
        i = futures[future]  # index of the row this future belongs to
        result_json = future.result()
        pred_is_valid[i] = result_json['is_valid']
        pred_issues[i] = result_json.get('issue') or ''
```

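The validation loop maps each future back to its row index so results land in the right slot even though `as_completed` yields them in completion order. `ThreadPoolExecutor.map` gives the same guarantee more compactly, because it returns results in input order. The sketch below runs offline: `fake_validate_row` is a made-up stand-in for `validate_row` with a deliberately naive allergy rule, the first sample row comes from the generation prompt's examples and the second is invented, and `max_workers=4` is an arbitrary cap that may help stay under API rate limits.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_validate_row(row):
    """Offline stand-in for validate_row: flags a row whose allergy appears verbatim in its medications."""
    if row[0] == "P001":
        time.sleep(0.05)  # delay the first row so it most likely finishes after the second
    medications, allergies = row[4], row[5]
    conflict = allergies != "None" and allergies in medications
    return {"is_valid": not conflict, "issue": "Medication conflicts with allergy" if conflict else None}

# Rows in the nine-column input format (Is Valid and Issue removed)
sample_rows = [
    ["P001", "1980-05-14", "M", "Hypertension", "Lisinopril", "None", "110", "Hypertension", "Continue Lisinopril"],
    ["P009", "1970-02-01", "F", "None", "Penicillin", "Penicillin", "95", "Infection", "Prescribe Penicillin"],
]

# map() yields results in the same order as sample_rows, whichever call finishes first
with ThreadPoolExecutor(max_workers=4) as executor:
    demo_results = list(executor.map(fake_validate_row, sample_rows))

demo_pred_is_valid = [r["is_valid"] for r in demo_results]
demo_pred_issues = [r["issue"] or "" for r in demo_results]
```

One trade-off: `map` re-raises the first exception when its results are iterated, whereas the `as_completed` loop makes it easier to catch and record failures per row.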
# Compare Model Results

```python
# Normalize to booleans; the model may return a JSON boolean or a string such as "true"
def to_bool(val):
    return val if isinstance(val, bool) else str(val).strip().lower() == 'true'

pred_is_valid_bool = [to_bool(val) for val in pred_is_valid]
true_is_valid_bool = [to_bool(val) for val in true_is_valid]

# pos_label=True treats a valid row as the positive class
precision = precision_score(true_is_valid_bool, pred_is_valid_bool, pos_label=True)
recall = recall_score(true_is_valid_bool, pred_is_valid_bool, pos_label=True)
f1 = f1_score(true_is_valid_bool, pred_is_valid_bool, pos_label=True)

# Per-row flags for whether the predicted issue matches the true issue (filled in below)
issue_matches_full = [False] * len(true_is_valid)
```

```python
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
```

```text
Precision: 0.67
Recall: 0.60
F1 Score: 0.63
```

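Because `pos_label=True`, these scores treat a *valid* row as the positive class: precision is the share of rows the model called valid that really are valid, and recall is the share of truly valid rows the model called valid. A recall of 0.60, for instance, means about 40% of the truly valid rows were flagged as invalid. If the goal is catching bad records, the mirror-image scores with `pos_label=False` may be the more relevant numbers. The standard-library sketch below computes both views from confusion counts (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = 2PR / (P + R)), which also serves as a cross-check on scikit-learn; the toy label lists are made up.

```python
def binary_scores(y_true, y_pred, pos_label=True):
    """Precision, recall and F1 for one class, computed from confusion counts (0.0 on empty denominators)."""
    tp = sum(t == pos_label and p == pos_label for t, p in zip(y_true, y_pred))
    fp = sum(t != pos_label and p == pos_label for t, p in zip(y_true, y_pred))
    fn = sum(t == pos_label and p != pos_label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels: True = valid row, False = invalid row; the model calls almost everything valid
y_true = [True, True, True, True, False, False]
y_pred = [True, True, True, True, True, False]

valid_scores = binary_scores(y_true, y_pred, pos_label=True)     # (0.8, 1.0, ~0.889)
invalid_scores = binary_scores(y_true, y_pred, pos_label=False)  # (1.0, 0.5, ~0.667)
```

The toy model looks strong on the valid class yet catches only half of the invalid rows, which is why reporting both views can be informative for a data-quality check.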

# Issue Identification

```python
def validate_issue(model_generated_answer, correct_answer):
    messages = [
        {
            "role": "user",
            "content": f"""
            You are a medical expert assistant designed to validate the quality of an LLM-generated answer.

            The model was asked to review a medical dataset row to determine if the data is valid. If the data is not valid, it should provide a justification explaining why.

            Your task:

                •	Compare the model-generated justification with the correct reason provided.
                •	Determine if they address the same underlying medical issue or concern, even if phrased differently.
                •	Focus on the intent, medical concepts, and implications rather than exact wording.

            Instructions:

                •	If the justifications have the same intent or address the same medical issue, return True.
                •	If they address different issues or concerns, return False.
                •	Only respond with a single word: True or False.

            Examples:

                1.	Example 1:
                •	Model Generated Response: “The patient is allergic to penicillin”
                •	Correct Response: “The patient was prescribed penicillin despite being allergic”
                •	Answer: True
                2.	Example 2:
                •	Model Generated Response: “The date of birth of the patient is incorrect”
                •	Correct Response: “The patient was prescribed penicillin despite being allergic”
                •	Answer: False


            Model Generated Response: {model_generated_answer}
            Correct Response: {correct_answer}
        """
        }
    ]
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages
    )

    # Strip surrounding whitespace so a reply like "True\n" compares cleanly
    return response.choices[0].message.content.strip()
```

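An exact string comparison such as `issue_match == 'True'` treats replies like `'True.'`, `'true'` or `'True\n'` as mismatches, and even a tolerant prefix check lets an off-format reply through as a silent False. The sketch below is a hypothetical `parse_judge_reply` helper that ignores case, surrounding whitespace and trailing punctuation, and raises on anything that is neither True nor False so unexpected replies can be inspected rather than scored.

```python
def parse_judge_reply(reply: str) -> bool:
    """Map a one-word True/False judge reply to a bool, tolerating case, whitespace and punctuation."""
    word = reply.strip().strip(".!\"'`").strip().lower()
    if word == "true":
        return True
    if word == "false":
        return False
    raise ValueError(f"Unexpected judge reply: {reply!r}")
```

In the issue-comparison loop, `issue_matches_full[i] = parse_judge_reply(future.result())` would be a drop-in use, with the `ValueError` caught if off-format replies should be counted separately.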
```python
validation_results = []

# Ask the judge model to compare issues only for rows that both the labels
# and the model marked as invalid
with ThreadPoolExecutor() as executor:
    futures = {
        executor.submit(validate_issue, pred_issues[i], true_issues[i]): i
        for i in range(len(pred_is_valid_bool))
        if not pred_is_valid_bool[i] and not true_is_valid_bool[i]
    }

    for future in as_completed(futures):
        i = futures[future]
        issue_match = future.result()
        # Tolerate replies such as "true", "True." or "True" followed by a newline
        issue_matches_full[i] = issue_match.strip().lower().startswith('true')
        validation_results.append({
            "index": i,
            "predicted_issue": pred_issues[i],
            "true_issue": true_issues[i],
            "issue_match": issue_matches_full[i]
        })

# Issue accuracy: share of jointly invalid rows whose predicted issue matches the true issue
issue_accuracy = (
    sum(r['issue_match'] for r in validation_results) / len(validation_results)
    if validation_results else float('nan')
)

model_results = {
    "precision": precision,
    "recall": recall,
    "f1": f1,
    "issue_accuracy": issue_accuracy
}
# DataFrame with the aggregate metrics
df_results = pd.DataFrame([model_results])

# DataFrame with the per-row issue comparisons
df_validation_results = pd.DataFrame(validation_results)
```

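`issue_accuracy` is conditional: its denominator counts only rows that both the labels and the model marked invalid, so a model that flags few rows can still score well on it. The sketch below computes it from plain lists alongside one possible end-to-end variant, the share of *all* truly invalid rows that were both flagged and explained correctly. The end-to-end metric, the `issue_scores` name and the toy lists are assumptions for illustration, not part of the original evaluation.

```python
def issue_scores(true_valid, pred_valid, issue_match):
    """Return (conditional issue accuracy, end-to-end issue recall); NaN when a denominator is zero."""
    invalid_idx = [i for i, v in enumerate(true_valid) if not v]
    both_invalid = [i for i in invalid_idx if not pred_valid[i]]
    matched = sum(issue_match[i] for i in both_invalid)
    conditional = matched / len(both_invalid) if both_invalid else float("nan")
    end_to_end = matched / len(invalid_idx) if invalid_idx else float("nan")
    return conditional, end_to_end

# Toy example: 4 truly invalid rows; the model flags 2 of them and explains 1 correctly
true_valid = [True, False, False, False, False]
pred_valid = [True, False, False, True, True]
issue_match = [False, True, False, False, False]
conditional, end_to_end = issue_scores(true_valid, pred_valid, issue_match)
# conditional == 0.5 (1 of 2 flagged rows), end_to_end == 0.25 (1 of 4 invalid rows)
```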


# Display Subset of Rows

```python
def display_formatted_dataframe(df):
    # Render newlines as HTML line breaks; treat missing issue text as an empty string
    def format_text(text):
        return "" if text is None else str(text).replace("\n", "<br>")

    df_formatted = df.copy()
    df_formatted['predicted_issue'] = df_formatted['predicted_issue'].apply(format_text)
    df_formatted['true_issue'] = df_formatted['true_issue'].apply(format_text)

    # escape=False lets the <br> tags render as line breaks
    display(HTML(df_formatted.to_html(escape=False, justify='left')))

display_formatted_dataframe(pd.DataFrame(validation_results))
```

| | index | predicted_issue | true_issue | issue_match |
|---|---|---|---|---|
| 0 | 64 | The diagnosis of Anemia is appropriate, but the recommendation to 'Start Iron Supplements' is inconsistent with the absence of any specific treatment or medication history. Iron supplements are commonly used to treat anemia; however, without any prior indication of treatment, this raises concerns about the validity of the treatment plan. | Anemic patient not on iron supplements | True |
| 1 | 77 | The entry indicates a diagnosis of Asthma but does not list any allergies or prior treatments, which raises concern about the completeness of the medical history. | Asthmatic patient not prescribed medication | False |
| 2 | 100 | The treatment plan suggests adjusting Metformin dosage, but no specific dosage value is mentioned for the adjustment. | Low glucose level not properly addressed | False |
| 3 | 15 | The patient has an allergy to Penicillin, which contradicts prescribing it. | Prescribed Penicillin despite Penicillin allergy | True |
| 4 | 5 | The patient is prescribed Lisinopril (for hypertension) but does not have any allergies listed. While this is not necessarily invalid, the adjustment of insulin dosage is questionable without context for lab results or current glucose levels. | Low glucose level not properly addressed | False |
| 5 | 60 | The treatment 'Prescribe Albuterol' is appropriate for asthma, but there is no indication of any prior assessment or specific patient history that justifies the prescription. More information is needed to confirm the validity of the action. | Asthmatic patient not prescribed medication | False |
| 6 | 9 | The data indicates prescribing Ibuprofen for pain management, but there is a duplicative entry for Ibuprofen in both the medication fields, which could imply a data entry error. | Prescribed Ibuprofen despite Ibuprofen allergy | False |
| 7 | 46 | Patient has a penicillin allergy and is being prescribed Amoxicillin, which is a type of penicillin. | Prescribed Amoxicillin despite Penicillin allergy | True |
| 8 | 104 | The diagnosis of 'Anemia' does not provide any indication of the type of anemia or further context, and the recommended treatment of 'Start Iron Supplements' may not be appropriate if the anemia is not iron deficiency, which is not specified. | Anemic patient not on iron supplements | False |
| 9 | 32 | The data indicates a diagnosis of Hypertension but lists no contraindications or prior treatments. Also, the 'Start Lisinopril' as a treatment is inconsistent without prior treatment history. | Hypertensive patient not on antihypertensive medication | True |


```python
print(df_results)
```

```text
   precision    recall        f1  issue_accuracy
0   0.671875  0.597222  0.632353        0.538462
```
