In [7]:
# Setup and Initiation:
!pip install -U -q "google"
!pip install -U -q "google.genai"
!pip install PyPDF2

import os
import io # Needed for reading PDF content from memory
import json # Needed for validating/parsing JSON output
from google.colab import userdata
from google.colab import drive
from google.colab import files
import base64
from google import genai
from google.genai import types
import PyPDF2

# --- Environment Setup ---
try:
    os.environ["GEMINI_API_KEY"] = userdata.get("GOOGLE_API_KEY")
    if not os.environ["GEMINI_API_KEY"]:
        raise ValueError("GOOGLE_API_KEY not found in Colab Secrets. Please add it.")
    print("Successfully retrieved GOOGLE_API_KEY.")
except Exception as e:
    print(f"Error accessing Colab Secrets: {e}")
    print("Please ensure you have set the 'GOOGLE_API_KEY' secret in Google Colab.")
    exit() # Exit if the key is essential and missing

# --- Google Drive Setup ---
try:
    drive.mount("/content/drive")
    drive_dir = "/content/drive/MyDrive/Google AI Studio" # Define target directory
    os.makedirs(drive_dir, exist_ok=True) # Create directory if it doesn't exist
    os.chdir(drive_dir)
    print(f"Current working directory: {os.getcwd()}")
except Exception as e:
    print(f"Error mounting Google Drive or changing directory: {e}")
    # Decide if you want to proceed without Drive or exit
    # exit()

# --- Define Prompts ---

# PROMPT_1: Differential Diagnosis (Updated for Top 10 & Clarity)
PROMPT_1 = """Prompt: NEJM Medical Case Analysis with Atom-of-Thought Reasoning (JSON Output)
Goal:
Analyze a provided NEJM medical case record and generate a differential diagnosis (top 10) ranked in order of likelihood/confidence, along with the final diagnosis. Justify each ranking using atom-of-thought reasoning and suggest next diagnostic steps/tests a physician would perform to confirm or rule out conditions. The response must be formatted as structured JSON.

Context Dump:
You are a highly advanced medical AI trained in clinical reasoning, differential diagnosis, and diagnostic testing. Your task is to analyze patient case data methodically, using the atom-of-thought reasoning process, breaking down each step into granular diagnostic components before synthesizing conclusions. You follow evidence-based medicine and best clinical practices.

You will be provided with a full NEJM medical case record, including history, symptoms, lab results, imaging findings, and other relevant data. Your role is to act as an expert diagnostician, systematically working through the case to generate an accurate and well-supported differential diagnosis.

Warnings & Considerations:
Do NOT fabricate data—base all reasoning strictly on the given case information.
Clearly state uncertainty levels for each differential diagnosis.
Emphasize clinical reasoning rather than just listing conditions.
Do NOT provide patient-specific medical advice—this is a simulated diagnostic reasoning exercise.
Use the provided filename as the 'case_id' in the JSON output if no other unique identifier is present in the text.

Return Format (JSON Structure):
Your response must be structured as a valid JSON object using the following schema:

{
  "case_id": "<unique_case_id_or_filename>",
  "case_summary": "<brief summary of the patient's key symptoms, history, and findings>",
  "differential_diagnosis": [
    { "diagnosis": "<1st most likely>", "reasoning": "<step-by-step justification>", "confidence_level": "<High/Medium/Low>" },
    { "diagnosis": "<2nd most likely>", "reasoning": "<step-by-step justification>", "confidence_level": "<High/Medium/Low>" },
    { "diagnosis": "<3rd most likely>", "reasoning": "<step-by-step justification>", "confidence_level": "<High/Medium/Low>" },
    { "diagnosis": "<4th most likely>", "reasoning": "<step-by-step justification>", "confidence_level": "<High/Medium/Low>" },
    { "diagnosis": "<5th most likely>", "reasoning": "<step-by-step justification>", "confidence_level": "<High/Medium/Low>" },
    { "diagnosis": "<6th most likely>", "reasoning": "<step-by-step justification>", "confidence_level": "<High/Medium/Low>" },
    { "diagnosis": "<7th most likely>", "reasoning": "<step-by-step justification>", "confidence_level": "<High/Medium/Low>" },
    { "diagnosis": "<8th most likely>", "reasoning": "<step-by-step justification>", "confidence_level": "<High/Medium/Low>" },
    { "diagnosis": "<9th most likely>", "reasoning": "<step-by-step justification>", "confidence_level": "<High/Medium/Low>" },
    { "diagnosis": "<10th most likely>", "reasoning": "<step-by-step justification>", "confidence_level": "<High/Medium/Low>" }
  ],
  "final_diagnosis": {
    "diagnosis": "<most probable final diagnosis>",
    "justification": "<detailed reasoning explaining why this diagnosis is most likely>"
  },
  "next_steps_recommended_tests": [
    "<test 1: explanation>",
    "<test 2: explanation>",
    "<test 3: explanation>"
    // Add more tests as appropriate
  ]
}

Atom-of-Thought Reasoning Process:
For each differential diagnosis, apply the following structured approach:

Identify key clinical clues (e.g., symptoms, lab values, imaging findings).
Compare with characteristic disease patterns (match findings to potential conditions).
Assess probability & fit (Does this condition fully explain the case? Are there inconsistencies?).
Consider alternative explanations (What else could explain this? Are there competing diagnoses?).
Rank & justify (Determine the most likely and why).
Determine next steps (What additional data is needed to confirm the diagnosis?).

Input Example (User Provides):
"Input Case File: NEJMcpc2309500.pdf
Case Details:
[Insert case details]."

Expected Output Example (LLM Response in JSON):
// (Example structure implies 10 entries in differential_diagnosis array)
{
  "case_id": "NEJMcpc2309500.pdf", // Example using filename
  "case_summary": "A 30-year-old postpartum woman developed persistent fever...",
  "differential_diagnosis": [
    { "diagnosis": "Septic Pelvic Thrombophlebitis", "reasoning": "...", "confidence_level": "High" },
    // ... (up to 10 diagnoses)
  ],
  "final_diagnosis": { ... },
  "next_steps_recommended_tests": [ ... ]
}"""


# PROMPT_2: Patient-Clinician Conversation (Updated for Combined JSON Output)
PROMPT_2 = """Prompt: Patient-Clinician Interaction Simulation (Combined JSON Output)
Role:
You are a medical dialogue generator trained to simulate realistic and succinct clinician-patient interactions based on detailed medical case records.

Objective:
Given the full text of a patient case history (e.g., from NEJM Case Records), simulate a natural, human-like conversation between a doctor and a patient as it would occur during a real-world clinical visit. You will assume both roles (doctor and patient) and follow a logical conversational flow. Additionally, generate a concise doctor's summary note based on the simulated encounter.

Instructions:

Persona Setup:
You are both the Doctor and the Patient based on the uploaded medical case history.
The patient presents for evaluation of symptoms, and the doctor proceeds to ask clarifying and relevant questions in a natural flow.
Maintain clinical realism: do not include dialogue that wouldn’t typically occur in a normal patient-clinician setting.

Conversation Structure:
Start with a greeting and an open-ended question from the doctor.
Patient shares their chief complaints.
The doctor then collects the following in a logical conversational order: HPI, PMH, PSH, Meds/Allergies, Family History, Social History, relevant ROS/Vitals (from case file).
Ask follow-up questions only when clinically appropriate.
Avoid speculative diagnostic reasoning or technical discussion not spoken to the patient.

Tone and Length:
Maintain a professional, empathetic tone.
Keep the dialogue realistic and succinct — approximately 25–30 turns.

Output Requirements:
Your response MUST be a single, valid JSON object containing two keys: "conversation" and "doctor_summary_note".

1.  `conversation`: An array of objects, each representing a turn in the dialogue.
    *   Each object has keys: `speaker` ("Doctor" or "Patient") and `utterance` (the text spoken).
2.  `doctor_summary_note`: A string containing a short, clinically worded summary note (written from the doctor’s perspective) summarizing key findings from the simulated encounter using appropriate medical terminology.

JSON Output Schema:
```json
{
  "conversation": [
    {"speaker": "Doctor", "utterance": "Hello, I’m Dr. [Name]. What brings you in today?"},
    {"speaker": "Patient", "utterance": "Hi Doctor, I’ve been having [symptom]..."},
    // ... more turns ...
    {"speaker": "Doctor", "utterance": "Okay, thank you for sharing that. We'll run some tests."},
    {"speaker": "Patient", "utterance": "Thank you, Doctor."}
  ],
  "doctor_summary_note": "Patient is a [Age]-year-old [Gender] presenting with [Chief Complaint]. History notable for [Key PMH/PSH/etc.]. Social history [details]. Vitals [if available]. Key findings on simulated review include [relevant points]. Plan: [Initial thoughts/tests based on conversation, not full diagnosis]."
}
Input:
Use the following detailed patient case history as your reference material to simulate the interaction and populate the JSON content above.
Input Case File: [Filename will be provided here]
Case Details:
[Case text will be provided here]"""

Successfully retrieved GOOGLE_API_KEY.
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Current working directory: /content/drive/MyDrive/Google AI Studio


In [8]:
def generate_diagnosis_json(case_text, filename="unknown_case", prompt=PROMPT_1):
  """
  Generates differential diagnosis in JSON format for a given case text.
  Args:
      case_text (str): The extracted text from the medical case PDF.
      filename (str): The original filename, used for context/ID.
      prompt (str): The prompt template to use for generation.

  Returns:
      str: The generated JSON output as a string, or None if an error occurs.
  """
  full_prompt = f"{prompt}\n\nInput Case File: {filename}\nCase Details:\n{case_text}"
  json_output = None # Initialize to None

  try:
      client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))
      # Use a model known for good JSON handling and reasoning
      model = "gemini-2.5-pro-exp-03-25" # Recommended

      contents = [types.Content(role="user", parts=[types.Part.from_text(text=full_prompt)])]
      generate_content_config = types.GenerateContentConfig(
          temperature=0, # Low temperature for structured output
          response_mime_type="application/json",
      )

      print(f"   [API Call] Requesting diagnosis for {filename}...")
      response = client.models.generate_content(
          model=model,
          contents=contents,
          config=generate_content_config,
          # stream=False # Default for generate_content
      )
      json_output = response.text
      print(f"   [API Call] Received diagnosis response for {filename}.")

      # Basic validation
      try:
          json.loads(json_output)
          print(f"   [Validation] Diagnosis output for {filename} appears to be valid JSON.")
      except json.JSONDecodeError as json_err:
          print(f"   [Validation Warning] Diagnosis output for {filename} is NOT valid JSON: {json_err}")
          print("   --- Raw Diagnosis Output Start ---")
          print(json_output)
          print("   --- Raw Diagnosis Output End ---")
          # Keep the potentially flawed string, but warn the user

  except Exception as e:
      print(f"   [Error] API call failed for diagnosis generation ({filename}): {e}")
      # import traceback # Uncomment for detailed debugging if needed
      # traceback.print_exc()
      json_output = None # Ensure None is returned on error

  return json_output

In [11]:
def generate_diagnosis_from_interaction_json(case_text, filename="unknown_case", prompt=PROMPT_2):
  """
  Generates a simulated patient-clinician conversation and summary note
  in a combined JSON format.
  Args:
      case_text (str): The extracted text from the medical case PDF.
      filename (str): The original filename, used for context.
      prompt (str): The prompt template to use for generation.

  Returns:
      str: The generated JSON output (containing conversation and summary)
          as a string, or None if an error occurs.
  """
  full_prompt = f"{prompt}\n\nInput Case File: {filename}\nCase Details:\n{case_text}"
  json_output = None # Initialize to None

  try:
      client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))
      # Use a model capable of following complex instructions and JSON format
      model = "gemini-2.5-pro-exp-03-25" # Recommended

      contents = [types.Content(role="user", parts=[types.Part.from_text(text=full_prompt)])]
      generate_content_config = types.GenerateContentConfig(
          temperature=0, # Moderate temperature for more natural conversation
          response_mime_type="application/json", # Expecting JSON back
      )

      print(f"   [API Call] Requesting conversation for {filename}...")
      response = client.models.generate_content(
          model=model,
          contents=contents,
          config=generate_content_config,
          # stream=False
      )
      json_output = response.text
      print(f"   [API Call] Received conversation response for {filename}.")

      # Basic validation
      try:
          parsed_json = json.loads(json_output)
          # Check for expected top-level keys
          if "conversation" in parsed_json and "doctor_summary_note" in parsed_json:
              print(f"   [Validation] Conversation output for {filename} appears to be valid JSON with expected keys.")
          else:
              print(f"   [Validation Warning] Conversation JSON for {filename} is missing expected keys ('conversation', 'doctor_summary_note').")
              print("   --- Raw Conversation Output Start ---")
              print(json_output)
              print("   --- Raw Conversation Output End ---")

      except json.JSONDecodeError as json_err:
          print(f"   [Validation Warning] Conversation output for {filename} is NOT valid JSON: {json_err}")
          print("   --- Raw Conversation Output Start ---")
          print(json_output)
          print("   --- Raw Conversation Output End ---")
          # Keep the potentially flawed string

  except Exception as e:
      print(f"   [Error] API call failed for conversation generation ({filename}): {e}")
      # import traceback
      # traceback.print_exc()
      json_output = None # Ensure None is returned on error

  return json_output

In [15]:
print("Please upload one or more PDF medical case files.")
uploaded = files.upload()
if not uploaded:
  print("\nNo files were uploaded. Exiting.")
else:
  print(f"\n--- Processing {len(uploaded)} file(s) ---")
# Store results if needed later, e.g., for saving to files
all_results = {}
for filename, file_content in uploaded.items():
    print(f"\n--- Analyzing File: {filename} ---")
    all_results[filename] = {'conversation': None, 'diagnosis': None, 'error': None}
    case_text = ""

    try:
        # 1. Read PDF content from memory
        print(f"   Reading PDF content...")
        pdf_file = io.BytesIO(file_content)
        reader = PyPDF2.PdfReader(pdf_file)
        num_pages = len(reader.pages)
        print(f"   Number of pages: {num_pages}")

        for i, page in enumerate(reader.pages):
            page_text = page.extract_text()
            if page_text:
                case_text += page_text + "\n" # Add newline between pages
            else:
                print(f"   Warning: Could not extract text from page {i+1} of {filename}.")

        if not case_text:
            print(f"   Error: No text could be extracted from {filename}. Skipping.")
            all_results[filename]['error'] = "Failed to extract text from PDF."
            continue # Skip to the next file

        print(f"   Successfully extracted text (approx. {len(case_text)} characters).")
        # Optional: Preview extracted text
        # print("   Extracted Text (first 300 chars):", case_text[:300]+"...")

        # 2. Generate Conversation JSON (using the extracted case_text)
        print(f"\n   Generating Conversation for {filename}...")
        conversation_json_output = generate_diagnosis_from_interaction_json(case_text, filename=filename)
        all_results[filename]['conversation'] = conversation_json_output

        if conversation_json_output:
            print(f"\n--- Conversation & Summary JSON Output for: {filename} ---")
            # Pretty print the JSON for better readability
            try:
                parsed = json.loads(conversation_json_output)
                print(json.dumps(parsed, indent=2))
            except json.JSONDecodeError:
                print(conversation_json_output) # Print raw if not valid JSON
            print(f"--- End of Conversation Output for: {filename} ---")
        else:
            print(f"   Failed to generate conversation for {filename}.")
            all_results[filename]['error'] = (all_results[filename]['error'] or "") + " Conversation generation failed."


        # 3. Generate Differential Diagnosis JSON (using the SAME extracted case_text)
        print(f"\n   Generating Differential Diagnosis for {filename}...")
        diagnosis_json_output = generate_diagnosis_json(conversation_json_output, filename=filename)
        all_results[filename]['diagnosis'] = diagnosis_json_output

        if diagnosis_json_output:
            print(f"\n--- Differential Diagnosis JSON Output for: {filename} ---")
            # Pretty print the JSON
            try:
                parsed = json.loads(diagnosis_json_output)
                print(json.dumps(parsed, indent=2))
            except json.JSONDecodeError:
                print(diagnosis_json_output) # Print raw if not valid JSON
            print(f"--- End of Diagnosis Output for: {filename} ---")
        else:
            print(f"   Failed to generate diagnosis for {filename}.")
            all_results[filename]['error'] = (all_results[filename]['error'] or "") + " Diagnosis generation failed."


    except PyPDF2.errors.PdfReadError as pdf_err:
         print(f"   Error: Failed to read PDF file {filename}. It might be corrupted or password-protected. {pdf_err}")
         all_results[filename]['error'] = f"Failed to read PDF: {pdf_err}"
    except Exception as e:
        print(f"   Error: An unexpected error occurred processing file {filename}: {e}")
        # import traceback # Uncomment for detailed traceback
        # traceback.print_exc()
        all_results[filename]['error'] = f"Unexpected processing error: {e}"

print("\n\n--- All Files Processed ---")

# Optional: Post-processing - e.g., save results to files
# print("\n--- Saving Results ---")
# for filename, data in all_results.items():
#     base_name = os.path.splitext(filename)[0]
#     if data['conversation']:
#         try:
#             # Validate before saving
#             json.loads(data['conversation'])
#             output_conv_filename = f"{base_name}_conversation.json"
#             with open(output_conv_filename, 'w') as f:
#                 f.write(data['conversation'])
#             print(f"Saved conversation to: {output_conv_filename}")
#         except Exception as e:
#             print(f"Could not save conversation JSON for {filename}: {e}")
#     if data['diagnosis']:
#         try:
#             # Validate before saving
#             json.loads(data['diagnosis'])
#             output_diag_filename = f"{base_name}_diagnosis.json"
#             with open(output_diag_filename, 'w') as f:
#                 f.write(data['diagnosis'])
#             print(f"Saved diagnosis to: {output_diag_filename}")
#         except Exception as e:
#             print(f"Could not save diagnosis JSON for {filename}: {e}")
#     if data['error']:
#          print(f"File {filename} had errors: {data['error']}")

Please upload one or more PDF medical case files.


Saving PoC - NEJMcpc2100279.pdf to PoC - NEJMcpc2100279 (3).pdf
Saving PoC - NEJMcpc2300900.pdf to PoC - NEJMcpc2300900 (3).pdf
Saving PoC - NEJMcpc2309383.pdf to PoC - NEJMcpc2309383 (3).pdf
Saving PoC - NEJMcpc2309500.pdf to PoC - NEJMcpc2309500 (3).pdf

--- Processing 4 file(s) ---

--- Analyzing File: PoC - NEJMcpc2100279 (3).pdf ---
   Reading PDF content...
   Number of pages: 4
   Successfully extracted text (approx. 15247 characters).

   Generating Conversation for PoC - NEJMcpc2100279 (3).pdf...
   [API Call] Requesting conversation for PoC - NEJMcpc2100279 (3).pdf...
   [API Call] Received conversation response for PoC - NEJMcpc2100279 (3).pdf.
   [Validation] Conversation output for PoC - NEJMcpc2100279 (3).pdf appears to be valid JSON with expected keys.

--- Conversation & Summary JSON Output for: PoC - NEJMcpc2100279 (3).pdf ---
{
  "conversation": [
    {
      "speaker": "Doctor",
      "utterance": "Hello, I'm Dr. Evans. I see you've been transferred from another hosp

In [12]:
print("Please upload one or more PDF medical case files.")
uploaded_pdfs = files.upload() # Renamed for clarity

if not uploaded_pdfs:
    print("\nNo PDF files were uploaded. Exiting.")
else:
    print(f"\n--- Processing {len(uploaded_pdfs)} PDF file(s) ---")
    all_results = {} # Dictionary to store results for each file

    for filename, file_content in uploaded_pdfs.items():
        print(f"\n--- Analyzing File: {filename} ---")
        # Initialize results structure for this file
        all_results[filename] = {
            'original_pdf_text': None,
            'conversation_json': None,
            'diagnosis_from_interaction_json': None,
            'error': None
            }
        case_text = ""
        conversation_json_output = None
        diagnosis_json_output = None

        try:
            # 1. Read PDF content
            print(f"   Reading PDF content...")
            pdf_file = io.BytesIO(file_content)
            reader = PyPDF2.PdfReader(pdf_file)
            num_pages = len(reader.pages)
            # print(f"   Number of pages: {num_pages}") # Optional verbosity

            for i, page in enumerate(reader.pages):
                page_text = page.extract_text()
                if page_text:
                    case_text += page_text + "\n"
                # else: # Optional verbosity
                #     print(f"   Warning: Could not extract text from page {i+1} of {filename}.")

            if not case_text:
                print(f"   Error: No text could be extracted from {filename}. Skipping subsequent steps.")
                all_results[filename]['error'] = "Failed to extract text from PDF."
                continue

            print(f"   Successfully extracted text.")
            all_results[filename]['original_pdf_text'] = case_text # Store original text if needed

            # 2. Generate Conversation JSON
            print(f"\n   Generating Conversation for {filename}...")
            conversation_json_output = generate_conversation_json(case_text, filename=filename)
            all_results[filename]['conversation_json'] = conversation_json_output # Store conversation JSON string

            if conversation_json_output:
                # Optional: Print conversation output
                # print(f"\n--- Conversation & Summary JSON Output for: {filename} ---")
                # try:
                #     parsed = json.loads(conversation_json_output)
                #     print(json.dumps(parsed, indent=2))
                # except json.JSONDecodeError:
                #     print(conversation_json_output)
                # print(f"--- End of Conversation Output for: {filename} ---")

                # 3. Generate Differential Diagnosis JSON (from conversation)
                print(f"\n   Generating Differential Diagnosis based on Interaction for {filename}...")
                diagnosis_json_output = generate_diagnosis_from_interaction_json(conversation_json_output, filename=filename)
                # **** STORE THE DIAGNOSIS JSON STRING ****
                all_results[filename]['diagnosis_from_interaction_json'] = diagnosis_json_output

                if diagnosis_json_output:
                     # Optional: Print diagnosis output
                     # print(f"\n--- Differential Diagnosis (from Interaction) JSON Output for: {filename} ---")
                     # try:
                     #     parsed = json.loads(diagnosis_json_output)
                     #     print(json.dumps(parsed, indent=2))
                     # except json.JSONDecodeError:
                     #     print(diagnosis_json_output)
                     # print(f"--- End of Diagnosis Output for: {filename} ---")
                     print(f"   Successfully generated diagnosis for {filename}.")
                else:
                    print(f"   Failed to generate diagnosis from interaction for {filename}.")
                    error_msg = "Diagnosis generation (from interaction) failed."
                    all_results[filename]['error'] = (all_results[filename]['error'] + "; " + error_msg) if all_results[filename]['error'] else error_msg


            else:
                print(f"   Skipping diagnosis generation because conversation generation failed for {filename}.")
                error_msg = "Conversation generation failed; diagnosis skipped."
                all_results[filename]['error'] = (all_results[filename]['error'] + "; " + error_msg) if all_results[filename]['error'] else error_msg


        except PyPDF2.errors.PdfReadError as pdf_err:
             print(f"   Error: Failed to read PDF file {filename}. It might be corrupted or password-protected. {pdf_err}")
             all_results[filename]['error'] = f"Failed to read PDF: {pdf_err}"
        except Exception as e:
            print(f"   Error: An unexpected error occurred processing file {filename}: {e}")
            # import traceback
            # traceback.print_exc()
            all_results[filename]['error'] = f"Unexpected processing error: {e}"

    print("\n\n--- All PDF Files Processed ---")
    # At this point, all_results dictionary holds the outputs (or errors) for each file.

Please upload one or more PDF medical case files.


Saving PoC - NEJMcpc2100279.pdf to PoC - NEJMcpc2100279 (2).pdf
Saving PoC - NEJMcpc2300900.pdf to PoC - NEJMcpc2300900 (2).pdf
Saving PoC - NEJMcpc2309383.pdf to PoC - NEJMcpc2309383 (2).pdf
Saving PoC - NEJMcpc2309500.pdf to PoC - NEJMcpc2309500 (2).pdf

--- Processing 4 PDF file(s) ---

--- Analyzing File: PoC - NEJMcpc2100279 (2).pdf ---
   Reading PDF content...
   Successfully extracted text.

   Generating Conversation for PoC - NEJMcpc2100279 (2).pdf...
   [API Call] Requesting conversation for PoC - NEJMcpc2100279 (2).pdf...
   [API Call] Received conversation response for PoC - NEJMcpc2100279 (2).pdf.
   [Validation] Conversation output for PoC - NEJMcpc2100279 (2).pdf appears to be valid JSON with expected keys.

   Generating Differential Diagnosis based on Interaction for PoC - NEJMcpc2100279 (2).pdf...
   [API Call] Requesting conversation for PoC - NEJMcpc2100279 (2).pdf...
   [API Call] Received conversation response for PoC - NEJMcpc2100279 (2).pdf.
   [Validation] Con

In [14]:
# --- [PREVIOUS CODE: Setup, Imports, Prompts (PROMPT_1, PROMPT_2, PROMPT_SIMILARITY),
# --- Function Definitions (generate_conversation_json, generate_diagnosis_from_interaction_json,
# --- normalize_diagnosis, check_diagnosis_similarity, calculate_reciprocal_rank_semantic,
# --- calculate_dcg_semantic), and the Main PDF Processing Loop should be executed first
# --- to populate 'all_results'] ---

import json
import math
import time
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import files
from google import genai
from google.genai import types

# --- Load Single Ground Truth Data File (MODIFIED for New Object/List Structure) ---

print("\nPlease upload the SINGLE Ground Truth JSON file.")
print("Expected format: A JSON object with a 'Differential Diagnoses' key containing a list of case objects.")
print("""Example:
{
  "Differential Diagnoses": [
    {
      "case_id": "NEJMcpc2100279",
      "differential_diagnosis": [ ... ], // This part is ignored for evaluation
      "final_diagnosis": "Infective endocarditis due to Haemophilus parainfluenzae."
    },
    {
      "case_id": "NEJMcpc2300900",
      "differential_diagnosis": [ ... ],
      "final_diagnosis": "AL amyloidosis."
    }
    // ... more cases
  ]
}""")

uploaded_ground_truth = files.upload() # This will prompt for one file

ground_truth_data = {} # This will store the final {filename.pdf: diagnosis} mapping
if not uploaded_ground_truth:
    print("No ground truth file uploaded. Cannot perform evaluation.")
else:
    ground_truth_filename = list(uploaded_ground_truth.keys())[0]
    try:
        ground_truth_content = uploaded_ground_truth[ground_truth_filename].decode('utf-8')
        # Load the JSON - expecting an object
        loaded_json_object = json.loads(ground_truth_content)

        # Validate the top-level structure
        if not isinstance(loaded_json_object, dict) or "Differential Diagnoses" not in loaded_json_object:
            print(f"Error: Ground truth file '{ground_truth_filename}' does not contain a top-level object with a 'Differential Diagnoses' key.")
            ground_truth_data = {}
        else:
            # Access the list of cases
            diagnoses_list = loaded_json_object["Differential Diagnoses"]

            if not isinstance(diagnoses_list, list):
                print(f"Error: The 'Differential Diagnoses' key in '{ground_truth_filename}' does not contain a JSON list.")
                ground_truth_data = {}
            else:
                print(f"Successfully loaded ground truth structure from {ground_truth_filename}.")
                processed_count = 0
                skipped_count = 0
                # Transform the list into a dictionary {filename.pdf: normalized_diagnosis}
                temp_ground_truth_dict = {}
                for case_data in diagnoses_list:
                    if isinstance(case_data, dict) and "case_id" in case_data and "final_diagnosis" in case_data:
                        case_id = case_data["case_id"]
                        final_diagnosis = case_data["final_diagnosis"]

                        # Validate types and content
                        if case_id and isinstance(case_id, str) and final_diagnosis and isinstance(final_diagnosis, str):
                            # *** ASSUMPTION: Construct filename by adding .pdf to case_id ***
                            filename_key = f"{case_id}.pdf"
                            # Normalize the diagnosis during transformation
                            normalized_diagnosis = normalize_diagnosis(final_diagnosis)
                            temp_ground_truth_dict[filename_key] = normalized_diagnosis
                            processed_count += 1
                        else:
                            print(f"   - Warning: Skipping invalid case data (empty/wrong type): {case_data}")
                            skipped_count += 1
                    else:
                        print(f"   - Warning: Skipping item with missing keys ('case_id', 'final_diagnosis') or wrong type: {case_data}")
                        skipped_count += 1

                ground_truth_data = temp_ground_truth_dict # Assign the processed dictionary
                print(f"Processed {processed_count} ground truth entries.")
                if skipped_count > 0:
                    print(f"Skipped {skipped_count} invalid entries in the ground truth file.")
                print(f"Final ground truth dictionary contains {len(ground_truth_data)} cases.")

    except json.JSONDecodeError as e:
        print(f"Error: Could not parse ground truth JSON file '{ground_truth_filename}'. Invalid JSON: {e}")
        ground_truth_data = {} # Reset to empty if parsing fails
    except Exception as e:
        print(f"An error occurred loading/processing the ground truth file: {e}")
        ground_truth_data = {}

# --- Perform Evaluation (Using Semantic Similarity) ---
# (The rest of this block remains IDENTICAL to the previous answer)
# It uses the 'ground_truth_data' dictionary which was populated above.

evaluation_scores = []
k_for_dcg = 10
similarity_cache = {} # Re-initialize cache here to be safe

if ground_truth_data and 'all_results' in locals() and all_results:
    print(f"\n--- Evaluating {len(all_results)} Processed Files against Ground Truth (using Semantic Similarity) ---")
    evaluated_count = 0
    missing_ground_truth_count = 0
    error_parsing_results_count = 0
    no_diagnosis_generated_count = 0

    sorted_filenames = sorted(all_results.keys())

    for filename in sorted_filenames:
        result_data = all_results[filename]
        print(f"\nEvaluating: {filename}")

        # Access ground truth using the dictionary created earlier
        # The keys in ground_truth_data should now match the PDF filenames (e.g., "NEJMcpc2100279.pdf")
        actual_diagnosis_normalized = ground_truth_data.get(filename)
        if not actual_diagnosis_normalized:
            print(f"   - Warning: No ground truth found for {filename} in the processed data. Skipping.")
            missing_ground_truth_count += 1
            continue

        diagnosis_json_str = result_data.get('diagnosis_from_interaction_json')
        if not diagnosis_json_str:
            print(f"   - Warning: No diagnosis JSON generated for {filename}. Skipping.")
            no_diagnosis_generated_count += 1
            continue

        try:
            diagnosis_data = json.loads(diagnosis_json_str)
            generated_ddx = diagnosis_data.get('differential_diagnosis')

            if not isinstance(generated_ddx, list):
                 print(f"   - Warning: 'differential_diagnosis' key missing/not a list in generated JSON for {filename}. Skipping.")
                 error_parsing_results_count += 1
                 continue

            print(f"   - Ground Truth: '{actual_diagnosis_normalized}'")
            print(f"   - Checking top {min(len(generated_ddx), k_for_dcg)} generated diagnoses for similarity...")

            rr = calculate_reciprocal_rank_semantic(generated_ddx, actual_diagnosis_normalized, similarity_cache)
            dcg = calculate_dcg_semantic(generated_ddx, actual_diagnosis_normalized, k_for_dcg, similarity_cache)

            evaluation_scores.append({
                'filename': filename,
                'reciprocal_rank': rr,
                f'dcg@{k_for_dcg}': dcg,
                'actual_diagnosis': actual_diagnosis_normalized,
                'found_rank': int(1/rr) if rr > 0 else None
            })
            print(f"   - Scores: RR={rr:.4f} | DCG@{k_for_dcg}={dcg:.4f} | Found Rank: {int(1/rr) if rr > 0 else 'Not Found'}")
            evaluated_count += 1

        except json.JSONDecodeError as e:
            print(f"   - Error: Could not parse generated diagnosis JSON for {filename}: {e}. Skipping.")
            error_parsing_results_count += 1
            continue
        except Exception as e:
            print(f"   - Error: Unexpected error during evaluation for {filename}: {e}. Skipping.")
            error_parsing_results_count += 1
            continue

    print("\n--- Evaluation Summary ---")
    print(f"Successfully evaluated: {evaluated_count} cases")
    print(f"Skipped (Missing Ground Truth): {missing_ground_truth_count} cases")
    print(f"Skipped (No Diagnosis Generated): {no_diagnosis_generated_count} cases")
    print(f"Skipped (Error Parsing Results): {error_parsing_results_count} cases")
    print(f"Total Similarity API calls made (excluding cache hits): {len(similarity_cache)}")


    # --- Calculate Aggregate Metrics & Visualize (No changes needed here) ---
    if evaluated_count > 0:
        all_rr = [score['reciprocal_rank'] for score in evaluation_scores]
        all_dcg = [score[f'dcg@{k_for_dcg}'] for score in evaluation_scores]

        mean_rr = np.mean(all_rr)
        mean_dcg = np.mean(all_dcg)

        print(f"\nMean Reciprocal Rank (MRR): {mean_rr:.4f}")
        print(f"Mean DCG@{k_for_dcg}:          {mean_dcg:.4f}")

        print("\n--- Generating Visualizations ---")
        plt.style.use('seaborn-v0_8-whitegrid')

        fig, axes = plt.subplots(1, 2, figsize=(14, 6))

        sns.boxplot(y=all_rr, ax=axes[0], palette="viridis")
        axes[0].set_title('Distribution of Reciprocal Ranks (RR)')
        axes[0].set_ylabel('Reciprocal Rank')
        axes[0].set_ylim(bottom=-0.05, top=1.05)

        sns.boxplot(y=all_dcg, ax=axes[1], palette="plasma")
        axes[1].set_title(f'Distribution of DCG@{k_for_dcg}')
        axes[1].set_ylabel(f'DCG@{k_for_dcg} Score')

        plt.tight_layout()
        plt.show()

        found_ranks = [score['found_rank'] for score in evaluation_scores if score['found_rank'] is not None]
        if found_ranks:
            plt.figure(figsize=(8, 5))
            sns.histplot(found_ranks, bins=range(1, k_for_dcg + 2), kde=False, discrete=True)
            plt.title('Frequency of Ranks for Correct Diagnosis (when found)')
            plt.xlabel('Rank in Generated List')
            plt.ylabel('Number of Cases')
            plt.xticks(range(1, k_for_dcg + 1))
            plt.grid(axis='y', linestyle='--')
            plt.show()
        else:
            print("\nCorrect diagnosis not found (or deemed similar) in the top K for any evaluated cases.")

    else:
        print("\nNo cases were successfully evaluated. Cannot calculate aggregate metrics or generate plots.")

elif not ground_truth_data:
    print("\nEvaluation skipped because ground truth data was not loaded or processed correctly.")
else:
    print("\nEvaluation skipped because 'all_results' dictionary is empty or not defined. Did the processing step run correctly?")


Please upload the SINGLE Ground Truth JSON file.
Expected format: A JSON object with a 'Differential Diagnoses' key containing a list of case objects.
Example:
{
  "Differential Diagnoses": [
    {
      "case_id": "NEJMcpc2100279",
      "differential_diagnosis": [ ... ], // This part is ignored for evaluation
      "final_diagnosis": "Infective endocarditis due to Haemophilus parainfluenzae."
    },
    {
      "case_id": "NEJMcpc2300900",
      "differential_diagnosis": [ ... ],
      "final_diagnosis": "AL amyloidosis."
    }
    // ... more cases
  ]
}


Saving Patient Differential Diagnoses.json to Patient Differential Diagnoses (2).json
Successfully loaded ground truth structure from Patient Differential Diagnoses (2).json.
Processed 4 ground truth entries.
Final ground truth dictionary contains 4 cases.

--- Evaluating 4 Processed Files against Ground Truth (using Semantic Similarity) ---

Evaluating: PoC - NEJMcpc2100279 (2).pdf

Evaluating: PoC - NEJMcpc2300900 (2).pdf

Evaluating: PoC - NEJMcpc2309383 (2).pdf

Evaluating: PoC - NEJMcpc2309500 (2).pdf

--- Evaluation Summary ---
Successfully evaluated: 0 cases
Skipped (Missing Ground Truth): 4 cases
Skipped (No Diagnosis Generated): 0 cases
Skipped (Error Parsing Results): 0 cases
Total Similarity API calls made (excluding cache hits): 0

No cases were successfully evaluated. Cannot calculate aggregate metrics or generate plots.
