# Setup and Initialization

In [1]:
!pip install -U -q google-genai
!pip install -q PyPDF2

import os
from google.colab import userdata
from google.colab import drive
os.environ["GEMINI_API_KEY"] = userdata.get("GOOGLE_API_KEY")

drive.mount("/content/drive")
os.chdir("/content/drive/MyDrive/Google AI Studio")

from google import genai
from google.genai import types
from google.colab import files

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Define Prompts

## Differential Diagnoses

In [7]:


PROMPT_1 = """Prompt: NEJM Medical Case Analysis with Atom-of-Thought Reasoning (JSON Output)
Goal:
Analyze a provided NEJM medical case record and generate a differential diagnosis (top 10) ranked in order of likelihood/confidence, along with the final diagnosis. Justify each ranking using atom-of-thought reasoning and suggest next diagnostic steps/tests a physician would perform to confirm or rule out conditions. The response must be formatted as structured JSON.

Context Dump:
You are a highly advanced medical AI trained in clinical reasoning, differential diagnosis, and diagnostic testing. Your task is to analyze patient case data methodically, using the atom-of-thought reasoning process, breaking down each step into granular diagnostic components before synthesizing conclusions. You follow evidence-based medicine and best clinical practices.

You will be provided with a full NEJM medical case record, including history, symptoms, lab results, imaging findings, and other relevant data. Your role is to act as an expert diagnostician, systematically working through the case to generate an accurate and well-supported differential diagnosis.

Warnings & Considerations:
Do NOT fabricate data—base all reasoning strictly on the given case information.
Clearly state uncertainty levels for each differential diagnosis.
Emphasize clinical reasoning rather than just listing conditions.
Do NOT provide patient-specific medical advice—this is a simulated diagnostic reasoning exercise.

Return Format (JSON Structure):
Your response must be structured as a valid JSON object using the following schema:

{
  "case_id": "<unique_case_id>",
  "case_summary": "<brief summary of the patient's key symptoms, history, and findings>",
  "differential_diagnosis": [
    {
      "diagnosis": "<most likely diagnosis>",
      "reasoning": "<step-by-step justification using atom-of-thought reasoning>",
      "confidence_level": "<High/Medium/Low>"
    },
    {
      "diagnosis": "<second most likely diagnosis>",
      "reasoning": "<step-by-step justification>",
      "confidence_level": "<High/Medium/Low>"
    },
    {
      "diagnosis": "<third most likely diagnosis>",
      "reasoning": "<step-by-step justification>",
      "confidence_level": "<High/Medium/Low>"
    },
    {
      "diagnosis": "<fourth most likely diagnosis>",
      "reasoning": "<step-by-step justification>",
      "confidence_level": "<High/Medium/Low>"
    },
    {
      "diagnosis": "<fifth most likely diagnosis>",
      "reasoning": "<step-by-step justification>",
      "confidence_level": "<High/Medium/Low>"
    }
  ],
  "final_diagnosis": {
    "diagnosis": "<most probable final diagnosis>",
    "justification": "<detailed reasoning explaining why this diagnosis is most likely>"
  },
  "next_steps_recommended_tests": [
    "<test 1: explanation>",
    "<test 2: explanation>",
    "<test 3: explanation>"
  ]
}

Atom-of-Thought Reasoning Process:
For each differential diagnosis, apply the following structured approach:

Identify key clinical clues (e.g., symptoms, lab values, imaging findings).
Compare with characteristic disease patterns (match findings to potential conditions).
Assess probability & fit (Does this condition fully explain the case? Are there inconsistencies?).
Consider alternative explanations (What else could explain this? Are there competing diagnoses?).
Rank & justify (Determine the most likely and why).
Determine next steps (What additional data is needed to confirm the diagnosis?).

Input Example (User Provides):
"Here is a NEJM case record: [Insert case details]."

Expected Output Example (LLM Response in JSON):
{
  "case_id": "NEJMcpc2309500",
  "case_summary": "A 30-year-old postpartum woman developed persistent fever, worsening abdominal pain, leukocytosis, ascites, skin lesions progressing to necrotic ulcers, hepatosplenomegaly, and signs suggestive of septic emboli.",
  "differential_diagnosis": [
    {
      "diagnosis": "Septic Pelvic Thrombophlebitis",
      "reasoning": "Persistent fever, pelvic fluid collections, postpartum timing strongly suggests septic thrombophlebitis; however, absence of clear venous thrombosis on imaging is atypical.",
      "confidence_level": "High"
    },
    {
      "diagnosis": "Necrotizing Fasciitis",
      "reasoning": "Rapidly progressing ulcerative skin lesions, systemic symptoms, postpartum setting could indicate necrotizing infection; however, lesions initially localized and imaging inconsistent.",
      "confidence_level": "Medium"
    },
    {
      "diagnosis": "Endometritis complicated by Pelvic Abscess",
      "reasoning": "Fever, abdominal pain, postpartum state, and imaging findings consistent with infection/abscess; yet, ongoing systemic involvement atypical for uncomplicated endometritis.",
      "confidence_level": "Medium"
    },
    {
      "diagnosis": "Disseminated Intravascular Coagulation (DIC)",
      "reasoning": "Elevated D-dimer, fibrinogen, skin lesions suggest DIC, but coagulation abnormalities relatively mild and platelet count elevated.",
      "confidence_level": "Low"
    },
    {
      "diagnosis": "Pyoderma Gangrenosum (PG)",
      "reasoning": "Ulcerative skin lesions might suggest PG associated with underlying inflammatory bowel disease or autoimmune conditions; however, severe systemic findings argue against it.",
      "confidence_level": "Low"
    }
  ],
  "final_diagnosis": {
    "diagnosis": "Septic Pelvic Thrombophlebitis",
    "justification": "Persistent postpartum fever, pelvic abscess formation, rapidly progressive skin lesions suggesting septic embolic phenomena, hepatosplenomegaly, and severe systemic involvement strongly indicate septic pelvic thrombophlebitis."
  },
  "next_steps_recommended_tests": [
    "Detailed pelvic MRI or MR venography to visualize venous thrombosis.",
    "Blood and wound cultures to guide antibiotic therapy.",
    "Possible surgical exploration and debridement if necrotizing infection suspected."
  ]
}"""
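
Because the prompt fixes a JSON schema, a response can be checked mechanically before it is used. The sketch below is one possible way to do that with the standard library only. The names `check_differential`, `REQUIRED_KEYS`, `ENTRY_FIELDS`, and `ALLOWED_CONFIDENCE` are hypothetical and do not appear elsewhere in this notebook, and the checks cover only the fields named in the schema above.

```python
import json

# Keys and confidence labels taken from the schema in PROMPT_1.
REQUIRED_KEYS = {
    "case_id",
    "case_summary",
    "differential_diagnosis",
    "final_diagnosis",
    "next_steps_recommended_tests",
}
ENTRY_FIELDS = ("diagnosis", "reasoning", "confidence_level")
ALLOWED_CONFIDENCE = {"High", "Medium", "Low"}


def check_differential(raw_json):
    """Return a list of problems found in a response; empty means it passed."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    # Report every schema key that the response left out.
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(data))]

    entries = data.get("differential_diagnosis", [])
    if not isinstance(entries, list):
        return problems + ["differential_diagnosis is not a list"]

    for position, entry in enumerate(entries, start=1):
        if not isinstance(entry, dict):
            problems.append(f"entry {position}: not a JSON object")
            continue
        for field in ENTRY_FIELDS:
            if field not in entry:
                problems.append(f"entry {position}: missing {field}")
        level = entry.get("confidence_level")
        if level is not None and level not in ALLOWED_CONFIDENCE:
            problems.append(
                f"entry {position}: unexpected confidence_level {level!r}"
            )
    return problems
```

An empty list means the response matched the checked parts of the schema. Any other result lists what to inspect before the output is passed downstream.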

## Patient-Clinician Conversations

In [None]:
PROMPT_2 = """Prompt: Patient-Clinician Interaction Simulation (Script Format)
Role:
You are a medical dialogue generator trained to simulate realistic and succinct clinician-patient interactions based on detailed medical case records.

Objective:
Given the full text of a patient case history (e.g., from NEJM Case Records), simulate a natural, human-like conversation between a doctor and a patient as it would occur during a real-world clinical visit. You will assume both roles (doctor and patient) and follow a logical conversational flow.

Instructions:

Persona Setup:

You are both the Doctor and the Patient based on the uploaded medical case history.

The patient presents for evaluation of symptoms, and the doctor proceeds to ask clarifying and relevant questions in a natural flow.

Maintain clinical realism: do not include dialogue that wouldn’t typically occur in a normal patient-clinician setting.

Conversation Structure:

Start with a greeting and an open-ended question from the doctor (e.g., “What brings you in today?”).

Patient shares their chief complaints.

The doctor then collects the following in a logical conversational order:

History of Present Illness (HPI)

Past Medical History (PMH)

Past Surgical History (PSH)

Medication and Allergy History

Family History

Social History (including smoking, alcohol, occupation, etc.)

Review of Symptoms and key vitals (as per what's available in the case file)

Ask follow-up questions only when clinically appropriate.

Avoid speculative diagnostic reasoning or technical discussion that wouldn’t be spoken aloud to the patient.

Tone and Length:

Maintain a professional, empathetic tone throughout.

Keep the dialogue realistic and succinct — approximately 25–30 turns of back-and-forth conversation.

Do not exceed normal conversational detail.

Output Requirements: Output should contain two sections:

Section A – Conversation (Output JSON)
Structure the conversation in a structured JSON format for easy parsing:

{
  "conversation": [
    {"speaker": "Doctor", "utterance": "Hello, I’m Dr. Smith. What brings you in today?"},
    {"speaker": "Patient", "utterance": "Hi Doctor, I’ve been having chest pain for the past three days..."},
    ...
  ]
}
Section B – Doctor’s Note Summary
A short, clinically worded summary note (written from the doctor’s perspective) that summarizes key findings from the encounter using appropriate medical terminology. For example:

Patient is a 56-year-old male presenting with 3-day history of substernal chest pain radiating to the left arm. PMH significant for hypertension and hyperlipidemia. Denies smoking or alcohol use. Family history notable for coronary artery disease in father. Vitals stable on presentation.
Input:
Use the following detailed patient case history as your reference material to simulate the interaction and populate all content above."""

# Function Definitions for Generating Conversations and Differentials

## Differential Diagnoses

In [3]:
def generate(case_description, prompt=PROMPT_1):
    full_prompt = f"{prompt}\n\n{case_description}"

    client = genai.Client(
        api_key=os.environ.get("GEMINI_API_KEY"),
    )

    model = "gemini-2.5-pro-exp-03-25"
    contents = [
        types.Content(
            role="user",
            parts=[
                types.Part.from_text(text=full_prompt),
            ],
        ),
    ]
    generate_content_config = types.GenerateContentConfig(
        temperature=0,
        response_mime_type="application/json",
    )

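    # Accumulate the streamed text so the caller receives the full response.
    response_text = ""
    for chunk in client.models.generate_content_stream(
        model=model,
        contents=contents,
        config=generate_content_config,
    ):
        # Some chunks carry no text; skip them so "None" is never printed.
        if chunk.text is not None:
            response_text += chunk.text
            print(chunk.text, end="")

    return response_text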
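
The streaming loop above can be isolated so that it is testable without an API call. The following is a minimal sketch, assuming only that each streamed chunk exposes a `text` attribute that may be `None` (the conversation function later in this notebook guards against exactly that). `collect_text` is a hypothetical helper name, not part of the `google.genai` SDK, and the `SimpleNamespace` objects are stand-ins for real chunks.

```python
from types import SimpleNamespace


def collect_text(chunks, echo=False):
    """Join the text of streamed chunks, skipping chunks whose text is None."""
    pieces = []
    for chunk in chunks:
        text = getattr(chunk, "text", None)
        if text is None:
            continue
        if echo:
            # Mirror the notebook's behaviour of printing text as it arrives.
            print(text, end="")
        pieces.append(text)
    return "".join(pieces)


# Stand-in chunks for illustration; a real call would pass the iterator
# returned by client.models.generate_content_stream(...).
fake_stream = [
    SimpleNamespace(text='{"case_id": '),
    SimpleNamespace(text=None),
    SimpleNamespace(text='"demo"}'),
]
result = collect_text(fake_stream)
```

Collecting pieces in a list and joining once avoids repeated string concatenation, which matters little for short responses but keeps the helper cheap for long ones.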

## Patient-Clinician Conversations

In [4]:
def generate_conv(case_description2, prompt=PROMPT_2):
    full_prompt2 = f"{prompt}\n\n{case_description2}"

    client = genai.Client(
        api_key=os.environ.get("GEMINI_API_KEY"),
    )

    model = "gemini-2.5-pro-exp-03-25"
    contents = [
        types.Content(
            role="user",
            parts=[types.Part.from_text(text=full_prompt2)],
        ),
    ]
    generate_content_config = types.GenerateContentConfig(
        temperature=0,
        response_mime_type="application/json",
    )

    conversation = ""
    for chunk in client.models.generate_content_stream(
        model=model,
        contents=contents,
        config=generate_content_config,
    ):
        # Check if chunk.text is not None before concatenating
        if chunk.text is not None:
            conversation += chunk.text
            print(chunk.text, end="")

    return conversation
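
The returned string can be turned into a readable transcript once it is parsed. The sketch below assumes the response is a JSON object holding a list of `speaker`/`utterance` pairs, under either the `conversation` key requested in the prompt or the `Section A - Conversation` key that appears in the sample output in the Output section. `to_transcript` and `CONVERSATION_KEYS` are hypothetical names introduced for illustration.

```python
import json

# Keys under which the list of turns has been seen or requested.
CONVERSATION_KEYS = ("conversation", "Section A - Conversation")


def to_transcript(raw_json):
    """Render a conversation response as 'Speaker: utterance' lines."""
    data = json.loads(raw_json)
    turns = []
    if isinstance(data, dict):
        for key in CONVERSATION_KEYS:
            if isinstance(data.get(key), list):
                turns = data[key]
                break

    lines = []
    for turn in turns:
        if not isinstance(turn, dict):
            continue  # Ignore malformed entries rather than failing.
        speaker = turn.get("speaker", "Unknown")
        utterance = turn.get("utterance", "")
        lines.append(f"{speaker}: {utterance}")
    return "\n".join(lines)
```

A truncated response is not valid JSON, so `json.loads` raises `json.JSONDecodeError` in that case. Callers may want to catch that error and rerun the generation.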

# Output

In [5]:
uploaded = files.upload()

import PyPDF2

pdf_filename = list(uploaded.keys())[0]

with open(pdf_filename, "rb") as file:
    reader = PyPDF2.PdfReader(file)
    case_description2 = ""
    for page in reader.pages:
        case_description2 += page.extract_text() or ""

conv_output = generate_conv(case_description2)

Saving PoC - NEJMcpc2309383.pdf to PoC - NEJMcpc2309383 (1).pdf
{
  "Section A - Conversation": [
    {
      "speaker": "Doctor",
      "utterance": "Hello Mr. Jones, I'm Dr. Smith. I understand you're back in the hospital? What's been happening?"
    },
    {
      "speaker": "Patient",
      "utterance": "Hi Doctor. Yes, I'm back. These fevers just won't quit, even with the latest antibiotic, the ciprofloxacin."
    },
    {
      "speaker": "Doctor",
      "utterance": "I'm sorry to hear that. Can you remind me when these fevers started?"
    },
    {
      "speaker": "Patient",
      "utterance": "It's been almost two years now. Started with what they thought was pneumonia. The cough went away, but the fevers kept coming back."
    },
    {
      "speaker": "Doctor",
      "utterance": "And you've had chills and night sweats with them?"
    },
    {
      "speaker": "Patient",
      "utterance": "Yes, pretty often. And this discomfort in my upper stomach area."
    },
    {
      

In [6]:
diff_output = generate(conv_output)

{
  "case_id": "NEJM_FUO_Hepatosplenic_Lesions_73M",
  "case_summary": "A 73-year-old male with a history of CAD presents with a 22-month history of recurrent fevers, chills, night sweats, epigastric discomfort, fatigue, and splenomegaly. Investigations revealed persistent anemia, elevated alkaline phosphatase, stable pulmonary nodules, and progressive hepato-splenic lesions (initially non-necrotizing granulomas, later evolving to abscesses). Multiple bacterial species (K. pneumoniae, E. coli, E. faecalis) have been cultured from liver lesions over time. Symptoms persist despite multiple antibiotic courses (doxycycline, ciprofloxacin, IV antibiotics) and a trial of prednisone. Recent findings include portal hypertensive gastropathy and interstitial pulmonary edema. Currently admitted for persistent fever despite ciprofloxacin, with recent liver aspirate growing E. faecalis, prompting initiation of amoxicillin.",
  "differential_diagnosis": [
    {
      "diagnosis": "Recurrent Pyogenic