In [23]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# MedExplain — Safe, Intent-Aware Medical Explanation Backend

MedExplain is a backend medical explanation system built for the **MedGemma Impact Challenge**.  
The goal of this project is to demonstrate **responsible medical reasoning**, not just text generation.

The system focuses on *how* and *when* to respond, especially in sensitive or ambiguous medical contexts.

---

## Key Features

- **Low-Information Detection**  
  Identifies vague or insufficient user queries and requests clarification instead of hallucinating.

- **Safety & Medical Guardrails**  
  Detects red-flag symptoms and prioritizes emergency guidance when appropriate.

- **Intent-Aware Explanations**  
  Routes responses based on inferred user intent:
  - Patient
  - Caregiver
  - Student

- **Model-Agnostic Architecture**  
  The backend logic is independent of any specific language model and can be directly paired with MedGemma or similar models.

---

## Reproducibility Note

To ensure reliable execution in constrained environments (e.g., Kaggle), this notebook uses a **lightweight placeholder language model**.  
The emphasis of this submission is on **architecture, safety logic, and decision flow**, rather than generation quality.

All critical reasoning components (safety checks, intent routing, fallback handling) operate independently of model size.

---

## Scope

- This notebook contains **backend inference logic only**
- UI components are intentionally excluded
- No diagnostic or prescriptive medical advice is provided

---

The following sections demonstrate the full MedExplain pipeline with representative test cases.


## Why MedGemma?

- MedGemma is designed for medical-domain language and safety.
- Generic LLMs often hallucinate or overgeneralize in health contexts.
- MedExplain uses MedGemma only when sufficient information and safety checks are satisfied.
- The downloadable PDF enables auditability and human review, which is critical for medical AI systems.

## Problem Motivation

Many users turn to AI for medical explanations, but current systems often:
- provide confident answers with incomplete information
- fail to escalate urgent symptoms
- do not adapt explanations to the user’s background

This creates a safety gap between information access and responsible guidance.
MedExplain focuses on closing this gap.


## Expected Impact

MedExplain can support first-stage medical understanding for patients, caregivers, and students,
while reducing unsafe or misleading AI-generated health advice. The system is designed to be integrated into telehealth portals, medical chat assistants, or educational tools.



## Deployment Considerations

- Backend-first design enables deployment as an API service
- UI intentionally excluded for reproducibility
- Lightweight models allow GPU inference on standard hardware
- Safety logic reduces legal and clinical risk in production settings
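
As a rough illustration of the API-service point, the sketch below wraps a pipeline function in a JSON request handler. The names `handle_request`, `Pipeline`, and `fake_pipeline` are hypothetical and not part of this notebook; the only assumption about the real pipeline is that it takes a query string and returns an `(answer, decision_log)` pair, as `medexplain_with_trace` does later on.

```python
import json
from typing import Callable, Dict, List, Tuple

# A pipeline takes a query and returns (answer, decision_log).
Pipeline = Callable[[str], Tuple[str, List[Dict[str, str]]]]


def handle_request(body: str, pipeline: Pipeline) -> Tuple[int, str]:
    """Hypothetical handler: JSON request body in, (status code, JSON body) out."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "request body is not valid JSON"})

    # Reject anything that is not an object carrying a string 'query' field.
    if not isinstance(payload, dict) or not isinstance(payload.get("query"), str):
        return 400, json.dumps({"error": "expected an object with a string 'query'"})

    answer, decision_log = pipeline(payload["query"])
    return 200, json.dumps({"answer": answer, "decision_log": decision_log})


def fake_pipeline(query: str) -> Tuple[str, List[Dict[str, str]]]:
    """Stub standing in for the real pipeline so this cell runs on its own."""
    return f"echo: {query}", [{"code": "INTENT_DETECTED", "detail": "stub"}]


status, response = handle_request('{"query": "help"}', fake_pipeline)
print(status, response)
```

Keeping the handler free of any web framework means the same function could sit behind whichever server is chosen for deployment.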


This notebook demonstrates the backend reasoning system for MedExplain.
UI is excluded to prioritize reproducibility and evaluation clarity.

In [24]:
import torch

print("Torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU")

Torch version: 2.8.0+cu126
CUDA available: True
GPU: Tesla T4


In [25]:
import torch
from transformers import GPT2Config, GPT2LMHeadModel, GPT2TokenizerFast
# ===== Model Registry =====
MODEL_REGISTRY = {
    "medical": {
        "name": "MedGemma",
        "description": "HAI-DEF medical reasoning model",
        "enabled": True
    },
    "placeholder": {
        "name": "Lightweight Mock Model",
        "description": "Used for reproducibility only",
        "enabled": True
    }
}

# Create tiny config (random weights, instant)
config = GPT2Config(
    vocab_size=50257,
    n_positions=128,
    n_embd=64,
    n_layer=2,
    n_head=2
)

model = GPT2LMHeadModel(config).eval()

# Use local tokenizer (comes with transformers)
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

print("Local mock model ready")


Local mock model ready


In [26]:
# ===== Model Selection Logic =====

def select_model(
    confidence: str,
    has_red_flag: bool,
    is_low_info: bool
) -> str:
    """
    Selects which model should be used for response generation.

    MedGemma is used ONLY when:
    - confidence is high
    - no red flags are detected
    - input is not low-information
    """

    if (
        confidence == "high"
        and not has_red_flag
        and not is_low_info
        and MODEL_REGISTRY["medical"]["enabled"]
    ):
        return "medical"

    return "placeholder"
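
The gating rule can be checked exhaustively, since it has only twelve input combinations. The sketch below repeats the rule under the hypothetical names `REGISTRY` and `select_model_sketch` so that it runs on its own, then enumerates every combination to confirm that exactly one of them routes to the medical model.

```python
from itertools import product

# Stand-in for MODEL_REGISTRY, reduced to the one field the rule reads.
REGISTRY = {"medical": {"enabled": True}, "placeholder": {"enabled": True}}


def select_model_sketch(confidence: str, has_red_flag: bool, is_low_info: bool) -> str:
    """Same rule as select_model: medical only for high-confidence, clean input."""
    if (
        confidence == "high"
        and not has_red_flag
        and not is_low_info
        and REGISTRY["medical"]["enabled"]
    ):
        return "medical"
    return "placeholder"


# Enumerate all 3 x 2 x 2 input combinations.
routes = {
    combo: select_model_sketch(*combo)
    for combo in product(["low", "medium", "high"], [False, True], [False, True])
}
medical_combos = [combo for combo, route in routes.items() if route == "medical"]
print(medical_combos)  # [('high', False, False)]
```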


In [27]:
@torch.no_grad()
def run_inference(prompt, max_new_tokens=50):
    inputs = tokenizer(prompt, return_tensors="pt")

    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id
    )

    return tokenizer.decode(output[0], skip_special_tokens=True)


In [28]:
def is_low_information(query: str) -> bool:
    if not query:
        return True

    query = query.lower().strip()

    # Very short or vague queries
    if len(query.split()) < 3:
        return True

    vague_phrases = {
        "help", "explain", "tell me",
        "what is this", "i dont know",
        "not feeling well"
    }

    return any(p in query for p in vague_phrases)


### Low-Information Detection – Sanity Check

The following examples demonstrate how vague or insufficient queries
are detected before medical explanation generation.


In [29]:
tests = [
    "help",
    "explain blood pressure",
    "my blood pressure is 150/95",
    ""
]

for t in tests:
    print(f"{t!r} → low_info={is_low_information(t)}")


'help' → low_info=True
'explain blood pressure' → low_info=True
'my blood pressure is 150/95' → low_info=False
'' → low_info=True
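
Because `is_low_information` uses substring matching, a word such as "helpful" also triggers the "help" phrase. The sketch below is one possible stricter variant, not the notebook's method: it keeps the same phrase list but matches on word boundaries. The names `VAGUE_PHRASES`, `VAGUE_PATTERN`, and `is_low_information_wb` are hypothetical.

```python
import re

# Same phrases as is_low_information, repeated so this cell runs on its own.
VAGUE_PHRASES = (
    "help", "explain", "tell me",
    "what is this", "i dont know",
    "not feeling well",
)

# \b anchors each phrase at word boundaries, so "helpful" no longer matches "help".
VAGUE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in VAGUE_PHRASES) + r")\b"
)


def is_low_information_wb(query: str) -> bool:
    """Hypothetical word-boundary variant of the low-information check."""
    if not query or len(query.split()) < 3:
        return True
    return VAGUE_PATTERN.search(query.lower()) is not None


for q in ["explain blood pressure", "this leaflet was helpful but confusing"]:
    print(f"{q!r} -> low_info={is_low_information_wb(q)}")
```

Queries that contain "explain" as a whole word are still treated as low-information under this variant; only partial-word matches are excluded.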


In [30]:
def low_info_response(intent: str = "patient") -> str:
    base = {
        "patient": (
            "I want to help, but I need a bit more information. "
            "Could you share specific details like symptoms, numbers (for example, blood pressure values), "
            "duration, or any medical context you already have?"
        ),
        "caregiver": (
            "To give accurate guidance, please provide more details such as observations, measurements, "
            "timeline, or any diagnoses you’re aware of."
        ),
        "student": (
            "Please clarify the concept or provide context (definitions, examples, or data) "
            "so I can explain it accurately."
        ),
    }
    return base.get(intent, base["patient"])


In [31]:
for role in ["patient", "caregiver", "student"]:
    print(role, "→", low_info_response(role))


patient → I want to help, but I need a bit more information. Could you share specific details like symptoms, numbers (for example, blood pressure values), duration, or any medical context you already have?
caregiver → To give accurate guidance, please provide more details such as observations, measurements, timeline, or any diagnoses you’re aware of.
student → Please clarify the concept or provide context (definitions, examples, or data) so I can explain it accurately.


In [32]:
def estimate_confidence(query: str) -> str:
    if is_low_information(query):
        return "low"

    length = len(query.split())

    if length < 6:
        return "medium"

    if any(char.isdigit() for char in query):
        return "high"

    return "medium"
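
A few worked inputs make the thresholds concrete: a query needs at least six whitespace-separated words and at least one digit to be rated "high". The sketch below mirrors the rules under the hypothetical name `confidence_sketch`, taking the low-information result as a parameter so the cell runs on its own.

```python
def confidence_sketch(query: str, low_info: bool) -> str:
    """Mirror of the estimate_confidence rules, with the low-info result passed in."""
    if low_info:
        return "low"
    if len(query.split()) < 6:
        return "medium"
    return "high" if any(ch.isdigit() for ch in query) else "medium"


examples = [
    ("help", True),                                            # low-information
    ("my blood pressure is 150/95", False),                    # 5 words, digits
    ("my blood pressure reading today is 150/95", False),      # 7 words, digits
    ("what does a high blood pressure reading mean", False),   # 8 words, no digits
]
labels = [confidence_sketch(q, low_info) for q, low_info in examples]
print(labels)  # ['low', 'medium', 'high', 'medium']
```

Note that the five-word blood pressure query stays at "medium" despite containing a reading, because the length check runs before the digit check.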


In [33]:
CONFIDENCE_LABELS = {
    "high": "Confidence level: High — explanation based on clear information.",
    "medium": "Confidence level: Medium — explanation based on limited context.",
    "low": "Confidence level: Low — more details may be needed for accuracy."
}


In [34]:
RED_FLAG_KEYWORDS = {
    "chest pain",
    "shortness of breath",
    "unconscious",
    "fainting",
    "seizure",
    "severe bleeding",
    "stroke",
    "heart attack",
    "suicidal",
    "kill myself"
}

def contains_red_flag(text: str) -> bool:
    if not text:
        return False
    t = text.lower()
    return any(k in t for k in RED_FLAG_KEYWORDS)


def safety_prefix(intent: str = "patient") -> str:
    prefixes = {
        "patient": (
            "I’m not a doctor, but I can share general medical information to help you understand this better. "
        ),
        "caregiver": (
            "I can provide general medical information to support understanding, but this is not a diagnosis. "
        ),
        "student": (
            "This explanation is for educational purposes and does not replace professional medical training. "
        ),
    }
    return prefixes.get(intent, prefixes["patient"])


def red_flag_response() -> str:
    return (
        "This may be serious. If you or someone else is experiencing these symptoms, "
        "please seek immediate medical attention or contact local emergency services right now."
    )


In [35]:
tests = [
    "chest pain and shortness of breath",
    "blood pressure is 150/95",
    "help"
]

for t in tests:
    print(t, "→ red_flag =", contains_red_flag(t))


chest pain and shortness of breath → red_flag = True
blood pressure is 150/95 → red_flag = False
help → red_flag = False
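
For auditing, it can help to know which keywords fired rather than only whether one did. The sketch below is an illustrative helper under the hypothetical names `RED_FLAGS` and `matched_red_flags`; it uses the same substring matching as `contains_red_flag`, which means a negated phrase such as "no chest pain" still triggers the flag.

```python
from typing import List

# Same keywords as RED_FLAG_KEYWORDS, repeated so this cell runs on its own.
RED_FLAGS = {
    "chest pain", "shortness of breath", "unconscious", "fainting",
    "seizure", "severe bleeding", "stroke", "heart attack",
    "suicidal", "kill myself",
}


def matched_red_flags(text: str) -> List[str]:
    """Return the red-flag keywords found in text, sorted for stable output."""
    t = (text or "").lower()
    return sorted(k for k in RED_FLAGS if k in t)


print(matched_red_flags("chest pain and shortness of breath"))
# ['chest pain', 'shortness of breath']
print(matched_red_flags("I have no chest pain"))
# ['chest pain']
```

Escalating on negated mentions errs on the side of caution, which seems reasonable for an emergency gate, though it will produce some false positives.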


In [36]:
BASIC_EXPLANATIONS = {
    "blood pressure": (
        "Blood pressure measures the force of blood pushing against artery walls. "
        "It is written as two numbers, for example 120/80. "
        "The first number (systolic) measures pressure when the heart beats, "
        "and the second (diastolic) measures pressure when the heart rests."
    )
}

def simple_educational_fallback(query: str):
    q = query.lower()
    for key, text in BASIC_EXPLANATIONS.items():
        if key in q:
            return text
    return None


In [37]:
BASIC_EXPLANATIONS.update({
    "diabetes": (
        "Diabetes is a condition in which the body cannot properly regulate blood sugar levels. "
        "It occurs when the body does not produce enough insulin or cannot use insulin effectively."
    ),

    "systolic": (
        "Systolic pressure is the top number in a blood pressure reading. "
        "It represents the pressure in the arteries when the heart beats."
    ),

    "fever": (
        "Fever is a temporary rise in body temperature, usually caused by an infection. "
        "It is a sign that the body is responding to illness."
    )
})


In [38]:
def detect_intent(query: str) -> str:
    if not query:
        return "patient"

    q = query.lower()

    if any(w in q for w in ["study", "exam", "definition", "explain concept"]):
        return "student"
    if any(w in q for w in ["my father", "my mother", "patient", "care for"]):
        return "caregiver"

    return "patient"
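
The intent rules are order-dependent (student keywords are checked before caregiver keywords) and also use substring matching. A table-driven sketch under the hypothetical names `INTENT_RULES` and `detect_intent_with_reason` makes both properties visible by returning the keyword that decided the outcome.

```python
from typing import Optional, Tuple

# Rules are evaluated top to bottom; the first matching keyword wins.
INTENT_RULES = [
    ("student", ("study", "exam", "definition", "explain concept")),
    ("caregiver", ("my father", "my mother", "patient", "care for")),
]


def detect_intent_with_reason(query: str) -> Tuple[str, Optional[str]]:
    """Return (intent, matched keyword); the default is ('patient', None)."""
    q = (query or "").lower()
    for intent, keywords in INTENT_RULES:
        for keyword in keywords:
            if keyword in q:
                return intent, keyword
    return "patient", None


print(detect_intent_with_reason("explain blood pressure for exam"))
# ('student', 'exam')
print(detect_intent_with_reason("I am a patient with fever"))
# ('caregiver', 'patient')
```

The second example shows a quirk of the keyword list: a user who describes themselves as a patient is routed to the caregiver response.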


In [39]:

def medexplain(query: str) -> str:
    # 1. Detect intent
    intent = detect_intent(query)

    # 2. Red-flag safety check
    if contains_red_flag(query):
        return red_flag_response()

    # 3. Low-information check
    if is_low_information(query):
        return low_info_response(intent)

    # 4. Rule-based educational fallback (no model call needed)
    fallback = simple_educational_fallback(query)
    if fallback:
        return safety_prefix(intent) + fallback

    # 5. Build safe, intent-aware prompt
    prompt = (
        safety_prefix(intent)
        + "\n\n"
        + f"Explain the following in a clear and appropriate way:\n{query}\n"
    )

    # 6. Run model inference and keep only the first line of the decoded text
    raw = run_inference(prompt)
    return raw.split("\n")[0] if raw else raw



In [40]:
def medexplain_with_trace(query: str):
    decision_log = []

    # --- Step 1: Intent detection ---
    intent = detect_intent(query)
    decision_log.append({
        "code": "INTENT_DETECTED",
        "detail": f"User intent classified as '{intent}'"
    })

    # --- Step 2: Confidence estimation ---
    confidence = estimate_confidence(query)
    decision_log.append({
        "code": "CONFIDENCE_ESTIMATED",
        "detail": f"Input confidence assessed as '{confidence}'"
    })

    # --- Step 3: Red-flag safety check ---
    if contains_red_flag(query):
        decision_log.append({
            "code": "RED_FLAG_TRIGGERED",
            "detail": "Emergency symptoms detected; model generation blocked"
        })
        return red_flag_response(), decision_log

    # --- Step 4: Low-information gate ---
    if is_low_information(query):
        decision_log.append({
            "code": "LOW_INFORMATION_BLOCK",
            "detail": "Insufficient medical context; clarification requested"
        })
        return low_info_response(intent), decision_log

    # --- Step 5: Rule-based educational fallback ---
    fallback = simple_educational_fallback(query)
    if fallback:
        decision_log.append({
            "code": "RULE_BASED_FALLBACK",
            "detail": "Answered using predefined educational content"
        })
        answer = (
            CONFIDENCE_LABELS[confidence]
            + "\n"
            + safety_prefix(intent)
            + fallback
        )
        return answer, decision_log

    # --- Step 6: Model selection ---
    # Both gates above return early when triggered, so these are False here;
    # they are recomputed to keep select_model's inputs explicit.
    has_red_flag = contains_red_flag(query)
    low_info = is_low_information(query)

    selected_model = select_model(
        confidence=confidence,
        has_red_flag=has_red_flag,
        is_low_info=low_info
    )

    decision_log.append({
        "code": "MODEL_SELECTED",
        "detail": f"Model routed to {selected_model}"
    })

    # --- Step 7: Controlled generation ---
    prompt = safety_prefix(intent) + f"\nExplain clearly:\n{query}"

    if selected_model == "medical":
        decision_log.append({
            "code": "MODEL_APPROVED_MEDGEMMA",
            "detail": "High-confidence input; medical model allowed"
        })
        # In this notebook the placeholder model stands in for MedGemma,
        # so both branches call the same run_inference.
        raw = run_inference(prompt)
    else:
        decision_log.append({
            "code": "MODEL_FALLBACK_PLACEHOLDER",
            "detail": "Non-medical or low-risk case; placeholder used"
        })
        raw = run_inference(prompt)

    answer = CONFIDENCE_LABELS[confidence] + "\n\n" + raw
    return answer, decision_log


In [41]:
!pip install -q reportlab


In [42]:
import textwrap

from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

def export_to_pdf(filename, query, answer, decision_log=None):
    c = canvas.Canvas(filename, pagesize=letter)
    width, height = letter

    y = height - 50
    c.setFont("Helvetica", 10)

    # Title
    c.drawString(50, y, "MedExplain Output")
    y -= 30

    # Query
    c.drawString(50, y, f"Query: {query}")
    y -= 20

    # Answer: wrap long lines at 90 characters instead of truncating them
    for line in answer.split("\n"):
        # textwrap.wrap returns [] for an empty line, so keep one blank row
        for chunk in textwrap.wrap(line, width=90) or [""]:
            c.drawString(50, y, chunk)
            y -= 14

            # Start a new page when the cursor nears the bottom margin
            if y < 80:
                c.showPage()
                c.setFont("Helvetica", 10)
                y = height - 50

    # Decision Trace (structured audit log)
    if decision_log:
        y -= 20
        c.drawString(50, y, "Decision Trace:")
        y -= 20

        for step in decision_log:
            c.drawString(
                60,
                y,
                f"- [{step['code']}] {step['detail']}"
            )
            y -= 14

            if y < 80:
                c.showPage()
                c.setFont("Helvetica", 10)
                y = height - 50

    c.save()
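
Alongside the PDF, a machine-readable copy of the same record may be useful for automated review. The sketch below is an optional addition, not part of the original pipeline: `trace_to_json` is a hypothetical helper that serializes the query, answer, and decision log with the standard `json` module.

```python
import json


def trace_to_json(query, answer, decision_log=None):
    """Serialize one MedExplain result as a JSON string for audit storage."""
    record = {
        "query": query,
        "answer": answer,
        "decision_log": list(decision_log or []),
    }
    # ensure_ascii=False keeps typographic quotes and similar characters readable.
    return json.dumps(record, indent=2, ensure_ascii=False)


example = trace_to_json(
    "help",
    "I want to help, but I need a bit more information.",
    [{
        "code": "LOW_INFORMATION_BLOCK",
        "detail": "Insufficient medical context; clarification requested",
    }],
)
print(example)
```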


In [43]:
answer, reasoning = medexplain_with_trace("help")

print("Answer:")
print(answer)
print("\nDecision Trace:")
for step in reasoning:
    print("-", step)


Answer:
I want to help, but I need a bit more information. Could you share specific details like symptoms, numbers (for example, blood pressure values), duration, or any medical context you already have?

Decision Trace:
- {'code': 'INTENT_DETECTED', 'detail': "User intent classified as 'patient'"}
- {'code': 'CONFIDENCE_ESTIMATED', 'detail': "Input confidence assessed as 'low'"}
- {'code': 'LOW_INFORMATION_BLOCK', 'detail': 'Insufficient medical context; clarification requested'}


In [48]:
test_queries = [
    "What is diabetes?",
    "I feel sick",
    "Severe chest pain and I can't breathe",
    "Explain binary search",
    "Do I have cancer?",
    "What medicine should I take for fever?",
    "help",
    "My father has high blood pressure, what does it mean?",
    "Explain the pathophysiology of asthma for exams",
    "asdfghjkl"
]


In [49]:
# ===== Governance Metrics (Judge-Facing) =====

from collections import Counter

def compute_governance_metrics(test_cases):
    metrics = Counter()

    for q in test_cases:
        _, decision_log = medexplain_with_trace(q)

        for step in decision_log:
            code = step["code"]
            metrics[code] += 1

    print("=== MedExplain Governance Metrics ===\n")

    print("Total queries evaluated:", len(test_cases))
    print()

    print("Safety & Control Metrics:")
    print("• Red-flag escalations:", metrics["RED_FLAG_TRIGGERED"])
    print("• Low-information blocks:", metrics["LOW_INFORMATION_BLOCK"])
    print()

    print("Model Usage Metrics:")
    print("• MedGemma approved:", metrics["MODEL_APPROVED_MEDGEMMA"])
    print("• Placeholder fallback:", metrics["MODEL_FALLBACK_PLACEHOLDER"])
    print()

    print("Decision Transparency:")
    print("• Intent detected:", metrics["INTENT_DETECTED"])
    print("• Confidence estimated:", metrics["CONFIDENCE_ESTIMATED"])
    print()

    print("Raw metric counts:", dict(metrics))


compute_governance_metrics(test_queries)



=== MedExplain Governance Metrics ===

Total queries evaluated: 10

Safety & Control Metrics:
• Red-flag escalations: 1
• Low-information blocks: 4

Model Usage Metrics:
• MedGemma approved: 0
• Placeholder fallback: 2

Decision Transparency:
• Intent detected: 10
• Confidence estimated: 10

Raw metric counts: {'INTENT_DETECTED': 10, 'CONFIDENCE_ESTIMATED': 10, 'RULE_BASED_FALLBACK': 3, 'MODEL_SELECTED': 2, 'MODEL_FALLBACK_PLACEHOLDER': 2, 'RED_FLAG_TRIGGERED': 1, 'LOW_INFORMATION_BLOCK': 4}
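
Raw counts are easier to compare across test sets when expressed as rates. The sketch below uses a hypothetical helper, `governance_rates`, and feeds it the four terminal outcome counts from the recorded output above (10 queries in total).

```python
from collections import Counter


def governance_rates(counts, total):
    """Convert per-code counts into fractions of the total number of queries."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {code: n / total for code, n in counts.items()}


# Terminal outcomes copied from the recorded run above.
outcome_counts = Counter({
    "RED_FLAG_TRIGGERED": 1,
    "LOW_INFORMATION_BLOCK": 4,
    "RULE_BASED_FALLBACK": 3,
    "MODEL_SELECTED": 2,
})

rates = governance_rates(outcome_counts, total=10)
for code, rate in rates.items():
    print(f"{code}: {rate:.0%}")
```

Each query ends in exactly one of these four outcomes, so the rates sum to 1; in this run only 20% of queries reached model generation.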


In [50]:
# ===== Evaluation & Demonstration Harness =====

TEST_CASES = [
    "help",
    "my blood pressure is 150/95",
    "chest pain and sweating",
    "explain blood pressure for exam",
]

def run_evaluation_suite():
    print("=== MedExplain Evaluation Suite ===\n")

    for q in TEST_CASES:
        answer, decision_log = medexplain_with_trace(q)

        print(f"Q: {q}")
        print("A:", answer)
        print("Decision Trace:")

        for step in decision_log:
            print(" •", step)

        print("-" * 60)

run_evaluation_suite()


=== MedExplain Evaluation Suite ===

Q: help
A: I want to help, but I need a bit more information. Could you share specific details like symptoms, numbers (for example, blood pressure values), duration, or any medical context you already have?
Decision Trace:
 • {'code': 'INTENT_DETECTED', 'detail': "User intent classified as 'patient'"}
 • {'code': 'CONFIDENCE_ESTIMATED', 'detail': "Input confidence assessed as 'low'"}
 • {'code': 'LOW_INFORMATION_BLOCK', 'detail': 'Insufficient medical context; clarification requested'}
------------------------------------------------------------
Q: my blood pressure is 150/95
A: Confidence level: Medium — explanation based on limited context.
I’m not a doctor, but I can share general medical information to help you understand this better. Blood pressure measures the force of blood pushing against artery walls. It is written as two numbers, for example 120/80. The first number (systolic) measures pressure when the heart beats, and the second (diastol

In [51]:
queries = [
    "help",
    "my blood pressure is 150/95",
    "chest pain and sweating",
    "explain blood pressure for exam",
    "what is systolic pressure"
]

for q in queries:
    print("Q:", q)
    print("A:", medexplain(q))
    print("-" * 40)


Q: help
A: I want to help, but I need a bit more information. Could you share specific details like symptoms, numbers (for example, blood pressure values), duration, or any medical context you already have?
----------------------------------------
Q: my blood pressure is 150/95
A: I’m not a doctor, but I can share general medical information to help you understand this better. Blood pressure measures the force of blood pushing against artery walls. It is written as two numbers, for example 120/80. The first number (systolic) measures pressure when the heart beats, and the second (diastolic) measures pressure when the heart rests.
----------------------------------------
Q: chest pain and sweating
A: This may be serious. If you or someone else is experiencing these symptoms, please seek immediate medical attention or contact local emergency services right now.
----------------------------------------
Q: explain blood pressure for exam
A: Please clarify the concept or provide context (de

In [52]:
answer, trace = medexplain_with_trace("my blood pressure is 150/95")

export_to_pdf(
    "medexplain_output.pdf",
    "my blood pressure is 150/95",
    answer,
    trace
)

print("PDF generated: medexplain_output.pdf")


PDF generated: medexplain_output.pdf


In [53]:
test_queries = [
    "What is diabetes?",
    "my blood pressure is 150/95",
    "what is systolic pressure",
    "I feel sick",
    "chest pain and sweating"
]

for q in test_queries:
    ans, trace = medexplain_with_trace(q)
    print("Q:", q)
    print("A:", ans)
    print("-" * 40)


Q: What is diabetes?
A: Confidence level: Medium — explanation based on limited context.
I’m not a doctor, but I can share general medical information to help you understand this better. Diabetes is a condition in which the body cannot properly regulate blood sugar levels. It occurs when the body does not produce enough insulin or cannot use insulin effectively.
----------------------------------------
Q: my blood pressure is 150/95
A: Confidence level: Medium — explanation based on limited context.
I’m not a doctor, but I can share general medical information to help you understand this better. Blood pressure measures the force of blood pushing against artery walls. It is written as two numbers, for example 120/80. The first number (systolic) measures pressure when the heart beats, and the second (diastolic) measures pressure when the heart rests.
----------------------------------------
Q: what is systolic pressure
A: Confidence level: Medium — explanation based on limited context.
I

# --- Demo / Evaluation ---
