Medical transfer test — FabricationGuard on MedQA + PubMedQA + HaluEval-medical

**Motivation**

DeepMind shipped AI co-clinician (https://deepmind.google/blog/ai-co-clinician/) on 2026-04-30 with a dual-agent safety architecture in which a Planner monitors a Talker. The Planner plays the same role as an activation-level probe: it gates an agent based on a monitored signal. The DeepMind work is closed-weight and research-only.

FabricationGuard L31 (https://huggingface.co/openinterp/fabricationguard-qwen36-27b-l31-v2) is the open analogue at the residual-stream level. It was trained on HaluEval and reaches AUROC 0.88 when tested cross-task on SimpleQA. Its transfer to medical data has not been measured.

**Why this matters**

- CZI Biohub committed $500M to AI-biology on 2026-04-29
- EU AI Act Article 14 enforcement on high-risk medical AI starts Aug 2026
- DeepMind's post admits AI underperforms physicians at red-flag identification, which is rare-class detection under distribution shift and a paradigmatic linear-probe use case
- /probebench/applications/medical-ai page on the site needs measured numbers, not just methodology

**Acceptance criteria**

Run FabricationGuard L31 (no recalibration, then with recalibration) on:
- MedQA (USMLE multiple choice) — adapt fabrication detection to MC by scoring residual at end-of-question
- PubMedQA (yes/no/maybe with biomedical abstract) — open-ended factual
- HaluEval-medical subset if available, else HaluEval-QA filtered for medical entities
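
The issue does not pin down what "recalibration" means. One reading, sketched below purely as an assumption, is Platt scaling: keep the probe direction frozen and fit a slope and offset on a small labeled medical split. The names `probe_logit`, `fit_platt`, and `log_loss` are illustrative and not part of the FabricationGuard release, and the probe is assumed to be a linear readout (weights plus bias) over the end-of-question residual vector.

```python
import math
from typing import Sequence, Tuple


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def probe_logit(residual: Sequence[float], weights: Sequence[float], bias: float) -> float:
    # Assumed linear probe: dot product of the residual vector with the probe weights.
    if len(residual) != len(weights):
        raise ValueError("residual and weights must have the same length")
    return sum(h * w for h, w in zip(residual, weights)) + bias


def log_loss(logits: Sequence[float], labels: Sequence[int], a: float, b: float) -> float:
    # Mean binary cross-entropy of sigmoid(a * logit + b) against 0/1 labels.
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(a * z + b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def fit_platt(logits: Sequence[float], labels: Sequence[int],
              lr: float = 0.1, steps: int = 2000) -> Tuple[float, float]:
    # Fit slope a and offset b by plain gradient descent on the log loss.
    # Starting at (1, 0) means "no recalibration" is the initial point.
    a, b = 1.0, 0.0
    n = len(logits)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for z, y in zip(logits, labels):
            err = sigmoid(a * z + b) - y
            grad_a += err * z
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b
```

Under this reading, recalibration with a positive slope is a monotone transform of the probe score, so it moves thresholds and confusion counts but leaves AUROC unchanged. If recalibration is instead meant to refit the probe direction on medical data, AUROC can change and the two runs should be reported as separate probes.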

For each dataset, report AUROC, the per-class confusion matrix, N, and a threshold sweep. Compare against the baseline (Qwen3.6 logprob).
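
A minimal sketch of the per-dataset report, assuming binary labels (1 = fabricated or incorrect answer, 0 = faithful) and one score per prompt where higher means more likely fabricated. The function names are illustrative. The same functions can be applied to the logprob baseline, provided its scores are oriented the same way (for example, negated answer logprob).

```python
from typing import Dict, List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    # Probability that a random positive outscores a random negative; ties count 0.5.
    # Quadratic in N, which is fine for eval sets of a few thousand prompts.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion(labels: Sequence[int], scores: Sequence[float], threshold: float) -> Dict[str, int]:
    # A prompt is flagged as fabricated when its score is at or above the threshold.
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            counts["tp"] += 1
        elif flagged and y == 0:
            counts["fp"] += 1
        elif not flagged and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def threshold_sweep(labels: Sequence[int], scores: Sequence[float],
                    thresholds: Sequence[float]) -> List[Dict[str, float]]:
    # One row per threshold with confusion counts plus true/false positive rates.
    rows = []
    for t in thresholds:
        c = confusion(labels, scores, t)
        n_pos = c["tp"] + c["fn"]
        n_neg = c["tn"] + c["fp"]
        rows.append({
            "threshold": t,
            "tpr": c["tp"] / n_pos if n_pos else 0.0,
            "fpr": c["fp"] / n_neg if n_neg else 0.0,
            **c,
        })
    return rows
```

Reporting N alongside each row matters here because the positive class may be small on the medical sets, and AUROC estimates from few positives are noisy.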

Compute: 1 Colab Pro session, ~2-3h. Single Qwen3.6-27B forward pass per prompt.

**Output**

When eval completes, add a new Atlas entry (type: probe-result) at https://github.com/OpenInterpretability/registry/atlas/2026/ documenting the measured transfer numbers. Then update /probebench/applications/medical-ai with the real cross-domain AUROC table.

If AUROC drops below 0.65 on medical data, publish it as an honest-negative finding (atlas-entry type, "cross-domain hallucination probes don't generalize to medical without recalibration"). That result is still high value for the field.

**Links to related work**

- DeepMind post: https://deepmind.google/blog/ai-co-clinician/
- FabricationGuard probe: https://huggingface.co/openinterp/fabricationguard-qwen36-27b-l31-v2
- ProbeBench medical-ai page: https://openinterp.org/probebench/applications/medical-ai
- agent-probe-guard (related, capability+thinking gating): https://huggingface.co/datasets/caiovicentino1/agent-probe-guard-qwen36-27b
