An open-source ambient-voice-technology (AVT) reference implementation that is conformant-by-construction — it measures itself against the AVT Metrics Taxonomy (the model of experts) and reports which UK health-tech regulatory gates it would face.
⛔ NOT FOR CLINICAL DEPLOYMENT. A clinical ambient-scribe that writes to the record is a medical device. Deploying it triggers MHRA registration, UKCA, classification, DCB0129 + DCB0160, DPIA and post-market surveillance — see
COMPLIANCE.md. This repo runs on synthetic data only and exists to demonstrate the measurement × regulation method, not to be used in care.
Most AI scribes ship the pipeline and stop. This ships the pipeline with its own assurance harness and a generated compliance map, a combination that is rarely packaged together:
- Pipeline: capture → transcribe (OpenAI) → summarise into a structured note → FHIR-shaped write-back. (Architecture echoes i-dot-ai/minute.)
- Assurance harness: the taxonomy's metrics as an executable test suite scoring the pipeline's own output (WER, hallucination, omission, negation, write-back fidelity). The computable metrics run; the rest are honestly marked "declared — not automated here".
- Compliance manifest: COMPLIANCE.md, generated from the regulatory twin (device class → gates → which metric evidences each).
The taxonomy is the spec; the harness is the acceptance test; the twin says whether it would be legal. Build to pass the harness.
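
Of the computable metrics listed above, WER is the most mechanical. The sketch below shows the standard definition (word-level edit distance divided by the number of reference words); it is not the harness's actual code. The function name and the lower-casing, whitespace-splitting normalisation are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalisation: lower-case and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words (rolling-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the patient denies chest pain", "the patient has chest pain")` returns 0.2: one substitution in five words. That single word flips a negation, so a low WER on its own says little about clinical meaning.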
uv sync # or: pip install -e .
cp .env.example .env # add your OPENAI_API_KEY
avt compliance # generate COMPLIANCE.md (no API key needed)
avt synth # make synthetic consultation audio (OpenAI TTS)
avt demo # ASR → note → FHIR → assurance scorecard

avt demo runs the full loop on synthetic/encounter_01/ (a fictional GP consultation) and
prints a scorecard: each computable metric's score, and how many of the taxonomy's metrics this
reference automates, by tier.
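
The "automated, by tier" part of the scorecard amounts to a grouped count. The sketch below is one way that tally could work, not the scorer's real implementation: the `Metric` record, the tier labels and the example entries are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    tier: str        # hypothetical tier label, not taken from the taxonomy
    automated: bool  # False means declared but not automated in this reference


# Placeholder entries for illustration only.
EXAMPLE_METRICS = [
    Metric("WER", "tier-a", True),
    Metric("hallucination", "tier-a", True),
    Metric("example declared metric", "tier-b", False),
]


def coverage_by_tier(metrics):
    """Return {tier: (automated_count, total_count)}, sorted by tier label."""
    total = Counter(m.tier for m in metrics)
    automated = Counter(m.tier for m in metrics if m.automated)
    # Counter returns 0 for tiers with no automated metrics.
    return {tier: (automated[tier], total[tier]) for tier in sorted(total)}


def format_coverage(metrics):
    """Render one line per tier, e.g. 'tier-a: 2/2 metrics automated'."""
    return "\n".join(
        f"{tier}: {done}/{total} metrics automated"
        for tier, (done, total) in coverage_by_tier(metrics).items()
    )
```

Keeping the declared-but-unautomated metrics in the denominator is the point of the design: the scorecard then shows what is not measured as well as what is.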
avt_reference/pipeline/     transcribe (OpenAI) · summarise · FHIR
avt_reference/harness/      computable metrics + scorer (joins to the taxonomy)
avt_reference/compliance    COMPLIANCE.md generator
twin_export/                snapshot of the taxonomy corpus + gate map (from the twin)
synthetic/                  fictional encounter fixtures (no PII, ever)
The reference pipeline summarises into a structured note and FHIR-shaped write-back, but it does
not link concepts to a clinical terminology. A companion live demo (in the regulatory-twin
app, gated) adds a concept-grounding layer on top of the same pipeline: it pulls the note into
discrete clinical concepts (condition | medication | symptom | procedure | allergy | finding),
tags negation, attaches candidate SNOMED CT / dm+d terms, and runs a grounding pass that
checks each concept back against the source transcript — surfacing a Concept grounding rate
metric (joins the taxonomy's Write-back Fidelity + Clinical Keyword clusters).
⚠️ Those SNOMED / dm+d terms are LLM-suggested candidates, not terminology-server lookups — illustrative only. The NHS production path is a SNOMED-linked NLP pipeline: MedCAT / CogStack (or Amazon Comprehend Medical / Azure Text Analytics for Health). An LLM guessing a code is not the same as resolving one.
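
To make the Concept grounding rate concrete, the sketch below computes the share of extracted concepts whose text can be found in the source transcript. It is a simplification under stated assumptions: the `Concept` record and function names are hypothetical, and plain whole-word string matching stands in for whatever grounding pass the live demo actually runs.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    text: str
    kind: str             # condition | medication | symptom | procedure | allergy | finding
    negated: bool = False


def _normalise(text: str) -> str:
    """Lower-case and reduce to space-separated alphanumeric tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def is_grounded(concept: Concept, transcript: str) -> bool:
    """Assumed rule: the concept's tokens appear as a whole phrase in the transcript."""
    needle = _normalise(concept.text)
    # Pad both sides with spaces so only whole-word matches count.
    return bool(needle) and f" {needle} " in f" {_normalise(transcript)} "


def concept_grounding_rate(concepts, transcript: str) -> float:
    """Fraction of concepts that can be traced back to the transcript."""
    if not concepts:
        raise ValueError("grounding rate is undefined for an empty concept list")
    grounded = sum(is_grounded(c, transcript) for c in concepts)
    return grounded / len(concepts)
```

With the transcript "I've had a cough for three days. No chest pain. I take amlodipine." and the concepts cough, chest pain (negated), amlodipine and asthma, the rate is 0.75: asthma never appears in the source. Exact matching misses paraphrases and synonyms, so a real grounding pass would need something more tolerant.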
- Synthetic only. Real clinical audio is governance-gated; a real-data, deployable version is a separate, regulated programme.
- The LLM-judge metrics (hallucination/omission/negation) are themselves taxonomy-flagged as unvalidated — treat the numbers as indicative, not ground truth.
- The compliance map is descriptive, generated from a model — not legal advice or a conformity claim.
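
To show what "generated from a model" can look like, here is a minimal sketch that renders a gate-to-metric mapping as a Markdown table. It is not the COMPLIANCE.md generator itself. The function name, the output layout and the example pairings in the usage note are assumptions; the real mapping comes from the twin export.

```python
def render_manifest(device_class: str, gates: dict[str, list[str]]) -> str:
    """Render device class plus a gate -> evidencing-metrics table as Markdown.

    `gates` maps a regulatory gate name to the metric names said to evidence it.
    The table layout is an assumed format, not the one the real generator uses.
    """
    lines = [
        f"Device class: {device_class}",
        "",
        "| Gate | Evidencing metrics |",
        "| --- | --- |",
    ]
    for gate, metrics in gates.items():
        # Gates without any mapped metric are shown explicitly rather than dropped.
        evidence = ", ".join(metrics) if metrics else "none mapped"
        lines.append(f"| {gate} | {evidence} |")
    return "\n".join(lines) + "\n"
```

Calling `render_manifest("example-class", {"DCB0129": ["hallucination", "omission"], "DPIA": []})` produces a two-row table. Those pairings are placeholders invented for this example, not regulatory statements. Listing gates that have no mapped metric keeps the output descriptive: it shows where evidence is missing instead of implying conformity.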
Code MIT (see LICENSE). Builds on the AVT Metrics Taxonomy (Dan Schofield, CC BY 4.0) and the
i-dot-ai/minute architecture — see NOTICE.