Skip to content

anix-lynch/baymax

Repository files navigation

🚑 B A Y M A X

An evidence-to-action ER nurse assistant simulation

Baymax reads the hidden history behind one sentence, traces the trajectory, brakes on the kidney, then speaks two ways — clinically to the doctor, plainly to the daughter

🔗 Live: https://baymax-bice.vercel.app

Patient said: “My mother’s legs are swollen.” Baymax retrieved the heart-failure precedent nobody mentioned.

Baymax is not impressive because it has more data. It is impressive when another perspective changes what it decides to do.

         👁 Patient world      👁 Drug world
          "What is similar?"   "What is hidden?"
                \              /
                 └──🧠 Brain──┘
        👃 Nose ──┤ changes action when perspective changes
                   ↓
              🛑 Brakes → ⚡ Nerves → 🤝 Hands → 🛡 Verify

Watch Him Think

make sync
make audit
make ui

Open http://localhost:8000/ui/. The live surface reads the generated audit receipt, narrates what Baymax noticed, admits uncertainty, shows what acted or stopped, and only claims verification when the receipt proves it.

Four Golden Cases

CASE 1 · RETRIEVAL DISCOVERY
"My mother's legs are swollen"
→ raw words miss the target
→ evidence-language query retrieves a CHF precedent
→ Baymax surfaces unmentioned evidence before answering
→ next question: short of breath at rest or when lying flat?

CASE 2 · ATTENTION FLIP
routine medication refill
→ served NOSE assigns ESI-attention tier 4
→ expensive eyes stay closed
→ no action attempted

CASE 3 · DECISION FLIP
same critical chest-pain patient
→ bed available: assign_bed
→ ER gridlock: divert
→ capacity perspective changes the action

CASE 4 · BRAKE SAVE
abdominal pain after Ibuprofen
→ patient-only: discharge_plan
→ drug eye: 16/17 exact-Ibuprofen reports marked serious
→ population signal only; no causality claim
→ brake blocks autonomous discharge
→ clinician-review nerve receives and ACKs the case

That is the recruiter path. Four cases show that Baymax discovers hidden evidence, allocates attention, changes decisions when another perspective matters, and stops itself before an unsafe autonomous action.

Technical Audit

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
make test
make audit

The final receipt is outputs/baymax_audit.json. It contains the discovery receipt and action trajectories:

  • attention skip: NOSE stops a routine case before expensive evidence work
  • cross-domain brake: another perspective changes action into human review
  • bounded action: durable state changes and the outcome is independently re-read

It also records exact source commits, organ grades, the solo-engineer A+ ceiling, and the remaining shortest path.

Simulated Deployment Readiness

Baymax includes a deliberately thin FDE-adjacent rehearsal package:

make readiness

It verifies five acceptance gates against the live audit receipt, then injects a false-success receipt and proves the release contract detects it and requires rollback. The result is eligible only for simulated shadow, never presented as hospital deployment or adoption.

Honesty Ledger

outputs/baymax_audit.json ends with a honesty_ledger: every organ maps to one capability lane, may only claim the one sentence its live proof earns, and Baymax ships at the strength of its weakest load-bearing organ — never above it.

Organ Capability lane Verdict gate
NOSE signal-routing served patient signal; ≥95% serious recall gate
Left eye data-truth 55,500 rows scanned this run
Right eye evidence-retrieval 5,000 openFDA reports scanned; population signal, not causality
Brain / Brakes / Hands action-engine decision change, autonomous block, and durable outcome re-read
Nerves clinical-handoff receiver ACK proven; cross-service recovery not yet
Mouth explanation states the FAERS causality boundary

A live gate that fails drops the organ to ❌, and ❌ leaves are never claimed. The headline verdict is the minimum verdict across load-bearing organs, pinned in CI by tests/test_audit.py::test_honesty_ledger_gates_headline_at_weakest_load_bearing_leaf.

Organ Sources

Organ Public source What this repo audits
NOSE healthcare-genai-engineer + healthcare-signal-platform Served patient attention signal plus a separate five-signal openFDA batch proof
Left eye healthcare-ai-data-engineer Full 55,500-row synthetic encounter corpus and reconciliation
Right eye healthcare-da + healthcare-signal-platform Real openFDA evidence plus governed semantic-layer lineage
Brain + hands healthcare-genai-engineer Triage, Bed Ops decision, durable action, ACK, and outcome verification

Honest Boundary

Baymax is a closed-loop ER simulation, not a deployed clinical system. The action engine changes durable SQLite state representing a Bed Ops disposition; it does not write to a hospital EHR or bed-management platform.

The per-case NOSE is deliberately cheap: a versioned deterministic ESI attention signal decides whether to open both eyes. On 497 labelled synthetic patient cases it preserves 95.39% serious-case recall while skipping 4.83% of expensive paths. The five statistical/ML/retrieval signals are separately batch-evaluated on 5,000 real openFDA reports; this repo does not pretend those models score patient queries online.

Repo Map

baymax/
├── baymax/demo.py              ✅ plays the three memorable cases
├── baymax/audit.py             ✅ generates deep evidence receipts
├── baymax/served_nose.py       ✅ serves and evaluates pre-eye attention
├── deployment-readiness/       ✅ simulated acceptance, rollback, and handoff
├── cases/legendary_cases.json  ✅ the public three-case screenplay
├── docs/baymax/constitution/   📖 versioned architecture health records
├── tests/test_audit.py         ✅ pins every movie outcome in CI
├── scripts/sync_sources.sh     ✅ fetches the four public sibling sources
├── outputs/                    🟡 audit + simulated rollout receipts
├── .github/workflows/audit.yml ✅ regenerates proof on every change
├── Makefile                    ✅ one-command audit entry points
├── SPEC.md                     📖 scope and proof boundary
└── README.md                   📖 recruiter audit map
  • retrieval discovery: symptom-language expansion retrieves an unmentioned CHF precedent

About

Closed-loop agent showroom proving attention routing, cross-domain decision changes, safety brakes, durable actions, outcome verification, and simulated rollout controls.

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors