Merged
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -72,7 +72,7 @@ So reviewers can tell the change was actually verified:
- **Never** paste client secrets, admin tokens, or other credentials.
- If you cannot run integration tests (no broker, blocked network), say so **explicitly** in the PR and describe what you did verify. Maintainers may still ask for a re-run or a broker-backed check before merge.

Demo work under [`demo/`](demo/) should follow the same rule: run against a real broker and describe how you tested.
Demo work under [`demo/`](demo/README.md) (MedAssist) or [`demo2/`](demo2/README.md) (Support Tickets) should follow the same rule: run against a real broker and describe how you tested.

## Pull requests

327 changes: 95 additions & 232 deletions README.md

Large diffs are not rendered by default.

144 changes: 144 additions & 0 deletions demo/README.md
@@ -0,0 +1,144 @@
<h1 align="center">MedAssist AI — the healthcare walkthrough</h1>

<p align="center">
A working FastAPI app that shows every AgentWrit capability against a live broker —<br>
dynamic agents, per-patient scope isolation, cross-patient denial, delegation, renewal, release, and a tamper-evident audit trail.
</p>

<p align="center">
<a href="#what-it-is">What it is</a> ·
<a href="#why-it-exists">Why it exists</a> ·
<a href="#what-youll-see">What you'll see</a> ·
<a href="#run-it">Run it</a> ·
<a href="#how-it-works">How it works</a> ·
<a href="#where-the-code-lives">Code map</a> ·
<a href="#further-reading">More</a>
</p>

---

## What it is

MedAssist AI is a small clinical-assistant app. You type a patient ID and a plain-language question. An LLM decides which tools to call (records, labs, billing, prescriptions). The app spawns broker-backed agents on demand, each scoped to **one patient and one category of work**, and every step shows up in a live execution trace — scope checks, denials, delegations, renewals, release.

If you've ever wondered *"what does short-lived, task-scoped, per-user credentialing actually look like in a real app?"* — this is that app.

## Why it exists

Reading about ephemeral credentials is one thing. Watching three agents get spawned, one of them get denied mid-request because it asked about the wrong patient, and then seeing the whole chain die when the encounter ends — that's what makes the pattern stick.

We built MedAssist AI because:

- **Beginners need a story.** "Scoped JWTs" is abstract. "The clinical agent can only read Patient 1042's records, and when it tries Patient 2187 the broker says no" is concrete.
- **Reviewers need evidence.** The audit tab shows a hash-chained ledger of every broker event, which is what a security reviewer wants to see before approving production use.
- **Contributors need a reference.** Every SDK feature — `create_agent`, `validate`, `delegate`, `renew`, `release`, `scope_is_subset` — is wired in here, used the way it's meant to be used.

## What you'll see

| Capability | What the demo does |
|-----------|--------------------|
| **Dynamic agent creation** | Agents spawn as the LLM picks tools. No pre-allocated pool. |
| **Per-patient scope isolation** | Each agent's scope contains one patient ID and nothing else. |
| **Cross-patient denial** | Ask about another patient mid-encounter. The scope check fails. The trace shows `scope_denied`. |
| **Delegation with attenuation** | The clinical agent delegates `write:prescriptions:{patient}` to the prescription agent. The broker refuses to widen. |
| **Token lifecycle** | `renew()` issues a fresh token under the same SPIFFE identity. `release()` kills the token immediately. |
| **Audit trail** | A dedicated tab shows every broker event in a hash chain, so any retroactive edit to an earlier event is detectable. |

The trace panel in the UI is the point. Every capability surfaces as a line in the trace so you can read the whole story of one request.
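
To make the audit-trail row concrete, here is a minimal sketch of how a hash-chained ledger detects tampering. It is not the broker's implementation: the record layout, the all-zero genesis value, the SHA-256 choice, and every event name except `scope_denied` are assumptions made for illustration.

```python
import hashlib
import json

# Assumed convention: the first entry chains from an all-zero "previous" hash.
GENESIS = "0" * 64


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(ledger: list, event: dict) -> None:
    """Append an event, linking it to the hash of the entry before it."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"event": event, "prev": prev, "hash": entry_hash(prev, event)})


def verify(ledger: list) -> bool:
    """Recompute every link; an edit to any earlier event breaks the chain."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


# Hypothetical events and scope strings, loosely modeled on the demo's trace.
ledger = []
append_event(ledger, {"type": "agent_created", "scope": ["read:records:1042"]})
append_event(ledger, {"type": "scope_denied", "requested": "read:records:2187"})
append_event(ledger, {"type": "token_released"})
```

Calling `verify(ledger)` on the untouched ledger returns `True`. Rewriting the `scope_denied` entry after the fact makes it return `False`, because the stored hash no longer matches the recomputed one.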

## Run it

### Option A — Docker (recommended)

One command, no Python setup:

```bash
AGENTWRIT_ADMIN_SECRET="your-secret" \
LLM_API_KEY="your-llm-key" \
docker compose up -d broker medassist
```

Open [http://localhost:5000](http://localhost:5000). The demo auto-registers itself with the broker on startup.

You need an OpenAI-compatible LLM endpoint. If you're not using OpenAI, set `LLM_BASE_URL` and `LLM_MODEL` in your shell before `docker compose up` — e.g. a local vLLM or llama.cpp server.

### Option B — From source

For when you want to edit the code:

```bash
# 1. Start the broker
docker compose up -d broker

# 2. Register the demo app (one time — writes client_id/client_secret)
export AGENTWRIT_ADMIN_SECRET="your-admin-secret"
uv run python demo/setup.py

# 3. Configure demo/.env
cp demo/.env.example demo/.env
# then fill in AGENTWRIT_CLIENT_ID, AGENTWRIT_CLIENT_SECRET, LLM_BASE_URL, LLM_API_KEY, LLM_MODEL

# 4. Run it
uv run uvicorn demo.app:app --reload --port 5000
```

### What to try first

1. Pick a patient from the dropdown.
2. Ask something simple: *"What are this patient's recent labs?"* Watch agents spawn, watch each tool check scope, watch the final response render.
3. Ask a crossing question: *"And show me Patient 2187's records too."* Watch the scope check fail. Read the `scope_denied` line in the trace.
4. Open the Audit tab. Every event is there, hash-chained.

## How it works

The demo is built on one rule: **the app never trusts the LLM for security.** The LLM picks tools. The broker decides what credentials exist. The app enforces tool access against those credentials with `scope_is_subset()` before every call.

```
User types a request
        ↓
FastAPI receives it
        ↓
LLM chooses a tool (records / labs / billing / prescription)
        ↓
App asks: "Do I have an agent for this category yet?"
    ↓ no                        ↓ yes
Broker creates one,         Reuse it
scoped to this patient
        ↓
App checks: scope_is_subset(tool-requires, agent-holds)?
    ↓ yes                       ↓ no
Run the tool                Emit scope_denied, tell LLM "access denied"
        ↓
Return result to LLM. Repeat until LLM is done.
        ↓
App releases every agent. Tokens are dead.
```

Every branch of this flow appears in the execution trace. The trace is the documentation.
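
The scope check in the middle of that flow can be sketched as below. This is a stand-in, not the SDK's `scope_is_subset()`, whose exact signature and matching rules are not shown here. It assumes scopes are colon-separated `action:resource:identifier` strings where `*` in a held scope matches any value in that position; the helper names and patient scopes are hypothetical.

```python
def scope_covers(held: str, required: str) -> bool:
    """True if one held scope covers one required scope, segment by segment."""
    held_parts = held.split(":")
    required_parts = required.split(":")
    if len(held_parts) != len(required_parts):
        return False
    # A "*" in the held scope matches anything; otherwise segments must be equal.
    return all(h == "*" or h == r for h, r in zip(held_parts, required_parts))


def is_subset(required: list, held: list) -> bool:
    """True if every required scope is covered by at least one held scope."""
    return all(any(scope_covers(h, r) for h in held) for r in required)


# Hypothetical scopes for an agent bound to patient 1042.
agent_scope = ["read:records:1042", "read:labs:1042"]

in_scope = is_subset(["read:labs:1042"], agent_scope)
cross_patient = is_subset(["read:records:2187"], agent_scope)
# Attenuation: a per-patient scope cannot be widened into a wildcard.
widened = is_subset(["write:prescriptions:*"], ["write:prescriptions:1042"])
```

Here `in_scope` is `True`, while `cross_patient` and `widened` are both `False`. Those are the two outcomes the trace reports as a denial and as a refused delegation.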

For the full walkthrough — sequence diagrams, how delegation flows from the clinical agent to the prescription agent, and what each UI panel shows — read the [Beginner's Guide](BEGINNERS_GUIDE.md). For a scripted live presentation, read the [Presenter's Guide](PRESENTERS_GUIDE.md).

## Where the code lives

| Piece | File |
|-------|------|
| FastAPI entry point | [`app.py`](app.py) |
| Env config (broker + LLM) | [`config.py`](config.py) |
| Main API loop (LLM, agent spawning, trace) | [`routes/api.py`](routes/api.py) |
| Page routes (encounter, audit, operator) | [`routes/pages.py`](routes/pages.py) |
| Tool definitions + scope templates | [`pipeline/tools.py`](pipeline/tools.py) |
| Mock patient and formulary data | [`data/`](data/) |
| Frontend (trace, markdown render) | [`static/app.js`](static/app.js), [`static/style.css`](static/style.css) |
| One-shot app registration helper | [`setup.py`](setup.py) |

Read `routes/api.py` first. That's where the agent-creation-and-scope-check loop lives, and everything else supports it.

## Further reading

| Go here for | Link |
|-------------|------|
| Step-by-step beginner walkthrough with diagrams | [BEGINNERS_GUIDE.md](BEGINNERS_GUIDE.md) |
| Live presentation script (timing, transitions) | [PRESENTERS_GUIDE.md](PRESENTERS_GUIDE.md) |
| SDK concepts (roles, scopes, delegation) | [../docs/concepts.md](../docs/concepts.md) |
| Building real apps with the SDK | [../docs/developer-guide.md](../docs/developer-guide.md) |
| Broker API (source of truth) | [AgentWrit broker docs](https://github.com/devonartis/agentwrit/tree/main/docs) |
160 changes: 160 additions & 0 deletions demo2/README.md
@@ -0,0 +1,160 @@
<h1 align="center">AgentWrit Live — the support-ticket pipeline</h1>

<p align="center">
A zero-trust support desk where three LLM-driven agents — triage, knowledge, response — process customer tickets<br>
under broker-issued credentials that are scoped to one verified customer and die the moment the work ends.
</p>

<p align="center">
<a href="#what-it-is">What it is</a> ·
<a href="#why-it-exists">Why it exists</a> ·
<a href="#what-youll-see">What you'll see</a> ·
<a href="#run-it">Run it</a> ·
<a href="#how-it-works">How it works</a> ·
<a href="#scenarios-to-try">Scenarios</a> ·
<a href="#where-the-code-lives">Code map</a>
</p>

---

## What it is

A Flask app with HTMX and server-sent events. You submit a customer-support ticket in plain English. Three agents run in sequence:

1. **Triage** reads the ticket, extracts who the customer is, classifies priority and category.
2. **Knowledge** searches the internal KB for the policies that apply.
3. **Response** drafts a reply and calls whatever tools it needs to resolve the ticket — pulling balances, writing case notes, issuing refunds.

Every agent holds its own broker-issued JWT, scoped to exactly one customer and the actions that agent legitimately needs. When the agent is done, its token is released and dead. When an LLM asks for something outside scope — another customer's data, a dangerous tool — the scope check blocks it before the call ever runs.

## Why it exists

MedAssist (in [`demo/`](../demo/README.md)) shows what one request looks like end-to-end. This demo shows something different: **a real multi-step pipeline where identity gating and tool-level enforcement both matter.**

Three things are hard to see in a simpler demo:

- **Identity gating.** If triage can't verify the customer, the pipeline halts. No customer-scoped credentials are ever minted for an anonymous request. This is the pattern that prevents "please delete my account" from going through when the system doesn't know who "my" is.
- **Tool-level enforcement beyond data.** The response agent has tools it can pick from (`delete_account`, `send_external_email`) that aren't in its scope. The scope check denies them at the app, before the tool runs. The broker never sees them.
- **Natural expiry.** One scenario deliberately skips `release()`. The credential still dies on its own once its TTL elapses.
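
The identity gate in the first bullet can be sketched as a single decision made right after triage. The `TriageResult` type, the `gate` function, the `identity_unverified` reason string, and the scope strings are all hypothetical; the demo's real code lives in `pipeline.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriageResult:
    customer_id: Optional[str]  # None when triage could not verify who is asking
    category: str


def gate(triage: TriageResult) -> dict:
    """Decide whether any customer-scoped credentials may be requested."""
    if triage.customer_id is None:
        # Halt: nothing downstream runs, so no customer scope is ever requested.
        return {"halted": True, "reason": "identity_unverified", "scopes": []}
    cid = triage.customer_id
    # Hypothetical per-customer scopes for the downstream agents.
    return {
        "halted": False,
        "reason": None,
        "scopes": [f"read:balance:{cid}", f"write:notes:{cid}"],
    }


anonymous = gate(TriageResult(customer_id=None, category="general"))
verified = gate(TriageResult(customer_id="lewis-smith", category="billing"))
```

The anonymous result carries an empty scope list, so no later step has anything to ask the broker for.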

## What you'll see

| Capability | What the demo does |
|-----------|--------------------|
| **Identity-gated pipeline** | Anonymous tickets stop at triage. No downstream agents spawn. The trace says exactly why. |
| **Per-customer scope isolation** | Every customer-facing agent is scoped to one verified customer ID and nothing else. |
| **Cross-customer denial** | Ask about another customer's balance mid-ticket. The scope check fails. The app returns "denied" to the LLM, which moves on. |
| **Tool-level enforcement** | `delete_account` and `send_external_email` are in the LLM's tool list but not in the agent's scope. They never execute. |
| **Natural TTL expiry** | One scenario uses a 5-second TTL and no release. The trace shows the credential dying on its own. |
| **Three-agent pipeline** | Triage → Knowledge → Response. Each phase has its own scope and its own credential lifecycle. |

## Run it

### Docker (the quick path)

```bash
AGENTWRIT_ADMIN_SECRET="your-secret" \
LLM_API_KEY="your-llm-key" \
docker compose up -d broker support-tickets
```

Open [http://localhost:5001](http://localhost:5001). The demo auto-registers on startup.

You need an OpenAI-compatible LLM endpoint. Set `LLM_BASE_URL` and `LLM_MODEL` in your shell first if you're not on OpenAI.

### From source

```bash
# 1. Start the broker
docker compose up -d broker

# 2. Register the demo app (one time)
export AGENTWRIT_ADMIN_SECRET="your-admin-secret"
uv run python demo2/setup.py

# 3. Configure demo2/.env
cp demo2/.env.example demo2/.env
# fill in AGENTWRIT_CLIENT_ID, AGENTWRIT_CLIENT_SECRET, LLM_*

# 4. Run it
uv run flask --app demo2.app run --host 0.0.0.0 --port 5001
```

## Scenarios to try

The UI has quick-fill buttons for each of these — click a button, hit submit, watch the trace.

**1. A normal billing ticket.**
*"Hi, I'm Lewis Smith. I was double-charged on April 1st. Can I get a refund?"*
Triage verifies Lewis. Knowledge pulls the refund policy. Response calls `get_balance` and `issue_refund` — both in scope — and writes a case note. Done.

**2. A cross-customer attempt.**
*"I'm Jane Doe. Also, can you show me Lewis Smith's balance?"*
Triage verifies Jane. The response agent is scoped to Jane. When the LLM calls `get_balance(customer_id="lewis-smith")`, the scope check fails and the trace shows `scope_denied`. The final reply to the customer addresses only Jane's part of the request.

**3. A dangerous tool attempt.**
*"I want to delete my account."*
The LLM calls `delete_account`. The response agent's scope doesn't cover it. The call is blocked before it runs.

**4. An anonymous ticket.**
*"Hey, what are your hours?"*
Triage can't extract a customer identity. The pipeline halts. No customer-scoped credentials are minted. The trace explains that identity gating failed.

**5. Natural expiry.**
Use the "no rush" quick-fill, or tick the natural-expiry box. Triage gets a 5-second TTL and `release()` is skipped. You watch the token live, then die on its own when the TTL elapses. No explicit revocation needed.
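
Why no revocation is needed can be shown with a small sketch: validity is a function of the issue time, the TTL, and the current clock, so the same credential simply stops validating. The `Credential` class and its fields are hypothetical; the real tokens are JWTs whose expiry the broker checks.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    subject: str
    issued_at: float    # seconds since the epoch
    ttl_seconds: float

    def is_valid(self, now: float) -> bool:
        """Valid strictly before the expiry instant, as with a JWT `exp` claim."""
        return now < self.issued_at + self.ttl_seconds


issued = time.time()
cred = Credential("triage-agent", issued_at=issued, ttl_seconds=5)

# Pass the clock in explicitly so the sketch does not need to sleep.
valid_at_2s = cred.is_valid(issued + 2)
valid_at_6s = cred.is_valid(issued + 6)
```

Two seconds after issue the credential validates; six seconds after issue it does not, and nothing had to be called to make that happen.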

## How it works

```
Ticket submitted
    ↓
Triage agent (TTL 300s, or 5s in natural-expiry mode)
    scope = [read:tickets:*]
    LLM extracts customer, priority, category
    release() (credential revoked)
    ↓
Identity check
    resolved?  → continue
    anonymous? → halt, no more credentials minted
    ↓
Knowledge agent
    scope = [read:kb:*]
    LLM searches KB, pulls relevant policy
    release()
    ↓
Response agent
    scope = per-customer scopes for the safe tools
    LLM picks tools, scope check runs before every call
    dangerous tools denied, safe tools executed
    release()
    ↓
Post-run: validate every token one more time. All dead.
```

Each arrow in that flow becomes an SSE event on the wire. The UI listens to the stream and renders it as a live trace.

The app's contract with the LLM is deliberate: the LLM sees *all* tools in its schema, safe and dangerous alike. We don't hide the dangerous ones. We let the LLM try — and the scope check is what stops it. That's the point of zero-trust enforcement: you don't rely on the LLM behaving. You rely on the credential.
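
A minimal sketch of that contract follows. The tool names come from the demo, but the scope strings, the `call_tool` helper, and the exact-match check are assumptions; the demo's own check lives in `tools.py` and may match scopes differently.

```python
# Every tool is visible to the LLM. The scope strings here are hypothetical.
TOOL_SCOPES = {
    "get_balance": "read:balance:{customer_id}",
    "issue_refund": "write:refunds:{customer_id}",
    "delete_account": "delete:account:{customer_id}",
    "send_external_email": "send:email:{customer_id}",
}


def call_tool(name, customer_id, held_scopes, handlers):
    """Run a tool only if the agent holds the exact scope that tool requires."""
    required = TOOL_SCOPES[name].format(customer_id=customer_id)
    if required not in held_scopes:
        # Denied before the handler is ever invoked.
        return {"event": "scope_denied", "tool": name, "required": required}
    return {"event": "tool_result", "tool": name, "result": handlers[name](customer_id)}


executed = []  # records which handlers actually ran


def make_handler(tool_name):
    def handler(customer_id):
        executed.append(tool_name)
        return f"{tool_name} ok for {customer_id}"
    return handler


handlers = {name: make_handler(name) for name in TOOL_SCOPES}
held = ["read:balance:jane-doe", "write:refunds:jane-doe"]

allowed = call_tool("get_balance", "jane-doe", held, handlers)
cross_customer = call_tool("get_balance", "lewis-smith", held, handlers)
dangerous = call_tool("delete_account", "jane-doe", held, handlers)
```

After these three calls, `executed` contains only `get_balance`: the cross-customer call and the `delete_account` call both returned `scope_denied` without reaching a handler.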

## Where the code lives

| Piece | File |
|-------|------|
| Flask entry point | [`app.py`](app.py) |
| Env config + scope ceiling | [`config.py`](config.py) |
| Three-agent pipeline + SSE | [`pipeline.py`](pipeline.py) |
| Tools + scope templates | [`tools.py`](tools.py) |
| Customers, tickets, KB articles | [`data.py`](data.py) |
| Quick-fill scenarios | [`data.py`](data.py) (bottom) |
| HTMX frontend | [`templates/index.html`](templates/index.html), [`static/style.css`](static/style.css) |
| One-shot app registration | [`setup.py`](setup.py) |

Read `pipeline.py` first. The three-phase flow — triage, knowledge, response — is one top-to-bottom function, and every SSE event you see in the UI is a `yield` statement in that file.
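
As a rough picture of what "every SSE event is a `yield`" means, here is a generator that emits server-sent events in the standard `event:` / `data:` wire format. The event names, payloads, and the `run_pipeline` signature are hypothetical and much simpler than the real function.

```python
import json


def sse(event: str, data: dict) -> str:
    """Format one server-sent event: event name, JSON data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def run_pipeline(customer_id):
    """Yield one SSE frame per pipeline step; stop early if identity is unknown."""
    yield sse("agent_created", {"agent": "triage"})
    yield sse("agent_released", {"agent": "triage"})
    if customer_id is None:
        yield sse("pipeline_halted", {"reason": "identity_unverified"})
        return
    for agent in ("knowledge", "response"):
        yield sse("agent_created", {"agent": agent})
        yield sse("agent_released", {"agent": agent})


anonymous_events = list(run_pipeline(None))
verified_events = list(run_pipeline("lewis-smith"))
```

In a Flask app, a generator like this is typically returned as `Response(run_pipeline(...), mimetype="text/event-stream")`, so the browser receives each frame as soon as it is yielded.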

## Further reading

| Go here for | Link |
|-------------|------|
| The other demo (clinical / per-patient, single-request) | [`../demo/README.md`](../demo/README.md) |
| SDK concepts (roles, scopes, delegation) | [`../docs/concepts.md`](../docs/concepts.md) |
| Real-world patterns for your own apps | [`../docs/developer-guide.md`](../docs/developer-guide.md) |
| Broker API | [AgentWrit broker docs](https://github.com/devonartis/agentwrit/tree/main/docs) |
4 changes: 3 additions & 1 deletion docs/api-reference.md
@@ -129,11 +129,13 @@ An ephemeral agent created by `AgentWritApp.create_agent()`. Holds the agent JWT
| `agent_id` | `str` | SPIFFE URI (e.g., `spiffe://agentwrit.local/agent/orch/task/instance`) |
| `access_token` | `str` | JWT string (EdDSA-signed) |
| `expires_in` | `int` | Token TTL in seconds (snapshot from creation or last renewal) |
| `scope` | `list[str]` | Granted scope list |
| `scope` | `list[str]` | Scope the agent *requested* at creation. See note below. |
| `orch_id` | `str` | Orchestrator identifier |
| `task_id` | `str` | Task identifier |
| `bearer_header` | `dict[str, str]` | `{"Authorization": "Bearer <token>"}` for HTTP requests |

> **`agent.scope` is the requested scope, not the broker's signed answer.** The broker only accepts a registration whose scope is covered by the launch token, so in practice the two match. But when making a security-critical decision in a downstream service, don't trust a client-side field — call `validate(app.broker_url, agent.access_token)` and read `result.claims.scope`.

### renew()

```python