Reversible PII pseudonymization for agentic LLM workflows.
Gaze sits between your data and the LLM. It swaps PII for stable, session-scoped tokens on the way out, and restores the originals on the way back. The agent never sees raw personal data; the data owner never loses the ability to read the agent's reply.
git clone https://github.com/EmpireTwo/gaze.git
cd gaze
cargo install --path crates/gaze-cli
echo 'Email alice@example.invalid about ORD-789012.' | gaze clean
{
  "clean_text": "Email <{session_hex}:Email_1> about ORD-789012.",
  "session_blob": "<base64>",
  "stats": {"detections": 1}
}
Send clean_text to the LLM. Keep session_blob server-side: it is the signed restore manifest, and it must never reach the model. ORD-789012 passes through untouched because order IDs are tenant-specific and need a custom recognizer.
Round-trip the model's reply through restore on the same manifest:
echo '{"session_blob":"<base64>","text":"Confirmation sent to <{session_hex}:Email_1>."}' \
  | gaze restore
{"text":"Confirmation sent to alice@example.invalid."}
Full CLI surface (flags, structured-document mode, audit logging, policy TOML) is in crates/gaze-cli/README.md.
PII handling in LLM apps usually falls into one of three buckets:
- No redaction. Real emails, phone numbers, and order IDs end up in the model provider's logs.
- One-way redaction. You strip PII, the agent replies "I've sent the confirmation to <REDACTED>", and you have no way to thread the reply back to the actual user.
- LLM-based redaction. A second model call decides what's PII. Non-deterministic, non-auditable, costs another round trip per turn.
Gaze takes a fourth path: deterministic, rule-based detection with a signed restore manifest. Reversible without giving up an audit trail.
- Fail closed. Ambiguous matches are tokenized, never silently passed through. Unknown rulepack validators or normalizers fail at policy load; there is no degraded mode.
- Reversible by design. Tokens like <{session_hex}:Email_1> are session-scoped and counted by class. Restore goes through a signed SensitiveSnapshot, not string substitution.
- Auditable. Every emitted token traces to a recognizer + rule. An optional metadata-only SQLite log is available via gaze clean --audit-db; raw PII is never written to the log.
- Deterministic. Detection is regex/dictionary-first. NER and the OpenAI-filter safety net are opt-in observers. They cannot mutate the manifest or the restore path.
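To make "session-scoped and counted by class" concrete, here is a minimal std-only sketch of how such tokens could be issued. It is not gaze's implementation: the Tokenizer type and token_for method are hypothetical, the sketch assumes "stable" means a repeated value reuses its token within one session, and it omits the signing that the real SensitiveSnapshot provides.

```rust
use std::collections::HashMap;

/// Hypothetical per-session tokenizer (not gaze's API). Manifest signing is omitted.
pub struct Tokenizer {
    session_hex: String,
    // Last index issued for each class, e.g. "Email" -> 2.
    counters: HashMap<String, usize>,
    // (class, original value) -> token, so a repeated value reuses its token.
    issued: HashMap<(String, String), String>,
}

impl Tokenizer {
    pub fn new(session_hex: &str) -> Self {
        Tokenizer {
            session_hex: session_hex.to_string(),
            counters: HashMap::new(),
            issued: HashMap::new(),
        }
    }

    /// Returns a token shaped like `<{session_hex}:{Class}_{n}>`.
    pub fn token_for(&mut self, class: &str, original: &str) -> String {
        let key = (class.to_string(), original.to_string());
        if let Some(token) = self.issued.get(&key) {
            return token.clone(); // same value in the same session: same token
        }
        // Counters are per class, so Email_1 and Name_1 number independently.
        let n = self.counters.entry(class.to_string()).or_insert(0);
        *n += 1;
        let token = format!("<{}:{}_{}>", self.session_hex, class, n);
        self.issued.insert(key, token.clone());
        token
    }
}
```

In this sketch the (class, original) map is what a manifest would record; in gaze that record is the signed SensitiveSnapshot kept server-side, never the clean text sent to the model.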
git clone https://github.com/EmpireTwo/gaze.git
cd gaze
cargo install --path crates/gaze-cli
Pre-built binaries for Apple Silicon macOS and Linux x86_64 (glibc 2.39+) are attached to each GitHub release. Other targets: build from source with cargo build --release -p gaze-cli.
For library use — linking the Rust runtime directly instead of shelling out — see Use from Rust below.
regex (always-on)  ─┐
dictionary (opt-in) ├──► resolver ──► tokens ──► CleanDocument
NER (opt-in)       ─┘       │
                            │ conflict tiers:
                            │   class > rule > score > length > id
                            │
                            ├──► Pass-3 SafetyNet (observer)
                            │      reads clean text + manifest
                            │      emits LeakReport, never mutates
                            │
                            └──► SensitiveSnapshot (signed)
                                         │
                                         ▼
                                      restore
Three deterministic detection passes plus an optional observer pass. The safety net cannot modify the clean text or the restore path; it only emits suspect reports against the manifest of emitted tokens.
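The resolver's conflict tiers and the observer contract can both be sketched with plain Rust types. This is a hypothetical model, not gaze's resolver: the Candidate fields and their rank encodings are assumptions, resolution is assumed to be greedy (winners first, overlapping losers dropped), and the safety_net heuristics are placeholders. The structural point it illustrates is that the observer takes only shared references, so the compiler rules out any mutation of the clean text or the manifest.

```rust
use std::cmp::Ordering;

/// Hypothetical candidate span from any detection pass (field names are illustrative).
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub start: usize,   // byte offset, inclusive
    pub end: usize,     // byte offset, exclusive
    pub class_rank: u8, // lower = higher-priority PII class (assumed encoding)
    pub rule_rank: u8,  // lower = higher-priority rule (assumed encoding)
    pub score: u32,     // detector confidence, scaled to an integer
    pub id: u32,        // stable recognizer id, the final tiebreak
}

/// Tiers in order: class > rule > score > length > id. `Less` means `a` wins.
pub fn tier_cmp(a: &Candidate, b: &Candidate) -> Ordering {
    a.class_rank
        .cmp(&b.class_rank)
        .then(a.rule_rank.cmp(&b.rule_rank))
        .then(b.score.cmp(&a.score)) // higher score wins
        .then((b.end - b.start).cmp(&(a.end - a.start))) // longer span wins
        .then(a.id.cmp(&b.id)) // lower id wins
}

/// Greedy resolution (an assumption): accept winners first, drop anything overlapping them.
pub fn resolve(mut cands: Vec<Candidate>) -> Vec<Candidate> {
    cands.sort_by(tier_cmp);
    let mut kept: Vec<Candidate> = Vec::new();
    for c in cands {
        if kept.iter().all(|k| c.end <= k.start || k.end <= c.start) {
            kept.push(c);
        }
    }
    kept.sort_by_key(|c| c.start);
    kept
}

/// Hypothetical report emitted by the observer.
#[derive(Debug, PartialEq)]
pub struct LeakReport {
    pub offset: usize,
    pub snippet: String,
}

/// Observer pass: shared borrows only, so it can report but never mutate.
/// Placeholder heuristics: a surviving '@' word, or a token-shaped word missing from the manifest.
pub fn safety_net(clean_text: &str, manifest_tokens: &[&str]) -> Vec<LeakReport> {
    let mut reports = Vec::new();
    let mut offset = 0;
    for word in clean_text.split(' ') {
        let w = word.trim_end_matches(|c: char| c == '.' || c == ',');
        let raw_email = w.contains('@');
        let unknown_token = w.starts_with('<') && w.ends_with('>') && !manifest_tokens.contains(&w);
        if raw_email || unknown_token {
            reports.push(LeakReport { offset, snippet: w.to_string() });
        }
        offset += word.len() + 1; // +1 for the separating space
    }
    reports
}
```

Because class outranks score, an Email candidate beats an overlapping Name candidate even when the Name detector is more confident.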
Six of the published crates cover most adoption paths. Pick the smallest surface that does the job.
| Crate | Use when |
|---|---|
| gaze-pii (lib name gaze) | You want the runtime: Pipeline, Session, Policy, Recognizer, restore. |
| gaze-assembly | You want bundled defaults without hand-wiring recognizers. |
| gaze-recognizers | You're writing a custom recognizer or rulepack. |
| gaze-audit | You want SQLite-backed metadata audit logging. Adopt directly; gaze core has no rusqlite dep in any feature graph. |
| gaze-cli | You want a process boundary for non-Rust adapters (Laravel, Python, etc.). |
| gaze-types | You want the value contracts (RedactionLogger, Manifest, LeakReport) without ML deps. |
Crate boundaries and the audit-isolation gate: docs/architecture/crates.md.
Document extension for v0.7.1+ codec adapters: docs/architecture/document-extension.md.
Bundled rulepacks (composable through CorePipelineConfig::with_bundled_rulepack or [policy.rulepacks]):
- core: always-on. Email (RFC-validated) and locale-aware Name coverage cued off forwarded headers, agent reply preambles, and auto-footer sender lines.
- core-extended: opt-in. Phone (E.164 + national), IPv4/IPv6, postal codes, IBAN (MOD-97), credit card (Luhn).
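The two checksum validators named in core-extended are standard algorithms. The sketch below implements Luhn and the IBAN MOD 97-10 check (ISO 7064) in plain Rust for reference; the function names are hypothetical, and gaze's own validators may normalize input differently or also check country-specific IBAN lengths, which this sketch does not.

```rust
/// Luhn check: from the right, double every second digit, subtract 9 from
/// results above 9, and require the sum to be divisible by 10.
/// Spaces and hyphens are ignored; any other non-digit rejects the input.
pub fn luhn_valid(input: &str) -> bool {
    let digits: Option<Vec<u32>> = input
        .chars()
        .filter(|c| *c != ' ' && *c != '-')
        .map(|c| c.to_digit(10))
        .collect();
    let digits = match digits {
        Some(d) if d.len() >= 2 => d,
        _ => return false, // non-digit or too short: fail closed
    };
    let sum: u32 = digits
        .iter()
        .rev()
        .enumerate()
        .map(|(i, &d)| {
            if i % 2 == 1 {
                let doubled = d * 2;
                if doubled > 9 { doubled - 9 } else { doubled }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}

/// IBAN MOD-97: move the first four characters to the end, map A..Z to 10..35,
/// and require the resulting number mod 97 to equal 1. Computed incrementally.
pub fn iban_mod97_valid(input: &str) -> bool {
    let compact = input
        .chars()
        .filter(|c| !c.is_whitespace())
        .collect::<String>()
        .to_ascii_uppercase();
    if compact.len() < 5 || compact.len() > 34 || !compact.is_ascii() {
        return false;
    }
    let rearranged = format!("{}{}", &compact[4..], &compact[..4]);
    let mut remainder: u32 = 0;
    for c in rearranged.chars() {
        let value = match c {
            '0'..='9' => c as u32 - '0' as u32,
            'A'..='Z' => c as u32 - 'A' as u32 + 10,
            _ => return false,
        };
        // Letters expand to two decimal digits, so shift by 100 instead of 10.
        let shift = if value >= 10 { 100 } else { 10 };
        remainder = (remainder * shift + value) % 97;
    }
    remainder == 1
}
```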
Validators are a closed enum (EmailRfc, E164Phone, Luhn, IbanMod97); unknown validator names in a rulepack fail at load with a typed error. Locale chain is strict and ordered: CLI > policy > rulepack default > system default.
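The closed-enum and locale-chain rules can be sketched in a few lines of std-only Rust. Everything here is hypothetical: Validator mirrors the four names listed above, but the string spellings accepted in a rulepack, the UnknownValidator error, and the load_validators and resolve_locale helpers are illustrative assumptions, not gaze's API.

```rust
use std::fmt;
use std::str::FromStr;

/// Closed set of validators, mirroring the four names in the README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Validator {
    EmailRfc,
    E164Phone,
    Luhn,
    IbanMod97,
}

/// Typed load error for an unrecognized validator name (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownValidator(pub String);

impl fmt::Display for UnknownValidator {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown validator `{}` in rulepack", self.0)
    }
}

impl std::error::Error for UnknownValidator {}

impl FromStr for Validator {
    type Err = UnknownValidator;
    // Assumption: rulepacks spell validators exactly like the variant names.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "EmailRfc" => Ok(Validator::EmailRfc),
            "E164Phone" => Ok(Validator::E164Phone),
            "Luhn" => Ok(Validator::Luhn),
            "IbanMod97" => Ok(Validator::IbanMod97),
            other => Err(UnknownValidator(other.to_string())),
        }
    }
}

/// The whole load fails on the first unknown name: no partial, degraded rulepack.
pub fn load_validators(names: &[&str]) -> Result<Vec<Validator>, UnknownValidator> {
    names.iter().map(|n| n.parse::<Validator>()).collect()
}

/// Strict locale precedence: CLI > policy > rulepack default > system default.
pub fn resolve_locale<'a>(
    cli: Option<&'a str>,
    policy: Option<&'a str>,
    rulepack_default: Option<&'a str>,
    system_default: &'a str,
) -> &'a str {
    cli.or(policy).or(rulepack_default).unwrap_or(system_default)
}
```

Collecting into Result short-circuits on the first error, which is the fail-at-load behavior described above rather than silently skipping the unknown entry.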
Tenant-specific PII (order IDs, song titles, artist names) needs a dictionary or custom regex recognizer. See docs/policy.md.
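As an illustration of what a tenant-specific recognizer has to match, here is a hypothetical std-only scanner for order IDs shaped like ORD-789012 from the quickstart. It hand-rolls roughly what a regex such as \bORD-\d{6}\b would match so it needs no external crate. It does not use gaze's Recognizer trait, whose signature is not shown here, and the "ORD- plus six digits" shape is an assumption drawn from that single example.

```rust
/// Hypothetical recognizer: byte spans of `ORD-` followed by exactly six ASCII
/// digits, with no letter or digit immediately before or after the match.
pub fn find_order_ids(text: &str) -> Vec<(usize, usize)> {
    let bytes = text.as_bytes();
    let mut spans = Vec::new();
    for (start, _) in text.match_indices("ORD-") {
        let digits_start = start + 4;
        let digits_end = digits_start + 6;
        if digits_end > bytes.len() {
            continue; // not enough room for six digits
        }
        let digits_ok = bytes[digits_start..digits_end].iter().all(u8::is_ascii_digit);
        // Word-boundary checks, approximating regex `\b` on both sides.
        let left_ok = start == 0 || !bytes[start - 1].is_ascii_alphanumeric();
        let right_ok = digits_end == bytes.len() || !bytes[digits_end].is_ascii_alphanumeric();
        if digits_ok && left_ok && right_ok {
            spans.push((start, digits_end));
        }
    }
    spans
}
```

The boundary checks matter: without them a seven-digit ID or a longer alphanumeric code containing ORD- would be partially tokenized.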
Restore is manifest-first. Tokens are session-scoped, counted by class, and only resolvable through a signed SensitiveSnapshot. There is no string-map fallback.
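A minimal sketch of manifest-first restore, under stated assumptions: the Manifest and RestoreError types are hypothetical, verification of the SensitiveSnapshot signature is omitted entirely, and returning an error for a token from another session or a token missing from the manifest is this sketch's fail-closed reading of the text, not documented gaze behavior. Only the one session's manifest is consulted; there is no global substitution table to fall back on, and angle-bracketed text that is not token-shaped passes through unchanged.

```rust
use std::collections::HashMap;

/// Hypothetical restore errors: fail closed instead of leaving tokens in place.
#[derive(Debug, PartialEq)]
pub enum RestoreError {
    ForeignSession(String),
    UnknownToken(String),
}

/// Hypothetical manifest for one session; a real SensitiveSnapshot is signed.
pub struct Manifest {
    pub session_hex: String,
    pub entries: HashMap<String, String>, // e.g. "Email_1" -> "alice@example.invalid"
}

/// Splits `hex:Class_n` into (hex, "Class_n") if it has the token shape.
fn parse_token(inner: &str) -> Option<(&str, &str)> {
    let (hex, label) = inner.split_once(':')?;
    let (class, n) = label.rsplit_once('_')?;
    let shaped = !hex.is_empty()
        && hex.chars().all(|c| c.is_ascii_hexdigit())
        && !class.is_empty()
        && class.chars().all(|c| c.is_ascii_alphabetic())
        && !n.is_empty()
        && n.chars().all(|c| c.is_ascii_digit());
    if shaped { Some((hex, label)) } else { None }
}

pub fn restore(text: &str, manifest: &Manifest) -> Result<String, RestoreError> {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(open) = rest.find('<') {
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        // Candidate token body: up to the next '>', with no nested '<'.
        let inner = after
            .find('>')
            .map(|close| &after[..close])
            .filter(|s| !s.contains('<'));
        match inner.and_then(|s| parse_token(s).map(|parts| (s, parts))) {
            Some((s, (hex, label))) => {
                if hex != manifest.session_hex {
                    return Err(RestoreError::ForeignSession(s.to_string()));
                }
                let original = manifest
                    .entries
                    .get(label)
                    .ok_or_else(|| RestoreError::UnknownToken(s.to_string()))?;
                out.push_str(original);
                rest = &after[s.len() + 1..]; // skip past the closing '>'
            }
            None => {
                // Not a token: emit '<' literally and keep scanning after it.
                out.push('<');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    Ok(out)
}
```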
Optional metadata audit log:
gaze clean --policy policy.toml --audit-db audit.sqlite < input.txt
gaze audit query --audit-db audit.sqlite --class email --action tokenize
gaze audit export --audit-db audit.sqlite --format jsonl --output redactions.jsonl
gaze audit purge --audit-db audit.sqlite --before 2026-01-01T00:00:00Z
The audit DB is opened read-only by query and export. The exported column set excludes raw PII payloads. There is no policy-level retention default and no background auto-purge: adopters drive retention explicitly.
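gaze-audit's schema is not documented here, so the sketch below is hypothetical: it only illustrates the two properties stated above, using made-up Detection and AuditRecord types. The audit row type has no field that could hold the raw value, so nothing downstream can persist it, and retention is a function the caller invokes explicitly rather than a background job. Timestamps are assumed to be Unix seconds.

```rust
/// Hypothetical in-memory detection; the only type that carries the raw value.
pub struct Detection {
    pub class: String,
    pub recognizer: String,
    pub rule: String,
    pub raw_value: String,
}

/// Hypothetical metadata-only audit row: there is no field for raw PII.
#[derive(Debug, Clone, PartialEq)]
pub struct AuditRecord {
    pub unix_ts: u64,
    pub class: String,
    pub action: String,
    pub recognizer: String,
    pub rule: String,
}

impl AuditRecord {
    /// Copies metadata only; `d.raw_value` is deliberately never read.
    pub fn from_detection(d: &Detection, action: &str, unix_ts: u64) -> Self {
        AuditRecord {
            unix_ts,
            class: d.class.clone(),
            action: action.to_string(),
            recognizer: d.recognizer.clone(),
            rule: d.rule.clone(),
        }
    }
}

/// Caller-driven retention: drop rows older than `cutoff`, return how many were removed.
pub fn purge_before(log: &mut Vec<AuditRecord>, cutoff: u64) -> usize {
    let before = log.len();
    log.retain(|r| r.unix_ts >= cutoff);
    before - log.len()
}
```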
- Version: v0.7.0 (2026-05).
- MSRV: Rust 1.89.
- License: dual Apache-2.0 OR MIT.
- crates.io: published as gaze-pii. The bare gaze name is in transfer; until that completes, depend on gaze-pii. Source compatibility is preserved via [lib].name = "gaze".
- Contract surface: Pipeline, Session, Policy, rulepack schema, and token shape are stable across the v0.7 line. SafetyNet contract: docs/architecture/safety-nets.md.
- Bundled detection is strongest for emails, names, locations, organizations, IBANs, credit cards, IPv4/IPv6, and DACH/EN postal + phone shapes. Tenant-specific PII needs a custom recognizer.
- Passing --rulepack-bundled core-extended without a policy activates phone.national.de, phone.national.us, postal.us, and postal.de. Adopters wanting a narrower scope must supply a policy or pass --locale=global.
- Linux x86_64 binaries link against glibc 2.39+ (Ubuntu 24.04, Debian 13, RHEL 10, or newer). Older distros: build from source.
- No Intel macOS, no musl, no Windows binaries shipped today; build from source.
The CLI is a process boundary around the Rust runtime; you can link the runtime directly:
[dependencies]
gaze-pii = "0.7"
gaze-assembly = "0.7"
The crate is published as gaze-pii because the bare gaze name is in transfer; the import path stays use gaze::... because [lib].name = "gaze" is preserved.
- Minimal example and the API surface table: crates/gaze/README.md (also rendered on crates.io/crates/gaze-pii).
- Full walk-through with structured documents, tenant-specific recognizers, and policy TOML: docs/getting-started.md.
The workspace publishes via the publish-crates.yml GitHub Actions workflow using crates.io trusted-publisher OIDC auth; it does not need a long-lived CARGO_REGISTRY_TOKEN secret.
- Tag push (git tag v<version> && git push --tags) runs a real publish of every workspace crate in topological order.
- Manual dispatch with dry_run=true packages each crate without publishing, which is useful for catching metadata or dependency issues before a release tag.
See CONTRIBUTING.md. Repository gates (xtask + Dylint) enforce the contracts in docs/architecture/. Run them locally before pushing:
cargo fmt --all -- --check
cargo clippy --workspace --all-features --all-targets -- -D warnings
cargo test --workspace --all-features
cargo run -p xtask -- ci-feature-matrix
The Gaze workspace publishes 8 crates. All current versions point at this repository as their canonical source.
| Crate | Purpose |
|---|---|
| gaze-pii | Umbrella runtime: pipeline, sessions, policy, manifest. The crate adopters typically depend on. |
| gaze-types | Shared value contracts; serde-only, no ML/SQL deps. |
| gaze-recognizers | Detection backends (regex / dictionary / NER) and bundled rulepacks. |
| gaze-audit | Passive SQLite audit sink, isolated from core. |
| gaze-assembly | Policy-to-pipeline builder shared by CLI-style adopters. |
| gaze-cli | Command-line gaze clean / gaze restore binary. |
| gaze-mcp-core | MCP chokepoint runtime: Tool / ToolCtx / PiiEnvelope dispatch. |
| gaze-mcp-rmcp | rmcp transport adapter for gaze-mcp-core. |
cargo add gaze-pii
Dual-licensed under either of Apache-2.0 or MIT, at your option.