Gaze

Reversible PII pseudonymization for agentic LLM workflows.

Gaze sits between your data and the LLM. It swaps PII for stable, session-scoped tokens on the way out, and restores the originals on the way back. The agent never sees raw personal data; the data owner never loses the ability to read the agent's reply.

git clone https://github.com/EmpireTwo/gaze.git
cd gaze
cargo install --path crates/gaze-cli

echo 'Email alice@example.invalid about ORD-789012.' | gaze clean
{
  "clean_text": "Email <{session_hex}:Email_1> about ORD-789012.",
  "session_blob": "<base64>",
  "stats": {"detections": 1}
}

Send clean_text to the LLM. Keep session_blob server-side — it is the signed restore manifest, and it must never reach the model.

Round-trip the model's reply through restore on the same manifest:

echo '{"session_blob":"<base64>","text":"Confirmation sent to <{session_hex}:Email_1>."}' \
  | gaze restore
{"text":"Confirmation sent to alice@example.invalid."}

Full CLI surface — flags, structured-document mode, audit logging, policy TOML — is in crates/gaze-cli/README.md.
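If the adapter happens to be written in Rust but still wants the process boundary, the round trip above can be driven with std::process::Command. This is a minimal sketch, not part of Gaze: run_gaze, restore_request, and json_string are hypothetical helper names, the binary is assumed to be on PATH as gaze, and a non-zero exit status is assumed to signal failure. Only the stdin/stdout shapes shown above are taken from this README.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Hypothetical helper: pipe `payload` into `gaze <subcommand>` and return stdout.
/// Assumes the `gaze` binary is on PATH and that a non-zero exit means failure.
pub fn run_gaze(subcommand: &str, payload: &str) -> io::Result<String> {
    let mut child = Command::new("gaze")
        .arg(subcommand)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    {
        // Write the payload, then drop the handle so the child sees EOF.
        let mut stdin = child
            .stdin
            .take()
            .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "stdin not piped"))?;
        stdin.write_all(payload.as_bytes())?;
    }
    let output = child.wait_with_output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("gaze {} exited with {}", subcommand, output.status),
        ));
    }
    String::from_utf8(output.stdout).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Minimal JSON string escaping so the sketch needs no external crates.
pub fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build the stdin payload for `gaze restore` in the shape shown above.
pub fn restore_request(session_blob: &str, text: &str) -> String {
    format!(
        "{{\"session_blob\":{},\"text\":{}}}",
        json_string(session_blob),
        json_string(text)
    )
}
```

A caller would pass the raw text to run_gaze("clean", ...), keep the returned session_blob server-side, and later pass restore_request(blob, reply) to run_gaze("restore", ...). The sketch writes all of stdin before reading stdout, which is fine for short messages; very large payloads would need concurrent reads and writes.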

Why this exists

PII handling in LLM apps usually falls into one of three buckets:

  1. No redaction. Real emails, phone numbers, and order IDs end up in the model provider's logs.
  2. One-way redaction. You strip PII, the agent replies "I've sent the confirmation to <REDACTED>", and you have no way to thread the reply back to the actual user.
  3. LLM-based redaction. A second model call decides what's PII. Non-deterministic, non-auditable, costs another round trip per turn.

Gaze takes a fourth path: deterministic, rule-based detection with a signed restore manifest. Reversible without giving up an audit trail.

Guarantees

  • Fail closed. Ambiguous matches are tokenized, never silently passed. Unknown rulepack validators or normalizers fail at policy load — no degraded mode.
  • Reversible by design. Tokens like <{session_hex}:Email_1> are session-scoped and counted by class. Restore goes through a signed SensitiveSnapshot, not string substitution.
  • Auditable. Every emitted token traces to a recognizer + rule. Optional metadata-only SQLite log via gaze clean --audit-db; raw PII is never written to the log.
  • Deterministic. Detection is regex/dictionary-first. NER and the OpenAI-filter safety net are opt-in observers. They cannot mutate the manifest or the restore path.
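To make "session-scoped and counted by class" concrete, here is a toy illustration of how tokens of that shape could be minted. ToySession is a hypothetical type, not Gaze's Session, and the sketch deliberately covers only token minting; the real restore path goes through the signed SensitiveSnapshot rather than a lookup table like this one.

```rust
use std::collections::HashMap;

/// Hypothetical toy, NOT Gaze's `Session`: mints tokens shaped like
/// `<{session_hex}:{Class}_{n}>`, counting distinct values per class.
pub struct ToySession {
    session_hex: String,
    /// (class, raw value) -> index already assigned in this session
    assigned: HashMap<(String, String), usize>,
    /// class -> number of distinct values seen so far
    counters: HashMap<String, usize>,
}

impl ToySession {
    pub fn new(session_hex: &str) -> Self {
        Self {
            session_hex: session_hex.to_string(),
            assigned: HashMap::new(),
            counters: HashMap::new(),
        }
    }

    /// Return the token for `value`, minting a new index on first sight.
    /// The same value in the same session always maps to the same token.
    pub fn token_for(&mut self, class: &str, value: &str) -> String {
        let key = (class.to_string(), value.to_string());
        let existing = self.assigned.get(&key).copied();
        let index = match existing {
            Some(i) => i,
            None => {
                let counter = self.counters.entry(class.to_string()).or_insert(0);
                *counter += 1;
                let i = *counter;
                self.assigned.insert(key, i);
                i
            }
        };
        format!("<{}:{}_{}>", self.session_hex, class, index)
    }
}
```

With a session id of ab12, the first email seen becomes <ab12:Email_1>, a second distinct email becomes <ab12:Email_2>, and repeating the first email yields <ab12:Email_1> again, which is what lets the model refer to the same person consistently across a turn.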

Install

git clone https://github.com/EmpireTwo/gaze.git
cd gaze
cargo install --path crates/gaze-cli

Pre-built binaries for Apple Silicon macOS and Linux x86_64 (glibc 2.39+) are attached to each GitHub release. Other targets: build from source with cargo build --release -p gaze-cli.

For library use — linking the Rust runtime directly instead of shelling out — see Use from Rust below.

Pipeline shape

                       regex (always-on)  ─┐
                       dictionary (opt-in) ├──► resolver ──► tokens ──► CleanDocument
                       NER (opt-in)        ─┘     │
                                                  │  conflict tiers:
                                                  │  class > rule > score > length > id
                                                  │
                                                  ├──► Pass-3 SafetyNet (observer)
                                                  │    reads clean text + manifest
                                                  │    emits LeakReport, never mutates
                                                  │
                                                  └──► SensitiveSnapshot (signed)
                                                              │
                                                              ▼
                                                          restore

Three deterministic detection passes plus an optional observer pass. The safety net cannot modify the clean text or the restore path; it only emits suspect reports against the manifest of emitted tokens.
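The conflict tiers in the diagram (class > rule > score > length > id) read naturally as a lexicographic comparison between overlapping candidates. The sketch below shows one way that could look. Candidate and its fields are hypothetical, and the direction of each tier (higher priority, higher score, and longer span win; lower id wins the final tie) is an assumption rather than something this README specifies.

```rust
use std::cmp::Ordering;

/// Hypothetical detection candidate; field names and types are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub id: u32,
    pub start: usize,
    pub end: usize, // exclusive
    pub class_priority: u8,
    pub rule_priority: u8,
    pub score: u32,
}

impl Candidate {
    fn span_len(&self) -> usize {
        self.end - self.start
    }

    fn overlaps(&self, other: &Candidate) -> bool {
        self.start < other.end && other.start < self.end
    }
}

/// `Ordering::Greater` means `a` beats `b`.
/// Tiers are compared in order: class, rule, score, length, id.
fn tier_cmp(a: &Candidate, b: &Candidate) -> Ordering {
    a.class_priority
        .cmp(&b.class_priority)
        .then(a.rule_priority.cmp(&b.rule_priority))
        .then(a.score.cmp(&b.score))
        .then(a.span_len().cmp(&b.span_len()))
        .then(b.id.cmp(&a.id)) // assumed: the lower id wins the last tie
}

/// Keep the strongest candidate in every group of overlapping spans.
pub fn resolve(mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    // Strongest first, so each candidate only has to yield to ones already kept.
    candidates.sort_by(|a, b| tier_cmp(b, a));
    let mut kept: Vec<Candidate> = Vec::new();
    for c in candidates {
        if kept.iter().all(|k| !k.overlaps(&c)) {
            kept.push(c);
        }
    }
    // Return survivors in document order.
    kept.sort_by_key(|c| c.start);
    kept
}
```

Because every tier is a total order and the last tier is a unique id, two runs over the same candidates always pick the same winner, which is the property a deterministic resolver needs.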

Workspace

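The six crates below cover most integrations. Pick the smallest surface that does the job. The full list of eight published crates, including the two MCP crates, is under Available on crates.io.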

| Crate | Use when |
| --- | --- |
| gaze-pii (lib name gaze) | You want the runtime: Pipeline, Session, Policy, Recognizer, restore. |
| gaze-assembly | You want bundled defaults without hand-wiring recognizers. |
| gaze-recognizers | You're writing a custom recognizer or rulepack. |
| gaze-audit | You want SQLite-backed metadata audit logging. Adopt directly; gaze core has no rusqlite dep in any feature graph. |
| gaze-cli | You want a process boundary for non-Rust adapters (Laravel, Python, etc.). |
| gaze-types | You want the value contracts (RedactionLogger, Manifest, LeakReport) without ML deps. |

Crate boundaries and the audit-isolation gate: docs/architecture/crates.md.

Document extension for v0.7.1+ codec adapters: docs/architecture/document-extension.md.

Detection coverage

Bundled rulepacks (composable through CorePipelineConfig::with_bundled_rulepack or [policy.rulepacks]):

  • core — always-on. Email (RFC-validated), and locale-aware Name coverage cued off forwarded headers, agent reply preambles, and auto-footer sender lines.
  • core-extended — opt-in. Phone (E.164 + national), IPv4/IPv6, postal codes, IBAN (MOD-97), credit card (Luhn).
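The checksum validators named above (Luhn for card numbers, MOD-97 for IBANs) are what keep a regex hit from being treated as PII when it is only a digit run of the right length. The functions below are textbook versions of the two checks, offered as a sketch; they are not Gaze's implementations, and they skip structural checks such as country-specific IBAN lengths.

```rust
/// Luhn check: walking right to left, double every second digit,
/// subtract 9 from any result above 9, and require sum % 10 == 0.
/// Spaces and hyphens are ignored; any other non-digit fails the check.
pub fn luhn_valid(input: &str) -> bool {
    let mut sum = 0u32;
    let mut digits = 0usize;
    for ch in input.chars().rev() {
        if ch == ' ' || ch == '-' {
            continue;
        }
        let d = match ch.to_digit(10) {
            Some(d) => d,
            None => return false,
        };
        let v = if digits % 2 == 1 {
            if d * 2 > 9 { d * 2 - 9 } else { d * 2 }
        } else {
            d
        };
        sum += v;
        digits += 1;
    }
    digits >= 2 && sum % 10 == 0
}

/// IBAN MOD-97 check: move the first four characters to the end,
/// map letters to 10..=35, and require the resulting number mod 97 == 1.
/// The remainder is folded in piece by piece so no big integers are needed.
pub fn iban_mod97_valid(input: &str) -> bool {
    let compact: Vec<char> = input
        .chars()
        .filter(|c| !c.is_whitespace())
        .map(|c| c.to_ascii_uppercase())
        .collect();
    if compact.len() < 5 || !compact.iter().all(|c| c.is_ascii_alphanumeric()) {
        return false;
    }
    let mut rem: u32 = 0;
    for &ch in compact[4..].iter().chain(compact[..4].iter()) {
        // to_digit(36) maps '0'..='9' to 0..=9 and 'A'..='Z' to 10..=35.
        let v = match ch.to_digit(36) {
            Some(v) => v,
            None => return false,
        };
        rem = if v < 10 {
            (rem * 10 + v) % 97
        } else {
            (rem * 100 + v) % 97
        };
    }
    rem == 1
}
```

The widely used test card number 4111 1111 1111 1111 passes luhn_valid, and the standard example IBAN GB82 WEST 1234 5698 7654 32 passes iban_mod97_valid; changing a single digit in either makes the check fail.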

Validators are a closed enum (EmailRfc, E164Phone, Luhn, IbanMod97); unknown validator names in a rulepack fail at load with a typed error. Locale chain is strict and ordered: CLI > policy > rulepack default > system default.
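Both rules in this paragraph are small enough to sketch. The first half below shows how a closed validator enum can reject an unknown name with a typed error at load time; the second shows the locale precedence as a chain of Option fallbacks. The variant names come from this README, but UnknownValidator, load_validators, resolve_locale, and the assumption that rulepacks spell validator names exactly like the variants are all hypothetical.

```rust
use std::fmt;
use std::str::FromStr;

/// Closed set of validators, using the names listed in this README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Validator {
    EmailRfc,
    E164Phone,
    Luhn,
    IbanMod97,
}

/// Hypothetical typed error for a validator name outside the closed set.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownValidator(pub String);

impl fmt::Display for UnknownValidator {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown validator: {}", self.0)
    }
}

impl std::error::Error for UnknownValidator {}

impl FromStr for Validator {
    type Err = UnknownValidator;

    // Assumption: rulepacks spell validator names exactly like the variants.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "EmailRfc" => Ok(Validator::EmailRfc),
            "E164Phone" => Ok(Validator::E164Phone),
            "Luhn" => Ok(Validator::Luhn),
            "IbanMod97" => Ok(Validator::IbanMod97),
            other => Err(UnknownValidator(other.to_string())),
        }
    }
}

/// Fail the whole load on the first unknown name: no degraded mode.
pub fn load_validators(names: &[&str]) -> Result<Vec<Validator>, UnknownValidator> {
    names.iter().map(|n| n.parse::<Validator>()).collect()
}

/// Locale precedence: CLI > policy > rulepack default > system default.
pub fn resolve_locale<'a>(
    cli: Option<&'a str>,
    policy: Option<&'a str>,
    rulepack_default: Option<&'a str>,
    system_default: &'a str,
) -> &'a str {
    cli.or(policy).or(rulepack_default).unwrap_or(system_default)
}
```

Collecting into a Result stops at the first bad name, so a rulepack with one typo produces an error instead of a pipeline that silently skips a validator.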

Tenant-specific PII (order IDs, song titles, artist names) needs a dictionary or custom regex recognizer. See docs/policy.md.

Audit and restore

Restore is manifest-first. Tokens are session-scoped, counted by class, and only resolvable through a signed SensitiveSnapshot. There is no string-map fallback.

Optional metadata audit log:

gaze clean --policy policy.toml --audit-db audit.sqlite < input.txt
gaze audit query --audit-db audit.sqlite --class email --action tokenize
gaze audit export --audit-db audit.sqlite --format jsonl --output redactions.jsonl
gaze audit purge --audit-db audit.sqlite --before 2026-01-01T00:00:00Z

The audit DB is opened read-only by query and export. The exported column set excludes raw PII payloads. There is no policy-level retention default and no background auto-purge — adopters drive retention explicitly.

Status

  • Version: v0.7.0 (2026-05).
  • MSRV: Rust 1.89.
  • License: dual Apache-2.0 OR MIT.
  • crates.io: published as gaze-pii. The bare gaze name is in transfer; until that completes, depend on gaze-pii. Source-compat is preserved via [lib].name = "gaze".
  • Contract surface: Pipeline, Session, Policy, rulepack schema, and token shape are stable across the v0.7 line. SafetyNet contract: docs/architecture/safety-nets.md.

Limits

  • Bundled detection is strongest for emails, names, locations, organizations, IBANs, credit cards, IPv4/IPv6, and DACH/EN postal + phone shapes. Tenant-specific PII needs a custom recognizer.
  • --rulepack-bundled core-extended without a policy activates phone.national.de, phone.national.us, postal.us, postal.de. Adopters wanting narrower scope must supply a policy or pass --locale=global.
  • Linux x86_64 binaries link against glibc 2.39+ (Ubuntu 24.04, Debian 13, RHEL 10, or newer). Older distros: build from source.
  • No Intel macOS, no musl, no Windows binaries shipped today; build from source.

Use from Rust

The CLI is a process boundary around the Rust runtime; you can link the runtime directly:

[dependencies]
gaze-pii = "0.7"
gaze-assembly = "0.7"

The crate is published as gaze-pii because the bare gaze name is in transfer; the import path stays use gaze::... because [lib].name = "gaze" is preserved.

Publishing

The workspace publishes via the publish-crates.yml GitHub Actions workflow using crates.io trusted-publisher OIDC auth; it does not need a long-lived CARGO_REGISTRY_TOKEN secret.

  • Tag push (git tag v<version> && git push --tags) runs a real publish on every workspace crate in topological order.
  • Manual dispatch with dry_run=true packages each crate without publishing, useful for catching metadata or dependency issues before a release tag.

Contributing

See CONTRIBUTING.md. Repository gates (xtask + Dylint) enforce the contracts in docs/architecture/. Run them locally before pushing:

cargo fmt --all -- --check
cargo clippy --workspace --all-features --all-targets -- -D warnings
cargo test --workspace --all-features
cargo run -p xtask -- ci-feature-matrix

Available on crates.io

The Gaze workspace publishes 8 crates. All current versions point at this repository as their canonical source.

| Crate | Purpose |
| --- | --- |
| gaze-pii | Umbrella runtime: pipeline, sessions, policy, manifest. The crate adopters typically depend on. |
| gaze-types | Shared value contracts; serde-only, no ML/SQL deps. |
| gaze-recognizers | Detection backends (regex / dictionary / NER) and bundled rulepacks. |
| gaze-audit | Passive SQLite audit sink, isolated from core. |
| gaze-assembly | Policy-to-pipeline builder shared by CLI-style adopters. |
| gaze-cli | Command-line gaze clean / gaze restore binary. |
| gaze-mcp-core | MCP chokepoint runtime: Tool / ToolCtx / PiiEnvelope dispatch. |
| gaze-mcp-rmcp | rmcp transport adapter for gaze-mcp-core. |

cargo add gaze-pii

License

Dual-licensed under either of Apache-2.0 or MIT, at your option.
