Production-hardened Python redaction for structured logs, LLM agent payloads, and
AWS-signed URLs. Zero dependencies — pure stdlib. Built for dict[str, Any], not
for free-form prose.
Status: Beta (0.1.x) · Pre-1.0: the API may evolve based on early-user feedback before v1.0. Pin to
redactkit~=0.1.1 if you want patch-level updates only.
from redactkit import redact_args
redact_args({"password": "hunter2", "user": "alice"})
# → {"password": "***", "user": "alice"}

pip install redactkit

Requires Python ≥ 3.10. No runtime dependencies.
- Structured-data first. Recursive dict/list/tuple traversal out of the box. Drop into any code that already speaks Mapping[str, Any].
- AWS SigV4 redaction. Scrubs X-Amz-Signature, X-Amz-Credential, and X-Amz-Security-Token from presigned URLs, a real production gap most redaction libraries ignore.
- Overflow splitting. summarize_payload returns a short wire-safe summary plus an optional full body, so a 5 MB tool result doesn't blow your log budget.
- Comprehensive + extensible denylist. Passwords, tokens, JWT, OAuth, bearer, credentials, cookies, and sessions are covered out of the box. Add your own via extend_key_pattern, with no module-state mutation and no monkey-patching.
- Zero deps, production provenance. Pure stdlib (Python ≥ 3.10). Extracted from the Convilyn agent platform, where it is used in LLM agent middleware, supervisor handoffs, event emission, and HTTP response redaction.
| Tool | Approach | Best at | Deps | Pick when… |
|---|---|---|---|---|
| Microsoft Presidio | ML-based NER (spaCy / transformers) | Free-form text PII (names, addresses) | Heavy (~1 GB models) | Document / chat PII detection in regulated industries |
| scrubadub | NLP rules + named recognizers | Free-form text (emails, phones, names) | nltk, textblob | Scrubbing user-generated prose |
| Hand-rolled logging.Filter | Custom filter per team | Logging-specific | None | Reinventing the wheel; AWS SigV4 never covered |
| redactkit | Key denylist + AWS SigV4 + overflow splitting | Structured data: dicts, JSON, kwargs, OTel attrs, request bodies | None | LLM agent logs, API request/response logs, anywhere dict[str, Any] is the unit |
Non-goals. redactkit does not do free-form NLP PII detection, cryptographic anonymization, or database-column encryption. Reach for Presidio or scrubadub for those.
from redactkit import redact_args
payload = {
    "account": {"bearer_token": "xyz", "user": "alice"},
    "items": [{"secret": "s"}, {"name": "plain"}],
}
redact_args(payload)
# → {
#     "account": {"bearer_token": "***", "user": "alice"},
#     "items": [{"secret": "***"}, {"name": "plain"}],
# }

Case-insensitive, recursive, and non-mutating. Covers nested dicts, list-of-dicts, and tuples (preserving type).
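
To make the traversal described above concrete, here is a minimal stdlib sketch of a recursive, non-mutating key-denylist walk. It is not redactkit's implementation: the names sketch_redact, SKETCH_KEY_RE, and SKETCH_MASK, and the three-term pattern, are assumptions made up for illustration.

```python
import re
from collections.abc import Mapping
from typing import Any

# Hypothetical denylist for this sketch; not redactkit's actual pattern.
SKETCH_KEY_RE = re.compile(r"password|secret|token", re.IGNORECASE)
SKETCH_MASK = "***"


def sketch_redact(value: Any, key_pattern: re.Pattern[str] = SKETCH_KEY_RE) -> Any:
    """Return a redacted copy of value; the input is never modified."""
    if isinstance(value, Mapping):
        # Mask the value when the key matches, otherwise recurse into it.
        return {
            key: (
                SKETCH_MASK
                if isinstance(key, str) and key_pattern.search(key)
                else sketch_redact(item, key_pattern)
            )
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [sketch_redact(item, key_pattern) for item in value]
    if isinstance(value, tuple):
        # Rebuild as a tuple so the container type survives the walk.
        return tuple(sketch_redact(item, key_pattern) for item in value)
    return value


original = {
    "Account": {"Bearer_Token": "xyz", "user": "alice"},
    "pair": ({"SECRET": "s"}, 1),
}
redacted = sketch_redact(original)
```

Because every container is rebuilt rather than edited, the caller's original payload stays usable after logging.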
from redactkit import redact_args
redact_args({
    "url": "https://s3.example.com/x?X-Amz-Signature=abcdef123&foo=1",
})
# → {"url": "https://s3.example.com/x?X-Amz-Signature=***&foo=1"}

String values inside payloads are scanned for AWS Signature V4 query fragments, and the sensitive parameter values are masked in the returned copy. Most logging filters miss this; redactkit doesn't.
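
As a rough illustration of that scan, the sketch below masks SigV4 query parameter values with a single regex substitution. The pattern and the name sketch_scrub_sigv4 are assumptions for this example; redactkit's own PRESIGNED_QUERY_RE may be written differently.

```python
import re

# Hypothetical pattern for this sketch; redactkit's PRESIGNED_QUERY_RE may differ.
# Group 1 keeps the parameter name and "=", the rest of the match is the value.
SKETCH_SIGV4_RE = re.compile(
    r"(X-Amz-(?:Signature|Credential|Security-Token)=)[^&\s#]+",
    re.IGNORECASE,
)


def sketch_scrub_sigv4(text: str, mask: str = "***") -> str:
    """Mask the values of SigV4 query parameters found anywhere in text."""
    # A function replacement avoids treating the mask as a regex template.
    return SKETCH_SIGV4_RE.sub(lambda match: match.group(1) + mask, text)


url = "https://s3.example.com/x?X-Amz-Signature=abcdef123&foo=1"
scrubbed = sketch_scrub_sigv4(url)
```

The value match stops at `&`, whitespace, or `#`, so neighbouring query parameters and trailing log text are left untouched.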
from redactkit import summarize_payload, output_digest
big_tool_result = {"items": [...]} # 5 MB
summary, overflow = summarize_payload(big_tool_result, max_bytes=2048)
# `span` is assumed to be an OpenTelemetry span and `s3` a boto3 S3 client, both created elsewhere.
# Attach the short summary to your span / log event:
span.set_attribute("tool.output_summary", summary)
# Persist the full body to S3 if it overflowed:
if overflow is not None:
    digest = output_digest(overflow)
    s3.put_object(Bucket="logs", Key=f"overflow/{digest}", Body=overflow)
    span.set_attribute("tool.output_ref", f"s3://logs/overflow/{digest}")

UTF-8 boundary safe: no mojibake even if the cut lands mid-character.
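
The boundary-safe cut and the digest reference can be sketched with the stdlib alone. This is an assumption about how such a split might work, not redactkit's code: sketch_truncate_utf8, sketch_summarize, and sketch_digest are hypothetical names, and serializing via json.dumps is a choice made only for this example.

```python
import hashlib
import json
from typing import Optional


def sketch_truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut valid UTF-8 data to at most max_bytes without splitting a character."""
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # Continuation bytes look like 0b10xxxxxx. If the first dropped byte is one,
    # the cut is mid-character, so back up to the start of that character.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


def sketch_summarize(payload: object, max_bytes: int = 2048) -> tuple[str, Optional[str]]:
    """Return (summary, overflow); overflow is the full body only when it was cut."""
    body = json.dumps(payload, ensure_ascii=False, sort_keys=True)
    encoded = body.encode("utf-8")
    if len(encoded) <= max_bytes:
        return body, None
    return sketch_truncate_utf8(encoded, max_bytes).decode("utf-8"), body


def sketch_digest(body: str) -> str:
    """First 16 hex characters of the SHA-256 of body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()[:16]


# An 11-byte budget would split the first two-byte "é", so the cut backs up.
summary, overflow = sketch_summarize({"note": "é" * 10}, max_bytes=11)
```

Keying the overflow object by a content digest means identical bodies map to the same storage key, so repeated tool results are not stored twice.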
| Symbol | Kind | Purpose |
|---|---|---|
| redact_args(payload, *, key_pattern=SENSITIVE_KEY_RE) | function | Deep-redact dicts/lists/strings |
| redact_text(value, *, key_pattern=SENSITIVE_KEY_RE) | function | Scrub key=value patterns + presigned URLs from free-form text |
| redact_url_query(url) | function | Redact sensitive values in URL query strings |
| summarize_payload(payload, *, max_bytes=2048, already_redacted=False) | function | Wire-safe truncation; returns (summary, overflow_body) |
| output_digest(body) | function | 16-hex SHA-256 prefix: content-addressable overflow reference |
| extend_key_pattern(extra_terms) | function | Open/Closed denylist extension without module-state mutation |
| OutboundErrorRedactor(pattern) | class | Mask caller-supplied internal terms in error/log payloads |
| OutboundErrorRedactor.from_terms(terms) | classmethod | Compile a term list into a redactor |
| SENSITIVE_KEY_RE, SENSITIVE_KV_TEXT_RE, PRESIGNED_QUERY_RE | regex | Public patterns for direct use |
| DEFAULT_KEY_TERMS | tuple[str, ...] | Raw fragments backing SENSITIVE_KEY_RE |
| MASK | str | The redaction placeholder ("***") |
| MAX_SUMMARY_BYTES | int | Default cap for summarize_payload (2048) |
The default denylist covers password / token / secret / bearer / cookie / session families. To add project-specific field names without monkey-patching:
from redactkit import extend_key_pattern, redact_args
my_pattern = extend_key_pattern([r"vendor_passcode", r"internal_id"])
redact_args({"vendor_passcode": "x", "user": "alice"}, key_pattern=my_pattern)
# → {"vendor_passcode": "***", "user": "alice"}

extend_key_pattern returns a new compiled regex; SENSITIVE_KEY_RE is never mutated (Open/Closed principle).
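
One way to get that non-mutating behaviour is to keep the base terms in an immutable tuple and compile a fresh pattern on every extension. The sketch below assumes this design; SKETCH_BASE_TERMS, SKETCH_BASE_RE, and sketch_extend are hypothetical names, and the base terms are not redactkit's real DEFAULT_KEY_TERMS.

```python
import re
from collections.abc import Iterable

# Hypothetical base terms for this sketch; redactkit's DEFAULT_KEY_TERMS may differ.
SKETCH_BASE_TERMS: tuple[str, ...] = (r"password", r"secret", r"token")
SKETCH_BASE_RE = re.compile("|".join(SKETCH_BASE_TERMS), re.IGNORECASE)


def sketch_extend(extra_terms: Iterable[str]) -> re.Pattern[str]:
    """Compile a new pattern from the base terms plus extra_terms."""
    terms = (*SKETCH_BASE_TERMS, *extra_terms)
    # Each term gets its own non-capturing group so alternation inside a
    # caller-supplied term cannot leak into its neighbours.
    return re.compile("|".join(f"(?:{term})" for term in terms), re.IGNORECASE)


extended = sketch_extend([r"vendor_passcode"])
```

Since the base pattern is never rebound or rebuilt, two callers with different extensions cannot affect each other's redaction rules.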
Does it handle free-form text PII (names, addresses)?
No. Use Presidio or scrubadub for that. redactkit's strength is structured data and known-schema secrets.
Does it handle nested dicts and lists?
Yes. redact_args walks recursively. Tuples preserve their type.
Can I add my own sensitive key names?
Yes — use extend_key_pattern([r"my_term"]) and pass the result via the
key_pattern= argument. No module-level state is mutated.
Is it thread-safe?
Yes. All redaction functions are pure (no global mutation, no I/O) and the module-level regexes are immutable compiled patterns.
Is redact_args non-mutating?
Yes. It always returns a new object — the input dict / list / tuple is never modified in place.
Will redaction slow down my hot path?
The dominant cost is the regex .search per dict key. For typical agent
payloads (< 100 keys, < 10 KB serialized) the overhead is sub-millisecond.
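
To check the per-key cost on your own hardware, a timeit loop over a representative key set is usually enough. The sketch below times only the regex search over 100 keys using a hypothetical three-term pattern; it does not measure redactkit itself or the cost of copying the payload, so treat the number as a lower bound.

```python
import re
import timeit

# Hypothetical pattern and key set for this sketch, not redactkit's defaults.
SKETCH_KEY_RE = re.compile(r"password|secret|token", re.IGNORECASE)
keys = [f"field_{i}" for i in range(99)] + ["api_token"]


def scan_keys() -> int:
    """Count how many keys the denylist pattern flags."""
    return sum(1 for key in keys if SKETCH_KEY_RE.search(key))


runs = 1000
# timeit returns total seconds for all runs; divide for the per-scan cost.
seconds_per_scan = timeit.timeit(scan_keys, number=runs) / runs
print(f"{seconds_per_scan * 1e6:.1f} µs per 100-key scan")
```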
Issues and PRs welcome. See CONTRIBUTING.md for the dev setup, what we do and don't accept, and the PR checklist. Conduct expectations are in CODE_OF_CONDUCT.md.
Found a redaction bypass or ReDoS pattern? Please don't open a public issue. See SECURITY.md for the private-advisory process and our 90-day coordinated disclosure window.
Used in production by Convilyn. Open a PR adding your project here once you've shipped redactkit to prod.
MIT.