redactkit

Production-hardened Python redaction for structured logs, LLM agent payloads, and AWS-signed URLs. Zero dependencies — pure stdlib. Built for dict[str, Any], not for free-form prose.

Status: Beta (0.1.x) · Pre-1.0 — the API may evolve based on early-user feedback before v1.0. Pin to redactkit~=0.1.1 if you want patch-level updates only.

from redactkit import redact_args

redact_args({"password": "hunter2", "user": "alice"})
# → {"password": "***", "user": "alice"}

Install

pip install redactkit

Requires Python ≥ 3.10. No runtime dependencies.

Why redactkit?

  1. Structured-data first. Recursive dict / list / tuple traversal out of the box. Drop into any code that already speaks Mapping[str, Any].
  2. AWS SigV4 redaction. Scrubs X-Amz-Signature, X-Amz-Credential, X-Amz-Security-Token from presigned URLs — a real production gap most redaction libraries ignore.
  3. Overflow splitting. summarize_payload returns a short wire-safe summary plus an optional full body, so a 5 MB tool result doesn't blow your log budget.
  4. Comprehensive + extensible denylist. Passwords, tokens, JWT, OAuth, bearer, credentials, cookies, sessions covered out of the box. Add your own via extend_key_pattern — no module-state mutation, no monkey-patching.
  5. Zero deps, production provenance. Pure stdlib (Python ≥ 3.10). Extracted from the Convilyn agent platform — used in LLM agent middleware, supervisor handoffs, event emission, and HTTP response redaction.

When to pick redactkit vs. alternatives

| Tool | Approach | Best at | Deps | Pick when… |
| --- | --- | --- | --- | --- |
| Microsoft Presidio | ML-based NER (spaCy / transformers) | Free-form text PII (names, addresses) | Heavy (~1 GB models) | Document / chat PII detection in regulated industries |
| scrubadub | NLP rules + named recognizers | Free-form text (emails, phones, names) | nltk, textblob | Scrubbing user-generated prose |
| Hand-rolled logging.Filter | Custom filter per team | Logging-specific | None | Reinventing the wheel; AWS SigV4 never covered |
| redactkit | Key denylist + AWS SigV4 + overflow splitting | Structured data: dicts, JSON, kwargs, OTel attrs, request bodies | None | LLM agent logs, API request/response logs, anywhere dict[str, Any] is the unit |

Non-goals. redactkit does not do free-form NLP PII detection, cryptographic anonymization, or database-column encryption. Reach for Presidio or scrubadub for those.

Three killer examples

1. Structured dict redaction

from redactkit import redact_args

payload = {
    "account": {"bearer_token": "xyz", "user": "alice"},
    "items": [{"secret": "s"}, {"name": "plain"}],
}
redact_args(payload)
# → {
#     "account": {"bearer_token": "***", "user": "alice"},
#     "items": [{"secret": "***"}, {"name": "plain"}],
# }

Case-insensitive, recursive, and non-mutating. Covers nested dicts, list-of-dicts, and tuples (preserving type).
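
To make the traversal concrete, here is a minimal stdlib-only sketch of recursive key-denylist redaction. It illustrates the technique under stated assumptions and is not redactkit's actual implementation: `sketch_redact`, `_SKETCH_KEY_RE`, and the three-term denylist are hypothetical names chosen for this example, and the real denylist is broader.

```python
import re
from typing import Any

# Hypothetical, deliberately short denylist for illustration only.
_SKETCH_KEY_RE = re.compile(r"password|secret|token", re.IGNORECASE)
_SKETCH_MASK = "***"


def sketch_redact(value: Any) -> Any:
    """Return a redacted copy of value; the input is never modified."""
    if isinstance(value, dict):
        return {
            key: (
                _SKETCH_MASK
                if isinstance(key, str) and _SKETCH_KEY_RE.search(key)
                else sketch_redact(item)
            )
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [sketch_redact(item) for item in value]
    if isinstance(value, tuple):
        # Rebuild as a tuple so the container type is preserved.
        return tuple(sketch_redact(item) for item in value)
    return value


original = {
    "account": {"Bearer_Token": "xyz", "user": "alice"},
    "pair": ({"secret": "s"}, 1),
}
redacted = sketch_redact(original)
```

Because every container is rebuilt rather than edited, `original` still holds its unmasked values after the call, which is the non-mutating behaviour described above.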

2. AWS presigned URL scrubbing

from redactkit import redact_args

redact_args({
    "url": "https://s3.example.com/x?X-Amz-Signature=abcdef123&foo=1",
})
# → {"url": "https://s3.example.com/x?X-Amz-Signature=***&foo=1"}

Strings inside payloads are scanned for AWS Signature V4 query fragments, and the matching values are masked in the returned copy. Most logging filters miss this; redactkit doesn't.
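
As a rough illustration of how this kind of scrubbing can work, the sketch below masks the values of the three SigV4 query parameters named earlier with a single regex. The pattern and the `sketch_scrub_sigv4` helper are assumptions made for this example; redactkit's own PRESIGNED_QUERY_RE may be written differently.

```python
import re

# Hypothetical pattern for illustration; not taken from redactkit's source.
_SKETCH_SIGV4_RE = re.compile(
    r"(X-Amz-(?:Signature|Credential|Security-Token)=)[^&\s#]+",
    re.IGNORECASE,
)


def sketch_scrub_sigv4(text: str) -> str:
    """Mask SigV4 query parameter values while keeping the parameter names."""
    return _SKETCH_SIGV4_RE.sub(r"\g<1>***", text)


url = (
    "https://s3.example.com/x"
    "?X-Amz-Credential=EXAMPLEKEY&X-Amz-Signature=abcdef123&foo=1"
)
scrubbed = sketch_scrub_sigv4(url)
```

Keeping the parameter name and masking only the value leaves the log line readable: you can still see that the URL was presigned without exposing anything replayable.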

3. Overflow splitting for big payloads

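from redactkit import summarize_payload, output_digest

# `span` (an OpenTelemetry-style span) and `s3` (a boto3-style S3 client)
# are assumed to be created elsewhere in your application.
big_tool_result = {"items": [...]}  # 5 MB
summary, overflow = summarize_payload(big_tool_result, max_bytes=2048)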

# Attach the short summary to your span / log event:
span.set_attribute("tool.output_summary", summary)

# Persist the full body to S3 if it overflowed:
if overflow is not None:
    digest = output_digest(overflow)
    s3.put_object(Bucket="logs", Key=f"overflow/{digest}", Body=overflow)
    span.set_attribute("tool.output_ref", f"s3://logs/overflow/{digest}")

UTF-8 boundary safe — no mojibake even if the cut lands mid-character.
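
For readers curious what boundary-safe truncation and a short content digest can look like, here is a stdlib-only sketch. The helper names `sketch_truncate_utf8` and `sketch_digest` are hypothetical, the string input type is an assumption, and redactkit's internals may take a different approach.

```python
import hashlib


def sketch_truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops the incomplete trailing byte sequence, if any.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def sketch_digest(body: str) -> str:
    """Return the first 16 hex characters of the SHA-256 of body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()[:16]


# "é" takes two bytes in UTF-8, so a 2-byte budget cannot fit "hé".
summary = sketch_truncate_utf8("héllo", 2)
reference = sketch_digest("héllo")
```

Slicing the encoded bytes and decoding with `errors="ignore"` means the result may be a byte or two shorter than the budget, but it is always valid UTF-8.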

Public API

| Symbol | Kind | Purpose |
| --- | --- | --- |
| redact_args(payload, *, key_pattern=SENSITIVE_KEY_RE) | function | Deep-redact dicts/lists/strings |
| redact_text(value, *, key_pattern=SENSITIVE_KEY_RE) | function | Scrub key=value patterns + presigned URLs from free-form text |
| redact_url_query(url) | function | Redact sensitive values in URL query strings |
| summarize_payload(payload, *, max_bytes=2048, already_redacted=False) | function | Wire-safe truncation; returns (summary, overflow_body) |
| output_digest(body) | function | 16-hex SHA-256 prefix; content-addressable overflow reference |
| extend_key_pattern(extra_terms) | function | Open/Closed denylist extension without module-state mutation |
| OutboundErrorRedactor(pattern) | class | Mask caller-supplied internal terms in error/log payloads |
| OutboundErrorRedactor.from_terms(terms) | classmethod | Compile a term list into a redactor |
| SENSITIVE_KEY_RE, SENSITIVE_KV_TEXT_RE, PRESIGNED_QUERY_RE | regex | Public patterns for direct use |
| DEFAULT_KEY_TERMS | tuple[str, ...] | Raw fragments backing SENSITIVE_KEY_RE |
| MASK | str | The redaction placeholder ("***") |
| MAX_SUMMARY_BYTES | int | Default cap for summarize_payload (2048) |

Extending the denylist

The default denylist covers password / token / secret / bearer / cookie / session families. To add project-specific field names without monkey-patching:

from redactkit import extend_key_pattern, redact_args

my_pattern = extend_key_pattern([r"vendor_passcode", r"internal_id"])
redact_args({"vendor_passcode": "x", "user": "alice"}, key_pattern=my_pattern)
# → {"vendor_passcode": "***", "user": "alice"}

extend_key_pattern returns a new compiled regex; SENSITIVE_KEY_RE is never mutated (Open/Closed principle).
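
The "new pattern, no mutation" idea can be sketched with the stdlib alone. The code below is an assumption about how such an extension could be built, not a copy of redactkit's code; `SKETCH_BASE_TERMS` and `sketch_extend` are hypothetical stand-ins for DEFAULT_KEY_TERMS and extend_key_pattern.

```python
import re

# Hypothetical base terms; the real default denylist is more comprehensive.
SKETCH_BASE_TERMS: tuple[str, ...] = ("password", "secret", "token")


def sketch_extend(extra_terms: list[str]) -> re.Pattern[str]:
    """Compile a new case-insensitive pattern from the base terms plus extras."""
    # Concatenation builds a fresh tuple, so the base terms stay untouched.
    all_terms = SKETCH_BASE_TERMS + tuple(extra_terms)
    return re.compile("|".join(f"(?:{term})" for term in all_terms), re.IGNORECASE)


base_pattern = sketch_extend([])
custom_pattern = sketch_extend([r"vendor_passcode"])
```

Each caller gets its own compiled pattern, so two parts of an application can use different denylists without affecting each other.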

FAQ

Does it handle free-form text PII (names, addresses)?

No. Use Presidio or scrubadub for that. redactkit's strength is structured data and known-schema secrets.

Does it handle nested dicts and lists?

Yes. redact_args walks recursively. Tuples preserve their type.

Can I add my own sensitive key names?

Yes — use extend_key_pattern([r"my_term"]) and pass the result via the key_pattern= argument. No module-level state is mutated.

Is it thread-safe?

Yes. All redaction functions are pure (no global mutation, no I/O) and the module-level regexes are immutable compiled patterns.

Is redact_args non-mutating?

Yes. It always returns a new object — the input dict / list / tuple is never modified in place.

Will redaction slow down my hot path?

The dominant cost is the regex .search per dict key. For typical agent payloads (< 100 keys, < 10 KB serialized) the overhead is sub-millisecond.
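
If you want to check this on your own hardware, a small `timeit` harness is enough. The sketch below times a stand-in that performs one regex search per key over a 100-key dict; `sketch_scan` and its three-term pattern are hypothetical, so swap in your real redaction call and a representative payload to get meaningful numbers.

```python
import re
import timeit

# Stand-in for the per-key check; not redactkit's actual pattern.
_SKETCH_KEY_RE = re.compile(r"password|secret|token", re.IGNORECASE)
payload = {f"field_{i}": "value" for i in range(100)}


def sketch_scan(data: dict[str, str]) -> int:
    """Run one regex search per key and count the matches."""
    return sum(1 for key in data if _SKETCH_KEY_RE.search(key))


runs = 1_000
seconds_per_call = timeit.timeit(lambda: sketch_scan(payload), number=runs) / runs
print(f"{seconds_per_call * 1e6:.1f} microseconds per call")
```

Results vary with interpreter version, CPU, and payload shape, so treat the printed figure as a local measurement rather than a benchmark.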

Contributing

Issues and PRs welcome. See CONTRIBUTING.md for the dev setup, what we do and don't accept, and the PR checklist. Conduct expectations are in CODE_OF_CONDUCT.md.

Security

Found a redaction bypass or ReDoS pattern? Please don't open a public issue. See SECURITY.md for the private-advisory process and our 90-day coordinated disclosure window.

Production users

Used in production by Convilyn. Open a PR adding your project here once you've shipped redactkit to prod.

License

MIT.
