feat(content-guards)!: generalize no-real-ips → sensitive-content-guard by JacobPEvans · Pull Request #319 · JacobPEvans/claude-code-plugins

JacobPEvans · 2026-05-24T02:11:31Z

Summary

Generalizes the IPv4-only no-real-ips hook into a 7-detector
sensitive-content-guard covering the full org secrets-policy.md
attack surface. Each detector has its own allowlist; the first-block /
second-allow UX is preserved per-(file, detector, value) so a retry
acknowledges only the specific category.

Detectors

ipv4 — IPv4 outside 192.168.0.0/24, loopback, 0.0.0.0,
broadcast (255.255.255.x), link-local metadata (169.254.169.254).
Skips rev: vX.Y.Z version pins.
ipv6 — outside ::, ::1, fe80:: (link-local),
fc00::/7 (ULA), 2001:db8:: (RFC 3849 doc prefix), ff00::
(multicast). Skips cas-sha256:/sha256: hash lines.
email — real addresses outside noreply@github.com,
*@users.noreply.github.com, *@example.{com,org,net,local},
*@test, *@localhost, <user@host> placeholder shapes.
absolute_user_path — hard-coded /Users/<name>/ or
/home/<name>/ outside ${USER}, $USER, <user> placeholders.
private_key_header — -----BEGIN ... PRIVATE KEY-----;
always blocked.
aws_account_id — bare 12-digit numbers on lines mentioning
account_id, arn:aws:, aws_account_id, or :account:. Allows
AWS's documented 123456789012 placeholder.
real_domain — only flags tokens whose TLD is in a focused
~29-entry REAL_TLDS allowlist of popular public TLDs (com,
net, org, io, ai, dev, app, co, cloud, gov, edu,
mil, info, biz, me, tv, fm, ly, us, uk, de,
jp, ca, au, fr, cn, eu, tech, xyz, online, sh).
Anything outside that set (filenames like foo.py, version strings)
is treated as not-a-domain. Also allows *.example.*, *.test,
*.localhost, *.invalid, *.local, and a short explicit
allowlist (github.com, api.github.com,
raw.githubusercontent.com, docs.jacobpevans.com, runs-on.com,
healthchecks.io). Skips pre-commit repo:, container image:,
and markdown link-reference lines.
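
As a rough illustration of how the ipv4 entry above could work, here is a minimal sketch. It is not the code in validate-sensitive-content.py. `_OCTET` and `IP_PATTERN` are names mentioned in this PR's commits, but their bodies here are guesses; `VERSION_PIN`, `ALLOWED_NETWORKS`, `ALLOWED_EXACT`, `ipv4_allowed`, and `find_blocked_ipv4` are hypothetical, and treating "loopback" as 127.0.0.0/8 is an assumption.

```python
import ipaddress
import re

# Strict 0-255 octet, so 999.999.999.999 never matches as an IP at all.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IP_PATTERN = re.compile(rf"(?<![\d.]){_OCTET}(?:\.{_OCTET}){{3}}(?![\d.])")

# Hypothetical skip for pre-commit version pins such as "rev: v0.10.0.1".
VERSION_PIN = re.compile(r"^\s*rev:\s*v?\d")

# Allowlist from the PR description; the loopback range is an assumption.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.168.0.0/24"),    # sanctioned sample CIDR
    ipaddress.ip_network("127.0.0.0/8"),       # loopback (assumed range)
    ipaddress.ip_network("255.255.255.0/24"),  # broadcast (255.255.255.x)
]
ALLOWED_EXACT = {"0.0.0.0", "169.254.169.254"}


def ipv4_allowed(value: str) -> bool:
    """Return True when the address falls inside the allowlist."""
    if value in ALLOWED_EXACT:
        return True
    addr = ipaddress.ip_address(value)
    return any(addr in net for net in ALLOWED_NETWORKS)


def find_blocked_ipv4(content: str) -> list:
    """Collect every non-allowlisted IPv4 literal, skipping version-pin lines."""
    blocked = []
    for line in content.splitlines():
        if VERSION_PIN.match(line):
            continue
        for match in IP_PATTERN.finditer(line):
            if not ipv4_allowed(match.group(0)):
                blocked.append(match.group(0))
    return blocked
```

With these assumptions, a documentation-range address such as 203.0.113.10 is reported, while 192.168.0.7, 127.0.0.1, and the `rev:` line are not.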

State key is (file, detector, value) so acknowledging one IPv4 does
NOT pre-allow an unrelated email.
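
The first-block / second-allow behavior keyed on (file, detector, value) can be sketched as below. This is a simplified model under stated assumptions, not the hook's implementation: `state_key`, `check`, and the in-memory dict are hypothetical, and whether the real hook consumes an acknowledgment after allowing it is not stated in this PR (this sketch leaves it in place until the TTL expires).

```python
import time
from typing import Dict, Optional

TTL_SECONDS = 300  # the 5-minute window described in this PR


def state_key(file_path: str, detector: str, value: str) -> str:
    # Hypothetical encoding; the real state-file layout is not shown in the PR.
    return "\x1f".join((file_path, detector, value))


def check(state: Dict[str, float], file_path: str, detector: str,
          value: str, now: Optional[float] = None) -> str:
    """Return "block" on first sight of a key and "allow" on a retry within the TTL."""
    now = time.time() if now is None else now
    # Prune-on-read: drop entries older than the TTL.
    for key in [k for k, ts in state.items() if now - ts > TTL_SECONDS]:
        del state[key]
    key = state_key(file_path, detector, value)
    if key in state:
        return "allow"  # the retry acts as the agent's acknowledgment
    state[key] = now
    return "block"
```

Because the detector name is part of the key, acknowledging an IPv4 in one file leaves an email in the same file, or the same IPv4 in another file, still blocked on first write.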

Test plan

Bats: 55/55 pass across
tests/content-guards/sensitive-content/sensitive-content.bats (IPv4
regression + state machine, 25 cases) and
tests/content-guards/sensitive-content/detectors.bats (per-detector +
cross-detector isolation, 30 cases).
pre-commit run passes (JSON, markdown, EOF newlines, large file
cap).
Verified _domain_allowed against 14 representative cases after
the REAL_TLDS flip (filenames like foo.py/foo.tsx/foo.md pass;
real .io/.com/.ai/.dev/.gov/.uk block; allowlist exacts
preserved).

Breaking changes

validate-no-real-ips.py → validate-sensitive-content.py
State file no-real-ips-state.json → sensitive-content-state.json
Env var NO_REAL_IPS_STATE_FILE → SENSITIVE_CONTENT_STATE_FILE

5-min TTL on the old state file makes the rename self-healing; no
migration code needed.

False-positive notes

The real_domain detector is the highest false-positive risk. The
focused 29-entry REAL_TLDS allowlist is the main mitigation —
anything not ending in a TLD we care about is left alone. Line-level
skips for repo:/image:/markdown link-references handle common
documentation patterns. If churn shows up in practice, the
first-block / second-allow UX gives the agent a clean
acknowledge-and-proceed path.
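
To make the REAL_TLDS gate concrete, here is a minimal sketch of the decision order described above. It is an approximation, not the body of `_domain_allowed`, which this page does not show. `domain_allowed` is a hypothetical name, the TLD set is copied from the list in the Detectors section, and reading `*.example.*` as "any name with an `example` label before the TLD" is an assumption.

```python
# TLDs copied from the PR description; anything else is not treated as a domain.
REAL_TLDS = {
    "com", "net", "org", "io", "ai", "dev", "app", "co", "cloud", "gov", "edu",
    "mil", "info", "biz", "me", "tv", "fm", "ly", "us", "uk", "de",
    "jp", "ca", "au", "fr", "cn", "eu", "tech", "xyz", "online", "sh",
}

# Explicit allowlist from the PR description.
ALLOWED_EXACT = {
    "github.com", "api.github.com", "raw.githubusercontent.com",
    "docs.jacobpevans.com", "runs-on.com", "healthchecks.io",
}


def domain_allowed(token: str) -> bool:
    """Return True when the token should not be flagged by real_domain."""
    name = token.lower().rstrip(".")
    labels = name.split(".")
    # Filenames (foo.py), version strings, and reserved suffixes such as
    # .test or .local fall through here because their last label is not a real TLD.
    if len(labels) < 2 or labels[-1] not in REAL_TLDS:
        return True
    if name in ALLOWED_EXACT:
        return True
    # Assumed reading of *.example.*: an "example" label before the TLD.
    return "example" in labels[:-1]
```

Checking the TLD first means the allowlist only has to cover names that already look like public domains, which is what keeps the set small enough to audit.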

Related: JacobPEvans/orbstack-kubernetes#234

Blocks IPv4 literals in Write content and Edit new_string when they fall outside the allowlist: 192.168.0.0/24 (sanctioned sample CIDR), loopback, 0.0.0.0, broadcast, and 169.254.169.254 (cloud metadata). Skips lines matching pre-commit version-pin shape ("rev: v0.10.0.1").

First-block / second-allow flow: the first attempt to write a non-allowed IP into a given file blocks with a clear warning explaining the risk and the allowed alternatives. A retry within 5 minutes (same file + same IP) is treated as the agent's acknowledgment and is allowed through, for legitimate uses like private repos, .gitignored files, or scratch buffers.

Per-(file, IP) tracking: a new IP on the second write still blocks; the same IP in a different file blocks anew.

State lives in $XDG_CACHE_HOME/content-guards/no-real-ips-state.json with a 300s TTL and prune-on-read. Wired into content-guards/hooks/hooks.json alongside validate-token-limits under the existing PreToolUse Write|Edit matcher.

Motivated by a real leak in JacobPEvans/orbstack-kubernetes PR #234, where an agent iterating on a failing test pasted the live Splunk IP (observed in Cribl Stream's outputs.yml output) verbatim into two new test cases. The repo's existing pre-commit no-real-ips hook missed it because it only scanned *.yaml/*.sh under k8s/, scripts/, docker/. This PreToolUse hook catches the same class of leak at write time, before it ever lands on disk, and covers every Claude-managed repo automatically.

Coverage: 16 bats tests (tool filtering, allowlist, version-pin skip, first-block / second-allow flow, per-file tracking, multi-IP partial acknowledgment).

Assisted-by: Claude <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces the no-real-ips content guard, which prevents the accidental commitment of live IPv4 addresses by blocking them on the first attempt and requiring a retry within five minutes for acknowledgment. The implementation includes a Python validation script, hook registration, and a comprehensive BATS test suite. Review feedback focuses on hardening the implementation by refining the IPv4 regex to strictly match the 0-255 octet range, ensuring atomic state file writes to prevent corruption during concurrent execution, and normalizing file paths to absolute paths for consistent acknowledgment tracking.

…ize paths

Addresses gemini-code-assist review feedback on PR #319.

- IP_PATTERN and ALLOWED_PATTERNS now use a strict 0-255 octet sub-pattern (_OCTET) so values like 999.999.999.999 no longer match as IPs at all. Reduces false positives.
- save_state writes to a sibling .tmp file and os.replace's into place. Atomic against concurrent hook invocations during parallel tool execution.
- file_path is normalized via os.path.realpath (stronger than the suggested os.path.abspath; it also resolves symlinks). On macOS the /var -> /private/var symlink would otherwise cause the same file to be tracked under two state keys depending on how the agent spelled the path. realpath collapses both spellings to the same canonical path.

Adds 4 bats tests (TC6a/b/c, TC7) covering the new behaviors.

Assisted-by: Claude <noreply@anthropic.com>
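
The atomic write and path normalization described in this commit can be sketched roughly as follows. `save_state` is named in the commit message, but this body is only an assumed shape; `canonical_path` and the `state_file` parameter are hypothetical.

```python
import json
import os


def save_state(state: dict, state_file: str) -> None:
    """Write the state to a sibling .tmp file, then atomically swap it into place."""
    os.makedirs(os.path.dirname(state_file) or ".", exist_ok=True)
    tmp_path = state_file + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    # os.replace is an atomic rename on the same filesystem, so a concurrent
    # reader sees either the old file or the new one, never a partial write.
    os.replace(tmp_path, state_file)


def canonical_path(file_path: str) -> str:
    """Resolve symlinks so two spellings of one file share a single state key."""
    return os.path.realpath(file_path)
```

One design note: with a fixed `.tmp` name, two writers can still overwrite each other's temporary file before the rename; a per-process temporary name would close that gap, at the cost of a little cleanup logic.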

@example

Renames the IPv4-only hook to a general sensitive-content guard covering 7 detector categories with clean regexes and low false-positive rates. Each detector has its own allowlist and shares the first-block / second-allow UX so legitimate uses (private repos, scratch files, .gitignored paths) can proceed on retry.

Detectors:

- ipv4: existing behavior preserved (192.168.0.0/24, loopback, 0.0.0.0, broadcast, link-local metadata)
- ipv6: outside ::, ::1, fe80::, fc00::/7, 2001:db8::, ff00::
- email: outside noreply@github.com, *.users.noreply.github.com, *@example.{com,org,net,local}, *@test, *@localhost, <placeholder@>
- absolute_user_path: hard-coded /Users/<name>/ or /home/<name>/ outside ${USER}/$USER/<user> placeholders
- private_key_header: always blocked
- aws_account_id: line-context-gated 12-digit numbers, allows AWS's documented 123456789012 sample
- real_domain: FQDN-shaped tokens outside *.example.*, *.test, *.localhost, *.invalid, *.local, and a short explicit allowlist (github.com, docs.jacobpevans.com, runs-on.com, healthchecks.io)

State key is (file, detector, value) so acknowledging one IPv4 does not pre-allow an unrelated email or domain.

Bats tests split into sensitive-content.bats (IPv4 regression: 25 cases) and detectors.bats (per-detector + isolation: 30 cases). All 55 tests pass.

BREAKING CHANGE: renames validate-no-real-ips.py to validate-sensitive-content.py, state file no-real-ips-state.json to sensitive-content-state.json, env var NO_REAL_IPS_STATE_FILE to SENSITIVE_CONTENT_STATE_FILE.

Assisted-by: Claude <noreply@anthropic.com>

The detector's is_allowed is always False (private keys never have a legitimate allowlist), so the argument is intentionally unused. Rename `_v` to `_` to match the Pyright convention for ignored args. Assisted-by: Claude <noreply@anthropic.com>

…list

Replace the 86-entry file-extension skip set with a focused ~29-TLD allowlist of popular real TLDs (com, net, org, io, ai, dev, app, co, cloud, gov, edu, mil, info, biz, me, tv, fm, ly, us, uk, de, jp, ca, au, fr, cn, eu, tech, xyz, online, sh). Only candidates whose TLD is in this set are even considered; everything else (filenames, version strings, anything ending in an unfamiliar suffix) is allowed by default. Lower false-positive risk and far easier to audit than enumerating every possible non-TLD suffix.

Verified domain logic against 14 representative cases (filename foo.py allowed, real .io/.ai/.dev blocked, allowlist exacts preserved).

Assisted-by: Claude <noreply@anthropic.com>

gemini-code-assist Bot reviewed May 24, 2026

Comment thread content-guards/scripts/validate-no-real-ips.py Outdated

JacobPEvans mentioned this pull request May 24, 2026

chore: audit cleanup pass (2026-05-22) JacobPEvans/orbstack-kubernetes#234

JacobPEvans added 2 commits May 23, 2026 22:19

JacobPEvans changed the title ~~feat(content-guards): add no-real-ips PreToolUse Write/Edit hook~~ feat(content-guards)!: generalize no-real-ips → sensitive-content-guard May 24, 2026

JacobPEvans added 2 commits May 24, 2026 13:06

JacobPEvans wants to merge 5 commits into main from feat/no-real-ips-guard
