fix(scan): honor .rafter.yml scan.exclude_paths on both engines (sable-yz0)#152
Merged
Conversation
…e-yz0)
Customer report: planted fake Stripe keys in three scan.exclude_paths
entries AND a non-excluded path; all three got flagged — exclude_paths
silently ignored. Two root causes:
1. BetterleaksScanner.scanDirectory() never received excludePaths.
The patterns engine got it, the betterleaks happy-path didn't —
and `auto` (the default when the binary is on disk) picks
betterleaks. Customer was running through the betterleaks path,
which is why nothing was excluded.
2. The patterns engine's walker only matched excludePaths entries
against single directory NAMES (`entry.name`). Customers writing
`components/common/Mermaid.tsx` (file path) or
`supabase/migrations/foo.sql` (multi-segment path) got no
filtering at all.
Fix: post-filter chokepoint after both engines (and after staged /
diff aggregation), using path-aware semantics. Both runtimes:
- Exact match: rel_path == pattern
- Directory prefix: rel_path starts with pattern + "/". Trailing "/"
on the pattern is normalized away (`scripts/` == `scripts`).
- Dir-name anywhere: any segment of rel_path equals pattern.
Preserves the historical RegexScanner walker behavior so existing
`node_modules` policies still work for nested copies.
- Glob: patterns containing `* ? [` run through minimatch (Node) /
fnmatch (Python), with auto-anchor for relative globs.
Customer's exact repro now passes on both engines: only safe/leaky.ts
fires, all three excluded paths filtered, exit 1.
Rafter agent review: PASS for merge, no critical/high. Two low-severity
ReDoS observations (minimatch / fnmatch pattern-length cap as
defense-in-depth) tracked in follow-up bead.
Tests: 5 Node + 12 Python assertions, all green. Broader regression
suites pass: Python test_agent_scan_history + test_agent_baseline (23
cases), Node baseline.test.ts (12 cases).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rome-1
added a commit
that referenced
this pull request
Jun 1, 2026
…sable-c1c) (#154) Backend triage (hq-4tcey, closed) confirmed rafter-backend reads policy at .rafter/config.yml (subdir + config.yml) with exclude_paths / custom_patterns flat at the top level, while the CLI canonical is .rafter.yml with scan.* nested. Customers writing either shape get honored by only one tool — the customer's .rafter.yml worked locally (after sable-yz0) and silently no-op'd on the remote cloud scan. Bilateral alignment plan: - Backend adds .rafter.yml fallback with scan.* schema compat (their preferred direction). - CLI (this PR) reads .rafter/config.yml indefinitely and accepts the backend flat shape alongside the canonical nested form. No deprecation — both shapes supported permanently. Two changes in policy-loader.ts / policy_loader.py: 1. findPolicyFile / find_policy_file walks all four candidates in precedence order: .rafter.yml > .rafter.yaml > .rafter/config.yml > .rafter/config.yaml. The canonical dotfile wins if both shapes exist (matching prior CLI behavior for users who already wrote .rafter.yml). 2. mapPolicy / _map_policy accepts top-level `exclude_paths` and `custom_patterns` (backend flat shape) and folds them into policy.scan.* — but only if the nested form wasn't already set. Nested form takes precedence on collision. Top-level compat keys are added to VALID_TOP_LEVEL_KEYS so they don't trigger "Unknown policy key" warnings on validate. Tests: 8 Node + 7 Python new assertions covering subdir-only discovery, extension variants, dotfile-wins precedence, top-level schema accepted on both file paths, nested wins on collision, no warnings for compat keys. Broader regression sweeps (25 Python + 16 Node) still pass. Paired with sable-yz0 (PR #152), this unblocks customers writing either CLI-shape or backend-shape policy. Once rafter-backend adds the .rafter.yml fallback (their side of the bilateral fix), both tools honor either file in either shape. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This was referenced Jun 1, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Customer planted fake Stripe keys in three
scan.exclude_pathsentries AND a non-excluded path → all three got flagged.exclude_pathswas silently dropped. Fix the two root causes in v0.8.3:BetterleaksScanner.scanDirectory()never receivedexcludePaths— only the patterns engine got it, andauto(the default) picks betterleaks when the binary is on disk.excludePathsentries against single directory NAMES (entry.name) — multi-segment paths likecomponents/common/Mermaid.tsxandsupabase/migrations/foo.sqlgot no filtering.Solution: a single
applyExcludePaths/_apply_exclude_pathschokepoint after both engines (and the--staged/--diffmodes), with path-aware semantics:scripts/orscriptsscriptsdir at root + everything under it; also any segment namedscriptsanywhere (preserves the historical dir-name-anywhere behavior so existingnode_modulespolicies still cover nested copies)components/common/Mermaid.tsx**/*.sql(any glob char)Customer's exact repro
Plant a fake Stripe key in each excluded path +
safe/leaky.ts. Runrafter secrets .on either engine. Before: all 4 flagged. After: onlysafe/leaky.ts. Verified locally on both--engine patternsand--engine auto(betterleaks).Tests
node/tests/scan-exclude-paths.test.ts(subprocess end-to-end)python/tests/test_scan_exclude_paths.py(helpers + chokepoint +_scan_directoryE2E)python/tests/test_agent_scan_history.py+test_agent_baseline.py(regression)node/tests/baseline.test.ts(regression)Security review
rafteragent review — PASS for merge, no critical/high findings.Two low-severity defense-in-depth observations tracked as the follow-up bead
sable-aem(P3): cap pattern length at 512 chars inpolicy-loaderto neutralize pathological minimatch/fnmatch input. Attacker model is local-only (need write access to.rafter.yml, at which point scanning is already neuterable), so impact is low and out of scope for this PR.Two info-level notes:
srcortmpdrop findings under any matching segment in the tree — documented and intentional (back-compat with the historicalRegexScannerwalker). Worth surfacing in user-facing docs in a follow-up.scanRootkeep the absolute path. Theoretical edge case — scanners produce in-root paths today;path.resolvecanonicalizes..before the prefix check, so traversal can't sneak underscanRoot.Tracking
sable-yz0(P0 bug, claimed by me)sable-s2t(P1) — PR-comment formatter shows hashedR-XXXXXrule IDs instead of scanner-native namessable-5vo(P1) — verify remote scanner (rafter run//api/static/scan) honors.rafter.yml scan.exclude_pathssable-aem(P3) — pattern-length capWhat's still customer-blocking after this lands
sable-5vocovers the verification path. Ifrafter rundoesn't honor.rafter.ymleither, the fallback israfter agent baselineand we'll document it.🤖 Generated with Claude Code