Skip to content

fix(scan): honor .rafter.yml scan.exclude_paths on both engines (sable-yz0)#152

Merged
Rome-1 merged 1 commit into
mainfrom
sable-yz0-exclude-paths-fix
Jun 1, 2026
Merged

fix(scan): honor .rafter.yml scan.exclude_paths on both engines (sable-yz0)#152
Rome-1 merged 1 commit into
mainfrom
sable-yz0-exclude-paths-fix

Conversation

@Rome-1
Copy link
Copy Markdown
Collaborator

@Rome-1 Rome-1 commented Jun 1, 2026

Summary

Customer planted fake Stripe keys in three scan.exclude_paths entries AND a non-excluded path → all three got flagged. exclude_paths was silently dropped. Fix the two root causes in v0.8.3:

  1. BetterleaksScanner.scanDirectory() never received excludePaths — only the patterns engine got it, and auto (the default) picks betterleaks when the binary is on disk.
  2. The patterns engine's walker only matched excludePaths entries against single directory NAMES (entry.name) — multi-segment paths like components/common/Mermaid.tsx and supabase/migrations/foo.sql got no filtering.

Solution: a single applyExcludePaths / _apply_exclude_paths chokepoint after both engines (and the --staged / --diff modes), with path-aware semantics:

Pattern form Matches
scripts/ or scripts the scripts dir at root + everything under it; also any segment named scripts anywhere (preserves the historical dir-name-anywhere behavior so existing node_modules policies still cover nested copies)
components/common/Mermaid.tsx exact file at that relative path
**/*.sql (any glob char) minimatch (Node) / fnmatch (Python), with auto-anchor for relative globs

Customer's exact repro

# .rafter.yml
scan:
  exclude_paths:
    - scripts/
    - components/common/Mermaid.tsx
    - supabase/migrations/20250215000000_resend_setup.sql

Plant a fake Stripe key in each excluded path + safe/leaky.ts. Run rafter secrets . on either engine. Before: all 4 flagged. After: only safe/leaky.ts. Verified locally on both --engine patterns and --engine auto (betterleaks).

Tests

Suite Cases Status
node/tests/scan-exclude-paths.test.ts (subprocess end-to-end) 5 Pass
python/tests/test_scan_exclude_paths.py (helpers + chokepoint + _scan_directory E2E) 12 Pass
python/tests/test_agent_scan_history.py + test_agent_baseline.py (regression) 23 Pass
node/tests/baseline.test.ts (regression) 12 Pass

Security review

rafter agent review — PASS for merge, no critical/high findings.

Two low-severity defense-in-depth observations tracked as the follow-up bead sable-aem (P3): cap pattern length at 512 chars in policy-loader to neutralize pathological minimatch/fnmatch input. Attacker model is local-only (need write access to .rafter.yml, at which point scanning is already neuterable), so impact is low and out of scope for this PR.

Two info-level notes:

  • Dir-name-anywhere is broad by design. Bare entries like src or tmp drop findings under any matching segment in the tree — documented and intentional (back-compat with the historical RegexScanner walker). Worth surfacing in user-facing docs in a follow-up.
  • Findings outside scanRoot keep the absolute path. Theoretical edge case — scanners produce in-root paths today; path.resolve canonicalizes .. before the prefix check, so traversal can't sneak under scanRoot.

Tracking

  • Bead: sable-yz0 (P0 bug, claimed by me)
  • Sibling beads from the same customer triage:
    • sable-s2t (P1) — PR-comment formatter shows hashed R-XXXXX rule IDs instead of scanner-native names
    • sable-5vo (P1) — verify remote scanner (rafter run / /api/static/scan) honors .rafter.yml scan.exclude_paths
  • Follow-up from rafter review: sable-aem (P3) — pattern-length cap

What's still customer-blocking after this lands

  • They need to upgrade past v0.7.9 (currently ~2 weeks behind master). Patching to v0.8.4 (which will include this fix) unblocks the local-scan case.
  • The remote-scan case still isn't tested; sable-5vo covers the verification path. If rafter run doesn't honor .rafter.yml either, the fallback is rafter agent baseline and we'll document it.

🤖 Generated with Claude Code

…e-yz0)

Customer report: planted fake Stripe keys in three scan.exclude_paths
entries AND a non-excluded path; all three got flagged — exclude_paths
silently ignored. Two root causes:

  1. BetterleaksScanner.scanDirectory() never received excludePaths.
     The patterns engine got it, the betterleaks happy-path didn't —
     and `auto` (the default when the binary is on disk) picks
     betterleaks. Customer was running through the betterleaks path,
     which is why nothing was excluded.
  2. The patterns engine's walker only matched excludePaths entries
     against single directory NAMES (`entry.name`). Customers writing
     `components/common/Mermaid.tsx` (file path) or
     `supabase/migrations/foo.sql` (multi-segment path) got no
     filtering at all.

Fix: post-filter chokepoint after both engines (and after staged /
diff aggregation), using path-aware semantics. Both runtimes:

  - Exact match: rel_path == pattern
  - Directory prefix: rel_path starts with pattern + "/". Trailing "/"
    on the pattern is normalized away (`scripts/` == `scripts`).
  - Dir-name anywhere: any segment of rel_path equals pattern.
    Preserves the historical RegexScanner walker behavior so existing
    `node_modules` policies still work for nested copies.
  - Glob: patterns containing `* ? [` run through minimatch (Node) /
    fnmatch (Python), with auto-anchor for relative globs.

Customer's exact repro now passes on both engines: only safe/leaky.ts
fires, all three excluded paths filtered, exit 1.

Rafter agent review: PASS for merge, no critical/high. Two low-severity
ReDoS observations (minimatch / fnmatch pattern-length cap as
defense-in-depth) tracked in follow-up bead.

Tests: 5 Node + 12 Python assertions, all green. Broader regression
suites pass: Python test_agent_scan_history + test_agent_baseline (23
cases), Node baseline.test.ts (12 cases).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rome-1 added a commit that referenced this pull request Jun 1, 2026
…sable-c1c) (#154)

Backend triage (hq-4tcey, closed) confirmed rafter-backend reads policy
at .rafter/config.yml (subdir + config.yml) with exclude_paths /
custom_patterns flat at the top level, while the CLI canonical is
.rafter.yml with scan.* nested. Customers writing either shape get
honored by only one tool — the customer's .rafter.yml worked locally
(after sable-yz0) and silently no-op'd on the remote cloud scan.

Bilateral alignment plan:
  - Backend adds .rafter.yml fallback with scan.* schema compat
    (their preferred direction).
  - CLI (this PR) reads .rafter/config.yml indefinitely and accepts
    the backend flat shape alongside the canonical nested form. No
    deprecation — both shapes supported permanently.

Two changes in policy-loader.ts / policy_loader.py:

1. findPolicyFile / find_policy_file walks all four candidates in
   precedence order: .rafter.yml > .rafter.yaml > .rafter/config.yml
   > .rafter/config.yaml. The canonical dotfile wins if both shapes
   exist (matching prior CLI behavior for users who already wrote
   .rafter.yml).

2. mapPolicy / _map_policy accepts top-level `exclude_paths` and
   `custom_patterns` (backend flat shape) and folds them into
   policy.scan.* — but only if the nested form wasn't already set.
   Nested form takes precedence on collision. Top-level compat keys
   are added to VALID_TOP_LEVEL_KEYS so they don't trigger
   "Unknown policy key" warnings on validate.

Tests: 8 Node + 7 Python new assertions covering subdir-only discovery,
extension variants, dotfile-wins precedence, top-level schema accepted
on both file paths, nested wins on collision, no warnings for compat
keys. Broader regression sweeps (25 Python + 16 Node) still pass.

Paired with sable-yz0 (PR #152), this unblocks customers writing
either CLI-shape or backend-shape policy. Once rafter-backend adds the
.rafter.yml fallback (their side of the bilateral fix), both tools
honor either file in either shape.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@Rome-1 Rome-1 merged commit 55f6fa4 into main Jun 1, 2026
@Rome-1 Rome-1 deleted the sable-yz0-exclude-paths-fix branch June 1, 2026 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant