fix(validation): audit findings — priority sort, modify audit trail, regex fail-open, budget double-spend by anirudhp26 · Pull Request #201 · PlawIO/veto

anirudhp26 · 2026-04-27T14:05:19Z

Audit follow-up — second of three PRs (after #200; see #196 for the original audit).

Findings closed

1. Validator `priority=0` silently sorted as 100 (Python)

`_sort_validators` (validator.py:305) and `normalize_validator` (config.py:160) both used `priority or 100`, which treats 0 as falsy and replaces it with the default of 100. A user setting priority 0 to mean "run me first, ahead of everything" got the opposite. Replaced with explicit `priority is not None` checks. (TS already used `??`, so this is Python-only.)
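A minimal sketch of the failure mode, not the SDK's actual code: `buggy_priority`, `fixed_priority`, and the validator dicts are hypothetical names used only for illustration. It assumes, as the description implies, that lower priority values run first and that 100 is the default.

```python
from typing import Optional

DEFAULT_PRIORITY = 100  # default value quoted in the PR description


def buggy_priority(priority: Optional[int]) -> int:
    # `or` falls back on ANY falsy value, so an explicit 0 becomes 100.
    return priority or DEFAULT_PRIORITY


def fixed_priority(priority: Optional[int]) -> int:
    # Only a missing value (None) falls back to the default.
    return priority if priority is not None else DEFAULT_PRIORITY


# Hypothetical validator records; the real SDK objects may look different.
validators = [
    {"name": "first", "priority": 0},
    {"name": "early", "priority": 5},
    {"name": "unset", "priority": None},
]

buggy_order = [
    v["name"] for v in sorted(validators, key=lambda v: buggy_priority(v["priority"]))
]
fixed_order = [
    v["name"] for v in sorted(validators, key=lambda v: fixed_priority(v["priority"]))
]
```

With the buggy key, the `priority=0` validator sorts alongside the unset one, behind the `priority=5` validator. With the fixed key it sorts first.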

2. Audit-trail mismatch on `decision: 'modify'`

When a validator returned `modify` with `modified_arguments`, the engine forwarded the modified context downstream and the tool ran with the new args, but `HistoryTracker.record(...)` was called with the original `call.arguments`. Net effect: incident review showed "user said X" while the tool actually executed "Y". Fixed in both SDKs; history now reflects the args the tool actually saw.
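The intended behaviour can be sketched as follows. This is an illustration under assumptions, not the engine's real code: `ValidationResult`, `HistorySketch`, `run_call`, and `strip_whitespace` are hypothetical stand-ins, and only the `modify` path is modelled.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Args = Dict[str, Any]


@dataclass
class ValidationResult:
    decision: str  # e.g. "allow" or "modify"; other decisions are out of scope here
    modified_arguments: Optional[Args] = None


@dataclass
class HistorySketch:
    """Hypothetical stand-in for the SDK's history tracker."""
    entries: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, tool_name: str, arguments: Args) -> None:
        self.entries.append({"tool": tool_name, "arguments": dict(arguments)})


def run_call(
    tool_name: str,
    arguments: Args,
    validator: Callable[[Args], ValidationResult],
    history: HistorySketch,
) -> Args:
    result = validator(arguments)
    effective = arguments
    if result.decision == "modify" and result.modified_arguments is not None:
        effective = result.modified_arguments
    # Record the arguments the tool will actually see, not the originals.
    history.record(tool_name, effective)
    return effective


def strip_whitespace(arguments: Args) -> ValidationResult:
    # Example validator: rewrites string arguments with whitespace stripped.
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in arguments.items()}
    return ValidationResult(decision="modify", modified_arguments=cleaned)


history = HistorySketch()
executed = run_call("shell", {"cmd": "  ls -la  "}, strip_whitespace, history)
```

The key point is that `record` receives `effective`, so the audit trail and the executed call cannot diverge.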

3. Silent fail-open on rejected `matches` regex

When `is_safe_pattern` rejected a pattern (length cap, ReDoS heuristic, etc.), the condition silently evaluated to false. A block rule meant to "stop dangerous commands" therefore never matched, and those commands sailed past. Added a load-time pass, `_warn_about_unsafe_rule_patterns` (Python) and `warnAboutUnsafeRulePatterns` (TS), that emits a loud error-level log per offender so misconfigured rules surface in startup output.

4. Budget double-refund silently zeroed `spent`

Two refunds of the same reservation dropped tracked spend to 0; subsequent calls then bypassed the limit. Added a token-based API on BudgetTracker:

const reservation = tracker.reserveCall(toolName, args);  // BudgetReservation | null
// …
tracker.releaseReservation(reservation);  // idempotent — repeats are no-ops

The interceptor and browser veto switched to the token API. The legacy reserve(toolName, args): number / refund(amount: number) pair stays for backward compatibility with downstream callers.
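To illustrate why a token makes release idempotent, here is a small Python sketch of the same idea. It is not the `BudgetTracker` implementation: `BudgetSketch`, `Reservation`, the per-call `cost` parameter, and the choice to return `None` when the limit would be exceeded are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reservation:
    amount: int
    released: bool = False  # flipped once; later releases become no-ops


class BudgetSketch:
    """Hypothetical tracker; the real API is reserveCall / releaseReservation."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.spent = 0

    def reserve_call(self, cost: int) -> Optional[Reservation]:
        # Assumption: a reservation that would exceed the limit yields None.
        if self.spent + cost > self.limit:
            return None
        self.spent += cost
        return Reservation(amount=cost)

    def release_reservation(self, reservation: Optional[Reservation]) -> None:
        # Tolerate None and repeated releases of the same token.
        if reservation is None or reservation.released:
            return
        reservation.released = True
        self.spent -= reservation.amount


tracker = BudgetSketch(limit=100)
first = tracker.reserve_call(30)
second = tracker.reserve_call(30)
spent_after_reserves = tracker.spent

for _ in range(3):
    tracker.release_reservation(first)  # only the first release has an effect
spent_after_triple_release = tracker.spent

tracker.release_reservation(None)  # no-op, no exception
tracker.release_reservation(second)
spent_at_end = tracker.spent
```

Because the amount lives on the token rather than being passed in by the caller, a repeated release cannot subtract the same amount twice, which is the property the amount-based `refund(amount)` call cannot guarantee.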

5. `require_approval` permanently held the budget reservation

The interceptor only released on decision === 'deny', so an approval that was later rejected left budget committed forever. Now releases on both deny and require_approval — the call doesn't actually run; on retry, the caller reserves again.

Tests

packages/sdk-python/tests/test_validation_hardening.py — 5 cases (priority=0 ordering, modify→history, unsafe-regex error log, safe-regex no-false-positive, helper-direct).
packages/sdk/tests/core/validation-hardening.test.ts — 3 cases (token idempotency, null-reservation tolerance, legacy API preserved).
Full suites green: TS 1385/1385, Python 333/333.

Behavioural notes for downstream

BudgetTracker.reserveCall / releaseReservation are additive APIs. Existing reserve / refund / record / check keep their original signatures and behaviour.
History entries for decision: 'modify' calls now contain the modified args. Any test or tooling that asserted entry.arguments equals call.arguments for these specific calls will need to be updated.

Test plan

Ship a rule with an intentionally bad regex (e.g. (rm.*|wget.*)) and confirm an ERROR line appears at startup naming the rule.
Add a custom validator with priority=0 (Python) and confirm it runs before validators with priority 5/10.
Use a validator that returns decision: 'modify'; confirm history.getAll() shows the modified args.
Reserve budget, refund the same reservation twice via releaseReservation; confirm getStatus().spent only drops once.

…fail-open, budget double-spend

* Python priority=0 treated as falsy → ran with priority=100. Fixed sort + normalize_validator.
* decision: 'modify' recorded the original args in history. Now records args the tool actually saw.
* Rejected `matches` regex silently made the rule never match (fail-open). Now emits an error log at load time.
* Budget double-refund silently zeroed spent. New token-based API: reserveCall / releaseReservation. Idempotent.
* require_approval permanently held the reservation. Now released alongside deny.

Tests: Python 5 new (333 total), TS 3 new (1385 total). All green.

github-actions · 2026-04-27T14:05:37Z

Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.

I have read the CLA Document and I hereby sign the CLA

You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

github-actions · 2026-04-27T14:05:39Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

capy-ai

Added 1 comment

capy-ai · 2026-04-27T14:09:24Z

+  private warnAboutUnsafeRulePatterns(rules: ReadonlyArray<unknown>): void {
+    for (const raw of rules) {
+      if (!raw || typeof raw !== 'object') continue;
+      const rule = raw as Record<string, unknown>;


[🟡 Medium] [🔵 Bug]

Both warnAboutUnsafeRulePatterns (TS) and _warn_about_unsafe_rule_patterns (Python) only iterate rule.conditions but ignore rule.condition_groups. Rules support both fields — condition_groups is an OR-of-AND grouping that uses the same RuleCondition objects with operator: 'matches'. An unsafe regex placed inside condition_groups will silently fail-open at runtime without any startup warning, which is exactly the audit finding #3 scenario this code intends to prevent.

// packages/sdk/src/core/veto.ts
const conditions = rule.conditions;
if (!Array.isArray(conditions)) continue; // ← condition_groups never checked

# packages/sdk-python/veto/core/veto.py
conditions = rule.get("conditions") or []  # ← condition_groups never checked

Fix: after iterating conditions, also iterate rule.condition_groups (a list[list[...]] / RuleCondition[][]) and check each inner condition for operator === 'matches' with an unsafe pattern. A helper that yields all conditions from both fields would avoid duplication.
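A rough Python sketch of the suggested shape is below. It rests on several assumptions: the condition field `value`, the rule field `id`, and `is_safe_pattern_stub` (which only applies a length cap) are placeholders, and the SDK's real `is_safe_pattern` also applies heuristics not reproduced here.

```python
import logging
from typing import Any, Dict, Iterator, List

logger = logging.getLogger("veto.sketch")  # hypothetical logger name

MAX_PATTERN_LENGTH = 256  # length cap; the real check also applies ReDoS heuristics


def is_safe_pattern_stub(pattern: str) -> bool:
    """Stand-in for the SDK's is_safe_pattern; checks the length cap only."""
    return len(pattern) <= MAX_PATTERN_LENGTH


def iter_rule_conditions(rule: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every condition from `conditions` and from `condition_groups`."""
    for cond in rule.get("conditions") or []:
        if isinstance(cond, dict):
            yield cond
    for group in rule.get("condition_groups") or []:
        if not isinstance(group, list):
            continue
        for cond in group:
            if isinstance(cond, dict):
                yield cond


def warn_about_unsafe_patterns(rules: List[Dict[str, Any]]) -> int:
    """Log one error per unsafe `matches` pattern; return how many were flagged."""
    flagged = 0
    for rule in rules:
        for cond in iter_rule_conditions(rule):
            if cond.get("operator") != "matches":
                continue
            pattern = cond.get("value")  # assumed field name for the regex
            if isinstance(pattern, str) and not is_safe_pattern_stub(pattern):
                logger.error("rule %r: unsafe `matches` pattern rejected", rule.get("id"))
                flagged += 1
    return flagged


rules = [
    {"id": "flat", "conditions": [{"operator": "matches", "value": "a" * 300}]},
    {"id": "grouped", "condition_groups": [[{"operator": "matches", "value": "b" * 300}]]},
    {"id": "safe", "conditions": [{"operator": "matches", "value": "rm -rf"}]},
]
flagged_count = warn_about_unsafe_patterns(rules)
```

Routing both fields through one generator means the warn pass, and any later check, cannot forget one of the two rule shapes.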

PR review (capy-ai[bot] on #201): the warn-at-load helper iterated `rule.conditions` (the flat AND list) but ignored `rule.condition_groups` (the OR-of-AND list-of-lists). An unsafe `matches` pattern placed inside condition_groups silently slipped past the startup check and still failed open at runtime — exactly the audit-finding scenario the helper was meant to prevent. Both SDKs now factor a small `_iter_rule_conditions` / `iterRuleConditions` helper that yields every condition from both fields, regardless of which form a given rule uses (or whether it mixes both). The warn helper iterates the unified stream so condition_groups patterns are checked the same way. Tests cover both pure-condition_groups and mixed-shape rules, and explicitly assert exactly one error per offending pattern.

anirudhp26 · 2026-04-27T14:34:08Z

Thanks @capy-ai — confirmed the gap, fixed in 15e7720. Both SDKs now factor a tiny iterRuleConditions / _iter_rule_conditions helper that yields conditions from both rule.conditions and rule.condition_groups. The warn helper iterates the unified stream. New tests cover pure-condition_groups and mixed-shape rules.

anirudhp26 · 2026-04-27T15:17:10Z

✅ Live integration test — all green

Beyond the unit tests, ran a real-usage harness inside Docker that builds an actual Veto.from_rules, registers real validators, runs real guard() / intercept() calls, captures stderr from the real StreamLogger, and asserts observable behaviour. No mocks, no test framework — same shape a customer would write.

Image: examples/claude-code/audit/live/Dockerfile. Runner: examples/claude-code/audit/live/run_all.sh. Built off the merged-together state of #200 + #201 + #202 to verify the fixes compose.

================================================================
  Live SDK integration — Python
================================================================

=== PR #200 — stream logger via real Veto.from_rules + guard() ===
  PASS  tool input with embedded newlines stays on one row
  PASS  newline rendered as `\n`, not double-escaped `\\n`
  PASS  deny row carries policy:<id> tag

=== PR #200 — NO_COLOR / FORCE_COLOR via subprocess ===
  PASS  FORCE_COLOR=1 emits ANSI escape sequences
  PASS  NO_COLOR=1 wins over FORCE_COLOR=1 — no ANSI in output

=== PR #201 — validator priority=0 runs first (real Veto.from_rules) ===
  PASS  validator with priority=0 actually runs first (not silently 100)

=== PR #201 — modify decision records final args, not original ===
  PASS  history records the modified args (whitespace-stripped), not original

=== PR #201 — unsafe regex flagged at load (flat + condition_groups) ===
  PASS  unsafe regex in `conditions` is flagged
  PASS  unsafe regex in `condition_groups` is flagged
  PASS  unsafe regex in mixed shape is flagged
  PASS  safe regex is NOT flagged

=== PR #202 — Python `matches` operator case-insensitive by default ===
  PASS  Python `matches 'hello'` is case-insensitive against 'HELLO world'

  Python SDK live integration: 12/12 passed

================================================================
  Live SDK integration — TypeScript
================================================================

=== PR #200 — stream logger via real StreamLogger ===
  PASS  multi-line arg stays on a single output row
  PASS  newline rendered as `\n` (single backslash + n)

=== PR #200 — non-finite latency renders as `-`, no exception ===
  PASS  NaN / ±Infinity / negative latency does not throw
  PASS  all four non-finite/negative cases render the `-` placeholder

=== PR #200 — `isDecisionStreamLogger` no longer duck-types ===
  PASS  user logger with unrelated `streamDecision` method is rejected
  PASS  real StreamLogger instance is recognised

=== PR #201 — token-based BudgetTracker.reserveCall / releaseReservation ===
  PASS  two reserves push spent to 60
  PASS  three releases of the same token only release once (idempotent)
  PASS  releasing null is a no-op (no throw)
  PASS  second reservation can still be released

=== PR #201 — unsafe regex in condition_groups flagged at load ===
  PASS  TS warn-helper walks condition_groups and surfaces the offender

=== PR #202 — TS `matches` operator case-insensitive by default ===
  PASS  TS `matches 'hello'` is case-insensitive against 'HELLO world'
  PASS  TS `(?-i)hello` opt-in case-sensitive against 'HELLO' is false

  TypeScript SDK live integration: 13/13 passed

  ALL PASS — 25/25

Unit-test counts (post review-comment fixes)

| | TS | Python |
|---|---|---|
| Suite total | 1399 | 351 |
| New regression coverage from this PR series | 20 (logger 17 + validation 3) | 28 (logger 23 + validation 5) |

Suggested merge order

1. #200 (stream-logger hardening): foundational; introduces BaseStreamLogger + the filter-at-source pattern that #201 relies on for clean stream output.
2. #201 (validation hardening): builds on #200 (no longer needs the cross-layer warn-suppression hack).
3. #202 (cross-SDK parity): independent; lands cleanly in any order.

Verified via a 3-way merge of all three branches into a temporary working tree — no conflicts, all suites green.

…202)

* fix(parity): align TS expression-DSL `matches` with Python; dedupe length checks

Audit follow-up, third of three PRs (after #200 stream-logger, #201 validation hardening; see #196 for the original audit).

Cross-SDK regex parity
----------------------

The expression-DSL `matches` operator behaved differently across SDKs:

* Python (`evaluate_legacy_condition`) compiles with `re.IGNORECASE`.
* TS (`compiler/evaluator.ts:167`) compiled with no flags.

Same policy expression therefore returned different decisions across SDKs: a tested-against-Python rule could silently fail in TS prod (or vice versa). The TS condition-evaluator path was already correct (`createSafeRegex(expected, 'i')` at line 523); only the expression compiler was off.

Pinned both to **case-insensitive by default**. Opt back into case-sensitive matching with an inline `(?-i)` prefix at the start of the pattern, e.g. `args.s matches '(?-i)Exact'`. JS RegExp doesn't honour PCRE-style inline flags, so the prefix is parsed and stripped before construction.

Length-check dedup
------------------

`is_safe_pattern` / `isSafePattern` already enforces the `MAX_PATTERN_LENGTH = 256` cap. The wrappers in `condition_evaluator.py:create_safe_regex` and `condition-evaluator.ts:createSafeRegex` redundantly checked the same cap, so a future bump to one constant could drift the other. Removed the duplicate checks.

Tests
-----

* `packages/sdk/tests/compiler/regex-parity.test.ts`: 3 cases pinning default insensitivity, the `(?-i)` opt-out, and character-class behaviour under the default flag.
* Full suites: TS 1385/1385, Python 328/328.

Behavioural notes for downstream
--------------------------------

* TS expression-DSL `matches` is now case-insensitive by default. Any TS-only policy that relied on `'Hello' matches 'Hello'` returning false now returns true. Add `(?-i)` to recover the previous behaviour.

* chore: add changeset for cross-SDK regex parity
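The "parse and strip the prefix before construction" step described for the `(?-i)` opt-out can be sketched as below. This is a Python illustration of the idea only; `compile_matches_pattern` is a hypothetical name, the safety checks are omitted, and how each SDK actually handles the prefix is not shown in this thread.

```python
import re

CASE_SENSITIVE_PREFIX = "(?-i)"


def compile_matches_pattern(pattern: str) -> re.Pattern:
    """Compile a `matches` pattern: case-insensitive unless prefixed with (?-i)."""
    if pattern.startswith(CASE_SENSITIVE_PREFIX):
        # Strip the opt-out marker and compile without IGNORECASE.
        return re.compile(pattern[len(CASE_SENSITIVE_PREFIX):])
    return re.compile(pattern, re.IGNORECASE)


default_hit = compile_matches_pattern("hello").search("HELLO world") is not None
opt_out_hit = compile_matches_pattern("(?-i)hello").search("HELLO") is not None
exact_hit = compile_matches_pattern("(?-i)Exact").search("Exact") is not None
```

Handling the prefix explicitly keeps the behaviour identical regardless of which inline-flag syntax the host regex engine accepts.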

github-actions Bot added the area:sdk (Changes in the TypeScript SDK) and area:python (Changes in the Python SDK) labels Apr 27, 2026

capy-ai Bot reviewed Apr 27, 2026

anirudhp26 mentioned this pull request Apr 27, 2026

fix(parity): cross-SDK regex case-sensitivity + dedupe length checks #202

Merged

3 tasks

anirudhp26 added 2 commits April 27, 2026 20:28

fix(types): annotate _iter_rule_conditions return type

fcc4d24

mypy strict on CI flagged the new generator helper for a missing return annotation. Add Iterator[Any] from typing.

chore: add changeset for validation hardening

710d507

github-actions Bot added the area:docs (Documentation updates) label Apr 27, 2026

anirudhp26 mentioned this pull request Apr 27, 2026

fix(stream-logger): harden output, formatter, and logger detection #200

Merged

4 tasks

yazcaleb approved these changes Apr 27, 2026

yazcaleb merged commit 959f91c into master Apr 27, 2026
14 of 15 checks passed

yazcaleb deleted the ani/audit-fixes-2-validation branch April 27, 2026 18:39

anirudhp26 mentioned this pull request Apr 27, 2026

fix(critical): close silent fail-open patterns in protect() init and rule eval #203

Open

4 tasks

yazcaleb merged 4 commits into master from ani/audit-fixes-2-validation
