fix(validation): audit findings — priority sort, modify audit trail, regex fail-open, budget double-spend#201

Merged
yazcaleb merged 4 commits into master from ani/audit-fixes-2-validation on Apr 27, 2026

@anirudhp26

Audit follow-up — second of three PRs (after #200; see #196 for the original audit).

Findings closed

1. Validator priority=0 silently sorted as 100 (Python)

_sort_validators (validator.py:305) and normalize_validator (config.py:160) both used priority or 100, evaluating 0 as falsy. A user setting "run me first, ahead of everything" got the opposite. Replaced with explicit priority is not None checks. (TS already used ?? so this is Python-only.)
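
A minimal sketch of why `priority or 100` misorders a validator with `priority=0`, and how an explicit `is not None` check avoids it. The helper names and the sample validators are hypothetical; only the default of 100 and the `or` / `is not None` contrast come from the description above.

```python
from typing import Optional

DEFAULT_PRIORITY = 100  # default value named in the PR description


def buggy_priority(priority: Optional[int]) -> int:
    # `or` falls through on every falsy value, so 0 silently becomes 100.
    return priority or DEFAULT_PRIORITY


def fixed_priority(priority: Optional[int]) -> int:
    # Only a missing priority falls back to the default; 0 is kept as 0.
    return priority if priority is not None else DEFAULT_PRIORITY


# Hypothetical validator configs, used only to illustrate the ordering.
validators = [
    {"name": "late", "priority": 10},
    {"name": "first", "priority": 0},
    {"name": "unset"},
]

buggy_order = [
    v["name"]
    for v in sorted(validators, key=lambda v: buggy_priority(v.get("priority")))
]
fixed_order = [
    v["name"]
    for v in sorted(validators, key=lambda v: fixed_priority(v.get("priority")))
]
```

With the `or` version the `priority=0` validator sorts after the one with priority 10; with the explicit check it sorts first.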

2. Audit-trail mismatch on decision: 'modify'

When a validator returned modify with modified_arguments, the engine forwarded the modified context downstream and the tool ran with the new args, but HistoryTracker.record(...) was called with the original call.arguments. Net effect: incident review showed "user said X" while the tool actually executed "Y". Fixed in both SDKs — history now reflects the args the tool actually saw.
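
The intent of this fix can be sketched roughly as follows. The classes below are simplified stand-ins, not the SDK's real `HistoryTracker` or engine; the `record` signature and the `run_call` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ValidationResult:
    decision: str  # e.g. "allow", "deny", or "modify"
    modified_arguments: Optional[dict[str, Any]] = None


@dataclass
class HistoryTracker:
    # Simplified stand-in for the real tracker; the signature is assumed.
    entries: list[dict[str, Any]] = field(default_factory=list)

    def record(self, tool_name: str, arguments: dict[str, Any], decision: str) -> None:
        self.entries.append(
            {"tool": tool_name, "arguments": arguments, "decision": decision}
        )


def run_call(
    tool_name: str,
    arguments: dict[str, Any],
    result: ValidationResult,
    history: HistoryTracker,
) -> dict[str, Any]:
    """Return the arguments the tool will see and record exactly those."""
    effective = arguments
    if result.decision == "modify" and result.modified_arguments is not None:
        effective = result.modified_arguments
    # Record the effective arguments, not the original ones.
    history.record(tool_name, dict(effective), result.decision)
    return effective


history = HistoryTracker()
seen = run_call(
    "shell",
    {"cmd": "  ls  "},
    ValidationResult("modify", {"cmd": "ls"}),
    history,
)
```

The key point is that one variable (`effective`) feeds both the tool and the history entry, so the two cannot drift apart.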

3. Silent fail-open on rejected matches regex

When is_safe_pattern rejected a pattern (length cap, ReDoS heuristic, etc.) the condition silently evaluated to false. A block rule meant "stop dangerous commands"; in reality those commands sailed past. Added a load-time pass that emits a loud error-level log per offender so misconfigured rules surface in startup output. _warn_about_unsafe_rule_patterns (Python) and warnAboutUnsafeRulePatterns (TS).
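
A rough sketch of what such a load-time pass might look like. The `is_safe_pattern` below is a toy stand-in (a length cap plus a crude nested-quantifier check), not the SDK's heuristic; the `value` and `id` field names and the logger name are assumptions. For brevity it walks only the flat `conditions` list.

```python
import logging
import re
from typing import Any

logger = logging.getLogger("veto.sketch")  # hypothetical logger name
MAX_PATTERN_LENGTH = 256  # cap quoted elsewhere in this PR series


def is_safe_pattern(pattern: str) -> bool:
    """Toy stand-in: reject over-long patterns and quantified groups
    that themselves contain a quantifier, such as (a+)+."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        return False
    return re.search(r"\([^)]*[+*][^)]*\)[+*]", pattern) is None


def warn_about_unsafe_rule_patterns(rules: list[dict[str, Any]]) -> int:
    """Log one error per unsafe `matches` pattern; return how many were flagged."""
    flagged = 0
    for rule in rules:
        for cond in rule.get("conditions") or []:
            if cond.get("operator") != "matches":
                continue
            pattern = cond.get("value")  # field name is an assumption
            if isinstance(pattern, str) and not is_safe_pattern(pattern):
                logger.error(
                    "rule %r: unsafe 'matches' pattern %r will never match",
                    rule.get("id"),
                    pattern,
                )
                flagged += 1
    return flagged
```

Returning the count is a convenience for tests; the essential behaviour is the error-level log naming the rule at startup.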

4. Budget double-refund silently zeroed spent

Two refunds of the same reservation dropped tracked spend to 0; subsequent calls then bypassed the limit. Added a token-based API on BudgetTracker:

const reservation = tracker.reserveCall(toolName, args);  // BudgetReservation | null
// …
tracker.releaseReservation(reservation);  // idempotent — repeats are no-ops

The interceptor and browser veto switched to the token API. The legacy reserve(toolName, args): number / refund(amount: number) pair stays for backward compatibility with downstream callers.
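
To illustrate why a token makes the release idempotent, here is a minimal Python sketch. It is not the SDK's `BudgetTracker`: the snake_case method names are assumed, and the cost is passed in directly instead of being derived from a tool name and arguments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetReservation:
    amount: float
    released: bool = False


class BudgetTracker:
    """Toy tracker; cost is supplied by the caller for simplicity."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.spent = 0.0

    def reserve_call(self, cost: float) -> Optional[BudgetReservation]:
        # Refuse the reservation if it would exceed the limit.
        if self.spent + cost > self.limit:
            return None
        self.spent += cost
        return BudgetReservation(amount=cost)

    def release_reservation(self, reservation: Optional[BudgetReservation]) -> None:
        # None and already-released tokens are no-ops, so a repeated
        # release can never subtract the same amount twice.
        if reservation is None or reservation.released:
            return
        reservation.released = True
        self.spent -= reservation.amount


tracker = BudgetTracker(limit=100.0)
first = tracker.reserve_call(30.0)
second = tracker.reserve_call(30.0)
for _ in range(3):
    tracker.release_reservation(first)
tracker.release_reservation(None)
```

An amount-based `refund(amount)` has no way to tell a repeat from a new refund; the token carries that state itself.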

5. require_approval permanently held the budget reservation

The interceptor only released on decision === 'deny', so an approval that was later rejected left budget committed forever. Now releases on both deny and require_approval — the call doesn't actually run; on retry, the caller reserves again.
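
The release condition could be expressed along these lines. This is a hypothetical helper, not the interceptor's actual code; the `release` callback stands in for whatever releases the reservation.

```python
from typing import Callable

# Decisions after which the call does not run, so the budget must be returned.
NON_EXECUTING_DECISIONS = frozenset({"deny", "require_approval"})


def settle_reservation(decision: str, release: Callable[[], None]) -> bool:
    """Release the reservation when the call will not run.

    Returns True if the release callback was invoked.
    """
    if decision in NON_EXECUTING_DECISIONS:
        release()
        return True
    return False


released: list[str] = []
for decision in ("allow", "deny", "require_approval"):
    settle_reservation(decision, lambda d=decision: released.append(d))
```

Keeping the set of non-executing decisions in one place makes it harder for a later decision type to be left out of the release path.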

Tests

  • packages/sdk-python/tests/test_validation_hardening.py — 5 cases (priority=0 ordering, modify→history, unsafe-regex error log, safe-regex no-false-positive, helper-direct).
  • packages/sdk/tests/core/validation-hardening.test.ts — 3 cases (token idempotency, null-reservation tolerance, legacy API preserved).
  • Full suites green: TS 1385/1385, Python 333/333.

Behavioural notes for downstream

  • BudgetTracker.reserveCall / releaseReservation are additive APIs. Existing reserve / refund / record / check keep their original signatures and behaviour.
  • History entries for decision: 'modify' calls now contain the modified args. Anything that asserted on entry.arguments matching call.arguments for these specific calls will need to update.

Test plan

  • Ship a rule with an intentionally bad regex (e.g. (rm.*|wget.*)) and confirm an ERROR line appears at startup naming the rule.
  • Add a custom validator with priority=0 (Python) and confirm it runs before validators with priority 5/10.
  • Use a validator that returns decision: 'modify'; confirm history.getAll() shows the modified args.
  • Reserve budget, refund the same reservation twice via releaseReservation; confirm getStatus().spent only drops once.

…fail-open, budget double-spend

* Python priority=0 treated as falsy → ran with priority=100. Fixed sort + normalize_validator.
* decision: 'modify' recorded the original args in history. Now records args the tool actually saw.
* Rejected `matches` regex silently made the rule never match (fail-open). Now emits an error log at load time.
* Budget double-refund silently zeroed spent. New token-based API: reserveCall / releaseReservation. Idempotent.
* require_approval permanently held the reservation. Now released alongside deny.

Tests: Python 5 new (333 total), TS 3 new (1385 total). All green.
@github-actions


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@github-actions github-actions Bot added area:sdk Changes in the TypeScript SDK area:python Changes in the Python SDK labels Apr 27, 2026
@github-actions

github-actions Bot commented Apr 27, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None


@capy-ai capy-ai Bot left a comment

Added 1 comment

private warnAboutUnsafeRulePatterns(rules: ReadonlyArray<unknown>): void {
  for (const raw of rules) {
    if (!raw || typeof raw !== 'object') continue;
    const rule = raw as Record<string, unknown>;

[🟡 Medium] [🔵 Bug]

Both warnAboutUnsafeRulePatterns (TS) and _warn_about_unsafe_rule_patterns (Python) only iterate rule.conditions but ignore rule.condition_groups. Rules support both fields — condition_groups is an OR-of-AND grouping that uses the same RuleCondition objects with operator: 'matches'. An unsafe regex placed inside condition_groups will silently fail-open at runtime without any startup warning, which is exactly the audit finding #3 scenario this code intends to prevent.

// packages/sdk/src/core/veto.ts
const conditions = rule.conditions;
if (!Array.isArray(conditions)) continue; // ← condition_groups never checked

# packages/sdk-python/veto/core/veto.py
conditions = rule.get("conditions") or []
# ← condition_groups never checked

Fix: after iterating conditions, also iterate rule.condition_groups (a list[list[...]] / RuleCondition[][]) and check each inner condition for operator === 'matches' with an unsafe pattern. A helper that yields all conditions from both fields would avoid duplication.
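
One possible shape for such a helper, as a sketch only: the function name, the dict-based rule shape, and the `value` field are assumptions. It yields the flat `conditions` first, then every condition inside each group.

```python
from typing import Any, Iterator


def iter_rule_conditions(rule: dict[str, Any]) -> Iterator[Any]:
    """Yield every condition from `conditions` and `condition_groups`."""
    # Flat AND list.
    yield from rule.get("conditions") or []
    # OR-of-AND list of lists; tolerate a missing field or an empty group.
    for group in rule.get("condition_groups") or []:
        yield from group or []


# Hypothetical rule mixing both shapes.
mixed_rule = {
    "id": "mixed",
    "conditions": [{"operator": "matches", "value": "flat"}],
    "condition_groups": [
        [{"operator": "matches", "value": "g1"}],
        [
            {"operator": "equals", "value": "x"},
            {"operator": "matches", "value": "g2"},
        ],
    ],
}

patterns = [
    c["value"]
    for c in iter_rule_conditions(mixed_rule)
    if c.get("operator") == "matches"
]
```

A load-time check that consumes this single stream treats both fields identically, so a pattern cannot escape the check by living in a group.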

PR review (capy-ai[bot] on #201): the warn-at-load helper iterated
`rule.conditions` (the flat AND list) but ignored
`rule.condition_groups` (the OR-of-AND list-of-lists). An unsafe
`matches` pattern placed inside condition_groups silently slipped
past the startup check and still failed open at runtime — exactly
the audit-finding scenario the helper was meant to prevent.

Both SDKs now factor out a small `_iter_rule_conditions` /
`iterRuleConditions` helper that yields every condition from both
fields, regardless of which form a given rule uses (or whether it
mixes both). The warn helper iterates the unified stream, so
condition_groups patterns are checked the same way.

Tests cover both pure-condition_groups and mixed-shape rules, and
explicitly assert exactly one error per offending pattern.
@anirudhp26

Thanks @capy-ai, confirmed the gap; fixed in 15e7720. Both SDKs now factor out a tiny iterRuleConditions / _iter_rule_conditions helper that yields conditions from both rule.conditions and rule.condition_groups. The warn helper iterates the unified stream. New tests cover pure-condition_groups and mixed-shape rules.

mypy strict on CI flagged the new generator helper for a missing
return annotation. Add Iterator[Any] from typing.
@anirudhp26

✅ Live integration test — all green

Beyond the unit tests, ran a real-usage harness inside Docker that builds an actual Veto.from_rules, registers real validators, runs real guard() / intercept() calls, captures stderr from the real StreamLogger, and asserts observable behaviour. No mocks, no test framework — same shape a customer would write.

Image: examples/claude-code/audit/live/Dockerfile. Runner: examples/claude-code/audit/live/run_all.sh. Built off the merged-together state of #200 + #201 + #202 to verify the fixes compose.

================================================================
  Live SDK integration — Python
================================================================

=== PR #200 — stream logger via real Veto.from_rules + guard() ===
  PASS  tool input with embedded newlines stays on one row
  PASS  newline rendered as `\n`, not double-escaped `\\n`
  PASS  deny row carries policy:<id> tag

=== PR #200 — NO_COLOR / FORCE_COLOR via subprocess ===
  PASS  FORCE_COLOR=1 emits ANSI escape sequences
  PASS  NO_COLOR=1 wins over FORCE_COLOR=1 — no ANSI in output

=== PR #201 — validator priority=0 runs first (real Veto.from_rules) ===
  PASS  validator with priority=0 actually runs first (not silently 100)

=== PR #201 — modify decision records final args, not original ===
  PASS  history records the modified args (whitespace-stripped), not original

=== PR #201 — unsafe regex flagged at load (flat + condition_groups) ===
  PASS  unsafe regex in `conditions` is flagged
  PASS  unsafe regex in `condition_groups` is flagged
  PASS  unsafe regex in mixed shape is flagged
  PASS  safe regex is NOT flagged

=== PR #202 — Python `matches` operator case-insensitive by default ===
  PASS  Python `matches 'hello'` is case-insensitive against 'HELLO world'

  Python SDK live integration: 12/12 passed

================================================================
  Live SDK integration — TypeScript
================================================================

=== PR #200 — stream logger via real StreamLogger ===
  PASS  multi-line arg stays on a single output row
  PASS  newline rendered as `\n` (single backslash + n)

=== PR #200 — non-finite latency renders as `-`, no exception ===
  PASS  NaN / ±Infinity / negative latency does not throw
  PASS  all four non-finite/negative cases render the `-` placeholder

=== PR #200 — `isDecisionStreamLogger` no longer duck-types ===
  PASS  user logger with unrelated `streamDecision` method is rejected
  PASS  real StreamLogger instance is recognised

=== PR #201 — token-based BudgetTracker.reserveCall / releaseReservation ===
  PASS  two reserves push spent to 60
  PASS  three releases of the same token only release once (idempotent)
  PASS  releasing null is a no-op (no throw)
  PASS  second reservation can still be released

=== PR #201 — unsafe regex in condition_groups flagged at load ===
  PASS  TS warn-helper walks condition_groups and surfaces the offender

=== PR #202 — TS `matches` operator case-insensitive by default ===
  PASS  TS `matches 'hello'` is case-insensitive against 'HELLO world'
  PASS  TS `(?-i)hello` opt-in case-sensitive against 'HELLO' is false

  TypeScript SDK live integration: 13/13 passed

  ALL PASS — 25/25

Unit-test counts (post review-comment fixes)

| | TS | Python |
|---|---|---|
| Suite total | 1399 | 351 |
| New regression coverage from this PR series | 20 (logger 17 + validation 3) | 28 (logger 23 + validation 5) |

Suggested merge order

  1. #200 "fix(stream-logger): harden output, formatter, and logger detection" (stream-logger hardening): foundational; introduces BaseStreamLogger and the filter-at-source pattern that #201 relies on for clean stream output.
  2. #201 "fix(validation): audit findings — priority sort, modify audit trail, regex fail-open, budget double-spend" (validation hardening): builds on #200 (no longer needs the cross-layer warn-suppression hack).
  3. #202 "fix(parity): cross-SDK regex case-sensitivity + dedupe length checks" (cross-SDK parity): independent; lands cleanly in any order.

Verified via a 3-way merge of all three branches into a temporary working tree — no conflicts, all suites green.

yazcaleb pushed a commit that referenced this pull request Apr 27, 2026
…202)

* fix(parity): align TS expression-DSL `matches` with Python; dedupe length checks

Audit follow-up — third of three PRs (after #200 stream-logger, #201
validation hardening; see #196 for the original audit).

Cross-SDK regex parity
----------------------
The expression-DSL `matches` operator behaved differently across SDKs:

  * Python (`evaluate_legacy_condition`) compiles with `re.IGNORECASE`.
  * TS    (`compiler/evaluator.ts:167`) compiled with no flags.

Same policy expression therefore returned different decisions across
SDKs — a tested-against-Python rule could silently fail in TS prod (or
vice versa). The TS condition-evaluator path was already correct
(`createSafeRegex(expected, 'i')` at line 523); only the expression
compiler was off.

Pinned both to **case-insensitive by default**. Opt back into
case-sensitive matching with an inline `(?-i)` prefix at the start of
the pattern, e.g. `args.s matches '(?-i)Exact'`. JS RegExp doesn't
honour PCRE-style inline flags, so the prefix is parsed and stripped
before construction.
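
The parse-and-strip step can be sketched as follows. This is a Python illustration of the described behaviour, not either SDK's code; the function and constant names are hypothetical.

```python
import re

CASE_SENSITIVE_PREFIX = "(?-i)"


def compile_matches_pattern(pattern: str) -> "re.Pattern[str]":
    """Compile case-insensitively unless the pattern starts with (?-i)."""
    if pattern.startswith(CASE_SENSITIVE_PREFIX):
        # Strip the prefix ourselves and compile without IGNORECASE,
        # rather than handing the prefix to the regex engine.
        return re.compile(pattern[len(CASE_SENSITIVE_PREFIX):])
    return re.compile(pattern, re.IGNORECASE)


default_hit = compile_matches_pattern("hello").search("HELLO world")
opt_out_miss = compile_matches_pattern("(?-i)hello").search("HELLO")
opt_out_hit = compile_matches_pattern("(?-i)Exact").search("Exact")
```

Handling the prefix before construction keeps the opt-out syntax identical across SDKs regardless of what each regex engine accepts.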

Length-check dedup
------------------
`is_safe_pattern` / `isSafePattern` already enforces the
`MAX_PATTERN_LENGTH = 256` cap. The wrappers in
`condition_evaluator.py:create_safe_regex` and
`condition-evaluator.ts:createSafeRegex` redundantly checked the same
cap, so a future bump to one constant could drift the other. Removed
the duplicate checks.

Tests
-----
* `packages/sdk/tests/compiler/regex-parity.test.ts` — 3 cases pinning
  default insensitivity, the `(?-i)` opt-out, and character-class
  behaviour under the default flag.
* Full suites: TS 1385/1385, Python 328/328.

Behavioural notes for downstream
--------------------------------
* TS expression-DSL `matches` is now case-insensitive by default. Any
  TS-only policy that relied on `'HELLO' matches 'hello'` returning
  false now returns true. Add `(?-i)` to recover the previous behaviour.

* chore: add changeset for cross-SDK regex parity
@yazcaleb yazcaleb merged commit 959f91c into master Apr 27, 2026
14 of 15 checks passed
@yazcaleb yazcaleb deleted the ani/audit-fixes-2-validation branch April 27, 2026 18:39

Labels

area:docs Documentation updates area:python Changes in the Python SDK area:sdk Changes in the TypeScript SDK
