Conversation
…fail-open, budget double-spend * Python priority=0 treated as falsy → ran with priority=100. Fixed sort + normalize_validator. * decision: 'modify' recorded the original args in history. Now records args the tool actually saw. * Rejected `matches` regex silently made the rule never match (fail-open). Now emits an error log at load time. * Budget double-refund silently zeroed spent. New token-based API: reserveCall / releaseReservation. Idempotent. * require_approval permanently held the reservation. Now released alongside deny. Tests: Python 5 new (333 total), TS 3 new (1385 total). All green.
|
I have read the CLA Document and I hereby sign the CLA You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot. |
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
| private warnAboutUnsafeRulePatterns(rules: ReadonlyArray<unknown>): void { | ||
| for (const raw of rules) { | ||
| if (!raw || typeof raw !== 'object') continue; | ||
| const rule = raw as Record<string, unknown>; |
There was a problem hiding this comment.
[🟡 Medium] [🔵 Bug]
Both warnAboutUnsafeRulePatterns (TS) and _warn_about_unsafe_rule_patterns (Python) only iterate rule.conditions but ignore rule.condition_groups. Rules support both fields — condition_groups is an OR-of-AND grouping that uses the same RuleCondition objects with operator: 'matches'. An unsafe regex placed inside condition_groups will silently fail-open at runtime without any startup warning, which is exactly the audit finding #3 scenario this code intends to prevent.
// packages/sdk/src/core/veto.ts
const conditions = rule.conditions;
if (!Array.isArray(conditions)) continue; // ← condition_groups never checked# packages/sdk-python/veto/core/veto.py
conditions = rule.get("conditions") or []
# ← condition_groups never checkedFix: after iterating conditions, also iterate rule.condition_groups (a list[list[...]] / RuleCondition[][]) and check each inner condition for operator === 'matches' with an unsafe pattern. A helper that yields all conditions from both fields would avoid duplication.
PR review (capy-ai[bot] on #201): the warn-at-load helper iterated `rule.conditions` (the flat AND list) but ignored `rule.condition_groups` (the OR-of-AND list-of-lists). An unsafe `matches` pattern placed inside condition_groups silently slipped past the startup check and still failed open at runtime — exactly the audit-finding scenario the helper was meant to prevent. Both SDKs now factor a small `_iter_rule_conditions` / `iterRuleConditions` helper that yields every condition from both fields, regardless of which form a given rule uses (or whether it mixes both). The warn helper iterates the unified stream so condition_groups patterns are checked the same way. Tests cover both pure-condition_groups and mixed-shape rules, and explicitly assert exactly one error per offending pattern.
|
Thanks @capy-ai — confirmed the gap, fixed in 15e7720. Both SDKs now factor a tiny |
mypy strict on CI flagged the new generator helper for a missing return annotation. Add Iterator[Any] from typing.
✅ Live integration test — all greenBeyond the unit tests, ran a real-usage harness inside Docker that builds an actual Image: Unit-test counts (post review-comment fixes)
Suggested merge order
Verified via a 3-way merge of all three branches into a temporary working tree — no conflicts, all suites green. |
…202) * fix(parity): align TS expression-DSL `matches` with Python; dedupe length checks Audit follow-up — third of three PRs (after #200 stream-logger, #201 validation hardening; see #196 for the original audit). Cross-SDK regex parity ---------------------- The expression-DSL `matches` operator behaved differently across SDKs: * Python (`evaluate_legacy_condition`) compiles with `re.IGNORECASE`. * TS (`compiler/evaluator.ts:167`) compiled with no flags. Same policy expression therefore returned different decisions across SDKs — a tested-against-Python rule could silently fail in TS prod (or vice versa). The TS condition-evaluator path was already correct (`createSafeRegex(expected, 'i')` at line 523); only the expression compiler was off. Pinned both to **case-insensitive by default**. Opt back into case-sensitive matching with an inline `(?-i)` prefix at the start of the pattern, e.g. `args.s matches '(?-i)Exact'`. JS RegExp doesn't honour PCRE-style inline flags, so the prefix is parsed and stripped before construction. Length-check dedup ------------------ `is_safe_pattern` / `isSafePattern` already enforces the `MAX_PATTERN_LENGTH = 256` cap. The wrappers in `condition_evaluator.py:create_safe_regex` and `condition-evaluator.ts:createSafeRegex` redundantly checked the same cap, so a future bump to one constant could drift the other. Removed the duplicate checks. Tests ----- * `packages/sdk/tests/compiler/regex-parity.test.ts` — 3 cases pinning default insensitivity, the `(?-i)` opt-out, and character-class behaviour under the default flag. * Full suites: TS 1385/1385, Python 328/328. Behavioural notes for downstream -------------------------------- * TS expression-DSL `matches` is now case-insensitive by default. Any TS-only policy that relied on `'Hello' matches 'Hello'` returning false now returns true. Add `(?-i)` to recover the previous behaviour. * chore: add changeset for cross-SDK regex parity
Audit follow-up — second of three PRs (after #200; see #196 for the original audit).
Findings closed
1. Validator
priority=0silently sorted as 100 (Python)_sort_validators(validator.py:305) andnormalize_validator(config.py:160) both usedpriority or 100, evaluating0as falsy. A user setting "run me first, ahead of everything" got the opposite. Replaced with explicitpriority is not Nonechecks. (TS already used??so this is Python-only.)2. Audit-trail mismatch on
decision: 'modify'When a validator returned
modifywithmodified_arguments, the engine forwarded the modified context downstream and the tool ran with the new args, butHistoryTracker.record(...)was called with the originalcall.arguments. Net effect: incident review showed "user said X" while the tool actually executed "Y". Fixed in both SDKs — history now reflects the args the tool actually saw.3. Silent fail-open on rejected
matchesregexWhen
is_safe_patternrejected a pattern (length cap, ReDoS heuristic, etc.) the condition silently evaluated tofalse. Ablockrule meant "stop dangerous commands"; in reality those commands sailed past. Added a load-time pass that emits a louderror-level log per offender so misconfigured rules surface in startup output._warn_about_unsafe_rule_patterns(Python) andwarnAboutUnsafeRulePatterns(TS).4. Budget double-refund silently zeroed
spentTwo refunds of the same reservation dropped tracked spend to 0; subsequent calls then bypassed the limit. Added a token-based API on
BudgetTracker:The interceptor and browser veto switched to the token API. The legacy
reserve(toolName, args): number/refund(amount: number)pair stays for backward compatibility with downstream callers.5.
require_approvalpermanently held the budget reservationThe interceptor only released on
decision === 'deny', so an approval that was later rejected left budget committed forever. Now releases on bothdenyandrequire_approval— the call doesn't actually run; on retry, the caller reserves again.Tests
packages/sdk-python/tests/test_validation_hardening.py— 5 cases (priority=0 ordering, modify→history, unsafe-regex error log, safe-regex no-false-positive, helper-direct).packages/sdk/tests/core/validation-hardening.test.ts— 3 cases (token idempotency, null-reservation tolerance, legacy API preserved).Behavioural notes for downstream
BudgetTracker.reserveCall/releaseReservationare additive APIs. Existingreserve/refund/record/checkkeep their original signatures and behaviour.decision: 'modify'calls now contain the modified args. Anything that asserted onentry.argumentsmatchingcall.argumentsfor these specific calls will need to update.Test plan
(rm.*|wget.*)) and confirm anERRORline appears at startup naming the rule.priority=0(Python) and confirm it runs before validators with priority 5/10.decision: 'modify'; confirmhistory.getAll()shows the modified args.releaseReservation; confirmgetStatus().spentonly drops once.