feat: Phase D broaden detection — D5-D10, T7#51
feat: Phase D broaden detection — D5-D10, T7#51devin-ai-integration[bot] wants to merge 6 commits intomainfrom
Conversation
D5: Tool/Function Discovery payloads for model_extraction agent - d5_tool_schema_extraction, d5_function_call_probing, d5_capability_enumeration - Updated _evaluate_response() and _record_intelligence() for D5 techniques D6: BOLA Payloads for privilege_escalation agent - 4 BOLA techniques: numeric IDOR, UUID swap, path traversal, mass assignment - _test_bola() and _report_bola() methods D7: Social Engineering BFLA for identity_spoof agent - 5 techniques: CEO urgency, compliance pressure, helpdesk, developer debug, time pressure - _test_social_engineering_bfla(), _evaluate_bfla_response(), _report_bfla() D8: PII Detection Expansion in DataCategoryMatcher - Phone numbers, SSN, credit cards (Visa/MC/Amex/Discover), IPv4, IPv6 - Date of birth, passport numbers, medical record IDs D10: Correlation Agent — 5 new compound attack path patterns - BOLA + model_extraction, BFLA + identity_spoof + priv_esc - Tool discovery + prompt injection, BOLA + cross-agent exfil - BFLA + memory poisoning T7: Connection Pooling in ConversationSession - ConnectionPool class with shared httpx.AsyncClient instances - Keyed by (host, timeout), singleton pattern, scan-scoped lifecycle - ConversationSession accepts optional pool= parameter
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
… key 1. BFLA evaluation: when both refusal and compliance keywords are present but no hard evidence (markers/priv_indicators), treat as refusal. Fixes false positives where refusal messages mention 'password', 'secret', etc. 2. ConnectionPool cache key: include csrf_mode in the (host, timeout, csrf_mode) key to prevent incorrect client configuration when sessions with different csrf_mode values share the same pool.
1. pii_phone: add word boundaries and require at least one separator to avoid matching timestamps and numeric IDs. 2. pii_passport/pii_medical_id: require mandatory colon/equals separator and at least one digit in value via lookahead, preventing matches on English words like 'passport details' or 'patient id unknown'. 3. ConnectionPool __aexit__: always clear self._client = None regardless of _owns_client, so the use-after-exit guard in turn() fires correctly for pooled sessions.
| """Return (or create) a pooled client for *host* with *timeout*.""" | ||
| key = (host, timeout, csrf_mode) | ||
| async with self._lock: | ||
| if key not in self._clients: | ||
| kwargs: dict[str, Any] = { | ||
| "timeout": timeout, | ||
| "event_hooks": {"request": [], "response": []}, | ||
| "follow_redirects": False, | ||
| } | ||
| if csrf_mode: | ||
| kwargs["cookies"] = httpx.Cookies() | ||
| self._clients[key] = httpx.AsyncClient(**kwargs) | ||
| logger.debug("T7: created pooled client for %s (timeout=%.1f)", host, timeout) | ||
| return self._clients[key] |
There was a problem hiding this comment.
🔴 ConnectionPool shares cookie jar across sessions, causing cross-session state contamination
When multiple ConversationSession instances share a pooled httpx.AsyncClient (same host/timeout/csrf_mode key), they share the client's cookie jar. Any Set-Cookie headers from one session's target response are automatically sent in another session's subsequent requests. This breaks attack isolation: if Session A authenticates with the target and gets a session cookie, Session B will piggyback on that authentication, causing false positives (elevated access without earning it) or false negatives (unexpected state). The pool is exported as a public API in src/argus/conductor/__init__.py:40 and documented with ready-to-use examples.
Reproduction scenario
- Session A targets
http://target:8080, gets pooled client1 - Target responds with
Set-Cookie: session=AGENT_A - client1's cookie jar now has
session=AGENT_A - Session B targets same host, gets same client1 from pool
- Session B's requests automatically include
Cookie: session=AGENT_A - Session B is now running in Session A's authenticated context
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Valid observation. Currently non-severe since no caller passes pool= yet — ConnectionPool is opt-in infrastructure for a future PR. When wiring the pool into the orchestrator, we'll implement per-session cookie isolation (approach 1: store a per-session httpx.Cookies() on ConversationSession and pass it explicitly per-request). Leaving as a documented TODO for the wiring PR rather than adding unused complexity now.
There was a problem hiding this comment.
Acknowledged — this is a valid concern for when pool= is actually wired into the orchestrator. Currently no caller passes pool= (it's opt-in infrastructure), so this is not exploitable yet. When we wire it in during orchestrator integration, we'll need per-session cookie isolation (e.g., passing fresh cookies=httpx.Cookies() per-request or cloning the client). Added a note in the PR description checklist about this.
Restructure _attack_base() so that: - Chat endpoints are fetched independently of identity endpoints - Early return only fires when NEITHER identity nor chat surfaces exist - D7 BFLA tests run whenever chat endpoints are available, regardless of whether identity endpoints exist
…rkers) Soft privilege indicators like 'admin' commonly appear in refusal text (e.g. 'I cannot grant you admin access'). Previously priv_indicators alone could override the refusal classification, causing false positives. Now only sensitive markers (leaked keys/tokens) override a refusal.
\b doesn't fire adjacent to :: because both : and start-of-string are non-word characters. Replaced with explicit lookaround anchors. Added fourth alternative to handle mid-address :: (e.g. fe80::1).
Phase D Test Report — Live Scan against odinforgeai.comTest 1: Full argus scan (verbose) — PASSED
Test 2: Cinematic Dashboard — PASSED
Test 3: D8 PII Regex — PASSED (12/12)False positive tests (all correctly NOT matched):
True positive tests (all correctly matched):
IPv6 Regex Fix Verification (Round 5)
Escalation
|
…ts, 21 patterns), pin deps
…#52) * chore: bump version to 0.1.4 for PyPI release * feat: Phase D broaden detection — D5-D10, T7 D5: Tool/Function Discovery payloads for model_extraction agent - d5_tool_schema_extraction, d5_function_call_probing, d5_capability_enumeration - Updated _evaluate_response() and _record_intelligence() for D5 techniques D6: BOLA Payloads for privilege_escalation agent - 4 BOLA techniques: numeric IDOR, UUID swap, path traversal, mass assignment - _test_bola() and _report_bola() methods D7: Social Engineering BFLA for identity_spoof agent - 5 techniques: CEO urgency, compliance pressure, helpdesk, developer debug, time pressure - _test_social_engineering_bfla(), _evaluate_bfla_response(), _report_bfla() D8: PII Detection Expansion in DataCategoryMatcher - Phone numbers, SSN, credit cards (Visa/MC/Amex/Discover), IPv4, IPv6 - Date of birth, passport numbers, medical record IDs D10: Correlation Agent — 5 new compound attack path patterns - BOLA + model_extraction, BFLA + identity_spoof + priv_esc - Tool discovery + prompt injection, BOLA + cross-agent exfil - BFLA + memory poisoning T7: Connection Pooling in ConversationSession - ConnectionPool class with shared httpx.AsyncClient instances - Keyed by (host, timeout), singleton pattern, scan-scoped lifecycle - ConversationSession accepts optional pool= parameter * fix: address Devin Review findings — BFLA false positives, pool cache key 1. BFLA evaluation: when both refusal and compliance keywords are present but no hard evidence (markers/priv_indicators), treat as refusal. Fixes false positives where refusal messages mention 'password', 'secret', etc. 2. ConnectionPool cache key: include csrf_mode in the (host, timeout, csrf_mode) key to prevent incorrect client configuration when sessions with different csrf_mode values share the same pool. * fix: address Devin Review round 2 — PII regex, pool __aexit__ 1. pii_phone: add word boundaries and require at least one separator to avoid matching timestamps and numeric IDs. 2. pii_passport/pii_medical_id: require mandatory colon/equals separator and at least one digit in value via lookahead, preventing matches on English words like 'passport details' or 'patient id unknown'. 3. ConnectionPool __aexit__: always clear self._client = None regardless of _owns_client, so the use-after-exit guard in turn() fires correctly for pooled sessions. * fix: D7 BFLA unreachable when target has chat but no identity endpoints Restructure _attack_base() so that: - Chat endpoints are fetched independently of identity endpoints - Early return only fires when NEITHER identity nor chat surfaces exist - D7 BFLA tests run whenever chat endpoints are available, regardless of whether identity endpoints exist * fix: BFLA refusal filter — only override refusal on hard evidence (markers) Soft privilege indicators like 'admin' commonly appear in refusal text (e.g. 'I cannot grant you admin access'). Previously priv_indicators alone could override the refusal classification, causing false positives. Now only sensitive markers (leaked keys/tokens) override a refusal. * fix: IPv6 regex — use lookaround anchors instead of \b for :: forms \b doesn't fire adjacent to :: because both : and start-of-string are non-word characters. Replaced with explicit lookaround anchors. Added fourth alternative to handle mid-address :: (e.g. fe80::1). * chore: launch prep v0.1.5 — merge PRs #50/#51, update README (13 agents, 21 patterns), pin deps * fix: disable cookie persistence on pooled httpx clients to prevent cross-session state leakage * fix: ConnectionPool shares transport not client (proper cookie isolation) + update CLAUDE.md counts * fix: _owns_client=False when using pooled transport to prevent shared transport destruction * fix: _owns_client tracks actual pooled transport usage, not just pool presence --------- Co-authored-by: Andre Byrd <andre.byrd@odingard.com>
|
Closing — Phase D code is included in PR #52 (launch prep v0.1.5), which has been merged to main. |
Summary
Adds six new detection capabilities across existing agents and infrastructure, without introducing new agent classes:
model_extraction.py): 3 new tool/function discovery techniques (d5_tool_schema_extraction,d5_function_call_probing,d5_capability_enumeration) plus wiring in_evaluate_response()and_record_intelligence().privilege_escalation.py): 4 BOLA (Broken Object Level Authorization) payload sets — numeric IDOR, UUID swap, path traversal, mass assignment — with_test_bola()/_report_bola().identity_spoof.py): 5 social engineering BFLA techniques (CEO urgency, compliance pressure, helpdesk, developer debug, time pressure) with_test_social_engineering_bfla()/_evaluate_bfla_response()/_report_bfla().conductor/evaluation.py): 8 new PII patterns inDataCategoryMatcher.PATTERNS— phone, SSN, credit card (Visa/MC/Amex/Discover), IPv4, IPv6, date of birth, passport, medical record ID.correlation/engine.py): 5 new compound attack path rules chaining D5/D6/D7 findings with existing agents.conductor/session.py):ConnectionPoolclass — sharedhttpx.AsyncClientinstances keyed by(host, timeout, csrf_mode).ConversationSessionaccepts optionalpool=parameter; backward-compatible (no pool = existing behavior).No new agents, no existing tests modified. All changes are additive.
Updates since initial commit
Fixed issues flagged across five rounds of Devin Review:
identity_spoof.py):_evaluate_bfla_responsenow returnsNonewheneverrefusal_hitsis non-empty and there's no hard evidence (markers), regardless of whether compliance keywords or soft privilege indicators are present. Previously, refusal messages like "I cannot share password info" or "I cannot grant you admin access" would match compliance/privilege keywords and emit false findings.conductor/session.py): Cache key is now(host, timeout, csrf_mode)instead of(host, timeout)to prevent incorrect client configuration.conductor/evaluation.py): Added word boundaries (\b) and made separators mandatory to prevent matching timestamps and contiguous digit sequences.conductor/evaluation.py): Made colon/equals separator mandatory and added(?=[A-Z0-9]*\d)lookahead requiring at least one digit, preventing false matches on English words like "passport details".__aexit__(conductor/session.py): Always clearsself._client = Noneafter exit, even for pooled sessions, so theturn()use-after-exit guard works correctly.identity_spoof.py): Restructured_attack_base()so chat endpoints are fetched independently of identity endpoints. D7 BFLA tests now run whenever chat endpoints are available, even when no identity surface exists.identity_spoof.py): Refusal detection now only yields to hard evidence (sensitive markers like leaked keys/tokens). Soft privilege indicators like"admin"no longer override refusal classification, since they commonly appear in refusal text (e.g. "I cannot grant you admin access").conductor/evaluation.py): Replaced\bword boundaries with explicit lookaround anchors ((?:^|(?<=\s)|(?<=[=,;]))/(?=\s|$|[,;])) because\bdoesn't fire adjacent to::(both:and start-of-string are non-word characters). Added a fourth alternative to handle mid-address::forms likefe80::1. Verified:::1,fe80::1,2001:db8::, full-form addresses all match correctly. Known gap: IPv4-mapped form::ffff:192.168.1.1is not matched (dots in suffix).Review & Testing Checklist for Human
"executing","running","completed"could trigger false positives on non-compliant responses that happen to use those words without a refusal phrase present. Consider whether these are specific enough for your target population.ConnectionPoolshareshttpx.AsyncClientinstances, meaning all sessions on the same pooled client share a cookie jar. Currently non-impactful (no caller passespool=yet — opt-in infrastructure), but will need per-session cookie isolation when wired into the orchestrator.bfla_identity_spoof_privilege_escalationrequires{"identity_spoof", "privilege_escalation"}— same agent set as the pre-existingidentity_spoofing_privilege_escalationpattern. Both will fire for the same finding set, producing duplicate compound paths. Confirm this is intentional or deduplicate.pii_ipv6does not match::ffff:192.168.1.1because the dot-decimal suffix isn't covered by the hex-group alternatives. Decide if this edge case matters for your targets.argus scanagainst a target with a real AI chat/API endpoint to exercise D5/D6/D7 payloads end-to-end. Testing against odinforgeai.com confirmed all 13 agents deploy and complete without errors (3 findings, 2 validated from tool_poisoning), but D5/D6/D7 produced 0 findings because the target serves HTML (React SPA) rather than JSON API responses — the agents correctly skip non-JSON responses rather than crashing.Notes
_evaluate_bfla_responselogic andDataCategoryMatcherPII patterns.ConnectionPoolsingleton (shared()) is not thread-safe at construction time, which is fine for asyncio but would need a lock if ever used from multiple threads.Link to Devin session: https://app.devin.ai/sessions/8b0c5ca873934d77aa254157cc41924c
Requested by: @andrebyrd-odingard