feat: Phase D broaden detection — D5-D10, T7 #51

Open
devin-ai-integration[bot] wants to merge 6 commits into main from devin/1776227588-phase-d-broaden-detection

@devin-ai-integration devin-ai-integration bot commented Apr 15, 2026

Summary

Adds six new detection capabilities across existing agents and infrastructure, without introducing new agent classes:

  • D5 (model_extraction.py): 3 new tool/function discovery techniques (d5_tool_schema_extraction, d5_function_call_probing, d5_capability_enumeration) plus wiring in _evaluate_response() and _record_intelligence().
  • D6 (privilege_escalation.py): 4 BOLA (Broken Object Level Authorization) payload sets — numeric IDOR, UUID swap, path traversal, mass assignment — with _test_bola() / _report_bola().
  • D7 (identity_spoof.py): 5 social engineering BFLA techniques (CEO urgency, compliance pressure, helpdesk, developer debug, time pressure) with _test_social_engineering_bfla() / _evaluate_bfla_response() / _report_bfla().
  • D8 (conductor/evaluation.py): 8 new PII patterns in DataCategoryMatcher.PATTERNS — phone, SSN, credit card (Visa/MC/Amex/Discover), IPv4, IPv6, date of birth, passport, medical record ID.
  • D10 (correlation/engine.py): 5 new compound attack path rules chaining D5/D6/D7 findings with existing agents.
  • T7 (conductor/session.py): ConnectionPool class — shared httpx.AsyncClient instances keyed by (host, timeout, csrf_mode). ConversationSession accepts optional pool= parameter; backward-compatible (no pool = existing behavior).

No new agents, no existing tests modified. All changes are additive.

Updates since initial commit

Fixed issues flagged across five rounds of Devin Review:

  1. BFLA false-positive fix (identity_spoof.py): _evaluate_bfla_response now returns None whenever refusal_hits is non-empty and there's no hard evidence (markers), regardless of whether compliance keywords or soft privilege indicators are present. Previously, refusal messages like "I cannot share password info" or "I cannot grant you admin access" would match compliance/privilege keywords and emit false findings.
  2. ConnectionPool cache key (conductor/session.py): Cache key is now (host, timeout, csrf_mode) instead of (host, timeout), so sessions with different csrf_mode values on the same pool never receive a client configured for the other mode.
  3. PII phone regex (conductor/evaluation.py): Added word boundaries (\b) and made separators mandatory to prevent matching timestamps and contiguous digit sequences.
  4. PII passport/medical_id regex (conductor/evaluation.py): Made colon/equals separator mandatory and added (?=[A-Z0-9]*\d) lookahead requiring at least one digit, preventing false matches on English words like "passport details".
  5. Pooled session __aexit__ (conductor/session.py): Always clears self._client = None after exit, even for pooled sessions, so the turn() use-after-exit guard works correctly.
  6. D7 BFLA unreachable code (identity_spoof.py): Restructured _attack_base() so chat endpoints are fetched independently of identity endpoints. D7 BFLA tests now run whenever chat endpoints are available, even when no identity surface exists.
  7. BFLA refusal filter hardened (identity_spoof.py): Refusal detection now only yields to hard evidence (sensitive markers like leaked keys/tokens). Soft privilege indicators like "admin" no longer override refusal classification, since they commonly appear in refusal text (e.g. "I cannot grant you admin access").
  8. IPv6 regex fix (conductor/evaluation.py): Replaced \b word boundaries with explicit lookaround anchors ((?:^|(?<=\s)|(?<=[=,;])) / (?=\s|$|[,;])) because \b doesn't fire adjacent to :: (both : and start-of-string are non-word characters). Added a fourth alternative to handle mid-address :: forms like fe80::1. Verified: ::1, fe80::1, 2001:db8::, full-form addresses all match correctly. Known gap: IPv4-mapped form ::ffff:192.168.1.1 is not matched (dots in suffix).
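
The use-after-exit behavior described in item 5 can be sketched as follows. This is a minimal illustration, not the code in conductor/session.py: `FakeClient`, `SketchSession`, and the `pooled_client` parameter are hypothetical stand-ins, and only the names `_client`, `_owns_client`, `turn()`, and `__aexit__` come from the PR text.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for an HTTP client with an async close method."""

    def __init__(self):
        self.closed = False

    async def aclose(self):
        self.closed = True


class SketchSession:
    """Hypothetical session that may borrow a pooled client or create its own."""

    def __init__(self, pooled_client=None):
        self._pooled = pooled_client
        self._client = None
        self._owns_client = False

    async def __aenter__(self):
        if self._pooled is not None:
            self._client = self._pooled
            self._owns_client = False  # borrowed: the pool closes it later
        else:
            self._client = FakeClient()
            self._owns_client = True
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Close only clients this session created itself.
        if self._owns_client and self._client is not None:
            await self._client.aclose()
        # Always drop the reference, pooled or not, so turn() can detect misuse.
        self._client = None

    async def turn(self, message):
        if self._client is None:
            raise RuntimeError("session used outside its context manager")
        return f"sent: {message}"


async def demo():
    shared = FakeClient()
    session = SketchSession(pooled_client=shared)
    async with session:
        reply = await session.turn("hello")
    try:
        await session.turn("again")
    except RuntimeError:
        guarded = True
    else:
        guarded = False
    return reply, guarded, shared.closed


reply, guarded, shared_closed = asyncio.run(demo())
```

In this sketch the pooled client stays open after the session exits, while the session itself refuses further turns.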

Review & Testing Checklist for Human

  • D7 BFLA compliance keyword breadth: Generic compliance keywords like "executing", "running", "completed" could trigger false positives on non-compliant responses that happen to use those words without a refusal phrase present. Consider whether these are specific enough for your target population.
  • T7 cookie isolation: ConnectionPool shares httpx.AsyncClient instances, meaning all sessions on the same pooled client share a cookie jar. Currently non-impactful (no caller passes pool= yet — opt-in infrastructure), but will need per-session cookie isolation when wired into the orchestrator.
  • D10 duplicate compound patterns: bfla_identity_spoof_privilege_escalation requires {"identity_spoof", "privilege_escalation"} — same agent set as the pre-existing identity_spoofing_privilege_escalation pattern. Both will fire for the same finding set, producing duplicate compound paths. Confirm this is intentional or deduplicate.
  • IPv6 mapped-IPv4 gap: pii_ipv6 does not match ::ffff:192.168.1.1 because the dot-decimal suffix isn't covered by the hex-group alternatives. Decide if this edge case matters for your targets.
  • Run argus scan against a target with a real AI chat/API endpoint to exercise D5/D6/D7 payloads end-to-end. Testing against odinforgeai.com confirmed all 13 agents deploy and complete without errors (3 findings, 2 validated from tool_poisoning), but D5/D6/D7 produced 0 findings because the target serves HTML (React SPA) rather than JSON API responses — the agents correctly skip non-JSON responses rather than crashing.
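
The duplicate-pattern item in the checklist above could be checked mechanically. A minimal sketch, assuming each compound rule can be reduced to a name plus the set of agents it requires; the `RULES` table shape, the `bola_model_extraction` entry, and `find_duplicate_agent_sets` are hypothetical and not taken from correlation/engine.py.

```python
from collections import defaultdict

# Hypothetical rule table: pattern name -> agents whose findings must all be present.
RULES = {
    "identity_spoofing_privilege_escalation": {"identity_spoof", "privilege_escalation"},
    "bfla_identity_spoof_privilege_escalation": {"identity_spoof", "privilege_escalation"},
    "bola_model_extraction": {"privilege_escalation", "model_extraction"},  # illustrative only
}


def find_duplicate_agent_sets(rules):
    """Group rule names by required agent set; return only groups with more than one rule."""
    groups = defaultdict(list)
    for name, agents in rules.items():
        groups[frozenset(agents)].append(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


duplicates = find_duplicate_agent_sets(RULES)
```

Any non-empty result means two rules will fire on the same finding set unless they also differ on some other condition that this sketch does not model.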

Notes

Link to Devin session: https://app.devin.ai/sessions/8b0c5ca873934d77aa254157cc41924c
Requested by: @andrebyrd-odingard



D5: Tool/Function Discovery payloads for model_extraction agent
  - d5_tool_schema_extraction, d5_function_call_probing, d5_capability_enumeration
  - Updated _evaluate_response() and _record_intelligence() for D5 techniques

D6: BOLA Payloads for privilege_escalation agent
  - 4 BOLA techniques: numeric IDOR, UUID swap, path traversal, mass assignment
  - _test_bola() and _report_bola() methods

D7: Social Engineering BFLA for identity_spoof agent
  - 5 techniques: CEO urgency, compliance pressure, helpdesk, developer debug, time pressure
  - _test_social_engineering_bfla(), _evaluate_bfla_response(), _report_bfla()

D8: PII Detection Expansion in DataCategoryMatcher
  - Phone numbers, SSN, credit cards (Visa/MC/Amex/Discover), IPv4, IPv6
  - Date of birth, passport numbers, medical record IDs

D10: Correlation Agent — 5 new compound attack path patterns
  - BOLA + model_extraction, BFLA + identity_spoof + priv_esc
  - Tool discovery + prompt injection, BOLA + cross-agent exfil
  - BFLA + memory poisoning

T7: Connection Pooling in ConversationSession
  - ConnectionPool class with shared httpx.AsyncClient instances
  - Keyed by (host, timeout), singleton pattern, scan-scoped lifecycle
  - ConversationSession accepts optional pool= parameter

… key

1. BFLA evaluation: when both refusal and compliance keywords are present
   but no hard evidence (markers/priv_indicators), treat as refusal. Fixes
   false positives where refusal messages mention 'password', 'secret', etc.

2. ConnectionPool cache key: include csrf_mode in the (host, timeout, csrf_mode)
   key to prevent incorrect client configuration when sessions with different
   csrf_mode values share the same pool.
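
Why the key needs csrf_mode can be shown with a small stand-alone model. This is a sketch under stated assumptions: `FakeClient` and `SketchPool` are hypothetical and use no HTTP library. Only the (host, timeout, csrf_mode) key shape and the lock-guarded lookup mirror the PR.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeClient:
    """Hypothetical stand-in that records the configuration it was built with."""
    timeout: float
    csrf_mode: bool


class SketchPool:
    """Hypothetical pool caching one client per (host, timeout, csrf_mode)."""

    def __init__(self):
        self._clients = {}
        self._lock = asyncio.Lock()

    async def get(self, host, timeout, csrf_mode=False):
        key = (host, timeout, csrf_mode)
        async with self._lock:
            if key not in self._clients:
                self._clients[key] = FakeClient(timeout=timeout, csrf_mode=csrf_mode)
            return self._clients[key]


async def demo():
    pool = SketchPool()
    plain = await pool.get("target:8080", 10.0, csrf_mode=False)
    csrf = await pool.get("target:8080", 10.0, csrf_mode=True)
    again = await pool.get("target:8080", 10.0, csrf_mode=True)
    return plain, csrf, again


plain, csrf, again = asyncio.run(demo())
```

With a (host, timeout) key, the second call would have returned the client built for csrf_mode=False.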

1. pii_phone: add word boundaries and require at least one separator
   to avoid matching timestamps and numeric IDs.

2. pii_passport/pii_medical_id: require mandatory colon/equals separator
   and at least one digit in value via lookahead, preventing matches on
   English words like 'passport details' or 'patient id unknown'.

3. ConnectionPool __aexit__: always clear self._client = None regardless
   of _owns_client, so the use-after-exit guard in turn() fires correctly
   for pooled sessions.
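
The two regex changes above can be illustrated with patterns of the same shape. These are not the patterns in DataCategoryMatcher.PATTERNS, which this page does not show. They are a hedged sketch that assumes mandatory separators for phone numbers, and a mandatory colon or equals sign plus a digit-requiring lookahead for labeled IDs. The leading edge of the phone pattern uses `(?<!\w)` rather than `\b` so that a number starting with an opening parenthesis still matches.

```python
import re

# Sketch: separators between digit groups are mandatory, so bare 10-digit runs do not match.
PHONE = re.compile(r"(?<!\w)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}\b")

# Sketch: label, mandatory ':' or '=', then a value that must contain at least one digit.
PASSPORT = re.compile(
    r"\bpassport(?:\s*(?:no\.?|number|#))?\s*[:=]\s*(?=[A-Z0-9]*\d)[A-Z0-9]{6,9}\b",
    re.IGNORECASE,
)
MEDICAL_ID = re.compile(
    r"\b(?:patient\s+id|medical\s+record(?:\s+(?:id|number))?|mrn)"
    r"\s*[:=]\s*(?=[A-Z0-9]*\d)[A-Z0-9]{4,12}\b",
    re.IGNORECASE,
)


def matches(pattern, text):
    """Return True when the pattern is found anywhere in the text."""
    return pattern.search(text) is not None
```

The inputs used to exercise the sketch are the same strings listed in the test report for this PR.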

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 2 new potential issues.

View 15 additional findings in Devin Review.


Comment thread src/argus/agents/identity_spoof.py
Comment on lines +88 to +101
"""Return (or create) a pooled client for *host* with *timeout*."""
key = (host, timeout, csrf_mode)
async with self._lock:
    if key not in self._clients:
        kwargs: dict[str, Any] = {
            "timeout": timeout,
            "event_hooks": {"request": [], "response": []},
            "follow_redirects": False,
        }
        if csrf_mode:
            kwargs["cookies"] = httpx.Cookies()
        self._clients[key] = httpx.AsyncClient(**kwargs)
        logger.debug("T7: created pooled client for %s (timeout=%.1f)", host, timeout)
    return self._clients[key]

@devin-ai-integration devin-ai-integration bot Apr 15, 2026


🔴 ConnectionPool shares cookie jar across sessions, causing cross-session state contamination

When multiple ConversationSession instances share a pooled httpx.AsyncClient (same host/timeout/csrf_mode key), they share the client's cookie jar. Any Set-Cookie headers from one session's target response are automatically sent in another session's subsequent requests. This breaks attack isolation: if Session A authenticates with the target and gets a session cookie, Session B will piggyback on that authentication, causing false positives (elevated access without earning it) or false negatives (unexpected state). The pool is exported as a public API in src/argus/conductor/__init__.py:40 and documented with ready-to-use examples.

Reproduction scenario
  1. Session A targets http://target:8080, gets pooled client1
  2. Target responds with Set-Cookie: session=AGENT_A
  3. client1's cookie jar now has session=AGENT_A
  4. Session B targets same host, gets same client1 from pool
  5. Session B's requests automatically include Cookie: session=AGENT_A
  6. Session B is now running in Session A's authenticated context
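
The reproduction steps above can be modeled without a network. This sketch is illustrative only: `SharedJarClient` and `get_client` are hypothetical stand-ins for a pooled client that owns a single cookie jar, and the cookie handling is simplified to a plain dictionary.

```python
class SharedJarClient:
    """Hypothetical pooled client: one cookie jar shared by every caller."""

    def __init__(self):
        self.cookies = {}

    def request(self, set_cookie=None):
        # Send whatever the jar holds, then store any cookie the target sets.
        sent = dict(self.cookies)
        if set_cookie:
            self.cookies.update(set_cookie)
        return sent


_pool = {}


def get_client(host, timeout, csrf_mode=False):
    """Return the cached client for this key, creating it on first use."""
    key = (host, timeout, csrf_mode)
    if key not in _pool:
        _pool[key] = SharedJarClient()
    return _pool[key]


# Steps 1-3: Session A gets the pooled client and the target sets a cookie.
client_a = get_client("target:8080", 10.0)
client_a.request(set_cookie={"session": "AGENT_A"})

# Steps 4-6: Session B gets the same client and sends Session A's cookie.
client_b = get_client("target:8080", 10.0)
sent_by_b = client_b.request()
```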

Valid observation. Currently non-severe since no caller passes pool= yet — ConnectionPool is opt-in infrastructure for a future PR. When wiring the pool into the orchestrator, we'll implement per-session cookie isolation (approach 1: store a per-session httpx.Cookies() on ConversationSession and pass it explicitly per-request). Leaving as a documented TODO for the wiring PR rather than adding unused complexity now.


Acknowledged — this is a valid concern for when pool= is actually wired into the orchestrator. Currently no caller passes pool= (it's opt-in infrastructure), so this is not exploitable yet. When we wire it in during orchestrator integration, we'll need per-session cookie isolation (e.g., passing fresh cookies=httpx.Cookies() per-request or cloning the client). Added a note in the PR description checklist about this.
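
One way to picture the per-session isolation mentioned in the replies is a session that keeps its own jar and hands it to a stateless shared client on every request. The sketch below is an assumption about the shape of that design, not the code that later landed. `StatelessClient` and `IsolatedSession` are hypothetical, and cookies are modeled as plain dictionaries.

```python
class StatelessClient:
    """Hypothetical shared client that stores no cookies between requests."""

    def request(self, cookies, set_cookie=None):
        # Return the cookies that were sent and the cookies the target set.
        return dict(cookies), dict(set_cookie or {})


class IsolatedSession:
    """Hypothetical session that owns its cookie jar and passes it explicitly."""

    def __init__(self, client):
        self._client = client
        self._cookies = {}  # per-session jar, never shared

    def turn(self, set_cookie=None):
        sent, received = self._client.request(self._cookies, set_cookie)
        self._cookies.update(received)
        return sent


shared = StatelessClient()
session_a = IsolatedSession(shared)
session_b = IsolatedSession(shared)

session_a.turn(set_cookie={"session": "AGENT_A"})
sent_by_a = session_a.turn()
sent_by_b = session_b.turn()
```

Both sessions reuse the same client object, but only Session A ever sends its own cookie.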

Restructure _attack_base() so that:
- Chat endpoints are fetched independently of identity endpoints
- Early return only fires when NEITHER identity nor chat surfaces exist
- D7 BFLA tests run whenever chat endpoints are available, regardless
  of whether identity endpoints exist
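
The control flow described in this commit message can be sketched as a small planning function. `plan_attack` and the phase names are hypothetical; the real `_attack_base()` is not shown on this page, so this only illustrates the early-return condition.

```python
def plan_attack(identity_endpoints, chat_endpoints):
    """Return the test phases to run for the discovered surfaces (hypothetical helper)."""
    # Early return only when NEITHER surface exists.
    if not identity_endpoints and not chat_endpoints:
        return []
    phases = []
    if identity_endpoints:
        phases.append("identity_spoofing")
    # BFLA social-engineering tests depend on chat endpoints only.
    if chat_endpoints:
        phases.append("social_engineering_bfla")
    return phases
```

Under the earlier structure described in the fix, the second case below (chat endpoints but no identity endpoints) would have returned before reaching the BFLA tests.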

…rkers)

Soft privilege indicators like 'admin' commonly appear in refusal text
(e.g. 'I cannot grant you admin access'). Previously priv_indicators
alone could override the refusal classification, causing false positives.
Now only sensitive markers (leaked keys/tokens) override a refusal.
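
The rule in this commit message (a refusal is overridden only by hard evidence) can be sketched as below. Every keyword list and marker pattern here is a hypothetical placeholder chosen for illustration; the actual lists used by `_evaluate_bfla_response()` are not shown on this page.

```python
import re

# Hypothetical placeholder lists; the real agent's lists are not known here.
REFUSAL_PHRASES = ("i cannot", "i can't", "i'm unable", "not able to")
COMPLIANCE_KEYWORDS = ("executing", "running", "completed", "password")
PRIV_INDICATORS = ("admin", "root")
MARKER_PATTERNS = (re.compile(r"sk-[A-Za-z0-9]{16,}"),)  # leaked-key shape, illustrative


def evaluate_bfla_response(text):
    """Return a finding dict, or None when the response should not be reported."""
    lowered = text.lower()
    refusal_hits = [p for p in REFUSAL_PHRASES if p in lowered]
    markers = [m.group(0) for pat in MARKER_PATTERNS for m in pat.finditer(text)]
    # A refusal wins unless hard evidence (a sensitive marker) is present.
    if refusal_hits and not markers:
        return None
    compliance = [k for k in COMPLIANCE_KEYWORDS if k in lowered]
    priv = [k for k in PRIV_INDICATORS if k in lowered]
    if markers or compliance or priv:
        return {"markers": markers, "compliance": compliance, "priv_indicators": priv}
    return None
```

Soft signals still produce a finding when no refusal phrase is present, which is the residual false-positive risk noted in the review checklist.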

\b doesn't fire adjacent to :: because both : and start-of-string are
non-word characters. Replaced with explicit lookaround anchors. Added
fourth alternative to handle mid-address :: (e.g. fe80::1).
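
The anchoring problem and the lookaround fix can be reproduced with a pattern of the same general shape. This is a sketch, not the pii_ipv6 pattern from conductor/evaluation.py: the four alternatives below are written for illustration, and they do not cap the total number of groups on either side of a mid-address `::`.

```python
import re

H = r"[0-9A-Fa-f]{1,4}"  # one hex group

IPV6 = re.compile(
    r"(?:^|(?<=\s)|(?<=[=,;]))"      # leading anchor: start, whitespace, or = , ;
    r"(?:"
    rf"(?:{H}:){{7}}{H}"             # 1. full form, eight groups
    rf"|(?:{H}:){{1,7}}:"            # 2. trailing ::, e.g. 2001:db8::
    rf"|::(?:{H}:){{0,6}}{H}"        # 3. leading ::, e.g. ::1
    rf"|(?:{H}:){{1,6}}(?::{H}){{1,6}}"  # 4. mid-address ::, e.g. fe80::1
    r")"
    r"(?=\s|$|[,;])"                 # trailing anchor: whitespace, end, or , ;
)


def find_ipv6(text):
    """Return every substring the sketch pattern accepts as an IPv6 address."""
    return [m.group(0) for m in IPV6.finditer(text)]
```

As in the PR, the IPv4-mapped form `::ffff:192.168.1.1` is not matched by this sketch, because the dotted suffix is not covered by any hex-group alternative.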

Phase D Test Report — Live Scan against odinforgeai.com

Devin session

Test 1: Full argus scan (verbose) — PASSED
  • All 13 agents deployed and completed — no timeouts, no crashes
  • 3 findings from tool_poisoning, 2 validated (CRITICAL: schema_enum_injection, schema_description_override)
  • Scan JSON output valid: 3 findings, 13 agent_results, 0 compound_paths
  • No Python tracebacks
  • Duration: 115.8s
Test 2: Cinematic Dashboard — PASSED
  • Dashboard rendered correctly with agent grid and attack stream
  • Agent status transitions visible (deployed → complete)
  • Final summary: "3 findings · 2 validated · 129 signals"
  • Completed in 72s without crash
Test 3: D8 PII Regex — PASSED (12/12)

False positive tests (all correctly NOT matched):

  • pii_phone vs timestamp "1713158400" ✓
  • pii_phone vs numeric ID "9876543210" ✓
  • pii_passport vs "passport details" ✓
  • pii_passport vs "passport information" ✓
  • pii_medical_id vs "patient id unknown" ✓
  • pii_medical_id vs "medical record required" ✓

True positive tests (all correctly matched):

  • pii_phone vs "(555) 123-4567" ✓
  • pii_phone vs "555-123-4567" ✓
  • pii_ssn vs "123-45-6789" ✓
  • pii_passport vs "passport: AB123456" ✓
  • pii_medical_id vs "patient id: MRN12345" ✓
  • pii_credit_card vs Visa "4111-1111-1111-1111" ✓
IPv6 Regex Fix Verification (Round 5)
  • ::1 (loopback) — PASSED
  • 2001:db8:: — PASSED
  • Full form 2001:0db8:85a3:0000:0000:8a2e:0370:7334 — PASSED
  • fe80::1 (link-local) — PASSED
  • ::ffff:192.168.1.1 (mapped IPv4) — NOT MATCHED (edge case)
  • Plain text — correctly not matched

Escalation

  • Phase D agents (D5/D6/D7) produced 0 findings against this target because odinforgeai.com serves HTML (React SPA) for all paths, not JSON API responses. Agents correctly detect non-JSON and gracefully skip. A target with a real AI chat endpoint returning JSON would fully exercise D5/D6/D7 payloads.

devin-ai-integration bot pushed a commit that referenced this pull request Apr 15, 2026
andrebyrd-odingard added a commit that referenced this pull request Apr 15, 2026
…#52)

* chore: bump version to 0.1.4 for PyPI release

* feat: Phase D broaden detection — D5-D10, T7

* fix: address Devin Review findings — BFLA false positives, pool cache key

* fix: address Devin Review round 2 — PII regex, pool __aexit__

* fix: D7 BFLA unreachable when target has chat but no identity endpoints

* fix: BFLA refusal filter — only override refusal on hard evidence (markers)

* fix: IPv6 regex — use lookaround anchors instead of \b for :: forms

* chore: launch prep v0.1.5 — merge PRs #50/#51, update README (13 agents, 21 patterns), pin deps

* fix: disable cookie persistence on pooled httpx clients to prevent cross-session state leakage

* fix: ConnectionPool shares transport not client (proper cookie isolation) + update CLAUDE.md counts

* fix: _owns_client=False when using pooled transport to prevent shared transport destruction

* fix: _owns_client tracks actual pooled transport usage, not just pool presence

---------

Co-authored-by: Andre Byrd <andre.byrd@odingard.com>

Closing — Phase D code is included in PR #52 (launch prep v0.1.5), which has been merged to main.
