feat: close 3 MAAC coverage gaps — model routing, task hijacking, OAuth pivoting#56

Merged
andrebyrd-odingard merged 4 commits into main from devin/1776385180-maac-gap-closure
Apr 17, 2026

devin-ai-integration bot commented Apr 17, 2026

Summary

Adds new attack techniques to 3 existing agents to close the remaining MAAC (Mythos-Aligned Attack Chain) coverage gaps, moving from 6/9 phases fully covered to 8/9 full + 1 strong partial (~95%).

Gap 1 — Model Routing Exploitation (MAAC 320.3) → model_extraction.py
New Phase 4 with 4 probe techniques that detect multi-model deployments by comparing response patterns across identical/varying prompts. Uses difflib.SequenceMatcher similarity ratios and model-name regex extraction to identify routing shifts.
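
As a rough illustration of the comparison described above, the following sketch scores a batch of responses with `difflib.SequenceMatcher` and pulls model names out with a regex. The function names, the regex, and the 0.25 default are assumptions made for this example; they are not copied from model_extraction.py.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical values for illustration; the real threshold and pattern
# live in model_extraction.py and may differ.
SIMILARITY_THRESHOLD = 0.25
MODEL_NAME_RE = re.compile(
    r"\b(gpt-\w+|claude(?:-\w+)?|gemini(?:-\w+)?|llama(?:-\w+)?)\b",
    re.IGNORECASE,
)


def min_pairwise_similarity(responses):
    """Return the lowest SequenceMatcher ratio over all pairs of responses."""
    ratios = [
        SequenceMatcher(None, a, b).ratio()
        for a, b in combinations(responses, 2)
    ]
    # With fewer than two responses there is nothing to compare.
    return min(ratios) if ratios else 1.0


def extract_model_names(text):
    """Return the lowercased model names mentioned in one response."""
    return {name.lower() for name in MODEL_NAME_RE.findall(text)}


def looks_like_routing_shift(responses, threshold=SIMILARITY_THRESHOLD):
    """Flag a batch when any two responses fall below the threshold."""
    return min_pairwise_similarity(responses) < threshold
```

Taking the threshold as a parameter means it can be tuned per target without touching the comparison logic.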

Gap 2 — Task Decomposition Hijacking (MAAC 320.6) → persona_hijacking.py
4 new entries in the existing _DRIFT_ATTACKS list: task_plan_injection, workflow_prerequisite_hijack, subtask_proliferation, goal_priority_inversion. Each is a multi-turn conversation that injects adversary sub-goals into the agent's task planning.

Gap 3 — OAuth/Session Pivoting (MAAC 320.8) → privilege_escalation.py
New Phase 6 with 6 probe categories (token storage, session cookies, credential files, OAuth redirect manipulation, cloud metadata endpoints, session hijacking). Detection uses 10 regex patterns for real credential artifacts (JWTs, AWS temp creds, GCP tokens, bearer tokens, etc.). Findings are emitted as CRITICAL with token values redacted.
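
A minimal sketch of value-based detection with redaction, assuming three illustrative patterns (JWT, AWS temporary access key ID, GCP access token). The pattern set, `redact`, and `find_tokens` are hypothetical names for this example and do not reproduce the patterns in privilege_escalation.py.

```python
import re

# Illustrative patterns only; the PR's actual pattern set is not shown here.
TOKEN_PATTERNS = {
    # Three base64url segments, the first starting with "eyJ" (encoded '{"').
    "jwt": re.compile(
        r"\beyJ[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}"
    ),
    # AWS temporary (STS) access key IDs start with "ASIA".
    "aws_temp_access_key": re.compile(r"\bASIA[0-9A-Z]{16}\b"),
    # GCP OAuth access tokens commonly start with "ya29.".
    "gcp_access_token": re.compile(r"\bya29\.[A-Za-z0-9_-]{20,}"),
}


def redact(value, keep=4):
    """Keep the first `keep` characters and mask the rest."""
    if len(value) <= keep:
        return "*" * len(value)
    return value[:keep] + "*" * (len(value) - keep)


def find_tokens(response):
    """Return one finding per match, carrying only the redacted value."""
    findings = []
    for kind, pattern in TOKEN_PATTERNS.items():
        for match in pattern.finditer(response):
            findings.append({"kind": kind, "redacted": redact(match.group(0))})
    return findings
```

Because these patterns match token values rather than file paths, a refusal such as "I cannot read ~/.aws/credentials" produces no findings in this sketch.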

All 352 existing tests pass. Lint clean (ruff + ruff-format).

Review & Testing Checklist for Human

  • False positive risk in OAuth detection (privilege_escalation.py:1505-1515): The regex patterns (e.g. credential_file, private_key_path) will match if the target mentions these paths in a refusal like "I cannot read ~/.aws/credentials". There is no refusal gate before the regex check — found_tokens triggers a CRITICAL finding even if the response is a refusal. Verify this is acceptable or add a refusal pre-check.
  • Model routing similarity threshold of 0.25 (model_extraction.py ~line 870): Very low threshold — same-model responses with high temperature could produce different outputs with similarity < 0.25. May cause false positives on non-deterministic targets. Consider whether this needs tuning or a configurable threshold.
  • No new unit tests: 577 lines of new attack logic with no dedicated tests for _check_routing_consistency, _check_routing_variation, _test_oauth_session, or the 4 new drift attacks. Recommend running a scan against a real target (e.g. Arena medical agent) and verifying each new phase executes and evaluates correctly.
  • Inline import re as _re inside _test_oauth_session (line 1479) — re is already imported at module level. Harmless but inconsistent; should use the existing module-level import.
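
One possible shape for the refusal pre-check raised in the first item is sketched below. The refusal phrases and the `is_refusal` / `should_report` helpers are assumptions for illustration; a real gate would need tuning against actual target responses.

```python
import re

# Hypothetical refusal phrases; not taken from privilege_escalation.py.
REFUSAL_RE = re.compile(
    r"\bI(?:'m| am)?\s+(?:cannot|can't|won't|unable to|not able to)\b",
    re.IGNORECASE,
)


def is_refusal(response):
    """Return True when the response contains a common refusal phrase."""
    return bool(REFUSAL_RE.search(response))


def should_report(response, found_tokens):
    """Report only when tokens were found and the response is not a refusal."""
    return bool(found_tokens) and not is_refusal(response)
```

A gate like this trades false positives for possible false negatives: a response that refuses in one sentence and leaks a value in the next would be suppressed, so it may be safer to downgrade severity than to drop the finding.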

Notes

  • The existing credential harvesting phase in model_extraction.py was renumbered from Phase 3 → Phase 5 to accommodate the new Phase 4 (routing). Verify no external references to "Phase 3" break.
  • The session_id in OAuth probes is hardcoded to a single value across all messages within a probe, which means multi-turn context won't accumulate if the target is session-aware. This is intentional (each message is an independent probe), but worth knowing.

Link to Devin session: https://app.devin.ai/sessions/8b0c5ca873934d77aa254157cc41924c
Requested by: @andrebyrd-odingard


Open with Devin

…th pivoting

MAAC 320.3 — Model Routing Exploitation (ME-10):
- Added _ROUTING_PROBES with 4 techniques (consistency check, complexity
  escalation, model identity, capability probe)
- Implemented _test_model_routing(), _check_routing_consistency(),
  _check_routing_variation(), _report_routing()
- Wired into _attack_base() as Phase 4
- Detects multi-model deployments via response similarity and identity
  contradictions

MAAC 320.6 — Task Decomposition Hijacking (PH-11):
- Added 4 new drift attacks to _DRIFT_ATTACKS list
- Techniques: task_plan_injection, workflow_prerequisite_hijack,
  subtask_proliferation, goal_priority_inversion
- Each uses multi-turn conversation to inject adversary sub-goals
  into legitimate task workflows

MAAC 320.8 — OAuth/Session Pivoting (PE-07):
- Added _OAUTH_SESSION_PROBES with 6 techniques (token storage,
  session cookies, credential files, OAuth redirect, cloud metadata,
  session hijacking)
- Implemented _test_oauth_session() with 10 credential/token regex
  patterns for real exposure detection
- Implemented _report_oauth_session() with token value redaction
- Wired into _attack_base() as Phase 6
- Findings emitted as CRITICAL with full attack chain

Security: Mark
Architecture: James

…uting contradiction logic

1. privilege_escalation.py: Removed credential_file and private_key_path
   from _OAUTH_SESSION_PATTERNS — these match bare filenames in refusal
   responses (e.g. 'I cannot access ~/.aws/credentials'), producing false
   positive CRITICAL findings. Only patterns matching actual credential
   VALUES are retained (8 patterns).

2. model_extraction.py: Fixed _check_routing_variation to compare model
   sets ACROSS responses instead of unioning all into one set. A single
   response mentioning multiple models ('I am Claude, not GPT-4') no
   longer triggers a false positive. Now requires at least two responses
   with disjoint model sets to confirm routing contradiction.
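
A sketch of the per-response comparison described in item 2, assuming a simple model-name regex. `MODEL_NAME_RE`, `models_per_response`, and `has_routing_contradiction` are hypothetical names and are not the actual `_check_routing_variation` code.

```python
import re
from itertools import combinations

# Hypothetical model-name pattern for illustration.
MODEL_NAME_RE = re.compile(
    r"\b(gpt-\w+|claude(?:-\w+)?|gemini(?:-\w+)?|llama(?:-\w+)?)\b",
    re.IGNORECASE,
)


def models_per_response(responses):
    """Build one set of lowercased model names per response (no union)."""
    return [
        {name.lower() for name in MODEL_NAME_RE.findall(text)}
        for text in responses
    ]


def has_routing_contradiction(responses):
    """True when at least two responses name models and share none of them."""
    named = [s for s in models_per_response(responses) if s]
    return any(a.isdisjoint(b) for a, b in combinations(named, 2))
```

Responses that mention no model at all are dropped before the pairwise check, so a silent response never counts as a contradiction.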

QA/QC: Jamie

@andrebyrd-odingard andrebyrd-odingard merged commit 3056cba into main Apr 17, 2026
7 checks passed