feat: close 3 MAAC coverage gaps - model routing, task hijacking, OAuth pivoting #56
Merged
andrebyrd-odingard merged 4 commits into main on Apr 17, 2026
…th pivoting

MAAC 320.3 - Model Routing Exploitation (ME-10):
- Added _ROUTING_PROBES with 4 techniques (consistency check, complexity escalation, model identity, capability probe)
- Implemented _test_model_routing(), _check_routing_consistency(), _check_routing_variation(), _report_routing()
- Wired into _attack_base() as Phase 4
- Detects multi-model deployments via response similarity and identity contradictions

MAAC 320.6 - Task Decomposition Hijacking (PH-11):
- Added 4 new drift attacks to _DRIFT_ATTACKS list
- Techniques: task_plan_injection, workflow_prerequisite_hijack, subtask_proliferation, goal_priority_inversion
- Each uses multi-turn conversation to inject adversary sub-goals into legitimate task workflows

MAAC 320.8 - OAuth/Session Pivoting (PE-07):
- Added _OAUTH_SESSION_PROBES with 6 techniques (token storage, session cookies, credential files, OAuth redirect, cloud metadata, session hijacking)
- Implemented _test_oauth_session() with 10 credential/token regex patterns for real exposure detection
- Implemented _report_oauth_session() with token value redaction
- Wired into _attack_base() as Phase 6
- Findings emitted as CRITICAL with full attack chain

Security: Mark
Architecture: James
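The response-similarity part of the Phase 4 check can be illustrated with a small sketch. This is not the code from `_check_routing_consistency()`, which is not shown here; the function names and the 0.5 threshold are assumptions chosen for illustration. It sends the same prompt several times (not shown), then averages `difflib.SequenceMatcher` ratios over every pair of responses.

```python
from difflib import SequenceMatcher
from itertools import combinations


def mean_pairwise_similarity(responses: list[str]) -> float:
    """Average SequenceMatcher ratio over every pair of responses."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        # Zero or one response: nothing to compare, treat as fully consistent.
        return 1.0
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


def looks_inconsistent(responses: list[str], threshold: float = 0.5) -> bool:
    """Flag answers to one prompt whose mean similarity is below threshold.

    The threshold is a hypothetical default, not a value taken from the PR.
    """
    return mean_pairwise_similarity(responses) < threshold
```

Low similarity alone is weak evidence, since sampling temperature also produces variation, which is presumably why the commit pairs it with identity contradictions.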
…uting contradiction logic
1. privilege_escalation.py: Removed credential_file and private_key_path
from _OAUTH_SESSION_PATTERNS — these match bare filenames in refusal
responses (e.g. 'I cannot access ~/.aws/credentials'), producing false
positive CRITICAL findings. Only patterns matching actual credential
VALUES are retained (8 patterns).
2. model_extraction.py: Fixed _check_routing_variation to compare model
sets ACROSS responses instead of unioning all into one set. A single
response mentioning multiple models ('I am Claude, not GPT-4') no
longer triggers a false positive. Now requires at least two responses
with disjoint model sets to confirm routing contradiction.
QA/QC: Jamie
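The second fix above (compare model sets across responses rather than unioning them) can be sketched as follows. The model-name regex and function names are assumptions for illustration; the actual pattern used by `_check_routing_variation` is not shown in this PR.

```python
import re

# Hypothetical model-name pattern; the real extraction regex is not shown.
_MODEL_NAME_RE = re.compile(
    r"\b(claude|gpt-4|gpt-3\.5|gemini|llama)\b", re.IGNORECASE
)


def extract_models(response: str) -> frozenset[str]:
    """Return the lower-cased model names mentioned in one response."""
    return frozenset(name.lower() for name in _MODEL_NAME_RE.findall(response))


def has_routing_contradiction(responses: list[str]) -> bool:
    """True only if two responses name completely different model sets."""
    # Responses that mention no model carry no identity claim, so skip them.
    model_sets = [s for s in map(extract_models, responses) if s]
    return any(
        a.isdisjoint(b)
        for i, a in enumerate(model_sets)
        for b in model_sets[i + 1:]
    )
```

With this shape, a single response such as 'I am Claude, not GPT-4' yields one set and therefore no pair to compare, so it cannot trigger a finding on its own.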
Summary
Adds new attack techniques to 3 existing agents to close the remaining MAAC (Mythos-Aligned Attack Chain) coverage gaps, moving from 6/9 phases fully covered to 8/9 full + 1 strong partial (~95%).
Gap 1: Model Routing Exploitation (MAAC 320.3) → model_extraction.py
New Phase 4 with 4 probe techniques that detect multi-model deployments by comparing response patterns across identical/varying prompts. Uses difflib.SequenceMatcher similarity ratios and model-name regex extraction to identify routing shifts.

Gap 2: Task Decomposition Hijacking (MAAC 320.6) → persona_hijacking.py
4 new entries in the existing _DRIFT_ATTACKS list: task_plan_injection, workflow_prerequisite_hijack, subtask_proliferation, goal_priority_inversion. Multi-turn conversations that inject adversary sub-goals into the agent's task planning.

Gap 3: OAuth/Session Pivoting (MAAC 320.8) → privilege_escalation.py
New Phase 6 with 6 probe categories (token storage, session cookies, credential files, OAuth redirect manipulation, cloud metadata endpoints, session hijacking). Detection uses 10 regex patterns for real credential artifacts (JWTs, AWS temp creds, GCP tokens, bearer tokens, etc.). Findings are emitted as CRITICAL with token values redacted.
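As a rough illustration of the Gap 3 detection and redaction step, the sketch below matches credential values rather than file paths and masks each match before reporting. The three patterns and all names are assumptions for illustration, not the contents of `_OAUTH_SESSION_PATTERNS`; they rely only on well-known prefixes (`eyJ` for JWTs, `ASIA` for AWS temporary access key IDs, `ya29.` for Google OAuth access tokens).

```python
import re

# Illustrative value patterns only; the PR's real pattern set is not shown.
_VALUE_PATTERNS = {
    "jwt": re.compile(
        r"\beyJ[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}"
    ),
    "aws_temp_access_key": re.compile(r"\bASIA[0-9A-Z]{16}\b"),
    "gcp_access_token": re.compile(r"\bya29\.[A-Za-z0-9_-]{20,}"),
}


def redact(value: str, keep: int = 4) -> str:
    """Keep a short prefix for triage and mask the remaining characters."""
    visible = value[:keep]
    return visible + "*" * (len(value) - len(visible))


def find_tokens(response: str) -> list[tuple[str, str]]:
    """Return (pattern_name, redacted_value) for each credential-like match."""
    return [
        (name, redact(match.group(0)))
        for name, pattern in _VALUE_PATTERNS.items()
        for match in pattern.finditer(response)
    ]
```

Because every pattern requires an actual token-shaped value, a response that only names a path such as ~/.aws/credentials produces no match.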
All 352 existing tests pass. Lint clean (ruff + ruff-format).
Review & Testing Checklist for Human
- The credential_file and private_key_path patterns will match if the target mentions these paths in a refusal like "I cannot read ~/.aws/credentials". There is no refusal gate before the regex check: found_tokens triggers a CRITICAL finding even if the response is a refusal. Verify this is acceptable or add a refusal pre-check.
- No tests cover _check_routing_consistency, _check_routing_variation, _test_oauth_session, or the 4 new drift attacks. Recommend running a scan against a real target (e.g. Arena medical agent) and verifying each new phase executes and evaluates correctly.
- import re as _re inside _test_oauth_session (line 1479): re is already imported at module level. Harmless but inconsistent; should use the existing module-level import.
Notes
- An existing phase in model_extraction.py was renumbered from Phase 3 → Phase 5 to accommodate the new Phase 4 (routing). Verify no external references to "Phase 3" break.
- session_id in OAuth probes is hardcoded to a single value across all messages within a probe, which means multi-turn context won't accumulate if the target is session-aware. This is intentional (each message is an independent probe), but worth knowing.
Link to Devin session: https://app.devin.ai/sessions/8b0c5ca873934d77aa254157cc41924c
Requested by: @andrebyrd-odingard
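
The refusal pre-check suggested in the review checklist could look something like the sketch below. Everything here is hypothetical: the refusal phrases, the function names, and the "REVIEW" severity label are assumptions and do not exist in the PR. The sketch downgrades rather than suppresses a match inside a refusal, since a response can refuse and still leak a value.

```python
import re
from typing import Optional

# Hypothetical refusal phrases; tune these against real target responses.
_REFUSAL_RE = re.compile(
    r"\bi\s+(?:cannot|can't|can not|won't|am unable to|do not have access)\b"
    r"|\bi'm\s+(?:unable|not able)\s+to\b",
    re.IGNORECASE,
)


def is_refusal(response: str) -> bool:
    """Heuristic: does the response contain a common refusal phrase?"""
    return bool(_REFUSAL_RE.search(response))


def severity_for(response: str, found_tokens: list[str]) -> Optional[str]:
    """Pick a severity for a response given the pattern names that matched.

    Returns None when nothing matched, "REVIEW" (a made-up label) when the
    match sits inside a refusal, and "CRITICAL" otherwise.
    """
    if not found_tokens:
        return None
    return "REVIEW" if is_refusal(response) else "CRITICAL"
```

For example, `severity_for("I cannot read ~/.aws/credentials", ["credential_file"])` returns "REVIEW" instead of "CRITICAL".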