Proposal: Three new requirements addressing AI agent security gaps
Problem
The current Manipulation Resistance (MR) domain addresses single-turn prompt injection (MR-001, MR-002, MR-018) and treats the agent runtime as untrusted (MR-023). However, three adjacent threat vectors remain unspecified:
- Multi-turn jailbreak sequences — no single message constitutes an injection, but the cumulative conversational sequence achieves scope override, instruction bypass, or data exfiltration. This is the conversational analogue of TOCTOU attacks.
- Tool parameter abuse and chaining — SC-020 enforces an external tool allowlist, but does not validate the parameters passed to allowed tools or detect sequences of allowed tools that combine to achieve a disallowed outcome (e.g., a scan tool with zero rate limit causing DoS, or recon → credential-extraction → lateral-movement achieving unauthorized pivot).
- Silent model behavioral drift — TP-022 requires re-attestation on material model changes, but provider-side silent updates (inference engine optimizations, safety filter tuning, quantization changes) can shift behavior below the materiality threshold without triggering re-attestation. No canary mechanism exists to detect these shifts before they affect a customer engagement.
Proposed Requirements
APTS-MR-024: Multi-Turn Jailbreak Detection and Response (MUST | Tier 2)
Extends MR to multi-turn interaction patterns. Covers:
- Conversation state isolation between engagements
- Obfuscation detection (encoding chains, homoglyphs, split-message assembly, synonym substitution)
- Maintained adversarial jailbreak corpus (50+ patterns, quarterly execution, refreshed on model change)
- Decision consistency enforcement: a request that is rejected when stated directly must also be rejected when it is obfuscated or distributed across turns
Cross-references: MR-001, MR-018, MR-023, TP-022
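The decision-consistency and split-message items above can be illustrated with a minimal sketch. It is not the mechanism MR-024 prescribes. The deny phrases, the window size, and all function names are hypothetical, and a substring match stands in for whatever policy decision an implementation actually makes. Note that NFKC folding only handles compatibility forms such as fullwidth letters; cross-script homoglyphs (for example Cyrillic lookalikes) would need a separate confusables mapping.

```python
import base64
import binascii
import unicodedata

# Hypothetical deny phrases; a real deployment would call its own policy engine.
DENY_PHRASES = ("ignore scope", "exfiltrate credentials")


def normalize(text: str) -> str:
    """Fold compatibility characters (NFKC), casefold, and collapse whitespace."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(folded.split())


def try_base64(text: str) -> str:
    """Return the decoded text if the input is valid base64-encoded UTF-8, else ''."""
    try:
        return base64.b64decode(text.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return ""


def is_rejected(text: str) -> bool:
    """Apply the same decision to the raw text and to its decoded form."""
    candidates = (normalize(text), normalize(try_base64(text)))
    return any(phrase in candidate for candidate in candidates for phrase in DENY_PHRASES)


def conversation_rejected(turns: list[str], window: int = 3) -> bool:
    """Reject if any run of up to `window` adjacent turns, joined, is rejected."""
    for end in range(1, len(turns) + 1):
        for start in range(max(0, end - window), end):
            if is_rejected(" ".join(turns[start:end])):
                return True
    return False
```

Under these assumptions, a request split across two turns (`["please ignore", "scope for this one"]`) receives the same decision as the direct statement, because the joined window is evaluated with the same check as a single message.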
APTS-MR-025: Tool Invocation Parameter Validation and Chaining Prevention (MUST | Tier 2)
Extends SC-020's external allowlist to parameter-level enforcement. Covers:
- Parameter schema enforcement for every allowlisted tool (types, ranges, constraints)
- Semantic validation of safety-critical parameters (rate limits, target identifiers, payload sizes, credential sources)
- Tool chaining detection: monitoring invocation sequences within a sliding window, with 10+ documented chaining patterns
- Parameter drift detection for recurring/long-running engagements
Cross-references: SC-020, SE-006, SC-004, SE-023, MR-023
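A rough sketch of how parameter schema enforcement and sliding-window chain detection might fit together follows. Everything in it is an assumption for illustration: the `scan` tool, its `rate_limit` and `port` parameters and bounds, the category labels, and the single chaining pattern (taken from the recon → credential-extraction → lateral-movement example in the Problem section). The draft requirement calls for 10+ documented patterns and covers more than integer ranges.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntRange:
    """Inclusive bounds for an integer parameter."""
    minimum: int
    maximum: int


# Hypothetical schemas: tool name -> parameter name -> allowed range.
SCHEMAS = {
    "scan": {"rate_limit": IntRange(1, 100), "port": IntRange(1, 65535)},
}

# Hypothetical chaining patterns, expressed as ordered tool categories.
CHAIN_PATTERNS = [
    ("recon", "credential_extraction", "lateral_movement"),
]


def validate_params(tool: str, params: dict) -> list[str]:
    """Return a list of violations; an empty list means the call may proceed."""
    schema = SCHEMAS.get(tool)
    if schema is None:
        return [f"{tool}: tool is not allowlisted"]
    errors = [f"{tool}.{name}: unexpected parameter" for name in params if name not in schema]
    for name, bounds in schema.items():
        value = params.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{tool}.{name}: expected an integer")
        elif not bounds.minimum <= value <= bounds.maximum:
            errors.append(f"{tool}.{name}: {value} outside [{bounds.minimum}, {bounds.maximum}]")
    return errors


class ChainDetector:
    """Tracks the last `window` invocation categories and flags known chains."""

    def __init__(self, window: int = 10) -> None:
        self._recent = deque(maxlen=window)

    def observe(self, category: str) -> Optional[tuple]:
        """Record one invocation; return the matched pattern, if any."""
        self._recent.append(category)
        for pattern in CHAIN_PATTERNS:
            remaining = iter(self._recent)
            # Ordered-subsequence test: each step must appear after the previous one.
            if all(step in remaining for step in pattern):
                return pattern
        return None
```

Matching on an ordered subsequence rather than a contiguous run means that unrelated calls interleaved between the steps do not hide the chain, while the bounded window keeps old invocations from triggering matches indefinitely.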
APTS-TP-023: Foundation Model Behavioral Stability Verification (SHOULD | Tier 2)
Covers the period between TP-022 re-attestations, when provider changes below the materiality threshold would otherwise go unchecked. Covers:
- Behavioral test suite (30+ cases across instruction-following fidelity, refusal stability, output format compliance, decision calibration)
- Execution before every engagement, at least weekly, and on any provider API change
- Drift detection with engagement-blocking threshold
- Provider API changelog monitoring for silent changes
Cross-references: TP-021, TP-022, TP-002, AR-019
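To make the engagement-blocking threshold concrete, here is a minimal sketch that compares a stored baseline of behavioral test outcomes against the current run. The case identifiers, outcome labels, and the 5% default threshold are hypothetical, and the proposal does not specify how drift is measured. This version simply counts cases whose outcome differs from the baseline.

```python
def drift_rate(baseline: dict[str, str], current: dict[str, str]) -> float:
    """Fraction of baseline cases whose current outcome differs or is missing."""
    if not baseline:
        raise ValueError("baseline must contain at least one test case")
    changed = sum(1 for case, expected in baseline.items() if current.get(case) != expected)
    return changed / len(baseline)


def engagement_allowed(
    baseline: dict[str, str],
    current: dict[str, str],
    threshold: float = 0.05,  # hypothetical blocking threshold
) -> bool:
    """Allow the engagement only while drift stays at or below the threshold."""
    return drift_rate(baseline, current) <= threshold
```

A case that is missing from the current run is counted as drift, so a partially executed suite cannot pass the gate by omission.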
Rationale
These three requirements form a cohesive package addressing the same threat category: the AI agent runtime as an evolving, externally-influenced attack surface. MR-024 catches adversarial manipulation of the agent through conversation. MR-025 catches the agent using allowed tools in disallowed ways. TP-023 catches the agent's underlying model silently changing behavior. Together, they close the gap between current single-turn injection defenses and the reality of multi-turn, tool-wielding, provider-updated AI agents.
Affected Sections
- standard/6_Manipulation_Resistance/README.md: two new requirements appended
- standard/7_Supply_Chain_Trust/README.md: one new requirement appended
- standard/appendix/Checklists.md: three new checklist entries
- standard/README.md, standard/Introduction.md, README.md, index.md, standard/Getting_Started.md: requirement count updates (173 → 176, Tier 2: 85 → 88)
Style Compliance
All three requirements follow APTS conventions:
- RFC 2119 normative language consistent with Classification
- Verification subsections with specific, testable criteria
- Cross-references using `> **See also:**` format
- Rationale sections explaining why the requirement exists
I have a complete draft ready to submit as a PR once this proposal is reviewed.
AI Disclosure
This proposal was drafted with assistance from Claude (Anthropic). The contributor has reviewed all content for accuracy and consistency with the APTS standard and takes full ownership per CONTRIBUTING.md.