feat: Add AI agent security requirements (multi-turn jailbreak, tool-use safety, model behavioral stability)

## Proposal: Three new requirements addressing AI agent security gaps

### Problem

The current Manipulation Resistance (MR) domain addresses single-turn prompt injection (MR-001, MR-002, MR-018) and treats the agent runtime as untrusted (MR-023). However, three adjacent threat vectors remain unspecified:

1. **Multi-turn jailbreak sequences** — no single message constitutes an injection, but the cumulative conversational sequence achieves scope override, instruction bypass, or data exfiltration. This is the conversational analogue of TOCTOU attacks.
2. **Tool parameter abuse and chaining** — SC-020 enforces an external tool allowlist, but does not validate the *parameters* passed to allowed tools or detect *sequences* of allowed tools that combine to achieve a disallowed outcome (e.g., a scan tool with zero rate limit causing DoS, or recon → credential-extraction → lateral-movement achieving unauthorized pivot).
3. **Silent model behavioral drift** — TP-022 requires re-attestation on *material* model changes, but provider-side silent updates (inference engine optimizations, safety filter tuning, quantization changes) can shift behavior below the materiality threshold without triggering re-attestation. No canary mechanism exists to detect these shifts before they affect a customer engagement.

### Proposed Requirements

#### APTS-MR-024: Multi-Turn Jailbreak Detection and Response (MUST | Tier 2)

Extends MR to multi-turn interaction patterns. Covers:
- Conversation state isolation between engagements
- Obfuscation detection (encoding chains, homoglyphs, split-message assembly, synonym substitution)
- Maintained adversarial jailbreak corpus (50+ patterns, quarterly execution, refreshed on model change)
- Decision consistency enforcement: a request that is rejected when stated directly must also be rejected when obfuscated or distributed across turns

Cross-references: MR-001, MR-018, MR-023, TP-022
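
To illustrate the decision-consistency idea, here is a minimal Python sketch. It is not taken from the draft requirement text: `BLOCKED_PHRASES`, `candidate_views`, and `is_rejected` are hypothetical names, and the substring blocklist only stands in for whatever policy engine a platform actually uses. The point it shows is that the same policy check runs over each turn, over the joined conversation, and over normalized and decoded views of the text.

```python
import base64
import unicodedata

# Hypothetical blocklist standing in for a real policy engine (assumption).
BLOCKED_PHRASES = ("ignore scope", "disable logging")


def candidate_views(text: str) -> list[str]:
    """Return the text plus Unicode-normalized and base64-decoded views of it."""
    # NFKC folds compatibility forms (e.g. fullwidth letters) to plain ones.
    # It does not map cross-script homoglyphs such as Cyrillic look-alikes.
    views = [text, unicodedata.normalize("NFKC", text)]
    for token in text.split():
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except ValueError:
            # Not valid base64, or not valid UTF-8 once decoded: skip the token.
            continue
        views.append(decoded)
    return [view.casefold() for view in views]


def is_rejected(turns: list[str]) -> bool:
    """Apply one policy to every turn and to the conversation as a whole."""
    joined = " ".join(turns)
    for text in [*turns, joined]:
        for view in candidate_views(text):
            if any(phrase in view for phrase in BLOCKED_PHRASES):
                return True
    return False
```

Under these assumptions, a request split across two turns (`["please ignore", "scope for this host"]`) or wrapped in base64 receives the same decision as the direct form. A real implementation would need a broader set of decoders and a semantic policy check rather than substring matching.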

#### APTS-MR-025: Tool Invocation Parameter Validation and Chaining Prevention (MUST | Tier 2)

Extends SC-020's external allowlist to parameter-level enforcement. Covers:
- Parameter schema enforcement for every allowlisted tool (types, ranges, constraints)
- Semantic validation of safety-critical parameters (rate limits, target identifiers, payload sizes, credential sources)
- Tool chaining detection: monitoring invocation sequences within a sliding window, with 10+ documented chaining patterns
- Parameter drift detection for recurring/long-running engagements

Cross-references: SC-020, SE-006, SC-004, SE-023, MR-023
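
The parameter and chaining checks above could be sketched roughly as follows. Everything named here is an assumption made for illustration: the `port_scan` tool, its `rate_limit_pps` and `max_ports` parameters, the numeric bounds, the category labels, and the single chaining pattern are hypothetical and do not come from the draft.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntRange:
    minimum: int
    maximum: int


# Hypothetical per-tool schema: every parameter is a bounded integer.
TOOL_SCHEMAS = {
    "port_scan": {
        "rate_limit_pps": IntRange(1, 500),
        "max_ports": IntRange(1, 1024),
    },
}

# Hypothetical chaining pattern, expressed as tool categories in order.
CHAIN_PATTERNS = [("recon", "credential_extraction", "lateral_movement")]


def validate_params(tool: str, params: dict) -> list[str]:
    """Return a list of violations; an empty list means the call may proceed."""
    schema = TOOL_SCHEMAS.get(tool)
    if schema is None:
        return [f"{tool}: not allowlisted"]
    errors = [
        f"{tool}.{name}: unexpected parameter"
        for name in sorted(params.keys() - schema.keys())
    ]
    for name, allowed in schema.items():
        value = params.get(name)
        if not isinstance(value, int) or isinstance(value, bool):
            errors.append(f"{tool}.{name}: missing or not an integer")
        elif not allowed.minimum <= value <= allowed.maximum:
            errors.append(
                f"{tool}.{name}: {value} outside "
                f"[{allowed.minimum}, {allowed.maximum}]"
            )
    return errors


class ChainDetector:
    """Flags a known pattern occurring in order within the last `window` calls."""

    def __init__(self, window: int = 5) -> None:
        self.recent = deque(maxlen=window)

    def observe(self, category: str) -> Optional[tuple]:
        self.recent.append(category)
        for pattern in CHAIN_PATTERNS:
            remaining = iter(self.recent)
            # `step in remaining` consumes the iterator, so this is an
            # in-order subsequence test rather than a contiguous match.
            if all(step in remaining for step in pattern):
                return pattern
        return None
```

In this sketch a `rate_limit_pps` of 0 is rejected because it falls below the allowed range, which mirrors the zero-rate-limit DoS example in the Problem section. Matching on subsequences rather than contiguous runs is a design choice: it keeps an unrelated call inserted between two steps from hiding the chain.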

#### APTS-TP-023: Foundation Model Behavioral Stability Verification (SHOULD | Tier 2)

Fills the gap between TP-022 re-attestations by detecting behavioral shifts that fall below the materiality threshold. Covers:
- Behavioral test suite (30+ cases across instruction-following fidelity, refusal stability, output format compliance, decision calibration)
- Execution before every engagement, at least weekly, and on any provider API change
- Drift detection with engagement-blocking threshold
- Provider API changelog monitoring for silent changes

Cross-references: TP-021, TP-022, TP-002, AR-019
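
As a rough illustration of drift detection with an engagement-blocking threshold, the sketch below compares a stored baseline of test outcomes with a fresh run. The function names, the outcome labels, and the 5% threshold are assumptions chosen for the example; the proposal does not specify a threshold value or a result format.

```python
# Hypothetical blocking threshold: more than 5% changed outcomes blocks the
# engagement. The actual value would be set by the requirement or the operator.
BLOCK_THRESHOLD = 0.05


def drift_rate(baseline: dict[str, str], current: dict[str, str]) -> float:
    """Fraction of baseline cases whose outcome changed or is missing."""
    if not baseline:
        raise ValueError("baseline must contain at least one case")
    changed = sum(
        1 for case, expected in baseline.items() if current.get(case) != expected
    )
    return changed / len(baseline)


def engagement_allowed(
    baseline: dict[str, str],
    current: dict[str, str],
    threshold: float = BLOCK_THRESHOLD,
) -> bool:
    """Allow the engagement only while measured drift stays within the threshold."""
    return drift_rate(baseline, current) <= threshold
```

For example, if one of four baseline cases flips from `"refuse"` to `"comply"`, `drift_rate` returns 0.25 and `engagement_allowed` returns `False` under the assumed threshold. Treating a missing case as drift is deliberate here, so that a truncated test run cannot pass silently.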

### Rationale

These three requirements form a cohesive package addressing the same threat category: **the AI agent runtime as an evolving, externally-influenced attack surface**. MR-024 catches adversarial manipulation of the agent through conversation. MR-025 catches the agent using allowed tools in disallowed ways. TP-023 catches the agent's underlying model silently changing behavior. Together, they close the gap between current single-turn injection defenses and the reality of multi-turn, tool-wielding, provider-updated AI agents.

### Affected Sections

- `standard/6_Manipulation_Resistance/README.md` — two new requirements appended
- `standard/7_Supply_Chain_Trust/README.md` — one new requirement appended
- `standard/appendix/Checklists.md` — three new checklist entries
- `standard/README.md`, `standard/Introduction.md`, `README.md`, `index.md`, `standard/Getting_Started.md` — requirement count updates (173 → 176, Tier 2: 85 → 88)

### Style Compliance

All three requirements follow APTS conventions:
- RFC 2119 normative language consistent with Classification
- Verification subsections with specific, testable criteria
- Cross-references using `> **See also:**` format
- Rationale sections explaining why the requirement exists

I have a complete draft ready to submit as a PR once this proposal is reviewed.

### AI Disclosure

This proposal was drafted with assistance from Claude (Anthropic). The contributor has reviewed all content for accuracy and consistency with the APTS standard and takes full ownership per CONTRIBUTING.md.
