Revise C09 levels and terminology #151

Merged
jmanico merged 1 commit into OWASP:main from ottosulin:feat/updateC09levels
Mar 15, 2026

Conversation

@ottosulin
Collaborator

Revise C09 levels and terminology

Rebalances C09 level assignments and makes minor terminology improvements. The affected controls are reordered accordingly.

Level changes

| Control | Change | Rationale |
|---|---|---|
| 9.3.3 (was 9.3.4) | L2 → L1 | Validating tool outputs before downstream use is a basic trust boundary control; equivalent to C7's 7.1.3 (treat model output as untrusted input, L1) and C2's input validation principles (L1). OWASP LLM06 Excessive Agency lists output sanitization as a core prevention strategy. |
| 9.6.4 | L3 → L2 | "Access control decisions enforced by application logic, never by the AI model" is a core architectural principle, not a niche technique; OWASP LLM06 explicitly calls this out as "complete mediation." C10's 10.2.4 has the equivalent for MCP at L1; L2 is appropriate here as the broader orchestration scope requires more deliberate architectural commitment. |
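
As an illustration of the 9.3.3 idea (validate tool outputs before downstream use), here is a minimal Python sketch. The weather tool, its field names, the value ranges, and the `ToolOutputError` and `WeatherResult` names are all hypothetical and are not taken from the AISVS text; the point is only that the tool result is checked against a strict allow-list of fields and types before anything downstream consumes it.

```python
from dataclasses import dataclass


class ToolOutputError(ValueError):
    """Raised when a tool result fails validation (hypothetical name)."""


@dataclass(frozen=True)
class WeatherResult:
    """Validated, typed view of a hypothetical weather tool's output."""
    city: str
    temperature_c: float


ALLOWED_FIELDS = {"city", "temperature_c"}


def validate_weather_output(raw: object) -> WeatherResult:
    """Treat the tool result as untrusted input and validate it strictly."""
    if not isinstance(raw, dict):
        raise ToolOutputError("tool output must be an object")

    # Reject unknown fields instead of silently passing them downstream.
    unexpected = set(raw) - ALLOWED_FIELDS
    if unexpected:
        raise ToolOutputError(f"unexpected fields: {sorted(map(str, unexpected))}")

    city = raw.get("city")
    temp = raw.get("temperature_c")

    if not isinstance(city, str) or not 1 <= len(city) <= 100:
        raise ToolOutputError("city must be a string of 1 to 100 characters")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(temp, bool) or not isinstance(temp, (int, float)):
        raise ToolOutputError("temperature_c must be a number")
    if not -90.0 <= temp <= 60.0:
        raise ToolOutputError("temperature_c is outside the plausible range")

    return WeatherResult(city=city, temperature_c=float(temp))
```

Schema checks like this catch malformed or padded results, but they do not catch a plausible-looking value that is semantically wrong.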

Language and terminology changes

  • 9.3.4 (was 9.3.5): "integrity-protected and validated" → "integrity-verified (e.g., signatures, checksums)" and "tool binaries" → "tool binaries or packages" (generalizes beyond binaries to cover any loadable tool artifact)
  • 9.5.1: "modern protocols (e.g., TLS 1.3)" → "current recommended protocols (e.g., TLS 1.3 or later)" (avoids dating; consistent with C1's "current recommended cryptographic algorithms")
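
A minimal Python sketch of the two wording changes above, under stated assumptions: the function names are hypothetical, and the expected digest is assumed to come from a trusted source such as a pinned manifest. It uses only the standard library (`hashlib`, `hmac`, `ssl`). A checksum comparison is the simplest form of integrity verification; a signature check would be the stronger option mentioned in 9.3.4.

```python
import hashlib
import hmac
import ssl


def verify_tool_artifact(data: bytes, expected_sha256: str) -> None:
    """Refuse to load a tool binary or package whose digest does not match.

    `expected_sha256` is assumed to come from a trusted source
    (for example, a pinned and reviewed manifest).
    """
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError("tool artifact digest mismatch; refusing to load")


def make_client_context() -> ssl.SSLContext:
    """Build a client TLS context that rejects anything older than TLS 1.3."""
    ctx = ssl.create_default_context()  # certificate and hostname checks stay on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx
```

Setting only `minimum_version` (and not `maximum_version`) matches the "or later" wording: newer protocol versions remain acceptable once the runtime supports them.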

References added

  • OWASP LLM06:2025 Excessive Agency - LLM Top 10 entry covers tool misuse, excessive permissions, and excessive autonomy
  • OWASP LLM10:2025 Unbounded Consumption - covers resource exhaustion, denial-of-wallet, and unbounded agent execution
  • OWASP Agentic AI Threats and Mitigations - reference of agentic AI threats

@almogbhl
Contributor

Some optional additions:

1. System prompt immutability:
Agent system prompts should be integrity-protected and require human approval before changes, same as any security-critical configuration. Without this, a modified system prompt silently corrupts the baseline for all downstream controls.
(ASI01: Agent Goal Hijack)

2. Manual kill switch
9.1.3 covers automatic circuit breakers on budget violations, but there's no control for a user or operator to immediately halt an agent that's behaving dangerously within budget.
(ASI10: Rogue Agents | AITG-APP-06: Agentic Behavior Limits)

3. Cascading failure containment
Nothing prevents one agent's corrupted output from becoming the next agent's trusted input. Inter-agent fault propagation is missing.
(ASI08: Cascading Failures)

4. Behavioral baseline and drift detection
No control establishes per-agent behavioral baselines (goal state, tool-use patterns, action sequences) or alerts on gradual drift from approved patterns.
(ASI01: Agent Goal Hijack | ASI09: Human-Agent Trust Exploitation | ASI10: Rogue Agents)

5. (9.6.2) concern
(Correct me if I'm wrong, but as I understand it, MCP delegates via external OAuth while non-MCP agents delegate via the system's native auth mechanism.)

This control says "without using the user's credentials," but the recommended pattern is for downstream services to enforce the user's access level, which requires user-scoped auth (delegated tokens, token exchange, or the service's native auth).
Agent-auth (the agent's own identity, e.g., a service account) is only appropriate when all users share the same level of access. This also clarifies 9.6.4's "application logic": when users have different access levels, enforcement should be at the downstream service, not solely in the agent's runtime. Suggest rewording 9.6.2 to clarify that downstream calls must be scoped to the user's permissions when users have different access levels.
(Google ADK Safety: Agent-Auth vs User-Auth | ASI03: Identity & Privilege Abuse)

6. Minimize non-deterministic scope
No control requires that deterministic operations (validation, data retrieval, calculations, API calls) are implemented as standard code rather than delegated to the LLM. Without this, developers put entire workflows in the agentic context, expanding the attack surface for prompt injection unnecessarily.
(AWS Prescriptive Guidance: "Use deterministic execution logic unless AI is needed" | ASI01: Agent Goal Hijack)

@ottosulin
Collaborator Author

ottosulin commented Mar 15, 2026

Thanks for the thorough review, these are well-researched points! My thoughts:

  1. System prompt immutability: Good catch. C3.4.2 already requires prompt templates and agent policies to be version-controlled with peer review approval (L1), which covers the change-approval side. The integrity-protection-at-load-time angle is a genuine gap worth adding though.
    -> Could you file an issue for that and send a PR once this one merges?

  2. Manual kill switch: Agreed this matters, and it's already covered by C14.1.1 (manual kill-switch to halt inference, L1) and C14.1.2 (override controls restricted to authorized personnel, L1). Those apply to all AI systems including agents, so adding a duplicate here would create redundancy.

  3. Cascading failure containment: Valid concern. The validation side is partly covered by 9.3.3 (validate tool outputs before downstream use, L1) and 9.5.2 (strict schema validation of all messages, L1), but the semantic corruption side, where a plausible-looking but subtly poisoned output passes schema checks and propagates through the chain, is a gap.
    -> Could you file an issue for that and send a PR once this one merges?

  4. Behavioral baseline and drift detection: This is covered across chapters: C13.8.3 analyzes agent behavior patterns for security implications (L2), C13.8.5 detects deviations indicating compromise (L3), and 9.8.2 monitors for unsafe emergent behavior like abnormal call graphs (L3). Monitoring controls are in C13 and enforcement in the domain chapters.

  5. 9.6.2 concern: The current wording intentionally describes the standard delegation/token-exchange pattern: the agent authenticates with its own identity but carries delegation context containing the user's scopes, and downstream services enforce user-level permissions based on that context. "Without using the user's credentials" means the agent ideally never holds or forwards the user's actual authentication credentials (passwords, session cookies), not that it ignores the user's authorization scope. The control explicitly lists "user ID, tenant, session, scopes" as the propagated context.

Coincidentally, I think this is now consistent with the good suggestion in your PR (#150), and both are now L2: "10.2.9 | Verify that MCP servers do not pass through access tokens received from clients to downstream APIs and instead obtain a separate token scoped to the server's own identity (e.g., via on-behalf-of or client credentials flow)."

  6. Minimize non-deterministic scope: Great design principle and I agree with the intent. The AISVS captures the security properties behind it without prescribing internal architecture: 9.6.4 requires access control decisions to be enforced by application logic and never by the model (L2), and 9.7.1 requires deterministic policy gates (L1). Drawing a verifiable line between "AI needed here" and "code could do this" would be hard to audit consistently, so we focus on the security outcomes. To be clear, I agree; it is just tricky to formalize as an auditable requirement.
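
The delegation pattern described for 9.6.2 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the AISVS wording or any real library's API: `DelegationContext`, `AgentRequest`, `TRUSTED_AGENTS`, and `downstream_authorize` are hypothetical names, and plain in-memory objects stand in for what would in practice be signed tokens (for example, from an OAuth token exchange or on-behalf-of flow).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelegationContext:
    """Propagated user context; note that it holds no password or cookie."""
    user_id: str
    tenant: str
    session: str
    scopes: frozenset[str]


@dataclass(frozen=True)
class AgentRequest:
    agent_id: str                  # the agent's own identity
    delegation: DelegationContext  # on whose behalf the agent is acting
    action: str                    # scope required by the requested operation


# Hypothetical registry of agent identities the downstream service accepts.
TRUSTED_AGENTS = {"orders-agent"}


def downstream_authorize(req: AgentRequest) -> bool:
    """Decision made by the downstream service in code, not by the model."""
    if req.agent_id not in TRUSTED_AGENTS:
        return False
    # The agent may only do what the delegating user is allowed to do.
    return req.action in req.delegation.scopes
```

In this sketch the agent never sees the user's credentials, yet the downstream check is still bounded by the user's scopes, which is the distinction drawn in the reply above.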

@almogbhl
Contributor

Great feedback. Clean and fair.
I will open a PR for both.

@jmanico jmanico merged commit 18c8a55 into OWASP:main Mar 15, 2026
2 checks passed