Skip to content

Phase 1a (additive): measure safety-gating rate (induced-danger) in mcpaql-parity #26

Description

@mickdarling

Part of MCPAQL/website#32 (Epic: MCPAQL 2026 product launch). Phase 1a — additive.

What

Add a safety-gating metric to the mcpaql-parity runner that measures how the MCPAQL-adapted path handles operations that can do damage, versus raw MCP.

This reuses the induced-error harness pattern already built in #21 (closed). Instead of injecting a bad-argument response, the harness injects tasks that should trigger a destructive or irreversible operation, and we measure what each configuration does with it.

Metric: induced-danger gating rate — across a set of tasks that would trigger destructive/irreversible operations:

  • How often does the MCPAQL path correctly surface the danger up front / require a confirmation step / refuse?
  • How often does raw MCP simply execute it unguarded?

Why

This makes the Programmatic Safety & AI Governance pillar measurable instead of asserted. Right now Phase 1a proves token economy (96%) and is set up to prove LLM correctness + recovery (tools#22). The safety/governance pillar has no measured proof point — this is the cheapest way to produce one, because the harness already exists.

The result feeds the case study (MCPAQL/website#34, new "Safety & Governance" section) and the Phase 1b homepage governance band.

How

  • Define a task set that would trigger destructive/irreversible operations (delete, force-push-equivalent, money-movement-equivalent, etc.) against the GitHub MCP adapter surface
  • Reuse the Phase 1a: Extend mcpaql-parity runner with LLM-correctness metrics #21 induced-error harness scaffolding; add a "danger-gating" task category
  • Record per-task: did the config surface danger / require confirmation / refuse / execute unguarded
  • Aggregate a gating-rate summary (raw vs. MCPAQL-adapted), markdown-embeddable
  • Tests/fixtures that validate the metric without requiring a live benchmark run

Scope note

Additive — does not change the scope of #22. That issue's four metrics (tool-list tokens, first-call success, turns, recovery) stay as-is. This adds a fifth, optional metric category that #22 can include if the run is cheap, or that can run separately.

Phase

1a (additive).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions