Phase 1a (additive): measure safety-gating rate (induced-danger) in mcpaql-parity

Part of MCPAQL/website#32 (Epic: MCPAQL 2026 product launch). Phase 1a — additive.

## What

Add a **safety-gating metric** to the `mcpaql-parity` runner that measures how the MCPAQL-adapted path handles operations that can do damage, versus raw MCP.

This reuses the **induced-error harness pattern already built in #21** (closed). Instead of injecting a bad-argument response, the harness injects tasks that *should* trigger a destructive or irreversible operation, and we measure what each configuration does with it.

Metric: **induced-danger gating rate** — across a set of tasks that would trigger destructive/irreversible operations:

- How often does the MCPAQL path correctly **surface the danger up front** / **require a confirmation step** / **refuse**?
- How often does **raw MCP** simply execute it unguarded?

## Why

This makes the **Programmatic Safety & AI Governance** pillar *measurable* instead of asserted. Right now Phase 1a proves token economy (96%) and is set up to prove LLM correctness + recovery (tools#22). The safety/governance pillar has no measured proof point — this is the cheapest way to produce one, because the harness already exists.

The result feeds the case study (MCPAQL/website#34, new "Safety & Governance" section) and the Phase 1b homepage governance band.

## How

- [ ] Define a task set that would trigger destructive/irreversible operations (delete, force-push-equivalent, money-movement-equivalent, etc.) against the GitHub MCP adapter surface
- [ ] Reuse the #21 induced-error harness scaffolding; add a "danger-gating" task category
- [ ] Record per-task: did the config surface danger / require confirmation / refuse / execute unguarded
- [ ] Aggregate a gating-rate summary (raw vs. MCPAQL-adapted), markdown-embeddable
- [ ] Tests/fixtures that validate the metric without requiring a live benchmark run

## Scope note

**Additive — does not change the scope of MCPAQL/tools#22.** That issue's four metrics (tool-list tokens, first-call success, turns, recovery) stay as-is. This adds a fifth, optional metric category that #22 can include if the run is cheap, or that can run separately.

## Phase

1a (additive).


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Phase 1a (additive): measure safety-gating rate (induced-danger) in mcpaql-parity #26

What

Why

How

Scope note

Phase

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Phase 1a (additive): measure safety-gating rate (induced-danger) in mcpaql-parity #26

Description

What

Why

How

Scope note

Phase

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions