feat: Sprint 7 — Safety Model (Phase 1) #213
…rn (#205)
New informative document defining:
- The safety dongle deployment model: a standalone MCP-AQL server that functions as a universal safety layer for all tool calls
- The execution safety loop protocol (plan → report → evaluate → act)
- Minimal operation surface: 7 operations across 3 endpoints
- Permission architecture: endpoint classification determines friction
- Integration with Gatekeeper (pattern-based policy evaluation)
- Integration with Danger Zone (out-of-band verification)
- Opt-out mechanisms: enforcing, monitoring, logging, disabled modes
Also adds cross-reference from gatekeeper.md (new Section 1.4).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…206)
New normative Section 8.7 in v1.0.0-draft.md defining:
- AutonomyDirective response contract (continue, factors, stopped, notifications)
- 5-stage evaluation pipeline (step limit → outcome → patterns → safety tier → risk tolerance)
- Agent notification system (permission_pending, autonomy_pause, danger_zone)
- Configurable elements (step limits, risk tolerance, approval patterns)
- Minimum viable implementation (step limit is only MUST stage)
Also adds Section 7 (Autonomy Evaluator Integration) to gatekeeper.md describing how Gatekeeper blocks feed into the notification system.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ement (#207)
New normative Section 8.8 in v1.0.0-draft.md defines the challenge-response protocol for highest-risk actions during execution. Verification codes with 128-bit entropy are displayed through AI-inaccessible channels (OS dialogs, hardware tokens, SMS), creating cryptographic proof of human approval.
Also adds Section 8 to danger-levels.md documenting how danger levels map to safety tiers during execution and how forbidden operations trigger out-of-band verification.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
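A minimal sketch of how a verification code with 128 bits of entropy might be generated and checked, assuming Python's standard `secrets` and `hmac` modules. The function names `new_verification_code` and `codes_match` are hypothetical and do not come from the spec, and the out-of-band display channel is deliberately left out.

```python
import hmac
import secrets


def new_verification_code() -> str:
    """Return a random code with 128 bits of entropy (16 bytes, hex-encoded)."""
    return secrets.token_hex(16)


def codes_match(expected: str, submitted: str) -> bool:
    """Compare codes in constant time so timing does not leak partial matches."""
    return hmac.compare_digest(expected.encode(), submitted.encode())


# Hypothetical usage: the code would be shown to the human through a channel
# the agent cannot read, then submitted back for comparison.
code = new_verification_code()
```

Hex encoding is only one choice; any encoding works as long as the underlying random value keeps its 128 bits.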
@claude Please review this PR.

Claude finished @mickdarling's task in 4m 15s
PR Review: feat: Sprint 7 — Safety Model (Phase 1)
This is a substantial, well-architected specification PR. The safety model design is sound and the writing is clear. Below are findings across spec consistency, security, and completeness. All items are keyed to specific file locations.
Strengths
Issues
1. Risk score boundary inconsistency between

- Fix risk score boundary mismatch: cap destructive at 60, dangerous starts at 61 (aligns with normative confirm=31-60, verify=61-85)
- Add `deny` pattern category to normative Stage 3 with `stopped: true` semantics (hard block, no confirmation path)
- Add self-approval prevention requirement: entity confirming an operation MUST NOT be the same agent that triggered the block
- Add `nextStepRisk` field to informative AutonomyDirective table
- Clarify rate limit window reset behavior with progressive lockout
- Update v1.0.0-draft.md Last Updated date to 2026-02-25
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
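The boundary fix above can be pictured as a single lookup with no gaps or overlaps. This is a sketch only: the `confirm` (31-60) and `verify` (61-85) ranges come from the commit message, while the 0-100 score range, the `advisory` name for scores below 31, and the mapping of scores above 85 to `danger_zone` are assumptions made for illustration.

```python
def safety_tier(risk_score: int) -> str:
    """Map a risk score to a safety tier using contiguous, non-overlapping ranges."""
    if not 0 <= risk_score <= 100:  # assumed score range
        raise ValueError(f"risk score out of range: {risk_score}")
    if risk_score <= 30:
        return "advisory"  # assumed tier name, not taken from the thread
    if risk_score <= 60:
        return "confirm"  # confirm = 31-60
    if risk_score <= 85:
        return "verify"  # verify = 61-85
    return "danger_zone"  # assumed: everything above the verify range
```

Ordering the checks by upper bound means a score of 60 and a score of 61 can never land in the same tier, which is the mismatch the commit resolves.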
@claude Please review this PR.

Claude finished @mickdarling's task in 2m 48s
PR Review: feat: Sprint 7 — Safety Model (Phase 1)
This is a well-structured Safety Model PR. The polish commit (
Prior Review: All 6 Issues Resolved ✓
The polish commit addressed:
New Findings
1.

- Add `deny` to Section 8.7.4 Configurable Elements table
- Define `record_execution_step` parameter table with `stepDescription`, `outcome`, `nextActionHint`, and `findings` fields
- Update Stage 2 to reference the `outcome` field explicitly, with skip behavior when omitted
- Fix imprecise anchor in execution-safety-loop.md security note (#87-autonomy-evaluation → #873-agent-notification-system)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude Please review this PR.

Claude finished @mickdarling's task in 4m 29s
PR Review: feat: Sprint 7 — Safety Model (Phase 1)
This is a solid safety model PR. The latest polish commit (
Prior Review: All 3 Issues Resolved ✓
New Findings
1.
| Operation | Endpoint |
|---|---|
| `introspect` | `READ` |
| `execute_agent` | `EXECUTE` |
| `record_execution_step` | `CREATE` |
| `complete_execution` | `EXECUTE` |
| `abort_execution` | `EXECUTE` |
Section 8.8 defines verify_challenge semantically (Lines 1273, 1289) but neither reserves the operation name in Section 8.5.4 nor assigns a CRUDE endpoint. The informative doc (execution-safety-loop.md:274) places it on CREATE (CONFIRM_SESSION), but this assignment has no normative backing.
The CRUDE endpoint assignment is conformance-relevant: it determines the default Gatekeeper permission tier. An implementer following only the normative spec will find no specification for verify_challenge's endpoint placement.
Fix: Add verify_challenge to the Section 8.5.4 reserved operations table with its CRUDE endpoint (CREATE, matching the informative doc's assignment), and note the Section 8.8 reference.
2. Informative doc's record_execution_step section covers only nextActionHint (Minor)
execution-safety-loop.md:199-235 — Section 3.3 is titled "The nextActionHint Field" and documents only that parameter. The loop diagram at Line 155 also shows { nextActionHint } as the only payload.
The normative spec now defines a 5-parameter table for record_execution_step (element_name, stepDescription, outcome, nextActionHint, findings), two of which are SHOULD-level. An implementer consulting the informative doc for integration guidance will not discover outcome (which drives Stage 2 of the evaluation pipeline) without consulting the normative spec separately. Since Stage 2 is the fix for the prior round's "Stage 2 unimplementable" finding, discoverability matters here.
Fix: Either add a brief cross-reference in Section 3.3 pointing to the parameter table in Section 8.6.3, or rename the section to "The record_execution_step Parameters" and document all SHOULD-level fields.
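For orientation, the five parameters named in this finding could be carried in a payload like the one below. This is a sketch under assumptions: the thread does not say which fields are required, so treating `element_name` and `stepDescription` as required is a guess, the string type for `findings` is assumed, and `RecordExecutionStep` and `to_params` are hypothetical names.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RecordExecutionStep:
    """Illustrative payload for record_execution_step (field names from the review)."""

    element_name: str  # assumed required
    stepDescription: str  # assumed required
    outcome: Optional[str] = None  # e.g. "failure"; drives Stage 2 when present
    nextActionHint: Optional[str] = None
    findings: Optional[str] = None  # type assumed

    def to_params(self) -> dict:
        """Drop omitted fields so a receiver can tell 'omitted' from 'empty'."""
        return {k: v for k, v in asdict(self).items() if v is not None}


params = RecordExecutionStep(
    element_name="example-agent",
    stepDescription="ran the test suite",
    outcome="failure",
).to_params()
```

Leaving `outcome` out of the serialized parameters matters here because the review notes that Stage 2 is skipped when the field is omitted.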
Strengths
- Three-review arc is complete. The spec has improved meaningfully across all three passes. The original 6 issues, then 3, then these 2: net trajectory is strong.
- `outcome` parameter integration is well done. Adding it to the parameter table, updating Stage 2 to reference it directly, and noting "if `outcome` is omitted, this stage is skipped" (Line 1126) makes Stage 2 fully implementable with no ambiguity.
- Rate limiting now complete. Section 8.8.5 specifies window reset behavior and introduces progressive lockout as an option, which addresses the prior gap without over-prescribing.
- Self-approval prevention is correctly placed. The normative requirement in Section 8.7.3 and the informative note in `execution-safety-loop.md:315-318` are consistent and specific enough to implement.
- `deny` row in Section 8.7.4 is correct and consistent with the pattern example in Stage 3.
Summary
Finding #1 (verify_challenge missing from reserved operations table) is medium priority — it leaves the normative CRUDE endpoint for a protocol-level operation unspecified. Finding #2 is a minor discoverability gap in the informative doc. Both are small, targeted fixes.
- Fix root-relative links in versioning.md (/CHANGELOG.md and /CONTRIBUTING.md → relative paths) that caused link-check CI failure
- Add verify_challenge and confirm_operation to Section 8.5.4 reserved operations table with CRUDE endpoint assignments
- Expand informative doc Section 3.3 from nextActionHint-only to full record_execution_step parameter overview with outcome field and cross-reference to normative parameter table
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude Please review this PR.

Claude finished @mickdarling's task in 3m 53s
PR Review: feat: Sprint 7 — Safety Model (Phase 1)
This is the fourth review pass on this PR. The polish commit (
Prior Review: Both Issues Resolved ✓
New Findings
1. Stage 3

- Stage 3 deny now references Section 8.8 out-of-band verification
- Section 8.8.1 hard-block trigger now references Stage 3 deny patterns
- Loop diagram updated to show { nextActionHint, outcome? } payload
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude Please review this PR.

Claude finished @mickdarling's task in 5m 26s
PR Review: feat: Sprint 7 — Safety Model (Phase 1)
This is the fifth review pass on this PR. The polish commit (
Prior Review: Both Issues Resolved ✓
New Findings
1.

- Clarify that verify and danger_zone tiers both use the Section 8.8 out-of-band verification protocol; verify pauses (continue: false), danger_zone hard-blocks (stopped: true)
- Update Section 8.8.1 trigger conditions to include verify tier
- Broaden verify_challenge reserved operation description to cover both tiers
- Fix 1-character alignment in ASCII loop diagram
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude Please review this PR.

Claude finished @mickdarling's task in 3m 47s
PR Review: feat: Sprint 7 — Safety Model (Phase 1)
This is the sixth review pass on this PR. The polish commit (
Prior Review: Both Issues Resolved ✓
New Findings
1. Section 8.8 intro paragraph contradicts Section 8.8.1 trigger conditions (Medium)
But
The intro and its own subsection directly contradict each other. An implementer reading the intro will conclude Section 8.8 is only relevant to
Fix: Replace "assigns a
2. Section 8.8.3 "Blocking Semantics" doesn't cover

- Fix Section 8.8 intro to include verify tier (was danger_zone only)
- Define verify pause-window behavior: subsequent record_execution_step calls return continue:false with pending challenge ID; verify pauses do not persist across restarts or block new executions
- Update informative doc Section 6.1 table to reference verify_challenge
- Rewrite Section 6.2 to cover both verify pause and danger_zone hard stop flows side by side
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude Please review this PR.

Claude finished @mickdarling's task in 5m 45s
PR Review: feat: Sprint 7 — Safety Model (Phase 1)
This is the seventh review pass on this PR. The polish commit (
Prior Review: All 3 Issues Resolved ✓
New Findings
1. Initial

- verificationId metadata field now covers both verify and danger_zone
- autonomy_pause notifications for verify tier SHOULD include verificationId; for confirm tier, cleared by confirm_operation
- Initial AutonomyDirective MUST include challenge ID via notification (autonomy_pause for verify, danger_zone for danger_zone tier)
- Broaden confirm_operation reserved ops entry to include Autonomy Evaluator confirm tier pauses alongside Gatekeeper blocks
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude Please review this PR.

Claude finished @mickdarling's task in 3m 23s
PR Review: feat: Sprint 7 — Safety Model (Phase 1)
This is the eighth review pass on this PR. The polish commit (
Prior Review: Both Issues Resolved ✓
New Findings
1. SHOULD vs MUST normative conflict for
| Type | Trigger | Action Required |
|---|---|---|
| `permission_pending` | Gatekeeper blocked an operation | Call confirm_operation to approve |
| `autonomy_pause` | Autonomy Evaluator returned `continue: false` | Human review needed |
| `danger_zone` | Danger Zone hard block triggered | Out-of-band verification required (Section 6) |
The normative fix in Section 8.7.3 now specifies that confirm tier autonomy_pause notifications are cleared by confirm_operation, matching the pattern established for permission_pending. But the informative doc's table still says "Human review needed" for autonomy_pause — it doesn't distinguish between confirm tier (where confirm_operation unblocks) and verify tier (where verify_challenge is used). An integrator reading only the informative doc's table will not discover that confirm_operation is the correct clearing call for confirm tier pauses.
Fix: Update the autonomy_pause row's "Action Required" column to read: "For confirm tier: call confirm_operation; for verify tier: out-of-band verify_challenge (Section 6)".
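The distinction this finding asks the table to make can be sketched as a small lookup on the agent side. The notification types and operation names are taken from the thread; the `required_action` function and its `tier` argument are hypothetical, since the thread does not show how a notification conveys its tier.

```python
from typing import Optional


def required_action(notification_type: str, tier: Optional[str] = None) -> str:
    """Return the operation that clears a notification of the given type."""
    if notification_type == "permission_pending":
        return "confirm_operation"
    if notification_type == "autonomy_pause":
        # The clearing call depends on which tier caused the pause.
        if tier == "confirm":
            return "confirm_operation"
        if tier == "verify":
            return "verify_challenge"  # out-of-band verification
        raise ValueError("autonomy_pause needs tier 'confirm' or 'verify'")
    if notification_type == "danger_zone":
        return "verify_challenge"  # out-of-band verification
    raise ValueError(f"unknown notification type: {notification_type}")
```

Raising on an `autonomy_pause` without a tier, rather than defaulting, reflects the point of the finding: "Human review needed" alone does not tell an integrator which call to make.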
Strengths
The eight-review arc has produced a substantially stronger spec:
- 14 of 16 issues across seven prior rounds resolved. Each polish commit addressed findings precisely without over-engineering.
- Challenge ID transport is now fully specified for both tiers. `autonomy_pause` with `metadata.verificationId` for `verify`; `danger_zone` notification with `metadata.verificationId` for `danger_zone`. The agent-side contract is now complete.
- `confirm_operation` dual-scope is resolved. The reserved operations table and Section 8.7.3 are aligned: `confirm_operation` is documented as clearing both Gatekeeper `permission_pending` blocks and Autonomy Evaluator `confirm` tier pauses.
- All 7 reserved operations have normative CRUDE endpoint assignments and purposes. The Section 8.5.4 table is complete.
- Section 8.8 intro, trigger conditions, blocking semantics, and informative Section 6.2 are internally consistent for both `verify` and `danger_zone` tiers.
Summary
Finding #1 (SHOULD vs MUST conflict for verificationId in 8.7.3 vs 8.8.2) is medium priority — it gives implementations a normative out to omit the challenge ID despite it being required by 8.8.2, undermining the transport guarantee the prior round's fix was meant to establish. Finding #2 is a one-line table update in the informative doc. Both are small, targeted fixes.
- Fix SHOULD/MUST conflict: autonomy_pause for verify tier MUST include verificationId (aligns with Section 8.8.2 MUST requirement)
- Update informative doc notification table to distinguish confirm tier (confirm_operation) from verify tier (verify_challenge) actions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude Please review this PR.

Claude finished @mickdarling's task in 3m 13s
PR Review: feat: Sprint 7 — Safety Model (Phase 1)
This is the ninth review pass on this PR. The polish commit (
Prior Review: Both Issues Resolved ✓
New Findings
1. Section 4.1 operations table has outdated purpose descriptions for two operations (Medium)
Both of these are now outdated. The 7th-round polish commit expanded the scope of both operations:
An implementer reading Section 4.1 will conclude
Fix: Update the Section 4.1 table's Purpose column:
2. Three "Danger Zone only" scope references not updated for

Systematic update of all locations in execution-safety-loop.md that narrowly scoped confirm_operation to "Gatekeeper only" or verify_challenge to "Danger Zone only" after normative scope was broadened in prior commits:
- Section 4.1 operation table: both operation purposes updated
- Security note: self-approval warning now covers both block types
- Section 9.2 SHOULD requirements: both operations broadened
- Section 9.3 MAY requirements: OOB verification includes verify tier
- Related Specifications: Section 8.8 link description updated
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Claude finished @mickdarling's task in 3m 55s
PR Review: feat: Sprint 7 — Safety Model (Phase 1)
This is the tenth review pass on this PR. The polish commit (
Prior Review: Both Issues Resolved ✓
New Findings
1. Section 8.8.3 "Danger Zone event" heading doesn't explicitly cover

Systematic audit-driven fixes to close all remaining specification gaps:

Notification lifecycle completeness (Section 8.7.3):
- permission_pending: add failed confirmation behavior (stays pending)
- autonomy_pause (verify tier): add clearing condition (verify_challenge success or challenge expiration)
- danger_zone: add clearing condition (verify_challenge or admin override), specify broadcast includes affected agent, scope includes deny pattern matches alongside danger_zone tier

Blocking semantics (Section 8.8.3):
- Replace "Danger Zone event" with explicit scope: any event that sets stopped:true (danger_zone tier, deny pattern match, or equivalent)
- Add challenge expiration behavior for verify tier: pause lifts, re-evaluation occurs, new challenge generated if same action reported

Normative consistency sweep:
- Remove narrow "Danger Zone" from stopped:true comment and decision tree (now references Section 8.8 generically)
- Update deny configurable element description to reference Section 8.8
- danger-levels.md Section 8.2: parallel documentation for both verify tier (pause) and danger_zone tier (hard block) enforcement flows
- Informative doc danger_zone notification row: include deny patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude Please review this PR.

Claude finished @mickdarling's task in 4m 20s
PR Review: feat: Sprint 7 — Safety Model (Phase 1)
This is the eleventh review pass on this PR. The polish commit (
Prior Review: Both Issues Resolved ✓
New Findings
1.
| Trigger | Section 8.8.1 normative strength | Section 8.8.2 step 2 coverage |
|---|---|---|
| `verify` tier | — | MUST via `autonomy_pause` with `metadata.verificationId` ✓ |
| `danger_zone` tier | SHOULD include notification | MUST via `danger_zone` with `metadata.verificationId` ✓ |
| `deny` pattern match | SHOULD include notification (same line as `danger_zone`) | Not mentioned |
The relevant text: v1.0.0-draft.md:1252 — "SHOULD include a danger_zone notification broadcast to all executing agents" applies equally to both danger_zone and deny triggers. But Section 8.8.2 step 2 mentions only "danger_zone tier" by name — deny is not extended to the MUST transport guarantee.
The consequence: an implementation that lawfully skips the SHOULD for a deny-blocked agent provides stopped: true but no verificationId. The blocked agent has no challenge ID to call verify_challenge with. The Section 8.8.2 MUST explicitly closes this gap for danger_zone tier but leaves it open for deny pattern matches — creating a recovery-path gap for the most restricted class of hard blocks.
Fix: Update v1.0.0-draft.md:1274 to extend the MUST to deny pattern matches: replace "or danger_zone with metadata.verificationId for danger_zone tier" with "or danger_zone with metadata.verificationId for danger_zone tier or deny pattern matches". Alternatively, update the SHOULD in line 1252 to MUST for both danger_zone and deny triggers (since neither can be meaningfully unblocked without the notification).
2. danger_zone notification description uses definitive language for verificationId but notification itself is SHOULD (Minor)
v1.0.0-draft.md:1207 (Section 8.7.3) reads:
"`danger_zone`: Broadcast to ALL executing agents (including the affected agent) when any agent triggers a hard block (`danger_zone` tier or `deny` pattern match). Includes the `verificationId` for the affected agent. Cleared when the blocked agent is unblocked via successful `verify_challenge` or admin override."
The notification is SHOULD-level (Section 8.8.1, line 1252). The description says the notification "Includes the verificationId" — a definitive statement without a normative qualifier. Since the notification may lawfully be omitted (SHOULD), the conditional "when sent" is missing. The intended meaning is clear from context, but the mismatch between "SHOULD send" and "Includes" (always-true) introduces a semantic gap: an implementer who omits the notification reads the description as though they've also omitted a required verificationId delivery.
Fix: Replace "Includes the verificationId for the affected agent" with "When sent, MUST include the verificationId for the affected agent." This tightens the conditional and ensures that any sent notification always delivers the challenge ID.
Strengths
The eleven-review arc has converged to a substantially complete spec:
- 20 of 22 issues across ten prior rounds resolved. The polish commits have addressed every finding precisely without over-engineering.
- Section 8.8.3 blocking semantics are now explicit and complete. The heading covers all `stopped: true` triggers; the `verify` pause expiration behavior (lift, re-evaluate, regenerate) is fully specified.
- All three notification types have complete lifecycle descriptions. `permission_pending` (cleared by `confirm_operation`, failed attempts don't clear), `autonomy_pause` (dual clearing paths for `confirm` vs. `verify` tier), `danger_zone` (broadcast scope, `verificationId` included, cleared by `verify_challenge` or admin override).
- `deny` pattern matches are now first-class throughout the spec. Section 8.8.1 trigger conditions, Section 8.8.3 heading, Section 8.7.3 `danger_zone` notification description, and the informative doc notification table all correctly include `deny` pattern matches alongside `danger_zone` tier.
- `danger-levels.md` Section 8.2 now covers both enforcement tiers in parallel. The `verify` (pause) and `danger_zone` (hard block) flows are documented symmetrically with accurate step-by-step descriptions.
Summary
Finding #1 (deny challenge ID transport weaker than other triggers) is medium priority — it creates a path where a deny-blocked agent is permanently stuck with stopped: true and no verificationId to call verify_challenge, violating the recovery-path guarantee that Section 8.8 establishes. Finding #2 is a one-sentence wording fix to add a normative qualifier to the danger_zone notification's verificationId inclusion.
The danger_zone notification (for both danger_zone tier and deny pattern triggers) is upgraded from SHOULD to MUST because stopped:true without a verificationId creates an unrecoverable dead end: the agent is permanently blocked with no challenge ID to call verify_challenge.
- Section 8.8.1: SHOULD → MUST for danger_zone notification
- Section 8.8.2: extend MUST challenge ID transport to deny matches
- Section 8.7.3: add MUST qualifier to verificationId inclusion
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
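The dead end described in this commit can be made concrete with a sketch of a hard-block response and a check for its recovery path. The `stopped`, `reason`, `notifications`, and `metadata.verificationId` names come from the thread; the `type` key on the notification, the pairing of `continue: False` with `stopped: True`, the hex format of the ID, and both function names are assumptions for illustration.

```python
import secrets


def hard_block_directive(reason: str) -> dict:
    """Build a stopped directive that always carries a challenge ID."""
    verification_id = secrets.token_hex(16)  # ID format is assumed
    return {
        "continue": False,  # assumed to accompany a hard block
        "stopped": True,
        "reason": reason,
        "notifications": [
            {
                "type": "danger_zone",  # key name assumed
                "metadata": {"verificationId": verification_id},
            }
        ],
    }


def has_recovery_path(directive: dict) -> bool:
    """A stopped directive is only recoverable if a challenge ID was delivered."""
    if not directive.get("stopped"):
        return True
    return any(
        n.get("type") == "danger_zone"
        and bool(n.get("metadata", {}).get("verificationId"))
        for n in directive.get("notifications", [])
    )


directive = hard_block_directive("deny pattern match")
```

A directive of `{"stopped": True}` with no notification fails the check, which is the case the SHOULD-level wording previously allowed.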
@claude Please review this PR.

Claude finished @mickdarling's task in 3m 12s
PR Review: feat: Sprint 7 — Safety Model (Phase 1)
This is the twelfth review pass on this PR. The polish commit (
Prior Review: Both Issues Resolved ✓
New Findings
1. Informative doc compliance summary lists hard-block state persistence as MAY; normative spec says MUST (Medium)
But
These directly conflict. The normative MUST was justified by the hard-block security guarantee: a blocked agent that becomes unblocked on restart defeats the purpose of the
An implementer using Section 9 as a compliance checklist will categorize persistence as optional, build accordingly, pass internal reviews, and then have a server restart silently unblock every hard-blocked agent. The normative MUST is actionable (even prescribes the mechanism: file-based or database storage) but isn't represented in the informative doc's compliance table.
Fix: Move "persist blocked agent state across server restarts" from Section 9.3 (MAY) to Section 9.1 (MUST), updating the wording to match the normative spec's intent: "MUST persist hard-blocked agent state across server restarts when implementing
2.

- Move blocked state persistence from Section 9.3 MAY to Section 9.1 MUST (normative spec Section 8.8.3 requires it)
- Add danger_zone notification broadcast to Section 9.1 MUST (normative spec Section 8.8.1 now requires it for all stopped:true events)
- Narrow Section 9.2 SHOULD for notifications to non-hard-block events since hard-block notifications are covered by the new MUST
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
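A rough sketch of the file-based option for persisting hard-block state, contrasted with verify pauses, which the thread says do not persist across restarts. The `BlockStore` class, its method names, and the JSON file layout are all hypothetical; a real implementation would also need locking and error handling.

```python
import json
import os
import tempfile


class BlockStore:
    """Hard blocks are written to disk; verify pauses live in memory only."""

    def __init__(self, path: str):
        self.path = path
        self.verify_pauses = {}  # agent -> challenge ID, lost on restart
        self.hard_blocks = self._load()  # agent -> verification ID

    def _load(self) -> dict:
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as fh:
            return json.load(fh)

    def hard_block(self, agent: str, verification_id: str) -> None:
        self.hard_blocks[agent] = verification_id
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            json.dump(self.hard_blocks, fh)
        os.replace(tmp, self.path)  # swap in the new file atomically

    def pause_for_verify(self, agent: str, challenge_id: str) -> None:
        self.verify_pauses[agent] = challenge_id

    def is_hard_blocked(self, agent: str) -> bool:
        return agent in self.hard_blocks


path = os.path.join(tempfile.mkdtemp(), "hard_blocks.json")
store = BlockStore(path)
store.hard_block("agent-a", "challenge-1")
store.pause_for_verify("agent-b", "challenge-2")
restarted = BlockStore(path)  # simulates a server restart
```

After the simulated restart, `agent-a` is still hard-blocked while the verify pause for `agent-b` is gone, matching the two behaviors the spec distinguishes.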
@claude Please review this PR.

Claude finished @mickdarling's task in 3m 16s
PR Review: feat: Sprint 7 — Safety Model (Phase 1)
This is the thirteenth review pass on this PR. The polish commit (
Prior Review: Both Issues Resolved ✓
New Findings
1. Section 9.1 MUST list missing

…MUST
- Section 8.7.1: qualify "All other fields are OPTIONAL" with cross-ref to Sections 8.8.1/8.8.2 where notifications field is mandatory
- Section 9.1: add verify tier autonomy_pause notification with metadata.verificationId as a MUST requirement
- Section 9.2: qualify SHOULD for non-hard-block notifications to exclude verify tier (already covered as MUST in Section 9.1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude Please review this PR.

Claude finished @mickdarling's task in 4m 3s
PR Review: feat: Sprint 7 — Safety Model (Phase 1)

…215)
- Move step limits (maxAutonomousSteps) from Section 9.3 MAY to Section 9.1 MUST, matching normative Section 8.7.5 requirement
- Narrow Section 9.2 verify tier exclusion to specifically reference the autonomy_pause notification with metadata.verificationId, preserving SHOULD coverage for other notification types in verify tier responses
- Remove duplicate MAY bullet for risk tolerance thresholds
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude Please review this PR.

Claude finished @mickdarling's task in 3m 53s
PR Review: feat: Sprint 7 — Safety Model (Phase 1)
This is the fifteenth review pass on this PR. The polish commit (
Prior Review: Both Issues Resolved ✓
New Findings
1. Section 9.3 "MAY support out-of-band verification" conflicts with Section 9.2 "SHOULD implement
| Section 8.7.5 Stage | Requirement | Section 9 entry |
|---|---|---|
| Step Limit | MUST | Section 9.1: "MUST implement the Step Limit stage" ✓ |
| Previous Outcome | SHOULD | Absent from Section 9.2 ✗ |
| Pattern Matching | SHOULD | Section 9.2: "SHOULD support configurable policy patterns" ✓ |
| Safety Tier | MAY | Section 9.3: "MAY implement pattern-based automatic danger level classification" ✓ |
| Risk Tolerance | MAY | Section 9.3: "MAY support configurable risk tolerance thresholds" ✓ |
Stage 2 — evaluating the outcome field to return continue: false on failures — is a SHOULD in v1.0.0-draft.md:1234. The normative spec at line 1125 says: "If the outcome field... reports "failure", the evaluator SHOULD return continue: false". An implementer checking Section 9.2 as a complete SHOULD checklist will not know this stage exists.
Fix: Add to Section 9.2: "SHOULD evaluate the `outcome` field in `record_execution_step` calls and return `continue: false` on reported failures (Stage 2, Section 8.7.5)".
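The first two pipeline stages discussed here could look roughly like the sketch below: Stage 1 (step limit, the only MUST stage) followed by Stage 2 (previous outcome, SHOULD, skipped when `outcome` is omitted). The `evaluate_step` function, its parameter names, and the exact boundary at which the limit trips are assumptions; later stages are left out.

```python
from typing import Optional


def evaluate_step(
    steps_taken: int,
    max_autonomous_steps: int,
    outcome: Optional[str] = None,
) -> dict:
    """Return a partial AutonomyDirective from Stages 1 and 2 only."""
    # Stage 1 (MUST): stop once the step budget is used up.
    steps_remaining = max_autonomous_steps - steps_taken
    if steps_remaining <= 0:
        return {
            "continue": False,
            "reason": "step limit reached",
            "stepsRemaining": 0,
        }
    # Stage 2 (SHOULD): pause on a reported failure; skipped if outcome is omitted.
    if outcome == "failure":
        return {
            "continue": False,
            "reason": "previous step reported failure",
            "stepsRemaining": steps_remaining,
        }
    return {"continue": True, "stepsRemaining": steps_remaining}
```

For example, `evaluate_step(3, 10, "failure")` pauses with seven steps remaining, while the same call without `outcome` continues.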
Strengths
The fifteen-review arc has produced a substantially complete spec:
- 28 of 30 issues across fourteen prior rounds resolved. Each polish commit addressed findings precisely without over-engineering.
- Section 9.1 MUST list is now complete for all protocol obligations. Step Limit, persistence, `danger_zone` notification broadcast, `autonomy_pause` + `verificationId`: all correctly placed and worded.
- Section 9.2 `verify` tier carve-out is now precise. The parenthetical correctly identifies that only the `autonomy_pause` + `verificationId` combination is MUST; other notification types in `verify` tier responses retain SHOULD coverage.
- All three notification types have symmetric lifecycle descriptions. `permission_pending`, `autonomy_pause` (both `confirm` and `verify` sub-cases), and `danger_zone` all have trigger conditions, clearing conditions, and normative strength defined.
- The `deny` tier is fully first-class throughout. Section 8.8.1 trigger conditions, 8.8.3 heading, 8.7.3 notification description, Section 9.1 notification MUST, informative notification table: all consistent.
- Section 8.8.3 blocking semantics cover all cases. Hard blocks (`stopped: true`), `verify` pauses (pending challenge, expiration, re-evaluation), and restart persistence are all specified.
Summary
Finding #1 (Section 9.3 MAY vs Section 9.2 SHOULD for out-of-band verification) is medium priority — the redundant MAY gives implementers an escape hatch below the SHOULD strength, and is inconsistent with the MUSTs that presuppose challenge generation. Finding #2 is minor — the Section 9.2 SHOULD list is missing a Stage 2 entry that corresponds directly to a SHOULD row in the Section 8.7.5 minimum viable implementation table.
- Replace Section 9.3 "MAY support out-of-band verification" (conflicts with Section 9.2 SHOULD for verify_challenge) with MAY for display channel choice (OS dialogs, hardware tokens, SMS/email)
- Add Section 9.2 bullet for Stage 2 Previous Outcome evaluation, completing the Section 8.7.5 coverage in the compliance checklist
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude Please review this PR.

Claude finished @mickdarling's task in 6m 9s
PR Review: feat: Sprint 7 — Safety Model (Phase 1)
This is the sixteenth review pass on this PR. The polish commit (
Prior Review: Both Issues Resolved ✓
New Findings
1. Section 8.8.5 rate limiting requirements entirely absent from Section 9 compliance checklist (Medium)
An implementer who builds
Fix: Add to Section 9.2 SHOULD requirements:
2. Section 9.2 Stage 2 bullet cross-references 8.7.5 (requirement table) but not 8.7.2 (behavioral definition) (Minor)
Section 8.7.5 is the minimum viable implementation table: it shows the requirement level (SHOULD) but not the behavioral spec. The actual Stage 2 definition, including the "if
Section 3.3 of the informative doc does already cross-reference Section 8.7.2 for Stage 2 (
Fix: Update the link to reference Section 8.7.2 rather than 8.7.5: replace
Strengths
The sixteen-review arc has converged to a near-complete spec:
Summary
Finding #1 (Section 8.8.5 rate limiting absent from Section 9.2) is medium priority: it's a security-relevant gap where an implementer who builds

Summary
Sprint 7 delivers the complete Safety Model foundation for MCP-AQL:
- `docs/security/execution-safety-loop.md` defining the safety dongle as a standalone MCP-AQL server for safety enforcement, with minimal operation surface (7 ops across 3 endpoints), permission architecture, and deployment examples
- `AutonomyDirective` response contract, 5-stage evaluation pipeline, agent notification system, and configurable risk tolerance

Files Changed
- `docs/versions/v1.0.0-draft.md`
- `docs/security/execution-safety-loop.md`
- `docs/security/gatekeeper.md`
- `docs/adapter/danger-levels.md`
- `CHANGELOG.md`

Key Concepts
- `nextActionHint` and receives `AutonomyDirective` go/no-go responses
- `continue`, `factors`, `stopped`, `reason`, `stepsRemaining`, `notifications`

Test plan
- `npx markdownlint-cli2 "docs/**/*.md"`: 0 errors
- `npx cspell "docs/**/*.md" "CHANGELOG.md"`: 0 issues
- `node scripts/validate-schema-examples.mjs`: 27/27 passed

🤖 Generated with Claude Code