feat: enforcement and intent assessor improvements (ADR A.5, A.8)#484
Conversation
Reprioritize DeterministicEnforcementAssessor scoring so agent hooks (60 pts) outrank bypassable git hooks (40 pts), and add design doc enforcement detection to DesignIntentAssessor (advisory 10 pts, deterministic 15 pts). Also adds recommended starter hooks to .claude/settings.json and updates test-assess skill cleanup to avoid triggering the new destructive-command blocker. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 22 minutes and 18 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughAdds deterministic agent hooks (formatting + destructive-command guard), reprioritizes enforcement scoring toward ChangesAgent Hook Prioritization and Design Intent Enforcement
Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
📈 Test Coverage Report
Coverage calculated from unit tests only |
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/agentready/assessors/patterns.py`:
- Around line 355-356: The current check in patterns.py awards
deterministic_score when hooks exist and doc_ref_pattern.search(hooks_str)
matches, which gives +15 for mere mentions; change the condition to require both
a design-doc reference and an enforcement-intent keyword in hooks_str (e.g.,
"must", "must not", "require", "required", "ensure", "update", "check",
"verify", "enforce") by creating/intending an intent_pattern and using
intent_pattern.search(hooks_str.lower()) &&
doc_ref_pattern.search(hooks_str.lower()) before setting deterministic_score
(refer to hooks, hooks_str, doc_ref_pattern, deterministic_score).
In `@src/agentready/assessors/testing.py`:
- Around line 797-806: The current check in the claude_settings block awards 60
points if the "hooks" key exists even when it's empty; change the condition that
awards points (the if that checks "hooks" in content) to verify the hooks object
is non-empty (e.g., content.get("hooks") is truthy and, for dict/list, has len()
> 0) before incrementing score and appending evidence; update the check around
claude_settings.exists(), content = json.loads(...), and the branch that
modifies score and evidence to only run when content["hooks"] contains at least
one entry.
- Around line 893-895: Update the remediation example and guidance text that
mentions .claude/settings.json so the PostToolUse entries use the nested hooks
schema expected by the repo: each PostToolUse should be an object like
{"hooks":[{"type":"command","command": ...}]} (i.e., include the hooks array and
type key), not a flat {"command": ...} entry; modify the example config
string(s) and the lines mentioning PostToolUse in the assessor (the example
block in testing.py that currently lists the two hooks) to show the nested
structure and note the "type":"command" wrapper so users copy a valid
.claude/settings.json configuration.
In `@tests/unit/test_assessors_patterns.py`:
- Around line 448-571: Add a new unit that mirrors
test_deterministic_enforcement_bonus but uses a .claude/settings.json hooks
entry that references "docs/design" without enforcement wording (e.g., command
"check-design-doc.sh" or a benign matcher) and assert that
DesignIntentAssessor().assess(repo) does NOT grant the +15 deterministic bonus:
check finding.score remains 50.0 (or equals the dir-only baseline) and that no
evidence contains "deterministic"; place the test next to other tests like
test_deterministic_enforcement_bonus and reference DesignIntentAssessor,
.claude/settings.json, and PreToolUse in the test name and docstring for
clarity.
In `@tests/unit/test_assessors_testing.py`:
- Around line 976-994: The test test_agent_hooks_score_higher_than_precommit
currently reuses the same repo so precommit_finding inherits the agent hooks;
update the test to measure pre-commit in isolation by creating a fresh repo (or
removing the .claude settings) before calling _make_repo for precommit_finding:
i.e., ensure DeterministicEnforcementAssessor.assess is invoked on a repo that
only has .pre-commit-config.yaml (no .claude/settings.json) so agent_finding and
precommit_finding are independent and the assertions reflect the intended 60 vs
40 comparison.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: f301a2ef-3a06-4b7d-a0da-02ee22ca0dd1
📒 Files selected for processing (7)
.claude/settings.json.claude/skills/test-assess/SKILL.mddocs/attributes.mdsrc/agentready/assessors/patterns.pysrc/agentready/assessors/testing.pytests/unit/test_assessors_patterns.pytests/unit/test_assessors_testing.py
- Require non-empty hook entries before awarding 60 pts (not just key presence) - Require both design-doc reference AND enforcement verb for deterministic bonus - Fix remediation example to use correct nested hook schema - Isolate agent vs pre-commit scoring test with separate repo paths - Add negative test for hooks mentioning design docs without enforcement verbs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ReviewThe scoring reprioritization (A.5) and design intent enforcement detection (A.8) are well-implemented. The two fix commits addressed CodeRabbit's round-1 feedback (enforcement verb pattern, empty hooks handling, nested hook schema in remediation). Test coverage is thorough with 12 new tests covering advisory, deterministic, precedence, skill-based detection, and edge cases. Validated against real repos:
One suggestion (non-blocking): The Posted by Bill Murdock with assistance from Claude Code. |
# [2.46.0](v2.45.0...v2.46.0) (2026-05-29) ### Features * enforcement and intent assessor improvements (ADR A.5, A.8) ([#484](#484)) ([efe7507](efe7507))
|
🎉 This PR is included in version 2.46.0 🎉 The release is available on GitHub release Your semantic-release bot 📦🚀 |
Summary
.claude/settings.json) now score 60 pts (up from 30), git hooks (pre-commit/Husky) score 40 pts (down from 60), pass threshold lowered to 40.claude/settings.json: auto-format on edit (black + isort) and destructive command blockerfind -deleteinstead ofrm -rf(which the new hook blocks)Implements Proposals A.5 and A.8 from the accepted ADR. Fourth of six implementation PRs.
Self-score change: 74.8 -> 75.5 Gold (agent hooks bring deterministic_enforcement from 50 to 100, crossing the Gold threshold).
A.5: Hook scoring reprioritization
The BP insight: "context file instructions are advisory; hooks are deterministic." Agent hooks always execute during agent workflows and cannot be bypassed, while git hooks can be skipped with
--no-verify. New scoring:.claude/settings.jsonwith hooks.pre-commit-config.yaml.huskywith hook scriptsRepos with only pre-commit still pass (40 >= 40). Repos with only agent hooks now pass (60 >= 40, was 30 < 60).
A.8: Design intent enforcement detection
The assessor now checks whether design doc updates are enforced, not just whether design docs exist. Two levels:
The higher of the two is awarded (not additive).
Starter hooks
Added two hooks from the BP recommended starters to
.claude/settings.json:blackandisortafter every Edit/Writerm -rf,DROP TABLE,--forceNote on the destructive command blocker: This hook is a lightweight guardrail, not a comprehensive safety solution. An agent can achieve the same destructive effect through alternative commands that don't match the pattern (e.g.,
find -deleteinstead ofrm -rf). In practice, many agents won't work around the restriction, so it's a net positive for catching obvious destructive operations, but it should not be relied on as a security boundary.Related issues
Test plan
black . && isort . && ruff check .passespytest tests/unit/passes (1122 passed, 17 skipped)agentready assess .runs successfully (75.5/100 Gold)Closes #461
Posted by Bill Murdock with assistance from Claude Code.
Summary by CodeRabbit
New Features
Documentation
Bug Fixes
Tests
Chores