feat: sentinel-based disable guards and session-start-context hook#220
Conversation
…gate Ambient mode over-classifies simple tasks as ORCHESTRATED, wasting tokens on full agent pipelines for work that only needs GUIDED treatment. This separates intent classification from depth classification: the router determines WHAT kind of work; a new per-intent triage skill determines HOW BIG the work is, using actual context (file counts, issue content, commit count). Default-to-GUIDED bias inverts the previous default-to-ORCHESTRATED behavior. Additionally, implement and resolve workflows now verify CI passes after code changes via a new check-ci-status Git agent operation with polling and fix loop. Change A — Triage Layer: - 7 new triage skills (implement, debug, explore, plan, review, research, release) - Router dispatch table simplified to intent-only (single column) - Classification rules: depth criteria removed, QUICK criteria preserved - Preamble: intent-only classification trigger - Tests updated for new router format and triage skill chain Change B — CI Status Gate: - New check-ci-status operation in Git agent - Phase 7 CI gate in implement:orch (phases renumbered 7→8, 8→9) - Phase 7 CI gate in resolve:orch, resolve.md, resolve-teams.md - Polls pending CI (max 10 min), attempts fixes (max 2), graceful fallback
PR Review: Blocking Issues SummaryPhase 3 Synthesis — Findings from 9 Reviewers (≥80% confidence)Thank you for this well-architected triage layer. The reviewers identified 17 findings, with 13 at ≥80% confidence. Here's the consolidated list: BLOCKING ISSUES (Must fix before merge)1. Consistency: 4 Stale Phase Cross-References (95% confidence)
2. Regression: 2 Stale Phase References in pipeline:orch (92% confidence)
3. Architecture: 2 Blocking Issues with CI Status Gate (82-85% confidence)
4. Reliability: 2 CI Polling Issues (85-92% confidence)
5. Security: 1 Shell Injection Issue (82% confidence)
6. Complexity: 1 DRY Violation (88% confidence)
7. TypeScript: 1 Dead Code Issue (82% confidence)
8. Testing: 1 Missing Negative Test (85% confidence)
SHOULD-FIX ISSUES (Recommended but not blocking)1. Regression: Stale Format References in Adjacent Skills (85-90% confidence)
LOWER CONFIDENCE (60-79%)Four additional findings below 80% confidence are documented in individual review reports:
Summary
Recommendation: Generated by: Devflow Code Review (9 reviewers, synthesized 2026-05-14) See individual review reports in |
…EPTH format CHAT is always QUICK and never emits the 'Devflow: INTENT.' marker, so it was a dead branch in CLASSIFICATION_PATTERN. Remove it for clarity. Add negative test asserting the old INTENT/DEPTH slash format is rejected — applies ADR-001 (clean break philosophy: explicitly verify old format is not matched). Co-Authored-By: Claude <noreply@anthropic.com>
…hase 7 (CI Status Gate) was inserted into implement:orch and resolve:orch, shifting downstream phases by one. This commit corrects all stale phase references across resolve.md, resolve-teams.md, implement:orch, pipeline:orch, plan:orch, and test-driven-development to reflect the updated numbering. Also updates the Integration with Ambient Mode section in test-driven-development from the old INTENT/DEPTH format to the new parenthetical INTENT (DEPTH) format, and replaces the stale PIPELINE/ORCHESTRATED cost label in pipeline:orch with the canonical PIPELINE label. Co-Authored-By: Claude <noreply@anthropic.com>
…CI status classification rules in the check-ci-status operation to be more explicit about priority order: PENDING state takes precedence over FAILURE, which takes precedence over SUCCESS. This matches the actual implementation behavior and makes the intent clearer. Co-Authored-By: Claude <noreply@anthropic.com>
Review Comments - helpers.tsLine 5: CHAT removal from CLASSIFICATION_PATTERNGood catch: CHAT removal is correct. CHAT intents are always QUICK (never emit the 'Devflow: INTENT.' marker per classification-rules.md line 20), making CHAT a dead branch in this regex. This applies ADR-001 (clean break -- dead code removed without backward-compat). Confidence: 95% ✓ |
Review Comments - ambient.test.tsLines 405-409: Negative test for old INTENT/DEPTH formatExcellent negative test. The test correctly verifies rejection of the old slash format (e.g., 'Devflow: IMPLEMENT/ORCHESTRATED'). This applies ADR-001 (clean break philosophy -- explicitly verify old format is rejected rather than maintaining backward compatibility). Confidence: 95% ✓ |
Code Review Summary - PR #220APPROVED WITH CONDITIONSThe PR introduces a well-bounded CI Status Gate (Phase 7) with explicit iteration limits and correctly applies ADR-001 (clean break) by removing dead CHAT variant and migrating INTENT/DEPTH format. The changes are sound overall. Should-Fix Issues (80%+ Confidence)1. CI Status Gate duplicated across 4 files without enforcement (85%)Files:
Issue: The CI Status Gate logic (poll cadence, budget limits, fix attempt count) is replicated verbatim across 4 files. The Fix: Add a build-time verification script (scripts/verify-sync-markers.sh) that extracts content between SYNC markers and asserts byte-identical content. This converts the convention into an enforced invariant. 2. CI Status Gate poll/fix loop budget interaction is ambiguous (82%)Files:
Issue: Step 4 says "max 10 iterations" for a single PENDING poll loop, but step 6 says "max 10 polls across all check/fix cycles combined." If a fix attempt triggers new PENDING, does budget reset per cycle or is it shared? An agent interpreting step 4 literally could consume 10 polls per cycle, violating the global cap intent. Fix: Clarify that steps 4-5 describe per-cycle behavior but step 6 is the authoritative global cap. Reword step 4: "poll every 60 seconds (budget-limited, see step 6)." 3. Missing negative test for extractIntent with old INTENT/DEPTH format (82%)File: tests/ambient.test.ts Issue: The negative test at line 405 verifies hasClassification rejects old INTENT/DEPTH format. However, extractIntent uses the same CLASSIFICATION_PATTERN but lacks explicit verification it also rejects the old format. Fix: Add companion test case: it('returns null for old INTENT/DEPTH format', () => {
expect(extractIntent(textResult('Devflow: IMPLEMENT/ORCHESTRATED'))).toBeNull();
expect(extractIntent(textResult('Devflow: DEBUG/GUIDED'))).toBeNull();
});4. Missing SYNC marker content drift detection test (80%)File: tests/ambient.test.ts Issue: The CI Status Gate SYNC markers were added to prevent drift, but no test validates that content between Fix: Add a structural validation test that extracts content and asserts byte-identical matches across implement:orch, resolve:orch, and command files. 5. Missing extractDepth edge case test coverage (80%)File: tests/ambient.test.ts:420-423 Issue: The extractDepth tests only check canonical outputs. No coverage for casing/whitespace variations or verification that old slash format is not matched. Fix: Add edge case tests: it('extractDepth handles casing and whitespace variations', () => {
expect(extractDepth(textResult('scope: guided'))).toBe('GUIDED');
expect(extractDepth(textResult('Scope: ORCHESTRATED'))).toBe('ORCHESTRATED');
});
it('extractDepth does not match old slash format', () => {
expect(extractDepth(textResult('Devflow: IMPLEMENT/ORCHESTRATED'))).toBeNull();
});Lower-Confidence Suggestions (60-79%)SYNC marker scope could be formalized (70%)The CI Status Gate variations are not truly identical despite SYNC markers (65%)The SYNC markers imply content equivalence, but there are intentional context-specific differences (implement:orch vs resolve:orch have different Requires/skip conditions). Consider either: (a) narrowing the SYNC markers to wrap only the shared algorithm (steps 1-6), or (b) renaming to CI polling spawns new Git agent per iteration (65%)The CI Status Gate re-spawns a full Git agent on each of max 10 poll iterations. By design (agents handle all git operations per project conventions), and the 60-second interval makes per-poll overhead negligible. Total of ~12 agent spawns max is reasonable. No action required. Deduplication Notes
Overall Score: 8/10 Recommendation: APPROVED FOR MERGE with follow-up PRs to address the 5 should-fix items above. Review conducted by Devflow code-review system. Multiple specialized reviewers: security, architecture, complexity, performance, consistency, regression, testing, reliability, typescript. |
…e feature knowledge base for the Rules System CLI, documenting the architecture, build pipeline, install flow, manifest tracking, and component interactions. Includes constraints, anti-patterns, and gotchas for developers working on rules functionality. Co-Authored-By: Claude <noreply@anthropic.com>
Rename SYNC→PATTERN markers across all 4 copies of the CI Status Gate (implement:orch, resolve:orch, resolve.md, resolve-teams.md) to accurately reflect that the copies are the same pattern adapted to context, not byte-identical content. Context-specific Requires and skip conditions are moved outside the marker scope. Remove "max 10 iterations" from step 4 (PENDING branch) in all 4 files to eliminate the ambiguity where an agent could treat 10 as a per-cycle cap rather than the global budget declared in step 6. Replace with "global budget, see step 6" to make the reference explicit. Also replace hardcoded "proceed to Phase N" references in steps 2-3 with "proceed to next phase" so the marker body stays valid when surrounded by differently-numbered phases. Co-Authored-By: Claude <noreply@anthropic.com>
README.md showed 'Devflow: IMPLEMENT/ORCHESTRATED' as example model output. The new format since the triage layer introduction is 'Devflow: IMPLEMENT. Loading: devflow:implement:triage.' with triage emitting 'Scope: ORCHESTRATED'. Update the demo block accordingly so docs match the actual model output format. Co-Authored-By: Claude <noreply@anthropic.com>
…sections, and ci-status-gate drift - extractIntent: add negative test asserting old INTENT/DEPTH slash format returns null, mirroring the existing hasClassification negative test (applies ADR-001: clean break — explicitly verify old format is not matched) - extractDepth: add edge cases for casing/whitespace variations (SCOPE_PATTERN uses /i flag and \s* so these already pass) and old slash format rejection - Triage structural: extend existing frontmatter check to also assert each :triage skill has '## Scope Assessment' and '## Orchestration Hint Override' sections - ci-status-gate drift: add structural test extracting the PATTERN block from both implement:orch and resolve:orch and asserting steps 2-6 (shared polling logic) are identical; step 1 is intentionally context-specific and excluded Co-Authored-By: Claude <noreply@anthropic.com>
Add runtime sentinel files (.working-memory-disabled, .learning-disabled) so hooks respect enable/disable state independently of hook registration. Extract cross-feature context injection from session-start-memory into a new always-on session-start-context hook. Fix decisions scanner gap in stop-update-memory to gate on the decisions sentinel. Wire CLI enable/disable/status to manage sentinels. - New hook: session-start-context (always-on, never removed by --disable) - All 4 memory hooks: check .working-memory-disabled sentinel at startup - session-end-learning: check .learning-disabled sentinel at startup - decisions-usage-scan.cjs: check decisions/.disabled sentinel - init.ts: register session-start-context, manage both sentinels on init - memory.ts/learn.ts: --enable removes sentinel, --disable creates sentinel - uninstall.ts: remove session-start-context on uninstall - 35 new sentinel tests + session-start-context integration tests task-2026-05-15_1200
…entinel tests - Add missing removeDecisionsHook() call in uninstall.ts (pre-existing gap: decisions hook would be left in settings.json after uninstall) - Remove redundant init sentinel tests that only tested fs primitives (writeFileSync/unlinkSync idempotency), not application logic
| return 'missing'; | ||
| } | ||
|
|
||
| // ─── Context hook utilities ──────────────────────────────────────────────── |
There was a problem hiding this comment.
Architecture Issue (85% confidence): Context hook utilities break Single Responsibility
The addContextHook, removeContextHook, and hasContextHook functions are defined here in init.ts instead of a dedicated module. Every other hook feature (ambient.ts, memory.ts, learn.ts, etc.) follows a consistent pattern: utilities live in their own module, then init imports them.
Suggested fix: Extract to src/cli/commands/context.ts following the same pattern as memory.ts (~80 lines).
Reviewed by devflow code-review agent
| **Build-time asset distribution**: Skills and agents are stored once in `shared/skills/` and `shared/agents/`, then copied to each plugin at build time based on `plugin.json` manifests. This eliminates duplication in git. | ||
|
|
||
| **Working Memory**: Four shell-script hooks (`scripts/hooks/`) provide automatic session continuity. Toggleable via `devflow memory --enable/--disable/--status` or `devflow init --memory/--no-memory`. UserPromptSubmit (`prompt-capture-memory`) captures user prompt to `.memory/.pending-turns.jsonl` queue. Stop hook captures `response_text` (on `end_turn` only) to same queue, then spawns throttled background `claude -p --model haiku` updater (skips if triggered <2min ago; concurrent sessions serialize via mkdir-based lock). Background updater uses `mv`-based atomic handoff to process all pending turns in batch (capped at 10 most recent), with crash recovery via `.pending-turns.processing` file. Updates `.memory/WORKING-MEMORY.md` with structured sections (`## Now`, `## Progress`, `## Decisions`, `## Modified Files`, `## Context`, `## Session Log`). SessionStart hook → injects previous memory + git state as `additionalContext` on `/clear`, startup, or compact (warns if >1h stale; injects pre-compact memory snapshot when compaction happened mid-session). PreCompact hook → saves git state + WORKING-MEMORY.md snapshot + bootstraps minimal WORKING-MEMORY.md if none exists. Disabling memory removes all four hooks. Use `devflow memory --clear` to clean up pending queue files across projects. Zero-ceremony context preservation. | ||
| **Working Memory**: Five shell-script hooks (`scripts/hooks/`) provide automatic session continuity. Toggleable via `devflow memory --enable/--disable/--status` or `devflow init --memory/--no-memory`. Runtime sentinel: `.memory/.working-memory-disabled` — all four memory hooks check this sentinel at startup and exit immediately if present; `devflow memory --enable` removes it, `devflow memory --disable` creates it. UserPromptSubmit (`prompt-capture-memory`) captures user prompt to `.memory/.pending-turns.jsonl` queue. Stop hook captures `response_text` (on `end_turn` only) to same queue, then spawns throttled background `claude -p --model haiku` updater (skips if triggered <2min ago; concurrent sessions serialize via mkdir-based lock). Background updater uses `mv`-based atomic handoff to process all pending turns in batch (capped at 10 most recent), with crash recovery via `.pending-turns.processing` file. Updates `.memory/WORKING-MEMORY.md` with structured sections (`## Now`, `## Progress`, `## Decisions`, `## Modified Files`, `## Context`, `## Session Log`). SessionStart hook (`session-start-memory`) → injects previous memory + git state as `additionalContext` on `/clear`, startup, or compact (warns if >1h stale; injects pre-compact memory snapshot when compaction happened mid-session); gated by `.working-memory-disabled` sentinel. Always-on SessionStart hook (`session-start-context`) → injects cross-feature context (decisions TL;DR, learned behaviors) independently of working memory; sections internally gated by `decisions/.disabled` and `.learning-disabled` sentinels respectively. PreCompact hook → saves git state + WORKING-MEMORY.md snapshot + bootstraps minimal WORKING-MEMORY.md if none exists. Disabling memory removes the four memory hooks (session-start-context is always-on and never removed). Use `devflow memory --clear` to clean up pending queue files across projects. Zero-ceremony context preservation. |
There was a problem hiding this comment.
Documentation Issue (85% confidence): Hook count inconsistency
The text says "Five shell-script hooks" but then says "all four memory hooks check this sentinel". With the new session-start-context hook, there are now 6 total hooks (5 memory hooks + 1 context hook).
Suggested fix: Clarify the count:
- Five memory hooks (working memory layer)
- One session-start-context hook (cross-feature context injection)
- Total: Six hooks
Reviewed by devflow code-review agent
| ├── .decisions-runs-today # Daily run counter for decisions agent (date + count) | ||
| ├── .decisions-batch-ids # Session IDs for current decisions batch run | ||
| ├── .decisions.lock # Lock directory for decisions background agent (transient) | ||
| ├── .working-memory-disabled # Runtime sentinel — all 4 memory hooks exit immediately if present |
There was a problem hiding this comment.
Documentation Issue (88% confidence): Missing decisions/.disabled sentinel in file tree
The .memory/ directory tree documentation shows:
- .working-memory-disabled
- .learning-disabled
But it's missing:
- decisions/.disabled (the sentinel for the decisions agent)
Suggested fix: Add .memory/decisions/.disabled to the file tree list after the decisions-log.jsonl entry.
Reviewed by devflow code-review agent
| hasMemoryHooks, | ||
| } from '../src/cli/commands/memory.js'; | ||
| import { | ||
| addLearningHook, |
There was a problem hiding this comment.
Code Quality Issue (95% confidence): Unused imports
addLearningHook and hasLearningHook are imported but never used in this test file.
Suggested fix: Remove lines 17-19 to keep imports clean.
Reviewed by devflow code-review agent
| const result = addContextHook('{}', '/home/user/.devflow'); | ||
| const settings = JSON.parse(result); | ||
| expect(settings.hooks?.SessionStart).toBeDefined(); | ||
| const hasContextHook = settings.hooks.SessionStart.some( |
There was a problem hiding this comment.
Code Quality Issue (85% confidence): Variable name shadows imported function
The local variable hasContextHook shadows the imported function of the same name. This makes the code confusing — someone reading the test might think the variable and the function are the same thing.
Suggested fix: Rename the variable to contextHookExists or hasHook to avoid shadowing.
Reviewed by devflow code-review agent
Code Review Summary — Lower Confidence FindingsInline Comments Created (5 total)
Lower Confidence Findings (60-79%) — Deferred to SummaryThese suggestions scored 60-79% confidence and were not posted as inline comments per protocol:
Pre-existing / Out of Scope
Reviewed by devflow code-review agent |
…e-adds shared/rules/reliability.md to the referencedFiles array in .features/index.json so staleness detection for the cli-rules knowledge base correctly tracks changes to that file. Co-Authored-By: Claude <noreply@anthropic.com>
- Working Memory paragraph: clarify 'Four shell-script hooks' (not five) with parenthetical noting session-start-context is the always-on fifth - Decisions agent paragraph: add Runtime sentinel sentence for .memory/decisions/.disabled, matching Learning agent paragraph symmetry - .memory/ file tree: add .disabled sentinel entry to decisions/ subtree Co-Authored-By: Claude <noreply@anthropic.com>
- Create src/cli/commands/context.ts with addContextHook/removeContextHook/hasContextHook extracted from init.ts (SRP: each hook feature in its own module) - Create src/cli/utils/sentinel.ts with manageSentinel() to eliminate 6 copies of the identical enable/disable sentinel pattern across init.ts, memory.ts, learn.ts - Update init.ts: import from context.ts and sentinel.ts; 4 sentinel blocks collapsed to a single manageSentinel call each; re-export context utilities for backward compat - Update memory.ts: use manageSentinel; add best-effort queue drain on --disable (removes .pending-turns.jsonl and .pending-turns.processing orphans) - Update learn.ts: use manageSentinel for --enable and --disable sentinel management - Update uninstall.ts: import removeContextHook from context.ts (canonical source) Co-Authored-By: Claude <noreply@anthropic.com>
- Remove unused imports (addMemoryHooks, removeMemoryHooks, hasMemoryHooks, addLearningHook, hasLearningHook); new CLI sentinel tests use manageSentinel directly per architecture context - Extract parseHookOutput() helper with structural assertions before property access, replacing bare JSON.parse().property chains throughout - Rename local 'hasContextHook' variables to 'hookPresent' to eliminate shadowing of the imported function of the same name - Add side-effect assertion to background-memory-update sentinel test (lock directory must not be created when disabled) plus a positive-path test - Fix conditional assertion in 'skips learned behaviors' test — add explicit else branch so the test never silently passes; fix same pattern in 'session-start-memory no longer outputs decisions TL;DR' - Add Part E: manageSentinel utility tests covering create, parent-dir creation, remove, idempotent enable, idempotent disable, and disable→enable lifecycle - Add Part F: sentinel existence check tests that verify the boolean logic backing the runtime-disabled warnings in memory --status and learn --status Co-Authored-By: Claude <noreply@anthropic.com>
…s - Remove unused gitRoot parameter from manageSentinel — only sentinelPath was ever used; dead parameter added noise to all 8 call sites - Remove Part F test block (4 tests) that tested fs.promises.access() behavior, duplicating Part E coverage via fs.existsSync Co-Authored-By: Claude <noreply@anthropic.com>
## Summary Implements ambient mode for Devflow — a zero-overhead session enhancement system with plan auto-detection and command awareness. Ambient mode consists of two independent, toggleable components: 1. **Plan Auto-Detection**: UserPromptSubmit hook detects structured implementation plans (marked with `## Goal`, `## Steps`, `## Files`) and emits a directive to invoke `devflow:implement` via the Skill tool. 2. **Command Awareness**: Always-on rule installed to `~/.claude/rules/devflow/commands.md` documents available `/devflow:<name>` commands and the auto-execution trigger. ## Key Features - **Zero Overhead**: Plan detection hook only outputs when a plan is detected; normal prompts are unaffected - **Toggleable**: Enable/disable via `devflow ambient --enable/--disable` or `devflow init` - **Independent Components**: Can be configured separately - **Passive Reference**: Commands rule provides documentation without changing behavior ## Implementation Details - Ambient mode state stored in manifest `features.ambient: boolean` - Hook implementation via `preamble` hook for plan detection - Commands rule managed directly by `ambient.ts` command (not through rules plugin system) - SessionStart hook injects plan marker context if plan is detected in current session - Feature toggleable separately from other Devflow systems (memory, learning, decisions, knowledge, rules) ## Testing Added comprehensive tests covering: - Hook addition/removal lifecycle - Rule file creation with stable content hash - Ambient mode enable/disable transitions - Edge cases (already enabled, already disabled, invalid states) - Cleanup of stale hooks/rules Closes #220 Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Dean Sharon <deanshrn@gmain.com> Co-authored-by: Claude <noreply@anthropic.com>
Summary
.memory/.working-memory-disabled,.memory/.learning-disabled) so hooks respect enable/disable state at the shell level, independently of whether hooks are registered in settings.jsonsession-start-memoryinto a new always-onsession-start-contexthook — memory can be disabled without losing context injectionstop-update-memoryran even when decisions were disabledChanges
New hook (
scripts/hooks/session-start-context):devflow memory --disabledecisions/.disabled; learning sections check.learning-disabledSentinel guards added to hooks:
prompt-capture-memory,stop-update-memory,background-memory-update,pre-compact-memory: exit on.working-memory-disabledsession-end-learning: exit on.learning-disableddecisions-usage-scan.cjs: exit ondecisions/.disabledstop-update-memory: decisions scanner invocation now gated on decisions sentinelCLI wiring (
init.ts,memory.ts,learn.ts,uninstall.ts):devflow init: registerssession-start-context(always-on, remove-then-add for idempotency); manages both sentinels based on feature statedevflow memory --enable: removes.working-memory-disabledsentinel;--disable: creates it;--status: warns if sentinel presentdevflow learn --enable: removes.learning-disabledsentinel;--disable: creates it;--status: warns if sentinel presentdevflow uninstall: removessession-start-contexthook from settings.jsonTests: 35 new sentinel tests in
tests/sentinel.test.tscovering all sentinel guard paths, scanner gating, context hook registration, CLI sentinel management, and shell syntax. Memory test suite updated:session-start-memorytests adjusted to reflect extraction; newsession-start-contextintegration test block added.Breaking Changes
None. The sentinel files are additive. Existing installs with hooks registered continue to work — sentinels only exist if explicitly created by
--disable.Reviewer Focus Areas
scripts/hooks/session-start-context: verify sections match what was insession-start-memoryexactly (1.4, 1.5, 1.75)src/cli/commands/init.ts:addContextHook/removeContextHook/hasContextHookfollow the same pattern as memory/learning hook utilitiesmemory.tsandlearn.tsenable/disable: verify both "already enabled" and "not yet enabled" paths still work correctly after the earlyreturnrefactoringtests/sentinel.test.ts: 35 tests — confirm coverage is complete for all guard paths