feat(learning): v2 unified self-learning + project knowledge#181
…ng + project knowledge
Implements the four-component detection pipeline:
1. Channel-based transcript filter (D1, D2):
- scripts/hooks/lib/transcript-filter.cjs — testable CJS module
- Rejects isMeta, sourceToolUseID/toolUseResult, framework XML wrappers,
tool_result arrays, empty turns (<5 chars)
- Produces USER_SIGNALS (workflow/procedural) and DIALOG_PAIRS (decision/pitfall)
2. Per-type thresholds + 4-type detection (D3, D4):
- THRESHOLDS constant: workflow(req=3,spread=3d), procedural(req=4,spread=5d),
decision(req=2,spread=0), pitfall(req=2,spread=0)
- calculateConfidence(count, type) — per-type required count
- process-observations extended: all 4 types valid, quality_ok stored/honoured,
per-type promotion gate requires quality_ok===true
- filter-observations updated to include all 4 types
3. Deterministic renderers (D5):
- render-ready <log> <baseDir> — dispatches by type:
workflow → .claude/commands/self-learning/<slug>.md
procedural → .claude/skills/self-learning:<slug>/SKILL.md
decision → .memory/knowledge/decisions.md#ADR-NNN (with lock, capacity=50)
pitfall → .memory/knowledge/pitfalls.md#PF-NNN (with lock, dedup, capacity=50)
- All writes atomic (tmp+rename), manifest schemaVersion=1 maintained
4. Reconciler + feedback loop (D6, D13):
- reconcile-manifest <cwd> — session-start op, detects deletions (confidence×0.3,
status=deprecated) and edits (hash update only, no penalty per D13)
- Anchor-based checks for knowledge file entries (ADR/PF missing = deletion)
- Stale manifest entries (no matching obs) silently dropped
- session-start-memory: reconciler call added before TL;DR injection
5. Additional ops:
- merge-observation <log> <obs>: in-place dedup/reinforce (D14), FIFO cap 10 (D12),
ID collision recovery _b suffix (D11), Levenshtein mismatch flagging
- knowledge-append: standalone knowledge file writer
- acquireLock()/releaseLock(): shared lock helper extracted from shell patterns
6. New background-learning pipeline:
- extract_batch_messages: uses transcript-filter.cjs, produces USER_SIGNALS + DIALOG_PAIRS
- build_sonnet_prompt: new 4-type prompt, no artifact sections (D10, D5)
- render_ready_observations: calls render-ready after process_observations
- check_staleness: grep-based code-ref staleness pass (D16)
Unit tests: 9 test files, 80 new tests (100% passing)
Known regressions: 2 tests in shell-hooks.test.ts encode old behaviour
(confidence formula, quality_ok promotion gate) — Coder 3 will update
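The channel-based filter in item 1 can be sketched roughly as follows. This is a minimal illustration and not the contents of transcript-filter.cjs: the `Turn` shape, the wrapper regex, and the rule that a dialog pair is "the last clean assistant turn plus the user reply" are all assumptions; only the rejection criteria come from the commit message.

```typescript
// Hypothetical transcript turn; field names follow the commit message.
interface Turn {
  role: "user" | "assistant";
  content: string | unknown[];
  isMeta?: boolean;
  sourceToolUseID?: string;
  toolUseResult?: unknown;
}

interface DialogPair { prior: string; user: string; }

const MIN_CHARS = 5; // "empty turns (<5 chars)"

// Assumed wrapper pattern: the whole turn is a single <tag>...</tag> envelope.
const XML_WRAPPER = /^<([a-z][\w-]*)[^>]*>[\s\S]*<\/\1>$/i;

// Returns the trimmed text of a turn, or null if the turn is rejected.
function cleanText(turn: Turn): string | null {
  if (turn.isMeta) return null;
  if (turn.sourceToolUseID !== undefined || turn.toolUseResult !== undefined) return null;
  const content = turn.content;
  if (typeof content !== "string") return null; // tool_result arrays
  const text = content.trim();
  if (text.length < MIN_CHARS) return null;
  if (XML_WRAPPER.test(text)) return null;
  return text;
}

function filterTranscript(turns: Turn[]): { userSignals: string[]; dialogPairs: DialogPair[] } {
  const userSignals: string[] = [];
  const dialogPairs: DialogPair[] = [];
  let lastAssistant: string | null = null;
  for (const turn of turns) {
    const text = cleanText(turn);
    if (text === null) continue;
    if (turn.role === "assistant") {
      lastAssistant = text;
      continue;
    }
    userSignals.push(text);
    if (lastAssistant !== null) {
      // Assumed pairing rule: one assistant turn pairs with the next clean user turn.
      dialogPairs.push({ prior: lastAssistant, user: text });
      lastAssistant = null;
    }
  }
  return { userSignals, dialogPairs };
}
```

Rejecting before pairing means a framework-injected turn can never become the user half of a dialog pair.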
Removed the Extraction Procedure and Loading sections. The SKILL.md now serves only as a format reference for on-disk knowledge files. Writing is performed exclusively by scripts/hooks/background-learning via json-helper.cjs render-ready. Added D9 comment explaining the change. Frontmatter updated to remove Write from allowed-tools (now read-only).
…ment/code-review/debug/resolve Phase 10 (implement), Phase 5 (code-review), Phase 6 (debug/resolve) previously recorded decisions/pitfalls by invoking knowledge-persistence SKILL. Removed in v2 because agent-summaries produced low-signal entries. Added D8 comment at top of each command file. Knowledge is now extracted from user transcripts by background-learning. Phase numbers renumbered in resolve.md to fill the gap.
…anifests Removed from devflow-implement, devflow-code-review, devflow-resolve plugin.json skills arrays and corresponding plugins.ts entries. debug and plan plugins retain knowledge-persistence as they still read knowledge for context (Phase 1). ambient and core-skills also retain it.
…tries + review attention
Added learningCounts HUD component (15th component):
- getLearningCounts() reads .memory/learning-log.jsonl, counts status=created entries by type (workflow/procedural/decision/pitfall) and attention flags (mayBeStale, needsReview, softCapExceeded)
- Shows "Learning: 3 workflows, 2 skills, 8 decisions, 12 pitfalls"
- Appends "⚠ N need review" when attention flags are set
- Graceful fallback: returns null when log missing or unparseable
- D15 comment: soft cap + attention counter, not auto-pruning
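A minimal sketch of the counting rule described for the HUD component, assuming the log is one JSON object per line. The function names and the single-space separator before the warning are hypothetical, and this version takes the log text directly instead of reading .memory/learning-log.jsonl, so it does not model the null fallback.

```typescript
type ObservationType = "workflow" | "procedural" | "decision" | "pitfall";

interface LearningCounts {
  workflow: number;
  procedural: number;
  decision: number;
  pitfall: number;
  needReview: number;
}

const TYPES: readonly ObservationType[] = ["workflow", "procedural", "decision", "pitfall"];

function isObservationType(v: unknown): v is ObservationType {
  return typeof v === "string" && TYPES.some((t) => t === v);
}

// Counts status=created entries per type, plus entries carrying any attention flag.
function countLearning(jsonl: string): LearningCounts {
  const counts: LearningCounts = { workflow: 0, procedural: 0, decision: 0, pitfall: 0, needReview: 0 };
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let entry: unknown;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip malformed lines, keep processing valid ones
    }
    if (typeof entry !== "object" || entry === null || Array.isArray(entry)) continue;
    const obs = entry as Record<string, unknown>;
    const type = obs.type;
    if (!isObservationType(type)) continue;
    if (obs.status === "created") counts[type] += 1;
    // Several flags on one entry still count as a single review item.
    if (obs.mayBeStale === true || obs.needsReview === true || obs.softCapExceeded === true) {
      counts.needReview += 1;
    }
  }
  return counts;
}

function formatLearning(c: LearningCounts): string {
  const base = `Learning: ${c.workflow} workflows, ${c.procedural} skills, ${c.decision} decisions, ${c.pitfall} pitfalls`;
  return c.needReview > 0 ? `${base} ⚠ ${c.needReview} need review` : base;
}
```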
…mands
--review: interactively reviews flagged observations (mayBeStale/needsReview/softCapExceeded). User can deprecate (updates status + knowledge file Status field), keep (clears flags), or skip. Writes log atomically after review session. updateKnowledgeStatus() acquires mkdir lock before updating decisions/pitfalls Status.
--purge-legacy-knowledge: one-time removal of low-signal v1 entries (ADR-002, PF-001, PF-003, PF-005) from knowledge files with confirmation prompt.
Also updated LearningObservation type to include v2 fields:
- type now accepts 'decision' | 'pitfall'
- status now accepts 'deprecated'
- Added mayBeStale, staleReason, needsReview, softCapExceeded, quality_ok fields
isLearningObservation() type guard updated accordingly. formatLearningStatus() updated to show all 4 types + needReview count.
…tests
hud-counts.test.ts (9 tests):
- Counts created entries by type correctly
- Counts needReview from attention flags regardless of status
- Graceful null on missing log, parse error, empty file
- Skips malformed lines and processes valid ones
- Multiple flags on one entry count as 1 needReview
review-command.test.ts (15 tests):
- Validates v2 type support (decision, pitfall, deprecated status)
- Validates attention flag detection and log mutation on deprecate/keep
- Tests updateKnowledgeStatus against decisions.md and pitfalls.md
- Tests graceful behavior when file/anchor missing
end-to-end.test.ts (3 integration tests):
- Full pipeline: 3 sessions → claude shim → all 4 observation types in log
- reconcile-manifest marks deleted artifact observation as deprecated
- Graceful exit with no batch IDs file
…ity_ok gate
- confidence assertion: 0.40 → 0.66 (workflow req=3, count=2: floor(2*100/3)/100=0.66)
- temporal spread promotion test: add quality_ok:true to fixture (new gate requires it)
Co-Authored-By: Claude <noreply@anthropic.com>
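The arithmetic behind the updated assertion can be reproduced with a small sketch. The per-type required counts and spreads are the ones listed for THRESHOLDS in this PR; the clamp to 1.0 is an assumption, since the conversation mentions a clamping formula without stating its bound.

```typescript
type ObservationType = "workflow" | "procedural" | "decision" | "pitfall";

// Values as listed in the PR description (req = required observations, spread in days).
const THRESHOLDS: Record<ObservationType, { required: number; spreadDays: number }> = {
  workflow: { required: 3, spreadDays: 3 },
  procedural: { required: 4, spreadDays: 5 },
  decision: { required: 2, spreadDays: 0 },
  pitfall: { required: 2, spreadDays: 0 },
};

const MAX_CONFIDENCE = 1; // assumed upper clamp, not confirmed by the source

function calculateConfidence(count: number, type: ObservationType): number {
  const { required } = THRESHOLDS[type];
  // Truncate to two decimals, matching floor(count*100/required)/100.
  const raw = Math.floor((count * 100) / required) / 100;
  return Math.min(raw, MAX_CONFIDENCE);
}

console.log(calculateConfidence(2, "workflow")); // 0.66
```

Because the divisor is now per type, two sightings of a workflow (0.66) rank well above two sightings of a procedural pattern (0.5).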
- isLearningObservation: accept decision, pitfall, deprecated status, quality_ok field
- formatLearningStatus: test all-4-type counts, decision/pitfall promoted entries
Co-Authored-By: Claude <noreply@anthropic.com>
- CLAUDE.md Self-Learning: describes 4 observation types, channel-based filter, per-type thresholds, render-ready dispatch, reconcile-manifest feedback loop, quality_ok gate, new CLI flags (--review, --purge-legacy-knowledge), manifest file
- CLAUDE.md Skills: remove knowledge-persistence from Write list (now read-only)
- CLAUDE.md memory dir: add .learning-manifest.json entry; note render-ready writes decisions/pitfalls
- README.md: expand self-learning feature description to mention all 4 types
- README.md: annotate learn --enable with 4-type extraction note
Co-Authored-By: Claude <noreply@anthropic.com>
- Full rewrite of self-learning.md: 4 observation types, channel-based filtering (USER_SIGNALS vs DIALOG_PAIRS), per-type thresholds table, status machine, render-ready 4-target dispatch, reconcile-manifest feedback loop, HUD row format, key design decisions (D8, D9, D13, D15, D16)
- Added --review and --purge-legacy-knowledge to CLI reference
- working-memory.md: add Self-Learning sibling system cross-reference paragraph
Co-Authored-By: Claude <noreply@anthropic.com>
P0 fixes:
- render-ready now sets softCapExceeded (was pendingCapacity) so HUD, --review, and SKILL spec all agree on the attention flag name.
- updateKnowledgeStatus uses the canonical .memory/.knowledge.lock path instead of .memory/knowledge/.knowledge.lock so CLI updates actually serialize against json-helper.cjs render-ready / knowledge-append.
- updateKnowledgeStatus and --purge-legacy-knowledge now write atomically via a .tmp + rename helper mirrored from json-helper.cjs writeFileAtomic.
- json-helper.cjs knowledge-append acquires the shared knowledge lock (previously no lock at all despite the SKILL.md contract).
- --review acquires .learning.lock around the interactive loop and persists the log after every deprecate/keep so a Ctrl-C never leaves the log out of sync with knowledge-file Status updates.
- --purge-legacy-knowledge acquires .knowledge.lock around the purge loop and writes atomically.
- Pitfall render template now emits `- **Status**: Active` so `devflow learn --review` deprecate can flip PF Status the same way it flips ADR Status. knowledge-append kept in sync. SKILL.md spec updated to document the new field + semantics.
P1 fixes:
- Added JSDoc `DESIGN: D4` annotation at the quality_ok promotion check in process-observations; previously only D3 was called out in-line.
- Refactored shared CLI lock acquisition into acquireMkdirLock() so --review and --purge-legacy-knowledge share one implementation.
- review-command.test.ts updateKnowledgeStatus tests now mirror the production .memory/knowledge/ layout so the lock lands inside tmpDir rather than polluting the shared system temp root.
All 756 unit tests pass. Build is clean.
…cs, needsReview JSDoc
- Add D7 greenfield migration to background-learning: on first v2 run, any learning-log.jsonl lacking quality_ok fields is renamed to .v1.jsonl.bak and replaced with an empty log. No dual-writer period needed.
- Add migration.test.ts (4 tests): v1 log moves to .bak, v2 log untouched, no-op when log absent, mixed entries treated as v2.
- Correct threshold table in docs/self-learning.md: workflow=3d/req=3, procedural=5d/req=4, decision=no spread/req=2, pitfall=no spread/req=2.
- Fix needsReview JSDoc in learn.ts: set by merge-observation on Levenshtein ratio < 0.6, not by reconcile-manifest (which sets status=deprecated).
…e-legacy-knowledge Delete stale .memory/PROJECT-PATTERNS.md from devflow and alefy projects — nothing generates or reads this file anymore. Extend --purge-legacy-knowledge in learn.ts to also remove PROJECT-PATTERNS.md if present, so future orphans are cleaned atomically with the knowledge entry purge.
….md with propagation test Add citation instruction bounded by HTML markers to knowledge-persistence SKILL.md (canonical), coder.md, and reviewer.md — ensures agents cite ADR/PF IDs in summaries so usage can be tracked for capacity reviews. Update capacity limit from 50 to 100 (hard ceiling per D17). Add propagation test to skill-references.test.ts that verifies byte-identical sentence across all three files.
…er.cjs Add KNOWLEDGE_SOFT_START/HARD_CEILING/THRESHOLDS constants (D17), countActiveHeadings (D18, skips Deprecated/Superseded), readUsageFile/writeUsageFile, readNotifications/ writeNotifications, crossedThresholds (D22), registerUsageEntry (D20), and acquireKnowledgeUsageLock/releaseKnowledgeUsageLock helpers. Guard main CLI execution with require.main === module check so the file can be required as a module in tests. Export all helpers for unit testing via module.exports guard. Add 17 unit tests in capacity-thresholds.test.ts covering all new helpers.
…ons (D17-D22, D26)
- Remove local CAPACITY=50 const; use KNOWLEDGE_HARD_CEILING (100) from module scope
- D18: count only active (non-deprecated/superseded) headings for capacity check
- D17: hard ceiling blocks append at 100 active entries; fires error-level notification
- D20: register each new entry in .knowledge-usage.json with zero cite count
- D21: first-run seed: if no .notifications.json exists and count >= KNOWLEDGE_SOFT_START, treat effective previous count as 0 so all relevant thresholds fire immediately
- D22: per-append threshold crossing detection; fires notification at highest crossed
- D24: severity escalates dim → warning (≥70) → error (≥90)
- D26: TL;DR comment reflects active-only count; Key list includes only active IDs
- D27: per-file notification keys (knowledge-capacity-decisions / knowledge-capacity-pitfalls)
- D28: dismissed notifications re-fire when a higher threshold is crossed
- Applies same capacity+notification logic to knowledge-append case (latent bug fix)
- 7 new integration tests in capacity-thresholds.test.ts; updated render-decision.test.ts
Co-Authored-By: Claude <noreply@anthropic.com>
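The crossing and escalation rules (D21, D22, D24) might look roughly like the sketch below. The threshold steps and the soft-start value of 50 are assumptions chosen for illustration; the PR names the constants but this conversation does not list their values. Only the ≥70 and ≥90 severity cut-offs and the ceiling of 100 come from the commit message.

```typescript
type Severity = "dim" | "warning" | "error";

const KNOWLEDGE_SOFT_START = 50;                         // assumed value
const KNOWLEDGE_THRESHOLDS = [50, 60, 70, 80, 90, 100];  // hypothetical steps up to the hard ceiling

// D24: dim below 70, warning from 70, error from 90.
function severityFor(count: number): Severity {
  if (count >= 90) return "error";
  if (count >= 70) return "warning";
  return "dim";
}

// D22: thresholds that lie in the half-open interval (previous, current].
function crossedThresholds(previous: number, current: number): number[] {
  return KNOWLEDGE_THRESHOLDS.filter((t) => previous < t && current >= t);
}

// D21: with no notifications file yet, pretend the previous count was 0
// so that every threshold already passed fires once.
function effectivePrevious(previous: number, current: number, hasNotificationsFile: boolean): number {
  return !hasNotificationsFile && current >= KNOWLEDGE_SOFT_START ? 0 : previous;
}

// Fires a single notification at the highest crossed threshold, or none.
function notificationFor(previous: number, current: number): { threshold: number; severity: Severity } | null {
  const crossed = crossedThresholds(previous, current);
  if (crossed.length === 0) return null;
  const threshold = Math.max(...crossed);
  return { threshold, severity: severityFor(threshold) };
}
```

Notifying only at the highest crossed threshold keeps a bulk append from producing a burst of notifications for the same file.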
… (D19, D29) Scans assistant messages for ADR-NNN/PF-NNN citations after queue append, incrementing cites + updating last_cited in .knowledge-usage.json under lock. Unregistered IDs are silently ignored. Scanner runs as supplementary pass — memory capture remains mission-critical and is never blocked by scan failures. Co-Authored-By: Claude <noreply@anthropic.com>
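The scan step could be approximated as below. This is a sketch under stated assumptions: the `UsageEntry` shape is inferred from the `cites` and `last_cited` fields named in the commit message, IDs are assumed to carry at least three digits, and every occurrence is counted (the source does not say whether repeats within one message are deduplicated). Locking and file I/O are left out.

```typescript
interface UsageEntry {
  cites: number;
  last_cited: string | null;
}

type UsageMap = Record<string, UsageEntry>;

// Assumed ID format: ADR-NNN or PF-NNN with three or more digits.
const CITATION = /\b(?:ADR|PF)-\d{3,}\b/g;

// Returns a new usage map with citation counts bumped for registered IDs only.
function recordCitations(usage: UsageMap, assistantText: string, now: string): UsageMap {
  const next: UsageMap = { ...usage };
  const matches = assistantText.match(CITATION) ?? [];
  for (const id of matches) {
    const entry = next[id];
    if (entry === undefined) continue; // unregistered IDs are silently ignored
    next[id] = { cites: entry.cites + 1, last_cited: now };
  }
  return next;
}
```

Returning a fresh map rather than mutating the input makes it easy for the caller to discard the result if the surrounding lock or write fails, which fits the "never block memory capture" requirement.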
…24, D27) Reads .notifications.json and surfaces the worst active+undismissed notification in a new HUD component row. Severity-scaled colors: dim (50-69), yellow (70-89), red (90-100). Picks highest severity across all per-file entries (D27). Dismissed notifications are re-shown when a new threshold is crossed. Co-Authored-By: Claude <noreply@anthropic.com>
…E-23) Apply path.resolve() normalization to eliminate traversal sequences in the --cwd argument before constructing any filesystem paths. All legitimate callers already pass absolute paths; this guards against malformed inputs. Co-Authored-By: Claude <noreply@anthropic.com>
…capacity (D23, D28)
- Add count-active operation to json-helper.cjs (D23: TS→CJS bridge for countActiveHeadings without duplicating logic)
- Add --dismiss-capacity flag: sets dismissed_at_threshold on active capacity notifications so HUD silences them until next threshold crossing
- Extend --review with mode picker (observations vs capacity); lazy-loads observations only for the selected mode
- Capacity mode: parses knowledge entries, filters 7-day-protected entries, sorts by least-used (cites ASC, last_cited ASC NULLS FIRST, created ASC), shows top-20 via p.multiselect, batch-deprecates selected, clears notifications if count drops below soft start (D28)
- Update softCapExceeded JSDoc to reflect D17 (hard ceiling at 100)
- Add tests: count-active op, notification dismissal persistence
Co-Authored-By: Claude <noreply@anthropic.com>
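The least-used ordering for capacity mode can be written as a plain comparator. The sketch assumes ISO 8601 UTC timestamps (so string comparison matches chronological order) and interprets "7-day-protected" as "created within the last 7 days"; both are assumptions, as are the type and function names.

```typescript
interface KnowledgeEntry {
  id: string;
  cites: number;
  last_cited: string | null; // ISO 8601, null if never cited
  created: string;           // ISO 8601
}

// cites ASC, last_cited ASC NULLS FIRST, created ASC
function compareLeastUsed(a: KnowledgeEntry, b: KnowledgeEntry): number {
  if (a.cites !== b.cites) return a.cites - b.cites;
  if (a.last_cited !== b.last_cited) {
    if (a.last_cited === null) return -1;
    if (b.last_cited === null) return 1;
    return a.last_cited < b.last_cited ? -1 : 1;
  }
  if (a.created === b.created) return 0;
  return a.created < b.created ? -1 : 1;
}

const PROTECT_DAYS = 7;
const DAY_MS = 24 * 60 * 60 * 1000;

// Drops recently created entries, then returns the least-used candidates.
function reviewCandidates(entries: KnowledgeEntry[], now: Date, limit = 20): KnowledgeEntry[] {
  const cutoff = now.getTime() - PROTECT_DAYS * DAY_MS;
  return entries
    .filter((e) => Date.parse(e.created) <= cutoff)
    .sort(compareLeastUsed)
    .slice(0, limit);
}
```

Putting never-cited entries first means the review list leads with knowledge that has had time to be used and was not.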
- Fix P0 bug in countActiveHeadings: status lookup bled across entry boundaries (slice to end-of-file instead of current section), causing entries without a Status field to inherit a later entry's Deprecated status. Same fix applied to buildUpdatedTldr for TL;DR active-IDs.
- Replace Record<string, any> with typed NotificationFileEntry interface in learn.ts (eliminates 2 `any` type violations per CLAUDE.md rules).
- Add regression test for cross-entry status bleed edge case.
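The cross-entry bleed is easiest to see in code. The sketch below shows the corrected shape of the lookup: each entry's Status is searched only within its own section. The heading format (`## ADR-NNN` / `## PF-NNN`) is an assumption for illustration; the `- **Status**:` line matches the render template mentioned earlier in this PR.

```typescript
const HEADING = /^## (?:ADR|PF)-\d+/gm;          // assumed heading format
const STATUS_LINE = /^- \*\*Status\*\*: *(\w+)/m; // non-global, so exec() carries no state

function countActiveHeadings(markdown: string): number {
  // Collect the start offset of every entry heading.
  const starts: number[] = [];
  const re = new RegExp(HEADING.source, "gm");
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) starts.push(m.index);

  let active = 0;
  for (let i = 0; i < starts.length; i++) {
    // Slice to the next heading, not to end-of-file: this is the fix.
    const end = i + 1 < starts.length ? starts[i + 1] : markdown.length;
    const section = markdown.slice(starts[i], end);
    const found = STATUS_LINE.exec(section);
    const value = found ? found[1].toLowerCase() : "active"; // no Status field counts as active
    if (value !== "deprecated" && value !== "superseded") active++;
  }
  return active;
}
```

With the buggy end-of-file slice, an entry without a Status line would pick up the first Status that follows it anywhere in the file, so a single deprecated entry could hide every unlabelled entry above it.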
Add .notifications.json, .knowledge-usage.json, and .learning-manifest.json to transient files removed by --reset. Also clean up stale .knowledge-usage.lock directory (mkdir-based lock needs rmdir, not unlink).
Introduce src/cli/utils/migrations.ts with a typed Migration registry (D31) and runMigrations runner. State persists at ~/.devflow/migrations.json (D30, scope-independent). Migrations run at most once per machine (global scope) or per-project sweep (per-project scope, D35 parallel). Extracted purgeLegacyKnowledgeEntries into src/cli/utils/legacy-knowledge-purge.ts as a pure no-UI helper (D34), and migrateShadowOverridesRegistry into src/cli/utils/shadow-overrides-migration.ts. Wired runMigrations into devflow init after memory-dir creation (D32, always-run-unapplied). Failures are non-fatal and do not mark applied (D33). Shadow-overrides global migration retrofits the prior inline call (D36). Removed --purge-legacy-knowledge from devflow learn (now automated via migration purge-legacy-knowledge-v2). Closes task-2026-04-12_auto-migrate-legacy-knowledge
- Remove --purge-legacy-knowledge reference from docs/self-learning.md (flag was removed in f99588e but doc still advertised it).
- Tighten MIGRATIONS and registryOverride types to readonly for consistency with FLAG_REGISTRY pattern in flags.ts.
- Add D37 edge-case lock-in test: per-project migration with empty discoveredProjects is marked applied via vacuous truth.
CRITICAL — Install ordering regression Shadow-override migration is running AFTER
User impact: Customized shadows on V1→V2 upgraders are silently lost on first init. Fix: Move the |
HIGH: Busy-wait CPU spin in lock acquisition (95% confidence)
The mkdir-lock retry loop uses a synchronous CPU spin instead of sleeping:
while (Date.now() < end) { /* empty loop burns CPU */ }
Under contention with
Fix: Replace with
const SLEEP_BUF = new Int32Array(new SharedArrayBuffer(4));
function syncSleep(ms) { Atomics.wait(SLEEP_BUF, 0, 0, ms); }
// Replace lines 64-66
syncSleep(10); // Yield instead of burning CPU
Or convert to async and use
HIGH: Shadow migration warnings silently dropped (92% confidence)
The registry wrapper (lines 55-58) awaits
The original inline code in init.ts produced:
These warnings are now silently dropped.
Fix: Either extend
run: async (ctx) => {
  const { migrated, warnings } = await migrateShadowOverridesRegistry(ctx.devflowDir);
  // Surface via console or return from run()
}
Or add an optional
HIGH: Unsafe JSON.parse assignment bypasses type (92% confidence)
If the file is malformed but parses (array
Fix: Add a type guard matching the pattern in
function isNotificationMap(v: unknown): v is Record<string, NotificationFileEntry> {
  return typeof v === 'object' && v !== null && !Array.isArray(v)
    && Object.values(v).every(e =>
      typeof e === 'object' && e !== null
      && (e.active === undefined || typeof e.active === 'boolean')
      // ... validate other fields
    );
}
let notifications: Record<string, NotificationFileEntry> = {};
try {
  const parsed: unknown = JSON.parse(await fs.readFile(notifPath, 'utf-8'));
  if (isNotificationMap(parsed)) notifications = parsed;
} catch { /* no file */ }
HIGH — Type assertion bypasses runtime validation (88% confidence) severity: (worst.entry.severity as NotificationData['severity']) ?? 'dim' The source is typed as string | undefined, but coerced to 'dim' | 'warning' | 'error' without runtime check. Fix: Use a type guard before assignment. |
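One possible shape for the suggested guard is sketched below. The union members come from the finding; the guard body and the `toSeverity` helper are illustrative assumptions, not code from the PR.

```typescript
type Severity = "dim" | "warning" | "error";

const SEVERITIES: readonly Severity[] = ["dim", "warning", "error"];

// Runtime check that narrows an arbitrary value to the Severity union.
function isSeverity(v: unknown): v is Severity {
  return typeof v === "string" && SEVERITIES.some((s) => s === v);
}

// Hypothetical helper: falls back to 'dim' for anything unrecognised.
function toSeverity(v: unknown): Severity {
  return isSeverity(v) ? v : "dim";
}
```

Unlike `(value as Severity) ?? 'dim'`, which only substitutes the default for null or undefined, this also maps unexpected strings such as "critical" to the default.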
HIGH: Per-project parallel sweep unbounded (82% confidence)
Promise.allSettled(discoveredProjects.map(...)) has no concurrency limit. With 50-200 projects, this risks EMFILE (too many open files) on macOS (ulimit -n = 256).
Fix: Use parallelMap helper with 8-16 worker limit to bound concurrent filesystem access.

CRITICAL: Teams-variant commands still invoke knowledge-persistence (95% confidence)
Base .md commands removed their "Record Pitfalls/Decisions" phases (D8 refactor), but -teams.md variants were NOT updated. Teams users still get instructions to use a skill that no longer has Write capability.
Affected: code-review-teams.md, resolve-teams.md, debug-teams.md, implement-teams.md
Fix: Apply same D8 removals to all four teams variants. Users installing with --teams lose knowledge capture, which is a behavioral regression.

HIGH: Test isolation violation (95% confidence)
E2E test writes to real ~/.claude/projects/ and ~/.devflow/logs/. If test aborts, artifacts leak into live user directory.
Fix: Override HOME in beforeEach/afterEach to contain test artifacts in tmp directory (pattern already in tests/migrations.test.ts:136-148).
SUMMARY: PR Comments Created
Inline Comments
Created 8 inline comments for CRITICAL/HIGH blocking issues (≥80% confidence):
Deduplicated Findings
Lower-Confidence Findings (60-79%)
Reserved for summary comment (not posted as inline). Includes:
Rate Limiting
Next Steps: Review findings, apply fixes, and return for follow-up verification.
…y-knowledge-purge Use O_EXCL (flag: 'wx') when writing the .tmp file so the kernel rejects the open if the path already exists — including an attacker-placed symlink. On EEXIST, unlink the stale/adversarial .tmp and retry once. Adds a regression test that places a symlink at the .tmp location and verifies the sentinel target is not overwritten after the purge completes. Co-Authored-By: Claude <noreply@anthropic.com>
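The exclusive-create pattern described here can be sketched with Node's synchronous fs API. The function name is hypothetical and the demo assumes a POSIX system where creating symlinks is permitted; the flag, the EEXIST handling, and the single retry follow the commit message.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Writes data to `${target}.tmp` with O_EXCL, then renames over the target.
function writeFileAtomicExclusive(target: string, data: string): void {
  const tmp = `${target}.tmp`;
  try {
    // 'wx' fails with EEXIST if anything (file or symlink) already sits at tmp.
    fs.writeFileSync(tmp, data, { flag: "wx" });
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
    // Stale or adversarial tmp: unlink removes the link itself, never its target.
    try {
      fs.unlinkSync(tmp);
    } catch {
      /* race: someone else already removed it */
    }
    fs.writeFileSync(tmp, data, { flag: "wx" }); // retry once, still exclusive
  }
  fs.renameSync(tmp, target);
}

// Demo: a symlink planted at the tmp path must not redirect the write.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "atomic-demo-"));
const target = path.join(dir, "decisions.md");
const sentinel = path.join(dir, "sentinel.txt");
fs.writeFileSync(sentinel, "untouched");
fs.symlinkSync(sentinel, `${target}.tmp`);
writeFileAtomicExclusive(target, "# Decisions\n");
```

Because the retry also uses `wx`, a second attacker-placed path between the unlink and the retry makes the write fail loudly instead of following the link.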
…vation fields
- Replace bare type assertion with isSeverity() guard on notification severity
- Validate JSON.parse output shape before use in getActiveNotification
- Extend isRawObservation to validate optional boolean flags (mayBeStale, needsReview, softCapExceeded) when present
- Add never exhaustiveness check to ObservationType switch to catch future union extensions at compile time
Co-Authored-By: Claude <noreply@anthropic.com>
…jection
- Add isNotificationMap() type guard: validates .notifications.json is a plain object map before narrowing; malformed input falls back to empty map with a warn rather than corrupting state (fixes unsafe parse at both --review and --dismiss-capacity)
- Add isCountActiveResult() type guard: validates count-active result has a numeric count field before narrowing; malformed output falls back to 0
- Add structural guard on .knowledge-usage.json parse: explicit typeof/Array.isArray checks before accessing .entries, matching the intent of the existing version check
- Replace execSync() shell interpolation with execFileSync() argv array: eliminates shell metacharacter injection through cwd-derived jsonHelperPath and filePath
- Harden writeFileAtomic() with flag:'wx' + EEXIST retry: detects stale .tmp files from prior crashes and unlinks before retrying, preventing silent TOCTOU overwrite
Co-Authored-By: Claude <noreply@anthropic.com>
…onciler
- CLAUDE.md: split "workflow/procedural: 3 required" into per-type values (workflow: 3, procedural: 4) to match THRESHOLDS in json-helper.cjs
- docs/self-learning.md: correct promotion predicate from "observations >= required" to "confidence >= promote", include the confidence-clamping formula and effective observation counts at which each type promotes
- docs/self-learning.md: correct reconciler "unchanged" case from "observation reinforced" to "counted in telemetry only (no confidence change)"; the code only increments a counter, no confidence write occurs
Co-Authored-By: Claude <noreply@anthropic.com>
…s (D8)
Base commands had Record Pitfalls/Decisions phases removed in the D8 refactor (knowledge-persistence skill is read-only), but their paired -teams.md variants were not updated, leaving silent no-ops for teams users.
- Remove "Record Pitfalls" phase from code-review-teams.md (was Phase 6) and renumber cleanup phase 7→6; update Architecture diagram
- Remove "Record Pitfalls" phase from resolve-teams.md (was Phase 6) and renumber simplify/debt/report phases 7→6, 8→7, 9→8; update diagram
- Remove "Record Pitfall" phase from debug-teams.md (was Phase 9) and update Architecture diagram
- Remove "Record Decisions" block from implement-teams.md Phase 10; update Architecture diagram
- Fix stale "Phase 9" reference in resolve.md and resolve-teams.md Output Artifact sections (correct phase is now 8)
Add D8 HTML comments at each removal site explaining the rationale.
Co-Authored-By: Claude <noreply@anthropic.com>
Four security and reliability issues found in the hook scripts:
- background-learning: pass stale_ref as process.argv[1] instead of interpolating into the node -e JS string. Eliminates shell/JS injection if the grep regex is ever relaxed and fixes apostrophes in path names corrupting staleReason strings.
- knowledge-usage-scan: fix path traversal guard that was a no-op: path.resolve() unconditionally returns absolute, so the isAbsolute check after resolve never fired. Now rejects relative rawCwd before resolving (CWE-23 hardening is now effective).
- knowledge-usage-scan: replace busy-wait CPU spin in acquireLock with Atomics.wait, eliminating 100% CPU usage during the 2-second lock timeout window on every Stop hook invocation.
- json-helper.cjs + knowledge-usage-scan: add wx (O_EXCL) flag to all atomic .tmp writes, matching the pattern already established in legacy-knowledge-purge.ts. Prevents TOCTOU symlink attacks where an attacker places a symlink at the .tmp path between stat and open.
Co-Authored-By: Claude <noreply@anthropic.com>
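The difference between the no-op guard and the effective one comes down to the order of two calls. The sketch below uses hypothetical function names and assumes POSIX paths; it only illustrates the ordering described in the second item.

```typescript
import * as path from "node:path";

// No-op version: path.resolve() always returns an absolute path,
// so checking isAbsolute afterwards can never reject anything.
function brokenGuard(rawCwd: string): boolean {
  return path.isAbsolute(path.resolve(rawCwd));
}

// Effective version: reject relative input first, then normalise.
function resolveCwd(rawCwd: string): string {
  if (!path.isAbsolute(rawCwd)) {
    throw new Error(`cwd must be an absolute path, got: ${rawCwd}`);
  }
  return path.resolve(rawCwd); // collapses any remaining ../ segments
}
```

A caller in a hook script would typically catch the error and exit with a non-zero status rather than let it propagate.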
…8 refactor knowledge-persistence is now a format-spec-only skill (D9) — the background extractor is the sole writer. Remove it from plugin.json skills arrays in devflow-plan, devflow-debug, and devflow-ambient; remove from skimmer.md frontmatter; update skills-architecture.md to reflect format-spec-only role; fix devflow-implement README skill count; add FORMAT_SPEC_SKILLS exclusion in build.test.ts; remove stale ambient knowledge-persistence assertion from plugins.test.ts. All 845 tests pass. Co-Authored-By: Claude <noreply@anthropic.com>
Addresses 9 issues found in the r1-init-migrations review batch:
- #1: Move runMigrations block before installViaFileCopy so V1→V2 shadow renames complete before the installer looks for V2-named directories
- #2: Extend Migration.run to return MigrationRunResult { infos, warnings }; both registry entries now surface migrated counts and conflict warnings to init.ts, which logs them via p.log.info / p.log.warn after the migration loop
- #3 (ISP): Split MigrationContext into GlobalMigrationContext | PerProjectMigrationContext discriminated union; drop unused claudeDir field; empty-string sentinels removed
- #4: Cap per-project Promise.allSettled concurrency at 16 via pooled() helper to avoid EMFILE on machines with 50-200 projects
- #5: Accumulate newlyApplied in memory and write state once at end of runMigrations: eliminates O(N²) writeAppliedMigrations calls per run
- #6: Use { flag: 'wx' } exclusive-create on .tmp file with unlink+retry on EEXIST to prevent TOCTOU symlink writes
- #7: Add exhaustiveness assertion (never) on migration.scope dispatch so future union extensions cause a runtime throw instead of silent no-op
- #8 (D37): Document vacuous-truth edge case in runMigrations comment block where discoveredProjects=[] marks per-project migration applied without sweeping any project
- #9: Convert applied array to Set<string> before the migration loop for O(1) .has() lookups instead of O(N) .includes() per migration
Co-Authored-By: Claude <noreply@anthropic.com>
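A bounded-concurrency helper in the spirit of item #4 might look like the sketch below. The signature and the `Settled` result type are assumptions (the PR names a pooled() helper but its shape is not shown here); the lane-based approach is one common way to cap in-flight work while keeping allSettled-style results.

```typescript
type Settled<R> =
  | { status: "fulfilled"; value: R }
  | { status: "rejected"; reason: unknown };

// Runs worker over items with at most `limit` calls in flight.
// Results keep input order and never reject, like Promise.allSettled.
async function pooled<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<Settled<R>[]> {
  const results: Settled<R>[] = new Array(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { status: "fulfilled", value: await worker(items[index]) };
      } catch (reason) {
        results[index] = { status: "rejected", reason };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

With `limit` set to 16, a sweep over 200 projects holds at most 16 sets of file handles open at once, and one failing project does not stop the others.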
Issue 1 (E2E HOME isolation): Override HOME via vi.stubEnv/vi.unstubAllEnvs in beforeEach/afterEach of the E2E learning test so session JSONL files are planted under a tmpdir fake home rather than the developer's real ~/.claude/projects/. Remove dead DEVFLOW_E2E_TEST env var (never read in codebase).
Issue 2 (staleness reimplementation): Extract the staleness detection algorithm from background-learning's inline shell loop into scripts/hooks/lib/staleness.cjs, a proper CJS module that is both callable by the shell script and importable by tests. Update background-learning to delegate to it via node. Update staleness.test.ts to import the real implementation instead of a TypeScript reimplementation.
Issue 3 (runMigrations seam): Add a runMigrations integration seam test suite in tests/init-logic.test.ts that exercises runMigrations with probe migrations injected via registryOverride, verifying correct devflowDir context, per-project root distribution, and idempotency.
Co-Authored-By: Claude <noreply@anthropic.com>
- notifications.ts: drop redundant data intermediate after isNotificationMap narrows
- init.ts: rename lambda params shadowing `p` clack namespace import
- learn.ts: dot access over bracket notation in isCountActiveResult guard
PR Comments: Code Review Findings
Overview
This is an incremental review of 10 commits (0dd9e24...HEAD) addressing prior review findings on security hardening, architecture refinement, and knowledge-system consistency. All three prior CRITICAL regressions are resolved. However, four new HIGH-severity findings block merge, and several MEDIUM findings should be addressed.
HIGH-Severity Findings (Blocking)
1. Race-Tolerance Divergence in Atomic-Write Pattern (Confidence: 95%)
Files:
Problem: The wx-flag atomic-write pattern spreads to 5 sites, but race-tolerance diverges between TypeScript and JavaScript implementations:
This is a correctness bug: the documented intent is "stale tmp - unlink and retry once"; the TS sites violate the documented contract by not handling the expected race condition.
Fix: Extract a shared
try { await fs.unlink(tmp); } catch { /* race - already removed */ }
2.
| Category | CRITICAL | HIGH | MEDIUM | LOW |
|---|---|---|---|---|
| Blocking | 0 | 6 | - | - |
| Should Fix | - | 0 | 8 | - |
| Pre-existing | - | - | - | - |
Additional MEDIUM Findings (Should Fix)
- Architecture: Contract drift between `ctx.devflowDir` parameter and internal state file location (migrations.ts:237-243)
- Architecture: `writeFileAtomic` duplicated 4×; extract shared helper (migrations.ts, learn.ts, legacy-knowledge-purge.ts, json-helper.cjs)
- Architecture: `runMigrations` uses direct `os.homedir()`, hurting testability (migrations.ts:243)
- Complexity: `runMigrations` touches 5 thresholds at once; extract scope branches into helpers (migrations.ts:236-347)
- Complexity: `init.ts` migration-runner block adds 4 sequential output loops; extract reporter (init.ts:768-794)
- Consistency: `staleness.cjs` writes learning-log non-atomically, breaking atomic-write contract elsewhere (staleness.cjs:92)
- Testing: `writeExclusive` helper in json-helper.cjs has no dedicated test
- Testing: New runtime type guards have no direct test coverage (isSeverity, isNotificationMap, isCountActiveResult, isRawObservation)
- TypeScript: `const parsed = JSON.parse(raw)` at learn.ts:1127 relies on implicit `any` property access
Recommendation
CHANGES_REQUESTED — Merge blocked until all six HIGH findings are addressed. The fixes are narrowly scoped (extract helpers, remove casts, normalize tags, rewrite tests). Once addressed, the PR delivers substantial security hardening, architectural refinement, and testing improvements.
Incremental Review Context
This is an incremental review of PR #181 resolving prior findings. The commit range covers 10 commits addressing:
- PF-007 (migrations-after-installer) — resolved via install ordering fix ✓
- PF-008 (teams-variant drift) — resolved via synchronized phase removal ✓
- PF-009 (busy-wait in per-turn hooks) — resolved via Atomics.wait ✓
- PF-010 (unvalidated JSON.parse) — partially resolved via new runtime guards (remaining casts need fixing)
Full test suite: 848/848 passing. CLI build: clean.
Rename acquireLock → acquireMkdirLock in json-helper.cjs to match the name used in learn.ts and legacy-knowledge-purge.ts. Update all five call sites within the same file. Document why the bash acquire_lock in background-learning uses different timeout values (90 s / 300 s) than the Node helpers (30 s / 60 s): the bash lock guards the entire Sonnet analysis pipeline including a 180 s watchdog, not just file I/O. The deviation is intentional; the new comments make that explicit rather than leaving it as silent drift. Update knowledge-persistence/SKILL.md to distinguish the three lock paths (.knowledge.lock, .learning.lock, .knowledge-usage.lock) and document their separate timeout contracts. Co-Authored-By: Claude <noreply@anthropic.com>
Extract runMigrationsWithFallback from init.ts and rewrite the three D32/D35 seam tests to test the init-level D37 fallback rule directly via an injected runner spy, not runMigrations internals. Adds four targeted tests covering the non-empty, gitRoot-fallback, empty, and devflowDir-passthrough cases. Add three security tests to knowledge-usage-scan.test.ts covering relative-cwd rejection (CWE-23, exit code 2), symlink TOCTOU hardening (wx + EEXIST unlink-retry does not follow symlink to sentinel file), and Atomics.wait lock serialisation (both concurrent invocations complete; final count is 2 with no data loss). Co-Authored-By: Claude <noreply@anthropic.com>
…ision and Migration casts
- Create src/cli/utils/fs-atomic.ts: canonical TS writeFileAtomicExclusive with race-tolerant unlink (try/catch before retry), matching CJS json-helper.cjs and knowledge-usage-scan.cjs. All 3 TS atomic-write call sites (learn.ts, legacy-knowledge-purge.ts, migrations.ts) now import from this single source.
- Create src/cli/utils/notifications-shape.ts: consolidated NotificationEntry interface and isNotificationMap guard using the STRONGER definition (validates both top-level map and each entry value). Imported by learn.ts and notifications.ts, eliminating the two incompatible local definitions.
- Rebase legacy-knowledge-purge.ts D35 → D39 (D35 was colliding with the per-project concurrency cap decision in migrations.ts).
- Extract runGlobalMigration and runPerProjectMigration helpers from the 112-line runMigrations body. Each helper encapsulates one scope's dispatch logic including error handling; runMigrations now orchestrates loading/saving state + dispatches.
- Remove direct as Migration<'global'>/'per-project' casts from the original inline dispatch (HIGH #4). Casts are re-introduced only at the new helper-call boundary with explicit comments explaining the generic-narrowing constraint.
- MED #7 (init.ts warn/info level): confirmed already correct; no change needed.
Co-Authored-By: Claude <noreply@anthropic.com>
…rd and TOCTOU tests
MED #4: Extract reportMigrationResult() to migrations.ts (co-located with RunMigrationsResult and MigrationLogger types). Removes the 17-line, 5-branch reporting block from runMigrationsWithFallback in init.ts; init.ts now delegates to the extracted helper. MigrationLogger and reportMigrationResult are exported from migrations.ts; init.ts re-exports MigrationLogger for backward compat.
MED #6+#8: Split isRawObservation into two phases. Introduce VALID_OBSERVATION_TYPES constant as the single source of truth for valid type values; it drives both the guard (phase 1: required fields + includes check) and the exhaustiveness check in the switch (the const type union constrains ObservationType). Extract isOptBool() helper for the three optional flag checks (phase 2), eliminating the mixed-concern boolean chain.
MED #10: Add adversarial-input tests for all four type guards.
- isRawObservation: 7 cases via getLearningCounts (invalid type, missing required fields, null/array JSON, non-boolean optional flag, valid minimal entry)
- isNotificationMap: 11 cases testing null/undefined/array/number/string/primitive-entry/null-entry/array-entry/empty-map/valid-entry/multi-entry
- isSeverity: 2 behavioral cases via getActiveNotification (unknown + null severity fall back to 'dim')
- reportMigrationResult: 8 cases covering all branches (empty, failure with/without project, infos, warnings, newlyApplied, verbose on/off)
MED #9: Add TOCTOU test for json-helper.cjs writeExclusive (via exported writeFileAtomic). 4 cases: basic write, overwrite, symlink pre-placed at .tmp path (sentinel unchanged), stale .tmp from prior crash. Mirrors pattern from legacy-knowledge-purge.test.ts:218-244.
Tests: 884 total (852 prior + 32 new). Build: clean.
Co-Authored-By: Claude <noreply@anthropic.com>
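The two-phase guard from MED #6+#8 can be sketched roughly as follows. The four type names and `quality_ok` come from this PR; the required fields `id` and `pattern`, the single optional flag shown (the commit mentions three), and the `artifactKind` function are illustrative assumptions, not the real RawObservation shape.

```typescript
// Single source of truth: the tuple drives both the runtime check and the type union.
const VALID_OBSERVATION_TYPES = ["workflow", "procedural", "decision", "pitfall"] as const;
type ObservationType = (typeof VALID_OBSERVATION_TYPES)[number];

// ASSUMED shape: only `type` and `quality_ok` are named in the PR.
interface RawObservation {
  id: string;
  type: ObservationType;
  pattern: string;
  quality_ok?: boolean;
}

// Optional flag: absent is fine, but if present it must be a real boolean.
function isOptBool(value: unknown): boolean {
  return value === undefined || typeof value === "boolean";
}

function isRawObservation(value: unknown): value is RawObservation {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
  const o = value as Record<string, unknown>;
  // Phase 1: required fields plus membership in VALID_OBSERVATION_TYPES.
  if (typeof o.id !== "string" || typeof o.pattern !== "string") return false;
  if (typeof o.type !== "string") return false;
  if (!(VALID_OBSERVATION_TYPES as readonly string[]).includes(o.type)) return false;
  // Phase 2: optional flags, each checked by the same helper.
  return isOptBool(o.quality_ok);
}

// Exhaustiveness: adding a type to the tuple makes this switch fail to compile
// until a matching case is added.
function artifactKind(type: ObservationType): string {
  switch (type) {
    case "workflow": return "command";
    case "procedural": return "skill";
    case "decision": return "ADR";
    case "pitfall": return "PF";
    default: {
      const unreachable: never = type;
      return unreachable;
    }
  }
}
```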
…lper
- learn.ts / legacy-knowledge-purge.ts: the "all lock holders interpret
staleness consistently" claim was false — bash uses 300 s (guards the
full Sonnet pipeline), Node uses 60 s. Update both comments to state
the actual per-holder values and the reason for the deviation.
- learn.ts: collapse the verbose JSDoc over `export type { NotificationFileEntry }`
into a one-line comment; the D-SEC1 runtime-guard description belongs in
notifications-shape.ts (where the guard actually lives), not at the
re-export site.
- knowledge-usage-scan.test.ts: remove the dead `run` wrapper function and
its `void run` suppression. The function wrapped spawnSync (synchronous)
in a Promise that resolved immediately and was never called — the two
direct spawnSync calls at resultA/resultB were doing the real work.
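To make the per-holder staleness point above concrete, here is a sketch of a mkdir-based lock with a Node-side staleness threshold and an Atomics.wait sleep. Only the 60 s and 300 s values and the Atomics.wait technique come from this PR; the function signatures, polling interval, timeout default, and use of directory mtime for age are assumptions.

```typescript
import * as fs from "node:fs";

// Node-side threshold quoted above; the bash holders use 300 s for the full pipeline.
const NODE_STALE_MS = 60000;

// Synchronous sleep without busy-waiting (works on Node's main thread).
function sleepSync(ms: number): void {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// ASSUMED signature; returns false if the lock could not be taken before the timeout.
function acquireMkdirLock(lockDir: string, timeoutMs = 5000, staleMs = NODE_STALE_MS): boolean {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      fs.mkdirSync(lockDir); // atomic: exactly one caller succeeds
      return true;
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
    }
    try {
      // Treat a lock older than this holder's threshold as abandoned.
      const ageMs = Date.now() - fs.statSync(lockDir).mtimeMs;
      if (ageMs > staleMs) fs.rmdirSync(lockDir);
    } catch {
      // Lock vanished or could not be removed; fall through and retry.
    }
    if (Date.now() >= deadline) return false;
    sleepSync(25); // ASSUMED polling interval
  }
}

function releaseMkdirLock(lockDir: string): void {
  try {
    fs.rmdirSync(lockDir);
  } catch {
    // Already released or reclaimed as stale by another holder.
  }
}
```

Because each holder applies its own threshold, a 60 s Node holder can reclaim a lock that a 300 s bash holder still considers live, which is why the comments now state the actual values.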
Both stop-update-memory and prompt-capture-memory exit at line 11
(DEVFLOW_BG_UPDATER=1 → exit 0) before reading stdin. Piping input
via execSync({ input }) races against this immediate exit — bash
closes the pipe before Node finishes flushing, producing
'spawnSync /bin/sh EPIPE' on Node 20.
Stop using { input, stdio: ['pipe', 'pipe', 'pipe'] } and use
stdio: 'ignore' instead. The hook never reads input on this code
path, so the test's intent is preserved and the race is eliminated.
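A small sketch of the fix described above, using a stand-in shell one-liner rather than the real hooks. The inline script and the `runEarlyExitHook` helper are hypothetical; the sketch uses `spawnSync` directly so the exit status is easy to inspect, whereas the tests in this PR are described as using `execSync`.

```typescript
import { spawnSync } from "node:child_process";

// Stand-in for a hook that exits before reading stdin when DEVFLOW_BG_UPDATER=1.
// The inline script is illustrative only, not the real hook.
const EARLY_EXIT_SCRIPT = '[ "$DEVFLOW_BG_UPDATER" = "1" ] && exit 0; cat >/dev/null; exit 3';

function runEarlyExitHook(): number | null {
  const result = spawnSync("sh", ["-c", EARLY_EXIT_SCRIPT], {
    env: { ...process.env, DEVFLOW_BG_UPDATER: "1" },
    // No stdin pipe is created, so Node has nothing to flush into a child
    // that may already have exited; this removes the EPIPE race.
    stdio: "ignore",
  });
  return result.status;
}
```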
Summary
- Project knowledge (`decisions.md`, `pitfalls.md`) is now written by the background learning hook via `render-ready`, with per-type thresholds and feedback reconciliation
- `devflow learn --review` / `--purge-legacy-knowledge` for triage; `knowledge-persistence` skill is now a format spec only
Closes #177
What changed
Core infrastructure
- `scripts/hooks/lib/transcript-filter.cjs` (new): two-channel filter (USER_SIGNALS + DIALOG_PAIRS)
- `scripts/hooks/json-helper.cjs`: per-type `THRESHOLDS`, ops `render-ready` / `reconcile-manifest` / `merge-observation` / `knowledge-append`, shared `acquireMkdirLock`
- `scripts/hooks/background-learning`: unified 4-type extractor prompt + D7 v1→v2 migration
- `scripts/hooks/session-start-memory`: reconciler call on each session start (deletion/edit feedback)
Commands / plugins
- `/implement`, `/code-review`, `/debug`, `/resolve`
- `shared/skills/knowledge-persistence/SKILL.md` rewritten as format spec only (no `Write` tool)
- Removed `knowledge-persistence` from `devflow-implement`, `devflow-code-review`, `devflow-resolve` plugin manifests (kept in debug/plan/ambient as a reader)
HUD + CLI
- `src/cli/hud/learning-counts.ts` + component: promoted-knowledge row with attention flag
- `devflow learn --review`: interactive triage with lock + atomic writes
- `devflow learn --purge-legacy-knowledge`: removes pre-v2 command-phase-written entries
Docs + tests
- `CLAUDE.md`, `README.md`, `docs/self-learning.md`, `docs/working-memory.md` updated for 4-type system
- `tests/integration/learning/`
- `tests/shell-hooks.test.ts` + `tests/learn.test.ts` updated for per-type thresholds and `quality_ok` gate
All D1–D16 design decisions are documented as JSDoc at implementation sites.
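As a quick illustration of the per-type thresholds and the `quality_ok` gate that the updated tests exercise, here is a rough sketch. The threshold values are the ones listed in this PR; the linear confidence formula, the reading of "spread" as days between first and last sighting, and the `Observation` fields and `isReadyToPromote` name are assumptions and may differ from json-helper.cjs.

```typescript
type ObservationType = "workflow" | "procedural" | "decision" | "pitfall";

// Values as listed in this PR: required observation count and minimum spread in days.
const THRESHOLDS: Record<ObservationType, { required: number; spreadDays: number }> = {
  workflow: { required: 3, spreadDays: 3 },
  procedural: { required: 4, spreadDays: 5 },
  decision: { required: 2, spreadDays: 0 },
  pitfall: { required: 2, spreadDays: 0 },
};

// ASSUMED formula: linear ramp towards the per-type required count, capped at 1.
function calculateConfidence(count: number, type: ObservationType): number {
  return Math.min(1, count / THRESHOLDS[type].required);
}

// Hypothetical observation shape for the sketch.
interface Observation {
  type: ObservationType;
  count: number;
  firstSeenMs: number;
  lastSeenMs: number;
  quality_ok?: boolean;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Promotion gate: quality_ok must be exactly true, and both the count and the
// spread requirement for the observation's type must be met.
function isReadyToPromote(obs: Observation): boolean {
  const { required, spreadDays } = THRESHOLDS[obs.type];
  const spread = (obs.lastSeenMs - obs.firstSeenMs) / MS_PER_DAY;
  return obs.quality_ok === true && obs.count >= required && spread >= spreadDays;
}
```

With zero spread required, a decision or pitfall seen twice in the same session can be promoted, while workflows and procedures need repetition across several days.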
Test plan
- `npm test`: 760/760 passing (35 test files, 2.50s)
- `npm run build`: clean (tsc + plugin distribution + HUD)
- `--review` happy path, `--purge-legacy-knowledge`, D7 migration, command files clean of writers, plugin manifests updated, SKILL is format spec)
- `devflow learn --purge-legacy-knowledge` on projects with pre-v2 entries
Known notes
- `tests/integration/learning/end-to-end.test.ts` runs via `npm run test:integration` (separate vitest config); not in default suite
- `tests/integration/ambient-activation.test.ts` (main-branch issue, design-review router drift) is orthogonal to this PR