Skip to content

feat(learning): v2 unified self-learning + project knowledge#181

Merged
dean0x merged 42 commits intomainfrom
feat/177-revisit-project-knowledge-system---analy
Apr 13, 2026
Merged

feat(learning): v2 unified self-learning + project knowledge#181
dean0x merged 42 commits intomainfrom
feat/177-revisit-project-knowledge-system---analy

Conversation

@dean0x
Copy link
Copy Markdown
Owner

@dean0x dean0x commented Apr 11, 2026

Summary

  • Unified 4-type self-learning extractor (workflow, procedural, decision, pitfall) replaces the three retrospective knowledge writers previously embedded in command phases
  • Project knowledge (decisions.md, pitfalls.md) is now written by the background learning hook via render-ready, with per-type thresholds and feedback reconciliation
  • New HUD row + devflow learn --review / --purge-legacy-knowledge for triage; knowledge-persistence skill is now a format spec only

Closes #177

What changed

Core infrastructure

  • scripts/hooks/lib/transcript-filter.cjs (new) — two-channel filter (USER_SIGNALS + DIALOG_PAIRS)
  • scripts/hooks/json-helper.cjs — per-type THRESHOLDS, ops render-ready / reconcile-manifest / merge-observation / knowledge-append, shared acquireMkdirLock
  • scripts/hooks/background-learning — unified 4-type extractor prompt + D7 v1→v2 migration
  • scripts/hooks/session-start-memory — reconciler call on each session start (deletion/edit feedback)

Commands / plugins

  • Removed retrospective knowledge writer phases from /implement, /code-review, /debug, /resolve
  • shared/skills/knowledge-persistence/SKILL.md rewritten as format spec only (no Write tool)
  • Dropped knowledge-persistence from devflow-implement, devflow-code-review, devflow-resolve plugin manifests (kept in debug/plan/ambient as a reader)

HUD + CLI

  • src/cli/hud/learning-counts.ts + component — promoted-knowledge row with attention flag
  • devflow learn --review — interactive triage with lock + atomic writes
  • devflow learn --purge-legacy-knowledge — removes pre-v2 command-phase-written entries

Docs + tests

  • CLAUDE.md, README.md, docs/self-learning.md, docs/working-memory.md updated for 4-type system
  • 80+ new unit tests (filter, thresholds, 4 renderers, reconcile, merge-observation, staleness, HUD counts, review command, migration)
  • 3 new integration tests under tests/integration/learning/
  • Pre-existing tests/shell-hooks.test.ts + tests/learn.test.ts updated for per-type thresholds and quality_ok gate

All D1–D16 design decisions are documented as JSDoc at implementation sites.

Test plan

  • npm test — 760/760 passing (35 test files, 2.50s)
  • npm run build — clean (tsc + plugin distribution + HUD)
  • Tester scenarios 12/12 pass (reconciler deletion/edit, 4 render targets, pitfall dedup, HUD populated/empty, --review happy path, --purge-legacy-knowledge, D7 migration, command files clean of writers, plugin manifests updated, SKILL is format spec)
  • Scrutinizer (b1134ca) — 8 P0 + 3 P1 issues resolved
  • Evaluator (b2efaf5) — ALIGNED after D7 migration + threshold docs + needsReview JSDoc fixes
  • Post-merge: run devflow learn --purge-legacy-knowledge on projects with pre-v2 entries

Known notes

  • tests/integration/learning/end-to-end.test.ts runs via npm run test:integration (separate vitest config) — not in default suite
  • Pre-existing flaky tests/integration/ambient-activation.test.ts (main-branch issue, design-review router drift) is orthogonal to this PR

Dean Sharon and others added 26 commits April 11, 2026 15:50
…ng + project knowledge

Implements the four-component detection pipeline:

1. Channel-based transcript filter (D1, D2):
   - scripts/hooks/lib/transcript-filter.cjs — testable CJS module
   - Rejects isMeta, sourceToolUseID/toolUseResult, framework XML wrappers,
     tool_result arrays, empty turns (<5 chars)
   - Produces USER_SIGNALS (workflow/procedural) and DIALOG_PAIRS (decision/pitfall)

2. Per-type thresholds + 4-type detection (D3, D4):
   - THRESHOLDS constant: workflow(req=3,spread=3d), procedural(req=4,spread=5d),
     decision(req=2,spread=0), pitfall(req=2,spread=0)
   - calculateConfidence(count, type) — per-type required count
   - process-observations extended: all 4 types valid, quality_ok stored/honoured,
     per-type promotion gate requires quality_ok===true
   - filter-observations updated to include all 4 types

3. Deterministic renderers (D5):
   - render-ready <log> <baseDir> — dispatches by type:
     workflow → .claude/commands/self-learning/<slug>.md
     procedural → .claude/skills/self-learning:<slug>/SKILL.md
     decision → .memory/knowledge/decisions.md#ADR-NNN (with lock, capacity=50)
     pitfall → .memory/knowledge/pitfalls.md#PF-NNN (with lock, dedup, capacity=50)
   - All writes atomic (tmp+rename), manifest schemaVersion=1 maintained

4. Reconciler + feedback loop (D6, D13):
   - reconcile-manifest <cwd> — session-start op, detects deletions (confidence×0.3,
     status=deprecated) and edits (hash update only, no penalty per D13)
   - Anchor-based checks for knowledge file entries (ADR/PF missing = deletion)
   - Stale manifest entries (no matching obs) silently dropped
   - session-start-memory: reconciler call added before TL;DR injection

5. Additional ops:
   - merge-observation <log> <obs>: in-place dedup/reinforce (D14), FIFO cap 10 (D12),
     ID collision recovery _b suffix (D11), Levenshtein mismatch flagging
   - knowledge-append: standalone knowledge file writer
   - acquireLock()/releaseLock(): shared lock helper extracted from shell patterns

6. New background-learning pipeline:
   - extract_batch_messages: uses transcript-filter.cjs, produces USER_SIGNALS + DIALOG_PAIRS
   - build_sonnet_prompt: new 4-type prompt, no artifact sections (D10, D5)
   - render_ready_observations: calls render-ready after process_observations
   - check_staleness: grep-based code-ref staleness pass (D16)

Unit tests: 9 test files, 80 new tests (100% passing)
Known regressions: 2 tests in shell-hooks.test.ts encode old behaviour
(confidence formula, quality_ok promotion gate) — Coder 3 will update
Removed the Extraction Procedure and Loading sections. The SKILL.md
now serves only as a format reference for on-disk knowledge files.
Writing is performed exclusively by scripts/hooks/background-learning
via json-helper.cjs render-ready. Added D9 comment explaining the change.
Frontmatter updated to remove Write from allowed-tools (now read-only).
…ment/code-review/debug/resolve

Phase 10 (implement), Phase 5 (code-review), Phase 6 (debug/resolve) previously
recorded decisions/pitfalls by invoking knowledge-persistence SKILL. Removed in v2
because agent-summaries produced low-signal entries. Added D8 comment at top of each
command file. Knowledge is now extracted from user transcripts by background-learning.
Phase numbers renumbered in resolve.md to fill the gap.
…anifests

Removed from devflow-implement, devflow-code-review, devflow-resolve plugin.json
skills arrays and corresponding plugins.ts entries. debug and plan plugins retain
knowledge-persistence as they still read knowledge for context (Phase 1). ambient
and core-skills also retain it.
…tries + review attention

Added learningCounts HUD component (15th component):
- getLearningCounts() reads .memory/learning-log.jsonl, counts status=created entries
  by type (workflow/procedural/decision/pitfall) and attention flags (mayBeStale,
  needsReview, softCapExceeded)
- Shows "Learning: 3 workflows, 2 skills, 8 decisions, 12 pitfalls"
- Appends "⚠ N need review" when attention flags are set
- Graceful fallback: returns null when log missing or unparseable
- D15 comment: soft cap + attention counter, not auto-pruning
…mands

--review: interactively reviews flagged observations (mayBeStale/needsReview/
softCapExceeded). User can deprecate (updates status + knowledge file Status field),
keep (clears flags), or skip. Writes log atomically after review session.
updateKnowledgeStatus() acquires mkdir lock before updating decisions/pitfalls Status.

--purge-legacy-knowledge: one-time removal of low-signal v1 entries (ADR-002,
PF-001, PF-003, PF-005) from knowledge files with confirmation prompt.

Also updated LearningObservation type to include v2 fields:
- type now accepts 'decision' | 'pitfall'
- status now accepts 'deprecated'
- Added mayBeStale, staleReason, needsReview, softCapExceeded, quality_ok fields
isLearningObservation() type guard updated accordingly.
formatLearningStatus() updated to show all 4 types + needReview count.
…tests

hud-counts.test.ts (9 tests):
- Counts created entries by type correctly
- Counts needReview from attention flags regardless of status
- Graceful null on missing log, parse error, empty file
- Skips malformed lines and processes valid ones
- Multiple flags on one entry count as 1 needReview

review-command.test.ts (15 tests):
- Validates v2 type support (decision, pitfall, deprecated status)
- Validates attention flag detection and log mutation on deprecate/keep
- Tests updateKnowledgeStatus against decisions.md and pitfalls.md
- Tests graceful behavior when file/anchor missing

end-to-end.test.ts (3 integration tests):
- Full pipeline: 3 sessions → claude shim → all 4 observation types in log
- reconcile-manifest marks deleted artifact observation as deprecated
- Graceful exit with no batch IDs file
…ity_ok gate

- confidence assertion: 0.40 → 0.66 (workflow req=3, count=2: floor(2*100/3)/100=0.66)
- temporal spread promotion test: add quality_ok:true to fixture (new gate requires it)

Co-Authored-By: Claude <noreply@anthropic.com>
- isLearningObservation: accept decision, pitfall, deprecated status, quality_ok field
- formatLearningStatus: test all-4-type counts, decision/pitfall promoted entries

Co-Authored-By: Claude <noreply@anthropic.com>
- CLAUDE.md Self-Learning: describes 4 observation types, channel-based filter,
  per-type thresholds, render-ready dispatch, reconcile-manifest feedback loop,
  quality_ok gate, new CLI flags (--review, --purge-legacy-knowledge), manifest file
- CLAUDE.md Skills: remove knowledge-persistence from Write list (now read-only)
- CLAUDE.md memory dir: add .learning-manifest.json entry; note render-ready writes decisions/pitfalls
- README.md: expand self-learning feature description to mention all 4 types
- README.md: annotate learn --enable with 4-type extraction note

Co-Authored-By: Claude <noreply@anthropic.com>
- Full rewrite of self-learning.md: 4 observation types, channel-based filtering
  (USER_SIGNALS vs DIALOG_PAIRS), per-type thresholds table, status machine,
  render-ready 4-target dispatch, reconcile-manifest feedback loop, HUD row format,
  key design decisions (D8, D9, D13, D15, D16)
- Added --review and --purge-legacy-knowledge to CLI reference
- working-memory.md: add Self-Learning sibling system cross-reference paragraph

Co-Authored-By: Claude <noreply@anthropic.com>
P0 fixes:
- render-ready now sets softCapExceeded (was pendingCapacity) so HUD,
  --review, and SKILL spec all agree on the attention flag name.
- updateKnowledgeStatus uses the canonical .memory/.knowledge.lock path
  instead of .memory/knowledge/.knowledge.lock so CLI updates actually
  serialize against json-helper.cjs render-ready / knowledge-append.
- updateKnowledgeStatus and --purge-legacy-knowledge now write atomically
  via a .tmp + rename helper mirrored from json-helper.cjs writeFileAtomic.
- json-helper.cjs knowledge-append acquires the shared knowledge lock
  (previously no lock at all despite the SKILL.md contract).
- --review acquires .learning.lock around the interactive loop and
  persists the log after every deprecate/keep so a Ctrl-C never leaves
  the log out of sync with knowledge-file Status updates.
- --purge-legacy-knowledge acquires .knowledge.lock around the purge
  loop and writes atomically.
- Pitfall render template now emits `- **Status**: Active` so
  `devflow learn --review` deprecate can flip PF Status the same way
  it flips ADR Status. knowledge-append kept in sync. SKILL.md spec
  updated to document the new field + semantics.

P1 fixes:
- Added JSDoc `DESIGN: D4` annotation at the quality_ok promotion check
  in process-observations; previously only D3 was called out in-line.
- Refactored shared CLI lock acquisition into acquireMkdirLock() so
  --review and --purge-legacy-knowledge share one implementation.
- review-command.test.ts updateKnowledgeStatus tests now mirror the
  production .memory/knowledge/ layout so the lock lands inside tmpDir
  rather than polluting the shared system temp root.

All 756 unit tests pass. Build is clean.
…cs, needsReview JSDoc

- Add D7 greenfield migration to background-learning: on first v2 run, any
  learning-log.jsonl lacking quality_ok fields is renamed to .v1.jsonl.bak
  and replaced with an empty log. No dual-writer period needed.
- Add migration.test.ts (4 tests): v1 log moves to .bak, v2 log untouched,
  no-op when log absent, mixed entries treated as v2.
- Correct threshold table in docs/self-learning.md: workflow=3d/req=3,
  procedural=5d/req=4, decision=no spread/req=2, pitfall=no spread/req=2.
- Fix needsReview JSDoc in learn.ts: set by merge-observation on Levenshtein
  ratio < 0.6, not by reconcile-manifest (which sets status=deprecated).
…e-legacy-knowledge

Delete stale .memory/PROJECT-PATTERNS.md from devflow and alefy projects — nothing
generates or reads this file anymore. Extend --purge-legacy-knowledge in learn.ts
to also remove PROJECT-PATTERNS.md if present, so future orphans are cleaned atomically
with the knowledge entry purge.
….md with propagation test

Add citation instruction bounded by HTML markers to knowledge-persistence SKILL.md
(canonical), coder.md, and reviewer.md — ensures agents cite ADR/PF IDs in summaries
so usage can be tracked for capacity reviews. Update capacity limit from 50 to 100
(hard ceiling per D17). Add propagation test to skill-references.test.ts that
verifies byte-identical sentence across all three files.
…er.cjs

Add KNOWLEDGE_SOFT_START/HARD_CEILING/THRESHOLDS constants (D17), countActiveHeadings
(D18, skips Deprecated/Superseded), readUsageFile/writeUsageFile, readNotifications/
writeNotifications, crossedThresholds (D22), registerUsageEntry (D20), and
acquireKnowledgeUsageLock/releaseKnowledgeUsageLock helpers. Guard main CLI execution
with require.main === module check so the file can be required as a module in tests.
Export all helpers for unit testing via module.exports guard. Add 17 unit tests in
capacity-thresholds.test.ts covering all new helpers.
…ons (D17-D22, D26)

- Remove local CAPACITY=50 const; use KNOWLEDGE_HARD_CEILING (100) from module scope
- D18: count only active (non-deprecated/superseded) headings for capacity check
- D17: hard ceiling blocks append at 100 active entries; fires error-level notification
- D20: register each new entry in .knowledge-usage.json with zero cite count
- D21: first-run seed — if no .notifications.json exists and count >= KNOWLEDGE_SOFT_START,
  treat effective previous count as 0 so all relevant thresholds fire immediately
- D22: per-append threshold crossing detection; fires notification at highest crossed
- D24: severity escalates dim→warning (≥70) →error (≥90)
- D26: TL;DR comment reflects active-only count; Key list includes only active IDs
- D27: per-file notification keys (knowledge-capacity-decisions / knowledge-capacity-pitfalls)
- D28: dismissed notifications re-fire when a higher threshold is crossed
- Applies same capacity+notification logic to knowledge-append case (latent bug fix)
- 7 new integration tests in capacity-thresholds.test.ts; updated render-decision.test.ts

Co-Authored-By: Claude <noreply@anthropic.com>
… (D19, D29)

Scans assistant messages for ADR-NNN/PF-NNN citations after queue append,
incrementing cites + updating last_cited in .knowledge-usage.json under lock.
Unregistered IDs are silently ignored. Scanner runs as supplementary pass —
memory capture remains mission-critical and is never blocked by scan failures.

Co-Authored-By: Claude <noreply@anthropic.com>
…24, D27)

Reads .notifications.json and surfaces the worst active+undismissed notification
in a new HUD component row. Severity-scaled colors: dim (50-69), yellow (70-89),
red (90-100). Picks highest severity across all per-file entries (D27). Dismissed
notifications are re-shown when a new threshold is crossed.

Co-Authored-By: Claude <noreply@anthropic.com>
…E-23)

Apply path.resolve() normalization to eliminate traversal sequences in the
--cwd argument before constructing any filesystem paths. All legitimate
callers already pass absolute paths; this guards against malformed inputs.

Co-Authored-By: Claude <noreply@anthropic.com>
…capacity (D23, D28)

- Add count-active operation to json-helper.cjs (D23: TS→CJS bridge for
  countActiveHeadings without duplicating logic)
- Add --dismiss-capacity flag: sets dismissed_at_threshold on active
  capacity notifications so HUD silences them until next threshold crossing
- Extend --review with mode picker (observations vs capacity) — lazy-loads
  observations only for the selected mode
- Capacity mode: parses knowledge entries, filters 7-day-protected entries,
  sorts by least-used (cites ASC, last_cited ASC NULLS FIRST, created ASC),
  shows top-20 via p.multiselect, batch-deprecates selected, clears
  notifications if count drops below soft start (D28)
- Update softCapExceeded JSDoc to reflect D17 (hard ceiling at 100)
- Add tests: count-active op, notification dismissal persistence

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix P0 bug in countActiveHeadings: status lookup bled across entry
  boundaries (slice to end-of-file instead of current section), causing
  entries without a Status field to inherit a later entry's Deprecated
  status. Same fix applied to buildUpdatedTldr for TL;DR active-IDs.
- Replace Record<string, any> with typed NotificationFileEntry interface
  in learn.ts (eliminates 2 `any` type violations per CLAUDE.md rules).
- Add regression test for cross-entry status bleed edge case.
Add .notifications.json, .knowledge-usage.json, and .learning-manifest.json
to transient files removed by --reset. Also clean up stale .knowledge-usage.lock
directory (mkdir-based lock needs rmdir, not unlink).
Introduce src/cli/utils/migrations.ts with a typed Migration registry
(D31) and runMigrations runner. State persists at ~/.devflow/migrations.json
(D30, scope-independent). Migrations run at most once per machine (global
scope) or per-project sweep (per-project scope, D35 parallel).

Extracted purgeLegacyKnowledgeEntries into src/cli/utils/legacy-knowledge-purge.ts
as a pure no-UI helper (D34), and migrateShadowOverridesRegistry into
src/cli/utils/shadow-overrides-migration.ts.

Wired runMigrations into devflow init after memory-dir creation (D32,
always-run-unapplied). Failures are non-fatal and do not mark applied (D33).
Shadow-overrides global migration retrofits the prior inline call (D36).

Removed --purge-legacy-knowledge from devflow learn (now automated via
migration purge-legacy-knowledge-v2).

Closes task-2026-04-12_auto-migrate-legacy-knowledge
- Remove --purge-legacy-knowledge reference from docs/self-learning.md
  (flag was removed in f99588e but doc still advertised it).
- Tighten MIGRATIONS and registryOverride types to readonly for
  consistency with FLAG_REGISTRY pattern in flags.ts.
- Add D37 edge-case lock-in test: per-project migration with empty
  discoveredProjects is marked applied via vacuous truth.
@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented Apr 12, 2026

CRITICAL — Install ordering regression ⚠️ 92% confidence

Shadow-override migration is running AFTER installViaFileCopy, not before. This breaks V1→V2 upgraders:

  1. installViaFileCopy looks for shadows at V2 names (~/.devflow/skills/software-design/) → not found
  2. Installs stock skill to ~/.claude/skills/devflow:software-design/
  3. runMigrations renames V1 shadow (core-patterns) → V2 name
  4. Migration marked applied; shadow now at correct path; but stock content already in ~/.claude/skills/
  5. Next init re-reads shadow at correct location but doesn't re-run installer

User impact: Customized shadows on V1→V2 upgraders are silently lost on first init.

Fix: Move the runMigrations block (current lines 888-912) above installViaFileCopy (line 762). Per-project migrations depend on discoveredProjects which is computed earlier, so no wiring changes needed.

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented Apr 12, 2026

HIGH — Busy-wait CPU spin in lock acquisition 95% confidence

The mkdir-lock retry loop uses a synchronous CPU spin instead of sleeping:

while (Date.now() < end) { /* empty loop burns CPU */ }

Under contention with json-helper.cjs render-ready holding the lock, this pegs one core at 100% for up to 2 seconds. This runs on every Stop hook (every assistant turn), so impact is per-turn.

Fix: Replace with Atomics.wait:

const SLEEP_BUF = new Int32Array(new SharedArrayBuffer(4));
function syncSleep(ms) { Atomics.wait(SLEEP_BUF, 0, 0, ms); }

// Replace lines 64-66
syncSleep(10);  // Yield instead of burning CPU

Or convert to async and use await new Promise(r => setTimeout(r, 10)).

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented Apr 12, 2026

HIGH — Shadow migration warnings silently dropped 92% confidence

The registry wrapper (lines 55-58) awaits migrateShadowOverridesRegistry without destructuring the returned { migrated, warnings }. User-facing warnings about shadow name conflicts are lost.

The original inline code in init.ts produced:

  • Migrated N shadow override(s) to V2 names
  • Shadow 'X' found alongside 'Y' — keeping 'Y', old shadow at ... (per conflict)

These warnings are now silently dropped.

Fix: Either extend Migration.run to optionally return metadata:

run: async (ctx) => {
  const { migrated, warnings } = await migrateShadowOverridesRegistry(ctx.devflowDir);
  // Surface via console or return from run()
}

Or add an optional onSuccess hook to Migration entries so the runner can invoke it with the result.

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented Apr 12, 2026

HIGH — Unsafe JSON.parse assignment bypasses type 92% confidence

notifications = JSON.parse(...) assigns directly to a variable typed as Record<string, NotificationFileEntry> with no runtime validation. JSON.parse returns any, so this coerces arbitrary JSON into the declared type.

If the file is malformed but parses (array [], primitive 42, null, or object with wrong structure), subsequent Object.entries(notifications) / writeback will corrupt the file further via line 1243's writeFileAtomic(... JSON.stringify(notifications)).

Fix: Add a type guard matching the pattern in applyConfigLayer (lines 275-288):

function isNotificationMap(v: unknown): v is Record<string, NotificationFileEntry> {
  return typeof v === 'object' && v !== null && !Array.isArray(v)
    && Object.values(v).every(e =>
      typeof e === 'object' && e !== null
      && (e.active === undefined || typeof e.active === 'boolean')
      // ... validate other fields
    );
}

let notifications: Record<string, NotificationFileEntry> = {};
try {
  const parsed: unknown = JSON.parse(await fs.readFile(notifPath, 'utf-8'));
  if (isNotificationMap(parsed)) notifications = parsed;
} catch { /* no file */ }

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented Apr 12, 2026

HIGH — Type assertion bypasses runtime validation (88% confidence)

severity: (worst.entry.severity as NotificationData['severity']) ?? 'dim'

The source is typed as string | undefined, but coerced to 'dim' | 'warning' | 'error' without runtime check.

Fix: Use a type guard before assignment.

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented Apr 12, 2026

HIGH — Per-project parallel sweep unbounded (82% confidence)

Promise.allSettled(discoveredProjects.map(...)) has no concurrency limit. With 50-200 projects, this risks EMFILE (too many open files) on macOS (ulimit -n = 256).

Fix: Use parallelMap helper with 8-16 worker limit to serialize filesystem access.

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented Apr 12, 2026

CRITICAL — Teams-variant commands still invoke knowledge-persistence (95% confidence)

Base .md commands removed their "Record Pitfalls/Decisions" phases (D8 refactor), but -teams.md variants were NOT updated. Teams users still get instructions to use a skill that no longer has Write capability.

Affected: code-review-teams.md, resolve-teams.md, debug-teams.md, implement-teams.md

Fix: Apply same D8 removals to all four teams variants. Users installing with --teams lose knowledge capture — behavioral regression.

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented Apr 12, 2026

HIGH — Test isolation violation (95% confidence)

E2E test writes to real ~/.claude/projects/ and ~/.devflow/logs/. If test aborts, artifacts leak into live user directory.

Fix: Override HOME in beforeEach/afterEach to contain test artifacts in tmp directory (pattern already in tests/migrations.test.ts:136-148).

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented Apr 12, 2026

SUMMARY: PR Comments Created

Inline Comments

Created 8 inline comments for CRITICAL/HIGH blocking issues (≥80% confidence):

  1. CRITICAL — Install ordering regression (92%) — Shadow migration runs after install, losing user customizations
  2. HIGH — Busy-wait CPU spin (95%) — Lock acquisition pegs CPU, runs on every Stop hook
  3. HIGH — Shadow migration warnings dropped (92%) — User-facing conflict warnings silently lost
  4. HIGH — Unsafe JSON.parse in notifications (92%) — Malformed files corrupt without error
  5. HIGH — Type assertion bypasses severity validation (88%)
  6. HIGH — Unbounded per-project concurrency (82%) — EMFILE risk with 50+ projects
  7. CRITICAL — Teams-variant regression (95%) — Commands instruct skill use that no longer works
  8. HIGH — E2E test isolation violation (95%) — Leaks to live user ~/.claude/projects/

Deduplicated Findings

  • Teams-variant inconsistency: Flagged in both Regression and Documentation reviews
  • MigrationContext empty-string sentinels: Flagged in both Consistency and Architecture reviews
  • knowledge-persistence skill role change: Mentioned in Documentation and multiple command files

Lower-Confidence Findings (60-79%)

Reserved for summary comment (not posted as inline). Includes:

  • Migration O(N²) writes during cold init
  • Legacy-knowledge-purge nesting complexity
  • HUD render hot path re-reading files
  • Various test coverage gaps

Rate Limiting

  • 1 second delay between API calls ✓
  • Remaining API quota checked (not enforced in this run)

Next Steps: Review findings, apply fixes, and return for follow-up verification.

Dean Sharon and others added 10 commits April 13, 2026 14:30
…y-knowledge-purge

Use O_EXCL (flag: 'wx') when writing the .tmp file so the kernel rejects
the open if the path already exists — including an attacker-placed symlink.
On EEXIST, unlink the stale/adversarial .tmp and retry once.

Adds a regression test that places a symlink at the .tmp location and
verifies the sentinel target is not overwritten after the purge completes.

Co-Authored-By: Claude <noreply@anthropic.com>
…vation fields

- Replace bare type assertion with isSeverity() guard on notification severity
- Validate JSON.parse output shape before use in getActiveNotification
- Extend isRawObservation to validate optional boolean flags (mayBeStale,
  needsReview, softCapExceeded) when present
- Add never exhaustiveness check to ObservationType switch to catch
  future union extensions at compile time

Co-Authored-By: Claude <noreply@anthropic.com>
…jection

- Add isNotificationMap() type guard — validates .notifications.json is a plain
  object map before narrowing; malformed input falls back to empty map with a warn
  rather than corrupting state (fixes unsafe parse at both --review and --dismiss-capacity)
- Add isCountActiveResult() type guard — validates count-active result has a
  numeric count field before narrowing; malformed output falls back to 0
- Add structural guard on .knowledge-usage.json parse — explicit typeof/Array.isArray
  checks before accessing .entries, matching the intent of the existing version check
- Replace execSync() shell interpolation with execFileSync() argv array — eliminates
  shell metacharacter injection through cwd-derived jsonHelperPath and filePath
- Harden writeFileAtomic() with flag:'wx' + EEXIST retry — detects stale .tmp files
  from prior crashes and unlinks before retrying, preventing silent TOCTOU overwrite

Co-Authored-By: Claude <noreply@anthropic.com>
…onciler

- CLAUDE.md: split "workflow/procedural: 3 required" into per-type values
  (workflow: 3, procedural: 4) to match THRESHOLDS in json-helper.cjs
- docs/self-learning.md: correct promotion predicate from "observations >= required"
  to "confidence >= promote", include the confidence-clamping formula and
  effective observation counts at which each type promotes
- docs/self-learning.md: correct reconciler "unchanged" case from "observation
  reinforced" to "counted in telemetry only (no confidence change)" — the code
  only increments a counter, no confidence write occurs

Co-Authored-By: Claude <noreply@anthropic.com>
…s (D8)

Base commands had Record Pitfalls/Decisions phases removed in the D8
refactor (knowledge-persistence skill is read-only), but their paired
-teams.md variants were not updated, leaving silent no-ops for teams
users.

- Remove "Record Pitfalls" phase from code-review-teams.md (was Phase 6)
  and renumber cleanup phase 7→6; update Architecture diagram
- Remove "Record Pitfalls" phase from resolve-teams.md (was Phase 6)
  and renumber simplify/debt/report phases 7→6, 8→7, 9→8; update diagram
- Remove "Record Pitfall" phase from debug-teams.md (was Phase 9)
  and update Architecture diagram
- Remove "Record Decisions" block from implement-teams.md Phase 10;
  update Architecture diagram
- Fix stale "Phase 9" reference in resolve.md and resolve-teams.md
  Output Artifact sections (correct phase is now 8)

Add D8 HTML comments at each removal site explaining the rationale.

Co-Authored-By: Claude <noreply@anthropic.com>
Four security and reliability issues found in the hook scripts:

- background-learning: pass stale_ref as process.argv[1] instead of
  interpolating into the node -e JS string. Eliminates shell/JS injection
  if the grep regex is ever relaxed and fixes apostrophes in path names
  corrupting staleReason strings.

- knowledge-usage-scan: fix path traversal guard that was a no-op —
  path.resolve() unconditionally returns absolute, so the isAbsolute
  check after resolve never fired. Now rejects relative rawCwd before
  resolving (CWE-23 hardening is now effective).

- knowledge-usage-scan: replace busy-wait CPU spin in acquireLock with
  Atomics.wait, eliminating 100% CPU usage during the 2-second lock
  timeout window on every Stop hook invocation.

- json-helper.cjs + knowledge-usage-scan: add wx (O_EXCL) flag to all
  atomic .tmp writes, matching the pattern already established in
  legacy-knowledge-purge.ts. Prevents TOCTOU symlink attacks where an
  attacker places a symlink at the .tmp path between stat and open.

Co-Authored-By: Claude <noreply@anthropic.com>
…8 refactor

knowledge-persistence is now a format-spec-only skill (D9) — the background
extractor is the sole writer. Remove it from plugin.json skills arrays in
devflow-plan, devflow-debug, and devflow-ambient; remove from skimmer.md
frontmatter; update skills-architecture.md to reflect format-spec-only role;
fix devflow-implement README skill count; add FORMAT_SPEC_SKILLS exclusion in
build.test.ts; remove stale ambient knowledge-persistence assertion from
plugins.test.ts. All 845 tests pass.

Co-Authored-By: Claude <noreply@anthropic.com>
Addresses 9 issues found in the r1-init-migrations review batch:

- #1: Move runMigrations block before installViaFileCopy so V1→V2 shadow
  renames complete before the installer looks for V2-named directories
- #2: Extend Migration.run to return MigrationRunResult { infos, warnings };
  both registry entries now surface migrated counts and conflict warnings to
  init.ts, which logs them via p.log.info / p.log.warn after the migration loop
- #3 (ISP): Split MigrationContext into GlobalMigrationContext | PerProjectMigrationContext
  discriminated union; drop unused claudeDir field; empty-string sentinels removed
- #4: Cap per-project Promise.allSettled concurrency at 16 via pooled() helper to
  avoid EMFILE on machines with 50-200 projects
- #5: Accumulate newlyApplied in memory and write state once at end of
  runMigrations — eliminates O(N²) writeAppliedMigrations calls per run
- #6: Use { flag: 'wx' } exclusive-create on .tmp file with unlink+retry on
  EEXIST to prevent TOCTOU symlink writes
- #7: Add exhaustiveness assertion (never) on migration.scope dispatch so future
  union extensions cause a runtime throw instead of silent no-op
- #8 (D37): Document vacuous-truth edge case in runMigrations comment block where
  discoveredProjects=[] marks per-project migration applied without sweeping any project
- #9: Convert applied array to Set<string> before the migration loop for O(1)
  .has() lookups instead of O(N) .includes() per migration

Co-Authored-By: Claude <noreply@anthropic.com>
Issue 1 (E2E HOME isolation): Override HOME via vi.stubEnv/vi.unstubAllEnvs
in beforeEach/afterEach of the E2E learning test so session JSONL files are
planted under a tmpdir fake home rather than the developer's real
~/.claude/projects/. Remove dead DEVFLOW_E2E_TEST env var (never read in
codebase).

Issue 2 (staleness reimplementation): Extract the staleness detection
algorithm from background-learning's inline shell loop into
scripts/hooks/lib/staleness.cjs — a proper CJS module that is both callable
by the shell script and importable by tests. Update background-learning to
delegate to it via node. Update staleness.test.ts to import the real
implementation instead of a TypeScript reimplementation.

Issue 3 (runMigrations seam): Add a runMigrations integration seam test
suite in tests/init-logic.test.ts that exercises runMigrations with probe
migrations injected via registryOverride, verifying correct devflowDir
context, per-project root distribution, and idempotency.

Co-Authored-By: Claude <noreply@anthropic.com>
- notifications.ts: drop redundant data intermediate after isNotificationMap narrows
- init.ts: rename lambda params shadowing `p` clack namespace import
- learn.ts: dot access over bracket notation in isCountActiveResult guard
@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented Apr 13, 2026

PR Comments: Code Review Findings

Overview

This is an incremental review of 10 commits (0dd9e24...HEAD) addressing prior review findings on security hardening, architecture refinement, and knowledge-system consistency. All three prior CRITICAL regressions are resolved. However, four new HIGH-severity findings block merge, and several MEDIUM findings should be addressed.

HIGH-Severity Findings (Blocking)

1. Race-Tolerance Divergence in Atomic-Write Pattern (Confidence: 95%)

Files: src/cli/commands/learn.ts:395-407, src/cli/utils/legacy-knowledge-purge.ts:50-61, src/cli/utils/migrations.ts:157-180, scripts/hooks/json-helper.cjs:137-146, scripts/hooks/knowledge-usage-scan.cjs:113-123

Problem: The wx-flag atomic-write pattern spreads to 5 sites, but race-tolerance diverges between TypeScript and JavaScript implementations:

  • TS sites (learn.ts, legacy-knowledge-purge.ts, migrations.ts): After EEXIST, call await fs.unlink(tmp) with no try/catch, so if another writer races ahead and unlinks the stale file, this operation throws ENOENT.
  • JS sites (json-helper.cjs, knowledge-usage-scan.cjs): Wrap the unlink in try { fs.unlinkSync(...) } catch { /* race */ }, tolerating the concurrent unlink.

This is a correctness bug: the documented intent is "stale tmp — unlink and retry once"; the TS sites violate the documented contract by not handling the expected race condition.

Fix: Extract a shared writeFileAtomicExclusive(filePath, content) helper in src/cli/utils/fs-atomic.ts that mirrors the JS race semantics, and import it from all three TS sites. The helper must wrap the unlink in try/catch:

try { await fs.unlink(tmp); } catch { /* race — already removed */ }

2. isNotificationMap Defined Twice with Divergent Strength (Confidence: 98%)

Files: src/cli/hud/notifications.ts:28-30 vs src/cli/commands/learn.ts:31-36

Problem: Both files define isNotificationMap(v): v is Record<string, NotificationEntry> for the same .memory/.notifications.json file, but enforce different invariants:

  • notifications.ts (weak): Accepts { foo: 42 } (primitive entry values) ✗
  • learn.ts (strong): Rejects non-object entries, walks Object.values(...).every(entry => typeof entry === 'object')

A file shape passing the HUD guard but failing learn.ts's guard will be treated inconsistently. This is the exact "said one thing in one place, another elsewhere" pattern the consistency review was designed to catch.

Fix: Extract both into a shared module src/cli/utils/notifications-shape.ts with the stronger guard and import from both sites.


3. Unnecessary as Migration<...> Casts Defeat Discriminated-Union Narrowing (Confidence: 92%)

File: src/cli/utils/migrations.ts:265, src/cli/utils/migrations.ts:307

Problem: Inside runMigrations, after narrowing on migration.scope === 'global' (line 256) and migration.scope === 'per-project' (line 280), the code casts to as Migration<'global'> and as Migration<'per-project'>. Empirically verified: TypeScript narrows correctly without the casts. Running npx tsc --noEmit passes cleanly. The casts teach future maintainers that the discriminated union is insufficient (it isn't) and silently hide bugs where mismatched contexts are passed through the cast.

Fix: Remove both casts and rely on the discriminant narrowing.


4. D-Tag Naming Inconsistent Across Security Commits (Confidence: 90%)

Files: src/cli/commands/learn.ts:27, 37, 125 (D-SEC1/2/3), src/cli/utils/legacy-knowledge-purge.ts:45, 48 (D35), src/cli/hud/notifications.ts:24-30 (no tag), src/cli/utils/migrations.ts:16, 74, 121, 223, 258, 282, 294 (D30–D38), scripts/hooks/json-helper.cjs:131-135 (no tag)

Problem: Per user feedback ("design-decisions-jsdoc"), every load-bearing decision must carry a D-series JSDoc comment. This PR mixes three schemes:

  1. D-SEC1/2/3 (learn.ts only)
  2. D30–D38 (migrations.ts, legacy-knowledge-purge.ts)
  3. No tag (notifications.ts, json-helper.cjs)

Additionally, D35 is used for two unrelated decisions (concurrency cap in migrations.ts:282 vs TOCTOU fix in legacy-knowledge-purge.ts:45), causing collision ambiguity.

Fix: Normalize to numeric scheme. Recommend: keep migrations.ts:282 as D35, renumber legacy-knowledge-purge.ts:45 to D39, and add D-tags to notifications.ts and json-helper.cjs guards using fresh unused numbers.


5. Lock Helpers Diverge: Three Names, Two Timeouts, One Contract (Confidence: 92%)

Files: src/cli/commands/learn.ts:357 (acquireMkdirLock), src/cli/utils/legacy-knowledge-purge.ts:68-90 (acquireMkdirLock), scripts/hooks/json-helper.cjs:429 (acquireLock), scripts/hooks/background-learning:70-80 (bash acquire_lock with timeout=90, STALE_THRESHOLD=300)

Problem: The documented contract (shared/skills/knowledge-persistence/SKILL.md:117-121) specifies "30 s timeout, 60 s stale". The TS code matches (both call acquireMkdirLock(30_000, 60_000)), but the bash hook uses very different values (timeout=90, STALE_THRESHOLD=300), and the function names drift (acquireMkdirLock vs acquireLock vs acquire_lock).

Fix:

  1. Rename scripts/hooks/json-helper.cjs::acquireLockacquireMkdirLock (unify naming)
  2. Update bash hook to use timeout=30, STALE_THRESHOLD=60, or document why it needs different values
  3. Cross-reference the skill's lock contract from all three implementations

6. runMigrations Integration Seam Tests Are Misleading Duplicates (Confidence: 93%)

File: tests/init-logic.test.ts:857-969

Problem: The added describe block claims to "test the integration between init and runMigrations," but the three tests import runMigrations directly and call it directly — they never invoke init.ts or any init-level function. All three tests are functional duplicates of existing coverage in tests/migrations.test.ts. The actual integration seam (init.ts:769-794 calling runMigrations with projectsForMigration fallback, running BEFORE installViaFileCopy per PF-007) is completely untested.

Fix: Either delete the duplicate tests and rely on migrations.test.ts for unit coverage, or rewrite them to actually test the integration: factor out the migration-execution block into an exported helper and test it end-to-end, including the ordering guarantee and the projectsForMigration fallback logic.


7. Security Hardening to knowledge-usage-scan.cjs Has No Test Coverage (Confidence: 90%)

File: scripts/hooks/knowledge-usage-scan.cjs (commit ab20b47)

Problem: Three security fixes in this file have zero test coverage:

  1. Reject relative --cwd input with process.exit(2) (CWE-23 path-traversal)
  2. Replace busy-spin with Atomics.wait (zero-CPU sleep)
  3. Write .tmp with wx flag + EEXIST retry (symlink TOCTOU)

A regression dropping the path.isAbsolute check would not fail any test — the scanner would silently accept ../../../etc/passwd as cwd.

Fix: Add three targeted tests mirroring tests/legacy-knowledge-purge.test.ts:218-244:

  • Reject relative cwd with exit code 2
  • Do not follow symlink at .knowledge-usage.json.tmp (TOCTOU)
  • Smoke-test that Atomics.wait does not peg CPU

Summary Table

Category CRITICAL HIGH MEDIUM LOW
Blocking 0 6 - -
Should Fix - 0 8 -
Pre-existing - - - -

Additional MEDIUM Findings (Should Fix)

  • Architecture: Contract drift between ctx.devflowDir parameter and internal state file location (migrations.ts:237-243)
  • Architecture: writeFileAtomic duplicated 4× — extract shared helper (migrations.ts, learn.ts, legacy-knowledge-purge.ts, json-helper.cjs)
  • Architecture: runMigrations uses direct os.homedir() hurting testability (migrations.ts:243)
  • Complexity: runMigrations touches 5 thresholds at once — extract scope branches into helpers (migrations.ts:236-347)
  • Complexity: init.ts migration-runner block adds 4 sequential output loops — extract reporter (init.ts:768-794)
  • Consistency: staleness.cjs writes learning-log non-atomically, breaking atomic-write contract elsewhere (staleness.cjs:92)
  • Testing: writeExclusive helper in json-helper.cjs has no dedicated test
  • Testing: New runtime type guards have no direct test coverage (isSeverity, isNotificationMap, isCountActiveResult, isRawObservation)
  • TypeScript: const parsed = JSON.parse(raw) at learn.ts:1127 relies on implicit any property access

Recommendation

CHANGES_REQUESTED — Merge blocked until all six HIGH findings are addressed. The fixes are narrowly scoped (extract helpers, remove casts, normalize tags, rewrite tests). Once addressed, the PR delivers substantial security hardening, architectural refinement, and testing improvements.


Incremental Review Context

This is an incremental review of PR #181 resolving prior findings. The commit range covers 10 commits addressing:

  • PF-007 (migrations-after-installer) — resolved via install ordering fix ✓
  • PF-008 (teams-variant drift) — resolved via synchronized phase removal ✓
  • PF-009 (busy-wait in per-turn hooks) — resolved via Atomics.wait ✓
  • PF-010 (unvalidated JSON.parse) — partially resolved via new runtime guards (remaining casts need fixing)

Full test suite: 848/848 passing. CLI build: clean.

Dean Sharon and others added 6 commits April 13, 2026 22:36
Rename acquireLock → acquireMkdirLock in json-helper.cjs to match the
name used in learn.ts and legacy-knowledge-purge.ts. Update all five
call sites within the same file.

Document why the bash acquire_lock in background-learning uses different
timeout values (90 s / 300 s) than the Node helpers (30 s / 60 s): the
bash lock guards the entire Sonnet analysis pipeline including a 180 s
watchdog, not just file I/O. The deviation is intentional; the new
comments make that explicit rather than leaving it as silent drift.

Update knowledge-persistence/SKILL.md to distinguish the three lock
paths (.knowledge.lock, .learning.lock, .knowledge-usage.lock) and
document their separate timeout contracts.

Co-Authored-By: Claude <noreply@anthropic.com>
Extract runMigrationsWithFallback from init.ts and rewrite the three
D32/D35 seam tests to test the init-level D37 fallback rule directly
via an injected runner spy, not runMigrations internals. Adds four
targeted tests covering the non-empty, gitRoot-fallback, empty, and
devflowDir-passthrough cases.

Add three security tests to knowledge-usage-scan.test.ts covering
relative-cwd rejection (CWE-23, exit code 2), symlink TOCTOU hardening
(wx + EEXIST unlink-retry does not follow symlink to sentinel file),
and Atomics.wait lock serialisation (both concurrent invocations
complete; final count is 2 with no data loss).

Co-Authored-By: Claude <noreply@anthropic.com>
…ision and Migration casts

- Create src/cli/utils/fs-atomic.ts: canonical TS writeFileAtomicExclusive with
  race-tolerant unlink (try/catch before retry), matching CJS json-helper.cjs
  and knowledge-usage-scan.cjs. All 3 TS atomic-write call sites (learn.ts,
  legacy-knowledge-purge.ts, migrations.ts) now import from this single source.

- Create src/cli/utils/notifications-shape.ts: consolidated NotificationEntry
  interface and isNotificationMap guard using the STRONGER definition (validates
  both top-level map and each entry value). Imported by learn.ts and
  notifications.ts, eliminating the two incompatible local definitions.

- Rebase legacy-knowledge-purge.ts D35 → D39 (D35 was colliding with the
  per-project concurrency cap decision in migrations.ts).

- Extract runGlobalMigration and runPerProjectMigration helpers from the 112-line
  runMigrations body. Each helper encapsulates one scope's dispatch logic including
  error handling; runMigrations now orchestrates loading/saving state + dispatches.

- Remove direct as Migration<'global'>/'per-project' casts from the original inline
  dispatch (HIGH #4). Casts are re-introduced only at the new helper-call boundary
  with explicit comments explaining the generic-narrowing constraint.

- MED #7 (init.ts warn/info level): confirmed already correct — no change needed.

Co-Authored-By: Claude <noreply@anthropic.com>
…rd and TOCTOU tests

MED #4: Extract reportMigrationResult() to migrations.ts (co-located with RunMigrationsResult
and MigrationLogger types). Removes the 17-line, 5-branch reporting block from
runMigrationsWithFallback in init.ts; init.ts now delegates to the extracted helper.
MigrationLogger and reportMigrationResult are exported from migrations.ts;
init.ts re-exports MigrationLogger for backward compat.

MED #6+#8: Split isRawObservation into two phases. Introduce VALID_OBSERVATION_TYPES
constant as the single source of truth for valid type values — drives both the guard
(phase 1: required fields + includes check) and the exhaustiveness check in the switch
(the const type union constrains ObservationType). Extract isOptBool() helper for the
three optional flag checks (phase 2), eliminating the mixed-concern boolean chain.

MED #10: Add adversarial-input tests for all four type guards.
- isRawObservation: 7 cases via getLearningCounts (invalid type, missing required fields,
  null/array JSON, non-boolean optional flag, valid minimal entry)
- isNotificationMap: 11 cases testing null/undefined/array/number/string/primitive-entry/
  null-entry/array-entry/empty-map/valid-entry/multi-entry
- isSeverity: 2 behavioral cases via getActiveNotification (unknown + null severity
  fall back to 'dim')
- reportMigrationResult: 8 cases covering all branches (empty, failure with/without
  project, infos, warnings, newlyApplied, verbose on/off)

MED #9: Add TOCTOU test for json-helper.cjs writeExclusive (via exported writeFileAtomic).
4 cases: basic write, overwrite, symlink pre-placed at .tmp path (sentinel unchanged),
stale .tmp from prior crash. Mirrors pattern from legacy-knowledge-purge.test.ts:218-244.

Tests: 884 total (852 prior + 32 new). Build: clean.

Co-Authored-By: Claude <noreply@anthropic.com>
…lper

- learn.ts / legacy-knowledge-purge.ts: the "all lock holders interpret
  staleness consistently" claim was false — bash uses 300 s (guards the
  full Sonnet pipeline), Node uses 60 s. Update both comments to state
  the actual per-holder values and the reason for the deviation.
- learn.ts: collapse the verbose JSDoc over `export type { NotificationFileEntry }`
  into a one-line comment; the D-SEC1 runtime-guard description belongs in
  notifications-shape.ts (where the guard actually lives), not at the
  re-export site.
- knowledge-usage-scan.test.ts: remove the dead `run` wrapper function and
  its `void run` suppression. The function wrapped spawnSync (synchronous)
  in a Promise that resolved immediately and was never called — the two
  direct spawnSync calls at resultA/resultB were doing the real work.
Both stop-update-memory and prompt-capture-memory exit at line 11
(DEVFLOW_BG_UPDATER=1 → exit 0) before reading stdin. Piping input
via execSync({ input }) races against this immediate exit — bash
closes the pipe before Node finishes flushing, producing
'spawnSync /bin/sh EPIPE' on Node 20.

Stop using { input, stdio: ['pipe', 'pipe', 'pipe'] } and use
stdio: 'ignore' instead. The hook never reads input on this code
path, so the test's intent is preserved and the race is eliminated.
@dean0x dean0x merged commit 4f26790 into main Apr 13, 2026
4 checks passed
@dean0x dean0x deleted the feat/177-revisit-project-knowledge-system---analy branch April 13, 2026 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: revisit project knowledge system — analysis and refinements for v2

1 participant