Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 22 additions & 0 deletions .context/LEARNINGS.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@ DO NOT UPDATE FOR:
<!-- INDEX:START -->
| Date | Learning |
|----|--------|
| 2026-06-01 | An error-discard catalogue is an inventory, not a verdict — verify each site by reading before fixing |
| 2026-06-01 | Guard managed blocks before regenerating; don't trust the span to be machine-owned |
| 2026-05-30 | Capture golden fixtures from the live legacy code path before deleting it |
| 2026-05-30 | tpl package is magic-string-audit-exempt but its call sites are not |
| 2026-05-30 | New exported types must live in types.go or TestTypeFileConvention fails |
Expand Down Expand Up @@ -169,6 +171,26 @@ DO NOT UPDATE FOR:

---

## [2026-06-01-195111] An error-discard catalogue is an inventory, not a verdict — verify each site by reading before fixing

**Context**: Phase EH audited ~184 silent error-discard sites under internal/. The catalogue was built by grep + pattern/name classification (e.g. 'x, _ := SomethingMarshal' => B-marshal). When fixing, several name-inferred verdicts were wrong.

**Lesson**: Classifying a discard by the callee's name or a regex is a guess, not a fact. The discarded value's actual type and the call's role decide the category. Concrete false positives this pass: MergePublished returns (string, bool) — the discard is a 'markers missing' bool, not an error; LoadState returns a State value (not a pointer), so a 'nil-deref' was impossible; io/security's atomic writer already checked the meaningful close and only discarded error-path cleanup closes. All three would have been 'fixed' (churn or breakage) on name-inference alone.

**Application**: Treat any auto-generated audit/catalogue as a worklist of candidates, not findings. Before editing a flagged site, read the callee signature (is the discarded value even an error?) and the enclosing control flow (is it an already-failed path, a best-effort callback, or a data path?). Only then assign return-error vs logWarn vs annotate. This mirrors the Constitution's Context Integrity Invariants: never act on assumed content.

---

## [2026-06-01-174927] Guard managed blocks before regenerating; don't trust the span to be machine-owned

**Context**: ctx learning add silently deleted entry bodies that lived between INDEX:START/END markers: index.Update replaced the whole marker span with a regenerated table, and ParseHeaders scanning the full file made the result look complete, hiding the loss.

**Lesson**: Code that 'replaces the managed block' (index regen, KB managed blocks, moc.go) assumes the span between its markers is disposable and machine-owned. That assumption breaks the moment user content drifts inside the markers, and the regenerated output looks correct so the loss is invisible. The fix is a precondition guard that refuses to mutate when regeneration would lose data — not smarter parsing of the trapped content.

**Application**: Before any 'replace between markers' write, validate the span: refuse on entry/content found where only generated output belongs, and on malformed/duplicated/out-of-order markers. Fail loud and leave the file byte-identical rather than regenerate. Run the guard at the read-before-mutate choke point so nothing is written on refusal.

---

## [2026-05-30-212109] Capture golden fixtures from the live legacy code path before deleting it

**Context**: Behavior-preserving refactors of LoopScript composition and the recall <details>/<table> assembly had fragile whitespace where hand-transcribing the expected output risked silent drift from the original bytes.
Expand Down
96 changes: 87 additions & 9 deletions .context/TASKS.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,48 @@ TASK STATUS LABELS:

These have priority because other knowledge ingestion projects depend on them.

- [x] BUG (data loss): `ctx learning add` clobbers a dash-bullet-format
`LEARNINGS.md`. When the `INDEX:START/END` block uses the dash-bullet format
(what `ctx init` produces) rather than the pipe-table format, `ctx learning
add` (1) rewrites the index as a table, (2) **duplicates** the
`<!-- INDEX:START -->` marker, and (3) **drops every existing learning body** —
keeping only the newly added one. Observed live in `things-wtf-hub` (session
aa32f065): a `LEARNINGS.md` with 4 bodies collapsed to 1 (a -44-line commit,
2dc4d1a); recovered via `git show <good-sha>:.context/LEARNINGS.md`.
`ctx decision add` is UNAFFECTED because that repo's `DECISIONS.md` was already
table-format — so the bug is specifically the learning-add path's handling of
the dash-bullet index variant (likely it can't parse dash-bullet entries, so
it treats the file as empty and regenerates from only the new entry).
- Repro: `ctx init` a repo, confirm `LEARNINGS.md` index is dash-bullet
(`- entry`), add 2+ learnings by hand in that format, then run
`ctx learning add "x" --context … --lesson … --application …`. Expect: the
hand-authored bodies vanish + a duplicated INDEX:START marker.
- Likely fix: detect the existing index format (dash-bullet vs table) and
preserve it round-trip, OR parse dash-bullet entries before regenerating;
never emit a second INDEX:START; never drop bodies the parser didn't
recognize (fail loud instead of silently regenerating).
- Guard: a round-trip test for BOTH index formats (dash-bullet + table) that
asserts existing bodies survive an add and exactly one marker pair remains.
- Severity: HIGH — silent destruction of persisted memory, the one thing ctx
promises to protect; only git made it recoverable.
- Provenance: things-wtf-hub session aa32f065 wrap-up; full write-up in that
repo's LEARNINGS.md ("`ctx learning add` clobbers a dash-bullet-format
LEARNINGS.md"). #priority:high #added:2026-05-30
#completed:2026-06-01 #branch:fix/learning-add-index-data-loss
Shipped: new `index.Validate` precondition guard refuses to regenerate the
index when it would lose data — entry bodies (`## [ts]` headers) trapped
between the markers, or markers that are missing/duplicated/out-of-order.
Wired into the two read-before-mutate choke points (`entry.Write` and
`index.Reindex`), so add and all reindex commands fail loud and leave the
file byte-identical instead of clobbering it. `index.Update`'s signature is
untouched (kept the CRITICAL blast radius stable). New `internal/err/index`
package + i18n messages. Verified: the real repro now errors with the file
unchanged; well-formed adds still preserve all bodies and one marker pair.
Tests: `index.TestValidate` (7 shapes) + `entry` round-trip
(refused-untouched + well-formed-preserved). Chosen behavior is fail-loud +
manual fix; auto-repair (`reindex --repair`) considered and declined.
Spec: specs/fix-learning-add-index-data-loss.md.

- [x] Make 'ctx kb reindex' nesting-aware: scan topics/** not topics/* (grouped
topic folders currently blank the CTX:
KB:TOPICS block) #priority:medium #session:c3d2dcb1 #branch:
Expand Down Expand Up @@ -618,7 +660,7 @@ Many call sites use `_ =` or `_, _ =` to discard errors without
any feedback. Some are legitimate (best-effort cleanup), most are
lazy escapes that hide failures.

- [ ] EH.1: Catalogue all silent error discards — recursive walk of
- [x] EH.1: Catalogue all silent error discards — recursive walk of
`internal/`
for patterns: `_ = `, `_, _ = `, `//nolint:errcheck`, bare `return` after
error-producing calls. Group by category:
Expand All @@ -633,34 +675,70 @@ lazy escapes that hide failures.
DoD: every `_ =` in the codebase is categorised and has a
recommended action
#priority:high #added:2026-03-14

- [ ] EH.2: Address category (b) — file write/read discards. These risk silent
#completed:2026-06-01 #branch:fix/learning-add-index-data-loss
Done: `.context/audit/eh-silent-errors.md` catalogues all 184 non-test
discard sites with category + recommended action. Surfaced 4 high-priority
data-loss/crash findings (memory/publish MergePublished, pad/store
ReadEntriesWithIDs, hub/replicate Append, memory status nil-deref) plus 11
write-handle defer-closes. Real fix workload ≈52 sites (B/A/C/SURFACE/
NIL-DEREF); category (d) fmt.Fprint output is an accepted end-state per EH.5
DoD. Tests excluded from this pass.

- [x] EH.2: Address category (b) — file write/read discards. These risk silent
data loss. Fix: return the error, or at minimum emit to stderr with
`fmt.Fprintf(os.Stderr, "ctx: ...: %v\n", err)` following the pattern
established in `internal/log/event.go`.
DoD: no write/read error is silently discarded
#priority:high #added:2026-03-14

- [ ] EH.3: Address category (a) — file close in defer. Most are `defer func()
#completed:2026-06-01 #branch:fix/learning-add-index-data-loss
Done: pad ReadEntriesWithIDs + hub replicate Append (commit b66816cd);
marshal-error returns for the vscode/copilot/blocknonpathctx writers,
journal ScanDirectory + drift reload via logWarn (41e223f5). The
established sink turned out to be `internal/log/warn` (logWarn.Warn),
used throughout. Verified each site by reading before editing — two of
the catalogue's name-inferred B findings were false positives
(MergePublished bool, LoadState value type).

- [x] EH.3: Address category (a) — file close in defer. Most are `defer func()
{ _ = f.Close() }()`. For read-only files, close errors are rare but
should still surface. For write/append files, close can fail if the
final flush fails — these are data loss. Fix: `if err := f.Close();
err != nil { fmt.Fprintf(os.Stderr, "ctx: close %s: %v\n", path, err) }`.
DoD: all defer-close sites log failures to stderr
#priority:medium #added:2026-03-14

- [ ] EH.4: Address category (c) — os.Remove/Rename discards. These are state
#completed:2026-06-01 #branch:fix/learning-add-index-data-loss
Done: write/append-handle closes converted to a named-return merge so
a failed flush fails the op (6 kb appends, trace appendJSONL, skill
copy out, kb note Run — commit 9d07da57); read-handle and gRPC-client
closes surfaced via logWarn (commit 06109734). io/security
SafeWriteFileAtomic was already correct (meaningful close checked;
error-path closes annotated).

- [x] EH.4: Address category (c) — os.Remove/Rename discards. These are state
operations (rotation, pruning, temp file cleanup). Silent failure leaves
stale state. Fix: stderr warning at minimum; for rotation/rename, consider
returning the error.
DoD: no Remove/Rename error is silently discarded
#priority:medium #added:2026-03-14

- [ ] EH.5: Validate — `grep -rn '_ =' internal/` returns only category (d)
#completed:2026-06-01 #branch:fix/learning-add-index-data-loss
Done (commit 06a88416): os.Remove/RemoveAll surfaced via logWarn where
failure leaves stale state with real consequences — partial skill
install, violations file (duplicate alerts), sync lock (blocks sync),
hub pid file, trace marker. io/security temp-file cleanup discards are
on already-failed paths and annotated as acceptable.

- [x] EH.5: Validate — `grep -rn '_ =' internal/` returns only category (d)
entries (fmt.Fprint to stderr) and entries explicitly annotated as
acceptable. Run `make lint && make test` to confirm no regressions.
DoD: grep output is clean or fully annotated; CI green
#priority:high #added:2026-03-14
#completed:2026-06-01 #branch:fix/learning-add-index-data-loss
Done (commit f7bf7d8f): `grep -rn '_ = ' internal/` (non-test) = 68
sites — 47 category-(d) fmt.Fprint (accepted end-state) + 21 explicitly
annotated/handled. `:=`-form discards (x, _ := …) are outside this
grep's scope. make lint = 0 issues, make test = 0 failures.
Whole EH sweep: 7 commits (6ca1198a catalogue → f7bf7d8f), spec
specs/error-handling-audit.md.

- [ ] Add AST-based lint test to detect exported functions with no external
callers #added:2026-03-21-070357
Expand Down
Loading
Loading