Skip to content

Fix/learning add index data loss#111

Merged
josealekhine merged 11 commits into
mainfrom
fix/learning-add-index-data-loss
Jun 2, 2026
Merged

Fix/learning add index data loss#111
josealekhine merged 11 commits into
mainfrom
fix/learning-add-index-data-loss

Conversation

@josealekhine
Copy link
Copy Markdown
Member

No description provided.

`ctx learning add` (and decision add, and every reindex command)
silently destroyed entry bodies that lived between the INDEX:START /
INDEX:END markers: index.Update replaces the whole marker span with a
regenerated table, and ParseHeaders scanning the full file made the
result look complete, masking the loss. Verified live: a LEARNINGS.md
with bodies in the block collapsed to just the table on one add.

Add index.Validate, a precondition guard run before any write that
refuses two shapes — entry headers trapped between the markers, and
markers that are missing/duplicated/out-of-order (the duplicate-marker
symptom). Wire it into the two read-before-mutate choke points,
entry.Write and index.Reindex, so the file is left byte-identical on
refusal. Update's func(string) string signature is left untouched to
keep its CRITICAL call graph stable; malformed input simply never
reaches it. Zero markers stays allowed (legit fresh-index creation).

Behavior is fail-loud + manual fix; auto-repair was declined.

Changes:
- index.Validate guard + new internal/err/index error package
- i18n messages err.index.entries-in-block / .malformed-markers
- guards in entry.Write (Decision/Learning) and index.Reindex
- index.TestValidate (7 shapes) + entry round-trip tests
  (refused-untouched, well-formed-preserved)

Spec: specs/fix-learning-add-index-data-loss.md
Signed-off-by: Jose Alekhinne <jose@ctx.ist>
Record the generalizable lesson from the index data-loss fix: code
that replaces the span between managed-block markers must not trust
that span to be machine-owned, and the fix is a before-mutate guard
rather than smarter parsing of trapped content.

Spec: specs/fix-learning-add-index-data-loss.md
Signed-off-by: Jose Alekhinne <jose@ctx.ist>
Catalogue every error-discard site under internal/ (non-test): 184
sites grouped by category with a recommended action, the EH.1
deliverable for Phase EH. Surfaces 4 high-priority data-loss/crash
findings (memory/publish MergePublished, pad/store ReadEntriesWithIDs,
hub/replicate Append, memory status nil-deref) plus 11 write-handle
defer-closes. Drives the EH.2-EH.5 fixes.

Spec: specs/meta/chores.md
Signed-off-by: Jose Alekhinne <jose@ctx.ist>
EH high-priority fixes from the EH.1 catalogue, each read in context
before editing:

- pad/core/store: WriteEntries discarded the ReadEntriesWithIDs error.
  A missing pad reads as (nil, nil), so a non-nil error means the prior
  blob is unreadable/undecryptable — and the store was then overwritten
  with reset IDs. Return the error instead of clobbering.
- hub/replicate: the follower stream discarded store.Append errors,
  silently losing replicated entries. Warn via logWarn (best-effort
  loop, no return path); new cfgWarn.HubReplicateAppend key.

Also correct the catalogue: two name-inferred callouts were wrong and
are reclassified after reading — memory/publish MergePublished returns
(string, bool) not an error (FALSE-POS), and memory status LoadState
returns a value not a pointer so it cannot nil-deref (besteffort). Adds
the Phase EH spec codifying the verify-before-fix policy.

Spec: specs/error-handling-audit.md
Signed-off-by: Jose Alekhinne <jose@ctx.ist>
A failed Close on a write/append handle can mean the final flush
never landed — silent data loss the caller is told nothing about.
Convert these defer-closes to a named-return merge so the close
error becomes the function's error (reusing each site's existing
write-error constructor; a flush-on-close failure is a write
failure):

- write/kb/{sourcemap,evidence,row,relationship,glossary,timeline}
  Append
- trace/jsonl appendJSONL
- skill/copy copyFile (out handle)
- cli/kb/cmd/note Run

Co-located read handles in the same two files are surfaced via
logWarn (read-close failures are not data loss): trace/jsonl
readJSONL and skill/copy copyFile (in handle).

Spec: specs/error-handling-audit.md
Signed-off-by: Jose Alekhinne <jose@ctx.ist>
os.Remove/RemoveAll and a stray conn.Close were dropped on cleanup
paths where a failure leaves stale state with real consequences:
a partial skill-install dir, a violations file that re-reports
(duplicate alerts), a stale sync lock that blocks future syncs, a
stale hub pid file, an orphaned trace marker. Surface each via
logWarn (cfgWarn.Remove / cfgWarn.Close); these are best-effort with
no useful return path.

io/security SafeWriteFileAtomic is left as-is and annotated: its
tmp.Close()/os.Remove discards sit on already-failed error paths
where the temp file is thrown away, and the meaningful close is
already checked on the success path. Reading beat name-inference —
this code was already correct.

Spec: specs/error-handling-audit.md
Signed-off-by: Jose Alekhinne <jose@ctx.ist>
Read-handle and connection defer-closes dropped their errors. These
are not data loss, but per EH.3 they should not vanish — surface each
via logWarn:

- journal/parser copilot + copilot_cli: read-file closes
- trace resolve_entry + working_tasks: read-file closes
- hub/replicate: follower dial-conn close (keyed by masterAddr)
- connection {listen,status,register,publish,sync}, system/hubsync,
  hub/status: gRPC client.Close after the command's real work

New cfgWarn.CloseHubClient key for the client/connection closes;
file closes reuse cfgWarn.Close keyed by path.

Spec: specs/error-handling-audit.md
Signed-off-by: Jose Alekhinne <jose@ctx.ist>
Data-path marshal errors are now returned instead of writing empty
output: the vscode tasks/extension/mcp config writers, the copilot
vscode MCP writer, and the blocknonpathctx hook response.

Two discards that lose results are now logged: journal querying skips
an unreadable session dir via logWarn (cfgWarn.JournalScanDir) instead
of dropping its sessions silently; drift's post-fix re-check warns on a
failed context reload (cfgWarn.DriftReload) and keeps the prior context
instead of detecting against an empty one.

The genuinely-acceptable discards are annotated rather than changed:
steering yaml.Marshal of plain frontmatter structs (cannot fail;
formatters return []byte by contract), the trigger-test input echo,
best-effort hook-stdin Unmarshal, the opencode path lookup whose error
is handled by an empty-path fallback, and the sourcecoverage date parse
that degrades to zero time on a ctx-generated column.

Spec: specs/error-handling-audit.md
Signed-off-by: Jose Alekhinne <jose@ctx.ist>
Make the remaining `_ =` discards that EH.5's grep flags explicitly
intentional, so an auditor reads them as deliberate rather than lazy:

- cobra MarkFlagRequired / RegisterFlagCompletionFunc: only error on
  an unregistered flag name, bound immediately above (unreachable)
- trace.Record / TruncatePending: best-effort provenance that must
  never fail the user's command
- filepath.Glob in the stats watcher: nil-safe on a malformed pattern
- mcp poller WriteJSON: best-effort notification push, no return path
- hub cluster.Shutdown: best-effort graceful-stop teardown

After this, `grep -rn '_ =' internal/` resolves to only category-(d)
fmt.Fprint output and annotated/handled sites (EH.5 DoD).

Spec: specs/error-handling-audit.md
Signed-off-by: Jose Alekhinne <jose@ctx.ist>
Mark the Phase EH fix/validate tasks done with per-task ship notes.
The sweep surfaced or justified every error discard under internal/:
data-path errors returned, best-effort discards routed to logWarn,
write-handle closes merged into named returns, and the genuinely
acceptable sites annotated. EH.5 grep clean, lint + test green.

Spec: specs/error-handling-audit.md
Signed-off-by: Jose Alekhinne <jose@ctx.ist>
Capture the EH-sweep lesson: an auto-generated error-discard
catalogue classified by callee name / regex is a worklist of
candidates, not findings. Read the discarded value's type and the
call's role before fixing — name-inference produced three false
positives this pass (MergePublished bool, LoadState value, io/security
already-correct closes).

Spec: specs/error-handling-audit.md
Signed-off-by: Jose Alekhinne <jose@ctx.ist>
@josealekhine josealekhine self-assigned this Jun 2, 2026
@josealekhine josealekhine merged commit 6e56077 into main Jun 2, 2026
16 checks passed
@josealekhine josealekhine deleted the fix/learning-add-index-data-loss branch June 2, 2026 05:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant