Fix/learning add index data loss#111
Merged
Merged
Conversation
`ctx learning add` (and decision add, and every reindex command) silently destroyed entry bodies that lived between the INDEX:START / INDEX:END markers: index.Update replaces the whole marker span with a regenerated table, and ParseHeaders scanning the full file made the result look complete, masking the loss. Verified live: a LEARNINGS.md with bodies in the block collapsed to just the table on one add. Add index.Validate, a precondition guard run before any write that refuses two shapes — entry headers trapped between the markers, and markers that are missing/duplicated/out-of-order (the duplicate-marker symptom). Wire it into the two read-before-mutate choke points, entry.Write and index.Reindex, so the file is left byte-identical on refusal. Update's func(string) string signature is left untouched to keep its CRITICAL call graph stable; malformed input simply never reaches it. Zero markers stays allowed (legit fresh-index creation). Behavior is fail-loud + manual fix; auto-repair was declined. Changes: - index.Validate guard + new internal/err/index error package - i18n messages err.index.entries-in-block / .malformed-markers - guards in entry.Write (Decision/Learning) and index.Reindex - index.TestValidate (7 shapes) + entry round-trip tests (refused-untouched, well-formed-preserved) Spec: specs/fix-learning-add-index-data-loss.md Signed-off-by: Jose Alekhinne <jose@ctx.ist>
Record the generalizable lesson from the index data-loss fix: code that replaces the span between managed-block markers must not trust that span to be machine-owned, and the fix is a before-mutate guard rather than smarter parsing of trapped content. Spec: specs/fix-learning-add-index-data-loss.md Signed-off-by: Jose Alekhinne <jose@ctx.ist>
Catalogue every error-discard site under internal/ (non-test): 184 sites grouped by category with a recommended action, the EH.1 deliverable for Phase EH. Surfaces 4 high-priority data-loss/crash findings (memory/publish MergePublished, pad/store ReadEntriesWithIDs, hub/replicate Append, memory status nil-deref) plus 11 write-handle defer-closes. Drives the EH.2-EH.5 fixes. Spec: specs/meta/chores.md Signed-off-by: Jose Alekhinne <jose@ctx.ist>
EH high-priority fixes from the EH.1 catalogue, each read in context before editing: - pad/core/store: WriteEntries discarded the ReadEntriesWithIDs error. A missing pad reads as (nil, nil), so a non-nil error means the prior blob is unreadable/undecryptable — and the store was then overwritten with reset IDs. Return the error instead of clobbering. - hub/replicate: the follower stream discarded store.Append errors, silently losing replicated entries. Warn via logWarn (best-effort loop, no return path); new cfgWarn.HubReplicateAppend key. Also correct the catalogue: two name-inferred callouts were wrong and are reclassified after reading — memory/publish MergePublished returns (string, bool) not an error (FALSE-POS), and memory status LoadState returns a value not a pointer so it cannot nil-deref (besteffort). Adds the Phase EH spec codifying the verify-before-fix policy. Spec: specs/error-handling-audit.md Signed-off-by: Jose Alekhinne <jose@ctx.ist>
A failed Close on a write/append handle can mean the final flush
never landed — silent data loss the caller is told nothing about.
Convert these defer-closes to a named-return merge so the close
error becomes the function's error (reusing each site's existing
write-error constructor; a flush-on-close failure is a write
failure):
- write/kb/{sourcemap,evidence,row,relationship,glossary,timeline}
Append
- trace/jsonl appendJSONL
- skill/copy copyFile (out handle)
- cli/kb/cmd/note Run
Co-located read handles in the same two files are surfaced via
logWarn (read-close failures are not data loss): trace/jsonl
readJSONL and skill/copy copyFile (in handle).
Spec: specs/error-handling-audit.md
Signed-off-by: Jose Alekhinne <jose@ctx.ist>
os.Remove/RemoveAll and a stray conn.Close were dropped on cleanup paths where a failure leaves stale state with real consequences: a partial skill-install dir, a violations file that re-reports (duplicate alerts), a stale sync lock that blocks future syncs, a stale hub pid file, an orphaned trace marker. Surface each via logWarn (cfgWarn.Remove / cfgWarn.Close); these are best-effort with no useful return path. io/security SafeWriteFileAtomic is left as-is and annotated: its tmp.Close()/os.Remove discards sit on already-failed error paths where the temp file is thrown away, and the meaningful close is already checked on the success path. Reading beat name-inference — this code was already correct. Spec: specs/error-handling-audit.md Signed-off-by: Jose Alekhinne <jose@ctx.ist>
Read-handle and connection defer-closes dropped their errors. These
are not data loss, but per EH.3 they should not vanish — surface each
via logWarn:
- journal/parser copilot + copilot_cli: read-file closes
- trace resolve_entry + working_tasks: read-file closes
- hub/replicate: follower dial-conn close (keyed by masterAddr)
- connection {listen,status,register,publish,sync}, system/hubsync,
hub/status: gRPC client.Close after the command's real work
New cfgWarn.CloseHubClient key for the client/connection closes;
file closes reuse cfgWarn.Close keyed by path.
Spec: specs/error-handling-audit.md
Signed-off-by: Jose Alekhinne <jose@ctx.ist>
Data-path marshal errors are now returned instead of writing empty output: the vscode tasks/extension/mcp config writers, the copilot vscode MCP writer, and the blocknonpathctx hook response. Two discards that lose results are now logged: journal querying skips an unreadable session dir via logWarn (cfgWarn.JournalScanDir) instead of dropping its sessions silently; drift's post-fix re-check warns on a failed context reload (cfgWarn.DriftReload) and keeps the prior context instead of detecting against an empty one. The genuinely-acceptable discards are annotated rather than changed: steering yaml.Marshal of plain frontmatter structs (cannot fail; formatters return []byte by contract), the trigger-test input echo, best-effort hook-stdin Unmarshal, the opencode path lookup whose error is handled by an empty-path fallback, and the sourcecoverage date parse that degrades to zero time on a ctx-generated column. Spec: specs/error-handling-audit.md Signed-off-by: Jose Alekhinne <jose@ctx.ist>
Make the remaining `_ =` discards that EH.5's grep flags explicitly intentional, so an auditor reads them as deliberate rather than lazy: - cobra MarkFlagRequired / RegisterFlagCompletionFunc: only error on an unregistered flag name, bound immediately above (unreachable) - trace.Record / TruncatePending: best-effort provenance that must never fail the user's command - filepath.Glob in the stats watcher: nil-safe on a malformed pattern - mcp poller WriteJSON: best-effort notification push, no return path - hub cluster.Shutdown: best-effort graceful-stop teardown After this, `grep -rn '_ =' internal/` resolves to only category-(d) fmt.Fprint output and annotated/handled sites (EH.5 DoD). Spec: specs/error-handling-audit.md Signed-off-by: Jose Alekhinne <jose@ctx.ist>
Mark the Phase EH fix/validate tasks done with per-task ship notes. The sweep surfaced or justified every error discard under internal/: data-path errors returned, best-effort discards routed to logWarn, write-handle closes merged into named returns, and the genuinely acceptable sites annotated. EH.5 grep clean, lint + test green. Spec: specs/error-handling-audit.md Signed-off-by: Jose Alekhinne <jose@ctx.ist>
Capture the EH-sweep lesson: an auto-generated error-discard catalogue classified by callee name / regex is a worklist of candidates, not findings. Read the discarded value's type and the call's role before fixing — name-inference produced three false positives this pass (MergePublished bool, LoadState value, io/security already-correct closes). Spec: specs/error-handling-audit.md Signed-off-by: Jose Alekhinne <jose@ctx.ist>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.