fix(blob): clean up orphaned blob files on aborted/superseded writes by kriszyp · Pull Request #462 · HarperFast/harper

kriszyp · 2026-05-03T00:48:32Z

Summary

Several pre-commit blob save paths could leave a backing file on disk with no record referencing it. This PR plugs them: the file gets cleaned up after the abort, or after a successful commit if the commit handler took an early-return.

Concrete orphan paths fixed:

Multi-write transaction where one beforeIntermediate rejects after others wrote files (only the failing blob was cleaned up by deleteOnFailure; succeeded peers leaked).
Commit-handler early-returns: out-of-order replicated update treated as duplicate, superseded by a newer put/delete, no-audit fullUpdate loses to existing version, cache-resolve version-changed.
DatabaseTransaction.commit rejecting on Promise.all(completions) failure (e.g. blob save error) without aborting — leaked the underlying RocksDB transaction and the blob files.

Approach

startPreCommitBlobsForRecord now returns { blobs, complete }; Table.ts attaches savedBlobs to each TransactionWrite.
Commit handlers set write.skipped = true (resetting on each invocation) on early-return paths that don't end up writing a record/audit.
Both transaction commit success paths (DatabaseTransaction.commit, LMDBTransaction.commit) walk writes and cleanupUnusedBlobs(write.savedBlobs) for skipped ones. Both abort() paths run the cleanup unconditionally.
DatabaseTransaction.commit adds an explicit reject handler so a completions failure aborts (instead of leaking).

Why deferred and not inline: the commit handler runs again on optimistic-lock retries. A retry can flip a previously-skipped write into a successful one (e.g. concurrent delete makes our older replicated update suddenly win). Inline cleanup would race the deletion's setTimeout against the retry that referenced the blob.

A companion PR in harper-pro fixes one orphan path in the replication decode loop.

Where to look

resources/blob.ts: new cleanupUnusedBlobs and refactored startPreCommitBlobsForRecord.
resources/Table.ts: 4 addWrite call sites refactored to the const write = {…}; write.beforeIntermediate = …; addWrite(write) shape so the commit closure can reference write.savedBlobs/write.skipped. Five early-return paths set write.skipped = true.
resources/DatabaseTransaction.ts & resources/LMDBTransaction.ts: post-commit walk + abort cleanup. The DatabaseTransaction.commit reject handler is a behavioral change worth careful eyes.
DESIGN.md: design notes for future agents extending blob storage.

Test plan

New Blob test > multi-write transaction with one failing blob cleans up succeeded blobs — passes locally.
New cleanupUnusedBlobs is a no-op for unsaved blobs — passes locally.
Existing post-suite cleanupOrphans test — passes (catches regressions across all blob tests).
Integration: out-of-order replication scenario in a multi-node cluster — would be valuable; not added here.
Local: 14 pre-existing blob tests fail with an unrelated RecordEncoder.this.encoder is undefined error in this dev environment; CI should be unaffected.

🤖 Generated with Claude Code

Blobs flagged with `saveBeforeCommit` (or `saveInRecord`) are written to disk in the `beforeIntermediate` phase of a transaction write, before the underlying record write commits. Several paths could leave the file on disk with no record referencing it: - Multi-write transaction where one `beforeIntermediate` rejected after others wrote files (only the failing blob was cleaned up by `deleteOnFailure`; succeeded peers were leaked). - Commit-handler early-returns: out-of-order replicated update treated as duplicate, superseded by a newer put/delete, no-audit fullUpdate loses to existing version, cache-resolve version-changed. - `DatabaseTransaction.commit` rejecting on `Promise.all(completions)` failure (e.g. blob save error) without aborting — leaked the underlying RocksDB transaction *and* the blob files. - Mid-record decode failure in the replication ingest path. Approach: - `startPreCommitBlobsForRecord` now returns `{ blobs, complete }` so each `TransactionWrite` can attach `savedBlobs` for later cleanup. - New `cleanupUnusedBlobs(blobs)` waits for any in-flight save to settle, then unlinks. Idempotent — clears the list after scheduling. - Commit handlers in `Table.ts` set `write.skipped = true` (resetting to `false` on each invocation) on early-return paths that don't end up writing a record or audit reference. The transaction commit success paths walk writes and clean up `skipped` ones. Cleanup is deferred to post-commit because the same handler runs again on optimistic-lock retries and a retry can flip a previously-skipped write into a successful one (e.g. a concurrent delete makes our older replicated update suddenly win). - `LMDBTransaction.abort` and `DatabaseTransaction.abort` walk all writes' `savedBlobs` and clean unconditionally. - `DatabaseTransaction.commit` adds an explicit reject handler so a `Promise.all` failure aborts instead of leaking. Tests: new `Blob test > multi-write transaction with one failing blob cleans up succeeded blobs` and `cleanupUnusedBlobs is a no-op for unsaved blobs`. The post-suite `cleanupOrphans` test acts as a safety net across all blob tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude · 2026-05-03T00:52:46Z

Reviewed; no blockers found.

…KNOWN_SIZE When all blob content is read and the reader gets bytesRead=0, it re-reads the header (still UNKNOWN_SIZE) and sets up an fs.watch watcher. If the writer finishes and updates the header between that header read and the watcher being created, no further writes occur and the watcher never fires, causing a 60s timeout with 'size is supposed to be 281474976710655 bytes'. Fix: after readSync at `position` returns 0, immediately re-read the header from position 0. If the size is no longer UNKNOWN_SIZE, bypass the watcher and retry directly. Apply the same check in the timeout callback as a second line of defense. Also eliminates the shared module-level HEADER/headerView buffers in favor of local DataView instances, avoiding any shared-state risk. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…riterFinished Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…lect phase When a join condition (e.g. relatedByName) becomes the primary iterator due to a lower estimated count, the select phase uses filterMap.hasMappings=true and calls targetTable.transformEntryForSelect with the outer table's readTxn. That readTxn is a RocksTransaction scoped to the outer table's RocksDB column family and cannot read from a different table's column family, so getEntry returns null, producing [undefined] in the joined attribute. Fix: compute targetReadTxn = targetTable._readTxnForContext(context) when the target table differs from the current table, ensuring each joined table's records are loaded with a transaction scoped to that table's own store. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kriszyp requested a review from a team as a code owner May 3, 2026 00:48

kriszyp mentioned this pull request May 3, 2026

fix(replication): clean up blobs accepted during failed decode HarperFast/harper-pro#113

Open

1 task

kriszyp and others added 3 commits May 2, 2026 19:17

Formatting

4988c8f

refactor(blob): DRY up duplicated header re-read logic into resumeIfW…

66f9607

…riterFinished Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kriszyp mentioned this pull request May 5, 2026

Save & delete blob test fails intermittently #19

Open

4 tasks

cb1kenobi approved these changes May 5, 2026

View reviewed changes

kriszyp added patch labels May 5, 2026

Formatting

334938d

kriszyp merged commit f85b5b7 into main May 6, 2026
26 of 27 checks passed

kriszyp deleted the fix/blob-orphan-cleanup branch May 6, 2026 01:45

github-actions Bot mentioned this pull request May 6, 2026

cherry-pick: fix(blob): clean up orphaned blob files on aborted/superseded writes → v5.0 (conflict) #479

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(blob): clean up orphaned blob files on aborted/superseded writes#462

fix(blob): clean up orphaned blob files on aborted/superseded writes#462
kriszyp merged 6 commits into
mainfrom
fix/blob-orphan-cleanup

kriszyp commented May 3, 2026

Uh oh!

claude Bot commented May 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

kriszyp commented May 3, 2026

Summary

Approach

Where to look

Test plan

Uh oh!

claude Bot commented May 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants