fix(update): wait for writer flush, honour backpressure, clean up tmp on failure#158
fix(update): wait for writer flush, honour backpressure, clean up tmp on failure#158Ricardo-M-L wants to merge 2 commits into
Conversation
…p tmp `downloadFile()` in `src/update/self-update.ts` had three concrete reliability issues, all triggered in the self-update path: 1. `resolve()` was called as soon as the read loop hit `done`, but `writer.end()` only *queues* the close — bytes may still be in the kernel pagecache. The very next step, `verifySha256()`, reads back the same file; on slower disks (HDD, network FS, CI runners) this races the flush and surfaces as a spurious "Checksum mismatch" error mid-update. Replace the `resolve()` call with a `writer.on( 'finish')` handler so we only proceed once the file is durably closed. 2. `writer.write(value)` was called without honouring its boolean return value. For large binaries the writer's internal buffer grows without bound until the entire download is in memory. Wait for the `'drain'` event when `write()` returns `false`. 3. On any error (network reset, 5xx mid-stream, writer error), the half-written file at `dest` was left in `tmpdir()`. Unlink it in a catch block before rethrowing. 4. The Web Streams reader's lock was never released — neither on the happy path nor on errors. Move the `releaseLock()` into a `finally` so the underlying body stream is always freed. No behavior change on the success path beyond the (correct) flush wait. `tsc --noEmit` passes.
NianJiuZst
left a comment
There was a problem hiding this comment.
Thanks for tackling the reliability issues in this path. I agree with waiting for finish and honoring backpressure, but I don't think the error path is fully cleaned up yet.
There is also still no test coverage for src/update/self-update.ts, so the new finish / drain / cleanup / releaseLock() behavior can regress without anything catching it. Please add at least one mocked stream test that exercises a mid-stream failure and verifies the temp file is cleaned up only after the writer is fully torn down.
Reviewed with GPT-5.5 (xhigh reasoning).
| if (done) { writer.end(); break; } | ||
| // Honour backpressure so we don't grow the writer's internal buffer | ||
| // unboundedly on large binaries / slow disks. | ||
| if (!writer.write(value)) { |
There was a problem hiding this comment.
On the failure path we only unlink dest, but we never close or destroy writer unless the loop reaches the done branch. If reader.read() rejects, or if writer.write() returns false and the writer errors before drain, this block can exit with a live WriteStream and an in-flight pump(). Please mirror the shutdown pattern from src/files/download.ts: tear down the writer in finally, wait for finish/error, and only then remove the temporary file.
Review feedback on MiniMax-AI#158 (thanks @NianJiuZst): the error path needs to tear the writer down BEFORE unlinking dest, otherwise the writer's buffered bytes can race the unlink and leave a partial file in the pagecache. And the function still had no test coverage, so the `finish`/`drain`/cleanup/`releaseLock()` behaviour could regress silently. Changes: * `downloadFile`: on error, `writer.destroy()` and await `'close'` before calling `unlinkSync(dest)`. Skip the wait when the writer is already torn down so the rejection's exception isn't masked. * Export `downloadFile` from `src/update/self-update.ts` so unit tests can drive it directly (still not part of the public CLI surface — callers go through `applySelfUpdate`). * Add `test/update/self-update.test.ts` with three tests, matching the existing `test/files/download.test.ts` style: - mid-stream `ReadableStream` error path: rejects, dest gone, tempdir empty (the case the reviewer specifically asked for) - happy path: payload written, dest exists - 4xx response: rejects with CLIError, dest not created `npx tsc --noEmit` passes.
|
Thanks @NianJiuZst, both points addressed in 0791c28: 1. Error teardown order. } catch (err) {
if (!writer.destroyed) {
await new Promise<void>(r => {
writer.once('close', () => r());
writer.destroy();
});
}
try { (await import('fs')).unlinkSync(dest); } catch { /* best-effort */ }
throw err;
}So any buffered bytes in the writer are flushed/discarded before unlink, no pagecache race. 2. Test coverage. Added
|
What does this PR do?
downloadFile()insrc/update/self-update.tshad three concrete reliability issues, all triggered in the self-update path:1. Premature
resolve()racing the writer flushThe promise resolved as soon as the read loop hit
done, butwriter.end()only queues the close — bytes may still be in the kernel pagecache. The very next step isverifySha256(tmp, target.checksum), which reads the file back. On slower disks (HDD, network FS, CI runners with bursty I/O) this races the flush and surfaces as a spuriousChecksum mismatcherror in the middle of an update.Fix: Wait for
writer.on('finish')before resolving.2. Missing backpressure on
writer.write()writer.write(value)was called without honouring its boolean return value. For large binaries the writer's internal buffer grows without bound until the entire download is in memory.Fix: Await the
drainevent whenwrite()returnsfalse.3. Half-downloaded binary left in
tmpdir()on failureOn any error (network reset, 5xx mid-stream, writer error), the partial file at
destwas left intmpdir()permanently.Fix:
unlinkSync(dest)(best-effort) in acatchbefore rethrowing.4. Web Streams reader lock never released
The reader's lock was never released — neither on the happy path nor on errors. The Web Streams API contract requires paired acquire/release; without the release the underlying response body stays locked.
Fix: Move
reader.releaseLock()into afinally.Test plan
npx tsc --noEmitpassesfsyncthrottling) and verify the checksum step no longer reports mismatchesWhy fix together?
All four are in the same 25-line function and share the same restructure (wrap the
await new Promise()intry/catch/finally). Splitting would mean three separate refactors of the same block.