Fix ZstandardStream truncating multi-frame zstd responses to the first frame#129047

Open
christosk92 wants to merge 3 commits into
dotnet:mainfrom
christosk92:fix-zstd-multiframe-129038

@christosk92

Fixes #129038.

Problem

A zstd stream may be a sequence of frames concatenated back-to-back (RFC 8878 §3), and many encoders/CDNs emit one frame per buffer — so large Content-Encoding: zstd HTTP responses commonly arrive as multiple ~64 KB frames. ZstandardStream decoded only the first frame and silently dropped the rest, surfacing downstream as truncated payloads (in the linked issue, System.Text.Json failing at exactly BytePositionInLine: 65536).

Root cause is in ZstandardDecoder.Decompress:

if (result == 0) { _finished = true; return OperationStatus.Done; }

ZSTD_decompressStream returning 0 means end-of-frame, not end-of-stream (lib/zstd.h: "0 when a frame is completely decoded and fully flushed"). _finished is a one-way latch and Decompress starts with if (_finished) return OperationStatus.Done;, so once the first frame completes every subsequent frame is discarded.
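
The effect of the one-way latch can be illustrated with a toy model. This is only a sketch: `LatchDemo` and `DecodeAll` are hypothetical names, each string stands in for one already-decoded frame, and nothing here calls the real decoder.

```csharp
using System.Collections.Generic;
using System.Text;

// Toy model of the latch described above. Not the real ZstandardDecoder.
public static class LatchDemo
{
    public static string DecodeAll(IReadOnlyList<string> frames, bool clearLatchAtFrameEnd)
    {
        var output = new StringBuilder();
        bool finished = false; // stands in for the _finished latch

        foreach (string frame in frames)
        {
            if (finished)
            {
                break; // models "if (_finished) return Done;": later frames are never decoded
            }

            output.Append(frame); // pretend the whole frame decoded
            finished = true;      // end-of-frame recorded as if it were end-of-stream

            if (clearLatchAtFrameEnd)
            {
                finished = false; // models clearing the latch before the next frame
            }
        }

        return output.ToString();
    }
}
```

With two frames, leaving the latch set yields only the first payload, while clearing it at each frame end yields both.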

Fix

This mirrors the existing multi-member handling in GZipStream/DeflateStream (Inflater.ResetStreamForLeftoverInput + DeflateStream.InflatorIsFinished), adapted to zstd:

  • ZstandardStream.TryDecompress now loops across frames. When the decoder reports Done and more input is available that begins with a zstd frame magic number, it continues decoding the next frame on the same native context. No reset is needed between frames — ZSTD_decompressStream automatically begins the next frame on the following call (so window-size / dictionary settings carry over), which keeps this consistent with the native streaming contract.
  • A new ZstandardDecoder.PrepareForNextFrame() (internal) clears only the managed end-of-frame latch.
  • StartsWithZstdFrame peeks the 4-byte frame magic (standard 0xFD2FB528, or skippable 0x184D2A50 through 0x184D2A5F) to distinguish a following frame from trailing non-zstd data after the final frame. Trailing data is left untouched, matching how DeflateStream treats data after the last gzip member.
  • A new _atFrameBoundary flag makes end-of-input a clean end (rather than truncated data) only when the last frame finished cleanly, so the existing strict-validation truncation detection still fires for genuinely truncated frames.
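
As a rough illustration of the magic-number peek, a check along these lines would work. This is a minimal sketch, not the PR's code: `ZstdMagicSketch` and `LooksLikeFrameStart` are hypothetical names. The constants are the zstd frame magics, which are stored little-endian on the wire.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper; the PR's StartsWithZstdFrame may be implemented differently.
public static class ZstdMagicSketch
{
    public const int MagicLength = 4;

    private const uint StandardFrameMagic = 0xFD2FB528;
    private const uint SkippableFrameMagicBase = 0x184D2A50; // 0x184D2A50 through 0x184D2A5F
    private const uint SkippableFrameMagicMask = 0xFFFFFFF0;

    public static bool LooksLikeFrameStart(ReadOnlySpan<byte> input)
    {
        if (input.Length < MagicLength)
        {
            return false; // too short to tell; the caller has to decide what that means
        }

        uint magic = BinaryPrimitives.ReadUInt32LittleEndian(input);
        return magic == StandardFrameMagic
            || (magic & SkippableFrameMagicMask) == SkippableFrameMagicBase;
    }
}
```

Masking off the low nibble covers all sixteen skippable-frame magics with a single comparison.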

Only the streaming path was affected. The static one-shot helpers are already multi-frame-correct (ZstandardDecoder.TryDecompress → ZSTD_decompress, which handles concatenated frames natively; TryGetMaxDecompressedLength → ZSTD_decompressBound, which is multi-frame-aware), so they are unchanged.

Tests

Added to CompressionStreamUnitTests.Zstandard.cs (sync + async each):

  • ZstandardStream_ConcatenatedFrames_DecompressesAllFrames — two concatenated frames whose payloads span the 64 KB internal buffer; asserts the full concatenated output (this fails without the fix).
  • ZstandardStream_ConcatenatedFrames_AcrossReads_DecompressesAllFrames — a non-seekable stream returning one byte per read, so each frame magic is discovered across multiple underlying reads (exercises the HttpClient-style no-rewind path and the magic-split-across-reads case).
  • ZstandardStream_FrameFollowedByTrailingData_StopsAtEndOfFrame — a single frame followed by non-zstd trailing bytes; asserts the payload decodes and the trailing bytes remain available on the (seekable) base stream.
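
For readers unfamiliar with the one-byte-per-read technique, a helper of roughly this shape is enough to force every frame magic to arrive across several underlying reads. This is a sketch under assumptions: `OneByteAtATimeStream` is a hypothetical name and is not the test helper from the PR.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical non-seekable wrapper that returns at most one byte per read.
public sealed class OneByteAtATimeStream : Stream
{
    private readonly Stream _inner;

    public OneByteAtATimeStream(Stream inner) => _inner = inner;

    public override bool CanRead => true;
    public override bool CanSeek => false; // models a network-style stream with no rewind
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count) =>
        Read(buffer.AsSpan(offset, count));

    // Hand back a single byte even when the caller offers a larger buffer.
    public override int Read(Span<byte> buffer) =>
        buffer.IsEmpty ? 0 : _inner.Read(buffer.Slice(0, 1));

    public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) =>
        cancellationToken.IsCancellationRequested
            ? Task.FromCanceled<int>(cancellationToken)
            : Task.FromResult(Read(buffer.AsSpan(offset, count)));

    public override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default) =>
        cancellationToken.IsCancellationRequested
            ? ValueTask.FromCanceled<int>(cancellationToken)
            : new ValueTask<int>(Read(buffer.Span));

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Wrapping a MemoryStream that holds the concatenated frames in such a stream makes the decompressing reader see the input one byte at a time.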

Existing tests, including StreamTruncation_IsDetected, should continue to pass.

Notes

I haven't built the full runtime locally, so I'm relying on CI to validate. Happy to adjust the approach — in particular, whether the frame-to-frame continuation should live in ZstandardStream (as here, paralleling DeflateStream) or inside ZstandardDecoder, and whether trailing-data tolerance should match GZipStream exactly. A deterministic, dependency-free repro is linked from the issue: https://github.com/christosk92/zstd-net11-repro

ZSTD_decompressStream returns 0 at the end of each frame, not the end of the
stream, but ZstandardDecoder.Decompress treated that as end-of-stream via a
permanent _finished latch. A zstd stream made of multiple concatenated frames
(valid per RFC 8878) -- e.g. large HTTP Content-Encoding: zstd responses --
was therefore silently truncated to its first frame, surfacing downstream as
truncated payloads (e.g. JSON parse errors at exactly 65536 bytes).

ZstandardStream now continues decoding subsequent frames on the same native
context (mirroring GZipStream multi-member handling), distinguishes a following
frame from trailing data via the frame magic number, and only treats
end-of-input as a clean end when at a frame boundary. Adds regression tests
for concatenated frames (buffered, split across reads, and with trailing data).

Fixes dotnet#129038.
Copilot AI review requested due to automatic review settings June 5, 2026 15:21
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Jun 5, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @karelz, @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates Zstandard decompression to correctly handle multiple concatenated zstd frames (common in HTTP Content-Encoding: zstd) without truncation, and adds regression coverage for multi-frame and trailing-data scenarios.

Changes:

  • Extend ZstandardStream decompression to continue decoding across concatenated frames and distinguish frames vs. trailing non-zstd data.
  • Add tests to verify full output for concatenated frames (including across fragmented reads) and correct handling of trailing data after the final frame.
  • Add decoder support method to clear managed end-of-frame state between frames.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/libraries/System.IO.Compression/tests/Zstandard/CompressionStreamUnitTests.Zstandard.cs | Adds regression tests for concatenated frames, across-read boundaries, and trailing-data handling. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/Zstandard/ZstandardStream.Decompress.cs | Implements multi-frame decoding logic and adjusts truncation detection at frame boundaries. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/Zstandard/ZstandardDecoder.cs | Adds PrepareForNextFrame to clear managed end-of-frame state for concatenated streams. |

Comment on lines +121 to +137
if (destination.IsEmpty)
{
    // The caller provided a zero-byte buffer. This is typically done in order to avoid allocating/renting
    // a buffer until data is known to be available. We don't have perfect knowledge here, as _decoder.Decompress
    // will return DestinationTooSmall whether or not more data is required. As such, we assume that if there's
    // any data in our input buffer, it would have been decompressible into at least one byte of output, and
    // otherwise we need to do a read on the underlying stream. This isn't perfect, because having input data
    // doesn't necessarily mean it'll decompress into at least one byte of output, but it's a reasonable approximation
    // for the 99% case. If it's wrong, it just means that a caller using zero-byte reads as a way to delay
    // getting a buffer to use for a subsequent call may end up getting one earlier than otherwise preferred.
    Debug.Assert(status == OperationStatus.DestinationTooSmall);
    if (_buffer.ActiveLength != 0)
    {
        Debug.Assert(bytesWritten == 0);
        return true;
    }
}
Author

False positive. ZstandardDecoder.Decompress returns DestinationTooSmall (never NeedMoreData) whenever destination is empty, via an unconditional guard before any native call. So at this point the status is always DestinationTooSmall, including for 0-byte reads. This matches the pre-existing assert, so leaving it as-is.

Comment on lines +425 to +429
public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) =>
    Task.FromResult(Read(buffer.AsSpan(offset, count)));

public override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default) =>
    new ValueTask<int>(Read(buffer.Span));
Author

Fixed. Both ReadAsync overrides now return a canceled Task/ValueTask when the token is already canceled.

Comment on lines +105 to +111
// Fewer than ZstdFrameMagicLength bytes remain: not enough to tell whether another
// frame follows (its magic number may be split across reads) or this was the last
// frame. Hand back any output now and resolve on the next call / underlying read.
// Because we're at a frame boundary, end-of-input is treated as a clean end rather
// than truncation (see _atFrameBoundary checks in Read/ReadAsync).
lastResult = OperationStatus.NeedMoreData;
return bytesWritten != 0;
Author

Good catch. Fixed the inconsistency too: at a frame boundary, end-of-input now reports Done so 1 to 3 trailing bytes are rewound on a seekable stream, matching the >= 4 byte case. Added a test for a frame followed by 1 to 3 trailing bytes.
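
One way to read the behavior described in this reply is as a small decision made once a frame has finished. The sketch below is an interpretation, not the PR's code: `AfterFrameAction`, `FrameBoundarySketch`, and `Decide` are hypothetical names, and the caller is assumed to have already checked the magic number.

```csharp
// Hypothetical model of the choice made at a frame boundary.
public enum AfterFrameAction
{
    DecodeNextFrame,          // another zstd frame follows
    ReadMoreInput,            // cannot tell yet; the magic may be split across reads
    StopAndLeaveTrailingData  // report Done; leftover bytes are not consumed
}

public static class FrameBoundarySketch
{
    private const int MagicLength = 4;

    public static AfterFrameAction Decide(int bytesAvailable, bool startsWithFrameMagic, bool endOfInput)
    {
        if (bytesAvailable >= MagicLength)
        {
            return startsWithFrameMagic
                ? AfterFrameAction.DecodeNextFrame
                : AfterFrameAction.StopAndLeaveTrailingData;
        }

        // Fewer than four bytes: only end-of-input settles the question.
        return endOfInput
            ? AfterFrameAction.StopAndLeaveTrailingData
            : AfterFrameAction.ReadMoreInput;
    }
}
```

Under this reading, 1 to 3 trailing bytes at end-of-input take the same path as 4 or more non-zstd bytes, which is the consistency the reply refers to.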

- Report Done at end-of-input when at a frame boundary so 1-3 trailing bytes
  after the final frame are rewound on a seekable base stream, consistent with
  the >= 4 byte trailing-data case.
- SingleByteReadStream.ReadAsync test helper now honors the CancellationToken.
- Add regression test for a frame followed by 1-3 trailing bytes (seekable).
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment on lines +121 to +131
if (destination.IsEmpty)
{
    // The caller provided a zero-byte buffer. This is typically done in order to avoid allocating/renting
    // a buffer until data is known to be available. We don't have perfect knowledge here, as _decoder.Decompress
    // will return DestinationTooSmall whether or not more data is required. As such, we assume that if there's
    // any data in our input buffer, it would have been decompressible into at least one byte of output, and
    // otherwise we need to do a read on the underlying stream. This isn't perfect, because having input data
    // doesn't necessarily mean it'll decompress into at least one byte of output, but it's a reasonable approximation
    // for the 99% case. If it's wrong, it just means that a caller using zero-byte reads as a way to delay
    // getting a buffer to use for a subsequent call may end up getting one earlier than otherwise preferred.
    Debug.Assert(status == OperationStatus.DestinationTooSmall);
Comment on lines +328 to +329
Assert.Equal(expected.Length, output.Length);
Assert.Equal(expected, output.ToArray());
@christosk92
Author

@dotnet-policy-service agree

The while(true) loop assigns lastResult before any use, so the pre-loop
initializer was a dead store (IDE0059, treated as a build error). Also drop
the redundant re-assignment of Done in the trailing-data branch.
Labels

area-System.IO.Compression community-contribution Indicates that the PR has been added by a community member

Development

Successfully merging this pull request may close these issues.

HttpClient / ZstandardStream silently truncates multi-frame zstd responses to the first frame (.NET 11 preview 4)

2 participants