fix(node): encode string chunks when serializing stream bodies #487

Merged
barjin merged 2 commits into master from fix/serialize-body-string-chunks
Jun 22, 2026

@barjin barjin commented Jun 19, 2026

Member

Fixes #486.

A request body backed by a ReadableStream that yields string chunks was copied into a Uint8Array via set(). Since set() coerces each character to NaN and stores it as 0, the body went out with a correct Content-Length but filled entirely with zero bytes. This surfaced downstream in Crawlee as a POST sent with content-length: 58 followed by 58 0x00 bytes.
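
For readers who want to reproduce the coercion in isolation, here is a minimal sketch. It does not use the library's code; the variable names are made up for illustration. It relies only on standard behavior: a string is array-like, so set() accepts it and converts each one-character element to a number.

```javascript
// A string is array-like, so TypedArray.prototype.set() accepts it and
// runs a numeric conversion on every one-character element.
const text = 'hello';

// Buggy path: 'h' converts to NaN, and NaN is stored as 0 in a Uint8Array.
const zeroed = new Uint8Array(text.length);
zeroed.set(text);

// Intended path: encode the string to UTF-8 bytes first.
const encoded = new TextEncoder().encode(text);

console.log(Array.from(zeroed));  // [0, 0, 0, 0, 0]
console.log(Array.from(encoded)); // [104, 101, 108, 108, 111]
```

The length is right in both cases, which is why the Content-Length header looked correct while the payload was not.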

barjin added 2 commits June 19, 2026 15:43
Adds coverage for passing a Web Request with a body, Node Readable
streams, and a ReadableStream that yields string chunks (#486). Uses a
local httpbin-like echo route so the cases don't depend on an external
service.

A ReadableStream that yields string chunks was copied into a Uint8Array
via set(), which coerces each character to NaN and writes a zero byte.
The body went out with a correct Content-Length but filled with zeros.
Normalize each chunk through TextEncoder before concatenating, and apply
the same handling to Node Readable streams and other async iterables.

Fixes #486
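
The normalization described in the commit message could look roughly like the sketch below. The helper names toUint8Array and concatUint8Arrays appear in the diff under review, but their bodies are not shown there, so the implementations here are assumptions, and collectBody is a hypothetical wrapper added only for illustration.

```javascript
const encoder = new TextEncoder();

// Assumed implementation: turn one stream chunk into bytes.
function toUint8Array(chunk) {
  if (typeof chunk === 'string') return encoder.encode(chunk); // UTF-8
  if (chunk instanceof Uint8Array) return chunk; // includes Node Buffers
  if (ArrayBuffer.isView(chunk)) {
    // Other typed arrays and DataView: view the same bytes.
    return new Uint8Array(chunk.buffer, chunk.byteOffset, chunk.byteLength);
  }
  if (chunk instanceof ArrayBuffer) return new Uint8Array(chunk);
  throw new TypeError(`Unsupported chunk type: ${typeof chunk}`);
}

// Assumed implementation: join byte chunks into one contiguous array.
function concatUint8Arrays(chunks) {
  const total = chunks.reduce((sum, c) => sum + c.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset); // safe here: every c is already a Uint8Array
    offset += c.byteLength;
  }
  return out;
}

// Hypothetical wrapper: drain any async iterable into a single body.
async function collectBody(iterable) {
  const chunks = [];
  for await (const chunk of iterable) {
    chunks.push(toUint8Array(chunk));
  }
  return concatUint8Arrays(chunks);
}
```

Because the loop uses for await, the same function accepts an async generator, a Node Readable, or any other async iterable, and string and binary chunks can be mixed within one stream.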
@github-actions github-actions Bot added this to the 143rd sprint - Tooling team milestone Jun 19, 2026
@github-actions github-actions Bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programmatically for some analytics. labels Jun 19, 2026
@barjin barjin requested review from Pijukatel and janbuchar June 19, 2026 13:56
  chunks.push(toUint8Array(chunk));
  }
- return { body: typedArray, type: '' };
+ return { body: concatUint8Arrays(chunks), type: '' };

Contributor

Just wondering: how does the receiver of this stream know which character set was used to encode it?
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Type

Member Author

Good point 👍 Encoding the string-based body with UTF-8 is done in undici (Node.js's native fetch implementation) as well (see source). Since we're going for compatibility with this implementation, it's imo a strong enough argument (although I see the issue too now).

Anyway, imo in 2026, it's safe to expect UTF-8 pretty much anywhere, if not specified otherwise.
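
To make the charset point concrete: TextEncoder only ever produces UTF-8, so a caller who wants the receiver to know the encoding can state it in Content-Type. The sketch below is illustrative only; the header value and variable names are assumptions and are not taken from this PR.

```javascript
const text = 'kůň'; // sample text containing non-ASCII characters

// TextEncoder has no encoding option; it always emits UTF-8.
const encoder = new TextEncoder();
const bytes = encoder.encode(text);
console.log(encoder.encoding); // "utf-8"

// A caller can declare the charset explicitly rather than rely on defaults.
const headers = new Headers({ 'content-type': 'text/plain; charset=utf-8' });

// A receiver that honors the declared charset recovers the original text.
const roundTripped = new TextDecoder('utf-8').decode(bytes);
console.log(roundTripped === text); // true
```

Non-ASCII characters take more than one byte in UTF-8, so the byte length can exceed the string length; any Content-Length has to be computed from the encoded bytes, not from the string.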

@barjin barjin merged commit f52afa7 into master Jun 22, 2026
33 of 34 checks passed
@barjin barjin deleted the fix/serialize-body-string-chunks branch June 22, 2026 10:03

Development

Successfully merging this pull request may close these issues.

fetch silently sends a zero-filled body when a ReadableStream yields string chunks

3 participants