Fix misleading decompression errors for missing URLs in glob patterns #99034
When a glob URL pattern like `{00..59}` expands to URLs that return 404,
`http_skip_not_found_url_for_globs` (enabled by default) silently swallows
the error. But the decompression wrappers (Zlib, Brotli, LZMA) didn't
handle a completely empty input stream - they attempted decompression
with zero bytes and produced misleading errors like "inflate failed:
buffer error" instead of simply returning no rows.
Fix by detecting empty streams (`total_in == 0` and inner buffer at EOF)
in `ZlibInflatingReadBuffer`, `BrotliReadBuffer`, and
`LZMAInflatingReadBuffer`. Zstd and Lz4 already handled this correctly.
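A minimal sketch of the empty-stream guard described above, assuming a toy one-frame format (one length byte followed by that many payload bytes). `InnerBuffer` and `ToyInflatingReader` are hypothetical stand-ins for illustration and are not the ClickHouse classes; only the `total_in == 0` plus inner-EOF condition is taken from the description.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the inner (compressed) buffer; not a ClickHouse class.
class InnerBuffer
{
public:
    explicit InnerBuffer(std::string data_) : data(std::move(data_)) {}
    bool eof() const { return pos >= data.size(); }
    char next() { return data[pos++]; }

private:
    std::string data;
    std::size_t pos = 0;
};

// Toy "decompressor": one length byte N, then N payload bytes.
// It exists only to show where the empty-stream guard sits.
class ToyInflatingReader
{
public:
    explicit ToyInflatingReader(InnerBuffer in_) : in(std::move(in_)) {}

    // Returns false at end of stream, otherwise appends the payload to `out`.
    bool nextImpl(std::string & out)
    {
        if (finished)
            return false;

        // The guard: nothing consumed so far and nothing left to consume
        // means the stream was empty from the start, so report EOF
        // instead of asking the decoder to work on zero bytes.
        if (total_in == 0 && in.eof())
        {
            finished = true;
            return false;
        }

        const std::size_t length = static_cast<unsigned char>(in.next());
        ++total_in;
        for (std::size_t i = 0; i < length; ++i)
        {
            // A stream that started but stops early is still an error.
            if (in.eof())
                throw std::runtime_error("unexpected end of compressed stream");
            out.push_back(in.next());
            ++total_in;
        }
        finished = true;
        return true;
    }

private:
    InnerBuffer in;
    std::size_t total_in = 0; // bytes consumed from the inner buffer
    bool finished = false;
};
```

The guard only fires when no byte has ever been consumed, so a stream that begins and then ends early still raises an error in this sketch.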
#49231 (comment)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Workflow [PR], commit [5ff912f]. Summary: ✅
AI Review
Summary
This PR fixes misleading decompression errors for empty streams returned by missing URLs in glob-based …
Missing context
ClickHouse Rules
Final Verdict
alexey-milovidov left a comment
But why are we even trying to decompress on 404?
We are not decompressing the 404 response body. The flow is:
So the decompressor sees an empty stream, not the 404 error page. The fix handles this edge case in the decompression buffers. The alternative would be to propagate the "not found" status through the buffer chain so the decompression wrapper can be skipped entirely, but that would be more invasive: it would require changes to …
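To illustrate why the decompressor ends up with an empty stream, here is a rough sketch under stated assumptions: `expandRange`, `FakeServer`, and `fetchBody` are hypothetical helpers invented for this example, and the `skip_not_found` flag merely mimics the effect of `http_skip_not_found_url_for_globs`. It is not how ClickHouse implements the HTTP read path.

```cpp
#include <cstdio>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Expands a zero-padded numeric range such as {00..59}
// into "00", "01", ..., "59" (hypothetical helper).
std::vector<std::string> expandRange(int first, int last, int width)
{
    std::vector<std::string> out;
    for (int i = first; i <= last; ++i)
    {
        char buf[32];
        std::snprintf(buf, sizeof(buf), "%0*d", width, i);
        out.emplace_back(buf);
    }
    return out;
}

// Hypothetical fake HTTP layer: URL -> body. A missing key stands for a 404.
using FakeServer = std::map<std::string, std::string>;

// With skip_not_found set, a 404 becomes an empty body rather than an error,
// so whatever wraps this stream (for example a decompressor) sees zero bytes.
std::string fetchBody(const FakeServer & server, const std::string & url, bool skip_not_found)
{
    const auto it = server.find(url);
    if (it != server.end())
        return it->second;
    if (skip_not_found)
        return {};
    throw std::runtime_error("HTTP 404: " + url);
}
```

In this model the decompression wrapper never learns that the URL was missing; it only observes an input that is empty from the first read, which is the case the guard has to recognise.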
LGTM
LLVM Coverage Report
PR changed-lines coverage: 91.67% (33/36, 0 noise lines excluded)
PR ClickHouse#99034 fixed misleading decompression errors for missing URLs in glob patterns for gzip, zstd, brotli, lz4, and lzma, but missed bzip2.

`Bzip2ReadBuffer::nextImpl()` has no empty-stream guard: when the inner stream is empty (404 URL with `http_skip_not_found_url_for_globs=1`), it reaches the eof check at line 126 with `ret=BZ_OK` and throws `UNEXPECTED_END_OF_FILE` instead of returning an empty result. The other formats handle this via an early check before calling the decompressor (e.g. `ZlibInflatingReadBuffer` checks `total_in==0 && eof`).

This test should fail until the same guard is added to `Bzip2ReadBuffer`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@alexey-milovidov the fix should be made for bz2 as well: #99806
When http_skip_not_found_url_for_globs is enabled and a URL returns 404, all other compression formats (gzip, zstd, brotli, lz4, lzma) gracefully return 0 rows. Bzip2 was the only format missing this check, causing it to throw UNEXPECTED_END_OF_FILE instead. This adds the same empty-stream guard that was added to the other formats in ClickHouse#99034, and extends the existing test to cover bzip2. Closes ClickHouse#99806
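The bzip2 failure mode can be sketched as a toy model. The format here is invented for illustration (payload bytes terminated by a `.` marker), `Status` stands in for `BZ_OK` / `BZ_STREAM_END`, and `readAll` is a hypothetical function, not the real `Bzip2ReadBuffer::nextImpl()`. The `with_guard` flag toggles the early empty-stream check so both behaviours can be compared.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Stand-ins for the decoder status codes BZ_OK and BZ_STREAM_END.
enum class Status { Ok, StreamEnd };

// Decodes a toy stream: payload bytes followed by a '.' end marker.
// Returns the number of decoded bytes appended to `out`.
std::size_t readAll(const std::string & input, bool with_guard, std::string & out)
{
    std::size_t pos = 0;      // read position in the inner stream
    std::size_t total_in = 0; // bytes consumed so far

    // Early check: an input that is empty from the start yields no data.
    if (with_guard && total_in == 0 && pos == input.size())
        return 0;

    const std::size_t before = out.size();
    Status ret = Status::Ok;
    while (pos < input.size() && ret == Status::Ok)
    {
        const char c = input[pos++];
        ++total_in;
        if (c == '.')
            ret = Status::StreamEnd;
        else
            out.push_back(c);
    }

    // End-of-input check: reaching EOF while the decoder still reports Ok
    // means the stream ended before its end marker.
    if (ret != Status::StreamEnd)
        throw std::runtime_error("UNEXPECTED_END_OF_FILE");

    return out.size() - before;
}
```

Without the guard, an empty input falls straight through to the end-of-input check with the status still `Ok` and throws. With the guard it returns zero bytes, while a truncated stream such as `"ab"` still throws in both modes.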
Summary
- When a glob URL pattern like `{00..59}` expands to URLs that return 404, `http_skip_not_found_url_for_globs` (enabled by default) silently swallows the HTTP error. But the decompression wrappers (Zlib, Brotli, LZMA) didn't handle a completely empty input stream: they attempted decompression with zero bytes and produced misleading errors like "inflate failed: buffer error" instead of simply returning no rows.
- Fix `ZlibInflatingReadBuffer`, `BrotliReadBuffer`, and `LZMAInflatingReadBuffer` to detect empty streams (`total_in == 0` and inner buffer at EOF) and return EOF. Zstd and Lz4 already handled this correctly.

#49231 (comment)
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix misleading "inflate failed: buffer error" when reading non-existent compressed files via `url()` table function with glob patterns. Now returns empty result as expected when `http_skip_not_found_url_for_globs` is enabled.
Documentation entry for user-facing changes
🤖 Generated with Claude Code
Note
Medium Risk
Touches core decompression read paths used by `url()` and other inputs; a small logic change could affect EOF/error handling for truncated or unusual streams across multiple codecs.
Overview
Fixes misleading decompression failures when a globbed `url()` input is skipped due to 404s (via `http_skip_not_found_url_for_globs`) by treating a completely empty underlying stream as EOF instead of attempting to decompress.
Adds empty-stream detection to `ZlibInflatingReadBuffer`, `LZMAInflatingReadBuffer`, and `BrotliReadBuffer` (tracking `total_in` for brotli) and introduces a stateless test that queries missing `.gz`/`.zst`/`.br`/`.lz4`/`.xz` URLs and expects zero rows without errors.
Written by Cursor Bugbot for commit 350d87b.