Skip to content

Fix: Fix Bzip2ReadBuffer throwing UNEXPECTED_END_OF_FILE on empty stream#101691

Merged
Avogar merged 6 commits intoClickHouse:masterfrom
clickgapai:qa-bot/fix-pr99912
Apr 14, 2026
Merged

Fix: Fix Bzip2ReadBuffer throwing UNEXPECTED_END_OF_FILE on empty stream#101691
Avogar merged 6 commits intoClickHouse:masterfrom
clickgapai:qa-bot/fix-pr99912

Conversation

@clickgapai
Copy link
Copy Markdown
Contributor

Found via ClickGap automated review. Please close or comment if this is incorrect.

Implements fix for the issue discussed in PR #99912.

Requested by @alexey-milovidov.

What this fixes

What: Added empty-stream guard to Bzip2ReadBuffer so it returns EOF instead of throwing UNEXPECTED_END_OF_FILE when the inner stream is empty.
Why: PR #99034 added this guard to all other decompression formats (gzip, zstd, brotli, lz4, lzma) but missed bzip2, causing errors when reading 404 URLs with http_skip_not_found_url_for_globs=1.
Test: Extended existing test 04032_empty_gzip_stream_from_url to cover the bzip2 case — queries a 404-returning HTTP server with .bz2 glob URLs and verifies it returns 0 rows without error.
PR pushed: yes

Changes

See diff for source files changed + test added.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

What: Added empty-stream guard to Bzip2ReadBuffer so it returns EOF instead of throwing UNEXPECTED_END_OF_FILE when the inner stream is empty.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

ClickGapAI · Fix for PR #99912

PR ClickHouse#99034 added empty-stream guards to gzip, zstd, brotli, lz4, and
lzma decompression buffers but missed bzip2. When the inner stream is
empty (e.g. a URL returned 404 with http_skip_not_found_url_for_globs),
Bzip2ReadBuffer now sets eof_flag and returns false instead of throwing
UNEXPECTED_END_OF_FILE.

Closes ClickHouse#99806

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Apr 3, 2026

Workflow [PR], commit [3b7b1c6]

Summary:


AI Review

Summary

This PR fixes Bzip2ReadBuffer behavior on empty input streams: instead of throwing UNEXPECTED_END_OF_FILE, it now returns EOF, matching existing behavior for other compression formats and the url+glob 404 path with http_skip_not_found_url_for_globs=1. The added stateless test extension for .bz2 covers the regression path. I did not find correctness, safety, concurrency, or compatibility issues in the final PR diff.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Apr 3, 2026
@clickgapai
Copy link
Copy Markdown
Contributor Author

CI failure in BuzzHouse (amd_debug), Stress test (arm_msan), Stateless tests (arm_binary, parallel) appears to be unrelated breakage and is not related to this change. Let me know if you'd like me to merge master and retrigger CI.

@alexey-milovidov
Copy link
Copy Markdown
Member

@clickgapai Fix the 03772_temporary_files_codec test and send a separate PR.
Fix the Inconsistent AST formatting and send a separate PR.

@clickgapai
Copy link
Copy Markdown
Contributor Author

Understood — I'll fix the 03772_temporary_files_codec test and the Inconsistent AST formatting issue and send them as separate PRs.

@clickgapai
Copy link
Copy Markdown
Contributor Author

Closing this PR per review feedback. The underlying bug analysis (Bzip2ReadBuffer missing empty-stream guard) stands and will be refiled as an issue. The 03772_temporary_files_codec test fix and Inconsistent AST formatting fix will be addressed in separate PRs.

@clickgapai
Copy link
Copy Markdown
Contributor Author

I'll work on fixing the 03772_temporary_files_codec test and the inconsistent AST formatting issue in separate PRs as requested.

@clickgapai
Copy link
Copy Markdown
Contributor Author

Filed separate PR: #101810

…exist

The install_check RPM tests use glob patterns that match all versions in the
packages directory. When both base and PR builds are present, yum fails with
version conflicts. Fix by extracting the newest version and filtering by it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@clickhouse-gh clickhouse-gh Bot added the manual approve Manual approve required to run CI label Apr 5, 2026
@alexey-milovidov alexey-milovidov self-assigned this Apr 8, 2026
@zlareb1
Copy link
Copy Markdown
Member

zlareb1 commented Apr 8, 2026

CI Failure Analysis: Stress test (arm_msan)

Verdict: UNRELATED_BREAKAGE — no action needed on our part.

The failure is MemorySanitizer: use-of-uninitialized-value (STID: 2410-47e0) + Server died. This is a known pre-existing flaky issue that has been occurring for weeks across many PRs and master commits:

  • Querying CIDB for STID 2410-47e0 in the last 90 days shows 139+ failures across hundreds of unrelated PRs and master itself (pull_request_number = 0).
  • Our test file (04032_empty_gzip_stream_from_url) does not appear anywhere in the stress test failure logs.
  • The stack trace visible in the CI log only shows generic PipelineExecutor/ThreadPool frames, with no reference to Bzip2ReadBuffer or any code we modified.

This failure should be retried or ignored as it is not caused by our changes.

alexey-milovidov and others added 4 commits April 9, 2026 20:06
…2 fix

The version-selection logic was unrelated to the `Bzip2ReadBuffer`
empty-stream guard fix. Master already has the `--allowerasing` fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Apr 12, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.00% 84.00% +0.00%
Functions 90.90% 90.90% +0.00%
Branches 76.50% 76.50% +0.00%

Changed lines: 100.00% (11/11) · Uncovered code

Full report · Diff report

@Avogar Avogar added this pull request to the merge queue Apr 14, 2026
Merged via the queue into ClickHouse:master with commit adb2c11 Apr 14, 2026
164 checks passed
@clickgapai
Copy link
Copy Markdown
Contributor Author

Hi — this PR may need backporting to 26.3 (LTS), 26.2, 26.1, 25.8 (LTS), but no backport label was found.

Affected code: Bzip2ReadBuffer::nextImpl — empty stream guard missing — core code present in all supported branches.

Why: The bzip2 empty stream bug has existed since bzip2 support was added in commit 9a45458 (2021-08-07), which predates all supported branches. PR #99034 fixed other compression formats on 2026-03-09 but missed bzip2. All supported branches have this bug. The fix is small (7 lines) and safe.

If this should be backported, consider adding pr-must-backport or a version-specific label (e.g. v26.3-must-backport). Ignore this if backporting is not applicable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors manual approve Manual approve required to run CI pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants