
Fix Parquet bloom filter segfault in prefetch range coalescing#102385

Merged
al13n321 merged 3 commits into ClickHouse:master from groeneai:fix-parquet-bloom-filter-prefetch-oob
Apr 13, 2026

Conversation

@groeneai
Contributor

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix segfault (or LOGICAL_ERROR in debug builds) when reading Parquet files with bloom filter push down enabled and WHERE clause equality/inequality conditions. The crash occurred due to an out-of-bounds memory access in the Parquet prefetcher's bloom filter data retrieval, and could also cause non-deterministic wrong query results.

What was the root cause?

The Parquet prefetcher coalesces nearby byte ranges into single I/O tasks. When scanning leftward for ranges to include, it updated start_offset via std::min but did not update end_offset via std::max. This is correct when the "left" range truly ends before the initial range, but incorrect when two ranges share the same start offset but have different lengths.

Specifically, bloom_filter_header_prefetch (256 bytes) and bloom_filter_data_prefetch (header + data, often thousands of bytes or more) both start at bloom_filter_offset. After std::sort (which is not stable), the longer range could end up before the shorter one in the sorted array. When the shorter range's task creation scanned leftward and found the longer range:

  1. It included the longer range (both have allow_incidental_read = true because both are smaller than min_bytes_for_seek = 64KB)
  2. It set start_offset = std::min(start_offset, r.start) — no change since starts are equal
  3. It did not set end_offset = std::max(end_offset, r.end), which is the bug (see the trace after this list)
  4. The task was created covering only 256 bytes (the header)
  5. The longer data range was assigned to this undersized task
  6. Later, findAnyHash called getRangeData to read bloom filter blocks at offsets far beyond the 256-byte buffer
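
For concreteness, here is a tiny self-contained trace of the buggy computation with hypothetical offsets (a simplified sketch only; the real logic lives in Prefetcher::pickRangesAndCreateTaskIfNotExists and also accounts for gaps between ranges and allow_incidental_read):

#include <algorithm>
#include <cstddef>
#include <iostream>

int main()
{
    // Hypothetical offsets: both prefetch ranges start at bloom_filter_offset = 1000.
    size_t header_start = 1000, header_end = 1000 + 256;  // bloom_filter_header_prefetch
    size_t data_start = 1000, data_end = 1000 + 4112;     // bloom_filter_data_prefetch

    // Task creation for the header range, with the buggy leftward scan "including"
    // the data range that std::sort happened to place before it:
    size_t start_offset = header_start, end_offset = header_end;
    start_offset = std::min(start_offset, data_start);    // no change, starts are equal
    // end_offset = std::max(end_offset, data_end);       // the missing update

    std::cout << "task buffer: " << (end_offset - start_offset) << " bytes, "
              << "data range assigned to it: " << (data_end - data_start) << " bytes\n";
    // Prints 256 vs 4112: any read into the data range past byte 256 runs off the task buffer.
}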

This caused:

  • Debug builds: LOGICAL_ERROR assertion failure (req->task_offset + req->length <= task->buf.size())
  • Release builds: Segfault (read access violation at unmapped addresses)
  • Non-deterministic wrong results: When the OOB read happened to access valid memory, the bloom filter check would return garbage, causing random row groups to be filtered in or out

The fix

Add end_offset = std::max(end_offset, r.end) to the leftward coalescing scan, and symmetrically start_offset = std::min(start_offset, r.start) to the rightward scan.
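
A minimal sketch of the coalescing scans with the fix applied (simplified; Range, ranges, and coalesce are assumed names standing in for the actual code in Prefetcher.cpp, which additionally checks gap sizes and min_bytes_for_seek):

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Range { size_t start = 0; size_t end = 0; bool allow_incidental_read = true; };

// Computes the byte extent of the read task created for ranges[idx],
// pulling in neighbouring ranges in both directions.
void coalesce(const std::vector<Range> & ranges, size_t idx, size_t & start_offset, size_t & end_offset)
{
    start_offset = ranges[idx].start;
    end_offset = ranges[idx].end;

    // Leftward scan.
    for (size_t i = idx; i > 0 && ranges[i - 1].allow_incidental_read; --i)
    {
        start_offset = std::min(start_offset, ranges[i - 1].start);
        end_offset = std::max(end_offset, ranges[i - 1].end);     // added by the fix
    }

    // Rightward scan.
    for (size_t i = idx + 1; i < ranges.size() && ranges[i].allow_incidental_read; ++i)
    {
        end_offset = std::max(end_offset, ranges[i].end);
        start_offset = std::min(start_offset, ranges[i].start);   // symmetric addition
    }
}

int main()
{
    // Data range (4112 bytes) sorted before the header range (256 bytes), same start.
    std::vector<Range> ranges = {{1000, 5112}, {1000, 1256}};
    size_t task_start = 0, task_end = 0;
    coalesce(ranges, /*idx=*/1, task_start, task_end);
    assert(task_end - task_start == 4112);  // the task now covers the full data range
}

Without the line marked "added by the fix", the same call yields a 256-byte extent, which is exactly the undersized task described above.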

Regression test

04097_parquet_bloom_filter_overlapping_ranges — creates a Parquet file with the exact reproduction case from the issue (5M rows, Nullable columns, bloom filter with varying block counts), then runs the query 10 times comparing against a bloom-filter-disabled reference.

Closes #102257

The Parquet prefetcher's range coalescing scan had a bug where the
leftward scan did not update end_offset when including a range that
extends past the current end. This caused an undersized read task when
two ranges share the same start offset but have different lengths.

Specifically, bloom_filter_header_prefetch and bloom_filter_data_prefetch
both start at bloom_filter_offset. The header range is 256 bytes (the
max header size), while the data range is header_size + numBytes (often
much larger). After std::sort (which is not stable), the longer data
range could be placed before the shorter header range. When the header's
task creation scanned leftward and included the data range, it only
updated start_offset via std::min but did NOT update end_offset via
std::max. The resulting task covered only 256 bytes (the header), but
the data range (up to megabytes) was assigned to it. Later, findAnyHash
called getRangeData on a bloom filter block far beyond the task buffer,
causing:
- In debug builds: LOGICAL_ERROR assertion
  (req->task_offset + req->length <= task->buf.size())
- In release builds: segfault (read access violation)
- Non-deterministic wrong query results (random row group filtering)

The fix adds end_offset = std::max(end_offset, r.end) to the leftward
scan, and symmetrically start_offset = std::min(start_offset, r.start)
to the rightward scan.

Closes ClickHouse#102257
@groeneai
Contributor Author

Pre-PR Validation (session: cron:clickhouse-ci-task-worker:20260410-131500)

a) Deterministic repro? Yes. The crash reproduces on the very first run with the exact query from the issue. No randomization needed — the bug is triggered whenever std::sort places the larger range before the smaller one in the sorted array (which happens reliably given the specific column metadata layout in the test file).

b) Root cause explained? The Prefetcher's pickRangesAndCreateTaskIfNotExists leftward coalescing scan includes nearby ranges but only updates start_offset — not end_offset. When two ranges share the same start (bloom filter header at [offset, offset+256) and data at [offset, offset+4112)), and the longer one appears first after sorting, the scan includes it but the task only covers the shorter range's extent. The data range is assigned to this undersized task, so getRangeData later reads beyond the task buffer.

c) Fix matches root cause? Yes. The fix directly addresses the missing end_offset update: adds end_offset = std::max(end_offset, r.end) to the leftward scan (and symmetrically start_offset = std::min(start_offset, r.start) to the rightward scan).

d) Test intent preserved? / New tests added? New regression test 04097_parquet_bloom_filter_overlapping_ranges reproduces the exact issue scenario.

e) Demonstrated in both directions?

  • Without fix: crashes immediately (LOGICAL_ERROR in debug, segfault in release)
  • With fix: 50/50 passes with consistent deterministic results (4991220 every run)

@groeneai
Contributor Author

cc @al13n321 @nikitamikhaylov — could you review this? The fix is in Prefetcher::pickRangesAndCreateTaskIfNotExists: the leftward coalescing scan included overlapping ranges but didn't extend end_offset, causing an undersized read task when the bloom filter header and data prefetch ranges share the same start offset.

@den-crane
Contributor

den-crane commented Apr 10, 2026

Check this issue #102231

try this reproducer (less CPU intensive):

INSERT INTO FUNCTION file('bloom_bug.parquet', 'Parquet')
SETTINGS engine_file_truncate_on_insert = 1
SELECT
    IF(number % 113 = 0, toString(number), '') AS col1,
    toString(number) AS col2
FROM numbers(50000);

SELECT 'BLOOM ON (buggy)' AS test, count() AS result
FROM file('bloom_bug.parquet', Parquet, 'col1 String, col2 String')
WHERE col1 = ''
SETTINGS input_format_parquet_bloom_filter_push_down = 1;
┌─test─────────────┬─result─┐
│ BLOOM ON (buggy) │      0 │
└──────────────────┴────────┘


SELECT 'BLOOM OFF (correct)' AS test, count() AS result
FROM file('bloom_bug.parquet', Parquet, 'col1 String, col2 String')
WHERE col1 = ''
SETTINGS input_format_parquet_bloom_filter_push_down = 0;
┌─test────────────────┬─result─┐
│ BLOOM OFF (correct) │  49557 │
└─────────────────────┴────────┘

@nikitamikhaylov nikitamikhaylov added the can be tested label Apr 10, 2026
@clickhouse-gh
Contributor

clickhouse-gh Bot commented Apr 10, 2026

Workflow [PR], commit [48513ed]

Summary:

Stateless tests (amd_binary, flaky check): failure
  02044_url_glob_parallel: FAIL (cidb, issue) ISSUE CREATED
  02044_url_glob_parallel: FAIL (cidb) IGNORED
  02044_url_glob_parallel: FAIL (cidb) IGNORED
  02044_url_glob_parallel: FAIL (cidb) IGNORED
  02044_url_glob_parallel: FAIL (cidb) IGNORED
Stress test (arm_msan): failure
  Server died: FAIL (cidb, issue) ISSUE EXISTS
  MemorySanitizer: use-of-uninitialized-value (STID: 4179-5154): FAIL (cidb, issue) ISSUE EXISTS

AI Review

Summary

This PR fixes an out-of-bounds access in Parquet prefetch range coalescing when overlapping ranges share the same start offset but differ in length. The fix is surgical (end_offset = max(...) on left scan, symmetric start_offset = min(...) on right scan), matches the reported root cause, and is covered by a focused regression test reproducing both the original failure mode and correctness instability. I did not find correctness, safety, performance, or compatibility regressions in the submitted changes.

ClickHouse Rules
  • Deletion logging
  • Serialization versioning
  • Core-area scrutiny
  • No test removal
  • Experimental gate
  • No magic constants
  • Backward compatibility
  • SettingsChangesHistory.cpp
  • PR metadata quality
  • Safe rollout
  • Compilation time
  • No large/binary files
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix label Apr 10, 2026
@nikitamikhaylov
Member

nikitamikhaylov commented Apr 10, 2026

@groeneai Check the repro from above and all the test failures.

…test

- Reduce data size from 5M to 500K rows (bug is structural, not
  volume-dependent — triggers at any data size with bloom filters)
- Reduce iterations from 10 to 5 (crash is deterministic)
- Add den-crane's simpler non-nullable reproducer from ClickHouse#102231 as a
  second test case to cover both crash and wrong-results scenarios

This fixes the 180s timeout on amd_tsan flaky check.
@groeneai
Contributor Author

@nikitamikhaylov @den-crane — verified the reproducer and checked all test failures:

den-crane's reproducer ✅

Confirmed the simpler reproducer from #102231 hits the exact same root cause, failing with LOGICAL_ERROR: req->task_offset + req->length <= task->buf.size() in debug builds (segfault in release builds). With the fix applied:

Query               | Without fix       | With fix
BLOOM ON (50K rows) | Crash / 0 (wrong) | 49557
BLOOM OFF           | 49557             | 49557

Also tested the #102231 reproducer with Nullable columns (500K rows) — all three modes return identical results (499135) with the fix.

I've added den-crane's simpler reproducer as a second test case in the regression test.

CI test failures

  1. 04097_parquet_bloom_filter_overlapping_ranges on amd_tsan: timeout (>180s). Fixed: reduced data from 5M to 500K rows (the bug is structural rather than volume-dependent: it triggers at any data size as long as bloom filter ranges overlap) and iterations from 10→5. New commit pushed.

  2. 02223_insert_select_schema_inference on amd_binary: unrelated flaky test, not caused by this PR. It fails across other PRs too.

Issue #102231

Same root cause — the undersized prefetch task causes the bloom filter findAnyHash to read truncated/garbage data, which returns false "not exists" answers and incorrectly skips row groups → count() = 0. This PR fixes both #102257 (crash) and #102231 (wrong results). Adding a Closes #102231 reference to the PR description would be appropriate.
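
To illustrate the wrong-results mode, a generic sketch of a bloom-filter membership probe (hypothetical mightContain helper, not the actual Parquet split-block bloom filter code): when the probe reads from a buffer shorter than the real filter, the answer is arbitrary, so a value that is actually present can test as absent and its row group is skipped.

#include <cstdint>
#include <iostream>
#include <vector>

// Generic membership test over a bit array; illustration only.
bool mightContain(const std::vector<std::uint8_t> & bits, std::uint64_t hash)
{
    std::uint64_t bit = hash % (bits.size() * 8);
    return bits[bit / 8] & (1u << (bit % 8));
}

int main()
{
    std::vector<std::uint8_t> full(4096, 0xFF);     // filter as written in the file: bit is set
    std::vector<std::uint8_t> truncated(256, 0x00); // what an undersized prefetch buffer may hold
    std::uint64_t hash = 123456789;                 // hypothetical hash of a value that is present
    std::cout << mightContain(full, hash) << '\n';      // 1 -> row group is read
    std::cout << mightContain(truncated, hash) << '\n'; // 0 -> false negative, row group skipped
}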

Note: PR #99993 addresses a separate Nullable filtering issue and does not fix this prefetcher bug (confirmed by @rienath in #102231).

@al13n321
Member

02223_insert_select_schema_inference: #102424

Co-authored-by: Michael Kolupaev <michael.kolupaev@clickhouse.com>
@al13n321 al13n321 enabled auto-merge April 10, 2026 23:08
@clickhouse-gh
Contributor

clickhouse-gh Bot commented Apr 11, 2026

LLVM Coverage Report

Metric    | Baseline | Current | Δ
Lines     | 84.00%   | 84.00%  | +0.00%
Functions | 90.90%   | 90.90%  | +0.00%
Branches  | 76.50%   | 76.50%  | +0.00%

Changed lines: 100.00% (14/14) · Uncovered code

Full report · Diff report

@al13n321 al13n321 added this pull request to the merge queue Apr 13, 2026
Merged via the queue into ClickHouse:master with commit b3e00f9 Apr 13, 2026
161 of 164 checks passed
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud label Apr 13, 2026
@den-crane
Contributor

I would backport it at least to 25.3 because there are 11 months left before end of support.

@nikitamikhaylov
Member

26.3

@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-must-backport-synced label Apr 15, 2026
robot-ch-test-poll2 added a commit that referenced this pull request Apr 15, 2026
Cherry pick #102385 to 26.3: Fix Parquet bloom filter segfault in prefetch range coalescing
robot-clickhouse added a commit that referenced this pull request Apr 15, 2026
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-backports-created label Apr 15, 2026
clickhouse-gh Bot added a commit that referenced this pull request Apr 15, 2026
Backport #102385 to 26.3: Fix Parquet bloom filter segfault in prefetch range coalescing

Labels

  • can be tested: Allows running workflows for external contributors
  • pr-backports-created: Backport PRs are successfully created, it won't be processed by CI script anymore
  • pr-bugfix: Pull request with bugfix, not backported by default
  • pr-must-backport-synced: The `*-must-backport` labels are synced into the cloud Sync PR
  • pr-synced-to-cloud: The PR is synced to the cloud repo
  • v26.3-must-backport


Development

Successfully merging this pull request may close these issues.

Parquet native reader v3 segfault.

7 participants