Fix Parquet bloom filter segfault in prefetch range coalescing#102385
al13n321 merged 3 commits into ClickHouse:master
Conversation
The Parquet prefetcher's range coalescing scan had a bug where the leftward scan did not update `end_offset` when including a range that extends past the current end. This caused an undersized read task when two ranges share the same start offset but have different lengths.

Specifically, `bloom_filter_header_prefetch` and `bloom_filter_data_prefetch` both start at `bloom_filter_offset`. The header range is 256 bytes (the max header size), while the data range is `header_size + numBytes` (often much larger). After `std::sort` (which is not stable), the longer data range could be placed before the shorter header range. When the header's task creation scanned leftward and included the data range, it only updated `start_offset` via `std::min` but did NOT update `end_offset` via `std::max`. The resulting task covered only 256 bytes (the header), but the data range (up to megabytes) was assigned to it. Later, `findAnyHash` called `getRangeData` on a bloom filter block far beyond the task buffer, causing:

- In debug builds: `LOGICAL_ERROR` assertion (`req->task_offset + req->length <= task->buf.size()`)
- In release builds: segfault (read access violation)
- Non-deterministic wrong query results (random row group filtering)

The fix adds `end_offset = std::max(end_offset, r.end)` to the leftward scan, and symmetrically `start_offset = std::min(start_offset, r.start)` to the rightward scan.

Closes ClickHouse#102257
Pre-PR Validation (session: `cron:clickhouse-ci-task-worker:20260410-131500`)

a) Deterministic repro? Yes. The crash reproduces on the very first run with the exact query from the issue. No randomization needed — the bug is triggered whenever
b) Root cause explained? The Prefetcher's
c) Fix matches root cause? Yes. The fix directly addresses the missing
d) Test intent preserved? / New tests added? New regression test
e) Demonstrated in both directions?
cc @al13n321 @nikitamikhaylov — could you review this? The fix is in
Check issue #102231 and try this reproducer (less CPU intensive):
Workflow [PR], commit [48513ed] Summary: ❌
AI Review

Summary: This PR fixes an out-of-bounds access in Parquet prefetch range coalescing when overlapping ranges share the same start offset but differ in length. The fix is surgical (

ClickHouse Rules
Final Verdict
@groeneai Check the repro above and all the test failures.
…test

- Reduce data size from 5M to 500K rows (bug is structural, not volume-dependent — triggers at any data size with bloom filters)
- Reduce iterations from 10 to 5 (crash is deterministic)
- Add den-crane's simpler non-nullable reproducer from ClickHouse#102231 as a second test case to cover both crash and wrong-results scenarios

This fixes the 180s timeout on amd_tsan flaky check.
@nikitamikhaylov @den-crane — verified the reproducer and checked all test failures:

den-crane's reproducer ✅

Confirmed the simpler reproducer from #102231 hits the exact same root cause —
Also tested the #102231 reproducer with Nullable columns (500K rows) — all three modes return identical results.

I've added den-crane's simpler reproducer as a second test case in the regression test.

CI test failures
Issue #102231

Same root cause — the undersized prefetch task causes the bloom filter

Note: PR #99993 addresses a separate Nullable filtering issue and does not fix this prefetcher bug (confirmed by @rienath in #102231).
02223_insert_select_schema_inference: #102424
Co-authored-by: Michael Kolupaev <michael.kolupaev@clickhouse.com>
LLVM Coverage Report
Changed lines: 100.00% (14/14) · Uncovered code
b3e00f9
I would backport it at least to 25.3 because there are 11 months left before end of support.
26.3 |
Cherry pick #102385 to 26.3: Fix Parquet bloom filter segfault in prefetch range coalescing
Backport #102385 to 26.3: Fix Parquet bloom filter segfault in prefetch range coalescing
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix segfault (or LOGICAL_ERROR in debug builds) when reading Parquet files with bloom filter push down enabled and WHERE clause equality/inequality conditions. The crash occurred due to an out-of-bounds memory access in the Parquet prefetcher's bloom filter data retrieval, and could also cause non-deterministic wrong query results.
What was the root cause?
The Parquet prefetcher coalesces nearby byte ranges into single I/O tasks. When scanning leftward for ranges to include, it updated `start_offset` via `std::min` but did not update `end_offset` via `std::max`. This is correct when the "left" range truly ends before the initial range, but incorrect when two ranges share the same start offset but have different lengths.

Specifically, `bloom_filter_header_prefetch` (256 bytes) and `bloom_filter_data_prefetch` (header + data, often thousands of bytes or more) both start at `bloom_filter_offset`. After `std::sort` (which is not stable), the longer range could end up before the shorter one in the sorted array. When the shorter range's task creation scanned leftward and found the longer range:

- It included it in the task (`allow_incidental_read = true` because both are smaller than `min_bytes_for_seek = 64KB`)
- It updated `start_offset = std::min(start_offset, r.start)` — no change since starts are equal
- It did NOT update `end_offset = std::max(end_offset, r.end)` — this is the bug
- Later, `findAnyHash` → `getRangeData` tried to read bloom filter blocks at offsets far beyond the 256-byte buffer

This caused:

- In debug builds: `LOGICAL_ERROR` assertion failure (`req->task_offset + req->length <= task->buf.size()`)
- In release builds: segfault (read access violation)
- Non-deterministic wrong query results (random row group filtering)

The fix
Add `end_offset = std::max(end_offset, r.end)` to the leftward coalescing scan, and symmetrically `start_offset = std::min(start_offset, r.start)` to the rightward scan.

Regression test
`04097_parquet_bloom_filter_overlapping_ranges` — creates a Parquet file with the exact reproduction case from the issue (5M rows, Nullable columns, bloom filter with varying block counts), then runs the query 10 times comparing against a bloom-filter-disabled reference.

Closes #102257