Fix crash in case of iceberg table with mixed ORC/Parquet files#99168
Workflow [PR], commit [13e3d24] Summary: ❌
AI Review
1) Summary
This PR removes cross-format mutable state from
2) Missing context (if any)
3) Findings (by severity)
No
4) Tests & Evidence
5) ClickHouse Compliance Checklist (Yes/No + short note)
6) Performance & Safety Notes
7) User-Lens Review
The behavior is intuitive: mixed-format Iceberg reads now avoid the prior exception path and return expected rows. Error/actionability is unchanged, and the new regression test makes the fix robust against reintroduction.
8) Final Verdict
Cursor Bugbot has reviewed your changes and found 2 potential issues.
tests/queries/0_stateless/04033_iceberg_orc_parquet_v3_crash.sh
LLVM Coverage Report
PR changed-lines coverage: 95.31% (61/64)
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix very rare crash when Iceberg table contains files of mixed format (ORC and Parquet). Fixes #88126.
Documentation entry for user-facing changes
Previously, KeyCondition was initialized once for the whole read. Now it is initialized per file. I don't expect any performance degradation, because opening and reading a file is several orders of magnitude slower than analyzing the WHERE condition.
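To illustrate the per-file idea, here is a minimal sketch, not the actual ClickHouse code. All names (KeyConditionSketch, DataFile, buildConditionForFile, readAll) are hypothetical and stand in for the real classes. It assumes the analyzed condition is cheap to build, so each file gets a fresh one instead of reusing state prepared for a file of another format.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the analyzed WHERE condition (not ClickHouse's KeyCondition).
struct KeyConditionSketch
{
    std::string built_for_format; // format of the file this condition was prepared for
    std::string predicate;        // the WHERE text it was derived from
};

// Hypothetical description of one data file in the table.
struct DataFile
{
    std::string path;
    std::string format; // "ORC" or "Parquet"
};

// Builds a fresh condition for one file; nothing is cached across files.
std::unique_ptr<KeyConditionSketch> buildConditionForFile(const DataFile & file, const std::string & predicate)
{
    return std::make_unique<KeyConditionSketch>(KeyConditionSketch{file.format, predicate});
}

// Reads every file with its own condition and reports which format each condition was built for.
std::vector<std::string> readAll(const std::vector<DataFile> & files, const std::string & predicate)
{
    std::vector<std::string> used;
    for (const auto & file : files)
    {
        auto condition = buildConditionForFile(file, predicate); // per-file state, dropped after the file
        used.push_back(condition->built_for_format);
    }
    return used;
}
```

In this sketch a Parquet file can never observe filter state that was prepared while reading an ORC file, because the state does not outlive a single file.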
Note
Medium Risk
Touches filter initialization and Parquet page-level predicate pushdown, which can affect row/column skipping and is exercised across multiple input formats; changes are localized but impact correctness/performance of filtering paths.
Overview
Fixes a crash when reading Iceberg tables containing mixed ORC and Parquet files by making FormatFilterInfo build key_condition/PREWHERE additional columns with std::call_once semantics via initKeyConditionOnce().
Updates ORC and Parquet input formats to use this new API and removes FormatFilterInfo::opaque and the Parquet V3-specific FilterInfoExt; Parquet page-level filter pushdown now derives per-column conditions directly from key_condition and keeps them alive in Parquet::Reader.
Adds a stateless regression test (04033_iceberg_orc_parquet_v3_crash) to ensure the mixed-format Iceberg read no longer segfaults.
Written by Cursor Bugbot for commit 13e3d24.
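The std::call_once semantics mentioned in the overview can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: FilterInfoSketch, its members, and the string-based signature are hypothetical. Only the method name initKeyConditionOnce is taken from the summary above.

```cpp
#include <mutex>
#include <string>

// Hypothetical filter-info holder; the real FormatFilterInfo has a different layout.
struct FilterInfoSketch
{
    std::once_flag init_flag;  // guards the one-time build
    std::string key_condition; // stand-in for the analyzed condition
    int build_count = 0;       // counts how often the build actually ran

    // Safe to call from every format reader; only the first call does the work.
    void initKeyConditionOnce(const std::string & where_clause)
    {
        std::call_once(init_flag, [&]
        {
            key_condition = "KeyCondition(" + where_clause + ")";
            ++build_count;
        });
    }
};

// Simulates an ORC reader and a Parquet reader both asking for initialization.
int initFromTwoReaders(FilterInfoSketch & info)
{
    info.initKeyConditionOnce("x > 1"); // e.g. called by the ORC input format
    info.initKeyConditionOnce("x > 1"); // e.g. called by the Parquet input format
    return info.build_count;
}
```

Because both readers go through the same guarded entry point, neither needs format-specific side state to know whether the condition has already been built.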