fix: block type reset after dict filter #63168
@yoock Please add regression test cases.
Essentially, RowGroupReader::next_batch must call _convert_dict_cols_to_string_cols(block); before it returns, but that call was missing here.
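The invariant described in the comment above can be illustrated with a toy model. This is only a sketch: `ColType`, `Column`, `Block`, `convert_string_cols_to_dict_cols`, `assert_all_string`, and the `restore_on_early_eof` flag are hypothetical stand-ins invented for illustration and do not reflect the real Doris classes or signatures. It assumes that a dict-filtered column temporarily holds Int32 dictionary codes and must be String again on every return path.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: illustrative stand-ins, not the real Doris Block/column types.
enum class ColType { Int32, String };

struct Column {
    std::string name;
    ColType type;
    bool dict_filtered;  // true if a dict filter applies to this column
};

struct Block {
    std::vector<Column> columns;
};

// Assumption: while the dict filter runs, the column holds Int32 dictionary codes.
void convert_string_cols_to_dict_cols(Block& block) {
    for (auto& col : block.columns) {
        if (col.dict_filtered) col.type = ColType::Int32;
    }
}

// Restores the String type that later stages (and later row groups) expect.
void convert_dict_cols_to_string_cols(Block& block) {
    for (auto& col : block.columns) {
        if (col.dict_filtered) col.type = ColType::String;
    }
}

// Debug-only check of the invariant next_batch should uphold when it returns.
void assert_all_string(const Block& block) {
    for (const auto& col : block.columns) assert(col.type == ColType::String);
}

// `restore_on_early_eof` (hypothetical) toggles between the buggy and the fixed early return.
void next_batch(Block& block, bool early_eof, bool restore_on_early_eof) {
    convert_string_cols_to_dict_cols(block);
    if (early_eof) {
        if (restore_on_early_eof) convert_dict_cols_to_string_cols(block);
        return;  // without the restore, the block leaves as Int32
    }
    // ... rows would be read and filtered here ...
    convert_dict_cols_to_string_cols(block);
}
```

In this model, calling `next_batch(block, true, false)` leaves the column typed as Int32, which is the state that a later String comparison would reject; passing `true` for the last argument restores String before returning.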
Reviewed the full PR diff and existing review context. I found one blocking issue in BE Parquet lazy-read error handling: the new call ignores a Status-returning conversion helper, which violates Doris error propagation rules and can also fail compilation because Status is [[nodiscard]].
Critical checkpoint conclusions:
- Goal/test: The PR appears intended to keep dictionary-filtered columns converted to string on an early EOF lazy-read path. The change is not safely implemented because conversion failures are not propagated; no new test is included for this edge path.
- Scope/clarity: The change is small and focused, but needs the same RETURN_IF_ERROR handling used by adjacent paths.
- Concurrency/lifecycle/config/compatibility: No new concurrency, lifecycle, config, persistence, storage-format, or FE-BE compatibility concerns found in this one-line change.
- Parallel paths: Neighboring no-row/filter-all paths already propagate _convert_dict_cols_to_string_cols errors; this new path should match them.
- Error handling: Blocking issue found; Status is ignored.
- Test coverage/results: No test or result changes were added. Given the fix is a direct error-propagation correction, existing coverage may be sufficient after updating the call, but this specific lazy EOF path is not newly covered.
- Observability/performance/memory: No additional observability, performance, or memory issues found.
- User focus: No additional user-provided review focus was specified.
    DCHECK_EQ(pre_read_rows + _cached_filtered_rows, 0);
    *read_rows = 0;
    *batch_eof = true;
    _convert_dict_cols_to_string_cols(block);
This helper returns Status and Status is [[nodiscard]], so this call must be propagated. More importantly, if _get_block_column_pos() or convert_dict_column_to_string_column() fails, this path currently returns OK with the block left in the dictionary-code representation, unlike all adjacent _convert_dict_cols_to_string_cols call sites that use RETURN_IF_ERROR. Please keep the same error handling here.
    - _convert_dict_cols_to_string_cols(block);
    + RETURN_IF_ERROR(_convert_dict_cols_to_string_cols(block));
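The difference the suggestion makes can be sketched in isolation. The `Status` class and `RETURN_IF_ERROR` macro below are simplified, hypothetical stand-ins written for this example (the real Doris definitions are richer), and `convert_dict_cols_to_string_cols(bool)` and `next_batch_eof_path` are invented names that only mimic the shape of the early-EOF path.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified stand-in for a [[nodiscard]] status type; not the real Doris Status.
class [[nodiscard]] Status {
public:
    static Status OK() { return Status(true, ""); }
    static Status InternalError(std::string msg) { return Status(false, std::move(msg)); }
    bool ok() const { return _ok; }
    const std::string& msg() const { return _msg; }

private:
    Status(bool ok, std::string msg) : _ok(ok), _msg(std::move(msg)) {}
    bool _ok;
    std::string _msg;
};

// Simplified stand-in for the macro: return early if the expression yields an error.
#define RETURN_IF_ERROR(stmt)       \
    do {                            \
        Status _status_ = (stmt);   \
        if (!_status_.ok()) {       \
            return _status_;        \
        }                           \
    } while (false)

// Hypothetical conversion helper whose failure is controlled by a flag.
Status convert_dict_cols_to_string_cols(bool fail) {
    if (fail) return Status::InternalError("column not found in block");
    return Status::OK();
}

// Early-EOF path in the shape the review asks for: a conversion error reaches the caller.
Status next_batch_eof_path(bool conversion_fails, bool* batch_eof) {
    assert(batch_eof != nullptr);
    *batch_eof = true;
    RETURN_IF_ERROR(convert_dict_cols_to_string_cols(conversion_fails));
    return Status::OK();
}
```

Calling the helper as a bare statement would discard the `[[nodiscard]]` result, which compilers warn about and which fails the build when warnings are treated as errors. Wrapping it in the macro both silences that diagnostic and ensures the caller sees the failure instead of an OK status.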
TPC-H: Total hot run time: 30897 ms
TPC-DS: Total hot run time: 169949 ms
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
### What problem does this PR solve?
Fix parquet reader reporting: Read parquet file xxxxx.parquet failed, reason = [INTERNAL_ERROR]comparison must input two same type column or column type is decimalv3/numeric, lhs=Int32, rhs=String
---------
Co-authored-by: wanglong16 <wanglong16@xiaomi.com>
What problem does this PR solve?
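Fixes a bug in the Parquet reader: when the previous row group had a dict filter, the block's column type was not reset from Int32 to String, which caused the following error:
Read parquet file xxxxx.parquet failed, reason = [INTERNAL_ERROR]comparison must input two same type column or column type is decimalv3/numeric, lhs=Int32, rhs=String
In some cases, the following code is triggered, causing a BE core dump: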