Skip to content

fix: block type reset after dict filter#63168

Merged
Gabriel39 merged 2 commits into
apache:masterfrom
yoock:fix_block_type_reset_after_dict_filter
May 20, 2026
Merged

fix: block type reset after dict filter#63168
Gabriel39 merged 2 commits into
apache:masterfrom
yoock:fix_block_type_reset_after_dict_filter

Conversation

@yoock
Copy link
Copy Markdown
Contributor

@yoock yoock commented May 12, 2026

What problem does this PR solve?

修复读取parquet时,当上一个row group有dict filter时未将block的类型从Int32重置为String,导致报如下错误:
Read parquet file xxxxx.parquet failed, reason = [INTERNAL_ERROR]comparison must input two same type column or column type is decimalv3/numeric, lhs=Int32, rhs=String

在有些情况下会触发如下代码导致be core:

LOG(FATAL) << fmt::format("Bad cast from type:{} to {}", demangle(typeid(from).name()),
                                  demangle(typeid(To).name()));

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yoock
Copy link
Copy Markdown
Contributor Author

yoock commented May 12, 2026

run buildall

@Gabriel39
Copy link
Copy Markdown
Contributor

@yoock Please add regression test cases.

@yoock
Copy link
Copy Markdown
Contributor Author

yoock commented May 12, 2026

@yoock Please add regression test cases.
这个很难构造回归测试案例 @Gabriel39

本质上就是RowGroupReader::next_batch结束时必须调用_convert_dict_cols_to_string_cols(block);,但是这里漏了

@Gabriel39
Copy link
Copy Markdown
Contributor

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the full PR diff and existing review context. I found one blocking issue in BE Parquet lazy-read error handling: the new call ignores a Status-returning conversion helper, which violates Doris error propagation rules and can also fail compilation because Status is [[nodiscard]].

Critical checkpoint conclusions:

  • Goal/test: The PR appears intended to keep dictionary-filtered columns converted to string on an early EOF lazy-read path. The change is not safely implemented because conversion failures are not propagated; no new test is included for this edge path.
  • Scope/clarity: The change is small and focused, but needs the same RETURN_IF_ERROR handling used by adjacent paths.
  • Concurrency/lifecycle/config/compatibility: No new concurrency, lifecycle, config, persistence, storage-format, or FE-BE compatibility concerns found in this one-line change.
  • Parallel paths: Neighboring no-row/filter-all paths already propagate _convert_dict_cols_to_string_cols errors; this new path should match them.
  • Error handling: Blocking issue found; Status is ignored.
  • Test coverage/results: No test or result changes were added. Given the fix is a direct error-propagation correction, existing coverage may be sufficient after updating the call, but this specific lazy EOF path is not newly covered.
  • Observability/performance/memory: No additional observability, performance, or memory issues found.
  • User focus: No additional user-provided review focus was specified.

DCHECK_EQ(pre_read_rows + _cached_filtered_rows, 0);
*read_rows = 0;
*batch_eof = true;
_convert_dict_cols_to_string_cols(block);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helper returns Status and Status is [[nodiscard]], so this call must be propagated. More importantly, if _get_block_column_pos() or convert_dict_column_to_string_column() fails, this path currently returns OK with the block left in the dictionary-code representation, unlike all adjacent _convert_dict_cols_to_string_cols call sites that use RETURN_IF_ERROR. Please keep the same error handling here.

Suggested change
_convert_dict_cols_to_string_cols(block);
RETURN_IF_ERROR(_convert_dict_cols_to_string_cols(block));

@Gabriel39
Copy link
Copy Markdown
Contributor

/review

@Gabriel39
Copy link
Copy Markdown
Contributor

/run buildall

@Gabriel39
Copy link
Copy Markdown
Contributor

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 30897 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4a687b89618cc784e476f94050640544e0b898e2, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17639	3872	3854	3854
q2	q3	10756	1368	816	816
q4	4687	467	349	349
q5	7746	2248	2104	2104
q6	252	172	140	140
q7	946	783	622	622
q8	9487	1809	1550	1550
q9	6689	4986	4960	4960
q10	6446	2114	1777	1777
q11	443	268	235	235
q12	694	425	295	295
q13	18212	3358	2771	2771
q14	264	264	234	234
q15	q16	825	776	704	704
q17	991	892	927	892
q18	6832	5568	5533	5533
q19	1212	1246	1066	1066
q20	532	394	262	262
q21	5691	2623	2435	2435
q22	431	356	298	298
Total cold run time: 100775 ms
Total hot run time: 30897 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4227	4143	4125	4125
q2	q3	4461	4873	4269	4269
q4	2082	2192	1377	1377
q5	4376	4222	4271	4222
q6	224	177	131	131
q7	2071	1834	1746	1746
q8	2545	2107	2069	2069
q9	7749	7910	7808	7808
q10	4615	4543	4177	4177
q11	606	425	388	388
q12	754	875	613	613
q13	3291	3659	2963	2963
q14	316	320	291	291
q15	q16	726	755	658	658
q17	1335	1322	1301	1301
q18	7812	7171	6975	6975
q19	1104	1076	1102	1076
q20	2234	2203	1927	1927
q21	5317	4640	4455	4455
q22	520	453	421	421
Total cold run time: 56365 ms
Total hot run time: 50992 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 169949 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4a687b89618cc784e476f94050640544e0b898e2, data reload: false

query5	4340	652	509	509
query6	342	231	199	199
query7	4305	545	300	300
query8	337	239	227	227
query9	8837	4065	4030	4030
query10	453	342	305	305
query11	5825	2420	2213	2213
query12	193	133	128	128
query13	1311	578	413	413
query14	6049	5348	5047	5047
query14_1	4346	4394	4357	4357
query15	215	204	183	183
query16	1017	476	489	476
query17	1164	746	619	619
query18	2592	482	357	357
query19	225	212	181	181
query20	145	136	131	131
query21	216	139	118	118
query22	13643	13527	13324	13324
query23	17215	16308	16011	16011
query23_1	16115	16278	16063	16063
query24	7521	1789	1309	1309
query24_1	1316	1327	1311	1311
query25	570	502	452	452
query26	1317	329	172	172
query27	2712	583	352	352
query28	5046	2051	1962	1962
query29	1012	640	520	520
query30	306	238	199	199
query31	1139	1062	927	927
query32	93	78	76	76
query33	546	367	325	325
query34	1182	1133	643	643
query35	778	826	675	675
query36	1342	1328	1196	1196
query37	153	105	95	95
query38	3202	3121	3032	3032
query39	924	929	891	891
query39_1	864	865	863	863
query40	225	150	126	126
query41	65	62	62	62
query42	113	111	111	111
query43	330	321	300	300
query44	
query45	210	199	202	199
query46	1066	1194	717	717
query47	2291	2323	2222	2222
query48	359	403	295	295
query49	619	488	381	381
query50	965	350	262	262
query51	4328	4315	4212	4212
query52	109	114	97	97
query53	253	281	214	214
query54	309	267	257	257
query55	93	91	87	87
query56	304	299	294	294
query57	1396	1359	1294	1294
query58	314	275	267	267
query59	1586	1612	1422	1422
query60	323	346	311	311
query61	152	145	153	145
query62	675	620	569	569
query63	254	202	204	202
query64	2396	811	628	628
query65	
query66	1700	478	347	347
query67	29979	29845	29785	29785
query68	
query69	471	333	312	312
query70	1023	985	1018	985
query71	325	273	278	273
query72	3163	2768	2392	2392
query73	852	796	442	442
query74	5060	4970	4715	4715
query75	2656	2625	2233	2233
query76	2304	1129	749	749
query77	398	409	339	339
query78	12259	12154	11624	11624
query79	1525	1050	744	744
query80	1334	545	458	458
query81	509	284	241	241
query82	1377	156	122	122
query83	326	280	258	258
query84	255	138	112	112
query85	910	562	447	447
query86	440	329	313	313
query87	3468	3348	3182	3182
query88	3596	2683	2649	2649
query89	449	379	343	343
query90	1916	182	186	182
query91	188	168	138	138
query92	78	74	76	74
query93	1473	1445	926	926
query94	674	353	312	312
query95	692	394	444	394
query96	1090	787	365	365
query97	2711	2689	2557	2557
query98	232	231	238	231
query99	1122	1123	948	948
Total cold run time: 255107 ms
Total hot run time: 169949 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.56% (20733/38712)
Line Coverage 37.20% (196097/527097)
Region Coverage 33.54% (153692/458272)
Branch Coverage 34.58% (66968/193680)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.52% (27875/37914)
Line Coverage 57.53% (302455/525744)
Region Coverage 54.79% (253496/462693)
Branch Coverage 56.31% (109463/194407)

@Gabriel39
Copy link
Copy Markdown
Contributor

/review

@Gabriel39 Gabriel39 merged commit 9bf0817 into apache:master May 20, 2026
36 of 37 checks passed
@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 20, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

github-actions Bot pushed a commit that referenced this pull request May 20, 2026
### What problem does this PR solve?

Fix parquet reader reporting: Read parquet file xxxxx.parquet failed, reason =
[INTERNAL_ERROR]comparison must input two same type column or column
type is decimalv3/numeric, lhs=Int32, rhs=String

---------

Co-authored-by: wanglong16 <wanglong16@xiaomi.com>
github-actions Bot pushed a commit that referenced this pull request May 20, 2026
### What problem does this PR solve?

Fix parquet reader reporting: Read parquet file xxxxx.parquet failed, reason =
[INTERNAL_ERROR]comparison must input two same type column or column
type is decimalv3/numeric, lhs=Int32, rhs=String

---------

Co-authored-by: wanglong16 <wanglong16@xiaomi.com>
yiguolei pushed a commit that referenced this pull request May 20, 2026
Cherry-picked from #63168

Co-authored-by: wlong <869372753@qq.com>
Co-authored-by: wanglong16 <wanglong16@xiaomi.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants