[improvement](topn) check multiget result rows against request row id count and add be UT #61758

Merged
yiguolei merged 4 commits into apache:master from jacktengg:260326-improve-topn-lazy-mat
Mar 26, 2026

@jacktengg
Contributor

@jacktengg jacktengg commented Mar 26, 2026

Check that the multiget result row count matches the request row id count in merge_multi_response:

  1. A BE may return an empty block even if request.request_block_descs(i).row_id_size() != 0:
    this happens if the id_file_map was GC'd on the BE before it could process the request; see 'if (!id_file_map)' in RowIdStorageReader::read_by_rowids.
  2. Report an error in any case where the row count doesn't match, even if the block is not empty,
    since that indicates a bug in the BE's row fetching logic or serialization logic.
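
To illustrate the check described above, here is a minimal, self-contained sketch. It is not the code from this PR: `RequestBlockDesc`, `ResponseBlock`, and `check_multiget_row_counts` are hypothetical stand-ins for the real request/response messages and for the validation inside merge_multi_response, and the error text only borrows the "not match request row id count" fragment quoted later in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the request/response messages; not the Doris types.
struct RequestBlockDesc {
    std::vector<std::int64_t> row_ids; // row ids requested for one relation
};

struct ResponseBlock {
    std::size_t rows = 0; // row count of the block after deserialization
};

// Returns an error message if any response block's row count differs from the
// number of row ids requested at the same index; std::nullopt if all match.
std::optional<std::string> check_multiget_row_counts(
        std::int64_t backend_id, const std::vector<RequestBlockDesc>& requests,
        const std::vector<ResponseBlock>& responses) {
    if (requests.size() != responses.size()) {
        return "backend_id " + std::to_string(backend_id) + ": got " +
               std::to_string(responses.size()) + " blocks for " +
               std::to_string(requests.size()) + " request descs";
    }
    for (std::size_t i = 0; i < requests.size(); ++i) {
        const std::size_t expected = requests[i].row_ids.size();
        const std::size_t actual = responses[i].rows;
        if (actual != expected) {
            // Covers both cases: an empty block and any other mismatch.
            return "backend_id " + std::to_string(backend_id) + ", block " +
                   std::to_string(i) + ": result rows " + std::to_string(actual) +
                   " not match request row id count " + std::to_string(expected);
        }
    }
    return std::nullopt;
}

// Usage: an empty block for a non-empty request is reported as an error.
inline void usage_example() {
    std::vector<RequestBlockDesc> requests(1);
    requests[0].row_ids = {10, 11, 12};
    std::vector<ResponseBlock> responses(1); // rows == 0, e.g. id_file_map was GC'd
    assert(check_multiget_row_counts(1, requests, responses).has_value());
}
```

Comparing counts per index keeps the check cheap, and treating every mismatch as an error (not only the empty-block case) matches point 2 above.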

Also add comments.

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@jacktengg
Contributor Author

run buildall

@jacktengg
Contributor Author

/review

@jacktengg
Contributor Author

run buildall

Contributor

@github-actions github-actions bot left a comment

Code Review Summary

This PR adds a row count validation check in merge_multi_response() to catch mismatches between requested row IDs and returned rows, adds a LOG(INFO) for observability when id_file_map is null, adds extensive comments throughout the materialization operator code, and includes good unit test coverage. The const-correctness and pointer type improvements (auto → auto*, auto& → const auto&) are welcome cleanups.

Critical Checkpoint Conclusions

  • Goal & correctness: The PR achieves its goal of validating multiget response row counts against request row ID counts. The validation is placed at the right point (after deserialization, before use). Tests prove the error path works.
  • Modification focus: The PR is reasonably focused — validation check + comments + tests + const-correctness cleanups. Acceptable scope.
  • Concurrency: No concurrency changes. merge_multi_response() is called after counter.wait() serializes all RPCs. No new thread safety concerns.
  • Lifecycle management: No lifecycle changes.
  • Configuration items: None added.
  • Incompatible changes: None — this is purely a client-side validation, no protocol changes.
  • Parallel code paths: The old RowIDFetcher path (non-V2) is not modified, but it appears to be a separate code path. No issue.
  • Special conditional checks: Two comments have minor inaccuracies (see inline comments below).
  • Test coverage: Good — 3 new test cases covering the happy path fix, the error detection case, and the stale-block_maps regression case. Tests are well-commented.
  • Observability: The LOG(INFO) addition for id_file_map null is appropriate for this error scenario.
  • Performance: No performance concerns. The validation is O(1) per backend per relation.
  • Other issues: Two stale/misleading comments identified (see inline).
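
For readers unfamiliar with the empty-block scenario behind the LOG(INFO) point above, the following sketch shows how a reader could end up returning zero rows for a non-empty request. All names here (`IdFileMapRegistry`, `read_by_rowids_sketch`, the log text) are hypothetical and only mimic the 'if (!id_file_map)' branch mentioned in the PR description; the actual RowIdStorageReader::read_by_rowids is not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct IdFileMap {}; // hypothetical placeholder for the per-query mapping

struct Block {
    std::size_t rows = 0;
};

// Hypothetical registry: an entry disappears once the query's state is GC'd.
using IdFileMapRegistry = std::map<std::string, std::shared_ptr<IdFileMap>>;

// Returns one row per requested id while the mapping exists; once it is gone,
// logs the condition and returns an empty block instead of failing.
Block read_by_rowids_sketch(const IdFileMapRegistry& registry,
                            const std::string& query_id,
                            const std::vector<std::int64_t>& row_ids) {
    auto it = registry.find(query_id);
    std::shared_ptr<IdFileMap> id_file_map =
            (it == registry.end()) ? nullptr : it->second;
    Block block;
    if (!id_file_map) {
        std::clog << "id_file_map is null for query " << query_id
                  << ", returning an empty block\n";
        return block; // rows == 0 although row_ids may be non-empty
    }
    block.rows = row_ids.size();
    return block;
}

// Usage: after the entry is erased, the same request yields an empty block.
inline void usage_example() {
    IdFileMapRegistry registry;
    registry["query-1"] = std::make_shared<IdFileMap>();
    const std::vector<std::int64_t> row_ids = {1, 2, 3};
    assert(read_by_rowids_sketch(registry, "query-1", row_ids).rows == 3);
    registry.erase("query-1"); // simulate GC of the mapping
    assert(read_by_rowids_sketch(registry, "query-1", row_ids).rows == 0);
}
```

Because the reader side returns an empty block here instead of an error, the mismatch only becomes visible when the caller compares the returned row count with the number of row ids it asked for.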

Issues Found

  1. [Minor] Stale comment in materialization_opertor.cpp: The comment says empty blocks can occur "if the id_file_map was GC'd", but the new row count validation now catches that case earlier, so this comment describes unreachable behavior.

  2. [Minor] Misleading test comment in TestMergeMultiResponseBackendNotFound: The top-level comment says the test expects InternalError("backend_id {} not found in block_maps"), but the actual assertion checks for "not match request row id count" — the new error message, not the old one.

@jacktengg
Contributor Author

run buildall

@jacktengg
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 26435 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f66adea86ceaf91732ce27e280fcfb3c051ddeca, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17700	4437	4258	4258
q2	
q3	10656	784	521	521
q4	4679	367	248	248
q5	7556	1193	1030	1030
q6	174	175	142	142
q7	780	834	660	660
q8	9716	1503	1301	1301
q9	5272	4768	4734	4734
q10	6314	1919	1662	1662
q11	472	278	240	240
q12	747	578	461	461
q13	18026	2676	1937	1937
q14	226	228	207	207
q15	
q16	730	738	673	673
q17	722	814	478	478
q18	5827	5427	5179	5179
q19	1123	966	628	628
q20	535	505	388	388
q21	4517	1813	1378	1378
q22	341	488	310	310
Total cold run time: 96113 ms
Total hot run time: 26435 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4707	4749	4631	4631
q2	
q3	3936	4390	3906	3906
q4	882	1385	814	814
q5	4169	4415	4339	4339
q6	180	170	138	138
q7	1740	1691	1588	1588
q8	2533	2823	2576	2576
q9	7550	7425	7420	7420
q10	3869	4016	3607	3607
q11	495	439	460	439
q12	491	567	438	438
q13	2617	2877	2041	2041
q14	276	309	274	274
q15	
q16	714	789	728	728
q17	1258	1386	1400	1386
q18	7227	6847	6705	6705
q19	897	838	904	838
q20	2058	2175	2004	2004
q21	3895	3481	3477	3477
q22	458	416	362	362
Total cold run time: 49952 ms
Total hot run time: 47711 ms

@doris-robot

TPC-DS: Total hot run time: 170318 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f66adea86ceaf91732ce27e280fcfb3c051ddeca, data reload: false

query5	4349	632	495	495
query6	331	229	210	210
query7	4223	478	266	266
query8	343	252	234	234
query9	8723	2734	2740	2734
query10	542	401	353	353
query11	7005	5112	4910	4910
query12	181	134	125	125
query13	1287	464	354	354
query14	5778	3759	3489	3489
query14_1	2857	2860	2872	2860
query15	208	196	175	175
query16	989	479	450	450
query17	1028	734	634	634
query18	2457	463	359	359
query19	222	222	192	192
query20	131	125	126	125
query21	214	135	108	108
query22	13223	14003	14381	14003
query23	16642	16313	16227	16227
query23_1	16100	16153	15739	15739
query24	7150	1604	1237	1237
query24_1	1224	1223	1233	1223
query25	601	446	397	397
query26	1246	261	147	147
query27	2784	475	292	292
query28	4504	1848	1856	1848
query29	877	569	485	485
query30	302	224	197	197
query31	1012	936	871	871
query32	84	74	69	69
query33	514	324	290	290
query34	893	895	524	524
query35	644	687	614	614
query36	1143	1139	1032	1032
query37	125	91	85	85
query38	2908	2897	2923	2897
query39	849	827	827	827
query39_1	788	789	790	789
query40	235	194	134	134
query41	62	58	60	58
query42	258	258	253	253
query43	239	246	223	223
query44	
query45	194	183	181	181
query46	891	985	595	595
query47	2105	3020	3019	3019
query48	316	314	235	235
query49	628	463	375	375
query50	716	288	212	212
query51	4055	4079	4015	4015
query52	263	261	252	252
query53	291	332	288	288
query54	302	277	268	268
query55	91	85	84	84
query56	324	311	312	311
query57	1974	1777	1674	1674
query58	280	271	265	265
query59	2789	2953	2741	2741
query60	338	338	321	321
query61	155	159	158	158
query62	631	580	530	530
query63	307	288	271	271
query64	5198	1266	1000	1000
query65	
query66	1462	455	373	373
query67	24233	24260	24368	24260
query68	
query69	400	311	285	285
query70	966	920	960	920
query71	338	317	299	299
query72	2847	2694	2446	2446
query73	560	562	323	323
query74	9645	9595	9410	9410
query75	2895	2779	2456	2456
query76	2296	1041	685	685
query77	363	379	319	319
query78	10963	11137	10517	10517
query79	1130	771	571	571
query80	1332	644	544	544
query81	554	257	231	231
query82	1006	156	123	123
query83	342	271	249	249
query84	283	129	94	94
query85	908	524	448	448
query86	405	299	301	299
query87	3279	3154	2971	2971
query88	3541	2665	2641	2641
query89	431	363	332	332
query90	2027	185	178	178
query91	170	167	137	137
query92	77	76	71	71
query93	939	847	500	500
query94	645	319	288	288
query95	593	400	324	324
query96	633	524	234	234
query97	2496	2484	2433	2433
query98	242	218	221	218
query99	1019	1002	913	913
Total cold run time: 250046 ms
Total hot run time: 170318 ms

Contributor

@HappenLee HappenLee left a comment

LGTM

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Mar 26, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.87% (19911/37659)
Line Coverage 36.40% (186537/512492)
Region Coverage 32.66% (144682/442981)
Branch Coverage 33.87% (63432/187308)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.86% (26504/36884)
Line Coverage 54.74% (279706/510959)
Region Coverage 51.96% (232337/447112)
Branch Coverage 53.42% (100373/187880)

@yiguolei
Contributor

skip buildall

@yiguolei yiguolei merged commit 17e61ad into apache:master Mar 26, 2026
31 of 33 checks passed
github-actions bot pushed a commit that referenced this pull request Mar 26, 2026
… count and add be UT (#61758)

yiguolei pushed a commit that referenced this pull request Mar 26, 2026
…quest row id count and add be UT #61758 (#61773)

Cherry-picked from #61758

Co-authored-by: TengJianPing <tengjianping@selectdb.com>
jacktengg added a commit to jacktengg/incubator-doris that referenced this pull request Mar 26, 2026
… count and add be UT (apache#61758)


Labels

approved, dev/4.1.0-merged, reviewed
