
[fix](iterator) Use explicit output schema in new_merge_iterator and new_union_iterator #60772

Open
uchenily wants to merge 5 commits into apache:master from uchenily:fix-coredump

Conversation

@uchenily
Contributor

This PR ensures merge/union iterators use an explicit output schema projection and copy only the requested columns, preventing column count mismatches when delete-predicate columns are read in addition to return columns.

BetaRowsetReader now builds an output_schema from return_columns and passes it to the merge/union iterators, and VMergeIteratorContext copies rows using that output schema (not the incorrect _iter->schema()).
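
Roughly, the fix derives a narrower output schema from return_columns and passes it down to the iterators, which then bound their row copies by that schema. The snippet below is only a shape sketch with made-up names (OutputSchema, make_output_schema); it is not the actual Doris Schema type or the new_merge_iterator/new_union_iterator signatures changed by this PR:

#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the projected output schema: it exposes exactly
// the requested return columns, e.g. {0, 3} for (k, __DORIS_ROWID_COL__).
struct OutputSchema {
    std::vector<uint32_t> column_ids;
    size_t num_column_ids() const { return column_ids.size(); }
};

OutputSchema make_output_schema(const std::vector<uint32_t>& return_columns) {
    return OutputSchema{return_columns};
}

// The merge/union iterator contexts then copy rows bounded by
// output_schema.num_column_ids() instead of the (possibly wider) read schema.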

Consider the following table:

CREATE TABLE tbl (
  k INT NOT NULL,
  v1 INT NOT NULL,
  v2 INT NOT NULL
) DUPLICATE KEY(k) ...;

And a delete predicate applied to a non-key column:

DELETE FROM tbl WHERE v1 = 1;

When executing ORDER BY k LIMIT n, Doris applies a Top-N optimization. Even though the query is SELECT *, the engine initially avoids scanning all columns: it constructs a minimal intermediate schema containing only the sort key (k) and the internal __DORIS_ROWID_COL__ to perform the merge and sort efficiently (_col_ids = {0, 3} ==> _num_columns = 2). However, because a delete predicate exists on column v1, BetaRowsetReader adds v1 to this intermediate schema in order to evaluate and filter out deleted rows during the scan (_col_ids = {0, 3, 1}; note that column v1 (index = 1) is appended to this schema ==> _num_columns = 3).

The previous implementation of VMergeIteratorContext::copy_rows used this incorrect _num_columns value as the copy bound, resulting in an out-of-bounds array access and causing the BE to core dump.
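
The failure mode can be illustrated with a toy, self-contained example (this is not the Doris code; it only mimics the column layout described above, where the delete-predicate column is appended after the requested output columns):

#include <cstddef>
#include <vector>

int main() {
    // Columns actually read: k, __DORIS_ROWID_COL__, and v1 (delete predicate).
    std::vector<std::vector<int>> src(3, std::vector<int>{42});
    // Columns the caller asked for: k and __DORIS_ROWID_COL__ only.
    std::vector<std::vector<int>> dst(2);

    // Buggy pattern: the loop bound comes from the read schema
    // (_num_columns == 3), so dst[2] would be an out-of-bounds access:
    //   for (size_t i = 0; i < src.size(); ++i) dst[i].push_back(src[i][0]);

    // Fixed pattern: the loop bound comes from the output schema, and only
    // the requested leading columns are copied.
    size_t output_cols = dst.size();
    for (size_t i = 0; i < output_cols; ++i) {
        dst[i].push_back(src[i][0]);
    }
    return 0;
}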

Detailed reproduction steps are as follows:

  1. Modify conf/be.conf:
write_buffer_size = 8
  2. Execute the following SQL:
CREATE TABLE tbl1
(
    k INT NOT NULL,
    v1 INT NOT NULL,
    v2 INT NOT NULL
)
DUPLICATE KEY(k)
DISTRIBUTED BY HASH(k) BUCKETS 5
PROPERTIES(
    "replication_num" = "1"
);
CREATE TABLE tbl2
(
    k INT NOT NULL,
    v1 INT NOT NULL,
    v2 INT NOT NULL
)
DUPLICATE KEY(k)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES(
    "replication_num" = "1"
);

INSERT INTO tbl1 VALUES (1, 1, 1),(2, 2, 2),(3, 3, 3),(4, 4, 4),(5, 5, 5);
INSERT INTO tbl2 SELECT * FROM tbl1;
SELECT * FROM tbl2 ORDER BY k limit 100; -- ok

DELETE FROM tbl2 WHERE v1 = 100;
SELECT * FROM tbl2 ORDER BY k limit 100; -- coredump

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Feb 15, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@uchenily uchenily changed the title [fix](iterator) [fix](iterator) Use explicit output schema in new_merge_iterator and new_union_iterator Feb 15, 2026
@uchenily
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 28891 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bf26b46316cac30d13587d4d832f7286d6dafa24, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17660	4458	4324	4324
q2	q3	10652	814	544	544
q4	4692	367	257	257
q5	7834	1202	1003	1003
q6	224	174	148	148
q7	803	848	693	693
q8	10670	1503	1382	1382
q9	5947	4777	4712	4712
q10	6899	1888	1625	1625
q11	453	269	258	258
q12	749	569	474	474
q13	17819	4251	3421	3421
q14	233	239	223	223
q15	982	788	786	786
q16	743	726	686	686
q17	728	878	463	463
q18	6180	5327	5200	5200
q19	1341	989	637	637
q20	509	513	408	408
q21	4529	1870	1400	1400
q22	361	284	247	247
Total cold run time: 100008 ms
Total hot run time: 28891 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4473	4353	4368	4353
q2	q3	1765	2191	1729	1729
q4	863	1164	772	772
q5	4066	4352	4335	4335
q6	186	177	144	144
q7	1729	1597	1485	1485
q8	2449	2649	2583	2583
q9	7924	7537	7323	7323
q10	2653	2832	2419	2419
q11	539	465	426	426
q12	514	630	511	511
q13	3950	4526	3618	3618
q14	284	301	277	277
q15	878	871	823	823
q16	738	782	778	778
q17	1214	1576	1416	1416
q18	7084	6908	6607	6607
q19	936	887	912	887
q20	2059	2173	2178	2173
q21	4055	3565	3413	3413
q22	470	452	405	405
Total cold run time: 48829 ms
Total hot run time: 46477 ms

@doris-robot

TPC-DS: Total hot run time: 184014 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bf26b46316cac30d13587d4d832f7286d6dafa24, data reload: false

query5	4825	675	531	531
query6	337	236	215	215
query7	4246	472	285	285
query8	352	275	246	246
query9	8747	2790	2818	2790
query10	537	400	332	332
query11	16920	17529	17333	17333
query12	215	149	155	149
query13	1612	490	381	381
query14	7170	3327	3130	3130
query14_1	3031	2890	3032	2890
query15	229	210	184	184
query16	1029	494	455	455
query17	1213	740	665	665
query18	2943	449	346	346
query19	217	227	190	190
query20	145	142	127	127
query21	221	147	127	127
query22	5367	4951	4732	4732
query23	17088	16723	16569	16569
query23_1	16705	16754	16602	16602
query24	7204	1623	1228	1228
query24_1	1233	1286	1211	1211
query25	545	449	386	386
query26	1237	260	156	156
query27	2764	475	277	277
query28	4474	1899	1887	1887
query29	798	569	456	456
query30	309	244	210	210
query31	867	729	645	645
query32	87	74	73	73
query33	513	335	285	285
query34	927	916	575	575
query35	654	680	618	618
query36	1084	1088	1014	1014
query37	143	94	88	88
query38	2949	2901	2847	2847
query39	896	900	839	839
query39_1	827	852	825	825
query40	232	159	137	137
query41	63	60	59	59
query42	106	100	103	100
query43	375	392	347	347
query44	
query45	194	190	184	184
query46	889	987	623	623
query47	2119	2118	2018	2018
query48	328	314	232	232
query49	650	466	386	386
query50	686	277	220	220
query51	4126	4196	4043	4043
query52	108	108	94	94
query53	291	347	289	289
query54	309	273	268	268
query55	93	85	82	82
query56	322	336	312	312
query57	1342	1326	1271	1271
query58	289	283	273	273
query59	2670	2638	2533	2533
query60	332	345	330	330
query61	150	139	144	139
query62	628	591	536	536
query63	319	292	280	280
query64	4915	1375	1110	1110
query65	
query66	1404	468	365	365
query67	16430	16607	16345	16345
query68	
query69	421	331	300	300
query70	1002	959	895	895
query71	347	325	308	308
query72	2995	2695	2400	2400
query73	564	561	317	317
query74	9981	9938	9777	9777
query75	2864	2742	2474	2474
query76	2312	1053	676	676
query77	359	389	313	313
query78	11202	11302	10722	10722
query79	3164	809	613	613
query80	1802	679	552	552
query81	598	277	244	244
query82	997	150	113	113
query83	337	268	242	242
query84	257	118	100	100
query85	928	486	430	430
query86	492	304	303	303
query87	3133	3078	2996	2996
query88	3539	2658	2641	2641
query89	441	372	344	344
query90	2015	175	179	175
query91	169	161	134	134
query92	84	79	70	70
query93	1489	854	514	514
query94	649	331	297	297
query95	594	396	312	312
query96	642	538	236	236
query97	2444	2471	2429	2429
query98	236	219	212	212
query99	989	960	874	874
Total cold run time: 258345 ms
Total hot run time: 184014 ms

@uchenily
Contributor Author

run beut

#include "absl/strings/substitute.h"
#include "common/config.h"
#include "common/env_config.h"
// #include "common/env_config.h"
Contributor

Just delete it.

VMergeIteratorContext(RowwiseIteratorUPtr&& iter, int sequence_id_idx, bool is_unique,
bool is_reverse, std::vector<uint32_t>* read_orderby_key_columns)
bool is_reverse, std::vector<uint32_t>* read_orderby_key_columns,
const Schema* output_schema)
Contributor

Why pass a raw pointer here instead of just using a shared_ptr?

// src block may contain extra columns (e.g. delete-predicate columns) because
// SegmentIterator reads with input_schema = return_columns + delete columns
// to evaluate row filtering. We should only copy the requested output columns.
DCHECK_GE(src.columns(), output_cols);
Contributor

Why GE rather than EQ?

return Status::OK();
}
size_t output_cols = _output_schema->num_column_ids();
// src block may contain extra columns (e.g. delete-predicate columns) because
Contributor

The src block does not include delete-predicate columns. If it did, why did your previous PR work??

Contributor Author

My previous PR iterated based on the dst block:

Status VMergeIteratorContext::copy_rows(Block* block, bool advanced) {
    ...
    for (size_t i = 0; i < block->columns(); ++i) {
    ...
}

// delete_hanlder is always set, but it maybe not init, so that it will return empty conditions
// or predicates when it is not inited.
if (_read_context->delete_handler != nullptr) {
_read_context->delete_handler->get_delete_conditions_after_version(
Contributor

Turn the example from your GitHub PR description into a regression test and add it to this PR.

Contributor Author

Is there a more direct way to create rowsets in an overlapping state? My reproduction steps require editing be.conf to add write_buffer_size = 8, which can't go into a regression test.
