Skip to content

[fix](insert overwrite) delay overwrite partition routing until incremental open#63209

Merged
liaoxin01 merged 1 commit into
apache:masterfrom
sollhui:fix_insert_overwrite
May 14, 2026
Merged

[fix](insert overwrite) delay overwrite partition routing until incremental open#63209
liaoxin01 merged 1 commit into
apache:masterfrom
sollhui:fix_insert_overwrite

Conversation

@sollhui
Copy link
Copy Markdown
Contributor

@sollhui sollhui commented May 13, 2026

What problem does this PR solve?

Problem Summary:

In auto-detect insert overwrite, BE sender could publish newly replaced temporary partitions to local row routing before incremental open finished on target BEs.

The race was:

  1. One sender calls FE replacePartition and receives new temporary partition/tablet metadata.
  2. The sender records the new partition id and replaces local _vpartition routing first.
  3. Another concurrent batch can then route rows to the new tablet.
  4. The first sender has not finished incremental open yet, so the target BE may not have created the delta writer for that tablet.
  5. The target BE returns unknown tablet to append data.

This PR makes the sender finish _create_partition_callback, including incremental open/open_wait, before publishing the new partition/tablet to local routing and marking the new partition as handled.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@sollhui
Copy link
Copy Markdown
Contributor Author

sollhui commented May 13, 2026

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 29460 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b370b629466132ba75d960920d0df1031c1d5334, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17691	3826	3812	3812
q2	q3	10716	871	598	598
q4	4667	454	344	344
q5	7470	1368	1138	1138
q6	181	168	139	139
q7	920	938	748	748
q8	9302	1384	1254	1254
q9	5529	5373	5362	5362
q10	6277	2082	1825	1825
q11	460	282	257	257
q12	636	413	298	298
q13	18063	3225	2758	2758
q14	291	290	262	262
q15	q16	899	877	796	796
q17	958	946	744	744
q18	6472	5758	5479	5479
q19	1171	1164	1070	1070
q20	525	404	264	264
q21	4770	2448	1991	1991
q22	492	384	321	321
Total cold run time: 97490 ms
Total hot run time: 29460 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4716	4703	4847	4703
q2	q3	4654	4784	4189	4189
q4	2105	2186	1391	1391
q5	4988	5010	5254	5010
q6	196	168	137	137
q7	2087	1813	1606	1606
q8	3351	3133	3107	3107
q9	8491	8420	8469	8420
q10	4480	4489	4280	4280
q11	603	415	400	400
q12	686	757	521	521
q13	3311	3568	2887	2887
q14	312	314	280	280
q15	q16	762	767	707	707
q17	1467	1319	1257	1257
q18	7983	7042	7027	7027
q19	1145	1159	1148	1148
q20	2232	2258	1982	1982
q21	6062	5349	4860	4860
q22	547	500	420	420
Total cold run time: 60178 ms
Total hot run time: 54332 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 172487 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b370b629466132ba75d960920d0df1031c1d5334, data reload: false

query5	4308	652	517	517
query6	336	215	202	202
query7	4253	572	306	306
query8	324	228	229	228
query9	8802	4045	4020	4020
query10	453	347	308	308
query11	5828	2350	2245	2245
query12	187	134	127	127
query13	1276	616	434	434
query14	5973	5401	5090	5090
query14_1	4403	4368	4367	4367
query15	217	210	197	197
query16	997	494	428	428
query17	1155	800	648	648
query18	2536	506	370	370
query19	224	215	172	172
query20	147	134	133	133
query21	221	141	121	121
query22	13679	14025	14547	14025
query23	17428	16615	16162	16162
query23_1	16334	16330	16330	16330
query24	7364	1737	1335	1335
query24_1	1344	1314	1352	1314
query25	572	499	416	416
query26	1303	297	163	163
query27	2718	610	330	330
query28	4421	1984	1955	1955
query29	1007	624	555	555
query30	320	242	201	201
query31	1111	1057	934	934
query32	82	69	74	69
query33	530	340	291	291
query34	1179	1172	632	632
query35	753	780	656	656
query36	1326	1405	1196	1196
query37	153	100	87	87
query38	3206	3119	3071	3071
query39	923	916	918	916
query39_1	876	881	882	881
query40	228	148	134	134
query41	63	59	61	59
query42	109	104	102	102
query43	313	329	284	284
query44	
query45	214	204	196	196
query46	1041	1202	713	713
query47	2349	2319	2269	2269
query48	397	410	317	317
query49	631	533	430	430
query50	684	285	213	213
query51	4284	4249	4225	4225
query52	104	104	92	92
query53	255	272	195	195
query54	311	282	278	278
query55	91	88	83	83
query56	291	311	293	293
query57	1439	1408	1323	1323
query58	306	264	267	264
query59	1540	1616	1425	1425
query60	341	334	325	325
query61	180	156	154	154
query62	667	624	556	556
query63	253	198	203	198
query64	2467	826	679	679
query65	
query66	1760	514	386	386
query67	29985	29997	29876	29876
query68	
query69	456	338	299	299
query70	989	1012	941	941
query71	296	271	268	268
query72	2947	2746	2493	2493
query73	856	784	422	422
query74	5070	4885	4752	4752
query75	2784	2648	2330	2330
query76	2305	1131	765	765
query77	424	438	362	362
query78	12922	13041	12365	12365
query79	1455	1003	772	772
query80	706	621	538	538
query81	456	289	240	240
query82	1374	168	129	129
query83	366	286	262	262
query84	279	149	123	123
query85	974	511	438	438
query86	407	350	313	313
query87	3415	3362	3265	3265
query88	3525	2689	2693	2689
query89	437	380	333	333
query90	1883	180	178	178
query91	172	164	141	141
query92	79	80	73	73
query93	965	936	546	546
query94	530	338	301	301
query95	689	470	337	337
query96	1004	769	346	346
query97	2702	2701	2579	2579
query98	241	232	225	225
query99	1153	1109	994	994
Total cold run time: 253424 ms
Total hot run time: 172487 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (8/8) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.62% (27742/37684)
Line Coverage 57.54% (300629/522510)
Region Coverage 54.87% (251387/458142)
Branch Coverage 56.31% (108599/192856)

@liaoxin01
Copy link
Copy Markdown
Contributor

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review completed for PR 63209.

Summary opinion: no blocking issues found. The change is small and targeted: partition replacement now finishes the create-partition callback, including incremental open/open_wait, before publishing the new partition/tablet mapping into local routing. This matches the stated race and avoids exposing new tablet ids before the sender-side channels/streams are ready.

Critical checkpoint conclusions:

  • Goal and proof: the code addresses the described auto-detect insert overwrite race, and the updated unit test verifies the callback runs before local replacement is visible.
  • Scope: the modification is focused on callback/routing order plus a targeted test harness extension.
  • Concurrency: the affected behavior is a routing publication race; this PR reduces the unsafe window by keeping old routing visible until incremental open completes. I did not find a new lock-order or dependency issue in the changed code.
  • Lifecycle/static state: no new static/global lifecycle concerns were introduced.
  • Configuration/compatibility: no new config, wire format, or storage-format compatibility changes.
  • Parallel paths: both create-partition and replace-partition paths now publish routing after callback completion; V1/V2 writer callbacks consume the same result shape.
  • Error handling: callback and add/replace failures are propagated with RETURN_IF_ERROR; no ignored Status in the changed lines.
  • Tests: unit coverage was updated for the replace ordering. I did not run the BE unit test in this review environment.
  • Observability/performance: no additional observability appears necessary for this small ordering fix; the extra copy in cast_as_create_result is not on a hot path and avoids moving from the original result before later use.

User focus: no additional user-provided review focus was specified.

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (8/8) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.61% (27740/37684)
Line Coverage 57.54% (300628/522510)
Region Coverage 54.88% (251407/458142)
Branch Coverage 56.31% (108596/192856)

Copy link
Copy Markdown
Contributor

@liaoxin01 liaoxin01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@liaoxin01 liaoxin01 merged commit 1d18465 into apache:master May 14, 2026
33 of 34 checks passed
github-actions Bot pushed a commit that referenced this pull request May 14, 2026
…mental open (#63209)

### What problem does this PR solve?

Problem Summary:

In auto-detect insert overwrite, BE sender could publish newly replaced
temporary partitions to local row routing before incremental open
finished on target BEs.

The race was:

1. One sender calls FE `replacePartition` and receives new temporary
partition/tablet metadata.
2. The sender records the new partition id and replaces local
`_vpartition` routing first.
3. Another concurrent batch can then route rows to the new tablet.
4. The first sender has not finished incremental open yet, so the target
BE may not have created the delta writer for that tablet.
5. The target BE returns `unknown tablet to append data`.

This PR makes the sender finish `_create_partition_callback`, including
incremental open/open_wait, before publishing the new partition/tablet
to local routing and marking the new partition as handled.
github-actions Bot pushed a commit that referenced this pull request May 14, 2026
…mental open (#63209)

### What problem does this PR solve?

Problem Summary:

In auto-detect insert overwrite, BE sender could publish newly replaced
temporary partitions to local row routing before incremental open
finished on target BEs.

The race was:

1. One sender calls FE `replacePartition` and receives new temporary
partition/tablet metadata.
2. The sender records the new partition id and replaces local
`_vpartition` routing first.
3. Another concurrent batch can then route rows to the new tablet.
4. The first sender has not finished incremental open yet, so the target
BE may not have created the delta writer for that tablet.
5. The target BE returns `unknown tablet to append data`.

This PR makes the sender finish `_create_partition_callback`, including
incremental open/open_wait, before publishing the new partition/tablet
to local routing and marking the new partition as handled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants