Skip to content

[fix](be) Deduplicate cluster key MOW compaction output#63116

Open
mrhhsg wants to merge 1 commit into
apache:masterfrom
mrhhsg:fix_compaction
Open

[fix](be) Deduplicate cluster key MOW compaction output#63116
mrhhsg wants to merge 1 commit into
apache:masterfrom
mrhhsg:fix_compaction

Conversation

@mrhhsg
Copy link
Copy Markdown
Member

@mrhhsg mrhhsg commented May 10, 2026

Issue Number: None

Related PR: None

Problem Summary: Cluster-key MOW compaction sorts rows by cluster key, so duplicate unique keys may be non-adjacent and can remain visible in the output rowset. Scan the output rowset primary key index after compaction and add output-rowset internal delete bitmap entries for older duplicate unique-key rows.

None

  • Test: Unit Test
    • Ran ./run-be-ut.sh --run --filter=VerticalCompactionTest.ClusterKeyMowCompactionNeedsOutputRowsetInternalDedup -j 8
  • Behavior changed: No
  • Does this need documentation: No

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented May 10, 2026

/review

@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented May 10, 2026

run buildall

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix correctly targets cluster-key MoW compaction output in the shared-nothing path, but it is not wired into the cloud compaction path, so the same duplicate-visible-row bug remains in cloud mode.

Critical checkpoints:

  • Goal/test: the goal is to delete older duplicate unique-key rows produced by cluster-key MoW compaction. The added unit test covers local vertical compaction only; cloud compaction is not covered.
  • Scope/focus: the helper is focused, but the call site is incomplete because cloud has a parallel compaction delete-bitmap path.
  • Concurrency/lifecycle: no new shared mutable state or lock-order issue found in the helper itself; it runs before delete-bitmap merge/persistence.
  • Config/compatibility: no new config or storage-format compatibility issue found.
  • Parallel paths: blocking issue found: CloudTablet::calc_delete_bitmap_for_compaction is not updated.
  • Tests: existing added test does not exercise the cloud path, which is where the missing call is.
  • Observability/performance: no additional blocking observability issue found; the helper scans output PK indexes as expected for this correctness fix.

User focus: no additional user-provided review focus was supplied.

// may be non-adjacent in merge order. Scan the output primary key index to delete older
// duplicate rows inside the output rowset.
if (!tablet()->tablet_schema()->cluster_key_uids().empty()) {
RETURN_IF_ERROR(tablet()->calc_compaction_output_rowset_internal_delete_bitmap(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only fixes the shared-nothing CompactionMixin::modify_rowsets() path. Cloud compactions do not call this method; CloudCumulativeCompaction::modify_rowsets() calls CloudTablet::calc_delete_bitmap_for_compaction() instead, and that function still only converts input-rowset delete bitmaps before/after the meta-service lock. For a cloud cluster-key MoW table with the same shape as the new test (older (uk=1, ck=30) and newer (uk=1, ck=20) sorted by cluster key into the same output rowset), the output rowset is committed without any internal delete bitmap entry, so both duplicates can remain visible. Please apply calc_compaction_output_rowset_internal_delete_bitmap() in the cloud delete-bitmap calculation path as well and add coverage for that path.

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 29455 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7bea25047fc68e0d78885ab1d5407921ac7361dc, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17743	3833	3810	3810
q2	q3	10738	865	608	608
q4	4663	459	348	348
q5	7444	1333	1139	1139
q6	183	166	140	140
q7	896	940	755	755
q8	9328	1397	1294	1294
q9	5693	5420	5322	5322
q10	6298	2089	1831	1831
q11	479	261	257	257
q12	638	409	292	292
q13	18104	3314	2728	2728
q14	288	282	263	263
q15	q16	899	868	791	791
q17	1016	996	826	826
q18	6481	5664	5564	5564
q19	1210	1250	1068	1068
q20	505	394	253	253
q21	4540	2247	1860	1860
q22	421	358	306	306
Total cold run time: 97567 ms
Total hot run time: 29455 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4143	4105	4073	4073
q2	q3	4628	4736	4160	4160
q4	2090	2165	1388	1388
q5	4957	4857	4998	4857
q6	194	161	131	131
q7	2065	1788	1692	1692
q8	3571	3251	3167	3167
q9	8545	8449	8454	8449
q10	4510	4539	4222	4222
q11	613	417	413	413
q12	717	738	519	519
q13	3199	3618	2980	2980
q14	290	299	268	268
q15	q16	771	783	681	681
q17	1325	1311	1267	1267
q18	8256	7169	7043	7043
q19	1152	1183	1122	1122
q20	2223	2214	1937	1937
q21	6114	5330	5022	5022
q22	557	511	442	442
Total cold run time: 59920 ms
Total hot run time: 53833 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 170799 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7bea25047fc68e0d78885ab1d5407921ac7361dc, data reload: false

query5	4310	664	524	524
query6	357	232	209	209
query7	4255	583	312	312
query8	341	259	262	259
query9	8836	4019	4019	4019
query10	480	358	312	312
query11	6022	2418	2184	2184
query12	194	143	131	131
query13	1280	632	436	436
query14	6878	5388	5046	5046
query14_1	4408	4407	4356	4356
query15	214	204	182	182
query16	1001	459	360	360
query17	1388	772	644	644
query18	2736	509	371	371
query19	339	212	182	182
query20	146	140	137	137
query21	215	148	117	117
query22	13561	13894	14294	13894
query23	17333	16611	16139	16139
query23_1	16347	16451	16235	16235
query24	7675	1796	1433	1433
query24_1	1415	1406	1455	1406
query25	644	520	500	500
query26	1301	311	164	164
query27	2698	627	341	341
query28	4321	1957	1962	1957
query29	949	619	518	518
query30	298	235	198	198
query31	1108	1052	937	937
query32	81	77	70	70
query33	513	345	283	283
query34	1138	1111	629	629
query35	775	777	671	671
query36	1321	1348	1166	1166
query37	152	99	88	88
query38	3172	3137	3073	3073
query39	920	920	895	895
query39_1	875	860	891	860
query40	245	154	176	154
query41	62	62	63	62
query42	111	106	105	105
query43	319	318	284	284
query44	
query45	208	209	195	195
query46	1111	1163	706	706
query47	2288	2287	2193	2193
query48	398	410	292	292
query49	633	522	423	423
query50	714	283	208	208
query51	4298	4279	4208	4208
query52	106	103	101	101
query53	260	275	199	199
query54	306	271	260	260
query55	100	88	85	85
query56	294	301	311	301
query57	1456	1390	1316	1316
query58	294	273	268	268
query59	1559	1597	1398	1398
query60	351	331	330	330
query61	161	158	163	158
query62	673	614	521	521
query63	245	201	201	201
query64	2360	813	690	690
query65	
query66	1682	517	394	394
query67	29917	30002	29816	29816
query68	
query69	466	339	309	309
query70	1034	967	962	962
query71	307	281	268	268
query72	3011	2756	2501	2501
query73	832	782	448	448
query74	5089	4867	4723	4723
query75	2806	2674	2330	2330
query76	2298	1110	736	736
query77	424	427	344	344
query78	12977	12946	12340	12340
query79	1520	1007	761	761
query80	1373	581	495	495
query81	511	279	240	240
query82	1061	163	129	129
query83	341	280	255	255
query84	255	145	132	132
query85	931	503	443	443
query86	454	320	316	316
query87	3409	3340	3206	3206
query88	3615	2705	2664	2664
query89	437	380	337	337
query90	1942	183	182	182
query91	178	168	134	134
query92	85	73	73	73
query93	1053	964	555	555
query94	709	335	296	296
query95	662	459	350	350
query96	996	771	349	349
query97	2709	2687	2609	2609
query98	236	232	229	229
query99	1105	1104	982	982
Total cold run time: 255826 ms
Total hot run time: 170799 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 71.43% (120/168) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.57% (20656/38556)
Line Coverage 37.20% (195030/524280)
Region Coverage 33.52% (152013/453548)
Branch Coverage 34.58% (66342/191873)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 72.02% (121/168) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.70% (27829/37760)
Line Coverage 57.56% (300978/522907)
Region Coverage 54.76% (250802/457968)
Branch Coverage 56.28% (108396/192601)

Issue Number: None

Related PR: None

Problem Summary: Cluster-key MOW compaction sorts rows by cluster key, so duplicate unique keys may be non-adjacent and can remain visible in the output rowset. Scan the output rowset primary key index after compaction and add output-rowset internal delete bitmap entries for older duplicate unique-key rows.

None

- Test: Unit Test
    - Ran ./run-be-ut.sh --run --filter=VerticalCompactionTest.ClusterKeyMowCompactionNeedsOutputRowsetInternalDedup -j 8
- Behavior changed: No
- Does this need documentation: No
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants