Skip to content

[Fix](cloud-mow) Full compaction should only update delete bitmap of its output rowset #50974

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

bobhan1
Copy link
Contributor

@bobhan1 bobhan1 commented May 16, 2025

What problem does this PR solve?

When table has sequence column, full compaction may generate delete bitmap marks on published rowset when calculating delete bitmaps for incremental rowsets. This will cause correctness problem when those published rowset has multi segments and have duplicate keys between those segments because full compaction will override the origin delete bitmaps on those segment currently.
To fix it, full compaction should only update delete bitmaps of its own output rowset.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bobhan1
Copy link
Contributor Author

bobhan1 commented May 16, 2025

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 33874 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 97ebec4064b3714bb62d95b62b712edcfdb8ab4d, data reload: false

------ Round 1 ----------------------------------
q1	25866	5039	5011	5011
q2	2068	279	181	181
q3	10390	1225	666	666
q4	10226	995	525	525
q5	7527	2337	2336	2336
q6	177	165	133	133
q7	907	757	608	608
q8	9318	1291	1132	1132
q9	6703	5100	5141	5100
q10	6917	2339	1925	1925
q11	500	292	282	282
q12	355	362	219	219
q13	17879	3703	3056	3056
q14	220	231	212	212
q15	511	490	488	488
q16	422	433	376	376
q17	614	847	374	374
q18	7506	7165	7194	7165
q19	1572	952	563	563
q20	337	339	226	226
q21	3880	3156	2358	2358
q22	1032	1006	938	938
Total cold run time: 114927 ms
Total hot run time: 33874 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5107	5069	5023	5023
q2	232	330	228	228
q3	2132	2638	2293	2293
q4	1313	1721	1316	1316
q5	4349	4404	4421	4404
q6	212	173	130	130
q7	2010	1936	1772	1772
q8	2564	2585	2566	2566
q9	7295	7177	6910	6910
q10	3048	3162	2749	2749
q11	579	506	488	488
q12	666	782	615	615
q13	3476	3827	3270	3270
q14	295	301	279	279
q15	543	509	510	509
q16	457	495	433	433
q17	1134	1558	1390	1390
q18	7723	7655	7518	7518
q19	800	820	909	820
q20	2038	1987	1839	1839
q21	4781	4489	4537	4489
q22	1066	1033	1051	1033
Total cold run time: 51820 ms
Total hot run time: 50074 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 192929 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 97ebec4064b3714bb62d95b62b712edcfdb8ab4d, data reload: false

query1	1401	1087	1054	1054
query2	6341	1905	1869	1869
query3	11027	4648	4273	4273
query4	54261	25654	23179	23179
query5	5058	514	441	441
query6	358	209	206	206
query7	4885	528	295	295
query8	325	270	245	245
query9	5638	2672	2650	2650
query10	444	328	274	274
query11	15055	15101	14957	14957
query12	167	111	105	105
query13	1058	545	419	419
query14	10211	6369	6546	6369
query15	220	201	191	191
query16	7079	639	441	441
query17	1078	756	592	592
query18	1561	416	322	322
query19	218	196	178	178
query20	129	136	127	127
query21	206	135	110	110
query22	4403	4340	4358	4340
query23	34223	33533	33704	33533
query24	6538	2473	2432	2432
query25	468	472	410	410
query26	695	281	154	154
query27	2412	521	370	370
query28	3122	2176	2176	2176
query29	587	571	440	440
query30	271	220	189	189
query31	891	881	744	744
query32	73	64	65	64
query33	442	362	331	331
query34	824	866	558	558
query35	791	840	752	752
query36	983	1028	924	924
query37	117	104	86	86
query38	4365	4379	4284	4284
query39	1539	1505	1480	1480
query40	212	131	125	125
query41	56	54	57	54
query42	124	112	114	112
query43	529	518	496	496
query44	1418	856	858	856
query45	183	181	170	170
query46	906	1065	670	670
query47	1829	1855	1871	1855
query48	415	470	341	341
query49	700	518	446	446
query50	673	694	409	409
query51	4323	4265	4229	4229
query52	117	111	102	102
query53	241	261	189	189
query54	598	592	507	507
query55	88	89	86	86
query56	298	328	304	304
query57	1187	1189	1160	1160
query58	270	274	261	261
query59	2873	2920	2800	2800
query60	354	335	338	335
query61	129	135	130	130
query62	731	744	722	722
query63	233	192	196	192
query64	1684	1002	683	683
query65	4290	4259	4235	4235
query66	703	460	303	303
query67	16037	15614	15297	15297
query68	6995	905	515	515
query69	547	308	270	270
query70	1205	1114	1067	1067
query71	512	325	288	288
query72	5960	4799	4818	4799
query73	1431	688	359	359
query74	9181	9039	8784	8784
query75	3802	3191	2678	2678
query76	4194	1206	765	765
query77	645	377	285	285
query78	10251	10251	9380	9380
query79	2338	780	573	573
query80	620	514	451	451
query81	492	260	219	219
query82	431	122	95	95
query83	354	257	227	227
query84	295	110	83	83
query85	779	350	310	310
query86	386	309	275	275
query87	4489	4447	4302	4302
query88	3168	2297	2292	2292
query89	468	314	287	287
query90	1798	204	200	200
query91	145	143	110	110
query92	78	63	52	52
query93	1800	921	570	570
query94	666	420	302	302
query95	369	284	277	277
query96	508	591	284	284
query97	2744	2741	2653	2653
query98	229	212	204	204
query99	1457	1419	1340	1340
Total cold run time: 299002 ms
Total hot run time: 192929 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 29.27 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 97ebec4064b3714bb62d95b62b712edcfdb8ab4d, data reload: false

query1	0.03	0.03	0.03
query2	0.13	0.10	0.11
query3	0.25	0.20	0.20
query4	1.59	0.20	0.19
query5	0.45	0.45	0.45
query6	1.14	0.67	0.66
query7	0.02	0.02	0.01
query8	0.05	0.03	0.04
query9	0.59	0.51	0.53
query10	0.58	0.59	0.57
query11	0.16	0.11	0.11
query12	0.14	0.11	0.11
query13	0.62	0.60	0.61
query14	0.80	0.81	0.85
query15	0.89	0.87	0.86
query16	0.38	0.38	0.37
query17	1.05	1.04	1.01
query18	0.22	0.21	0.21
query19	1.89	1.86	1.83
query20	0.01	0.01	0.02
query21	15.40	0.87	0.54
query22	0.75	1.36	0.78
query23	14.73	1.39	0.62
query24	7.17	1.11	0.67
query25	0.52	0.21	0.23
query26	0.56	0.17	0.13
query27	0.07	0.06	0.05
query28	9.69	0.92	0.47
query29	12.53	3.96	3.33
query30	0.24	0.09	0.06
query31	2.83	0.59	0.40
query32	3.24	0.56	0.48
query33	2.99	3.08	3.07
query34	15.94	5.08	4.49
query35	4.49	4.49	4.50
query36	0.68	0.49	0.50
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.17	0.13	0.14
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 103.32 s
Total hot run time: 29.27 s

@bobhan1
Copy link
Contributor Author

bobhan1 commented May 16, 2025

run p0

@@ -377,7 +377,8 @@ Status CloudFullCompaction::_cloud_full_compaction_calc_delete_bitmap(
segments.begin(), segments.end(), 0,
[](size_t sum, const segment_v2::SegmentSharedPtr& s) { return sum += s->num_rows(); });
for (const auto& [k, v] : tmp_delete_bitmap->delete_bitmap) {
if (std::get<1>(k) != DeleteBitmap::INVALID_SEGMENT_ID) {
if (std::get<0>(k) == _output_rowset->rowset_id() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comments here

Copy link
Contributor

@zhannngchen zhannngchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 16, 2025
Copy link
Contributor

PR approved by anyone and no changes requested.

@hello-stephen
Copy link
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 55.87% (14883/26641)
Line Coverage 44.65% (131877/295363)
Region Coverage 43.76% (66405/151733)
Branch Coverage 38.36% (34022/88694)

Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dataroaring dataroaring merged commit af64ad9 into apache:master May 16, 2025
31 of 35 checks passed
github-actions bot pushed a commit that referenced this pull request May 16, 2025
…its output rowset (#50974)

### What problem does this PR solve?

When table has sequence column, full compaction may generate delete
bitmap marks on published rowset when calculating delete bitmaps for
incremental rowsets. This will cause correctness problem when those
published rowset has multi segments and have duplicate keys between
those segments because full compaction will **override** the origin
delete bitmaps on those segment currently.
To fix it, full compaction should only update delete bitmaps of its own
output rowset.
dataroaring pushed a commit that referenced this pull request May 17, 2025
…e bitmap of its output rowset #50974 (#50985)

Cherry-picked from #50974

Co-authored-by: bobhan1 <baohan@selectdb.com>
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…its output rowset (apache#50974)

### What problem does this PR solve?

When table has sequence column, full compaction may generate delete
bitmap marks on published rowset when calculating delete bitmaps for
incremental rowsets. This will cause correctness problem when those
published rowset has multi segments and have duplicate keys between
those segments because full compaction will **override** the origin
delete bitmaps on those segment currently.
To fix it, full compaction should only update delete bitmaps of its own
output rowset.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/3.0.6-merged p0_w reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants