@bobhan1
Contributor

@bobhan1 bobhan1 commented Aug 6, 2025

pick #53661

We found a bug where, in rare situations, rows may be distributed to the
wrong tablet when loading into a bucket hash table. The root cause is
currently hard to find.

To make debugging easier, this PR:
1. Adds a function `crc32_internal` that mimics the hashing method used
when the sink node distributes data for bucket hash tables.
2. Checks that the data in all remaining bucket hash tables is correct
after all tests have finished.
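
For readers unfamiliar with the check, here is a minimal sketch of the idea, not the code in this PR. It assumes a row's bucket is the CRC32 of its distribution key modulo the bucket count, and it uses Python's `zlib.crc32` as a stand-in for the sink node's hash. The names `expected_bucket` and `find_misplaced_rows` are hypothetical, and the real `crc32_internal` may combine column values differently.

```python
import zlib
from typing import Dict, List, Tuple


def expected_bucket(key: bytes, num_buckets: int) -> int:
    """Map a distribution key to a bucket index: CRC32(key) mod bucket count."""
    # Assumption: zlib.crc32 stands in for the hash used by the sink node.
    return zlib.crc32(key) % num_buckets


def find_misplaced_rows(
    buckets: Dict[int, List[bytes]], num_buckets: int
) -> List[Tuple[bytes, int, int]]:
    """Return (key, actual_bucket, expected_bucket) for each row stored in the wrong bucket."""
    misplaced = []
    for actual, keys in buckets.items():
        for key in keys:
            expected = expected_bucket(key, num_buckets)
            if expected != actual:
                misplaced.append((key, actual, expected))
    return misplaced
```

An empty result means every row sits in the bucket its hash selects. Any entry identifies a row that was routed to the wrong tablet, together with the bucket it should have landed in.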
@bobhan1 bobhan1 requested a review from morrySnow as a code owner August 6, 2025 03:03
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bobhan1
Contributor Author

bobhan1 commented Aug 6, 2025

run buildall

@bobhan1
Contributor Author

bobhan1 commented Aug 6, 2025

run buildall

@bobhan1 bobhan1 force-pushed the branch-3.1-pick-53661 branch from f8cdd63 to 3ef19d6 Compare August 6, 2025 03:42
@bobhan1
Contributor Author

bobhan1 commented Aug 6, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32792 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3ef19d67484d67459c5a5eb2d8faa9ef06b7da71, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17621	5567	5640	5567
q2	2063	280	177	177
q3	10781	1277	764	764
q4	10561	890	455	455
q5	9607	2568	2135	2135
q6	190	167	136	136
q7	908	761	600	600
q8	9351	1450	1191	1191
q9	5354	4959	4939	4939
q10	6779	2244	1800	1800
q11	489	287	270	270
q12	335	355	213	213
q13	17780	3563	2985	2985
q14	233	233	213	213
q15	550	476	462	462
q16	418	439	385	385
q17	603	862	353	353
q18	6836	6334	6391	6334
q19	1639	964	586	586
q20	330	351	208	208
q21	2918	2187	2034	2034
q22	1069	1029	985	985
Total cold run time: 106415 ms
Total hot run time: 32792 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5590	5531	5470	5470
q2	239	328	232	232
q3	2250	2618	2314	2314
q4	1391	1874	1405	1405
q5	4439	5020	4979	4979
q6	169	161	131	131
q7	2063	1923	1808	1808
q8	2570	3105	2679	2679
q9	7255	7237	7232	7232
q10	3030	3324	2728	2728
q11	570	500	492	492
q12	651	748	642	642
q13	3391	3779	3150	3150
q14	282	302	263	263
q15	513	461	481	461
q16	453	473	428	428
q17	1213	1704	1236	1236
q18	7544	7339	7382	7339
q19	855	1142	1083	1083
q20	2034	2054	1878	1878
q21	5313	4891	4636	4636
q22	1130	1050	1009	1009
Total cold run time: 52945 ms
Total hot run time: 51595 ms

@doris-robot

TPC-DS: Total hot run time: 199188 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3ef19d67484d67459c5a5eb2d8faa9ef06b7da71, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1301	966	920	920
query2	6247	1979	1905	1905
query3	10894	4529	4530	4529
query4	33222	23993	23603	23603
query5	3643	624	482	482
query6	284	216	205	205
query7	3987	498	330	330
query8	308	242	231	231
query9	9551	2590	2548	2548
query10	448	333	251	251
query11	18211	15611	15443	15443
query12	163	115	109	109
query13	1569	559	429	429
query14	9045	7877	6944	6944
query15	232	208	188	188
query16	7904	691	520	520
query17	1627	773	607	607
query18	2115	446	338	338
query19	227	208	186	186
query20	132	121	121	121
query21	207	136	117	117
query22	4579	4656	4453	4453
query23	35324	34783	34605	34605
query24	7574	2731	2762	2731
query25	505	492	436	436
query26	1228	269	176	176
query27	2087	501	369	369
query28	5792	2264	2210	2210
query29	621	591	474	474
query30	249	216	181	181
query31	1086	922	889	889
query32	71	63	60	60
query33	542	375	340	340
query34	781	897	559	559
query35	786	825	744	744
query36	1028	1048	961	961
query37	110	100	70	70
query38	4121	4084	4023	4023
query39	1578	1487	1452	1452
query40	214	128	113	113
query41	54	50	49	49
query42	122	106	104	104
query43	501	528	473	473
query44	1396	822	826	822
query45	190	186	172	172
query46	907	1071	688	688
query47	2024	2007	1942	1942
query48	420	433	354	354
query49	783	514	444	444
query50	705	714	450	450
query51	7446	7354	7328	7328
query52	103	103	99	99
query53	241	267	194	194
query54	546	591	494	494
query55	80	89	78	78
query56	291	309	263	263
query57	1301	1263	1211	1211
query58	249	250	221	221
query59	3067	3184	3005	3005
query60	296	283	282	282
query61	119	121	117	117
query62	776	756	685	685
query63	239	203	207	203
query64	3821	1029	677	677
query65	3506	3331	3331	3331
query66	1071	432	348	348
query67	16484	15791	15546	15546
query68	7674	868	547	547
query69	484	319	272	272
query70	1200	1130	1062	1062
query71	428	295	274	274
query72	5770	3968	3887	3887
query73	633	752	361	361
query74	10181	8977	9328	8977
query75	3376	3221	2700	2700
query76	3378	1214	788	788
query77	751	371	304	304
query78	11055	10808	9948	9948
query79	3561	877	609	609
query80	708	559	444	444
query81	499	273	235	235
query82	622	122	92	92
query83	199	169	151	151
query84	283	108	85	85
query85	800	411	308	308
query86	347	303	301	301
query87	4388	4374	4270	4270
query88	4693	2425	2409	2409
query89	425	333	298	298
query90	1801	194	193	193
query91	143	141	112	112
query92	67	57	56	56
query93	1864	892	550	550
query94	681	409	277	277
query95	354	287	277	277
query96	496	619	282	282
query97	3240	3318	3143	3143
query98	222	217	207	207
query99	1557	1435	1337	1337
Total cold run time: 298211 ms
Total hot run time: 199188 ms

@doris-robot

ClickBench: Total hot run time: 28.89 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3ef19d67484d67459c5a5eb2d8faa9ef06b7da71, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.03
query2	0.06	0.04	0.03
query3	0.24	0.06	0.07
query4	1.62	0.10	0.11
query5	0.51	0.52	0.53
query6	1.13	0.74	0.74
query7	0.02	0.02	0.01
query8	0.05	0.03	0.04
query9	0.59	0.52	0.50
query10	0.56	0.54	0.55
query11	0.14	0.11	0.10
query12	0.13	0.12	0.11
query13	0.63	0.59	0.59
query14	0.77	0.83	0.78
query15	0.85	0.82	0.82
query16	0.39	0.39	0.40
query17	1.05	1.06	1.03
query18	0.24	0.22	0.23
query19	1.92	1.88	1.83
query20	0.02	0.00	0.01
query21	15.38	0.90	0.56
query22	0.76	0.81	0.64
query23	15.13	1.47	0.52
query24	2.71	0.97	1.15
query25	0.12	0.13	0.08
query26	0.29	0.15	0.14
query27	0.06	0.04	0.06
query28	13.66	1.03	0.43
query29	12.56	3.92	3.30
query30	0.24	0.09	0.07
query31	2.83	0.60	0.38
query32	3.23	0.54	0.45
query33	2.97	3.01	3.01
query34	16.42	5.20	4.57
query35	4.61	4.58	4.51
query36	0.64	0.48	0.48
query37	0.09	0.06	0.06
query38	0.04	0.04	0.03
query39	0.04	0.02	0.02
query40	0.17	0.14	0.13
query41	0.07	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 103.04 s
Total hot run time: 28.89 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 7.50% (3/40) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 45.41% (12645/27849)
Line Coverage 36.26% (112692/310813)
Region Coverage 35.31% (58254/164999)
Branch Coverage 32.50% (31677/97472)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 8.33% (1/12) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 95.00% (38/40) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 76.23% (20975/27517)
Line Coverage 69.59% (216151/310606)
Region Coverage 67.60% (129350/191340)
Branch Coverage 61.18% (67303/110002)

@morrySnow morrySnow changed the title from "branch-3.1: [Debug](distribute) Check bucket hash table before quit (#53661)" to "branch-3.1: [Debug](distribute) Check bucket hash table before quit #53661" Aug 6, 2025
@morrySnow morrySnow merged commit 43af1d4 into apache:branch-3.1 Aug 6, 2025
21 of 22 checks passed
4 participants