Skip to content

[fix](be ut) Skip custom memcpy on ARM+ASAN to fix segfault at process startup #63656

Open

heguanhui wants to merge 1 commit into apache:master from heguanhui:fix/be-ut-coredump-with-asan-in-arm


@heguanhui
Contributor

@heguanhui heguanhui commented May 26, 2026

What problem does this PR solve?

The glibc-compatibility module provides a custom memcpy implementation (memcpy_aarch64.cpp) that is declared extern "C" and therefore overrides the global memcpy symbol. This avoids a dependency on a specific glibc symbol version (e.g., memcpy@@GLIBC_2.14) and keeps the binary portable.

However, libpthread's __pthread_initialize_minimal() calls memcpy very early in process startup: before main(), before C++ static initialization, and before ASAN shadow memory is set up. When ASAN is enabled, the custom memcpy touches ASAN shadow memory that has not been mapped yet, which results in a SIGSEGV.

This only affects aarch64 + ASAN because:

  • RELEASE builds have no ASAN shadow memory checks.
  • x86_64 + ASAN does not exhibit this crash (different shadow memory layout and initialization timing).

Problem Summary:

Release note

None

Check List (For Author)

  • Test
    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)

Attachment: 问题及修复.docx ("Problem and Fix")

    • No need to test or manual test. Explain why:
        • This is a refactor/code format and no logic has been changed.
        • Previous test can cover this change.
        • No code files have been changed.
        • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@heguanhui
Contributor Author

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.80% (20873/38797)
Line Coverage	37.37% (197684/528943)
Region Coverage	33.68% (154879/459919)
Branch Coverage	34.68% (67437/194478)

@hello-stephen
Contributor

TPC-H: Total hot run time: 31673 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 31a1597e3d02f5c8f32e4d531acb64184139f15f, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17770	4085	4078	4078
q2	q3	10783	1413	792	792
q4	4706	475	351	351
q5	7617	2276	2115	2115
q6	234	169	136	136
q7	956	769	639	639
q8	9382	1685	1494	1494
q9	5198	5042	4980	4980
q10	6375	2215	1874	1874
q11	431	266	241	241
q12	632	431	299	299
q13	18110	3317	2803	2803
q14	261	260	238	238
q15	q16	787	767	714	714
q17	907	959	1003	959
q18	6947	5931	5559	5559
q19	1203	1331	995	995
q20	655	470	313	313
q21	5998	2800	2720	2720
q22	456	373	483	373
Total cold run time: 99408 ms
Total hot run time: 31673 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4826	4818	4817	4817
q2	q3	4860	5297	4680	4680
q4	2143	2215	1432	1432
q5	5013	4646	4699	4646
q6	239	178	128	128
q7	1896	1722	1611	1611
q8	2404	2102	2172	2102
q9	8040	7441	7375	7375
q10	4796	4707	4215	4215
q11	532	389	363	363
q12	746	740	539	539
q13	3005	3421	2793	2793
q14	277	277	264	264
q15	q16	683	704	616	616
q17	1295	1282	1265	1265
q18	7446	6991	6872	6872
q19	1108	1115	1092	1092
q20	2238	2223	1958	1958
q21	5306	4626	4504	4504
q22	528	451	421	421
Total cold run time: 57381 ms
Total hot run time: 51693 ms

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.73% (28015/37997)
Line Coverage	57.62% (303999/527588)
Region Coverage	54.74% (254196/464342)
Branch Coverage	56.27% (109851/195204)

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172256 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 31a1597e3d02f5c8f32e4d531acb64184139f15f, data reload: false

query5	4299	659	509	509
query6	354	219	197	197
query7	4220	560	296	296
query8	326	228	231	228
query9	8833	4069	4041	4041
query10	441	338	287	287
query11	5792	2504	2246	2246
query12	192	126	120	120
query13	1258	612	427	427
query14	6160	5491	5158	5158
query14_1	4464	4508	4466	4466
query15	214	204	184	184
query16	1008	458	429	429
query17	947	703	589	589
query18	2450	482	360	360
query19	221	206	179	179
query20	131	133	136	133
query21	227	145	123	123
query22	13613	13590	13396	13396
query23	17385	16634	16325	16325
query23_1	16424	16397	16520	16397
query24	7569	1772	1332	1332
query24_1	1349	1326	1328	1326
query25	570	500	439	439
query26	1319	310	172	172
query27	2702	556	344	344
query28	4517	2012	2030	2012
query29	1040	644	539	539
query30	319	249	201	201
query31	1158	1099	961	961
query32	99	82	73	73
query33	553	381	311	311
query34	1168	1128	652	652
query35	784	805	698	698
query36	1443	1438	1264	1264
query37	155	107	93	93
query38	3242	3132	3140	3132
query39	955	941	908	908
query39_1	883	883	880	880
query40	235	153	130	130
query41	76	69	69	69
query42	115	111	111	111
query43	333	339	299	299
query44	
query45	222	207	200	200
query46	1087	1194	778	778
query47	2397	2404	2311	2311
query48	413	424	313	313
query49	650	505	398	398
query50	959	363	258	258
query51	4386	4356	4264	4264
query52	112	108	97	97
query53	257	291	211	211
query54	331	283	283	283
query55	95	95	88	88
query56	319	328	323	323
query57	1463	1434	1366	1366
query58	305	277	272	272
query59	1635	1759	1438	1438
query60	321	319	305	305
query61	154	148	152	148
query62	692	666	587	587
query63	246	199	202	199
query64	2402	800	620	620
query65	
query66	1705	484	360	360
query67	29733	29711	29568	29568
query68	
query69	478	344	315	315
query70	1014	965	1034	965
query71	311	270	259	259
query72	3177	2734	2379	2379
query73	806	733	432	432
query74	5138	4957	4787	4787
query75	2712	2612	2273	2273
query76	2390	1156	787	787
query77	394	415	331	331
query78	12369	12518	11901	11901
query79	1438	1066	722	722
query80	659	533	446	446
query81	453	282	238	238
query82	1397	160	122	122
query83	355	270	251	251
query84	317	144	148	144
query85	889	531	444	444
query86	400	328	329	328
query87	3448	3372	3227	3227
query88	3669	2744	2760	2744
query89	439	391	340	340
query90	1952	184	173	173
query91	178	166	137	137
query92	82	83	86	83
query93	1508	1444	910	910
query94	531	343	296	296
query95	669	387	442	387
query96	1134	844	331	331
query97	2737	2751	2632	2632
query98	239	227	227	227
query99	1192	1181	1032	1032
Total cold run time: 254826 ms
Total hot run time: 172256 ms

heguanhui added a commit to heguanhui/doris that referenced this pull request May 26, 2026
@zclllyybb
Contributor

/review

github-actions (bot) added the "approved" label (Indicates a PR has been approved by one committer.) on May 26, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@github-actions github-actions Bot left a comment


Requesting changes because the new ARM sanitizer guard does not cover the BE unit-test build mode that the PR title says it fixes.

Critical checkpoint conclusions:

  • Goal/test proof: The goal is to avoid custom memcpy on ARM sanitizer BE UT startup, but the current condition only handles CMake build type ASAN while run-be-ut.sh passes ASAN_UT for the default ASAN UT build. No test evidence in the PR demonstrates the BE UT path is fixed.
  • Scope/focus: The code change is small and focused, but the predicate is too narrow for the targeted build path.
  • Concurrency/lifecycle/config/compatibility: No new runtime concurrency, lifecycle, config, storage-format, or protocol compatibility concerns found in this CMake-only change.
  • Parallel paths: Regular BE ASAN is handled, but BE UT ASAN_UT is a parallel sanitizer build path and is missed.
  • Test coverage: No automated coverage was added; at minimum the affected ARM ASAN_UT configuration should be validated.
  • Performance/observability: No runtime performance or observability impact beyond build linkage.

User focus: No additional user-provided review focus was specified.

# the global memcpy symbol. libpthread's __pthread_initialize_minimal() calls memcpy
# before ASAN shadow memory is initialized, causing SIGSEGV. Skip custom memcpy in
# this case and fall back to glibc's memcpy.
if (ARCH_ARM AND "${CMAKE_BUILD_TYPE}" STREQUAL "ASAN")
Contributor


This misses the BE unit-test build mode. run-be-ut.sh defaults BUILD_TYPE_UT to ASAN, but before invoking CMake it maps that to -DCMAKE_BUILD_TYPE=ASAN_UT (lines 245-254). On ARM BE UTs, this condition is therefore false and memcpy_aarch64.cpp is still added to glibc-compatibility-explicit, leaving the startup crash described in the PR title unfixed. Please include ASAN_UT (or otherwise key off the sanitizer flags) in this guard.
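A minimal sketch of the guard the review asks for, assuming ASAN and ASAN_UT are the only build types that need the custom memcpy skipped on ARM. ARCH_ARM and CMAKE_BUILD_TYPE come from the snippet above. MEMCPY_SKIP_BUILD_TYPES is a hypothetical variable name, and the else() branch is only a placeholder for whatever the existing CMakeLists.txt already does to add memcpy_aarch64.cpp to glibc-compatibility-explicit.

```cmake
# Hypothetical sketch of the widened guard (variable name is illustrative).
# ASAN is the regular BE sanitizer build; ASAN_UT is what run-be-ut.sh passes
# as -DCMAKE_BUILD_TYPE for the default BE UT build.
set(MEMCPY_SKIP_BUILD_TYPES ASAN ASAN_UT)

if (ARCH_ARM AND "${CMAKE_BUILD_TYPE}" IN_LIST MEMCPY_SKIP_BUILD_TYPES)
    # libpthread's __pthread_initialize_minimal() calls memcpy before ASAN
    # shadow memory is initialized, so leave memcpy_aarch64.cpp out and let
    # the process use glibc's memcpy instead.
    message(STATUS "ARM + ${CMAKE_BUILD_TYPE}: skipping custom memcpy_aarch64.cpp")
else ()
    # Placeholder: keep the existing logic that adds memcpy_aarch64.cpp
    # to glibc-compatibility-explicit here, unchanged.
endif ()
```

IN_LIST requires CMake 3.3 or later with policy CMP0057 set to NEW, which any cmake_minimum_required(VERSION 3.3) or later call does. An equivalent check without a list variable is `"${CMAKE_BUILD_TYPE}" MATCHES "^ASAN(_UT)?$"`. Naming the build types explicitly keeps the skip from reaching other build types by accident, at the cost of updating the list if another sanitizer build type is added later.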


Labels

approved (Indicates a PR has been approved by one committer.), reviewed

Projects

None yet


3 participants