Skip to content

[fix](be) Remove unsafe JsonbWriter key overload#63355

Merged
HappenLee merged 1 commit into
apache:masterfrom
mrhhsg:codex/remove-jsonb-writer-cstr-key
May 20, 2026
Merged

[fix](be) Remove unsafe JsonbWriter key overload#63355
HappenLee merged 1 commit into
apache:masterfrom
mrhhsg:codex/remove-jsonb-writer-cstr-key

Conversation

@mrhhsg
Copy link
Copy Markdown
Member

@mrhhsg mrhhsg commented May 18, 2026

What problem does this PR solve?

Related PR: N/A

Problem Summary: JsonbWriter::writeKey(const char*) inferred key length with strlen, which silently truncated byte strings that contained embedded NUL bytes. Remove this overload so callers must use the explicit-length API.

Release note

None

Check List (For Author)

  • Test: Unit Test
    • ./run-be-ut.sh --run --filter=JsonbDocumentTest.*:JsonbSerializeTest.*:JsonbParserTest.*:ConvertFieldToTypeTest.*
    • build-support/clang-format.sh
    • build-support/check-format.sh
    • build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (failed: existing clang-tidy environment/pre-existing diagnostics report missing stddef.h and unmatched NOLINTEND in be/src/core/types.h)
  • Behavior changed: Yes. Internal C++ callers can no longer use JsonbWriter::writeKey(const char*) and must pass an explicit key length.
  • Does this need documentation: No

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@mrhhsg mrhhsg force-pushed the codex/remove-jsonb-writer-cstr-key branch from 5c39e10 to 3cbbf65 Compare May 18, 2026 08:52
@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented May 18, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review result: no blocking issues found in the PR patch.

Critical checkpoint conclusions:

  • Goal/test: The PR removes the unsafe JsonbWriter::writeKey(const char*) overload and updates existing tests to use explicit lengths; the changed tests cover the affected call sites in JSONB unit tests.
  • Scope: The change is small and focused on the overload removal plus required test updates.
  • Concurrency/lifecycle/config/compatibility: Not applicable; no runtime concurrency, lifecycle, config, storage format, or protocol changes are introduced.
  • Parallel code paths: Existing C++ call sites either already pass explicit lengths/key ids or are updated in tests; no remaining single-argument char* use was found in the reviewed patch.
  • Error handling/memory/data correctness: No new Status handling or memory ownership paths are introduced. The explicit-length API preserves embedded NUL bytes that the removed strlen overload could truncate.
  • Tests: Relevant BE unit tests are updated; I did not rerun them in this review-only pass.
  • Observability/performance: Not applicable; this is compile-time API hardening.

User focus: no additional user-provided review focus was present.

@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented May 18, 2026

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.50% (20649/38598)
Line Coverage 37.16% (195116/525093)
Region Coverage 33.53% (152710/455412)
Branch Coverage 34.56% (66575/192614)

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31274 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3cbbf65fd1954ddc260fa0a11ef43516947a08c8, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17663	3897	3903	3897
q2	q3	10843	1397	819	819
q4	4686	482	349	349
q5	7633	2301	2116	2116
q6	236	176	137	137
q7	926	778	643	643
q8	9422	1616	1479	1479
q9	5240	4931	4953	4931
q10	6366	2131	1819	1819
q11	444	283	241	241
q12	637	433	288	288
q13	18110	3351	2786	2786
q14	264	253	234	234
q15	q16	821	776	710	710
q17	992	924	939	924
q18	6827	5739	5504	5504
q19	1299	1198	1055	1055
q20	505	413	330	330
q21	6263	2915	2672	2672
q22	486	373	340	340
Total cold run time: 99663 ms
Total hot run time: 31274 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4773	4500	4485	4485
q2	q3	4837	5222	4633	4633
q4	2092	2203	1414	1414
q5	4698	4843	4576	4576
q6	235	177	127	127
q7	1837	1759	1577	1577
q8	2359	2063	2005	2005
q9	7637	7587	7155	7155
q10	4482	4377	3983	3983
q11	534	378	346	346
q12	714	726	518	518
q13	3014	3354	2799	2799
q14	277	282	252	252
q15	q16	686	701	594	594
q17	1259	1236	1217	1217
q18	7285	6915	6704	6704
q19	1098	1133	1109	1109
q20	2230	2196	1917	1917
q21	5287	4582	4460	4460
q22	514	466	424	424
Total cold run time: 55848 ms
Total hot run time: 50295 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.60% (27821/37799)
Line Coverage 57.54% (301341/523711)
Region Coverage 54.79% (251924/459828)
Branch Coverage 56.31% (108874/193344)

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 169646 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3cbbf65fd1954ddc260fa0a11ef43516947a08c8, data reload: false

query5	4312	669	516	516
query6	331	228	201	201
query7	4282	600	295	295
query8	320	239	226	226
query9	8834	4049	4060	4049
query10	456	346	309	309
query11	5792	2432	2234	2234
query12	189	130	127	127
query13	1289	650	428	428
query14	5998	5417	5024	5024
query14_1	4312	4349	4335	4335
query15	211	203	191	191
query16	994	465	448	448
query17	1003	723	576	576
query18	2448	489	350	350
query19	212	200	160	160
query20	132	131	129	129
query21	213	139	120	120
query22	13753	13616	13428	13428
query23	17202	16347	16026	16026
query23_1	16170	16138	16214	16138
query24	7527	1800	1297	1297
query24_1	1323	1312	1332	1312
query25	588	516	448	448
query26	1329	360	181	181
query27	2671	610	347	347
query28	4504	1957	1979	1957
query29	986	663	532	532
query30	315	244	207	207
query31	1130	1089	949	949
query32	92	79	79	79
query33	559	371	309	309
query34	1180	1149	654	654
query35	779	778	691	691
query36	1302	1313	1136	1136
query37	161	110	99	99
query38	3214	3148	3069	3069
query39	956	926	899	899
query39_1	864	915	872	872
query40	233	155	131	131
query41	77	72	70	70
query42	119	116	116	116
query43	337	345	315	315
query44	
query45	220	214	201	201
query46	1076	1270	742	742
query47	2258	2269	2144	2144
query48	413	436	298	298
query49	657	514	404	404
query50	1038	353	256	256
query51	4330	4354	4273	4273
query52	117	111	97	97
query53	269	285	213	213
query54	330	281	289	281
query55	97	94	89	89
query56	311	325	325	325
query57	1418	1370	1317	1317
query58	338	291	280	280
query59	1590	1626	1450	1450
query60	341	373	315	315
query61	161	159	159	159
query62	670	619	554	554
query63	237	205	205	205
query64	2403	813	625	625
query65	
query66	1712	495	389	389
query67	30081	29961	29861	29861
query68	
query69	464	343	302	302
query70	1096	990	943	943
query71	314	284	275	275
query72	3108	2749	2438	2438
query73	862	800	432	432
query74	5135	4993	4699	4699
query75	2655	2579	2264	2264
query76	2342	1175	773	773
query77	422	416	336	336
query78	12160	12413	11627	11627
query79	1409	1044	762	762
query80	649	542	453	453
query81	461	279	238	238
query82	1422	160	124	124
query83	352	281	250	250
query84	306	147	113	113
query85	873	570	485	485
query86	380	351	325	325
query87	3425	3375	3207	3207
query88	3612	2667	2651	2651
query89	433	395	339	339
query90	1986	182	175	175
query91	178	167	138	138
query92	79	77	76	76
query93	1520	1391	837	837
query94	560	350	303	303
query95	657	484	356	356
query96	1045	752	328	328
query97	2748	2702	2546	2546
query98	237	233	234	233
query99	1091	1116	981	981
Total cold run time: 253411 ms
Total hot run time: 169646 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.60% (27820/37799)
Line Coverage 57.54% (301338/523711)
Region Coverage 54.78% (251906/459828)
Branch Coverage 56.31% (108870/193344)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.61% (27822/37799)
Line Coverage 57.55% (301391/523711)
Region Coverage 54.82% (252069/459828)
Branch Coverage 56.32% (108893/193344)

@mrhhsg mrhhsg marked this pull request as ready for review May 19, 2026 03:31
Comment thread be/src/util/jsonb_writer.h Outdated
}

bool writeKey(const char* key) { return writeKey(key, strlen(key)); }
bool writeKey(const char* key) = delete;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

直接删了吧

### What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary: JsonbWriter::writeKey(const char*) used strlen to infer the key length, which made callers vulnerable to silent truncation when passing data with embedded NUL bytes. Remove the overload so callers must use the explicit-length API.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - ./run-be-ut.sh --run --filter=JsonbDocumentTest.*:JsonbSerializeTest.*:JsonbParserTest.*:ConvertFieldToTypeTest.*
    - build-support/clang-format.sh
    - build-support/check-format.sh
    - build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (failed: existing clang-tidy environment/pre-existing diagnostics report missing stddef.h and unmatched NOLINTEND in be/src/core/types.h)
- Behavior changed: Yes. Internal C++ callers can no longer use JsonbWriter::writeKey(const char*) and must pass an explicit key length.
- Does this need documentation: No
@mrhhsg mrhhsg force-pushed the codex/remove-jsonb-writer-cstr-key branch from 3cbbf65 to 1b8c30c Compare May 19, 2026 03:54
@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented May 19, 2026

/review

@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented May 20, 2026

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31532 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1b8c30c368e5d42bf9eb773e2146ba2f8e4a1630, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17911	4038	3993	3993
q2	q3	10839	1481	824	824
q4	4698	479	339	339
q5	7761	2386	2148	2148
q6	388	181	140	140
q7	943	790	636	636
q8	9612	1793	1563	1563
q9	7037	4955	4959	4955
q10	6452	2087	1829	1829
q11	444	272	250	250
q12	691	428	304	304
q13	18222	3383	2818	2818
q14	266	255	238	238
q15	q16	818	775	714	714
q17	987	987	1005	987
q18	7171	5730	5491	5491
q19	1232	1415	1095	1095
q20	523	448	271	271
q21	6072	2745	2619	2619
q22	458	379	318	318
Total cold run time: 102525 ms
Total hot run time: 31532 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4721	4772	4609	4609
q2	q3	4902	5233	4666	4666
q4	2151	2239	1439	1439
q5	4882	4694	4616	4616
q6	234	185	128	128
q7	1948	1761	1578	1578
q8	2300	2013	1950	1950
q9	7270	7311	7300	7300
q10	4470	4419	4002	4002
q11	561	401	364	364
q12	722	720	510	510
q13	2993	3443	2761	2761
q14	268	284	262	262
q15	q16	693	710	624	624
q17	1325	1300	1282	1282
q18	7460	6836	6966	6836
q19	1134	1129	1141	1129
q20	2237	2206	1961	1961
q21	5469	4741	4684	4684
q22	565	478	422	422
Total cold run time: 56305 ms
Total hot run time: 51123 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 170478 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1b8c30c368e5d42bf9eb773e2146ba2f8e4a1630, data reload: false

query5	4344	667	531	531
query6	343	220	207	207
query7	4235	565	349	349
query8	330	245	229	229
query9	8818	4038	4033	4033
query10	468	347	299	299
query11	5749	2611	2213	2213
query12	183	130	139	130
query13	1270	613	455	455
query14	6060	5383	5083	5083
query14_1	4391	4403	4365	4365
query15	220	208	187	187
query16	1055	482	467	467
query17	1138	734	594	594
query18	2499	476	355	355
query19	218	204	173	173
query20	143	134	132	132
query21	218	143	121	121
query22	13674	13535	13370	13370
query23	17271	16504	16045	16045
query23_1	16185	16207	16181	16181
query24	7568	1742	1275	1275
query24_1	1309	1309	1323	1309
query25	565	490	417	417
query26	1332	325	177	177
query27	2692	572	356	356
query28	4449	1982	1968	1968
query29	986	629	496	496
query30	315	239	202	202
query31	1114	1078	964	964
query32	82	77	76	76
query33	535	346	299	299
query34	1228	1133	653	653
query35	766	796	684	684
query36	1316	1315	1085	1085
query37	155	107	93	93
query38	3229	3157	3089	3089
query39	936	924	884	884
query39_1	870	889	902	889
query40	232	157	131	131
query41	67	66	65	65
query42	115	114	113	113
query43	336	330	298	298
query44	
query45	210	201	198	198
query46	1064	1260	741	741
query47	2287	2256	2198	2198
query48	401	425	308	308
query49	643	494	388	388
query50	1058	351	268	268
query51	4332	4306	4198	4198
query52	112	109	97	97
query53	264	293	202	202
query54	313	275	260	260
query55	98	92	88	88
query56	314	303	314	303
query57	1407	1376	1310	1310
query58	319	280	280	280
query59	1524	1621	1405	1405
query60	330	357	311	311
query61	166	158	160	158
query62	673	630	556	556
query63	249	208	211	208
query64	2427	828	663	663
query65	
query66	1710	510	379	379
query67	30056	30101	29862	29862
query68	
query69	478	347	328	328
query70	998	1025	1030	1025
query71	329	286	286	286
query72	3239	2883	2444	2444
query73	848	763	432	432
query74	5081	4873	4730	4730
query75	2706	2636	2261	2261
query76	2304	1141	758	758
query77	410	405	342	342
query78	12297	12015	11699	11699
query79	1459	1065	744	744
query80	662	558	467	467
query81	447	279	238	238
query82	1372	165	123	123
query83	360	278	249	249
query84	258	141	116	116
query85	881	542	457	457
query86	421	351	337	337
query87	3483	3377	3193	3193
query88	3544	2682	2662	2662
query89	439	393	343	343
query90	2006	202	182	182
query91	189	201	142	142
query92	82	83	80	80
query93	1490	1482	912	912
query94	557	356	326	326
query95	697	381	351	351
query96	1076	785	363	363
query97	2685	2714	2591	2591
query98	253	232	240	232
query99	1123	1143	980	980
Total cold run time: 254111 ms
Total hot run time: 170478 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.61% (27910/37914)
Line Coverage 57.55% (302554/525746)
Region Coverage 54.68% (253012/462693)
Branch Coverage 56.27% (109398/194406)

Copy link
Copy Markdown
Contributor

@HappenLee HappenLee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@HappenLee HappenLee merged commit 3d7b764 into apache:master May 20, 2026
30 of 31 checks passed
github-actions Bot pushed a commit that referenced this pull request May 20, 2026
Problem Summary: `JsonbWriter::writeKey(const char*)` inferred key
length with `strlen`, which silently truncated byte strings that
contained embedded NUL bytes. Remove this overload so callers must use
the explicit-length API.

### Release note

None

### Check List (For Author)

- Test: Unit Test
- `./run-be-ut.sh --run
--filter=JsonbDocumentTest.*:JsonbSerializeTest.*:JsonbParserTest.*:ConvertFieldToTypeTest.*`
    - `build-support/clang-format.sh`
    - `build-support/check-format.sh`
- `build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN`
(failed: existing clang-tidy environment/pre-existing diagnostics report
missing `stddef.h` and unmatched `NOLINTEND` in `be/src/core/types.h`)
- Behavior changed: Yes. Internal C++ callers can no longer use
`JsonbWriter::writeKey(const char*)` and must pass an explicit key
length.
- Does this need documentation: No
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants