Skip to content

[fix](be) Fix varbinary literal construction#64089

Merged
BiteTheDDDDt merged 1 commit into
apache:masterfrom
BiteTheDDDDt:codex/fix-varbinary-literal
Jun 4, 2026
Merged

[fix](be) Fix varbinary literal construction#64089
BiteTheDDDDt merged 1 commit into
apache:masterfrom
BiteTheDDDDt:codex/fix-varbinary-literal

Conversation

@BiteTheDDDDt
Copy link
Copy Markdown
Contributor

@BiteTheDDDDt BiteTheDDDDt commented Jun 3, 2026

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary:

create_texpr_literal_node<TYPE_VARBINARY> treated the input pointer as std::string*, but Doris Field stores TYPE_VARBINARY values as StringView. When TopN predicate conversion builds a VARBINARY literal from a Field, the helper reinterprets a StringView* as a std::string*, which can make std::string assignment read a bogus size and request a huge allocation under ASAN.

This PR reads VARBINARY literal input as StringView, copies the exact byte range into the thrift literal, and adds VARBINARY coverage for create_texpr_node_from(Field, TYPE_VARBINARY, ...) and VLiteral round trip. It also wires the const void* helper for TYPE_VARBINARY.

Release note

None

Check List (For Author)

  • Test:
    • Unit Test: ./run-be-ut.sh --run --filter=TEST_VEXPR.LITERALTEST
    • Manual test: build-support/check-format.sh
    • Manual test: git diff --check upstream/master...HEAD
    • Static analysis: build-support/run-clang-tidy.sh --base upstream/master --build-dir be/ut_build_ASAN selected only the 3 changed files with line-level filtering, but exited non-zero because existing translation-unit analysis errors are emitted from vcompound_pred.h, vexpr.cpp, jni-util.h, and pre-existing vexpr_test.cpp checks. No new changed-line clang-tidy diagnostics remain.
  • Behavior changed: No
  • Does this need documentation: No

Copilot AI review requested due to automatic review settings June 3, 2026 16:11
@BiteTheDDDDt
Copy link
Copy Markdown
Contributor Author

run buildall

@BiteTheDDDDt
Copy link
Copy Markdown
Contributor Author

/review

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: VARBINARY values in expression Fields are stored as StringView, but create_texpr_literal_node<TYPE_VARBINARY> interpreted the input pointer as std::string. When TopN pushdown builds a binary expression from a VARBINARY runtime value, the StringView layout can be read as a std::string and ASAN may report an oversized allocation while copying the literal. Read VARBINARY literal data as StringView, preserve the exact byte payload when creating TVarBinaryLiteral, and add coverage for embedded null bytes.

### Release note

None

### Check List (For Author)

- Test: Unit Test

    - ./run-be-ut.sh --run --filter=TEST_VEXPR.LITERALTEST

    - build-support/check-format.sh

    - git diff --check

- Behavior changed: No

- Does this need documentation: No
@BiteTheDDDDt BiteTheDDDDt force-pushed the codex/fix-varbinary-literal branch from ab0ea31 to b65a93c Compare June 3, 2026 17:04
@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 29117 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b65a93ccad06a510af344f9f0365329b6dc32861, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17861	4153	4055	4055
q2	q3	10763	1484	828	828
q4	4710	479	354	354
q5	7624	896	604	604
q6	194	174	138	138
q7	809	846	614	614
q8	9550	1636	1668	1636
q9	6715	4558	4547	4547
q10	6760	1824	1533	1533
q11	442	273	253	253
q12	670	439	289	289
q13	18192	3329	2793	2793
q14	265	260	235	235
q15	q16	827	777	710	710
q17	951	945	1028	945
q18	6892	5682	5570	5570
q19	1453	1278	1014	1014
q20	492	411	279	279
q21	5812	2566	2419	2419
q22	438	356	301	301
Total cold run time: 101420 ms
Total hot run time: 29117 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4435	4287	4337	4287
q2	q3	4582	5010	4331	4331
q4	2108	2236	1358	1358
q5	4477	4332	4319	4319
q6	232	171	125	125
q7	1783	2085	1795	1795
q8	2572	2211	2077	2077
q9	8111	8063	8018	8018
q10	4791	4780	4362	4362
q11	608	461	395	395
q12	761	848	612	612
q13	3303	3592	3029	3029
q14	319	323	290	290
q15	q16	759	753	643	643
q17	1392	1333	1336	1333
q18	7916	7268	6881	6881
q19	1107	1099	1128	1099
q20	2251	2221	1963	1963
q21	5326	4595	4434	4434
q22	525	457	422	422
Total cold run time: 57358 ms
Total hot run time: 51773 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 169778 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b65a93ccad06a510af344f9f0365329b6dc32861, data reload: false

query5	4357	638	495	495
query6	442	196	176	176
query7	4810	567	294	294
query8	376	222	205	205
query9	8776	4058	4019	4019
query10	476	313	263	263
query11	5951	2366	2221	2221
query12	152	102	99	99
query13	1310	602	446	446
query14	6441	5416	5079	5079
query14_1	4774	4404	4377	4377
query15	207	200	192	192
query16	1047	459	464	459
query17	1164	714	604	604
query18	2563	490	357	357
query19	210	187	150	150
query20	117	110	107	107
query21	230	154	122	122
query22	13600	13451	13404	13404
query23	17313	16572	16177	16177
query23_1	16293	16428	16342	16342
query24	7613	1745	1295	1295
query24_1	1305	1295	1314	1295
query25	577	476	431	431
query26	1300	325	171	171
query27	2663	571	339	339
query28	4444	1998	1989	1989
query29	1056	613	483	483
query30	308	233	197	197
query31	1143	1068	944	944
query32	97	60	58	58
query33	509	307	256	256
query34	1189	1147	662	662
query35	744	761	682	682
query36	1408	1404	1267	1267
query37	152	103	92	92
query38	3172	3065	3065	3065
query39	931	929	894	894
query39_1	892	900	863	863
query40	224	120	103	103
query41	66	64	62	62
query42	102	97	93	93
query43	315	322	279	279
query44	
query45	196	189	180	180
query46	1081	1239	712	712
query47	2389	2423	2281	2281
query48	396	398	296	296
query49	647	499	357	357
query50	985	340	256	256
query51	4364	4343	4269	4269
query52	87	90	80	80
query53	239	267	189	189
query54	297	239	196	196
query55	79	75	68	68
query56	233	236	221	221
query57	1453	1408	1332	1332
query58	241	219	215	215
query59	1560	1689	1488	1488
query60	277	246	263	246
query61	152	157	159	157
query62	691	666	571	571
query63	229	190	181	181
query64	2519	797	634	634
query65	
query66	1800	468	346	346
query67	29857	29655	29583	29583
query68	
query69	412	309	255	255
query70	991	914	968	914
query71	302	222	218	218
query72	3009	2744	2411	2411
query73	864	762	402	402
query74	5136	4948	4812	4812
query75	2663	2594	2245	2245
query76	2362	1152	791	791
query77	369	405	301	301
query78	12444	12478	11873	11873
query79	1432	1044	761	761
query80	664	469	404	404
query81	451	286	239	239
query82	575	164	128	128
query83	343	278	246	246
query84	265	140	113	113
query85	893	533	443	443
query86	381	296	277	277
query87	3384	3375	3211	3211
query88	3636	2739	2739	2739
query89	432	379	328	328
query90	1849	183	169	169
query91	176	168	139	139
query92	64	64	63	63
query93	1609	1478	867	867
query94	551	352	317	317
query95	688	461	336	336
query96	1009	773	357	357
query97	2755	2709	2562	2562
query98	216	208	210	208
query99	1150	1185	1044	1044
Total cold run time: 252009 ms
Total hot run time: 169778 ms

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review result: no blocking issues found.

Critical checkpoint conclusions:

  • Goal and tests: The PR fixes VARBINARY literal construction by treating VARBINARY Field/raw values as StringView and copying the exact byte range into TVarBinaryLiteral. The added BE unit coverage exercises Field -> TExprNode -> VLiteral -> ColumnVarbinary round-trip with embedded null bytes and both inline/long StringView payloads.
  • Scope: The change is small and focused on literal construction plus targeted test coverage. The unrelated style-only literal suffix edits are harmless.
  • Concurrency and lifecycle: No new concurrency, locking, static initialization, or non-trivial lifecycle behavior is introduced. The StringView payload is copied into the thrift string before the source Field/StringView can go out of scope.
  • Configuration and compatibility: No new config items, storage format changes, thrift schema changes, or FE/BE protocol shape changes are introduced. Existing VARBINARY_LITERAL thrift payload remains a string value.
  • Parallel paths: Both create_texpr_node_from(Field, ...) and create_texpr_node_from(const void*, ...) now support TYPE_VARBINARY consistently with PrimitiveTypeTraits<TYPE_VARBINARY>::CppType and ColumnVarbinary storage.
  • Error handling and memory safety: No Status is newly ignored in production code. The fix removes the prior invalid std::string reinterpretation and avoids bogus allocations by constructing std::string(data, size).
  • Data correctness: Embedded null bytes are preserved by size-aware copying; no query visibility, transaction, delete bitmap, or persistence behavior is affected.
  • Observability: No new observability is needed for this local literal-construction fix.
  • Test results: I verified git diff --check passed. build-support/check-format.sh could not run because the runner clang-format is not version 16. ./run-be-ut.sh --run --filter=TEST_VEXPR.LITERALTEST could not complete in this checkout because gensrc failed with missing thirdparty/installed/bin/protoc.

User focus: No additional user-provided review focus was present.

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 66.67% (6/9) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.93% (27494/38222)
Line Coverage 55.51% (294487/530524)
Region Coverage 52.32% (246000/470152)
Branch Coverage 53.43% (106229/198801)

@hello-stephen
Copy link
Copy Markdown
Contributor

skip buildall

@BiteTheDDDDt BiteTheDDDDt merged commit 0d8654b into apache:master Jun 4, 2026
33 of 34 checks passed
@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Jun 4, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

PR approved by anyone and no changes requested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants