[fix](be) Validate Arrow input buffers before column conversion by Mryange · Pull Request #64796 · apache/doris

Mryange · 2026-06-24T13:16:33Z

What problem does this PR solve?

Doris converts Arrow arrays into Doris columns through DataTypeSerDe::read_column_from_arrow. If the Arrow producer sends malformed array metadata, such as truncated validity bitmaps, truncated offsets buffers, non-monotonic string/list offsets, or offsets pointing past the child/value buffer, the existing conversion code may read invalid Arrow memory and crash BE.

Root cause: the Arrow-to-Doris serde path trusted Arrow array metadata before accessing Arrow buffers. Several hot paths call IsNull(), Value(), raw value offsets, list offsets, or child arrays directly, so malformed Arrow buffers can trigger out-of-bounds reads before Doris reports a clean error.

This PR adds lightweight, type-specific Arrow input validation before those buffer accesses. The checks are local preflight checks rather than a full ValidateFull() pass. They cover validity bitmap size, fixed-width data buffer size, boolean bitmap size, binary/string offsets buffer size, per-value data range, and list/map offsets monotonicity plus child length bounds. A BE config, enable_arrow_input_validation, is added and defaults to true.
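To make the shape of these preflight checks concrete, here is a minimal sketch for a string-like array. It uses only the standard library. StringArrayView and preflight_check are hypothetical names invented for illustration, not the helpers added by this PR and not Arrow types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the metadata of an Arrow string array (not an Arrow type).
struct StringArrayView {
    int64_t length = 0;               // logical row count
    int64_t offset = 0;               // first logical row within the buffers
    bool has_validity = false;        // whether a validity bitmap is present
    std::size_t validity_bytes = 0;   // size of the validity bitmap in bytes
    std::vector<int32_t> offsets;     // value offsets buffer
    std::size_t data_bytes = 0;       // size of the value data buffer in bytes
};

// Returns an empty string if the view passes the preflight checks,
// otherwise a short description of the first problem found.
inline std::string preflight_check(const StringArrayView& a) {
    if (a.length < 0 || a.offset < 0) return "negative length or offset";
    if (a.length == 0) return "";
    const uint64_t first = static_cast<uint64_t>(a.offset);
    const uint64_t last = first + static_cast<uint64_t>(a.length);
    // One validity bit per row, rounded up to whole bytes.
    if (a.has_validity && a.validity_bytes < (last + 7) / 8) return "validity bitmap is truncated";
    // N rows need N + 1 offsets.
    if (a.offsets.size() < last + 1) return "offsets buffer is truncated";
    int64_t previous = a.offsets[first];
    if (previous < 0) return "negative offset";
    for (uint64_t i = first + 1; i <= last; ++i) {
        const int64_t current = a.offsets[i];
        if (current < previous) return "offsets are not monotonic";
        previous = current;
    }
    // The largest offset must stay inside the value data buffer.
    if (static_cast<uint64_t>(previous) > a.data_bytes) return "offset points past the value buffer";
    return "";
}
```

Each check only compares sizes and offsets that the reader is about to rely on, which is why the cost stays proportional to the row count rather than to the value bytes.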

The change also fixes an existing FixedSizeBinaryArray sliced-read null check: the loop uses a relative index after GetValue(start), but IsNull() expects the original Arrow row index, so it must check start + offset_i.
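The index mix-up can be illustrated with a small standard-library sketch. The names is_null_at and count_nulls_in_range are hypothetical; the vector of flags stands in for an Arrow array whose null lookup expects the original row index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an IsNull()-style lookup: `row` is the original row index.
inline bool is_null_at(const std::vector<bool>& null_flags, int64_t row) {
    return null_flags[static_cast<std::size_t>(row)];
}

// Counts nulls in rows [start, end) while iterating with a relative index.
inline int64_t count_nulls_in_range(const std::vector<bool>& null_flags, int64_t start,
                                    int64_t end) {
    int64_t nulls = 0;
    for (int64_t offset_i = 0; offset_i < end - start; ++offset_i) {
        // offset_i is relative to `start`, so the lookup must use start + offset_i.
        // Passing offset_i alone would inspect rows [0, end - start) instead.
        if (is_null_at(null_flags, start + offset_i)) {
            ++nulls;
        }
    }
    return nulls;
}
```

With flags {true, false, false, false} and the range [2, 4), the corrected lookup finds no nulls, whereas a lookup on offset_i alone would wrongly report the null at row 0.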

Type	Rows	Check disabled	Check enabled	Overhead
String	4096	39,352 ns	42,032 ns	+6.8%
String	65536	604,782 ns	624,064 ns	+3.2%
Int64	4096	1,480 ns	1,523 ns	+2.9%
Int64	65536	16,160 ns	15,883 ns	-1.7%
Boolean	4096	4,784 ns	4,888 ns	+2.2%
Boolean	65536	79,017 ns	80,453 ns	+1.8%
ArrayString	4096	139,747 ns	147,793 ns	+5.8%
ArrayString	65536	2,384,377 ns	2,477,601 ns	+3.9%
MapStringInt	4096	84,375 ns	96,022 ns	+13.8%
MapStringInt	65536	2,742,538 ns	2,903,358 ns	+5.9%

Release note

None

Mryange · 2026-06-30T02:03:46Z

/review

github-actions

I found three blocking issues in the Arrow input validation changes. The main problems are that the new config does not actually cover all Arrow-to-Doris serde paths, the list/map validation reintroduces aligned typed reads on offset buffers that Doris already handled with unaligned loads, and the string/binary path can still dereference a missing value-data buffer for zero-length non-null values.

Validation performed: reviewed the full PR diff and surrounding Arrow conversion call paths; checked existing inline review context and found no duplicates; verified the Arrow 17.0.0 API behavior for the offset read issue; ran git diff --check successfully. I did not run BE unit tests because this worktree is missing .worktree_initialized, custom_env.sh, thirdparty/installed, and initialized submodules, and the required setup script would need files outside the current checkout.

github-actions · 2026-06-30T02:23:33Z

 DECLARE_Int32(arrow_flight_sql_port);

+// Validate Arrow input buffers before converting them to Doris columns.
+DECLARE_Bool(enable_arrow_input_validation);


This flag is exposed as validating Arrow input before Doris column conversion, but it only protects the serdes touched in this PR. The same virtual read_column_from_arrow path is used by shared callers such as convert_from_arrow_batch() for Python UDF/UDAF results and by external Arrow readers, yet sibling serdes still dereference Arrow buffers without consulting the flag: JSONB reads value_offsets()/value_data(), decimals call DecimalArray::Value(), DateV2/DateTimeV2 read raw_values(), IPv4 dereferences buffers[1], and IPv6 reads string value_data()/value_offset(). A malformed Arrow result for one of those return types can still take the old unchecked path, so the new config gives a false sense that Arrow input validation is enabled for all Doris conversions. Please either apply the same range/buffer checks to the remaining read_column_from_arrow implementations that access Arrow buffers, or narrow the config/comment/tests to the exact types this PR actually protects.

github-actions · 2026-06-30T02:23:33Z

+                start, end, offsets.length());
+    }
+
+    int64_t previous_offset = offsets.Value(start);


This validation path reads list/map offsets with offsets.Value(), which in Arrow 17 is a typed raw_values()[i] load. That reintroduces the aligned Int32 access the existing runtime code avoided with unaligned_load<int32_t>() just below the validation call. Doris already has UBSan-oriented positive tests that construct these Arrow offset buffers unaligned (offsets_address % 4 == 1) and then call the list/map readers, so with validation enabled by default those tests can fail before reaching the unaligned-safe conversion path. Please read the offsets in check_arrow_offsets_range() through the same byte-pointer plus unaligned_load<int32_t>() pattern, and keep a validation-enabled unaligned list/map test so this does not regress again.
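A minimal sketch of the suggested byte-pointer pattern follows, assuming a memcpy-based load. load_offset_unaligned and offsets_in_range are hypothetical stand-ins for illustration; they are not Doris's unaligned_load<int32_t>() or check_arrow_offsets_range().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads the i-th int32 offset from a byte pointer without assuming 4-byte alignment.
inline int32_t load_offset_unaligned(const uint8_t* bytes, std::size_t i) {
    int32_t value;
    std::memcpy(&value, bytes + i * sizeof(int32_t), sizeof(int32_t));
    return value;
}

// Returns true when offsets[start..end] are non-negative, non-decreasing,
// and the final offset does not exceed child_length.
inline bool offsets_in_range(const uint8_t* bytes, std::size_t start, std::size_t end,
                             int64_t child_length) {
    int64_t previous = load_offset_unaligned(bytes, start);
    if (previous < 0) return false;
    for (std::size_t i = start + 1; i <= end; ++i) {
        const int64_t current = load_offset_unaligned(bytes, i);
        if (current < previous) return false;
        previous = current;
    }
    return previous <= child_length;
}
```

Because every read goes through memcpy, the validation behaves the same whether the offsets buffer starts on a 4-byte boundary or at an odd address.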

github-actions · 2026-06-30T02:23:33Z

-                assert_cast<ColumnType&>(column).insert_data(
-                        reinterpret_cast<const char*>(raw_data), length);
+                if (config::enable_arrow_input_validation) {
+                    check_arrow_value_range(*concrete_array, start_offset, length, buffer_size);


The range check still lets a missing value-data buffer through when the non-null value is empty. In this branch buffer_size becomes 0 when value_data() is null, and check_arrow_value_range(..., start_offset=0, length=0, buffer_size=0) succeeds. The next statement still evaluates buffer->data() + start_offset, so a malformed Arrow StringArray/BinaryArray with offsets {0, 0} and no value buffer crashes instead of producing the validation error this PR is meant to add. Please either reject a missing value-data buffer before the loop when any non-null row will be read, or avoid dereferencing buffer for zero-length values. The same issue exists in the large string/binary branch below.
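The second option (never touching the buffer for zero-length values) could look roughly like the sketch below. read_value is a hypothetical helper written against plain pointers, not the PR's check_arrow_value_range(); rejecting the missing buffer up front would be an equally valid and stricter choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: copies one non-null value out of a value-data buffer that may be missing.
// Returns false when the value cannot be read safely.
inline bool read_value(const uint8_t* data, std::size_t buffer_size, int64_t start_offset,
                       int64_t length, std::string* out) {
    if (start_offset < 0 || length < 0) return false;
    const uint64_t begin = static_cast<uint64_t>(start_offset);
    const uint64_t size = static_cast<uint64_t>(length);
    // Range check written so that begin + size cannot overflow.
    if (begin > buffer_size || size > buffer_size - begin) return false;
    if (size == 0) {
        // Empty value: produce it without ever forming data + start_offset.
        out->clear();
        return true;
    }
    if (data == nullptr) return false;  // non-empty value but no buffer to read from
    out->assign(reinterpret_cast<const char*>(data) + begin, static_cast<std::size_t>(size));
    return true;
}
```

With offsets {0, 0} and no value buffer, this path yields an empty string instead of evaluating pointer arithmetic on a null buffer.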

Mryange · 2026-06-30T04:09:24Z

/review

github-actions

Automated review completed on head 8de9140.

I found one additional correctness issue in the new Arrow validation helper: malformed nullable Arrow input can report nulls while omitting the validity bitmap, and the current validation path accepts it before Doris builds the nullable null map.

Critical checkpoint conclusions:

Goal/test: the PR adds default-enabled Arrow buffer validation for selected BE Arrow readers, but the nullable validity-bitmap case is still not correctly rejected and needs a regression test.
Scope: the changes are focused on BE Arrow serde validation, tests, and benchmarks.
Concurrency/lifecycle: no new runtime concurrency or lifecycle issue found; test config mutation uses an RAII restore helper.
Config/compatibility: a new BE config is added; existing review thread 3495786715 already covers the broader config-surface mismatch with unprotected sibling serdes.
Parallel paths: existing threads already cover unaligned list/map offset validation and missing zero-length string value-data handling, so I did not duplicate those comments.
Tests/results: the new BE UT is included by the recursive be/test CMake glob, but it does not cover the missing validity-bitmap/null_count case. I did not run BE UTs because thirdparty/installed is absent in this runner.
Performance/observability: no separate performance or observability issue was substantiated beyond the validation correctness gaps.

User focus: no additional user-provided review focus was present.

Subagent conclusions: optimizer-rewrite proposed OPT-1, accepted as inline comment F-001. tests-session-config reported TSC-001 through TSC-003 as duplicates of existing review threads and found no distinct new tests/session/config issue. Convergence round 1 ended with both live subagents reporting NO_NEW_VALUABLE_FINDINGS for the same current ledger/comment set.

Validation performed: static review of all changed files and surrounding Arrow serde paths; git show --check over the listed PR files was clean.

github-actions · 2026-06-30T04:20:02Z

+inline void check_arrow_validity_bitmap(const arrow::Array& array) {
+    arrow_validation_detail::check_arrow_length_and_offset(array);
+    const auto& buffers = array.data()->buffers;
+    if (buffers.empty() || !buffers[0]) {


This still lets a malformed nullable Arrow array through when it reports nulls but omits the validity bitmap. check_arrow_validity_bitmap() returns as soon as buffers[0] is missing, and then DataTypeNullableSerDe::read_column_from_arrow() immediately builds the Doris null map with arrow_array->IsNull(i). In Arrow 17, a primitive array without a validity bitmap has no per-row null bits to consult, so an array whose null_count is positive but smaller than its length can be consumed as all-valid instead of being rejected. Please reject a missing validity bitmap whenever the Arrow metadata says nulls may be present, and add a nullable test such as an Int64Array with null_bitmap=nullptr and null_count=1.
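One way to express the requested rule is sketched below with a hypothetical ValidityView struct and validity_ok function. It assumes that a negative null_count means "unknown" and that, with no bitmap present, only a positive null_count is contradictory; the real helper may need to treat other array kinds differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the validity-related metadata of an Arrow array.
struct ValidityView {
    int64_t length;
    int64_t offset;
    int64_t null_count;        // negative is treated as "unknown" (an assumption of this sketch)
    const uint8_t* bitmap;     // nullptr when the validity bitmap is absent
    std::size_t bitmap_bytes;  // size of the bitmap in bytes
};

inline bool validity_ok(const ValidityView& v) {
    if (v.length < 0 || v.offset < 0) return false;
    if (v.bitmap == nullptr) {
        // No per-row null bits exist, so a positive null_count cannot be honored.
        return v.null_count <= 0;
    }
    // One bit per row up to offset + length, rounded up to whole bytes.
    const uint64_t bits = static_cast<uint64_t>(v.offset) + static_cast<uint64_t>(v.length);
    return v.bitmap_bytes >= (bits + 7) / 8;
}
```

The first branch corresponds to the suggested regression case: an array with null_bitmap=nullptr and null_count=1 is rejected instead of being read as all-valid.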

Mryange · 2026-06-30T05:08:45Z

/review

github-actions

Automated review completed. I found two additional blocking issues beyond the existing inline threads: sliced Arrow arrays can pass the new validation while the touched readers still ignore array.offset(), and the new validation helpers throw Exception through shared Status-returning conversion APIs without a catch-and-convert boundary.

Critical checkpoints: the PR goal is clear, but the current code does not fully accomplish safe Arrow input validation because valid sliced arrays can decode wrong rows and validation failures can escape the expected Status path. The change is mostly focused, with no concurrency, lifecycle, transaction, persistence, or FE/BE protocol concerns found. A BE config is added with default enabled behavior; the earlier existing thread already covers that the advertised coverage is broader than the touched serdes. Parallel paths were checked; existing threads already cover sibling serde coverage, unaligned offset loads, missing string value buffers, and missing nullable validity bitmaps, so I did not repeat those. Test coverage includes new BE unit tests, but it lacks sliced-array coverage and caller-level convert_from_arrow_batch() status propagation coverage. I could not run local build/tests because this checkout is shallow and thirdparty/installed is missing.

User focus: no additional user-provided focus points were supplied.

Subagent conclusions: OPT-1 became inline comment M-1, TSC-1 became inline comment M-2, D-1 through D-4 were merged as duplicates of existing review threads, and convergence round 1 ended with both live subagents reporting NO_NEW_VALUABLE_FINDINGS for the same final ledger/comment set.

github-actions · 2026-06-30T05:20:01Z

+        arrow_validation_detail::throw_invalid_arrow(array, "data buffer is missing");
+    }
+
+    const size_t offset = static_cast<size_t>(array.offset());


This validation path still accepts sliced Arrow arrays, but the readers below are not offset-aware. For example, a zero-copy sliced Int64Array can have offset=1,length=2 over a buffer containing [1,2,3]; check_arrow_fixed_width_buffer() validates offset + length bytes, but DataTypeNumberSerDe::read_column_from_arrow() later copies from buffer->data() + start, so the caller gets [1,2] instead of the logical slice [2,3]. The same mismatch exists in the touched string/list/map paths that read value_offsets()->data() or offsets()->raw_values() with start directly. Please either reject non-zero array.offset() for these raw-buffer readers, or make all of them use offset-aware APIs/pointers and add sliced-array tests for fixed-width, string, list, and map inputs.
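The [1,2,3] example can be reproduced with a small standard-library sketch of an offset-aware reader. Int64Slice and read_rows are hypothetical names; the struct only mimics a zero-copy slice that shares its parent buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a zero-copy slice over a shared Int64 buffer.
struct Int64Slice {
    const std::vector<int64_t>* buffer;  // parent buffer, shared with the unsliced array
    int64_t offset;                      // first logical row inside the parent buffer
    int64_t length;                      // logical row count of the slice
};

// Copies logical rows [start, end) of the slice into *out.
// Returns false if the slice or the requested range is out of bounds.
inline bool read_rows(const Int64Slice& s, int64_t start, int64_t end,
                      std::vector<int64_t>* out) {
    if (s.buffer == nullptr || s.offset < 0 || s.length < 0) return false;
    if (static_cast<uint64_t>(s.offset) + static_cast<uint64_t>(s.length) > s.buffer->size()) {
        return false;
    }
    if (start < 0 || end < start || end > s.length) return false;
    // Offset-aware base pointer: logical row 0 lives at buffer[offset], not buffer[0].
    const int64_t* base = s.buffer->data() + s.offset;
    out->assign(base + start, base + end);
    return true;
}
```

For a buffer holding [1,2,3] with offset=1 and length=2, reading rows [0, 2) yields [2,3]. Dropping the `+ s.offset` term would return [1,2], which is the mismatch described above.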

github-actions · 2026-06-30T05:20:01Z

+}
+
+inline void throw_invalid_arrow(std::string_view arrow_type, std::string_view message) {
+    throw Exception(ErrorCode::INVALID_ARGUMENT, "Invalid Arrow {}: {}", arrow_type, message);


These helpers throw doris::Exception from read_column_from_arrow() implementations whose public contract is Status, but not every shared caller has a catch-and-convert boundary. In particular, FromRecordBatchToBlockConverter::convert() calls the serde through plain RETURN_IF_ERROR, and convert_from_arrow_batch() is then used by Python UDF/UDAF result conversion through another RETURN_IF_ERROR. A malformed Arrow result can therefore bypass the normal Doris Status propagation instead of returning an invalid-argument status. Please either make the validation helpers return Status, or wrap the shared read_column_from_arrow() caller paths with the existing catch-and-convert macro and add a caller-level test through convert_from_arrow_batch().
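The catch-and-convert boundary can be sketched in isolation as follows. Status, validate_or_throw, catch_and_convert, and convert_batch are simplified hypothetical stand-ins; Doris's own Status type, Exception class, and conversion macro are richer and are not reproduced here.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical minimal status type for illustration only.
struct Status {
    bool ok;
    std::string message;
};

// Stand-in for a validation helper that throws on malformed input.
inline void validate_or_throw(bool well_formed) {
    if (!well_formed) {
        throw std::invalid_argument("Invalid Arrow array: offsets buffer is truncated");
    }
}

// Boundary that turns any std::exception into a failed Status, so callers that
// propagate errors by return value never observe an exception.
template <typename Fn>
Status catch_and_convert(Fn&& fn) {
    try {
        fn();
        return Status{true, ""};
    } catch (const std::exception& e) {
        return Status{false, e.what()};
    }
}

// Stand-in for a shared caller whose contract is to return Status.
inline Status convert_batch(bool well_formed) {
    return catch_and_convert([&] { validate_or_throw(well_formed); });
}
```

Placing the boundary in the shared caller keeps the throwing helpers cheap on the hot path while still honoring a Status-returning contract. Making the helpers return Status directly is the alternative the comment mentions.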

Mryange · 2026-07-01T03:50:12Z

/review

github-actions

Automated review pass completed. I did not find any new non-duplicate inline issue beyond the existing review threads already attached to this PR.

Critical checkpoint conclusions:

Task goal: the PR is intended to validate Arrow input buffers before Doris column conversion. The current diff adds a config flag, validation helpers, coverage in selected serdes, BE unit tests, and a benchmark.
Existing known issues: the active inline threads already cover the remaining high-risk validation gaps, including incomplete sibling-serde buffer validation, exception propagation through shared callers, sliced-array handling, missing validity bitmaps, missing string value buffers for empty values, and unaligned list/map offset reads. I treated these as already-known context and did not resubmit them.
Scope/focus: no additional user focus was provided; I reviewed the changed files and directly related Arrow conversion paths.
Concurrency/lifecycle/storage compatibility: no new thread, lock, transaction, persistence, or storage-format path was introduced by this PR.
Config/compatibility: the new BE config defaults to enabling validation. Remaining disabled-validation behavior that looked suspicious was verified as inherited from the PR base rather than introduced by this diff.
Tests/style: the new BE unit test file is picked up by the BE test CMake glob, and git diff --check 64df172ca8582442770ae9edb5a5db09b80e723f 4c2b695de5b8c973b5eaef1973d1a9f9fed874ce is clean. I did not run BE unit tests because thirdparty/installed is missing in this checkout.

Subagent conclusions:

optimizer-rewrite proposed OPT-1; main verification dismissed it as inherited from the PR base because the row_count == 0 || buffer == nullptr guard already exists before this PR.
tests-session-config reported one duplicate of an existing GitHub thread and one duplicate of OPT-1; no new non-duplicate test/config issue remained.
Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same current ledger and empty proposed final inline comment set.

Mryange · 2026-07-01T04:10:17Z

run buildall

Mryange · 2026-07-01T07:27:05Z

/review

github-actions

Follow-up automated review for head 94418b2720a033ffda4e5f8c6dedb0d68ad34942.

I did not find any new non-duplicate inline issue to submit. The remaining substantiated Arrow validation concerns I could verify are already covered by existing inline threads: exception propagation from throwing validation helpers, partial raw-buffer validation coverage in sibling serdes, the zero-length string/binary missing value-buffer path, sliced-array offset handling, missing validity bitmap handling, and unaligned list/map offset validation. I did not re-submit those as duplicate comments.

Critical checkpoints: the PR goal is still Arrow input-buffer validation before Doris column conversion; the implementation is focused on BE serde/config/test/benchmark code; no FE optimizer/rewrite, transaction, storage-format, or concurrency path is changed. The new BE unit test is included by the recursive be/test/CMakeLists.txt glob and git diff --check is clean. The config is a non-mutable BE flag defaulting on; sibling serde coverage remains the key correctness concern and is already tracked in the existing review thread. Parallel Arrow conversion callers and serde families were checked for duplicate suppression.

Test/validation limits: I did not run BE UTs or format checks because this checkout is missing .worktree_initialized, thirdparty/installed, initialized submodules, and the required clang-format v16 toolchain.

Subagent conclusions: optimizer-rewrite found no optimizer/rewrite-specific issue for this BE-only diff. tests-session-config found no new test/config/style issue beyond existing threads and noted the same validation limits. Final convergence round 1 ended with both subagents reporting NO_NEW_VALUABLE_FINDINGS for the current ledger/comment set.

User focus: no additional user-provided review focus was present.

Mryange · 2026-07-01T07:39:28Z

run buildall

hello-stephen · 2026-07-01T08:18:20Z

TPC-H: Total hot run time: 29660 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4c2b695de5b8c973b5eaef1973d1a9f9fed874ce, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17618	4070	3969	3969
q2	1999	340	215	215
q3	10279	1442	835	835
q4	4698	465	345	345
q5	7645	869	568	568
q6	200	171	133	133
q7	807	866	639	639
q8	10210	1542	1567	1542
q9	6106	4440	4401	4401
q10	6842	1800	1534	1534
q11	517	350	314	314
q12	718	550	441	441
q13	18080	3416	2743	2743
q14	266	266	250	250
q15	q16	794	792	720	720
q17	1050	1094	988	988
q18	6768	5897	5575	5575
q19	1197	1317	1116	1116
q20	814	665	529	529
q21	5630	2684	2501	2501
q22	442	366	302	302
Total cold run time: 102680 ms
Total hot run time: 29660 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4396	4304	4278	4278
q2	295	320	210	210
q3	4539	4943	4454	4454
q4	2084	2153	1357	1357
q5	4462	4293	4340	4293
q6	233	177	126	126
q7	2042	1982	1651	1651
q8	2551	2240	2150	2150
q9	7931	7858	7914	7858
q10	4838	4811	4353	4353
q11	565	421	476	421
q12	741	752	536	536
q13	3353	3655	2992	2992
q14	330	313	269	269
q15	q16	721	715	634	634
q17	1384	1333	1365	1333
q18	8120	7454	7021	7021
q19	1124	1108	1107	1107
q20	2220	2220	1965	1965
q21	5249	4567	4389	4389
q22	520	477	426	426
Total cold run time: 57698 ms
Total hot run time: 51823 ms

hello-stephen · 2026-07-01T08:29:32Z

TPC-DS: Total hot run time: 174428 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4c2b695de5b8c973b5eaef1973d1a9f9fed874ce, data reload: false

query5	4344	647	499	499
query6	467	227	204	204
query7	4872	581	333	333
query8	342	193	174	174
query9	8739	4088	4028	4028
query10	454	354	311	311
query11	5984	2376	2161	2161
query12	161	105	109	105
query13	1252	624	424	424
query14	6320	5365	4959	4959
query14_1	4303	4342	4257	4257
query15	217	208	181	181
query16	1072	493	447	447
query17	1096	730	566	566
query18	2653	487	344	344
query19	206	190	151	151
query20	110	107	108	107
query21	236	152	131	131
query22	13783	13598	13430	13430
query23	17394	16510	16235	16235
query23_1	16265	16259	16368	16259
query24	7502	1776	1308	1308
query24_1	1330	1304	1285	1285
query25	545	450	400	400
query26	1342	371	224	224
query27	2574	610	403	403
query28	4463	2083	2053	2053
query29	1105	640	508	508
query30	343	274	231	231
query31	1126	1113	990	990
query32	114	65	63	63
query33	542	334	255	255
query34	1166	1180	661	661
query35	781	780	687	687
query36	1412	1413	1185	1185
query37	152	107	99	99
query38	1896	1729	1672	1672
query39	916	927	902	902
query39_1	894	892	892	892
query40	245	166	146	146
query41	72	71	69	69
query42	100	95	94	94
query43	318	323	278	278
query44	1460	808	786	786
query45	204	192	179	179
query46	1084	1186	760	760
query47	2352	2361	2220	2220
query48	383	431	284	284
query49	588	429	335	335
query50	1051	426	347	347
query51	4481	4450	4386	4386
query52	89	88	74	74
query53	270	284	210	210
query54	290	243	251	243
query55	77	73	69	69
query56	302	318	327	318
query57	1429	1409	1325	1325
query58	291	261	247	247
query59	1584	1619	1426	1426
query60	325	281	267	267
query61	178	175	178	175
query62	714	660	600	600
query63	254	214	226	214
query64	2448	747	631	631
query65	4821	4817	4780	4780
query66	1728	502	390	390
query67	29777	29711	29591	29591
query68	3270	1589	935	935
query69	414	299	274	274
query70	1078	976	954	954
query71	360	321	328	321
query72	3024	2613	2329	2329
query73	861	763	455	455
query74	5132	4971	4778	4778
query75	2619	2625	2232	2232
query76	2357	1214	809	809
query77	362	385	288	288
query78	12539	12525	11869	11869
query79	1497	1166	783	783
query80	1283	553	455	455
query81	558	322	289	289
query82	608	157	123	123
query83	372	321	288	288
query84	280	165	131	131
query85	975	613	541	541
query86	424	284	296	284
query87	1831	1851	1760	1760
query88	3782	2859	2814	2814
query89	466	408	359	359
query90	1883	202	194	194
query91	205	190	165	165
query92	64	64	62	62
query93	1740	1498	1014	1014
query94	732	369	306	306
query95	797	574	469	469
query96	1075	863	376	376
query97	2689	2694	2557	2557
query98	215	204	206	204
query99	1199	1151	1019	1019
Total cold run time: 260312 ms
Total hot run time: 174428 ms

hello-stephen · 2026-07-01T08:34:23Z

ClickBench: Total hot run time: 25.29 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 4c2b695de5b8c973b5eaef1973d1a9f9fed874ce, data reload: false

query1	0.01	0.01	0.00
query2	0.09	0.04	0.05
query3	0.26	0.14	0.14
query4	1.60	0.13	0.13
query5	0.26	0.24	0.22
query6	1.33	1.08	1.06
query7	0.03	0.01	0.01
query8	0.06	0.04	0.04
query9	0.38	0.31	0.32
query10	0.57	0.55	0.59
query11	0.20	0.14	0.14
query12	0.18	0.15	0.14
query13	0.48	0.47	0.48
query14	1.01	1.01	1.01
query15	0.61	0.60	0.59
query16	0.32	0.32	0.32
query17	1.08	1.10	1.08
query18	0.22	0.20	0.21
query19	1.98	1.96	2.02
query20	0.02	0.01	0.02
query21	15.41	0.21	0.15
query22	4.71	0.04	0.05
query23	16.10	0.33	0.11
query24	3.10	0.44	0.35
query25	0.11	0.06	0.05
query26	0.74	0.21	0.15
query27	0.05	0.04	0.03
query28	3.57	0.90	0.53
query29	12.48	4.36	3.49
query30	0.28	0.15	0.16
query31	2.77	0.61	0.33
query32	3.22	0.60	0.49
query33	3.21	3.18	3.22
query34	15.64	4.23	3.52
query35	3.52	3.51	3.51
query36	0.55	0.42	0.44
query37	0.09	0.07	0.07
query38	0.05	0.04	0.03
query39	0.03	0.03	0.03
query40	0.17	0.16	0.15
query41	0.09	0.04	0.04
query42	0.03	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 96.65 s
Total hot run time: 25.29 s

hello-stephen · 2026-07-01T11:16:22Z

TPC-H: Total hot run time: 29885 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 94418b2720a033ffda4e5f8c6dedb0d68ad34942, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17868	4036	4027	4027
q2	2046	322	198	198
q3	10251	1431	849	849
q4	4679	469	340	340
q5	7560	875	569	569
q6	180	170	134	134
q7	792	853	626	626
q8	9342	1689	1663	1663
q9	5796	4457	4489	4457
q10	6800	1809	1538	1538
q11	511	350	326	326
q12	708	566	446	446
q13	18097	3401	2780	2780
q14	265	265	247	247
q15	q16	789	788	702	702
q17	954	965	937	937
q18	7035	5774	5661	5661
q19	1157	1323	1083	1083
q20	787	678	546	546
q21	5770	2723	2443	2443
q22	443	372	313	313
Total cold run time: 101830 ms
Total hot run time: 29885 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4323	4261	4267	4261
q2	294	324	212	212
q3	4635	4942	4404	4404
q4	2082	2203	1380	1380
q5	4470	4303	4336	4303
q6	242	179	130	130
q7	1752	2192	1752	1752
q8	2571	2185	2227	2185
q9	7985	8178	7718	7718
q10	4903	4777	4290	4290
q11	598	410	378	378
q12	778	763	543	543
q13	3209	3525	3013	3013
q14	315	312	271	271
q15	q16	700	726	671	671
q17	1346	1319	1353	1319
q18	7924	7391	7372	7372
q19	1155	1144	1101	1101
q20	2243	2201	1927	1927
q21	5282	4620	4441	4441
q22	505	453	396	396
Total cold run time: 57312 ms
Total hot run time: 52067 ms

hello-stephen · 2026-07-01T11:27:22Z

TPC-DS: Total hot run time: 173699 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 94418b2720a033ffda4e5f8c6dedb0d68ad34942, data reload: false

query5	4299	632	494	494
query6	475	231	196	196
query7	4831	609	341	341
query8	339	194	169	169
query9	8792	4019	4023	4019
query10	469	356	307	307
query11	5899	2349	2167	2167
query12	153	103	106	103
query13	1283	609	431	431
query14	6229	5352	4971	4971
query14_1	4277	4276	4264	4264
query15	209	200	176	176
query16	1011	481	457	457
query17	907	695	588	588
query18	2427	473	335	335
query19	206	193	156	156
query20	111	109	106	106
query21	243	155	133	133
query22	13679	13714	13367	13367
query23	17485	16522	16081	16081
query23_1	16282	16315	16115	16115
query24	7582	1785	1313	1313
query24_1	1331	1276	1274	1274
query25	569	442	371	371
query26	1334	359	204	204
query27	2607	641	388	388
query28	4530	2045	2044	2044
query29	1096	633	502	502
query30	343	268	227	227
query31	1132	1098	989	989
query32	130	64	63	63
query33	556	323	276	276
query34	1195	1135	658	658
query35	785	793	669	669
query36	1430	1414	1225	1225
query37	172	114	95	95
query38	1898	1696	1653	1653
query39	930	921	914	914
query39_1	912	888	887	887
query40	251	175	148	148
query41	71	71	68	68
query42	97	94	96	94
query43	326	329	280	280
query44	1466	795	782	782
query45	215	194	176	176
query46	1057	1214	757	757
query47	2366	2404	2207	2207
query48	375	396	299	299
query49	593	439	326	326
query50	1015	445	347	347
query51	4542	4436	4388	4388
query52	89	91	80	80
query53	274	287	247	247
query54	284	245	207	207
query55	74	69	67	67
query56	295	289	267	267
query57	1431	1433	1348	1348
query58	287	252	251	251
query59	1586	1627	1421	1421
query60	299	269	241	241
query61	156	152	151	151
query62	711	642	569	569
query63	241	214	202	202
query64	2528	755	585	585
query65	4877	4782	4782	4782
query66	1826	526	377	377
query67	29692	29512	29379	29379
query68	3242	1468	1019	1019
query69	409	314	270	270
query70	1065	990	977	977
query71	350	344	325	325
query72	2893	2640	2339	2339
query73	852	770	421	421
query74	5118	4960	4761	4761
query75	2636	2575	2251	2251
query76	2352	1210	818	818
query77	342	372	290	290
query78	12494	12449	11864	11864
query79	1270	1128	759	759
query80	648	563	462	462
query81	459	326	285	285
query82	238	154	122	122
query83	406	317	295	295
query84	326	161	130	130
query85	922	591	499	499
query86	322	290	273	273
query87	1823	1816	1757	1757
query88	3758	2833	2788	2788
query89	427	415	358	358
query90	2205	196	210	196
query91	205	183	167	167
query92	64	61	56	56
query93	1601	1659	1066	1066
query94	543	354	320	320
query95	792	521	579	521
query96	1106	810	358	358
query97	2722	2699	2579	2579
query98	228	208	212	208
query99	1184	1174	1019	1019
Total cold run time: 258452 ms
Total hot run time: 173699 ms

hello-stephen · 2026-07-01T11:32:18Z

ClickBench: Total hot run time: 25.18 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 94418b2720a033ffda4e5f8c6dedb0d68ad34942, data reload: false

query1	0.00	0.00	0.00
query2	0.10	0.05	0.05
query3	0.26	0.12	0.14
query4	1.61	0.17	0.13
query5	0.26	0.22	0.22
query6	1.26	1.06	1.04
query7	0.04	0.01	0.01
query8	0.06	0.03	0.04
query9	0.39	0.32	0.32
query10	0.55	0.54	0.55
query11	0.20	0.15	0.14
query12	0.18	0.14	0.15
query13	0.48	0.49	0.47
query14	1.01	1.01	1.01
query15	0.64	0.58	0.61
query16	0.32	0.30	0.32
query17	1.12	1.09	1.11
query18	0.23	0.20	0.22
query19	2.04	1.89	1.95
query20	0.01	0.01	0.01
query21	15.40	0.21	0.13
query22	4.90	0.05	0.06
query23	16.13	0.32	0.12
query24	3.02	0.40	0.35
query25	0.15	0.06	0.04
query26	0.76	0.21	0.15
query27	0.05	0.04	0.05
query28	3.51	0.95	0.55
query29	12.48	4.40	3.48
query30	0.27	0.15	0.15
query31	2.76	0.64	0.31
query32	3.23	0.60	0.48
query33	3.18	3.20	3.20
query34	15.64	4.21	3.55
query35	3.55	3.52	3.53
query36	0.56	0.42	0.42
query37	0.09	0.06	0.06
query38	0.06	0.04	0.04
query39	0.04	0.02	0.03
query40	0.18	0.16	0.15
query41	0.08	0.02	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 96.88 s
Total hot run time: 25.18 s

hello-stephen · 2026-07-01T18:52:22Z

BE Regression && UT Coverage Report

Increment line coverage 82.62% (252/305) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	72.51% (27928/38515)
Line Coverage	55.89% (299747/536327)
Region Coverage	52.61% (250001/475235)
Branch Coverage	53.70% (108399/201861)

Mryange · 2026-07-02T11:58:11Z

run buildall

hello-stephen · 2026-07-02T13:55:25Z

TPC-H: Total hot run time: 29874 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit fe4fede1646dd3e4f3f618d270d6eff1af85e19b, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17603	4029	4004	4004
q2	2074	331	215	215
q3	10457	1520	866	866
q4	4688	472	342	342
q5	7583	877	587	587
q6	188	168	135	135
q7	805	873	622	622
q8	9356	1688	1620	1620
q9	5801	4418	4420	4418
q10	6794	1797	1551	1551
q11	514	361	333	333
q12	710	578	437	437
q13	18121	3460	2782	2782
q14	267	263	240	240
q15	q16	795	781	710	710
q17	1068	1021	988	988
q18	7048	5689	5694	5689
q19	1318	1401	1126	1126
q20	765	679	555	555
q21	5928	2652	2354	2354
q22	444	360	300	300
Total cold run time: 102327 ms
Total hot run time: 29874 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4387	4301	4278	4278
q2	282	319	210	210
q3	4574	4967	4430	4430
q4	2077	2132	1370	1370
q5	4464	4309	4330	4309
q6	228	173	129	129
q7	1811	2094	1686	1686
q8	2536	2221	2187	2187
q9	8037	8044	7871	7871
q10	4781	4760	4267	4267
q11	582	430	390	390
q12	767	782	599	599
q13	3279	3643	2865	2865
q14	304	289	268	268
q15	q16	718	734	640	640
q17	1359	1329	1342	1329
q18	7829	7257	7237	7237
q19	1162	1157	1122	1122
q20	2196	2200	1927	1927
q21	5242	4553	4446	4446
q22	516	469	395	395
Total cold run time: 57131 ms
Total hot run time: 51955 ms

hello-stephen · 2026-07-02T14:06:26Z

TPC-DS: Total hot run time: 173516 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit fe4fede1646dd3e4f3f618d270d6eff1af85e19b, data reload: false

query5	4333	663	485	485
query6	468	225	197	197
query7	4867	599	340	340
query8	341	197	177	177
query9	8777	4044	4123	4044
query10	456	377	307	307
query11	5946	2361	2173	2173
query12	169	105	102	102
query13	1258	610	430	430
query14	6223	5371	5005	5005
query14_1	4292	4279	4295	4279
query15	212	205	183	183
query16	1037	466	453	453
query17	959	735	597	597
query18	2450	487	358	358
query19	234	196	155	155
query20	140	108	105	105
query21	235	166	136	136
query22	13714	13651	13461	13461
query23	17476	16565	15958	15958
query23_1	16186	16165	16247	16165
query24	7443	1776	1294	1294
query24_1	1308	1313	1313	1313
query25	568	463	408	408
query26	1328	357	213	213
query27	2614	590	379	379
query28	4418	2034	2012	2012
query29	1118	648	514	514
query30	353	263	221	221
query31	1124	1098	993	993
query32	144	65	63	63
query33	531	326	258	258
query34	1218	1144	629	629
query35	789	787	676	676
query36	1428	1414	1211	1211
query37	156	117	94	94
query38	1907	1703	1667	1667
query39	929	963	896	896
query39_1	874	868	880	868
query40	248	163	143	143
query41	66	62	61	61
query42	95	93	94	93
query43	316	328	279	279
query44	1438	759	752	752
query45	201	187	180	180
query46	1046	1165	736	736
query47	2388	2366	2230	2230
query48	344	403	298	298
query49	576	420	300	300
query50	1036	436	345	345
query51	4438	4448	4329	4329
query52	86	86	78	78
query53	269	271	200	200
query54	296	224	210	210
query55	72	70	63	63
query56	301	300	302	300
query57	1445	1395	1335	1335
query58	288	254	234	234
query59	1557	1663	1387	1387
query60	310	264	242	242
query61	162	143	149	143
query62	700	647	585	585
query63	249	206	211	206
query64	2544	758	618	618
query65	4872	4798	4785	4785
query66	1847	513	379	379
query67	29691	29591	29382	29382
query68	3234	1620	944	944
query69	431	294	270	270
query70	1067	959	963	959
query71	396	325	307	307
query72	2864	2665	2312	2312
query73	896	779	438	438
query74	5120	4977	4733	4733
query75	2594	2583	2230	2230
query76	2304	1198	785	785
query77	337	374	292	292
query78	12622	12498	11916	11916
query79	1418	1208	784	784
query80	984	537	467	467
query81	499	325	279	279
query82	1161	161	125	125
query83	403	314	290	290
query84	331	168	130	130
query85	988	580	518	518
query86	435	283	291	283
query87	1831	1806	1767	1767
query88	3684	2785	2792	2785
query89	451	410	356	356
query90	1914	198	193	193
query91	204	193	160	160
query92	65	59	57	57
query93	1619	1527	964	964
query94	700	379	313	313
query95	784	596	480	480
query96	1078	803	337	337
query97	2703	2723	2568	2568
query98	222	204	199	199
query99	1178	1162	1026	1026
Total cold run time: 259766 ms
Total hot run time: 173516 ms

hello-stephen · 2026-07-02T14:11:22Z

ClickBench: Total hot run time: 25.22 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit fe4fede1646dd3e4f3f618d270d6eff1af85e19b, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.01	0.01	0.00
query2	0.11	0.05	0.06
query3	0.25	0.14	0.13
query4	1.60	0.13	0.14
query5	0.24	0.22	0.23
query6	1.20	1.04	1.06
query7	0.04	0.01	0.01
query8	0.06	0.04	0.03
query9	0.39	0.32	0.33
query10	0.59	0.56	0.54
query11	0.21	0.15	0.14
query12	0.19	0.14	0.15
query13	0.48	0.47	0.48
query14	1.02	1.00	1.00
query15	0.63	0.59	0.59
query16	0.32	0.31	0.31
query17	1.15	1.12	1.08
query18	0.22	0.20	0.21
query19	2.02	1.97	1.92
query20	0.02	0.01	0.01
query21	15.54	0.22	0.13
query22	4.85	0.05	0.06
query23	16.11	0.30	0.12
query24	2.97	0.43	0.33
query25	0.11	0.05	0.04
query26	0.75	0.21	0.16
query27	0.05	0.04	0.04
query28	3.51	0.89	0.57
query29	12.46	4.28	3.46
query30	0.28	0.15	0.15
query31	2.77	0.58	0.32
query32	3.22	0.59	0.49
query33	3.24	3.20	3.19
query34	15.56	4.24	3.53
query35	3.51	3.53	3.49
query36	0.64	0.47	0.44
query37	0.08	0.06	0.06
query38	0.06	0.04	0.04
query39	0.04	0.03	0.03
query40	0.18	0.16	0.16
query41	0.09	0.03	0.02
query42	0.04	0.04	0.04
query43	0.05	0.04	0.03
Total cold run time: 96.86 s
Total hot run time: 25.22 s

hello-stephen · 2026-07-02T15:27:14Z

BE UT Coverage Report

Increment line coverage 82.62% (252/305) 🎉

Category	Coverage
Function Coverage	56.27% (22939/40763)
Line Coverage	39.93% (222991/558490)
Region Coverage	35.91% (176219/490699)
Branch Coverage	36.99% (78348/211829)

hello-stephen · 2026-07-02T15:51:18Z

BE Regression && UT Coverage Report

Increment line coverage 82.62% (252/305) 🎉

Category	Coverage
Function Coverage	73.04% (29060/39788)
Line Coverage	56.62% (314388/555259)
Region Coverage	53.19% (262106/492766)
Branch Coverage	54.22% (114976/212066)

Mryange marked this pull request as ready for review June 29, 2026 08:13

Mryange force-pushed the arrow-validation-benchmark branch from 99899b8 to ccada0f Compare June 30, 2026 02:03

github-actions Bot requested changes Jun 30, 2026

Mryange force-pushed the arrow-validation-benchmark branch from 24f208f to 633ae15 Compare June 30, 2026 05:05

upd

72d3d2c

Mryange force-pushed the arrow-validation-benchmark branch from 633ae15 to 72d3d2c Compare June 30, 2026 05:07

github-actions Bot requested changes Jun 30, 2026

upd

4c2b695

github-actions Bot reviewed Jul 1, 2026

fix unaligned_load

94418b2

github-actions Bot reviewed Jul 1, 2026

Merge branch 'master' into arrow-validation-benchmark

fe4fede
