[Chore](be) Check column sanity across operator outputs #63518

Open
BiteTheDDDDt wants to merge 1 commit into apache:master from BiteTheDDDDt:fix-column-sanity-check
@BiteTheDDDDt
Contributor

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Invalid column internals, such as a nullable column whose nested column size differs from its null map size, may only fail later when a downstream operator reads nested data. This change runs column sanity checks at pipeline operator output boundaries, including projection outputs, so malformed columns are reported closer to their producer. It also keeps targeted nested loop join lazy materialization sanity checks at the source-copy points that exposed the issue.
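To make the invariant concrete, here is a minimal sketch of the kind of check described above. `ToyNullableColumn` and `is_sane` are hypothetical names invented for illustration; they are not the Doris column classes, and the real `sanity_check()` may cover more than this single size comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a nullable column (not the Doris class):
// a nested value buffer plus a null map, both expected to hold one entry per row.
struct ToyNullableColumn {
    std::vector<int32_t> nested;
    std::vector<uint8_t> null_map; // 1 marks a NULL row, 0 a non-NULL row

    // Throws when the nested buffer and the null map disagree on the row count.
    void sanity_check() const {
        if (nested.size() != null_map.size()) {
            throw std::runtime_error("nested size " + std::to_string(nested.size()) +
                                     " != null map size " + std::to_string(null_map.size()));
        }
    }
};

// Returns true if the column passes the check, false if it is malformed.
inline bool is_sane(const ToyNullableColumn& col) {
    try {
        col.sanity_check();
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}

// Usage: a well-formed column passes; one with a short nested buffer does not.
inline void example_nullable_check() {
    assert(is_sane(ToyNullableColumn{{1, 2, 3}, {0, 1, 0}}));
    assert(!is_sane(ToyNullableColumn{{1, 2}, {0, 1, 0}}));
}
```

Without such a check at the producer, the second column would only fail once some later reader indexes row 2 of the nested buffer.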

Release note

None

Check List (For Author)

  • Test: Manual test
    • build-support/clang-format.sh
    • build-support/check-format.sh
    • git diff --check
    • ninja -C be/build_Release src/core/CMakeFiles/Core.dir/block/block.cpp.o src/core/CMakeFiles/Core.dir/column/column_string.cpp.o src/exec/CMakeFiles/Exec.dir/operator/operator.cpp.o src/exec/CMakeFiles/Exec.dir/operator/nested_loop_join_probe_operator.cpp.o
  • Behavior changed: No
  • Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 22, 2026 04:05
@BiteTheDDDDt BiteTheDDDDt changed the title from "[fix](be) Check column sanity across operator outputs" to "[Chore](be) Check column sanity across operator outputs" on May 22, 2026
@BiteTheDDDDt
Contributor Author

/review

@BiteTheDDDDt
Contributor Author

run buildall

Contributor

Copilot AI left a comment

Pull request overview

This PR aims to surface malformed column internals (e.g., mismatched nested column sizes / null maps) earlier by running column sanity checks at pipeline operator output boundaries (including after projections), and by adding additional checks at nested loop join lazy materialization copy points.

Changes:

  • Add Block::check_type_and_column() calls at OperatorXBase::get_block_after_projects() outputs (both projected and non-projected paths).
  • Make Block::check_type_and_column() run IColumn::sanity_check() in non-debug builds (while keeping type compatibility checks debug-only).
  • Add explicit sanity_check() calls during nested loop join lazy materialization column copying/filtering, and adjust ColumnStr sanity checking behavior.
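The first change in the list can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in `operator.cpp`: `ToyStatus`, `ToyBlock`, `Producer`, and `get_block_checked` are hypothetical names, and the per-column check is reduced to a single size comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal status type; Doris has its own Status class.
struct ToyStatus {
    bool ok;
    std::string msg;
    static ToyStatus OK() { return ToyStatus{true, ""}; }
    static ToyStatus Error(std::string m) { return ToyStatus{false, std::move(m)}; }
};

// Hypothetical column summary: row counts of the nested data and the null map.
struct ToyColumn {
    size_t nested_rows;
    size_t null_map_rows;
};

struct ToyBlock {
    std::vector<ToyColumn> columns;

    // Reports the first column whose internal sizes disagree.
    ToyStatus check_columns() const {
        for (size_t i = 0; i < columns.size(); ++i) {
            if (columns[i].nested_rows != columns[i].null_map_rows) {
                return ToyStatus::Error("column " + std::to_string(i) + " is malformed");
            }
        }
        return ToyStatus::OK();
    }
};

using Producer = std::function<ToyStatus(ToyBlock*)>;

// Runs the producing operator, then validates its output before any
// downstream operator gets to read it.
inline ToyStatus get_block_checked(const Producer& produce, ToyBlock* out) {
    ToyStatus st = produce(out);
    if (!st.ok) {
        return st;
    }
    return out->check_columns();
}

// Usage: a producer that emits a malformed second column is caught at the boundary.
inline void example_boundary_check() {
    ToyBlock block;
    ToyStatus st = get_block_checked(
            [](ToyBlock* b) {
                b->columns.push_back(ToyColumn{3, 3});
                b->columns.push_back(ToyColumn{2, 3});
                return ToyStatus::OK();
            },
            &block);
    assert(!st.ok);
}
```

The point of checking in the wrapper rather than in each consumer is that the error message names the producing step instead of whichever operator happens to touch the bad column first.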

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file

| File | Description |
| --- | --- |
| be/src/exec/operator/operator.cpp | Runs block sanity/type checks at operator output boundaries (including projection outputs). |
| be/src/exec/operator/nested_loop_join_probe_operator.cpp | Adds sanity checks at lazy materialization source-copy/filter points in nested loop join. |
| be/src/core/column/column_string.h | Makes sanity_check_simple() always active (previously debug-only). |
| be/src/core/column/column_string.cpp | Ensures ColumnStr::sanity_check() performs basic checks outside NDEBUG. |
| be/src/core/block/block.cpp | Runs column->sanity_check() in check_type_and_column() for release builds, with exception-to-Status conversion. |
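The "exception-to-Status conversion" mentioned for `block.cpp` could look something like the sketch below. `CheckResult` and `run_check_as_status` are hypothetical names used only here; the actual Doris code uses its own `Status` and exception types, which this example does not reproduce.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical result type standing in for a Status object.
struct CheckResult {
    bool ok;
    std::string message;
};

// Runs a throwing check and converts any std::exception into an error result,
// so callers on the pipeline path can propagate it as a value instead of
// letting the exception unwind through the operator.
template <typename Check>
CheckResult run_check_as_status(Check&& check) {
    try {
        check();
    } catch (const std::exception& e) {
        return CheckResult{false, e.what()};
    }
    return CheckResult{true, ""};
}

// Usage: a passing check yields ok, a throwing check yields its message.
inline void example_exception_conversion() {
    CheckResult passed = run_check_as_status([] {});
    assert(passed.ok);
    CheckResult failed =
            run_check_as_status([] { throw std::runtime_error("size mismatch"); });
    assert(!failed.ok);
}
```

A helper that calls a throwing check directly, without a wrapper of this kind, leaves the exception unconverted.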

    constexpr int8_t MARK_NULL = -1;

    ColumnPtr make_const_column_from_row(const ColumnWithTypeAndName& source, size_t row, size_t rows) {
        source.column->sanity_check();

Comment on lines 53 to 56

    void append_many_from_source(MutableColumnPtr& dst_column, const ColumnWithTypeAndName& src_column,
                                 size_t row, size_t rows) {
        src_column.column->sanity_check();
        if (!src_column.column->is_nullable() && dst_column->is_nullable()) {

Comment on lines 66 to 75

    void append_filtered_from_source(MutableColumnPtr& dst_column,
                                     const ColumnWithTypeAndName& src_column,
                                     const IColumn::Filter& filter, size_t selected_rows) {
        if (selected_rows == 0) {
            return;
        }
        src_column.column->sanity_check();
        auto filtered_column = src_column.column->filter(filter, selected_rows);
        filtered_column->sanity_check();
        if (!src_column.column->is_nullable() && dst_column->is_nullable()) {

Comment on lines 119 to 130

    @@ -128,7 +127,6 @@ class ColumnStr final : public COWHelper<IColumn, ColumnStr<T>> {
            if (offsets[-1] != 0) {
                throw Exception(Status::InternalError("wrong offsets[-1]: {}", offsets[-1]));
            }
    #endif
        }

Comment on lines +417 to 422

            RETURN_IF_ERROR(block->check_type_and_column());
            return Status::OK();
        }
        status = get_block(state, block, eos);
        RETURN_IF_ERROR(status);
        RETURN_IF_ERROR(block->check_type_and_column());

Contributor

@github-actions github-actions Bot left a comment

I reviewed the PR diff and the existing inline review context. I did not add duplicate inline comments, but the already-open review threads identify blocking issues that should be addressed before merge.

Critical checkpoint conclusions:

  • Goal/test coverage: the PR appears to move selected column sanity checks into release/operator execution paths. Existing tests for these new runtime validation paths were not added in the PR diff.
  • Scope/clarity: the code change is small, but it expands validation into hot operator/string access paths.
  • Concurrency/lifecycle/config/compatibility: no new concurrency, lifecycle, config, wire protocol, or storage-format concerns found in the reviewed diff.
  • Error handling: existing inline comments already cover unconverted sanity_check() exceptions in nested loop join helper paths.
  • Parallel paths/performance: existing inline comments already cover duplicate output block validation and release-build string accessor overhead.
  • Observability/transactions/data writes/FE-BE variable passing: not applicable to this PR diff.
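The performance concern about string access paths comes down to how much work a check does per call. The sketch below contrasts a constant-time check with a full scan on a toy offsets-based string column. `ToyStringColumn` and its layout are assumptions made for illustration; it does not model Doris's `ColumnStr` (for example, it has no padded `offsets[-1]` slot).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy offsets-based string column (illustrative only): row i occupies
// chars[offsets[i - 1], offsets[i]), with the start of row 0 taken as 0.
struct ToyStringColumn {
    std::string chars;
    std::vector<uint32_t> offsets;

    // O(1): only compares the final offset with the byte buffer length.
    bool simple_check() const {
        const size_t end = offsets.empty() ? 0 : offsets.back();
        return end == chars.size();
    }

    // O(rows): also verifies that offsets never decrease.
    bool full_check() const {
        uint32_t prev = 0;
        for (uint32_t off : offsets) {
            if (off < prev) {
                return false;
            }
            prev = off;
        }
        return simple_check();
    }
};

// Usage: out-of-order offsets slip past the cheap check but not the full one.
inline void example_string_checks() {
    ToyStringColumn bad_order{"abcde", {3, 2, 5}};
    assert(bad_order.simple_check());
    assert(!bad_order.full_check());
}
```

Under these assumptions, the cheap check is the one that could plausibly stay enabled on per-row accessors, while the full scan fits better at block boundaries where it runs once per block.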

User focus: no additional user-provided review focus was specified.

@hello-stephen
Contributor

TPC-H: Total hot run time: 31382 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5eaae908e8d423b4b17d7ff4f84787e9d5e3b203, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17607	3913	3890	3890
q2	q3	10763	1480	825	825
q4	4685	479	346	346
q5	7578	2298	2113	2113
q6	248	183	140	140
q7	950	760	660	660
q8	9347	1717	1571	1571
q9	5478	4928	4913	4913
q10	6430	2114	1800	1800
q11	447	270	254	254
q12	636	426	291	291
q13	18122	3456	2803	2803
q14	264	257	234	234
q15	q16	820	779	712	712
q17	932	1026	997	997
q18	6998	5976	5492	5492
q19	1348	1247	1123	1123
q20	617	461	298	298
q21	6179	2928	2598	2598
q22	464	400	322	322
Total cold run time: 99913 ms
Total hot run time: 31382 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4759	4740	4715	4715
q2	q3	4971	5296	4551	4551
q4	2139	2208	1453	1453
q5	4815	4690	4663	4663
q6	236	179	133	133
q7	1903	1724	1596	1596
q8	2396	2096	2141	2096
q9	7778	7577	7180	7180
q10	4458	4429	4063	4063
q11	532	377	354	354
q12	715	726	522	522
q13	3039	3478	2822	2822
q14	292	272	243	243
q15	q16	685	701	606	606
q17	1290	1251	1239	1239
q18	7367	6802	6845	6802
q19	1150	1148	1113	1113
q20	2216	2194	1912	1912
q21	5350	4642	4612	4612
q22	523	480	396	396
Total cold run time: 56614 ms
Total hot run time: 51071 ms
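The report does not label its columns. Assuming they are one cold run, two hot runs, and the faster of the two hot runs, the Round 1 totals can be reproduced from the rows as sketched below. The column interpretation and the names `QueryTiming`, `round1_rows`, `total_cold_ms`, and `total_best_hot_ms` are assumptions made for this example; q2 and q15 carry no timings in the report and are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One report row: cold run and two hot runs, in milliseconds (assumed meaning).
struct QueryTiming {
    int cold_ms;
    int hot1_ms;
    int hot2_ms;
};

// Round 1 rows copied from the report, q1 through q22 without q2 and q15.
inline const std::vector<QueryTiming>& round1_rows() {
    static const std::vector<QueryTiming> rows = {
            {17607, 3913, 3890}, {10763, 1480, 825},  {4685, 479, 346},
            {7578, 2298, 2113},  {248, 183, 140},     {950, 760, 660},
            {9347, 1717, 1571},  {5478, 4928, 4913},  {6430, 2114, 1800},
            {447, 270, 254},     {636, 426, 291},     {18122, 3456, 2803},
            {264, 257, 234},     {820, 779, 712},     {932, 1026, 997},
            {6998, 5976, 5492},  {1348, 1247, 1123},  {617, 461, 298},
            {6179, 2928, 2598},  {464, 400, 322},
    };
    return rows;
}

// Sum of the cold-run column.
inline int total_cold_ms(const std::vector<QueryTiming>& rows) {
    int total = 0;
    for (const QueryTiming& r : rows) {
        total += r.cold_ms;
    }
    return total;
}

// Sum of the faster hot run per query.
inline int total_best_hot_ms(const std::vector<QueryTiming>& rows) {
    int total = 0;
    for (const QueryTiming& r : rows) {
        total += std::min(r.hot1_ms, r.hot2_ms);
    }
    return total;
}

// Usage: both sums match the totals printed for Round 1.
inline void example_tpch_totals() {
    assert(total_cold_ms(round1_rows()) == 99913);
    assert(total_best_hot_ms(round1_rows()) == 31382);
}
```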

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169023 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5eaae908e8d423b4b17d7ff4f84787e9d5e3b203, data reload: false

query5	4326	670	528	528
query6	324	217	233	217
query7	4255	574	323	323
query8	320	229	219	219
query9	8858	3967	3970	3967
query10	446	336	298	298
query11	5804	2550	2246	2246
query12	182	128	123	123
query13	1264	604	433	433
query14	6015	5359	5060	5060
query14_1	4363	4317	4329	4317
query15	210	207	179	179
query16	1021	489	426	426
query17	941	710	575	575
query18	2438	540	358	358
query19	209	197	158	158
query20	135	128	125	125
query21	214	140	120	120
query22	13670	13652	13406	13406
query23	17275	16468	16061	16061
query23_1	16187	16116	16232	16116
query24	7484	1784	1301	1301
query24_1	1348	1279	1326	1279
query25	576	493	441	441
query26	1318	328	176	176
query27	2741	580	348	348
query28	4532	1966	1983	1966
query29	1028	618	493	493
query30	308	241	202	202
query31	1145	1084	973	973
query32	87	78	71	71
query33	551	357	292	292
query34	1184	1113	636	636
query35	755	777	674	674
query36	1313	1327	1220	1220
query37	156	105	89	89
query38	3219	3214	3091	3091
query39	924	922	899	899
query39_1	887	888	895	888
query40	232	144	122	122
query41	73	63	64	63
query42	112	110	108	108
query43	328	328	283	283
query44	
query45	215	201	202	201
query46	1097	1210	726	726
query47	2261	2341	2144	2144
query48	391	418	286	286
query49	633	497	378	378
query50	987	357	246	246
query51	4312	4345	4266	4266
query52	106	107	93	93
query53	254	273	200	200
query54	311	267	253	253
query55	96	88	90	88
query56	301	308	307	307
query57	1379	1381	1287	1287
query58	289	279	271	271
query59	1587	1651	1447	1447
query60	326	329	307	307
query61	161	156	152	152
query62	673	623	560	560
query63	239	198	208	198
query64	2466	846	690	690
query65	
query66	1779	515	368	368
query67	30293	30253	29348	29348
query68	
query69	481	365	313	313
query70	1025	997	965	965
query71	316	291	279	279
query72	3205	2930	2382	2382
query73	814	799	439	439
query74	5126	4919	4744	4744
query75	2680	2620	2230	2230
query76	2270	1200	775	775
query77	392	422	324	324
query78	12239	12181	11738	11738
query79	1443	987	729	729
query80	866	544	456	456
query81	491	285	247	247
query82	1371	166	123	123
query83	360	278	255	255
query84	310	145	115	115
query85	933	573	448	448
query86	425	346	309	309
query87	3443	3378	3228	3228
query88	3558	2658	2671	2658
query89	452	389	330	330
query90	1789	183	182	182
query91	181	171	144	144
query92	77	83	76	76
query93	1482	1425	894	894
query94	615	379	326	326
query95	682	398	454	398
query96	1035	799	339	339
query97	2706	2682	2580	2580
query98	237	231	224	224
query99	1107	1127	1008	1008
Total cold run time: 254043 ms
Total hot run time: 169023 ms

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (14/14) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 73.72% (27929/37883) |
| Line Coverage | 57.65% (303467/526425) |
| Region Coverage | 54.82% (254016/463365) |
| Branch Coverage | 56.32% (109725/194833) |
