
[fix](fe) Fix deep nested complex type subtype validation bypass #63208

Merged
morrySnow merged 1 commit into apache:master from morrySnow:doris-25584-complex-type-deep-subtype-validation
May 13, 2026

Conversation

@morrySnow
Contributor

@morrySnow morrySnow commented May 13, 2026

Summary

Fixes: BITMAP/HLL/JSONB/VARIANT used as ARRAY/MAP/STRUCT sub-elements at depth 3+ were silently accepted instead of being rejected.

Root cause

In DataType.validateCatalogDataType(), all three complex-type branches (ARRAY, MAP, STRUCT) only called validateNestedType when the direct child type was instanceof ScalarType. When the child was itself complex (e.g. a MAP inside an ARRAY), the guard failed and the entire subtree was skipped.

ARRAY<MAP<BITMAP, INT>>
  └─ ARRAY branch: itemType = MAP → not ScalarType → SKIP ← bug
       └─ MAP<BITMAP, INT> never validated → BITMAP silently accepted

Depth-2 nesting was correctly rejected (ARRAY<BITMAP> → error), but depth-3+ bypassed the check.

Fix

Remove the instanceof ScalarType guard; call validateNestedType(parent, child) for all child types. Also move the STRUCT duplicate-field-name check outside the former ScalarType guard.
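
The difference between the guarded and unguarded traversal can be illustrated with a minimal, self-contained sketch. It is not the Doris implementation: the `Type` class, the `DISALLOWED` list, and the method names `validateBeforeFix` / `validateAfterFix` are hypothetical stand-ins for `validateCatalogDataType` and `validateNestedType`, and a type is treated as scalar simply when it has no children.

```java
import java.util.Arrays;
import java.util.List;

public class NestedTypeValidationSketch {

    /** Hypothetical type node: a name plus child types (no children means scalar). */
    static final class Type {
        final String name;
        final List<Type> children;

        Type(String name, Type... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        boolean isScalar() {
            return children.isEmpty();
        }
    }

    /** Types the PR says must be rejected as ARRAY/MAP/STRUCT sub-elements. */
    static final List<String> DISALLOWED = Arrays.asList("BITMAP", "HLL", "JSONB", "VARIANT");

    /** Stand-in for the sub-type allowlist check (assumed, simplified). */
    static void checkChild(Type parent, Type child) {
        if (DISALLOWED.contains(child.name)) {
            throw new IllegalArgumentException(
                    parent.name + " does not support sub-type " + child.name);
        }
    }

    /** Pre-fix shape: only scalar children are checked, so complex subtrees are skipped. */
    static void validateBeforeFix(Type type) {
        for (Type child : type.children) {
            if (child.isScalar()) {
                checkChild(type, child);
            }
        }
    }

    /** Post-fix shape: every child is checked, then its own children are validated. */
    static void validateAfterFix(Type type) {
        for (Type child : type.children) {
            checkChild(type, child);
            validateAfterFix(child);
        }
    }

    static boolean acceptedBeforeFix(Type type) {
        try {
            validateBeforeFix(type);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    static boolean acceptedAfterFix(Type type) {
        try {
            validateAfterFix(type);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ARRAY<MAP<BITMAP, INT>>: the repro case from the description above.
        Type deep = new Type("ARRAY", new Type("MAP", new Type("BITMAP"), new Type("INT")));
        // ARRAY<BITMAP>: depth-2 case that was already rejected.
        Type shallow = new Type("ARRAY", new Type("BITMAP"));

        System.out.println("deep, before fix:    " + acceptedBeforeFix(deep));    // true (the bug)
        System.out.println("deep, after fix:     " + acceptedAfterFix(deep));     // false
        System.out.println("shallow, before fix: " + acceptedBeforeFix(shallow)); // false
        System.out.println("shallow, after fix:  " + acceptedAfterFix(shallow));  // false
    }
}
```

In this sketch the only difference between the two validators is the `isScalar()` guard, which is enough to reproduce the depth-3 bypass while leaving the depth-2 rejection unchanged.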

Tests

  • 7 new unit tests in DataTypeTest covering all four Jira repro cases + regression guards for existing depth-2 rejection + valid deep nesting.
  • 1 new regression test suite test_complex_disallowed_subtypes with 6 exception cases and 1 valid-nesting acceptance case.

What problem does this PR solve?

Problem Summary: validateCatalogDataType skipped validation of complex child types due to an instanceof ScalarType guard, allowing BITMAP/HLL/JSONB/VARIANT at depth 3+ to bypass the sub-type allowlist check.

Release note

BITMAP, HLL, JSONB, and VARIANT are now correctly rejected as ARRAY/MAP/STRUCT sub-elements even when nested 3 or more levels deep.

Check List (For Author)

  • Test: Regression test + Unit Test
  • Behavior changed: Yes — depth-3+ nesting with BITMAP/HLL/JSONB/VARIANT now throws AnalysisException instead of being silently accepted
  • Does this need documentation: No

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morrySnow
Contributor Author

run buildall

@morrySnow
Contributor Author

/review

### What problem does this PR solve?

Issue Number: close #DORIS-25584

Problem Summary: In `DataType.validateCatalogDataType()`, all three
complex-type branches (ARRAY, MAP, STRUCT) only called
`validateNestedType` when the direct child type was `instanceof
ScalarType`. When the child was itself a complex type (e.g. a MAP
inside an ARRAY), the guard failed and the entire subtree was skipped —
so BITMAP/HLL/JSONB/VARIANT used as elements at depth 3+ were silently
accepted.

Example: `ARRAY<MAP<BITMAP, INT>>`
- ARRAY branch checks itemType (MAP) → not ScalarType → SKIP
- Inner MAP is never validated → BITMAP as map key accepted silently

Fix: remove the `instanceof ScalarType` guard; call
`validateNestedType(parent, child)` for all child types regardless of
whether they are scalar or complex. Also move the STRUCT duplicate field
name check outside the former ScalarType guard so it applies to all
field types.
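
The effect of moving the duplicate field name check can be sketched as follows. This is an illustration under stated assumptions, not the Doris code: the `Field` class, its `scalar` flag, and both method names are hypothetical, names are compared by exact match, and the pre-fix shape is inferred only from the description above (the check sitting inside the scalar-only guard).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StructFieldNameCheckSketch {

    /** Hypothetical STRUCT field: a name and whether its type is scalar. */
    static final class Field {
        final String name;
        final boolean scalar;

        Field(String name, boolean scalar) {
            this.name = name;
            this.scalar = scalar;
        }
    }

    /** Pre-fix shape (inferred): names are only recorded for scalar-typed fields. */
    static boolean hasDuplicateGuarded(List<Field> fields) {
        Set<String> seen = new HashSet<>();
        for (Field field : fields) {
            if (field.scalar && !seen.add(field.name)) {
                return true;
            }
        }
        return false;
    }

    /** Post-fix shape: every field name is recorded, whatever the field type. */
    static boolean hasDuplicate(List<Field> fields) {
        Set<String> seen = new HashSet<>();
        for (Field field : fields) {
            if (!seen.add(field.name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // STRUCT<a: INT, a: ARRAY<INT>>: the second "a" has a complex type.
        List<Field> fields = Arrays.asList(new Field("a", true), new Field("a", false));

        System.out.println("guarded check finds duplicate:   " + hasDuplicateGuarded(fields));
        System.out.println("unguarded check finds duplicate: " + hasDuplicate(fields));
    }
}
```

With the guard in place, a duplicate name goes unnoticed whenever one of the clashing fields has a complex type; without it, the name set sees every field.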

### Release note

BITMAP, HLL, JSONB, and VARIANT are now correctly rejected as
ARRAY/MAP/STRUCT sub-elements even when nested 3 or more levels deep.

### Check List (For Author)

- Test: Regression test / Unit Test
- Behavior changed: Yes — depth-3+ nesting with BITMAP/HLL/JSONB/VARIANT
  now throws AnalysisException instead of being silently accepted
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@morrySnow morrySnow force-pushed the doris-25584-complex-type-deep-subtype-validation branch from ec9e37c to fbdf959 Compare May 13, 2026 08:20
@morrySnow
Contributor Author

/review

@morrySnow
Contributor Author

run buildall

Contributor

@github-actions github-actions Bot left a comment

Code review completed. I did not find any blocking correctness issues in this PR.

Critical checkpoint conclusions:

  • Goal and proof: The change addresses the validation bypass by recursively validating complex child types, and the added unit/regression tests cover direct and depth-3 invalid subtypes plus one valid nested case.
  • Scope: The implementation is small and focused on the existing validation path.
  • Concurrency and lifecycle: No concurrency, locking, static initialization, or special lifecycle changes are introduced.
  • Configuration and compatibility: No new configuration, storage format, protocol, or rolling-upgrade compatibility concern is introduced.
  • Parallel paths: The modified Nereids validation path now recurses through ARRAY/MAP/STRUCT children consistently; I did not identify a missed equivalent path in this diff.
  • Conditional checks: The removed ScalarType guard matches the intended invariant; unsupported child types are still enforced by supportSubType before recursion.
  • Tests: Added FE unit coverage and a regression suite for the reported behavior. I attempted to run ./run-fe-ut.sh org.apache.doris.nereids.types.DataTypeTest, but the runner is missing thirdparty/installed/bin/protoc, so generated source setup failed before the test ran.
  • Observability: Not applicable; this is analyzer validation logic with existing user-visible AnalysisException messages.
  • Transactions/persistence/data writes: Not applicable.
  • FE-BE variable passing: Not applicable.
  • Performance: The recursive validation is proportional to type tree size and only runs during type validation; no hot-path performance issue found.
  • User focus: No additional user-provided review focus was present.

@github-actions github-actions Bot added the approved label May 13, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@github-actions github-actions Bot left a comment

Review result: no blocking issues found. The recursive validation change is focused on the stated deep nested complex subtype bypass, and the added FE unit tests plus regression suite cover both direct and depth-3 rejected BITMAP/HLL/JSON cases and a valid nested complex case.

Critical checkpoint conclusions:

  • Goal and proof: The PR addresses recursive validation for complex nested subtypes; tests demonstrate rejected invalid nested subtypes and accepted valid nesting.
  • Scope: The GitHub PR file list is small and focused on validation and tests.
  • Concurrency/lifecycle/config/compatibility: Not applicable; this is FE analysis-time validation only and adds no new config, persistence, protocol, or runtime lifecycle changes.
  • Parallel paths: The recursive path now covers ARRAY, MAP, STRUCT, and VARIANT predefined fields through the shared catalog-type validation helper.
  • Testing: FE unit coverage and regression coverage are present. I did not run the full test suite in this review runner.
  • Observability/transactions/data writes: Not applicable.
  • Performance: No meaningful runtime impact expected; validation recursion is analysis-time only and bounded by existing max nesting depth.

User focus: no additional user-provided review focus was present.

@hello-stephen
Contributor

TPC-H: Total hot run time: 29244 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit fbdf959e05ecfc65e983724878c9bab6ab84f975, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17617	3827	3835	3827
q2	q3	10710	873	604	604
q4	4663	455	340	340
q5	7442	1323	1130	1130
q6	187	168	140	140
q7	896	951	739	739
q8	9299	1390	1259	1259
q9	5515	5358	5306	5306
q10	6259	2041	1811	1811
q11	459	262	261	261
q12	624	419	289	289
q13	18056	3309	2751	2751
q14	290	283	261	261
q15	q16	899	865	796	796
q17	974	1004	698	698
q18	6459	5714	5519	5519
q19	1276	1295	1002	1002
q20	537	393	264	264
q21	4869	2422	1914	1914
q22	451	409	333	333
Total cold run time: 97482 ms
Total hot run time: 29244 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4782	4617	4824	4617
q2	q3	4656	4755	4167	4167
q4	2106	2158	1394	1394
q5	4937	5057	5283	5057
q6	197	168	135	135
q7	2036	1787	1593	1593
q8	3358	3100	3104	3100
q9	8459	8405	8471	8405
q10	4466	4477	4263	4263
q11	589	425	384	384
q12	705	754	515	515
q13	3218	3583	2943	2943
q14	445	312	265	265
q15	q16	786	811	705	705
q17	1303	1280	1246	1246
q18	7875	7209	7088	7088
q19	1184	1142	1153	1142
q20	2207	2208	1942	1942
q21	6139	5364	4827	4827
q22	527	508	425	425
Total cold run time: 59975 ms
Total hot run time: 54213 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169539 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit fbdf959e05ecfc65e983724878c9bab6ab84f975, data reload: false

query5	4309	635	515	515
query6	335	218	203	203
query7	4232	586	305	305
query8	325	233	219	219
query9	8815	4033	4017	4017
query10	450	332	297	297
query11	5814	2418	2256	2256
query12	183	128	125	125
query13	1275	593	433	433
query14	6004	5357	5046	5046
query14_1	4355	4360	4369	4360
query15	211	208	181	181
query16	1030	455	446	446
query17	1159	770	640	640
query18	2709	490	370	370
query19	221	204	168	168
query20	142	131	129	129
query21	214	138	124	124
query22	13587	13605	13315	13315
query23	17143	16292	15993	15993
query23_1	16109	16141	16126	16126
query24	7420	1765	1359	1359
query24_1	1352	1361	1357	1357
query25	581	524	465	465
query26	1329	333	178	178
query27	2723	603	328	328
query28	4420	1961	1937	1937
query29	1023	677	542	542
query30	308	240	204	204
query31	1124	1072	938	938
query32	93	75	73	73
query33	565	368	313	313
query34	1165	1139	687	687
query35	764	795	664	664
query36	1333	1338	1188	1188
query37	151	102	92	92
query38	3214	3135	3038	3038
query39	930	925	905	905
query39_1	868	875	895	875
query40	243	151	134	134
query41	66	63	63	63
query42	108	110	109	109
query43	322	322	282	282
query44	
query45	214	200	193	193
query46	1062	1166	742	742
query47	2341	2339	2230	2230
query48	403	373	292	292
query49	639	530	425	425
query50	718	281	221	221
query51	4342	4282	4158	4158
query52	103	105	94	94
query53	258	281	201	201
query54	309	276	260	260
query55	96	89	84	84
query56	308	304	295	295
query57	1422	1417	1351	1351
query58	294	280	268	268
query59	1500	1555	1386	1386
query60	339	359	316	316
query61	165	164	160	160
query62	670	624	566	566
query63	251	197	202	197
query64	2400	830	670	670
query65	
query66	1699	518	407	407
query67	29922	29255	29823	29255
query68	
query69	466	342	308	308
query70	963	1024	1008	1008
query71	306	277	272	272
query72	2881	2727	2365	2365
query73	835	748	408	408
query74	5071	4918	4716	4716
query75	2755	2643	2381	2381
query76	2312	1130	765	765
query77	428	431	347	347
query78	12959	12957	12290	12290
query79	1513	990	743	743
query80	1002	589	500	500
query81	495	276	234	234
query82	1313	157	120	120
query83	328	280	249	249
query84	264	142	116	116
query85	1026	514	430	430
query86	436	318	342	318
query87	3451	3325	3213	3213
query88	3550	2679	2670	2670
query89	451	380	340	340
query90	1771	181	174	174
query91	179	165	145	145
query92	77	78	71	71
query93	1019	940	552	552
query94	597	348	295	295
query95	671	462	346	346
query96	1022	808	352	352
query97	2714	2691	2565	2565
query98	234	229	240	229
query99	1106	1100	997	997
Total cold run time: 253268 ms
Total hot run time: 169539 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 57.14% (4/7) 🎉
Increment coverage report
Complete coverage report

@morrySnow morrySnow merged commit c02bd65 into apache:master May 13, 2026
32 of 33 checks passed
@morrySnow morrySnow deleted the doris-25584-complex-type-deep-subtype-validation branch May 13, 2026 10:40
@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 57.14% (4/7) 🎉
Increment coverage report
Complete coverage report

github-actions Bot pushed a commit that referenced this pull request May 13, 2026
)

github-actions Bot pushed a commit that referenced this pull request May 13, 2026
)

yiguolei pushed a commit that referenced this pull request May 15, 2026
… bypass #63208 (#63222)

Cherry-picked from #63208

Co-authored-by: morrySnow <zhangwenxin@selectdb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
morningman pushed a commit that referenced this pull request May 16, 2026
… bypass #63208 (#63223)

Cherry-picked from #63208

Co-authored-by: morrySnow <zhangwenxin@selectdb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Labels

approved, dev/4.0.6-merged, dev/4.1.1-merged, reviewed


5 participants