[fix](fe) Fix deep nested complex type subtype validation bypass#63208
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
### What problem does this PR solve? Issue Number: close #DORIS-25584 Problem Summary: In `DataType.validateCatalogDataType()`, all three complex-type branches (ARRAY, MAP, STRUCT) only called `validateNestedType` when the direct child type was `instanceof ScalarType`. When the child was itself a complex type (e.g. a MAP inside an ARRAY), the guard failed and the entire subtree was skipped — so BITMAP/HLL/JSONB/VARIANT used as elements at depth 3+ were silently accepted. Example: `ARRAY<MAP<BITMAP, INT>>` - ARRAY branch checks itemType (MAP) → not ScalarType → SKIP - Inner MAP is never validated → BITMAP as map key accepted silently Fix: remove the `instanceof ScalarType` guard; call `validateNestedType(parent, child)` for all child types regardless of whether they are scalar or complex. Also move the STRUCT duplicate field name check outside the former ScalarType guard so it applies to all field types. ### Release note BITMAP, HLL, JSONB, and VARIANT are now correctly rejected as ARRAY/MAP/STRUCT sub-elements even when nested 3 or more levels deep. ### Check List (For Author) - Test: Regression test / Unit Test - Behavior changed: Yes — depth-3+ nesting with BITMAP/HLL/JSONB/VARIANT now throws AnalysisException instead of being silently accepted - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ec9e37c to
fbdf959
Compare
|
/review |
|
run buildall |
There was a problem hiding this comment.
Code review completed. I did not find any blocking correctness issues in this PR.
Critical checkpoint conclusions:
- Goal and proof: The change addresses the validation bypass by recursively validating complex child types, and the added unit/regression tests cover direct and depth-3 invalid subtypes plus one valid nested case.
- Scope: The implementation is small and focused on the existing validation path.
- Concurrency and lifecycle: No concurrency, locking, static initialization, or special lifecycle changes are introduced.
- Configuration and compatibility: No new configuration, storage format, protocol, or rolling-upgrade compatibility concern is introduced.
- Parallel paths: The modified Nereids validation path now recurses through ARRAY/MAP/STRUCT children consistently; I did not identify a missed equivalent path in this diff.
- Conditional checks: The removed ScalarType guard matches the intended invariant; unsupported child types are still enforced by supportSubType before recursion.
- Tests: Added FE unit coverage and a regression suite for the reported behavior. I attempted to run
./run-fe-ut.sh org.apache.doris.nereids.types.DataTypeTest, but the runner is missingthirdparty/installed/bin/protoc, so generated source setup failed before the test ran. - Observability: Not applicable; this is analyzer validation logic with existing user-visible AnalysisException messages.
- Transactions/persistence/data writes: Not applicable.
- FE-BE variable passing: Not applicable.
- Performance: The recursive validation is proportional to type tree size and only runs during type validation; no hot-path performance issue found.
- User focus: No additional user-provided review focus was present.
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Review result: no blocking issues found. The recursive validation change is focused on the stated deep nested complex subtype bypass, and the added FE unit tests plus regression suite cover both direct and depth-3 rejected BITMAP/HLL/JSON cases and a valid nested complex case.
Critical checkpoint conclusions:
- Goal and proof: The PR addresses recursive validation for complex nested subtypes; tests demonstrate rejected invalid nested subtypes and accepted valid nesting.
- Scope: The GitHub PR file list is small and focused on validation and tests.
- Concurrency/lifecycle/config/compatibility: Not applicable; this is FE analysis-time validation only and adds no new config, persistence, protocol, or runtime lifecycle changes.
- Parallel paths: The recursive path now covers ARRAY, MAP, STRUCT, and VARIANT predefined fields through the shared catalog-type validation helper.
- Testing: FE unit coverage and regression coverage are present. I did not run the full test suite in this review runner.
- Observability/transactions/data writes: Not applicable.
- Performance: No meaningful runtime impact expected; validation recursion is analysis-time only and bounded by existing max nesting depth.
User focus: no additional user-provided review focus was present.
TPC-H: Total hot run time: 29244 ms |
TPC-DS: Total hot run time: 169539 ms |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
) ## Summary Fixes: BITMAP/HLL/JSONB/VARIANT used as ARRAY/MAP/STRUCT sub-elements at depth 3+ were silently accepted instead of being rejected. ## Root cause In `DataType.validateCatalogDataType()`, all three complex-type branches (ARRAY, MAP, STRUCT) only called `validateNestedType` when the direct child type was `instanceof ScalarType`. When the child was itself complex (e.g. a MAP inside an ARRAY), the guard failed and the entire subtree was skipped. ``` ARRAY<MAP<BITMAP, INT>> └─ ARRAY branch: itemType = MAP → not ScalarType → SKIP ← bug └─ MAP<BITMAP, INT> never validated → BITMAP silently accepted ``` Depth-2 nesting was correctly rejected (`ARRAY<BITMAP>` → error), but depth-3+ bypassed the check. ## Fix Remove the `instanceof ScalarType` guard; call `validateNestedType(parent, child)` for **all** child types. Also move the STRUCT duplicate-field-name check outside the former ScalarType guard. ## Tests - 7 new unit tests in `DataTypeTest` covering all four Jira repro cases + regression guards for existing depth-2 rejection + valid deep nesting. - 1 new regression test suite `test_complex_disallowed_subtypes` with 6 `exception` cases and 1 valid-nesting acceptance case. ### What problem does this PR solve? Problem Summary: `validateCatalogDataType` skipped validation of complex child types due to an `instanceof ScalarType` guard, allowing BITMAP/HLL/JSONB/VARIANT at depth 3+ to bypass the sub-type allowlist check. ### Release note BITMAP, HLL, JSONB, and VARIANT are now correctly rejected as ARRAY/MAP/STRUCT sub-elements even when nested 3 or more levels deep. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
) ## Summary Fixes: BITMAP/HLL/JSONB/VARIANT used as ARRAY/MAP/STRUCT sub-elements at depth 3+ were silently accepted instead of being rejected. ## Root cause In `DataType.validateCatalogDataType()`, all three complex-type branches (ARRAY, MAP, STRUCT) only called `validateNestedType` when the direct child type was `instanceof ScalarType`. When the child was itself complex (e.g. a MAP inside an ARRAY), the guard failed and the entire subtree was skipped. ``` ARRAY<MAP<BITMAP, INT>> └─ ARRAY branch: itemType = MAP → not ScalarType → SKIP ← bug └─ MAP<BITMAP, INT> never validated → BITMAP silently accepted ``` Depth-2 nesting was correctly rejected (`ARRAY<BITMAP>` → error), but depth-3+ bypassed the check. ## Fix Remove the `instanceof ScalarType` guard; call `validateNestedType(parent, child)` for **all** child types. Also move the STRUCT duplicate-field-name check outside the former ScalarType guard. ## Tests - 7 new unit tests in `DataTypeTest` covering all four Jira repro cases + regression guards for existing depth-2 rejection + valid deep nesting. - 1 new regression test suite `test_complex_disallowed_subtypes` with 6 `exception` cases and 1 valid-nesting acceptance case. ### What problem does this PR solve? Problem Summary: `validateCatalogDataType` skipped validation of complex child types due to an `instanceof ScalarType` guard, allowing BITMAP/HLL/JSONB/VARIANT at depth 3+ to bypass the sub-type allowlist check. ### Release note BITMAP, HLL, JSONB, and VARIANT are now correctly rejected as ARRAY/MAP/STRUCT sub-elements even when nested 3 or more levels deep. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Fixes: BITMAP/HLL/JSONB/VARIANT used as ARRAY/MAP/STRUCT sub-elements at depth 3+ were silently accepted instead of being rejected.
Root cause
In
DataType.validateCatalogDataType(), all three complex-type branches (ARRAY, MAP, STRUCT) only calledvalidateNestedTypewhen the direct child type wasinstanceof ScalarType. When the child was itself complex (e.g. a MAP inside an ARRAY), the guard failed and the entire subtree was skipped.Depth-2 nesting was correctly rejected (
ARRAY<BITMAP>→ error), but depth-3+ bypassed the check.Fix
Remove the
instanceof ScalarTypeguard; callvalidateNestedType(parent, child)for all child types. Also move the STRUCT duplicate-field-name check outside the former ScalarType guard.Tests
DataTypeTestcovering all four Jira repro cases + regression guards for existing depth-2 rejection + valid deep nesting.test_complex_disallowed_subtypeswith 6exceptioncases and 1 valid-nesting acceptance case.What problem does this PR solve?
Problem Summary:
validateCatalogDataTypeskipped validation of complex child types due to aninstanceof ScalarTypeguard, allowing BITMAP/HLL/JSONB/VARIANT at depth 3+ to bypass the sub-type allowlist check.Release note
BITMAP, HLL, JSONB, and VARIANT are now correctly rejected as ARRAY/MAP/STRUCT sub-elements even when nested 3 or more levels deep.
Check List (For Author)