[feat](Variant) Support NestedGroup public config#64680
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
There was a problem hiding this comment.
Pull request overview
Note
Copilot couldn't run its full agentic review because no GitHub Actions runner was available. Make sure your repository has a runner available to run Copilot's review, or add a copilot-setup-steps.yml file specifying one with the runs-on attribute. See the docs for more details.
Adds public-facing support for Variant NestedGroup configuration, including SQL parsing/serialization updates and build-time module toggles for BE.
Changes:
- Allow
variant_enable_nested_groupproperty in VARIANT type definitions and serialize it back viatoSql(). - Add/adjust parser and type unit tests to validate NestedGroup property handling and conflicts.
- Introduce BE build flags and default provider behavior/tests for NestedGroup module enablement.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| regression-test/suites/variant_p0/test_variant_search_subcolumn.groovy | Adds context comment documenting the Variant SEARCH subcolumn binding requirement. |
| fe/fe-type/src/main/java/org/apache/doris/catalog/VariantType.java | Serializes variant_enable_nested_group into VARIANT properties when enabled. |
| fe/fe-core/src/test/java/org/apache/doris/nereids/parser/NereidsParserTest.java | Updates tests to accept NestedGroup property and assert conflict behavior with doc mode. |
| fe/fe-core/src/test/java/org/apache/doris/catalog/TypeTest.java | Updates VariantType.toSql() expectations to include NestedGroup property. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/parser/LogicalPlanBuilder.java | Changes NestedGroup handling from “rejected” to “force-disable other variant options”. |
| build.sh | Adds nested_group feature flag visibility in build feature summary. |
| be/test/storage/segment/nested_group_provider_test.cpp | Strengthens default write-provider negative-path coverage and expected errors/messages. |
| be/test/CMakeLists.txt | Adds conditional inclusion of NestedGroup UT sources when module is enabled. |
| be/src/storage/segment/variant/nested_group_provider.cpp | Makes default write provider explicitly return NotSupported instead of OK. |
| be/src/storage/CMakeLists.txt | Swaps in NestedGroup module sources and removes default provider when enabled. |
| be/src/common/config.cpp | Adds variant_nested_group_max_depth and changes default conflict behavior flag. |
| be/CMakeLists.txt | Introduces ENABLE_NESTED_GROUP option and required module dir variable. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (enableNestedGroup) { | ||
| throw new NotSupportedException( | ||
| "variant_enable_nested_group is not supported now"); | ||
| enableVariantDocMode = false; | ||
| variantMaxSubcolumnsCount = 0; | ||
| enableTypedPathsToSparse = false; | ||
| variantMaxSparseColumnStatisticsSize = 0; | ||
| variantSparseHashShardCount = 0; | ||
| } |
| if (ENABLE_NESTED_GROUP) | ||
| file(GLOB_RECURSE NESTED_GROUP_UT_FILES CONFIGURE_DEPENDS | ||
| "${CMAKE_CURRENT_SOURCE_DIR}/${NESTED_GROUP_MODULE_DIR}/*.cpp") | ||
| list(APPEND UT_FILES ${NESTED_GROUP_UT_FILES}) | ||
| endif() |
| // Maximum depth of nested arrays to track with NestedGroup | ||
| // Reserved for future use when NestedGroup expansion moves to storage layer | ||
| // Deeper arrays will be stored as JSONB | ||
| DEFINE_mInt32(variant_nested_group_max_depth, "10"); | ||
| DEFINE_mBool(variant_nested_group_discard_scalar_on_conflict, "true"); |
d8b024b to
ad26bcd
Compare
|
run buildall |
TPC-H: Total hot run time: 29211 ms |
TPC-DS: Total hot run time: 173181 ms |
ClickBench: Total hot run time: 25.22 s |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
run buildall |
ad26bcd to
78c41df
Compare
TPC-H: Total hot run time: 29467 ms |
TPC-DS: Total hot run time: 171864 ms |
ClickBench: Total hot run time: 25.19 s |
FE Regression Coverage ReportIncrement line coverage |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
|
skip check_coverage |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
skip check_coverage |
|
/review |
|
skip check_coverage |
FE Regression Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
|
/review |
There was a problem hiding this comment.
Review complete. I found three issues that should be fixed before merge.
Critical checkpoint conclusions:
- Goal and tests: the PR makes NestedGroup configuration public, but the current tests do not prove DDL round-trip correctness or the default create-and-write path.
- Scope: the code is small, but FE acceptance and BE provider availability are not aligned.
- Concurrency and lifecycle: no new concurrency, locking, transaction, or lifecycle hazard was found in the changed paths.
- Configuration: the new BE config is missing the normal public declaration; the existing review thread already covers its missing validator.
- Compatibility: nested-group VARIANT SQL serialization can produce DDL that the parser rejects, and default BE builds can accept table metadata they cannot write.
- Parallel paths: FE parser/catalog/thrift/PB propagation and BE default writer paths were checked; the default provider rejects writes.
- Test coverage: parser/provider tests were added, but they skip or miss the failing round-trip and create-vs-insert boundaries.
- User focus: no additional user-provided review focus was supplied.
Subagent conclusions: OR-1 became inline comment M-1; TSC-2 and TSC-3 became inline comments M-2 and M-3; TSC-1 was dismissed as duplicate of existing parser conflict thread discussion_r3450648268. Convergence round 2 ended with both subagents reporting NO_NEW_VALUABLE_FINDINGS for this final comment set.
| sb.append("\"variant_sparse_hash_shard_count\" = \"") | ||
| .append(String.valueOf(Math.max(1, variantSparseHashShardCount))).append("\""); | ||
| } | ||
| if (enableNestedGroup) { |
There was a problem hiding this comment.
When nested group is enabled the parser zeros variantMaxSparseColumnStatisticsSize above, but this catalog toSql() path still emits that value in the non-doc property set before adding variant_enable_nested_group. A table created with VARIANT<PROPERTIES("variant_enable_nested_group" = "true")> will therefore print variant_max_sparse_column_statistics_size = "0", and re-parsing that SQL fails because PropertyAnalyzer.analyzeVariantMaxSparseColumnStatisticsSize rejects explicit values below 1. Please serialize a nested-group property set that omits the disabled sparse/doc properties, or otherwise keeps all emitted values parser-valid, and add a DDL round-trip test for this type.
| if (enableNestedGroup) { | ||
| throw new NotSupportedException( | ||
| "variant_enable_nested_group is not supported now"); | ||
| enableVariantDocMode = false; |
There was a problem hiding this comment.
This now lets FE create a table whose VARIANT column has variant_enable_nested_group=true, but the default BE build still cannot write such a column. FE propagates the flag into tablet schema, and VariantColumnWriterImpl::finalize() calls the nested-group provider whenever the flag is true; the default provider's prepare/prepare_with_built_groups implementations now return NotSupported("NestedGroup write path is not available in this build"). That means a user can successfully create the table and only hit an insert/load failure later, while the new BE tests skip this path when the provider is unavailable. Please either keep DDL/session acceptance gated until the active BE write path is available, provide a safe fallback write behavior, or add an explicit negative end-to-end test that documents this create-vs-insert boundary.
| // Maximum depth of nested arrays to track with NestedGroup | ||
| // Reserved for future use when NestedGroup expansion moves to storage layer | ||
| // Deeper arrays will be stored as JSONB | ||
| DEFINE_mInt32(variant_nested_group_max_depth, "10"); |
There was a problem hiding this comment.
This new config is defined and registered, but it is not declared in config.h next to the other variant configs. DECLARE_mInt32 is what exposes config::... values to BE code outside config.cpp, so any code that tries to use config::variant_nested_group_max_depth through the normal config header will fail to compile even though the runtime config name exists. Please add the matching DECLARE_mInt32(variant_nested_group_max_depth); near the adjacent variant declarations.
cherry-pick #64679