branch-4.1 : [Refactor](Variant) add NestedGroup path metadata support#62782
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Pull request overview
Syncs VARIANT nested-related changes into branch-4.1 while keeping variant_enable_nested_group unsupported at the FE layer, and updates tests/BE behavior to match the 4.1 expectations.
Changes:
- Adds FE tests to ensure
variant_enable_nested_groupis rejected and not serialized intoSql(). - Refines BE VARIANT read/compaction/indexing logic (including nested-group/doc-mode interactions) and extends BE test coverage.
- Removes verbose FE logging and adjusts a VARIANT property range check.
Reviewed changes
Copilot reviewed 24 out of 27 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| fe/fe-core/src/test/java/org/apache/doris/nereids/parser/NereidsParserTest.java | Adds parser test asserting nested-group VARIANT property is rejected in 4.1. |
| fe/fe-core/src/test/java/org/apache/doris/catalog/TypeTest.java | Adds test ensuring catalog VariantType.toSql() does not serialize unsupported nested-group property. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/types/VariantType.java | Updates Nereids VARIANT SQL serialization and adds nested-group getter. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/ExpressionTranslator.java | Adds fallback to derive inverted index from translated slot during search translation. |
| fe/fe-core/src/main/java/org/apache/doris/common/util/PropertyAnalyzer.java | Adjusts variant_sparse_hash_shard_count validation to allow 0. |
| fe/fe-core/src/main/java/org/apache/doris/analysis/SearchPredicate.java | Removes debug/info logging from thrift translation. |
| fe/fe-common/src/main/java/org/apache/doris/catalog/ScalarType.java | Minor whitespace-only change. |
| be/test/storage/segment/variant_column_writer_reader_test.cpp | Updates VARIANT writer/reader tests; adds explicit doc-compact writer roundtrip scenario. |
| be/test/storage/segment/nested_group_provider_test.cpp | Updates ColumnVariant::create usage to new overload. |
| be/test/storage/segment/hierarchical_data_iterator_test.cpp | Formatting-only change. |
| be/src/storage/tablet/tablet_schema.h | Adds helpers to detect materialized regular paths and safely probe path-set info. |
| be/src/storage/segment/variant/variant_streaming_compaction_writer.cpp | Avoids duplicate writers for already-materialized regular paths during streaming compaction. |
| be/src/storage/segment/variant/variant_doc_snpashot_compact_iterator.h | Formatting-only change. |
| be/src/storage/segment/variant/variant_column_writer_impl.cpp | Refactors streaming-compaction enablement check (no functional change intended). |
| be/src/storage/segment/variant/variant_column_reader.cpp | Adds iterator owner-lifetime wrapper; improves doc-mode hierarchical planning and NG index path/type handling. |
| be/src/storage/segment/segment_iterator.cpp | Uses extracted-column path (when present) for variant index field_name generation. |
| be/src/storage/segment/column_writer.cpp | Fixes/optimizes offset writing for array/map null appends; clarifies offset contract comment. |
| be/src/storage/rowset/vertical_beta_rowset_writer.h | Formatting-only change. |
| be/src/storage/index/index_writer.cpp | Uses extracted-column path (when present) for variant index field_name generation. |
| be/src/storage/compaction/compaction.cpp | Propagates input rowset readers into writer context for nested-group streaming compaction. |
| be/src/exprs/function/function_variant_element.cpp | Restores 4.1 behavior: non-doc-mode extraction writes sparse data; doc-mode writes doc_value. |
| be/src/exprs/function/cast/cast_to_variant.h | Improves casting from nullable VARIANT and nullable wrapping/unwrapping behavior. |
| be/src/exec/common/variant_util.cpp | Adds/uses nested-group compaction materialization planning and preserves existing extracted columns. |
| be/src/core/data_type/data_type_variant.h | Adds convenience ctor DataTypeVariant(int32_t) defaulting doc mode off. |
| be/src/core/data_type/data_type_variant.cpp | Implements new convenience ctor and formatting-only change. |
| be/src/core/column/column_variant.h | Adds new ctors/overloads and root-visibility helper for nested JSON. |
| be/src/core/column/column_variant.cpp | Refines NG conflict handling, restores root-only copy fast path, and improves “empty nested JSON” detection. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (enableNestedGroup) { | ||
| sb.append(","); | ||
| sb.append("\"variant_enable_nested_group\" = \"") | ||
| .append(String.valueOf(enableNestedGroup)).append("\""); | ||
| } |
There was a problem hiding this comment.
org.apache.doris.catalog.VariantType#toSql() intentionally does not serialize variant_enable_nested_group (see fe-common/src/main/java/org/apache/doris/catalog/VariantType.java:155-190, and the new TypeTest asserting this). But Nereids VariantType#toSql() now appends "variant_enable_nested_group" when enableNestedGroup is true, which can emit DDL that FE will reject in 4.1 (LogicalPlanBuilder throws NotSupportedException when nested group is enabled). Consider removing this property from Nereids toSql() as well (or gating it behind an explicit “nested group supported” flag) to keep serialization consistent and avoid generating unsupported SQL.
| if (enableNestedGroup) { | |
| sb.append(","); | |
| sb.append("\"variant_enable_nested_group\" = \"") | |
| .append(String.valueOf(enableNestedGroup)).append("\""); | |
| } |
| if (bucketNum < 1 || bucketNum > 1024) { | ||
| throw new AnalysisException("variant_sparse_hash_shard_count must between 1 and 1024 "); | ||
| if (bucketNum < 0 || bucketNum > 1024) { | ||
| throw new AnalysisException("variant_sparse_hash_shard_count must between 0 and 1024 "); |
There was a problem hiding this comment.
The updated error message has a trailing space and is grammatically incorrect ("must between"). Since this message is user-facing, consider changing it to something like "variant_sparse_hash_shard_count must be between 0 and 1024" (no trailing whitespace).
| throw new AnalysisException("variant_sparse_hash_shard_count must between 0 and 1024 "); | |
| throw new AnalysisException("variant_sparse_hash_shard_count must be between 0 and 1024"); |
|
/review |
|
OpenCode automated review failed and did not complete. Error: Review step was failure (possibly timeout or cancelled) Please inspect the workflow logs and rerun the review after the underlying issue is resolved. |
FE Regression Coverage ReportIncrement line coverage |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
89790c9 to
ec240ac
Compare
FE UT Coverage ReportIncrement line coverage |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
7d9419c to
3e0ce01
Compare
|
run buildall |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
3e0ce01 to
54fee65
Compare
Keep branch-4.1 aligned with the selected modified files from 232cf720b while preserving the unsupported NestedGroup gate. Add targeted FE tests for the 4.1 behavior. [fix](variant) align inherited subcolumn index path refactor nested group inverted path fix test
54fee65 to
b00b8a7
Compare
|
run buildall |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
|
run buildall |
cherry-pick #62848