
branch-4.1 : [Refactor](Variant) add NestedGroup path metadata support #62782

Merged
yiguolei merged 2 commits into apache:branch-4.1 from eldenmoon:branch-4.1-sync-nested
May 7, 2026

Conversation

@eldenmoon
Member

@eldenmoon eldenmoon commented Apr 24, 2026

cherry-pick #62848

@eldenmoon eldenmoon requested a review from yiguolei as a code owner April 24, 2026 06:17
Copilot AI review requested due to automatic review settings April 24, 2026 06:17
@Thearas
Contributor

Thearas commented Apr 24, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@eldenmoon
Member Author

run buildall

Contributor

Copilot AI left a comment


Pull request overview

Syncs VARIANT nested-related changes into branch-4.1 while keeping variant_enable_nested_group unsupported at the FE layer, and updates tests/BE behavior to match the 4.1 expectations.

Changes:

  • Adds FE tests to ensure variant_enable_nested_group is rejected and not serialized in toSql().
  • Refines BE VARIANT read/compaction/indexing logic (including nested-group/doc-mode interactions) and extends BE test coverage.
  • Removes verbose FE logging and adjusts a VARIANT property range check.

Reviewed changes

Copilot reviewed 24 out of 27 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| fe/fe-core/src/test/java/org/apache/doris/nereids/parser/NereidsParserTest.java | Adds parser test asserting nested-group VARIANT property is rejected in 4.1. |
| fe/fe-core/src/test/java/org/apache/doris/catalog/TypeTest.java | Adds test ensuring catalog VariantType.toSql() does not serialize unsupported nested-group property. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/types/VariantType.java | Updates Nereids VARIANT SQL serialization and adds nested-group getter. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/ExpressionTranslator.java | Adds fallback to derive inverted index from translated slot during search translation. |
| fe/fe-core/src/main/java/org/apache/doris/common/util/PropertyAnalyzer.java | Adjusts variant_sparse_hash_shard_count validation to allow 0. |
| fe/fe-core/src/main/java/org/apache/doris/analysis/SearchPredicate.java | Removes debug/info logging from thrift translation. |
| fe/fe-common/src/main/java/org/apache/doris/catalog/ScalarType.java | Minor whitespace-only change. |
| be/test/storage/segment/variant_column_writer_reader_test.cpp | Updates VARIANT writer/reader tests; adds explicit doc-compact writer roundtrip scenario. |
| be/test/storage/segment/nested_group_provider_test.cpp | Updates ColumnVariant::create usage to new overload. |
| be/test/storage/segment/hierarchical_data_iterator_test.cpp | Formatting-only change. |
| be/src/storage/tablet/tablet_schema.h | Adds helpers to detect materialized regular paths and safely probe path-set info. |
| be/src/storage/segment/variant/variant_streaming_compaction_writer.cpp | Avoids duplicate writers for already-materialized regular paths during streaming compaction. |
| be/src/storage/segment/variant/variant_doc_snpashot_compact_iterator.h | Formatting-only change. |
| be/src/storage/segment/variant/variant_column_writer_impl.cpp | Refactors streaming-compaction enablement check (no functional change intended). |
| be/src/storage/segment/variant/variant_column_reader.cpp | Adds iterator owner-lifetime wrapper; improves doc-mode hierarchical planning and NG index path/type handling. |
| be/src/storage/segment/segment_iterator.cpp | Uses extracted-column path (when present) for variant index field_name generation. |
| be/src/storage/segment/column_writer.cpp | Fixes/optimizes offset writing for array/map null appends; clarifies offset contract comment. |
| be/src/storage/rowset/vertical_beta_rowset_writer.h | Formatting-only change. |
| be/src/storage/index/index_writer.cpp | Uses extracted-column path (when present) for variant index field_name generation. |
| be/src/storage/compaction/compaction.cpp | Propagates input rowset readers into writer context for nested-group streaming compaction. |
| be/src/exprs/function/function_variant_element.cpp | Restores 4.1 behavior: non-doc-mode extraction writes sparse data; doc-mode writes doc_value. |
| be/src/exprs/function/cast/cast_to_variant.h | Improves casting from nullable VARIANT and nullable wrapping/unwrapping behavior. |
| be/src/exec/common/variant_util.cpp | Adds/uses nested-group compaction materialization planning and preserves existing extracted columns. |
| be/src/core/data_type/data_type_variant.h | Adds convenience ctor DataTypeVariant(int32_t) defaulting doc mode off. |
| be/src/core/data_type/data_type_variant.cpp | Implements new convenience ctor and formatting-only change. |
| be/src/core/column/column_variant.h | Adds new ctors/overloads and root-visibility helper for nested JSON. |
| be/src/core/column/column_variant.cpp | Refines NG conflict handling, restores root-only copy fast path, and improves “empty nested JSON” detection. |


Comment on lines +187 to +191
if (enableNestedGroup) {
    sb.append(",");
    sb.append("\"variant_enable_nested_group\" = \"")
            .append(String.valueOf(enableNestedGroup)).append("\"");
}

Copilot AI Apr 24, 2026


org.apache.doris.catalog.VariantType#toSql() intentionally does not serialize variant_enable_nested_group (see fe-common/src/main/java/org/apache/doris/catalog/VariantType.java:155-190, and the new TypeTest asserting this). But Nereids VariantType#toSql() now appends "variant_enable_nested_group" when enableNestedGroup is true, which can emit DDL that FE will reject in 4.1 (LogicalPlanBuilder throws NotSupportedException when nested group is enabled). Consider removing this property from Nereids toSql() as well (or gating it behind an explicit “nested group supported” flag) to keep serialization consistent and avoid generating unsupported SQL.

Suggested change
- if (enableNestedGroup) {
-     sb.append(",");
-     sb.append("\"variant_enable_nested_group\" = \"")
-             .append(String.valueOf(enableNestedGroup)).append("\"");
- }
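
One way to read the gating alternative mentioned in the comment above is sketched below. This is a minimal, standalone illustration and not the Doris implementation: the class `VariantPropsSketch`, the method `propertiesToSql`, and the `nestedGroupSupported` flag are all hypothetical names introduced for this example, and the property-list format simply mirrors the quoted `"key" = "value"` style of the snippet under review.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a VARIANT property serializer; not a Doris class.
final class VariantPropsSketch {

    // Builds a comma-separated "key" = "value" list.
    // Assumption: nestedGroupSupported is an explicit branch-level flag
    // (false on a branch where the FE rejects nested group).
    static String propertiesToSql(Map<String, String> baseProps,
                                  boolean enableNestedGroup,
                                  boolean nestedGroupSupported) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : baseProps.entrySet()) {
            if (sb.length() > 0) {
                sb.append(",");
            }
            sb.append("\"").append(e.getKey()).append("\" = \"")
                    .append(e.getValue()).append("\"");
        }
        // Emit the nested-group property only when the branch can parse it back.
        if (enableNestedGroup && nestedGroupSupported) {
            if (sb.length() > 0) {
                sb.append(",");
            }
            sb.append("\"variant_enable_nested_group\" = \"true\"");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("variant_sparse_hash_shard_count", "0");
        // Gate closed: the unsupported property is never written out.
        System.out.println(propertiesToSql(props, true, false));
        // Gate open: the property is appended after the existing ones.
        System.out.println(propertiesToSql(props, true, true));
    }
}
```

With the gate closed, the serialized text cannot contain a property that the parser on the same branch would reject, which is the consistency the comment asks for.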

- if (bucketNum < 1 || bucketNum > 1024) {
-     throw new AnalysisException("variant_sparse_hash_shard_count must between 1 and 1024 ");
+ if (bucketNum < 0 || bucketNum > 1024) {
+     throw new AnalysisException("variant_sparse_hash_shard_count must between 0 and 1024 ");

Copilot AI Apr 24, 2026


The updated error message has a trailing space and is grammatically incorrect ("must between"). Since this message is user-facing, consider changing it to something like "variant_sparse_hash_shard_count must be between 0 and 1024" (no trailing whitespace).

Suggested change
- throw new AnalysisException("variant_sparse_hash_shard_count must between 0 and 1024 ");
+ throw new AnalysisException("variant_sparse_hash_shard_count must be between 0 and 1024");
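
The widened range check together with the corrected message can be sketched in isolation as follows. This is a rough illustration under stated assumptions: `ShardCountCheckSketch` and `parseShardCount` are hypothetical names, and `IllegalArgumentException` stands in for Doris's `AnalysisException` so that the example compiles on its own.

```java
// Hypothetical standalone version of the range check; not the PropertyAnalyzer code.
final class ShardCountCheckSketch {
    static final int MIN_SHARDS = 0;    // 0 is accepted after this change
    static final int MAX_SHARDS = 1024;

    // Assumption: IllegalArgumentException replaces AnalysisException here.
    static int parseShardCount(String raw) {
        int bucketNum;
        try {
            bucketNum = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "variant_sparse_hash_shard_count must be an integer: " + raw);
        }
        if (bucketNum < MIN_SHARDS || bucketNum > MAX_SHARDS) {
            // Message follows the suggested wording: no trailing space, "must be between".
            throw new IllegalArgumentException(
                    "variant_sparse_hash_shard_count must be between "
                            + MIN_SHARDS + " and " + MAX_SHARDS);
        }
        return bucketNum;
    }

    public static void main(String[] args) {
        System.out.println(parseShardCount("0"));
        System.out.println(parseShardCount("1024"));
        try {
            parseShardCount("1025");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Building the message from the same constants as the comparison keeps the bounds in the text from drifting away from the bounds in the check, which is how the "between 1 and 1024" wording would otherwise need a manual update.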

@eldenmoon
Member Author

/review

@github-actions
Contributor

OpenCode automated review failed and did not complete.

Error: Review step was failure (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/24875956678

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 28.57% (6/21) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 21.51% (100/465) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 54.53% (20079/36820)
Line Coverage 35.49% (181773/512203)
Region Coverage 29.72% (133348/448662)
Branch Coverage 31.24% (59368/190028)

@morningman
Contributor

run buildall

@morningman morningman force-pushed the branch-4.1-sync-nested branch from 89790c9 to ec240ac Compare April 24, 2026 20:23
@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/21) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 37.63% (175/465) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.22% (20014/37605)
Line Coverage 36.69% (188651/514147)
Region Coverage 33.05% (146661/443806)
Branch Coverage 34.12% (64170/188095)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 50.32% (234/465) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 70.87% (26099/36825)
Line Coverage 53.07% (271875/512330)
Region Coverage 46.61% (209194/448824)
Branch Coverage 49.94% (94926/190082)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 28.57% (6/21) 🎉
Increment coverage report
Complete coverage report

@eldenmoon eldenmoon force-pushed the branch-4.1-sync-nested branch 2 times, most recently from 7d9419c to 3e0ce01 Compare April 26, 2026 02:50
@eldenmoon
Member Author

run buildall

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 54.82% (256/467) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.48% (26336/36845)
Line Coverage 54.41% (279164/513071)
Region Coverage 51.60% (231335/448336)
Branch Coverage 53.03% (100171/188908)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 28.57% (6/21) 🎉
Increment coverage report
Complete coverage report

@eldenmoon eldenmoon force-pushed the branch-4.1-sync-nested branch from 3e0ce01 to 54fee65 Compare April 26, 2026 07:26
Keep branch-4.1 aligned with the selected modified files from 232cf720b while preserving the unsupported NestedGroup gate.

Add targeted FE tests for the 4.1 behavior.

[fix](variant) align inherited subcolumn index path

refactor nested group inverted path

fix test
@eldenmoon eldenmoon force-pushed the branch-4.1-sync-nested branch from 54fee65 to b00b8a7 Compare April 26, 2026 07:32
@eldenmoon
Member Author

run buildall

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 51.73% (224/433) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.37% (27034/36845)
Line Coverage 56.94% (292149/513039)
Region Coverage 54.44% (244046/448297)
Branch Coverage 56.10% (105970/188882)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 28.57% (6/21) 🎉
Increment coverage report
Complete coverage report

@eldenmoon eldenmoon changed the title from "[chore](variant) sync variant nested changes for branch 4.1" to "branch-4.1 : [Refactor](Variant) add NestedGroup path metadata support" Apr 27, 2026
@eldenmoon
Member Author

run buildall

@yiguolei yiguolei merged commit a5ddccb into apache:branch-4.1 May 7, 2026
27 of 30 checks passed
6 participants