Skip to content

feat(compact): add FieldListaggAgg aggregate function#224

Merged
lxy-9602 merged 10 commits into
alibaba:mainfrom
Zouxxyy:dev/listagg-fix
Apr 16, 2026
Merged

feat(compact): add FieldListaggAgg aggregate function#224
lxy-9602 merged 10 commits into
alibaba:mainfrom
Zouxxyy:dev/listagg-fix

Conversation

@Zouxxyy
Copy link
Copy Markdown
Contributor

@Zouxxyy Zouxxyy commented Apr 13, 2026

Summary

Add listagg aggregate function for MergeTree compaction. Concatenates
string values with a configurable delimiter, with optional distinct mode
to deduplicate tokens.

Test plan

Unit tests cover null handling, empty strings, custom delimiters,
distinct dedup, and type validation.

🤖 Generated with [Claude Code]

Copilot AI review requested due to automatic review settings April 13, 2026 12:45
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 13, 2026

CLA assistant check
All committers have signed the CLA.

@Zouxxyy Zouxxyy changed the title feat(compact): add FieldListaggAgg aggregate functions feat(compact): add FieldListaggAgg aggregate function Apr 13, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new listagg field aggregate for MergeTree compaction to concatenate string values with configurable delimiter support, including an optional distinct/dedup mode.

Changes:

  • Introduce FieldListaggAgg aggregator and wire it into FieldAggregatorFactory.
  • Add new field-level options for list-agg-delimiter and distinct via CoreOptions / Options.
  • Add unit tests covering delimiter/null/empty/distinct/type-validation scenarios.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/paimon/core/mergetree/compact/aggregate/field_listagg_agg.h Implements listagg aggregation, including distinct token deduplication logic.
src/paimon/core/mergetree/compact/aggregate/field_aggregator_factory.h Registers listagg in the aggregator factory.
src/paimon/core/core_options.h Adds CoreOptions accessors for listagg delimiter and distinct mode.
src/paimon/core/core_options.cpp Implements option parsing for delimiter and distinct mode.
include/paimon/defs.h Declares new option keys (distinct, list-agg-delimiter).
src/paimon/common/defs.cpp Defines the new option key constants.
src/paimon/core/mergetree/compact/aggregate/field_listagg_agg_test.cpp Adds unit tests for listagg behavior and validation.
src/paimon/CMakeLists.txt Adds the new test file to the test build.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/paimon/core/mergetree/compact/aggregate/field_listagg_agg.h
Comment thread src/paimon/core/mergetree/compact/aggregate/field_listagg_agg.h
Comment thread src/paimon/core/core_options.h
Comment thread src/paimon/core/mergetree/compact/aggregate/field_listagg_agg_test.cpp Outdated
Comment thread src/paimon/core/mergetree/compact/aggregate/field_listagg_agg.h
- Reject empty delimiter when distinct is enabled to prevent
  infinite loop in AggDistinctImpl (find("") always returns 0)
- Pre-reserve string capacity and use append() instead of +=
  in AggDistinctImpl to avoid repeated reallocations
- Replace assert() with gtest ADD_FAILURE() in test CreateOptions()
  to ensure failures are reported properly even under NDEBUG

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread src/paimon/core/mergetree/compact/aggregate/field_listagg_agg.h
Comment thread src/paimon/core/mergetree/compact/aggregate/field_listagg_agg_test.cpp Outdated
Comment thread include/paimon/defs.h
Comment thread src/paimon/core/core_options.cpp
Comment thread src/paimon/core/mergetree/compact/aggregate/field_listagg_agg.h Outdated
@lxy-9602
Copy link
Copy Markdown
Collaborator

Great work! Thanks for enhancing the aggregation functionality!

Comment thread src/paimon/core/mergetree/compact/aggregate/field_listagg_agg.h
Comment thread src/paimon/core/mergetree/compact/aggregate/field_listagg_agg.h Outdated
Comment thread src/paimon/core/mergetree/compact/aggregate/field_listagg_agg_test.cpp Outdated
Comment thread src/paimon/core/mergetree/compact/aggregate/field_listagg_agg.h
@lxy-9602
Copy link
Copy Markdown
Collaborator

By the way, your change helped us identify a bug in the Java implementation:
In distinct mode, it uses contains() to check for duplicates, which can cause false positives due to substring matching.
For example, agg("ab", "a") would incorrectly treat "a" as duplicate even though it's not.

We're preparing a PR today to fix this on the Java side — thanks for catching this discrepancy!

Copy link
Copy Markdown
Collaborator

@lxy-9602 lxy-9602 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@lxy-9602 lxy-9602 merged commit 1d6b843 into alibaba:main Apr 16, 2026
9 checks passed
@Zouxxyy
Copy link
Copy Markdown
Contributor Author

Zouxxyy commented Apr 16, 2026

@lxy-9602 @lszskye Thanks for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants