test(workflow-operator): add unit test coverage for AggregationOperation#4908
Conversation
Pin the contract of `AggregationOperation`'s public API: result-attribute typing per aggregation kind, error paths for unsupported types and null aggFunction, the per-kind `DistributedAggregation` semantics (init, iterate, merge, finalAgg) for SUM/COUNT/AVERAGE/CONCAT including the null-handling differences between count-all and count-non-null and the "average of zero values is null" rule, and the `getFinal` rewrite that turns COUNT into SUM over the partial column while leaving other aggregations' aggFunction intact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4908 +/- ##
============================================
+ Coverage 42.10% 46.90% +4.79%
- Complexity 2174 2179 +5
============================================
Files 980 844 -136
Lines 36298 27319 -8979
Branches 3783 2536 -1247
============================================
- Hits 15283 12813 -2470
+ Misses 20096 13726 -6370
+ Partials 919 780 -139
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Adds focused unit test coverage for AggregationOperation (workflow-operator) to pin key parts of its public contract and distributed aggregation semantics, addressing issue #4907.
Changes:
- Introduces
AggregationOperationSpecwith tests for result-column typing by aggregation kind (SUM/COUNT/AVERAGE/MIN/MAX/CONCAT) and null-aggFunction error handling. - Adds tests for
getAggFuncvalidation and semantics for SUM/COUNT/AVERAGE/CONCAT, including init/iterate/merge/finalAgg behaviors. - Adds tests for
getFinalrewrite behavior (COUNT → SUM over partial/result column).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ping coverage Address review feedback on apache#4908: - (Yicong-Huang / Copilot) The original spec broadly duplicated AggregateOpSpec's getAggregationAttribute, per-kind iterate, and getFinal coverage. Removed the duplicate cases and added a header comment documenting what AggregateOpSpec already covers so future contributors don't re-add them. - (Copilot) Misleading CONCAT description that said "skip null" while asserting null contributes a delimiter — removed (the iterate behavior is already covered in AggregateOpSpec). - (Yicong-Huang) Add partial+final pipeline coverage. New end-to-end tests run a real worker COUNT / SUM, emit partial output tuples, then re-aggregate them via getFinal to confirm the two-stage result matches a single-pass aggregation. What this spec uniquely covers (vs AggregateOpSpec): - getAggFunc validation errors (non-numeric on SUM/MIN/MAX, null aggFunction). - CONCAT merge (AggregateOpSpec exercises CONCAT iterate, not merge directly). - Two-stage worker→final pipeline for COUNT and SUM. - AveragePartialObj field/equality contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…tes and test names Address Copilot feedback on apache#4908: - Coverage-notes header: drop the inaccurate "merge" claim about AggregateOpSpec (it exercises iterate/finalAgg but never calls merge directly). - Coverage-notes header: replace "AveragePartialObj value-class exposure" — AveragePartialObj is a plain case class, not a value class (does not extend AnyVal). Reword to "case-class field exposure / value equality". - Test names: rename "non-numeric types on SUM/MIN/MAX" to "unsupported attribute types on SUM/MIN/MAX". SUM/MIN/MAX accept TIMESTAMP alongside the numerics, so "non-numeric" was inaccurate. Added inline comments listing the actual accepted types. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
What changes were proposed in this PR?
Adds
AggregationOperationSpeccovering the public surface ofAggregationOperation(common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/aggregate/AggregationOperation.scala), which previously had no test coverage.The new spec pins:
getAggregationAttributeper kind: SUM/MIN/MAX preserve the input type, COUNT → INTEGER, AVERAGE → DOUBLE, CONCAT → STRING; null aggFunction throwsRuntimeException.getAggFuncvalidation: SUM/MIN/MAX throwUnsupportedOperationExceptionfor non-numeric attribute types; null aggFunction throws.DistributedAggregationsemantics (init / iterate / merge / finalAgg) for SUM, COUNT, AVERAGE, CONCAT — including the null-handling differences between count-all (attribute = null) and count-non-null, and the rule that AVERAGE of zero values projects tonull.getFinal: COUNT is rewritten to SUM over the partial column; non-COUNT aggregations keep their kind but rebind bothattributeandresultAttributeto the partial column.AveragePartialObj: case-class field exposure and value equality.No production code changed; this is test-only.
Any related issues, documentation, discussions?
Closes #4907
How was this PR tested?
Added 19 new unit tests in
AggregationOperationSpec. Verified locally:Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code