Introduce in-Segment Trim for GroupBy OrderBy Query #6991
# Conflicts: # pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java
@wuwenw thanks for the PR. Yet to fully go through it. Please give a day to go through it.
Codecov Report
@@ Coverage Diff @@
## master #6991 +/- ##
============================================
- Coverage 73.23% 65.42% -7.82%
Complexity 12 12
============================================
Files 1439 1454 +15
Lines 71333 72091 +758
Branches 10334 10441 +107
============================================
- Hits 52243 47162 -5081
- Misses 15578 21517 +5939
+ Partials 3512 3412 -100
This reverts commit 116e74c.
…bator-pinot into master" This reverts commit 544e9a8, reversing changes made to 7d0b896.
LGTM. Please revise the tests as discussed offline
LGTM otherwise
@@ -52,39 +53,53 @@
import org.slf4j.LoggerFactory;
(nit) remove
@@ -38,28 +44,34 @@
@SuppressWarnings("rawtypes")
public class AggregationGroupByOrderByOperator extends BaseOperator<IntermediateResultsBlock> {
  private static final String OPERATOR_NAME = "AggregationGroupByOrderByOperator";
  private static final int TRIM_OFF = -1;
Seems not used?
 * Currently tests 'max' functions, and can be easily extended to
 * test other conditions such as GroupBy without OrderBy
 */
public class GroupByInSegmentTrimTest {
This test could be flaky, fix as discussed offline
Description
One of the major bottlenecks for the current GroupBy OrderBy query on high-cardinality columns is the merge phase. Every segment contributes a large number of intermediate results to a global concurrent map for further aggregation and merging, which consumes a lot of memory and time. This PR introduces an optimization option under which each segment trims its intermediate results to a given size. The size is configurable by the user and is computed as max(5 * LIMIT N, minTrimSize), where minTrimSize defaults to 5000. Trimming has little effect on accuracy but reduces the running time on high-cardinality datasets: queries on String data with 10M cardinality ran about 5 times faster. The option is turned off by default to ensure backward compatibility.
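The per-segment trim described above can be sketched as a bounded min-heap over a segment's group-by results. This is only an illustration under stated assumptions: the class `SegmentTrimSketch`, the method `trimTopK`, and the use of a plain `Map<String, Double>` for intermediate results are hypothetical and are not taken from the PR's code, and the ordering is assumed to be by aggregate value, descending.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch, not the PR's implementation.
class SegmentTrimSketch {

  /** Keeps only the k groups with the largest aggregate value (ORDER BY value DESC). */
  public static List<Map.Entry<String, Double>> trimTopK(Map<String, Double> groups, int k) {
    if (k <= 0) {
      return new ArrayList<>();
    }
    // Min-heap on the aggregate value: the head is the weakest of the current top k.
    PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(
        Comparator.comparingDouble((Map.Entry<String, Double> e) -> e.getValue()));
    for (Map.Entry<String, Double> entry : groups.entrySet()) {
      if (heap.size() < k) {
        heap.offer(entry);
      } else if (entry.getValue() > heap.peek().getValue()) {
        // The new group beats the weakest survivor, so it replaces it.
        heap.poll();
        heap.offer(entry);
      }
    }
    // Return the survivors in descending order of aggregate value.
    List<Map.Entry<String, Double>> result = new ArrayList<>(heap);
    result.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
    return result;
  }

  public static void main(String[] args) {
    Map<String, Double> segmentResult = new HashMap<>();
    segmentResult.put("a", 10.0);
    segmentResult.put("b", 30.0);
    segmentResult.put("c", 20.0);
    System.out.println(trimTopK(segmentResult, 2)); // prints [b=30.0, c=20.0]
  }
}
```

With a heap bounded at k entries, the segment forwards at most k groups to the merge phase regardless of the column's cardinality, which is where the reduction in merge cost would come from.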
Upgrade Notes
Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
(If yes, label as backward-incompat, and complete the section below on Release Notes)
Does this PR fix a zero-downtime upgrade introduced earlier?
(If yes, label as backward-incompat, and complete the section below on Release Notes)
Does this PR otherwise need attention when creating release notes?
(If yes, label as release-notes and complete the section on Release Notes)
Release Notes
Optimized GroupBy OrderBy queries by introducing an in-segment trim option that can significantly reduce the size of intermediate results and speed up the execution.
Documentation
GroupBy OrderBy In-Segment Trim
This option applies an additional trim step inside each segment for GroupBy OrderBy queries. Based on a configurable trim_size k, each segment sorts its results and sends back only the top k. This slightly decreases accuracy but improves performance on high-cardinality data.
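To see where the accuracy loss comes from, consider a toy case: a group that ranks outside the top k in every segment is dropped before the merge, even if its combined value would have ranked first. The sketch below assumes a SUM aggregation ordered descending with LIMIT 1 and an artificially small k = 1; the class and method names (`TrimAccuracySketch`, `topK`, `merge`, `best`) are hypothetical and are not part of the PR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not the PR's implementation.
class TrimAccuracySketch {

  /** Keeps the k entries with the largest value from one segment's result. */
  public static Map<String, Long> topK(Map<String, Long> segment, int k) {
    return segment.entrySet().stream()
        .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
        .limit(k)
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }

  /** Sums per-group values across segments (the merge phase). */
  public static Map<String, Long> merge(List<Map<String, Long>> segments) {
    Map<String, Long> merged = new HashMap<>();
    for (Map<String, Long> segment : segments) {
      segment.forEach((key, value) -> merged.merge(key, value, Long::sum));
    }
    return merged;
  }

  /** Returns the group with the largest merged value (ORDER BY SUM DESC LIMIT 1). */
  public static String best(Map<String, Long> merged) {
    return Collections.max(merged.entrySet(), Map.Entry.comparingByValue()).getKey();
  }

  public static void main(String[] args) {
    Map<String, Long> segment1 = new HashMap<>();
    segment1.put("x", 10L);
    segment1.put("y", 9L);
    Map<String, Long> segment2 = new HashMap<>();
    segment2.put("z", 11L);
    segment2.put("y", 9L);

    // Without trimming, y wins with 9 + 9 = 18.
    System.out.println("untrimmed top group: " + best(merge(Arrays.asList(segment1, segment2))));
    // With k = 1, y is dropped in both segments, so z (11) wins instead.
    System.out.println("trimmed top group: "
        + best(merge(Arrays.asList(topK(segment1, 1), topK(segment2, 1)))));
  }
}
```

The example exaggerates the effect by choosing k equal to the limit. Because the trim size is at least five times the limit, a group would have to miss a much larger per-segment cutoff in order to be lost.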
Related query options:
SEGMENT TRIM ON
SEGMENT TRIM SIZE
Related server config:
enable.segment.group.trim
size.segment.group.trim
Related query keywords:
LIMIT N
SEGMENT TRIM ON and enable.segment.group.trim enable or disable this feature. It is turned off by default, in which case segments send back all results.
When it is enabled, the actual trim_size is calculated from the size parameters (SEGMENT TRIM SIZE and size.segment.group.trim) and the query keyword LIMIT N:
trim_size = max(5 * LIMIT, minTrimSize), where minTrimSize is set by the user (default value: 5000).
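The formula can be illustrated with a short calculation. This is a minimal sketch: the class `TrimSizeSketch` and the method `trimSize` are hypothetical names, and how the query option and the server config are combined into minTrimSize is not modeled here, since the description above does not specify it.

```java
// Hypothetical sketch, not the PR's implementation.
class TrimSizeSketch {
  /** Default minimum trim size stated in the documentation above. */
  public static final int DEFAULT_MIN_TRIM_SIZE = 5000;

  /** trim_size = max(5 * LIMIT, minTrimSize); long arithmetic avoids int overflow for large limits. */
  public static long trimSize(int limit, int minTrimSize) {
    return Math.max(5L * limit, (long) minTrimSize);
  }

  public static void main(String[] args) {
    // LIMIT 10: 5 * 10 = 50 is below the minimum, so the minimum applies.
    System.out.println(trimSize(10, DEFAULT_MIN_TRIM_SIZE)); // prints 5000
    // LIMIT 2000: 5 * 2000 = 10000 exceeds the minimum.
    System.out.println(trimSize(2000, DEFAULT_MIN_TRIM_SIZE)); // prints 10000
  }
}
```

For small limits the minimum dominates, so each segment still returns up to 5000 groups by default; the LIMIT term only takes over once LIMIT exceeds minTrimSize / 5.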