Grouping Engine fix when a limit spec with different order by columns is applied #16534
@@ -686,13 +686,27 @@ public Sequence<ResultRow> processSubtotalsSpec(
processingConfig.intermediateComputeSizeBytes()
);

List<String> queryDimNames = baseSubtotalQuery.getDimensions().stream().map(DimensionSpec::getOutputName)
List<String> queryDimNamesInOrder = baseSubtotalQuery.getDimensions().stream().map(DimensionSpec::getOutputName)
I think I understand the reasons here, but if these are the columns by which the query's output will be ordered, can't we place this logic inside a member method such as GroupByQuery#getOrderColumnNames or something similar?
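A minimal sketch of what such a helper might compute, assuming the intended output order is "order-by columns that are dimensions first, then the remaining dimensions in declared order". The class and method names are hypothetical, and plain strings stand in for Druid's DimensionSpec and limit-spec columns; this is not the code from the PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Druid: strings stand in for
// DimensionSpec output names and limit-spec order-by columns.
final class OrderColumnNames
{
  private OrderColumnNames() {}

  // Assumption: order-by columns that are dimensions come first (in
  // order-by order), followed by the remaining dimensions as declared.
  static List<String> inOutputOrder(List<String> dimensionNames, List<String> orderByColumns)
  {
    // LinkedHashSet keeps insertion order and drops duplicates.
    Set<String> ordered = new LinkedHashSet<>();
    for (String column : orderByColumns) {
      if (dimensionNames.contains(column)) {
        ordered.add(column);
      }
    }
    ordered.addAll(dimensionNames);
    return new ArrayList<>(ordered);
  }
}
```

With dimensions (d1, d2, d3) and an order-by on d2, this sketch yields (d2, d1, d3). Keeping the logic in one method would give callers such as processSubtotalsSpec a single place to ask for the column order.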
@@ -577,6 +577,7 @@ private Ordering<ResultRow> getRowOrderingForPushDown(
final List<Boolean> needsReverseList = new ArrayList<>();
final List<ColumnType> dimensionTypes = new ArrayList<>();
final List<StringComparator> comparators = new ArrayList<>();
final List<DimensionSpec> dimensionsInOrder = new ArrayList<>();
This double bookkeeping is a pretty weird way to supply this detail, but it works...
I think it would be more beneficial to:
- identify the differences between getRowOrderingForPushDown and getRowOrdering
- merge the differing behaviour into the getRowOrderingForPushDown method
  - isn't the main difference that getRowOrdering ignores the limitSpec regardless of whether it is set?
- possibly also eat up the RowBasedGrouperHelper, which is another copy of the process of creating the comparators
I think correctness of the execution depends on knowing which columns we ordered on and in what order. Having just one source of truth could reduce the number of issues we face.
All of the above are refactors, which might delay the fix of this bug.
@@ -565,7 +565,7 @@ private boolean canDoLimitPushDown(
* limit/order spec (unlike non-push down case where the results always use the default natural ascending order),
* so when merging these partial result streams, the merge needs to use the same ordering to get correct results.
*/
private Ordering<ResultRow> getRowOrderingForPushDown(
private OrderingAndDimensions getRowOrderingAndDimensionsForPushDown(
note: I think you could keep the old method with its signature and have it delegate, like: return getOrderingAndDimensions(false).getRowOrdering()
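The delegation suggested here could look roughly like the sketch below. It is a simplified model, not the PR's code: String[] rows stand in for ResultRow, java.util.Comparator stands in for Guava's Ordering, and the sortIndex parameter and the class PushDownOrdering are hypothetical. Only the names OrderingAndDimensions, getRowOrderingForPushDown and getRowOrderingAndDimensionsForPushDown come from the diff.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified stand-in for the holder introduced in the diff.
final class OrderingAndDimensions
{
  private final Comparator<String[]> rowOrdering;
  private final List<String> dimensions;

  OrderingAndDimensions(Comparator<String[]> rowOrdering, List<String> dimensions)
  {
    this.rowOrdering = rowOrdering;
    this.dimensions = dimensions;
  }

  Comparator<String[]> getRowOrdering()
  {
    return rowOrdering;
  }

  List<String> getDimensions()
  {
    return dimensions;
  }
}

// Hypothetical class; sortIndex is an illustrative parameter.
final class PushDownOrdering
{
  private PushDownOrdering() {}

  // New method: builds the ordering and the dimension order together.
  static OrderingAndDimensions getRowOrderingAndDimensionsForPushDown(List<String> dimensions, int sortIndex)
  {
    Comparator<String[]> ordering = Comparator.comparing((String[] row) -> row[sortIndex]);
    List<String> inOrder = new ArrayList<>();
    inOrder.add(dimensions.get(sortIndex));
    for (int i = 0; i < dimensions.size(); i++) {
      if (i != sortIndex) {
        inOrder.add(dimensions.get(i));
      }
    }
    return new OrderingAndDimensions(ordering, inOrder);
  }

  // Old method keeps its signature and simply delegates.
  static Comparator<String[]> getRowOrderingForPushDown(List<String> dimensions, int sortIndex)
  {
    return getRowOrderingAndDimensionsForPushDown(dimensions, sortIndex).getRowOrdering();
  }
}
```

Existing callers that only need the ordering stay unchanged, while the comparator and the dimension order are still derived in one place.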
.getDimensions()
.stream()
.map(DimensionSpec::getOutputName)
.collect(Collectors.toList());
note: can this be hidden somewhere, e.g. in a method named getDimensionColumnNames or something similar?
Should this have the Bug fix label on it too?
Description
Fixed the grouping engine for queries with grouping sets where a limit is applied and the order-by columns differ from the query dimensions, so that the subtotal results are merged correctly.
For example, for a query like:
we need to sort by the second dimension before applying the merge function.
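To illustrate why the sort has to come first, here is a toy model of a streaming merge, assuming the merge only combines adjacent rows with equal keys. The class SubtotalMergeSketch, its Row type and the sample data are hypothetical; this is not Druid's merge function.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: each row carries the subtotal dimension's value and a count.
final class SubtotalMergeSketch
{
  static final class Row
  {
    final String key;
    final long count;

    Row(String key, long count)
    {
      this.key = key;
      this.count = count;
    }
  }

  private SubtotalMergeSketch() {}

  // Combines only adjacent rows with equal keys, which is correct
  // only if the input is already sorted by that key.
  static List<Row> mergeAdjacent(List<Row> rows)
  {
    List<Row> merged = new ArrayList<>();
    for (Row row : rows) {
      int last = merged.size() - 1;
      if (last >= 0 && merged.get(last).key.equals(row.key)) {
        merged.set(last, new Row(row.key, merged.get(last).count + row.count));
      } else {
        merged.add(row);
      }
    }
    return merged;
  }

  // Sorts by the subtotal dimension first, then merges.
  static List<Row> sortThenMerge(List<Row> rows)
  {
    List<Row> sorted = new ArrayList<>(rows);
    sorted.sort(Comparator.comparing((Row r) -> r.key));
    return mergeAdjacent(sorted);
  }
}
```

Suppose rows arrive ordered by a first dimension, so the second dimension's values come as x, y, x with counts 1, 2, 3. Merging directly leaves three rows with x split in two, whereas sorting by the second dimension first gives x with count 4 and y with count 2.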
Release note
Key changed/added classes in this PR
GroupingEngine
This PR has: