-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Contributing Moving-Average Query to open source. #6430
Conversation
@nishantmonu51 Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The build fails due to useDefaultValueForNull=false (A flag introduced in 0.13.0).
I will update the tests accordingly.
…nfiguration parameter.
@nishantmonu51 the build passes successfully now. The extension doesn't support useDefaultValueForNull=false yet. In the meantime I had to explicitly turn off the flag in unit tests. (See issue #6472). |
@niketh would you like to review? |
Very useful feature. Right now, moving averages are doable in SQL via nested |
.../main/java/org/apache/druid/query/movingaverage/DefaultMovingAverageQueryMetricsFactory.java
Show resolved
Hide resolved
* Remove NullDimensionSelector. * Apply changes of RequestLogger. * Apply changes of TimelineServerView.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurmix thanks for the nice PR! I'm still looking at, but have a question for the general design.
Why do we need a new query type for moving average? Rather, can we generalize Averager
and PostAverager
to support any types of queries like PostAggregator
? This question is related to MovingAverageQueryRunner
. It internally generates a query and computes moving average on top of it. I don't think this is necessary, but we can generalize this logic similar to PostAggregator
.
...ry/src/test/java/org/apache/druid/query/movingaverage/averagers/BaseAveragerFactoryTest.java
Outdated
Show resolved
Hide resolved
docs/content/development/extensions-contrib/moving-average-query.md
Outdated
Show resolved
Hide resolved
docs/content/development/extensions-contrib/moving-average-query.md
Outdated
Show resolved
Hide resolved
...ge-query/src/main/java/org/apache/druid/query/movingaverage/MovingAverageQueryToolChest.java
Outdated
Show resolved
Hide resolved
...ge-query/src/main/java/org/apache/druid/query/movingaverage/MovingAverageQueryToolChest.java
Outdated
Show resolved
Hide resolved
...ge-query/src/main/java/org/apache/druid/query/movingaverage/MovingAverageQueryToolChest.java
Outdated
Show resolved
Hide resolved
...erage-query/src/main/java/org/apache/druid/query/movingaverage/MovingAverageQueryRunner.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still reviewing, but left some more comments.
...average-query/src/test/java/org/apache/druid/query/movingaverage/MovingAverageQueryTest.java
Outdated
Show resolved
Hide resolved
...ving-average-query/src/main/java/org/apache/druid/query/movingaverage/RowBucketIterable.java
Outdated
Show resolved
Hide resolved
...ving-average-query/src/main/java/org/apache/druid/query/movingaverage/RowBucketIterable.java
Show resolved
Hide resolved
...-average-query/src/test/java/org/apache/druid/query/movingaverage/RowBucketIterableTest.java
Outdated
Show resolved
Hide resolved
...ving-average-query/src/main/java/org/apache/druid/query/movingaverage/RowBucketIterable.java
Outdated
Show resolved
Hide resolved
{ | ||
|
||
private final Collection<DimensionSpec> dims; | ||
private final Map<Map<String, Object>, Collection<Averager<?>>> averagers = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a brief description of what are keys and values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
...average-query/src/main/java/org/apache/druid/query/movingaverage/PostAveragerCalculator.java
Outdated
Show resolved
Hide resolved
...average-query/src/main/java/org/apache/druid/query/movingaverage/averagers/BaseAverager.java
Outdated
Show resolved
Hide resolved
public class LongMaxAverager extends BaseAverager<Number, Long> | ||
{ | ||
|
||
private int startFrom = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment (see commit e2a5317
).
private Set<Map<String, Object>> seenKeys = new HashSet<>(); | ||
private Row saveNext; | ||
private Map<String, AggregatorFactory> aggMap; | ||
private Map<String, Object> fakeEvents; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is fakeEvents
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment (see commit 4b425b2
).
# Conflicts: # pom.xml
I completely agree the work could be incorporated within the general Query. The reason it was created as a separate query type is for the sake of easier implementation (due to the separation of concerns). Calling the underlying query through an internal API is easier than making those changes directly to |
@jihoonson BTW, I have a couple of my own design concerns and I welcome your thoughts on the matter:
|
@yurmix I see. It makes sense to me.
It looks complex, but I think it would be easier to understand if you can add more comments. But, simpler implementation would be great if possible. Would you tell me a bit of details of what kind of refactoring you think?
It sounds good, but I'm wondering we can use our |
…ollowing once DI conflicts with datasketches are resolved.
* Remove unused variables/prarameters.
@jihoonson, thanks so much for your effort on this thorough review and sorry it took me that long to complete my response. I have addressed all comments, feel free to review and raise other concerns. |
Ping to keep open |
@drcrallen thanks for the ping. @yurmix I'll take another look once 0.14.0 release is finalized. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurmix thank you for the fix. Left some more comments.
...ge-query/src/main/java/org/apache/druid/query/movingaverage/MovingAverageQueryToolChest.java
Outdated
Show resolved
Hide resolved
...ving-average-query/src/main/java/org/apache/druid/query/movingaverage/RowBucketIterable.java
Outdated
Show resolved
Hide resolved
...ving-average-query/src/main/java/org/apache/druid/query/movingaverage/RowBucketIterable.java
Outdated
Show resolved
Hide resolved
...-average-query/src/main/java/org/apache/druid/query/movingaverage/MovingAverageIterable.java
Outdated
Show resolved
Hide resolved
private final List<AveragerFactory<?, ?>> factories; | ||
private final Map<String, PostAggregator> postAggMap; | ||
private final Map<String, AggregatorFactory> aggMap; | ||
private final Map<String, Object> fakeEvents; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe emptyEvents
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed.
* | ||
* <p>Usually, the contents of key will be contained by the row R being passed in, but in the case of a | ||
* dummy row, its possible that the dimensions will be known but the row empty. Hence, the values are | ||
* passed as two separate arguments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure why it should accept key
and r
separately instead of accepting MapBasedRow
. Would you elaborate more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The row's key
(only dimensions, no metrics) is required but is not provided by Row
's interface.
We use MovingAverageHelper.getDimKeyFromRow(dims, r)
to extract the key.
I was able to remove the redundant parameter by calling getDimKeyFromRow
inside computeMovingAverage()
as well.
final I[] buckets; | ||
private int index; | ||
|
||
/* startFrom is needed because `buckets` field is a fixed array, not a list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think we don't use multi line comments widely. How about changing it to javadoc? I think it's more clear anyway.
@@ -0,0 +1,2 @@ | |||
druid.processing.buffer.sizeBytes=655360 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you please let me know what tests failed without this file? I think you should be able to set these properties in each test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the property (druid.processing.buffer.sizeBytes=655360
) to MovingAverageQueryTest
and removed runtime.properties.
// standard case. return regular row | ||
yielder = yielder.next(currentBucket); | ||
expectedBucket = expectedBucket.plus(period); | ||
return currentBucket; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, let me add some more details. yielder
is updated in this if
clause which should be used to iterate all values in it. However, in hasNext()
, it only checks expectedBucket
is less than endTime
. Since expectedBucket
is also updated in this if
clause, hasNext()
can return false even though yielder
is not used yet.
@yurmix sorry, I've just checked your last update. Most of my last comments were addressed except these two: #6430 (comment), #6430 (comment). Would you please check them? |
Thanks for reminding me, I have addressed one of them and currently reviewing the other. |
The reason we need I think there could be a more elegant way for traversing over two levels (intervals/periods and rows) when one does not directly contain the other, but I won't be able to refactor it for this release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there could be a more elegant way for traversing over two levels (intervals/periods and rows) when one does not directly contain the other, but I won't be able to refactor it for this release.
It sounds good to me. I don't think this refactoring is strictly required for this PR.
The latest change looks good to me. The CI failure looks a flaky Travis timeout and I just restarted it. +1 after CI.
@yurmix thank you for all your hard work!
@jihoonson thanks for your dedication and for the insightful review! |
Implements #6320
Includes documentation and comments but I'd be glad to add more if needed.