Support descending time ordering for time series query #2014
Conversation
Can you please provide a description of what functionality you are implementing, as well as a high level overview of the approach?
final long timeEnd = Math.min(interval.getEndMillis(), gran.next(input));
while (baseOffset.withinBounds()) {
  long current = timestamps.getLongSingleValueRow(baseOffset.getOffset());
  if (descending ? current < timeEnd : current >= timeStart) {
This change is going to add a branch inside this tight loop checking for descending. It's true that branch prediction will likely do a good job with this, but we can completely eliminate this branch by doing it earlier. Let's move the Function to a variable. Then, in the conditional on lines 250:252, we can override the reference with an implementation that is specific to the descending case.
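A minimal sketch of the hoisting the reviewer is describing, using an illustrative `LongPredicate` rather than Druid's actual classes: the direction branch is resolved once before iteration, so the loop body never consults the `descending` flag.

```java
import java.util.function.LongPredicate;

public class BoundsCheckExample
{
  // Counts rows passing the direction-specific bound check. The method name,
  // parameters, and use of LongPredicate are illustrative, not Druid's code.
  public static long countInBounds(long[] timestamps, boolean descending, long timeStart, long timeEnd)
  {
    // Choose the check once, outside the loop, mirroring the original
    // expression: descending ? current < timeEnd : current >= timeStart.
    final LongPredicate withinBounds = descending
        ? current -> current < timeEnd
        : current -> current >= timeStart;

    long count = 0;
    for (long ts : timestamps) {
      // Tight loop: no descending branch here, only the pre-selected predicate.
      if (withinBounds.test(ts)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args)
  {
    long[] ts = {1, 2, 3, 4, 5};
    System.out.println(countInBounds(ts, false, 3, 6)); // ascending: ts >= 3
    System.out.println(countInBounds(ts, true, 0, 4));  // descending: ts < 4
  }
}
```

The JIT can often inline a monomorphic predicate like this, which is the same effect the reviewer's "override the reference" suggestion aims for.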
I agree, but can we do that later? There are so many issues to be addressed.
If it is a feature rather than a bug fix I'd rather not incur technical debt.
@cheddar @drcrallen Sorry for delay. Addressed comment. Thanks.
This looks like a great start on re-ordering data to return in reverse-time order. One big thing that this highlighted for me, however, is how the
After going through all these options and realizing that I don't like any of them, what do people think about introducing a new element to the
As we get more things related to how a specific query processes time, we could add them here. What do people think? Am I over-thinking it?
One thing I wonder about is how the merge happens when we go in a different time-sort order. I notice that you haven't made any changes to the comparators used for the merges that take place. I think those will also have to be adjusted for reversing the time order, but maybe not.
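The reviewer's point about the merge comparators can be sketched as follows. This is an illustrative example, not Druid's actual merge code: the idea is simply that whatever comparator orders results by time for the merge must be reversed when the query is descending, or the merge would emit results in ascending order again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MergeComparatorExample
{
  // Orders result timestamps the way a time-ordered merge would.
  // Long stands in for a real per-granularity result object here.
  public static List<Long> mergeOrder(List<Long> timestamps, boolean descending)
  {
    Comparator<Long> timeComparator = Comparator.naturalOrder();
    if (descending) {
      // Without this flip, a descending query's results would be
      // merged back into ascending order.
      timeComparator = timeComparator.reversed();
    }
    List<Long> sorted = new ArrayList<>(timestamps);
    sorted.sort(timeComparator);
    return sorted;
  }

  public static void main(String[] args)
  {
    System.out.println(mergeOrder(Arrays.asList(3L, 1L, 2L), false)); // [1, 2, 3]
    System.out.println(mergeOrder(Arrays.asList(3L, 1L, 2L), true));  // [3, 2, 1]
  }
}
```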
Implemented.
@cheddar I've also given some thought to where it should be positioned. For the merging work, I think I've changed the related code (comparators, etc.), but I can't be sure I've done it right. I'll add more tests on the currently supported query types, and will also make group-by and select queries support descending. Thanks.
@cheddar I'm not sure how I feel about a "timeSpec" as the grouping. "intervals" is technically a filter, and "granularity" and "direction" both define the formatting of the result. I think a "resultSpec" or "applySpec" is more intuitive to users. At some point we should have a "filterSpec" with "intervals" and "dimensionFilters" in it. Would love to get some feedback from @vogievetsky as well.
I think of all operations in terms of (filter-)split-apply-combine, where for timeseries: Filter - self explanatory (
The proposed
For example, descending processing of time-series queries produces a descending-ordered result (good). But for group-by queries the order of processing has no meaning, and they already have a separate ordering spec for result ordering. For search queries, because Druid just looks up the index and dictionary, the order of processing has no meaning either. I think descending can make a difference for time-series and search queries, but for the others I don't know.
Ok, I kinda like the
I wonder if maybe we should look at creating a query with the chunks as Vadim has them laid out: split/apply/combine/filter. We could rewrite that into whatever queries we have right now as an initial implementation, and then eventually implement it to actually run against segments too? If we were to take this approach, I think that the way you are doing it now would still make sense for the long term ('cause eventually maybe those queries would be hidden behind the split/apply/combine/filter query?). What do you guys think?
@cheddar +1. The way I always think about Druid's current query API is that it is the low level API, and over time we should migrate to a higher level API that is much easier to reason about and extend. However, what do you want to do about the changes required in this current PR? /druid/v3 would be cool :). What was /druid/v1?
If we want to take the approach of trying to do a split/apply/combine/filter query to replace them "all", then I think having it at the query level like it is can make sense. So let's just leave it there and maybe try to get @vogievetsky to propose how he would prefer to specify his split/apply/combine/filter queries?
Yes,
 * @return the sequence of merged results
 */
public abstract Sequence<ResultType> mergeSequences(Sequence<Sequence<ResultType>> seqOfSequences);
public abstract Sequence<ResultType> mergeSequences(Sequence<Sequence<ResultType>> seqOfSequences, boolean descending);
Technically, this interface is something that someone can extend in an extension, so this change is going to mean that this can't go out until 0.9.0 (which is our next planned release, so no big deal, really). I say this just to make sure that we tag this PR as 0.9.0 and include something in the release notes about the compatibility change.
This generally makes sense to me. I think that the way you went about adding the
Ok, yeah, I was right in that there was a simpler way to do things. It required a bit of butchering of interfaces. There were methods on QueryToolChest that shouldn't have been there. Those methods were breaking the abstraction and needed to be eliminated in order to make the simplified changes. It probably wasn't readily apparent that the problem was the bad methods on the interface, but once they are cleaned up, the code cleans up quite a bit. I did a PR against your PR branch; you can see the changes here:
Let me know what you think.
Merged @cheddar's patch and rebased on master. Let's see the test results.
There are still two comments I'd like to see addressed, but once they are in I'll be 👍
@navis I have a comment asking to please update the documentation, so people can know how to get results in reverse order.
Force-pushed from 802401a to 3c8bdb3.
@nishantmonu51 do you have any more comments?
{ | ||
public static <T> int getContextPriority(Query<T> query, int defaultValue) |
All those methods are marked as @deprecated in the Query interface; should we mark them deprecated here too?
All those methods in the Query interface were changed to static methods, which can be regarded as carrying out the deprecation, because it's not backward compatible anymore.
I see what you mean. We deprecated them because we planned to avoid using string parsing going forward, but that may warrant a separate discussion. I'm fine leaving this as is.
  return true;
}
long current = timestamps.getLongSingleValueRow(baseOffset.getOffset());
return current >= timeStart && current < timeEnd;
Given that we technically only need to check one of those two conditions based on whether the query is descending or not, is it faster to do a check based on the descending flag, to always check both, or is there maybe a benefit to doing the branching outside of the loop, i.e. having something like a DescendingTimestampCheckingOffset?
Made Ascending/DescendingTimestampCheckingOffset
@navis we don't necessarily need to separate it out if it doesn't make a difference. My question was mainly whether branch prediction would help us more than doing both checks, or maybe it doesn't make a difference at all, in which case we should keep the simplest code.
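A hedged sketch of the Ascending/DescendingTimestampCheckingOffset idea discussed above (names and structure are illustrative, not Druid's exact code): the direction branch moves into the choice of subclass, so each per-row check tests only the one bound that can actually terminate iteration in that direction.

```java
public abstract class TimestampCheckingOffsetSketch
{
  protected final long timeStart;
  protected final long timeEnd;

  protected TimestampCheckingOffsetSketch(long timeStart, long timeEnd)
  {
    this.timeStart = timeStart;
    this.timeEnd = timeEnd;
  }

  // One bound check per row; no descending flag consulted here.
  public abstract boolean withinBounds(long currentTimestamp);

  // Branch once, up front, instead of once per row. Hypothetical factory.
  public static TimestampCheckingOffsetSketch of(boolean descending, long timeStart, long timeEnd)
  {
    return descending ? new Descending(timeStart, timeEnd) : new Ascending(timeStart, timeEnd);
  }

  // Ascending rows only increase, so only the upper bound can end the scan.
  static final class Ascending extends TimestampCheckingOffsetSketch
  {
    Ascending(long timeStart, long timeEnd)
    {
      super(timeStart, timeEnd);
    }

    @Override
    public boolean withinBounds(long currentTimestamp)
    {
      return currentTimestamp < timeEnd;
    }
  }

  // Descending rows only decrease, so only the lower bound can end the scan.
  static final class Descending extends TimestampCheckingOffsetSketch
  {
    Descending(long timeStart, long timeEnd)
    {
      super(timeStart, timeEnd);
    }

    @Override
    public boolean withinBounds(long currentTimestamp)
    {
      return currentTimestamp >= timeStart;
    }
  }
}
```

Since each concrete class has a single, final `withinBounds` implementation, the JIT can devirtualize and inline the call when the offset is monomorphic at the call site, which is the performance intuition behind the reviewer's question.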
@navis there are some merge conflicts now; hopefully they are small.
@navis any chance of resolving merge conflicts and finishing this one up?
@fjy fixed conflict and addressed comments. |
I'll squash commits when @cheddar approves.
private static class ArrayIntIterator implements IntIterator {

  private final int[] array;
  private transient int index;
Why is index transient?
It's a habit of mine. I'll remove that.
I have verified that all of navis's changes since I looked last are good on the basic functionality. I did not verify that the caching looks good, but I'm happy with everyone else's eyes on that. So I'm 👍
Rebased on trunk & squashed.
@fjy my comment here was not addressed: https://github.com/druid-io/druid/pull/2014/files#r49129298