Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support descending time ordering for time series query #2014

Merged
merged 1 commit into from
Jan 13, 2016

Conversation

navis
Copy link
Contributor

@navis navis commented Nov 26, 2015

No description provided.

@cheddar
Copy link
Contributor

cheddar commented Nov 30, 2015

Can you please provide a description of what functionality you are implementing as well as a high level overview of the approach?

final long timeEnd = Math.min(interval.getEndMillis(), gran.next(input));
while (baseOffset.withinBounds()) {
long current = timestamps.getLongSingleValueRow(baseOffset.getOffset());
if (descending ? current < timeEnd : current >= timeStart) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is going to add a branch inside this tight loop checking for descending. It's true that branch prediction will likely do a good job with this, but we can completely eliminate this branch by doing it earlier.

Let's move the Function to a variable. Then, in the conditional on lines 250:252, we can override the reference to an implementation that is specific to the descending case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but can do that later? There are so much issues to be addressed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is a feature rather than a bug fix I'd rather not incur technical debt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cheddar @drcrallen Sorry for delay. Addressed comment. Thanks.

@cheddar
Copy link
Contributor

cheddar commented Nov 30, 2015

This looks like a great start on re-ordering data to return in reverse-time order.

One big thing that this highlighted for me, however, is how the descending flag is being delivered. It definitely works to add it as has been done, but every time we want to add a new flag like this, we shouldn't have to extend every single query to support it. So that got me thinking which of the parts of the query would be best to house that flag. Immediate options as I see them are

  1. granularity - given that the descending flag applies to how the time grain is ordered, adding it here could make sense. But, at the same time, that reason seems weak.
  2. intervals - kinda the same logic as granularity: it's something that deals with time...
  3. context - it's an extra flag that changes the behavior of queries. But, so far things in context are somewhat meta. They are extra things specific to a query type or specific to how Brokers/Historicals comunicate, etc.

After going through all these options and realizing that I don't like any of them, what do people think about introduce a new element to the Query to replace granularity and intervals. We could call it the timeSpec and it would have 3 fields for now

  1. granularity
  2. intervals
  3. ascending

As we get more things related to how a specific query processes time, we could add them here. What do people think? Am I over-thinking it?

@cheddar
Copy link
Contributor

cheddar commented Nov 30, 2015

One thing I wonder about is how the merge happens when we go in a different time-sort order. I notice that you haven't made any changes to the comparators used for the merges that take place. I think those will also have to be adjusted for reversing the time order, but maybe not.

@navis
Copy link
Contributor Author

navis commented Dec 1, 2015

Implemented descending order only for search/timeseries/timebound/topn queries and caching is disabled for them. It's already grown too big for me to handle.

@navis
Copy link
Contributor Author

navis commented Dec 2, 2015

@cheddar I've also thought on the position of descending field and QuerySegmentSpec in BaseQuery looked good position of that (granularity also can be included in it). But I've decided not to make too many changes for now. We can discuss on this when the functionality is settled.

For merging works, I thing I've changed related codes(comparators, etc.) but cannot sure I've done it right. I'll add more tests on current supported query types and also will make group-by and select queries support descending. Thanks.

@fjy
Copy link
Contributor

fjy commented Dec 2, 2015

@cheddar I'm not sure how I feel about a "timeSpec" as the grouping. "intervals" is technically a filter and "granularity" and "direction" both define the formatting of the result. I think a "resultSpec" or "applySpec" is more intuitive to users. At some point we should have a "filterSpec" with "intervals" and "dimensionFilters" in it. Would love to get some feedback from @vogievetsky as well.

@vogievetsky
Copy link
Contributor

I think of all operations in terms of (filter-)split-apply-combine where for timeseries:

Filter - self explanatory (intervals + filter)
Split - how to bucket the data (granularity)
Apply - what to compute per bucket (aggregations + postAggregations)
Combine - how to sort, order, and limit the buckets

The proposed descending flag fits squarely in the combine category of my clarification.
As such I would advice against a timeSpec as it is arbitrary to try and group all time related stuff together.
I would recommend there to be a sortSpec or that direction just be a top level flag on the timeseries query only (just like threshold is on topN)

@navis
Copy link
Contributor Author

navis commented Dec 4, 2015

descending would be about the order of processing, not about ordering of result. So it can be applied to all kind of queries but whether it has any meaning is a different question.

For example, descending processing of time-series queries will make descending ordered result(good). But for group-by queries, the order of processing has no meaning and there are another ordering spec for result ordering in it. For search queries? because druid just lookups index and dictionary, it just not have meaning of order of processing.

I think time-series and search queries can make differences by this. But for others, I don't know.

@cheddar
Copy link
Contributor

cheddar commented Dec 9, 2015

Ok, timeSpec is a horrible idea ;).

I kinda like the combine idea, but @navis 's comments about the ordering affecting the processing order rather than the response order has also got me thinking. As I think about it, I think I've convinced myself that even though it is affecting the processing order and not just the result order, I think that's an implementation detail/optimization...

I wonder if maybe we should look at creating a query with the chunks as Vadim has them laid out "split/apply/combine/filter". We could rewrite that into whatever queries we have right now as an initial implementation and then eventually implement it to actually run against segments too?

If we were to take this approach, I think that the way you are doing it now would still make sense for the long-term ('cause eventually maybe those queries would be hidden behind the split/apply/combine/filter query?). What do you guys think?

@fjy
Copy link
Contributor

fjy commented Dec 11, 2015

@cheddar +1. The way I always think about Druid's current query API is that it is the low level API and over time we should migrate a higher level API that is much easier to reason about and extend. However, what do you want to do about the changes required in this current PR?

/druid/v3 would be cool :). What was /druid/v1?

@cheddar
Copy link
Contributor

cheddar commented Dec 11, 2015

If we want to take the approach of trying to do a split/apply/combine/filter query to replace them "all", then I think having it at the query level like it is can make sense. So let's just leave it there and maybe try to get @vogievetsky to propose how he would prefer to specify his split/apply/combine/filter queries?

@navis
Copy link
Contributor Author

navis commented Dec 14, 2015

Yes, descending-processing would be a part of execution environment for S/A/C/F, not to be accessed by user directly. But can we include this just for time-series queries for now? There are some requests for it.

* @return the sequence of merged results
*/
public abstract Sequence<ResultType> mergeSequences(Sequence<Sequence<ResultType>> seqOfSequences);
public abstract Sequence<ResultType> mergeSequences(Sequence<Sequence<ResultType>> seqOfSequences, boolean descending);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, this interface is something that someone can extend in an extension, so this change is going to mean that this can't go out until 0.9.0 (which is our next planned release, so no big deal, really). I say this just to make sure that we tag this PR as 0.9.0 and include something in the release notes about the compatibility change.

@cheddar
Copy link
Contributor

cheddar commented Dec 23, 2015

This generally makes sense to me. I think that the way you went about adding the descending property to all of the function calls is actually not necessary, but I won't know without messing around with the code a bit. I'm gonna try forking your branch and editing it up some to see if I'm smoking crack or not.

@cheddar
Copy link
Contributor

cheddar commented Dec 24, 2015

Ok, yeah, I was right in that there was a simpler way to do things. It required a bit of butchering of interfaces.

Tthere were methods on QueryToolChest that shouldn't've been there. Those methods were breaking the abstraction and needed to be eliminated in order to make the simplified changes. It probably wasn't readily apparent that the problem was the bad methods on the interface, but once they are cleaned up, the code cleans up quite a bit. I did a PR against your PR branch, you can see the changes here:

navis#1

Let me know what you think.

@navis
Copy link
Contributor Author

navis commented Dec 28, 2015

Merged @cheddar's patch and rebased on master. Let's see the test results.

@navis navis changed the title [WIP] fix for #2013 Support descending ordered queries Support descending time ordering for time series query Dec 28, 2015
@cheddar
Copy link
Contributor

cheddar commented Dec 29, 2015

There's still two comments I'd like to see addressed, but once they are in I'll be 👍

@fjy
Copy link
Contributor

fjy commented Dec 29, 2015

@navis I have a comment to please update the documentation so people can know how to get results in reverse order

@navis navis force-pushed the DRUID-2013 branch 2 times, most recently from 802401a to 3c8bdb3 Compare December 30, 2015 08:46
@fjy
Copy link
Contributor

fjy commented Jan 6, 2016

@nishantmonu51 do you have any more comments?

{
public static <T> int getContextPriority(Query<T> query, int defaultValue)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all those method are marked as @deprecated in the Query interface, should we mark them deprecated here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all those methods in Query interface are changed to static method and that can be regarded as committing the deprecation, because it's not backward compatible anymore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. We deprecated them because we planned avoid using string parsing going forward, but that may warrant a separate discussion. I'm fine leaving as is.

return true;
}
long current = timestamps.getLongSingleValueRow(baseOffset.getOffset());
return current >= timeStart && current < timeEnd;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that we technically only need to check one of those two conditions based on whether the query is descending or not, is it faster to do a check based on the descending flag, to always check both, or is there maybe a benefit to do the branching outside of the loop, i.e have something like a DescendingTimestampCheckingOffset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made Ascending/DescendingTimestampCheckingOffset

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@navis we don't necessarily need to separate it out if it doesn't make a difference. My question was mainly whether we branch prediction would help us more than doing both checks, or maybe it doesn't make a difference at all, in which case we should leave the simplest code

@fjy
Copy link
Contributor

fjy commented Jan 7, 2016

@navis there's some merge conflicts now, hopefully they are small

@fjy
Copy link
Contributor

fjy commented Jan 10, 2016

@navis any chance of resolving merge conflicts and finishing this one up?

@navis
Copy link
Contributor Author

navis commented Jan 11, 2016

@fjy fixed conflict and addressed comments.

@xvrl
Copy link
Member

xvrl commented Jan 11, 2016

@navis latest changes looks fine to me, can we squash commits before merging. @cheddar there have been quite a few changes since your last thumbs up, can you have a second look?

@navis
Copy link
Contributor Author

navis commented Jan 11, 2016

I'll squash commits when @cheddar approves.

private static class ArrayIntIterator implements IntIterator {

private final int[] array;
private transient int index;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is index transient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's some kind of habit of me. I'll remove that.

@cheddar
Copy link
Contributor

cheddar commented Jan 13, 2016

I have verified that all of navis's changes since I looked last are good on the basic functionality. I did not verify that the caching looks good, but I'm happy with everyone else's eyes on that. so I'm 👍

@navis
Copy link
Contributor Author

navis commented Jan 13, 2016

Rebased on trunk & squshed.

fjy added a commit that referenced this pull request Jan 13, 2016
Support descending time ordering for time series query
@fjy fjy merged commit dfc631c into apache:master Jan 13, 2016
@xvrl
Copy link
Member

xvrl commented Jan 22, 2016

@fjy my comment here was not addressed https://github.com/druid-io/druid/pull/2014/files#r49129298
Can we please make sure to review all comments before hitting merge?

@navis
Copy link
Contributor Author

navis commented Jan 25, 2016

@xvrl Totally my bad, sorry. I've added patch for it in #2326.

@fjy fjy modified the milestone: 0.9.0 Feb 4, 2016
@fjy fjy mentioned this pull request Feb 5, 2016
@navis navis deleted the DRUID-2013 branch February 13, 2016 03:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants