Support descending time ordering for time series query #2014

navis · 2015-11-26T07:31:09Z

No description provided.

cheddar · 2015-11-30T21:02:18Z

Can you please provide a description of what functionality you are implementing as well as a high level overview of the approach?

cheddar · 2015-11-30T22:22:36Z

processing/src/main/java/io/druid/segment/QueryableIndexStorageAdapter.java

+                  final long timeEnd = Math.min(interval.getEndMillis(), gran.next(input));
+                  while (baseOffset.withinBounds()) {
+                    long current = timestamps.getLongSingleValueRow(baseOffset.getOffset());
+                    if (descending ? current < timeEnd : current >= timeStart) {


This change is going to add a branch inside this tight loop checking for descending. It's true that branch prediction will likely do a good job with this, but we can completely eliminate this branch by doing it earlier.

Let's move the Function to a variable. Then, in the conditional on lines 250:252, we can override the reference to an implementation that is specific to the descending case.
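A minimal sketch of that suggestion. It assumes a plain `long[]` of timestamps sorted ascending instead of Druid's column and `Offset` types, and the class and method names are hypothetical. The direction-specific stop condition is chosen once, before the loop, so the loop body no longer tests `descending`:

```java
import java.util.function.LongPredicate;

// Hypothetical sketch, not the code in QueryableIndexStorageAdapter.
final class SeekSketch {

  /**
   * Scans `timestamps` (sorted ascending) in the requested direction and returns
   * the index of the first row on the near side of the bucket boundary,
   * or -1 if there is none.
   */
  static int seek(long[] timestamps, boolean descending, long timeStart, long timeEnd) {
    // Decided once per bucket: ascending scans stop at the first row >= timeStart,
    // descending scans stop at the first row < timeEnd.
    final LongPredicate reached = descending
        ? t -> t < timeEnd
        : t -> t >= timeStart;
    final int step = descending ? -1 : 1;

    int row = descending ? timestamps.length - 1 : 0;
    while (row >= 0 && row < timestamps.length) {
      if (reached.test(timestamps[row])) {
        return row;
      }
      row += step;
    }
    return -1;
  }
}
```

Whether this actually beats the inline ternary is something to measure; the JIT may well compile both to similar code.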

I agree, but can we do that later? There are so many issues to be addressed.

If it is a feature rather than a bug fix I'd rather not incur technical debt.

@cheddar @drcrallen Sorry for delay. Addressed comment. Thanks.

cheddar · 2015-11-30T22:36:41Z

This looks like a great start on re-ordering data to return in reverse-time order.

One big thing that this highlighted for me, however, is how the descending flag is being delivered. It definitely works to add it as has been done, but every time we want to add a new flag like this, we shouldn't have to extend every single query to support it. So that got me thinking about which part of the query would be best to house that flag. The immediate options, as I see them, are:

granularity - given that the descending flag applies to how the time grain is ordered, adding it here could make sense. But, at the same time, that reason seems weak.
intervals - kinda the same logic as granularity: it's something that deals with time...
context - it's an extra flag that changes the behavior of queries. But, so far things in context are somewhat meta. They are extra things specific to a query type or specific to how Brokers/Historicals communicate, etc.

After going through all these options and realizing that I don't like any of them, what do people think about introducing a new element to the Query to replace granularity and intervals? We could call it the timeSpec, and it would have 3 fields for now:

granularity
intervals
ascending

As we get more things related to how a specific query processes time, we could add them here. What do people think? Am I over-thinking it?
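To make the proposal concrete, a minimal sketch of what such a `timeSpec` holder could look like. It is hypothetical (no such class exists in this PR), and granularity and intervals are simplified to strings rather than Druid's own types:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical value object for the proposed "timeSpec"; not part of Druid.
final class TimeSpec {
  private final String granularity;     // e.g. "day"; simplified from a real granularity type
  private final List<String> intervals; // ISO-8601 intervals, e.g. "2015-11-01/2015-12-01"
  private final boolean ascending;

  TimeSpec(String granularity, List<String> intervals, boolean ascending) {
    this.granularity = granularity;
    // Defensive, read-only copy so the spec stays immutable.
    this.intervals = Collections.unmodifiableList(new ArrayList<>(intervals));
    this.ascending = ascending;
  }

  String getGranularity() { return granularity; }

  List<String> getIntervals() { return intervals; }

  boolean isAscending() { return ascending; }

  /** Returns a copy with the opposite time order; the other fields are unchanged. */
  TimeSpec reversed() {
    return new TimeSpec(granularity, intervals, !ascending);
  }
}
```

Each query type would then hold a single `TimeSpec` instead of separate fields, so a future time-related flag would be added in one place rather than on every query.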

cheddar · 2015-11-30T22:38:14Z

One thing I wonder about is how the merge happens when we go in a different time-sort order. I notice that you haven't made any changes to the comparators used for the merges that take place. I think those will also have to be adjusted for reversing the time order, but maybe not.
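To illustrate the concern: a k-way merge of per-segment results is only correct if its comparator matches the order the inputs are sorted in. A minimal sketch with plain `Long` timestamps standing in for result rows (the class and method names are hypothetical, not Druid's merge code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a time-ordered k-way merge.
final class MergeSketch {

  /** Merges inputs that are each already sorted in the requested time order. */
  static List<Long> mergeByTime(List<List<Long>> inputs, boolean descending) {
    final Comparator<Long> timeOrder = descending
        ? Comparator.<Long>reverseOrder()
        : Comparator.<Long>naturalOrder();

    // Each cursor is {inputIndex, position}; the heap orders cursors by their current timestamp.
    PriorityQueue<int[]> heap = new PriorityQueue<>(
        (a, b) -> timeOrder.compare(inputs.get(a[0]).get(a[1]), inputs.get(b[0]).get(b[1])));
    for (int i = 0; i < inputs.size(); i++) {
      if (!inputs.get(i).isEmpty()) {
        heap.add(new int[] {i, 0});
      }
    }

    List<Long> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] cursor = heap.poll();
      List<Long> input = inputs.get(cursor[0]);
      merged.add(input.get(cursor[1]));
      if (cursor[1] + 1 < input.size()) {
        heap.add(new int[] {cursor[0], cursor[1] + 1});
      }
    }
    return merged;
  }
}
```

Feeding descending-sorted inputs to the ascending comparator would interleave them incorrectly, which is why the merge comparators likely need to flip along with the scan direction.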

navis · 2015-12-01T08:42:53Z

Implemented descending order only for search/timeseries/timebound/topn queries, and caching is disabled for them. It has already grown too big for me to handle.

navis · 2015-12-02T00:08:34Z

@cheddar I've also thought about where the descending field should live, and QuerySegmentSpec in BaseQuery looked like a good place for it (granularity could also be included in it). But I've decided not to make too many changes for now. We can discuss this when the functionality is settled.

For the merging work, I think I've changed the related code (comparators, etc.), but I can't be sure I've done it right. I'll add more tests for the currently supported query types and will also make group-by and select queries support descending. Thanks.

fjy · 2015-12-02T23:42:11Z

@cheddar I'm not sure how I feel about a "timeSpec" as the grouping. "intervals" is technically a filter, while "granularity" and "direction" both define the formatting of the result. I think a "resultSpec" or "applySpec" would be more intuitive to users. At some point we should have a "filterSpec" with "intervals" and "dimensionFilters" in it. Would love to get some feedback from @vogievetsky as well.

vogievetsky · 2015-12-02T23:55:13Z

I think of all operations in terms of (filter-)split-apply-combine where for timeseries:

Filter - self explanatory (intervals + filter)
Split - how to bucket the data (granularity)
Apply - what to compute per bucket (aggregations + postAggregations)
Combine - how to sort, order, and limit the buckets

The proposed descending flag fits squarely in the combine category of my classification.
As such, I would advise against a timeSpec, as it is arbitrary to try to group all time-related stuff together.
I would recommend that there be a sortSpec, or that direction just be a top-level flag on the timeseries query only (just like threshold is on topN).
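A toy version of that decomposition for a timeseries-style sum, assuming rows are `{timestamp, value}` pairs and granularity is a fixed bucket width in milliseconds. All names are hypothetical, and this is not how Druid executes queries; it only shows that the descending flag touches nothing but the combine step:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical filter/split/apply/combine sketch.
final class SacSketch {

  /** Returns {bucketStart, sum} pairs for rows given as {timestamp, value}. */
  static List<long[]> timeseries(long[][] rows, long start, long end, long granMillis, boolean descending) {
    NavigableMap<Long, Long> buckets = new TreeMap<>();
    for (long[] row : rows) {
      long ts = row[0];
      if (ts < start || ts >= end) {
        continue; // filter: intervals (start inclusive, end exclusive)
      }
      long bucket = Math.floorDiv(ts, granMillis) * granMillis; // split: granularity
      buckets.merge(bucket, row[1], Long::sum);                 // apply: a "sum" aggregation
    }

    // combine: order the buckets; the descending flag only matters here.
    NavigableMap<Long, Long> ordered = descending ? buckets.descendingMap() : buckets;
    List<long[]> out = new ArrayList<>();
    for (Map.Entry<Long, Long> e : ordered.entrySet()) {
      out.add(new long[] {e.getKey(), e.getValue()});
    }
    return out;
  }
}
```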

navis · 2015-12-04T07:56:13Z

descending is about the order of processing, not about the ordering of results. So it can be applied to all kinds of queries, but whether it has any meaning is a different question.

For example, descending processing of timeseries queries will produce descending-ordered results (good). But for group-by queries, the order of processing has no meaning, and they have a separate ordering spec for result ordering. For search queries? Because Druid just looks up the index and dictionary, the order of processing has no meaning there.

I think timeseries and search queries can make a difference with this. But for others, I don't know.

cheddar · 2015-12-09T21:38:44Z

Ok, timeSpec is a horrible idea ;).

I kinda like the combine idea, but @navis's comments about the ordering affecting the processing order rather than the response order have also got me thinking. As I think about it, I've convinced myself that even though it affects the processing order and not just the result order, that's an implementation detail/optimization...

I wonder if maybe we should look at creating a query with the chunks as Vadim has them laid out "split/apply/combine/filter". We could rewrite that into whatever queries we have right now as an initial implementation and then eventually implement it to actually run against segments too?

If we were to take this approach, I think that the way you are doing it now would still make sense for the long-term ('cause eventually maybe those queries would be hidden behind the split/apply/combine/filter query?). What do you guys think?

fjy · 2015-12-11T00:20:34Z

@cheddar +1. The way I always think about Druid's current query API is that it is the low-level API, and over time we should migrate to a higher-level API that is much easier to reason about and extend. However, what do you want to do about the changes required in this current PR?

/druid/v3 would be cool :). What was /druid/v1?

cheddar · 2015-12-11T17:55:24Z

If we want to take the approach of trying to do a split/apply/combine/filter query to replace them "all", then I think having it at the query level like it is can make sense. So let's just leave it there and maybe try to get @vogievetsky to propose how he would prefer to specify his split/apply/combine/filter queries?

navis · 2015-12-14T00:42:59Z

Yes, descending processing would be part of the execution environment for S/A/C/F, not accessed by users directly. But can we include this just for timeseries queries for now? There have been some requests for it.

cheddar · 2015-12-23T22:38:29Z

processing/src/main/java/io/druid/query/QueryToolChest.java

   * @return the sequence of merged results
   */
-  public abstract Sequence<ResultType> mergeSequences(Sequence<Sequence<ResultType>> seqOfSequences);
+  public abstract Sequence<ResultType> mergeSequences(Sequence<Sequence<ResultType>> seqOfSequences, boolean descending);


Technically, this interface is something that someone can extend in an extension, so this change is going to mean that this can't go out until 0.9.0 (which is our next planned release, so no big deal, really). I say this just to make sure that we tag this PR as 0.9.0 and include something in the release notes about the compatibility change.

cheddar · 2015-12-23T22:56:43Z

This generally makes sense to me. I think that the way you went about adding the descending property to all of the function calls is actually not necessary, but I won't know without messing around with the code a bit. I'm gonna try forking your branch and editing it up some to see if I'm smoking crack or not.

cheddar · 2015-12-24T01:55:05Z

Ok, yeah, I was right in that there was a simpler way to do things. It required a bit of butchering of interfaces.

There were methods on QueryToolChest that shouldn't've been there. Those methods were breaking the abstraction and needed to be eliminated in order to make the simplified changes. It probably wasn't readily apparent that the problem was the bad methods on the interface, but once they are cleaned up, the code cleans up quite a bit. I did a PR against your PR branch, you can see the changes here:

navis#1

Let me know what you think.

navis · 2015-12-28T05:30:40Z

Merged @cheddar's patch and rebased on master. Let's see the test results.

cheddar · 2015-12-29T17:24:28Z

There are still two comments I'd like to see addressed, but once they are in I'll be 👍

fjy · 2015-12-29T18:13:48Z

@navis I have one comment: please update the documentation so people know how to get results in reverse order

fjy · 2016-01-06T20:42:35Z

@nishantmonu51 do you have any more comments?

xvrl · 2016-01-06T21:21:30Z

processing/src/main/java/io/druid/query/BaseQuery.java

 {
+  public static <T> int getContextPriority(Query<T> query, int defaultValue)


all those methods are marked as @deprecated in the Query interface; should we mark them deprecated here too?

All those methods in the Query interface were changed to static methods, which can be regarded as carrying out the deprecation, because it's not backward compatible anymore.

I see what you mean. We deprecated them because we planned to avoid using string parsing going forward, but that may warrant a separate discussion. I'm fine leaving it as is.
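For reference, a static context helper of this shape might look like the sketch below. It assumes context values can arrive either as numbers or as numeric strings (the string parsing discussed above); `ContextQuery` and `ContextUtils` are hypothetical stand-ins, and the actual `BaseQuery` code may differ:

```java
import java.util.Map;

// Hypothetical stand-in for Query<T>: only the context map matters here.
interface ContextQuery {
  Map<String, Object> getContext();
}

final class ContextUtils {
  private ContextUtils() {}

  /** Reads the "priority" context key, accepting either a number or a numeric string. */
  static int getContextPriority(ContextQuery query, int defaultValue) {
    Map<String, Object> context = query.getContext();
    Object value = context == null ? null : context.get("priority");
    if (value == null) {
      return defaultValue;
    }
    if (value instanceof Number) {
      return ((Number) value).intValue();
    }
    if (value instanceof String) {
      return Integer.parseInt((String) value); // the string-parsing path
    }
    throw new IllegalArgumentException("Unsupported type for priority: " + value.getClass().getName());
  }
}
```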

xvrl · 2016-01-07T00:14:16Z

processing/src/main/java/io/druid/segment/QueryableIndexStorageAdapter.java

+        return true;
+      }
+      long current = timestamps.getLongSingleValueRow(baseOffset.getOffset());
+      return current >= timeStart && current < timeEnd;


given that we technically only need to check one of those two conditions based on whether the query is descending or not, is it faster to do a check based on the descending flag, to always check both, or is there maybe a benefit to doing the branching outside of the loop, i.e. having something like a DescendingTimestampCheckingOffset?

Made Ascending/DescendingTimestampCheckingOffset

@navis we don't necessarily need to separate it out if it doesn't make a difference. My question was mainly whether branch prediction would help us more than doing both checks, or whether it makes no difference at all, in which case we should keep the simplest code
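One way to read the Ascending/DescendingTimestampCheckingOffset split, sketched over a plain sorted `long[]` rather than Druid's `Offset` interface (the classes below are hypothetical). Once the scan has been positioned inside the bucket, each direction can only leave it through one boundary, so each subclass checks a single bound instead of `current >= timeStart && current < timeEnd`:

```java
// Hypothetical sketch; not Druid's Offset classes.
abstract class TimestampCheckingOffset {
  protected final long[] timestamps; // sorted ascending
  protected final long timeStart;    // inclusive
  protected final long timeEnd;      // exclusive
  protected int offset;

  TimestampCheckingOffset(long[] timestamps, long timeStart, long timeEnd, int startOffset) {
    this.timestamps = timestamps;
    this.timeStart = timeStart;
    this.timeEnd = timeEnd;
    this.offset = startOffset; // assumed to already point at a row inside [timeStart, timeEnd)
  }

  abstract void increment();

  abstract boolean withinBounds();

  /** Counts the rows visited until the offset leaves the bucket. */
  static int countRows(TimestampCheckingOffset o) {
    int n = 0;
    while (o.withinBounds()) {
      n++;
      o.increment();
    }
    return n;
  }
}

final class AscendingTimestampCheckingOffset extends TimestampCheckingOffset {
  AscendingTimestampCheckingOffset(long[] ts, long start, long end, int startOffset) {
    super(ts, start, end, startOffset);
  }

  @Override
  void increment() { offset++; }

  @Override
  boolean withinBounds() {
    // Moving forward in time, only the upper bound can be crossed.
    return offset < timestamps.length && timestamps[offset] < timeEnd;
  }
}

final class DescendingTimestampCheckingOffset extends TimestampCheckingOffset {
  DescendingTimestampCheckingOffset(long[] ts, long start, long end, int startOffset) {
    super(ts, start, end, startOffset);
  }

  @Override
  void increment() { offset--; }

  @Override
  boolean withinBounds() {
    // Moving backward in time, only the lower bound can be crossed.
    return offset >= 0 && timestamps[offset] >= timeStart;
  }
}
```

As the comment above notes, if profiling shows no difference from the two-sided check, a single simpler class is preferable.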

fjy · 2016-01-07T20:57:54Z

@navis there's some merge conflicts now, hopefully they are small

fjy · 2016-01-10T17:25:14Z

@navis any chance of resolving merge conflicts and finishing this one up?

navis · 2016-01-11T00:40:57Z

@fjy fixed conflict and addressed comments.

xvrl · 2016-01-11T18:31:28Z

@navis the latest changes look fine to me; can we squash commits before merging? @cheddar there have been quite a few changes since your last thumbs up, can you have a second look?

navis · 2016-01-11T23:46:43Z

I'll squash commits when @cheddar approves.

guobingkun · 2016-01-12T23:15:48Z

processing/src/main/java/io/druid/segment/BitmapOffset.java

+  private static class ArrayIntIterator implements IntIterator {
+
+    private final int[] array;
+    private transient int index;


why is index transient?

It's a habit of mine. I'll remove that.
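For context on the `transient` question: the modifier is a serialization hint, so for an iterator that is presumably never serialized it has no effect and a plain field is the natural choice. A minimal sketch of such an iterator over an `int[]`; `SimpleIntIterator` is a hypothetical stand-in for the bitmap library's `IntIterator`, and the `reversed` helper assumes (the diff excerpt does not show it) that the array is used to walk bitmap rows in descending order:

```java
import java.util.NoSuchElementException;

// Hypothetical stand-in for the bitmap library's IntIterator.
interface SimpleIntIterator {
  boolean hasNext();

  int next();
}

final class ArrayIntIterator implements SimpleIntIterator {
  private final int[] array;
  private int index; // plain field; `transient` would only matter if this object were serialized

  ArrayIntIterator(int[] array) {
    this.array = array;
  }

  @Override
  public boolean hasNext() {
    return index < array.length;
  }

  @Override
  public int next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return array[index++];
  }

  /** Hypothetical helper: iterates the given ascending row ids from last to first. */
  static ArrayIntIterator reversed(int[] ascendingRows) {
    int[] copy = new int[ascendingRows.length];
    for (int i = 0; i < ascendingRows.length; i++) {
      copy[i] = ascendingRows[ascendingRows.length - 1 - i];
    }
    return new ArrayIntIterator(copy);
  }
}
```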

cheddar · 2016-01-13T01:48:31Z

I have verified that all of navis's changes since I last looked are good on the basic functionality. I did not verify that the caching looks good, but I'm happy with everyone else's eyes on that, so I'm 👍

navis · 2016-01-13T03:26:22Z

Rebased on trunk & squashed.

Support descending time ordering for time series query

xvrl · 2016-01-22T18:36:43Z

@fjy my comment here was not addressed https://github.com/druid-io/druid/pull/2014/files#r49129298
Can we please make sure to review all comments before hitting merge?

navis · 2016-01-25T08:13:22Z

@xvrl Totally my bad, sorry. I've added patch for it in #2326.

cheddar reviewed Nov 30, 2015

navis force-pushed the DRUID-2013 branch from 6ce1d9b to 9467f43 on December 1, 2015 08:38

navis force-pushed the DRUID-2013 branch from 9467f43 to c8fd649 on December 1, 2015 09:13

navis force-pushed the DRUID-2013 branch from c8fd649 to 2073148 on December 14, 2015 01:30

cheddar reviewed Dec 23, 2015

navis force-pushed the DRUID-2013 branch from ab54ef3 to 8d95952 on December 28, 2015 05:29

navis changed the title ~~[WIP] fix for #2013 Support descending ordered queries~~ Support descending time ordering for time series query Dec 28, 2015

navis force-pushed the DRUID-2013 branch 2 times, most recently from 802401a to 3c8bdb3 on December 30, 2015 08:46

navis force-pushed the DRUID-2013 branch from bda5272 to 29d10f6 on January 6, 2016 04:53

xvrl reviewed Jan 6, 2016

navis added the Release Notes label Jan 7, 2016

navis force-pushed the DRUID-2013 branch from 29d10f6 to f4efb1f on January 7, 2016 00:10

xvrl reviewed Jan 7, 2016

navis force-pushed the DRUID-2013 branch from 15a4e06 to 1f4d9e4 on January 11, 2016 00:37

guobingkun reviewed Jan 12, 2016

time-descending result of timeseries queries

18479bb

navis force-pushed the DRUID-2013 branch from 1f4d9e4 to 18479bb on January 13, 2016 03:25

fjy added a commit that referenced this pull request Jan 13, 2016

Merge pull request #2014 from navis/DRUID-2013

dfc631c

Support descending time ordering for time series query

fjy merged commit dfc631c into apache:master Jan 13, 2016

navis mentioned this pull request Jan 15, 2016

time-descending result of select queries #2271

Merged

fjy modified the milestone: 0.9.0 Feb 4, 2016

fjy mentioned this pull request Feb 5, 2016

druid-0.9.0 release notes #2404

Closed

navis deleted the DRUID-2013 branch February 13, 2016 03:01
