-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support filtering on long columns (including __time) #3180
Conversation
|
||
String jsFn = "function(x) { return(x === 'Wednesday' || x === 'Thursday') }"; | ||
assertFilterMatches( | ||
new JavaScriptDimFilter(Column.TIME_COLUMN_NAME, jsFn, exfn, JavaScriptConfig.getDefault()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we don't automatically convert to strings for javascript, can we add a JavaScriptDimFilter test that operates on the time column values directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xvrl I have one in the testTimeFilterAsLong()
test, the JavascriptDimFilter there compares directly on longs
@jon-wei can you add some docs also with an example doing filtering on a time column ? |
@nishantmonu51 I've added a section to the docs on filtering on __time |
a18ec5d
to
0e1cdda
Compare
### Filtering on the Timestamp Column | ||
Filters can also be applied to the timestamp column. The timestamp column has long millisecond values. | ||
|
||
To refer to the timestamp column, use the string "__time" as the dimension name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`__time`
would probably mess with the syntax highlighting less
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gianm cool, fixed the quotes
minor comments to be fixed, but 👍 after those are addressed |
**Example** | ||
|
||
Filtering on a long timestamp value: | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
json highlighting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gianm Added json highlightning
@fjy I've moved the common logic to functions in Filters, addressed the other comments as well |
Ran the ValueMatcher-using benchmarks again, results haven't changed:
|
@@ -157,6 +163,45 @@ public void remove() | |||
); | |||
} | |||
|
|||
public static ValueMatcher getTimeValueMatcher( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about calling this getLongValueMatcher
so it can be used with other long dims too, when the time comes?
/** | ||
* Compound predicate class that accepts all supported types | ||
*/ | ||
public interface DruidPredicate extends Predicate<Object>, DruidLongPredicate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like long predicate should extend druid predicate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also i dont understand the purpose of interfaces with no methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fjy To support separate String/Long ValueMatchers on the factory, the Filter side needs to pass in a different kind of predicate for each type, so this interface is used to combine the predicate implementations into a single object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a more descriptive name than DruidPredicate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fjy renamed this to DruidCompositePredicate
/** | ||
* Composite predicate class that can accept all supported types | ||
* | ||
* The apply() method inherited from Predicate<Object> is intended for String values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the Predicate<Object>
seems like premature generalization. We aren't getting any use out of this now (we don't have non-primitive Object types that filters support), and generally the filters all call toString on this object anyway. How do you feel about making this a Predicate<String>
?
going to change DruidCompositePredicate to DruidPredicateFactory, will update this PR |
* | ||
* A separate method is present for each supported primitive type to avoid boxing (for performance reasons) | ||
*/ | ||
public interface DruidCompositePredicate extends Predicate<Object>, DruidLongPredicate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is unused now and could be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gianm ah, my bad, should've put a WIP on the last commit, I wasn't quite done with the PR changes yet but wasn't expecting you to review so soon :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha, okay, I'll wait for you to finish :)
Updated patch benchmarks in original comment |
public InFilter(String dimension, Set<String> values, ExtractionFn extractionFn) | ||
{ | ||
this.dimension = dimension; | ||
this.values = values; | ||
this.extractionFn = extractionFn; | ||
setLongValues(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it might be worth only doing this if a long predicate is actually requested. I could see the parsing taking a while for long IN filters. But make sure we only do this once even if getLongPredicate
is called many times.
@@ -44,6 +45,11 @@ | |||
private final String value; | |||
private final ExtractionFn extractionFn; | |||
|
|||
private Object initLock = new Object(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initLock should be final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gianm fixed non-final lock object
👍 after travis |
Stil 👍 from me |
Fixes #2816
This PR adds support for filtering on long columns, including __time, using the non-bitmap indexed column filtering support added by #3018.
This patch changes the interface of the ValueMatcherFactory regarding predicate handling. Filters will now create a DruidPredicateFactory, an object that can create a predicate suitable for each filterable column type (currently String and long only).
I have included some benchmarks to check performance of predicate matching, using a set of the affected filters in an OrFilter, applied during an IncrementalIndex read, also during a TimeseriesQuery with FilteredAggregators on both types of indexes.
Patch
Master
Benchmarks for basic queries are shown below:
Patch
Master