Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow filters to use extraction functions #2690

Merged
merged 1 commit into from Apr 5, 2016
Merged

Conversation

jon-wei
Copy link
Contributor

@jon-wei jon-wei commented Mar 19, 2016

This PR allows the user to define an extraction function on each of the filters (except SpatialFilter); the extraction function will be applied to the input before the filter criteria is applied.

This merges the SelectorFilter and ExtractionFilter into a single class (ExtractionDimFilter is deprecated in this patch, ExtractionFilter has been deleted).


Fixes #2643

Also fixes #2772, #2775, #2776

@jon-wei
Copy link
Contributor Author

jon-wei commented Mar 21, 2016

updated docs

@jon-wei jon-wei added this to the 0.9.1 milestone Mar 21, 2016
@jon-wei jon-wei changed the title Allow all filters to use extraction functions, allow FilteredAggregators to use all filter types Allow filters to use extraction functions Mar 23, 2016
@jon-wei
Copy link
Contributor Author

jon-wei commented Mar 23, 2016

rebased, removed conflicts with #2711

@navis
Copy link
Contributor

navis commented Mar 24, 2016

I think JavaScriptFilter and SelectorFilter also can extend DimensionPredicateFilter, removing some duplicated codes.


public DimensionPredicateFilter(
String dimension,
Predicate<String> predicate
Predicate<String> predicate,
ExtractionFn extractionFn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cause extraction itself is handled in predicate, I think we don't need this here (can be confusing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@navis thanks, extractionFn was unnecessary here, removed it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but now it's necessary again, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's being used now

@gianm
Copy link
Contributor

gianm commented Mar 24, 2016

sorry @jon-wei, more conflicts!

@jon-wei
Copy link
Contributor Author

jon-wei commented Mar 24, 2016

@navis

re: changing SelectorFilter and JavascriptFilter to implement DimensionPredicateFilter

There are some performance considerations if that change were made:

  • getBitmapIndex() for DimensionPredicateFilter requires a scan through the column's dictionary values to determine which values matched the predicate. For SelectorFilter without an extraction function, the current getBitmapIndex() is much simpler/faster since it can just call selector.getBitmapIndex(dimension, value).
  • For JavascriptFilter, the current implementation allows getBitmapIndex() to reuse the Context object across the scan of the dictionary. Using the JavascriptPredicate with DimensionPredicateFilter's getBitmapIndex() function would result in a new Context object being created on every apply() call. There's a note regarding this performance consideration at:
    https://github.com/druid-io/druid/blob/master/processing/src/main/java/io/druid/segment/filter/JavaScriptFilter.java#L92

I think, if that change is desirable, it's better handled in a separate PR.

@jon-wei
Copy link
Contributor Author

jon-wei commented Mar 24, 2016

@gianm np, rebasing now

@jon-wei
Copy link
Contributor Author

jon-wei commented Mar 24, 2016

rebased

@navis
Copy link
Contributor

navis commented Mar 24, 2016

@jon-wei Pair enough. 👍

this.extractionFn = extractionFn;
}

public BoundDimFilter(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this constructor needed? If it's only used for tests then usually we modify the tests rather than add another constructor.

(similar comment for all the other filters)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO one nice thing about only having one constructor is that it makes it harder to accidentally create an object with a critical parameter missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm removed extra constructors

@jon-wei jon-wei force-pushed the filter_support branch 2 times, most recently from 60a896a to a34c2f1 Compare April 1, 2016 23:11
@jon-wei jon-wei added the Bug label Apr 1, 2016
@jon-wei
Copy link
Contributor Author

jon-wei commented Apr 1, 2016

Addressed comments, rebasing now

This PR also fixes a few issues related to filtering, I've opened issues for them:
#2772
#2775
#2776

@jon-wei
Copy link
Contributor Author

jon-wei commented Apr 2, 2016

rebased

@@ -98,11 +102,25 @@ public DimFilter optimize()
{
if (this.getExtractionFn() instanceof LookupExtractionFn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not return new SelectorDimFilter(dimension, value, extractionFn).optimize();?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm good idea, I'll change that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm changed to use SelectorDimFilter optimize()


@JsonCreator
public InDimFilter(@JsonProperty("dimension") String dimension, @JsonProperty("values") List<String> values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be optimized in a similar way to the SelectorDimFilter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm Can we handle the InDimFilter optimization in a separate PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, you can file an issue for anything you don't think is appropriate for this PR.

@jon-wei jon-wei mentioned this pull request Apr 5, 2016
// in the lookup map. Match on the selector value as well.
// If the selector value is overwritten in the lookup map, don't add selector value to keys.
if (exFn.isRetainMissingValue() && mappingForValue == null) {
keys.add(convertedValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keys isn't guaranteed to be mutable, so it'd be better to copy it and add convertedValue to the copy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good to me other than that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@gianm
Copy link
Contributor

gianm commented Apr 5, 2016

lgtm 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants