Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

should we deprecate fieldExpression and extractionFn field in various Aggregators, DimensionSpecs, Filters etc #8242

Open
himanshug opened this issue Aug 5, 2019 · 6 comments

Comments

@himanshug
Copy link
Contributor

himanshug commented Aug 5, 2019

they were added before VirtualColumn was introduced and I think a better alternative now could be to have a extractionFn based VirtualColumn that could be used and expression based virtual column exists already.
That way, all the filter and aggregator implementations don't need to handle them in different places.

@clintropolis
Copy link
Member

sgtm 👍

@gianm
Copy link
Contributor

gianm commented Aug 6, 2019

I’m a bit nervous about removing something so widely used. I wonder if we can introduce a compat shim that translates these kinds of filters?

@himanshug
Copy link
Contributor Author

@gianm I think we could do something on the line of rewriting the query in early stage of processing pipeline at broker to... for each aggregator/filter/whatever, auto generate the virtual columns spec and set extractionFn field to null in it.

that said, doing and testing it is overhead if we decide to "deprecate and remove" it. OTOH if you mean that we should not deprecate extractionFn field and will always keep the compatibility layer, then it may make sense to write it.

@gianm
Copy link
Contributor

gianm commented Aug 6, 2019

IMO if the deprecation cycle is long-ish (multiple release cycles) and generally aligned with moving most users to SQL then it makes sense to me. The thoughts guiding me here are:

  • ExtractionFns are generally a nicer API for users writing native queries manually than virtual columns.
  • If most users that write queries manually switch to SQL, then the above point doesn't matter, since it's handled by the SQL layer.
  • Virtual columns are probably a nicer API for users that write native queries programmatically.
  • Getting rid of ExtractionFns on DimensionSpec and DimFilter interfaces would simplify the implementations, which is great. We would need to review to make sure we aren't losing any efficiency (this is a nice test of how good the VirtualColumn interface is).
  • Druid users that have already written integrations based on extractionFns are going to have a hard time upgrading to a Druid version that removes them. The feature has been around for a long time and is widely used, so we should give people a generous amount of time to migrate off.
  • A compatibility layer could be useful, it would be very user-friendly but complicate the implementation. Would it be worth the de-complexification of removing the need for individual filters, dimension specs, selectors, etc, from needing to worry about extractionFns? I'm not sure now, but it might be. If so I would consider it. If not then probably just do a long deprecation cycle.

We could consider getting rid of the expression field on AggregatorFactories too. Similar observations apply.

@himanshug
Copy link
Contributor Author

I agree with all observations you made and road to "remove" will be a very long one as plenty of time needs to be given before they are removed.

main motivator for me really is to simplify the implementation of filters and aggregators etc. I don't have anything against the API except limiting the ways of doing same things in API or else user is confused whether to use ExtractionFnVirtualColumn or to use extractionFn field, what is the difference etc etc.

We could consider getting rid of the expression field on AggregatorFactories too. Similar observations apply.

actually I wrote this issue while working on #8243 where the context was actually expression and forgot to mention that one.
updating the title and description.

@himanshug himanshug changed the title should we deprecate extractionFn field in various Aggregators and Filters should we deprecate fieldExpression and extractionFn field in various Aggregators, Filters etc Aug 6, 2019
@himanshug himanshug changed the title should we deprecate fieldExpression and extractionFn field in various Aggregators, Filters etc should we deprecate fieldExpression and extractionFn field in various Aggregators, DimensionSpecs, Filters etc Sep 3, 2019
@himanshug
Copy link
Contributor Author

I did a bit more digging and found that many filter implementations exploit the fact that whether they are working on values with extraction fn applied and make few optimizations . so, I think it is ok for filters to have extraction fn as fields.
however, DimensionSpec also has knowledge of extractionFn and getExtractionFn() method, which looks like replaceable by usage of the future ExtractionFnVirtualColumn

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants