The "regex" extractionFn needs a "retainMissingValue" to be useful #2064

Closed
vogievetsky opened this issue Dec 8, 2015 · 4 comments

Comments

@vogievetsky
Contributor

Ref: http://druid.io/docs/latest/querying/dimensionspecs.html

The current Regular Expression Extraction Function is almost useful, but it needs the extra options found on the lookup extraction function. Specifically, the documented behavior "If there is no match, it returns the dimension value as is." is not useful. Ideally I want it to map anything that does not match the regexp to null.

I believe this can be achieved (without breaking backwards compatibility) by adding the "retainMissingValue", "injective", and "replaceMissingValueWith" properties found on the lookup (retainMissingValue should be true by default to preserve backwards compatibility).

This is my use case:

Say I have a dimension of files that were downloaded from my web server:

Files:

  • index.html
  • the_end_is_near_2.html
  • kafka-0.6.2.tar.gz
  • kafka-0.6.1.tar.gz
  • kafka-0.5.9.tar.gz

I would like to extract the version number (make a derived dimension) at query time.
I want to run this regexp: (\d+\.\d+\.\d+) and I want index.html and the_end_is_near_2.html to be transformed to null (not kept as is).
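
To make the difference concrete, here is a minimal Python sketch of the two behaviors. It is not Druid's implementation. The function name `regex_extract` and its snake_case parameters are hypothetical stand-ins for the proposed "retainMissingValue" and "replaceMissingValueWith" properties, and it assumes the extraction returns the first capture group of the first match.

```python
import re
from typing import List, Optional

FILES: List[str] = [
    "index.html",
    "the_end_is_near_2.html",
    "kafka-0.6.2.tar.gz",
    "kafka-0.6.1.tar.gz",
    "kafka-0.5.9.tar.gz",
]

VERSION_PATTERN = r"(\d+\.\d+\.\d+)"


def regex_extract(
    value: str,
    pattern: str,
    retain_missing_value: bool = True,
    replace_missing_value_with: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical model of the regex extraction with the proposed options.

    Assumption: on a match, the first capture group is returned.
    """
    match = re.search(pattern, value)
    if match:
        return match.group(1)
    if retain_missing_value:
        # Behavior described in the docs: pass the value through unchanged.
        return value
    # Requested behavior: map non-matching values to null (None here),
    # or to an explicit replacement if one is given.
    return replace_missing_value_with


# Default (backwards compatible): non-matching file names are kept as is.
current = [regex_extract(f, VERSION_PATTERN) for f in FILES]

# Requested: non-matching file names become null.
proposed = [
    regex_extract(f, VERSION_PATTERN, retain_missing_value=False) for f in FILES
]

print(current)
print(proposed)
```

With the default, index.html and the_end_is_near_2.html end up as their own values in the derived dimension, mixed in with the version numbers. With the flag set to false they collapse into a single null value.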

@gianm
Contributor

gianm commented Dec 9, 2015

Usually we have things be false by default, so maybe we should name the property after whatever the opposite of retainMissingValue is. Or make a new extraction function and deprecate "regex"…

@vogievetsky
Contributor Author

Yep, I was aware of that when I suggested it, and honestly 'false' is a more useful default IMO, but it would break backwards compatibility. I have a crazy alt. idea: leave "regex" as it is and make a new "regexp" function. Or call it "match".

I think being consistent about retainMissingValue should be top priority.

@jon-wei
Contributor

jon-wei commented Dec 9, 2015

I'll take a look at this

@gianm
Contributor

gianm commented Jul 21, 2016

Fixed by #2075

@gianm gianm closed this as completed Jul 21, 2016