Support JsonPath functions in JsonPath expressions #11722
This reverts commit 5a5484b.
CI fails when testing the ORC extension, which might have something to do with this PR. I will check it.
Marked as WIP because a change to a binary ORC message file was lost. I have to redo the change.
Hi, @FrankChen021, we've run into this issue, and I was wondering if we can help get this PR over the finish line. Do you think you'd have the time to update the PR for the lost change to the ORC file? If not, we can take a look. At a quick glance, it seems like the failing assertion is: Assert.assertEquals("2", Iterables.getOnlyElement(row.getDimension("struct_list_struct_intlistLength"))); I'm wondering if this is a spurious check, and the assertion for |
@dkoepke There must have been a reason I added a new field to the ORC example file and that check, but I can't remember it now. I will take a look this weekend.
Thanks @FrankChen021 !
Fixes #11291
Description
This PR allows users to use JsonPath functions in JsonPath expressions during ingestion. Currently, JsonPath is used to extract values from inside a JSON object. However, JsonPath also supports a number of function expressions that Druid does not support yet. For example, '$.property_name.length()' returns the length of the JSON array 'property_name'. Such functions are useful in some cases.
The reason Druid does not support JsonPath functions is simple: the original code assumes that a JsonPath expression always returns a JSON object. However, when a JsonPath function is applied, the return value is a raw object instead of a JSON object.
So fixing the bug in the 'length()' function also brings support for the other functions. I have also brought support for these functions to the ORC, Avro, and Parquet data formats.
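As a rough sketch of how the 'length()' example above might appear in an ingestion spec, the fragment below uses a `flattenSpec` path field. The output column name `property_name_length` is hypothetical and chosen only for illustration; the expression is the one quoted in the description.

```json
{
  "flattenSpec": {
    "useFieldDiscovery": true,
    "fields": [
      {
        "type": "path",
        "name": "property_name_length",
        "expr": "$.property_name.length()"
      }
    ]
  }
}
```

With this change, a field like this would presumably yield the array length as a plain value rather than a JSON node, which is the case the original code did not handle.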
The following matrix shows the currently supported JsonPath functions and the corresponding data formats.
'append' and 'concat' are not fully supported, because I don't see a strong need to use them.
'keys()' is not supported because the JsonPath library currently used by Druid does not support it. I also don't see a strong need for this JsonPath function.
This PR has: