Support JsonPath functions in JsonPath expressions #11722

Merged
merged 14 commits into from
Dec 10, 2021

Conversation

FrankChen021
Member

@FrankChen021 FrankChen021 commented Sep 19, 2021

Fixes #11291

Description

This PR allows users to use JsonPath functions in JsonPath expressions during ingestion. Currently, JsonPath is used to extract values from inside a JSON object. However, JsonPath also supports a number of function expressions that Druid does not support yet. For example, '$.property_name.length()' returns the length of the JSON array 'property_name'. Such functions are useful in some cases.

The reason why Druid does not support JsonPath functions is simple: the original code assumes that a JsonPath expression returns a JSON object. However, if a JsonPath function is applied, the return value is a raw object instead of a JSON object.
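The type mismatch described above can be illustrated with a minimal sketch. It uses only the Java standard library, and the names `PathResultSketch` and `toRowValue` are hypothetical; they do not come from Druid or from the JsonPath library. Parsed JSON is modeled as `Map`/`List` purely for illustration. The point is only that the result of a path function such as `length()` is a plain `Integer` rather than a JSON node, so the conversion step needs a branch for raw values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PathResultSketch {
    // Hypothetical converter: turns whatever a path evaluation returned
    // into a value that can be stored in a row.
    static Object toRowValue(Object result) {
        if (result == null) {
            return null;
        }
        if (result instanceof List) {
            // JSON array (modeled as a List): convert each element recursively.
            List<Object> converted = new ArrayList<>();
            for (Object element : (List<?>) result) {
                converted.add(toRowValue(element));
            }
            return converted;
        }
        if (result instanceof Map) {
            // JSON object (modeled as a Map): convert each value recursively.
            Map<String, Object> converted = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) result).entrySet()) {
                converted.put(String.valueOf(entry.getKey()), toRowValue(entry.getValue()));
            }
            return converted;
        }
        // Raw value (Number, String, Boolean), for example the Integer
        // that a length() call would produce. Pass it through unchanged.
        return result;
    }

    public static void main(String[] args) {
        // A plain path result: an array of values.
        System.out.println(toRowValue(Arrays.asList(1, 2, 3)));
        // A function result: a raw Integer, not a JSON node.
        System.out.println(toRowValue(Integer.valueOf(3)));
    }
}
```

A converter that only accepted the `List` and `Map` branches would have nothing sensible to do with the `Integer`, which is the situation the paragraph above describes.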

So fixing the bug in the 'length()' function also brings support for the other functions. I have also brought support for these functions to the orc/avro/parquet data formats.
The following matrix shows the currently supported JsonPath functions and the corresponding data formats.

| Function | Description | Output type | json | orc | avro | parquet |
|----------|-------------|-------------|------|-----|------|---------|
| min() | Provides the min value of an array of numbers | Double | | | | |
| max() | Provides the max value of an array of numbers | Double | | | | |
| avg() | Provides the average value of an array of numbers | Double | | | | |
| stddev() | Provides the standard deviation value of an array of numbers | Double | | | | |
| length() | Provides the length of an array | Integer | | | | |
| sum() | Provides the sum value of an array of numbers | Double | | | | |
| concat(X) | Provides a concatenated version of the path output with a new item | like input | | | | |
| append(X) | add an item to the json path output array | like input | | | | |
| keys() | Provides the property keys (An alternative for terminal tilde ~) | Set | | | | |

  • append and concat are not fully supported, because I don't see a strong need for them.
  • keys() is not supported because the JsonPath library currently used by Druid does not support it. I also don't see a strong need for this JsonPath function.
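
As a rough illustration of what the numeric functions in the matrix compute and why their output types differ, here is a plain-Java sketch with no JsonPath dependency. The class `ArrayFunctionSketch` and its methods are hypothetical stand-ins that follow the descriptions in the table; they are not the library's implementation. `stddev()` is left out because the table does not say which variant is computed.

```java
import java.util.Arrays;
import java.util.List;

class ArrayFunctionSketch {
    // min(), max(), avg() and sum() are listed with output type Double,
    // so every input number is widened to double first.
    static Double min(List<? extends Number> xs) {
        return xs.stream().mapToDouble(Number::doubleValue).min().getAsDouble();
    }

    static Double max(List<? extends Number> xs) {
        return xs.stream().mapToDouble(Number::doubleValue).max().getAsDouble();
    }

    static Double avg(List<? extends Number> xs) {
        return xs.stream().mapToDouble(Number::doubleValue).average().getAsDouble();
    }

    static Double sum(List<? extends Number> xs) {
        return xs.stream().mapToDouble(Number::doubleValue).sum();
    }

    // length() is listed with output type Integer: it counts elements
    // and does not care whether they are numbers.
    static Integer length(List<?> xs) {
        return xs.size();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4);
        System.out.println("length = " + length(values));
        System.out.println("sum = " + sum(values) + ", avg = " + avg(values));
        System.out.println("min = " + min(values) + ", max = " + max(values));
    }
}
```

Note that even for an array of integers, the aggregate functions in this sketch return a `Double`, while `length()` returns an `Integer`; in both cases the result is a raw value rather than a JSON node.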

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@FrankChen021
Member Author

CI fails when testing the ORC extension, which might have something to do with this PR. I will check it.

@FrankChen021
Member Author

Marked as WIP because a change to a binary ORC message file was lost. I have to re-do the change.

@dkoepke
Contributor

dkoepke commented Nov 18, 2021

Hi, @FrankChen021, we've run into this issue, and I was wondering if we can help get this PR over the finish line. Do you think you'd have the time to update the PR for the lost change to the ORC file? If not, we can take a look.

At a quick glance, it seems like the failing assertion is:

Assert.assertEquals("2", Iterables.getOnlyElement(row.getDimension("struct_list_struct_intlistLength")));

I'm wondering if this is a spurious check, and the assertion for struct_list_struct_middleListLength just below it is the intended check. I don't see a JSON path spec that'd produce the struct_list_struct_intlistLength in the row, but I'm definitely missing some context here.

@FrankChen021
Member Author

@dkoepke There must have been some reason that I added a new field to the ORC example file and that check, but I can't remember it now. I will take a look this weekend.

@FrankChen021 FrankChen021 removed the WIP label Nov 23, 2021
@zachjsh zachjsh self-requested a review December 2, 2021 23:04
Contributor

@zachjsh zachjsh left a comment

Thanks @FrankChen021 !

@FrankChen021
Member Author

@zachjsh @jon-wei Thanks for your review.

@FrankChen021 FrankChen021 merged commit 58245b4 into apache:master Dec 10, 2021
@FrankChen021 FrankChen021 deleted the json_path branch December 10, 2021 02:53
@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022
@pagrawal10 pagrawal10 mentioned this pull request Feb 9, 2024
10 tasks
Development

Successfully merging this pull request may close these issues.

JSONPath length() function does not work in flattenSpec
5 participants