Adding jsonExtractScalar function to extract field from json object #4597

xiangfu0 · 2019-09-09T09:06:43Z

Currently, a JSON blob can be stored in a string field.
This UDF uses the JsonPath (https://github.com/json-path/JsonPath) DSL to read values from such a JSON string.

Adding Transform function: jsonExtractScalar

Function Syntax:

jsonExtractScalar(JSON_STRING_FIELD, JSON_PATH, OUTPUT_FORMAT)

Sample queries:

Select jsonExtractScalar(myJsonMapStr,'$.k1','STRING') from myTable where jsonExtractScalar(myJsonMapStr,'$.k1','STRING') = 'value-k1-0'

Select sum(jsonExtractScalar(complexMapStr,'$.k4.met','INT')) from myTable group by jsonExtractScalar(complexMapStr,'$.k1','STRING')
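To make the semantics of the sample queries concrete, here is a minimal Python sketch of what a call like jsonExtractScalar(JSON_STRING_FIELD, JSON_PATH, OUTPUT_FORMAT) computes for a single row. It is an illustration only, not the PR's Java implementation: the helper `json_extract_scalar`, the sample rows, and the restriction to simple dotted paths (instead of the full JsonPath DSL) are all assumptions made for this example.

```python
import json

# Only the two output formats shown in the sample queries are handled here.
_CASTS = {"INT": int, "STRING": str}


def json_extract_scalar(json_string, json_path, output_format):
    """Read one scalar from a JSON string via a dotted path and cast it.

    Hypothetical helper: supports only paths like '$.a.b', not full JsonPath.
    """
    if not json_path.startswith("$."):
        raise ValueError("this sketch only supports paths like '$.a.b'")
    node = json.loads(json_string)
    for key in json_path[2:].split("."):
        node = node[key]  # raises KeyError if the path does not match
    return _CASTS[output_format](node)


# Hypothetical rows standing in for the complexMapStr column of myTable.
rows = [
    '{"k1": "value-k1-0", "k4": {"met": 3}}',
    '{"k1": "value-k1-0", "k4": {"met": 4}}',
    '{"k1": "value-k1-1", "k4": {"met": 5}}',
]

# Mirrors the second sample query: sum of '$.k4.met' grouped by '$.k1'.
totals = {}
for row in rows:
    group = json_extract_scalar(row, "$.k1", "STRING")
    metric = json_extract_scalar(row, "$.k4.met", "INT")
    totals[group] = totals.get(group, 0) + metric

print(totals)
```

The same per-row extraction serves selection, filtering, and grouping; only the way the extracted value is consumed differs.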

Adding Transform function: jsonExtractKey, which extracts the paths matching a given pattern in a JSON document.
E.g. for the following JSON:

{ "k1": "v1", "k2": "v2", "k3": "v3" }

jsonExtractKey(jsonField, '$.*') returns a list of strings, one per matched JSON path. In the example above, the result is ["$['k1']", "$['k2']", "$['k3']"]
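The behaviour described above can be sketched in a few lines of Python. This is a hedged illustration, not the PR's code: the helper `json_extract_key` is hypothetical and handles only the '$.*' pattern from the example, whereas the real function relies on JsonPath for arbitrary patterns.

```python
import json


def json_extract_key(json_string, json_path):
    """List the top-level paths of a JSON object in bracket notation.

    Hypothetical helper: only the '$.*' pattern is supported in this sketch.
    """
    if json_path != "$.*":
        raise ValueError("this sketch only supports the '$.*' pattern")
    doc = json.loads(json_string)
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object at the root")
    # json.loads keeps the key order of the input document.
    return ["$['%s']" % key for key in doc]


paths = json_extract_key('{ "k1": "v1", "k2": "v2", "k3": "v3" }', "$.*")
print(paths)
```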

Normalized transform function name resolution so that both jsonExtractScalar and json_extract_scalar resolve to the same function.
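One way such a normalization could work is to canonicalize names before lookup. The sketch below assumes the rule "drop underscores, then lowercase"; the PR text does not state the exact rule, and `canonical_function_name`, `REGISTRY`, and `resolve` are hypothetical names used only for illustration.

```python
def canonical_function_name(name):
    """Assumed rule: remove underscores and lowercase the name."""
    return name.replace("_", "").lower()


# Hypothetical registry keyed by canonical name.
REGISTRY = {
    canonical_function_name("jsonExtractScalar"): "JsonExtractScalarTransformFunction",
}


def resolve(name):
    """Look up a function by any spelling that shares the canonical form."""
    return REGISTRY[canonical_function_name(name)]


print(resolve("jsonExtractScalar"))
print(resolve("json_extract_scalar"))
```

With this rule, the camel-case and snake-case spellings map to the same registry entry, so only one implementation needs to be registered.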

codecov-io · 2019-09-09T21:13:50Z

Codecov Report

Merging #4597 into master will increase coverage by 0.42%.
The diff coverage is 73.83%.

@@            Coverage Diff             @@
##           master    #4597      +/-   ##
==========================================
+ Coverage   66.08%   66.50%   +0.42%     
==========================================
  Files        1072     1077       +5     
  Lines       54668    54978     +310     
  Branches     8152     8213      +61     
==========================================
+ Hits        36125    36565     +440     
+ Misses      15895    15717     -178     
- Partials     2648     2696      +48

Impacted Files	Coverage Δ
...ava/org/apache/pinot/common/utils/SchemaUtils.java	`9.85% <0.00%> (ø)`
...ger/realtime/Server2ControllerSegmentUploader.java	`71.42% <ø> (ø)`
...edicate/BaseDictionaryBasedPredicateEvaluator.java	`54.16% <ø> (ø)`
...predicate/BaseRawValueBasedPredicateEvaluator.java	`87.87% <ø> (ø)`
...e/operator/dociditerators/MVScanDocIdIterator.java	`59.37% <14.28%> (-6.73%)`	⬇️
...not/core/data/recordtransformer/PinotDataType.java	`95.08% <33.33%> (-1.03%)`	⬇️
...m/function/JsonExtractScalarTransformFunction.java	`48.80% <48.80%> (ø)`
...rg/apache/pinot/broker/routing/RoutingManager.java	`80.91% <50.00%> (-0.24%)`	⬇️
...e/pinot/common/function/TransformFunctionType.java	`82.85% <50.00%> (-4.24%)`	⬇️
...indexsegment/generator/SegmentGeneratorConfig.java	`72.44% <50.00%> (+10.18%)`	⬆️
... and 103 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 44b0a91...fa1bfdc.

...c/main/java/org/apache/pinot/core/operator/transform/function/JsonPathTransformFunction.java

mcvsubbu · 2019-09-10T20:40:53Z

Can you also add documentation on how json columns are to be used? I suppose the json columns cannot be used in filters (yet) right?

xiangfu0 · 2019-09-13T12:29:23Z

I will add more documentation. The function can be used in selection, filtering, and group-by; please refer to the integration tests.

The current implementation requires the column to be a string column whose content is a JSON string (e.g. the Avro field needs to be of type String). I'm still investigating how to ingest a record/map type directly into the field.
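As a rough illustration of the string-column requirement, a nested record would have to be serialized to a JSON string before it is written to the field. The sketch below uses Python's standard json module; the record contents and the variable name `my_json_map_str` are assumptions for this example, not part of the PR.

```python
import json

# Hypothetical nested record that should end up in a string column.
record = {"k1": "value-k1-0", "k4": {"met": 7}}

# Serialize the map so the column holds a plain JSON string.
my_json_map_str = json.dumps(record)

print(my_json_map_str)
```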

kishoreg

Now that we have support for udf in filter predicates, this can be an amazing feature.
Can we change the function names to match Presto's?

xiangfu0 · 2020-05-03T02:51:16Z

> Now that we have support for udf in filter predicates, this can be an amazing feature.
> Can we change the function names to match with presto

https://prestodb.io/docs/current/functions/json.html

https://docs.aws.amazon.com/athena/latest/ug/querying-JSON.html

Renamed the function to json_extract_scalar, and also added a new function, json_extract_key, to extract the paths for a given JSON.

kishoreg · 2020-05-10T20:44:16Z

> Can you also add documentation on how json columns are to be used? I suppose the json columns cannot be used in filters (yet) right?

We can now, after we added support for expressions in the filter predicates.

1. Fix the compilation error introduced because of the merge of apache#4597 and apache#5240
2. Fix the bug of not loading the range index if both inverted index and range index exist

TODO: The range index triggers another severe issue of accessing closed DataBuffer which can cause JVM crash. Will address in a separate PR


xiangfu0 force-pushed the jsonpath_function branch 2 times, most recently from 8e4c8ba to af7bdc1 Compare September 9, 2019 20:30

mcvsubbu reviewed Sep 10, 2019


...c/main/java/org/apache/pinot/core/operator/transform/function/JsonPathTransformFunction.java Outdated

xiangfu0 force-pushed the jsonpath_function branch from af7bdc1 to 21e2810 Compare September 10, 2019 20:07

xiangfu0 force-pushed the jsonpath_function branch 2 times, most recently from f5c17ef to 0f94d9f Compare September 16, 2019 06:13

xiangfu0 changed the title ~~[WIP] Adding json_path function to extract field from json object~~ Adding json_path function to extract field from json object Sep 17, 2019

xiangfu0 requested review from Jackie-Jiang, kishoreg and mcvsubbu September 17, 2019 02:39

xiangfu0 force-pushed the jsonpath_function branch 5 times, most recently from 9e171e1 to 0644c0f Compare May 2, 2020 13:32

kishoreg reviewed May 2, 2020


xiangfu0 force-pushed the jsonpath_function branch 3 times, most recently from 3fca70e to 9ed2e19 Compare May 3, 2020 02:46

xiangfu0 changed the title ~~Adding json_path function to extract field from json object~~ Adding jsonExtractScalar function to extract field from json object May 3, 2020

xiangfu0 requested review from kishoreg and npawar May 3, 2020 02:55

xiangfu0 force-pushed the jsonpath_function branch from 9ed2e19 to fa1bfdc Compare May 10, 2020 12:16

kishoreg approved these changes May 10, 2020


xiangfu0 added 4 commits May 12, 2020 12:33

Adding json_path function to extract field from json object

9ac6c4c

Adding integration test

56ec6ce

Adding more tests

e03d8e7

Adding jsonPathKey Function

23a18e6

xiangfu0 force-pushed the jsonpath_function branch from 8a16890 to 78ae585 Compare May 12, 2020 19:33

Adding toJsonMapStr function

0004249

xiangfu0 force-pushed the jsonpath_function branch from 78ae585 to 0004249 Compare May 12, 2020 23:56

xiangfu0 merged commit a6fe685 into master May 13, 2020

xiangfu0 deleted the jsonpath_function branch May 13, 2020 17:37

Jackie-Jiang mentioned this pull request May 14, 2020

Fix the compilation error and bug introduced in Range Index #5389

Merged
