DRILL-7683: Add "message parsing" to new JSON loader#2045
DRILL-7683: Add "message parsing" to new JSON loader#2045paul-rogers wants to merge 2 commits intoapache:masterfrom
Conversation
arina-ielchiieva
left a comment
There was a problem hiding this comment.
Overall looks good, just two really minor comments.
@paul-rogers you can go ahead and squash right away after the fix, no need in additional CR commit.
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/loader/JsonLoaderImpl.java
Outdated
Show resolved
Hide resolved
...va-exec/src/test/java/org/apache/drill/exec/store/easy/json/parser/TestJsonParserBasics.java
Outdated
Show resolved
Hide resolved
paul-rogers
left a comment
There was a problem hiding this comment.
@arina-ielchiieva, thanks much for the review. Addressed your comments. Also added a couple of small fixes. We just added this code to the HTTP plugin PR, where it seems to work fine. (Not yet reflected in that PR because I'm working out exactly how I push to Charles' branch...)
|
This PR also includes support for the |
|
The Java 1.8 build failed in |
|
@paul-rogers |
|
+1, LGTM. Please squash the commits. @paul-rogers let's for the future try to address comments without adding new changes, here is were just a couple of files, in other PRs were much more, a little hard for the code review to review new stuff with each commit :) |
|
@paul-rogers In any event, I don't want to hold up this PR, but would you mind please writing up a paragraph documenting this functionality? |
|
@paul-rogers |
|
@paul-rogers could you please address @cgivre comment about documentation? |
|
@arina-ielchiieva, thanks for the reminder. Updated the description. Basically, this is the first step toward providing an easier-to-use solution in the context of the REST API. Wanted to get this first step committed so we can try things out and figure out what we need in the next level, which is what developers will actually use. |
|
Build apparently failed during JDBC unit tests, perhaps due to timeout? Unlikely to be caused by this PR since nothing calls any of this code yet. @arina-ielchiieva, anything else I should do for this one? Would be helpful if we can first merge this one, then we can rebase the REST API PR on top of the updated master (since that PR uses code from this PR.) |
|
@paul-rogers please squash the commits and see if all tests pass. If not, try to return the jobs, ideally all tests should pass. |
Adds the ability to parse "extra" JSON around the data payload, as often needed for a REST API.
|
@arina-ielchiieva, thanks for the review. Squashed commits. Builds cleanly in my environment. |
|
@arina-ielchiieva, @cgivre, since this is taken a while to review; went ahead and added the simplified form of the message parsing which we'll use in the REST storage plugin. If we can commit this one before the REST plugin, I'll go ahead and update that PR to use this feature. |
|
Changes look ok, but I don't think we should do this since PR has been already reviewed. +1, LGTM. |
Adds the ability to parse "extra" JSON around the data payload, as often needed for a REST API. closes apache#2045
DRILL-7683: Add "message parsing" to new JSON loader
Description
Worked on a project that uses the new JSON loader to parse a REST response that includes a set of "wrapper" fields around the JSON payload. Example:
To solve this cleanly, added the ability to specify a "message parser" to consume JSON tokens up to the start of the data. This parser can be written as needed for each different data source.
When working on the REST data source, it became clear we need a no-code way to handle the same issue. So, extended the message parser to handle the simple case, a path to the data. In the above, the path would be just
results. The path can contain any number of slash-separated elements:response/body/rowsfor example.Since this change adds two more parameters to the JSON structure parser, added builders to gather the needed parameters rather than making the constructor even larger.
Note that, aside from the private plugin mentioned above, no other code uses the JSON loader yet.
Developer Documentation
This PR is part of the "new" V2 EVF-based JSON parser. An example of usage appears in PR #1892 (REST storage plugin.) To use the simple path-based form of message parsing, add the following option to the JSON parser builder:
The tail element should be the one that holds an array of JSON records.
To add custom message parsing (to check return status, say), use a different option of the builder:
Then implement the
MessageParserclass to do the parsing. The present version works at the level of JSON tokens: you must use the provided "tokenizer" to read each token and do the right thing.Since working at the token level is tedious, the goal is to provide a read-made parser that takes a path to the data, such as "response.data" and skips all fields except those in the path.
The goal here is to get the mechanism added to the JSON parser so we can then try it in the REST plugin and work out exactly what we need in that higher-level parser level.
User Documentation
N/A
Testing
Added unit tests. Reran all existing tests.