Skip to content

DRILL-7683: Add "message parsing" to new JSON loader#2045

Closed
paul-rogers wants to merge 2 commits intoapache:masterfrom
paul-rogers:DRILL-7683
Closed

DRILL-7683: Add "message parsing" to new JSON loader#2045
paul-rogers wants to merge 2 commits intoapache:masterfrom
paul-rogers:DRILL-7683

Conversation

@paul-rogers
Copy link
Copy Markdown
Contributor

@paul-rogers paul-rogers commented Mar 31, 2020

DRILL-7683: Add "message parsing" to new JSON loader

Description

Worked on a project that uses the new JSON loader to parse a REST response that includes a set of "wrapper" fields around the JSON payload. Example:

{ "status": "ok", "results: [ data here ]}

To solve this cleanly, added the ability to specify a "message parser" to consume JSON tokens up to the start of the data. This parser can be written as needed for each different data source.

When working on the REST data source, it became clear we need a no-code way to handle the same issue. So, extended the message parser to handle the simple case, a path to the data. In the above, the path would be just results. The path can contain any number of slash-separated elements: response/body/rows for example.

Since this change adds two more parameters to the JSON structure parser, added builders to gather the needed parameters rather than making the constructor even larger.

Note that, aside from the private plugin mentioned above, no other code uses the JSON loader yet.

Developer Documentation

This PR is part of the "new" V2 EVF-based JSON parser. An example of usage appears in PR #1892 (REST storage plugin.) To use the simple path-based form of message parsing, add the following option to the JSON parser builder:

    .dataPath("path/to/data")

The tail element should be the one that holds an array of JSON records.

To add custom message parsing (to check return status, say), use a different option of the builder:

      .messageParser(parser)

Then implement the MessageParser class to do the parsing. The present version works at the level of JSON tokens: you must use the provided "tokenizer" to read each token and do the right thing.

Since working at the token level is tedious, the goal is to provide a read-made parser that takes a path to the data, such as "response.data" and skips all fields except those in the path.

The goal here is to get the mechanism added to the JSON parser so we can then try it in the REST plugin and work out exactly what we need in that higher-level parser level.

User Documentation

N/A

Testing

Added unit tests. Reran all existing tests.

Copy link
Copy Markdown
Member

@arina-ielchiieva arina-ielchiieva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, just two really minor comments.
@paul-rogers you can go ahead and squash right away after the fix, no need in additional CR commit.

Copy link
Copy Markdown
Contributor Author

@paul-rogers paul-rogers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arina-ielchiieva, thanks much for the review. Addressed your comments. Also added a couple of small fixes. We just added this code to the HTTP plugin PR, where it seems to work fine. (Not yet reflected in that PR because I'm working out exactly how I push to Charles' branch...)

@paul-rogers
Copy link
Copy Markdown
Contributor Author

This PR also includes support for the ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER added to the "old" JSON parser in #1663.

@paul-rogers
Copy link
Copy Markdown
Contributor Author

The Java 1.8 build failed in TestAggregateFunctions, however those work in my own build. Any idea what might be wrong?

@cgivre
Copy link
Copy Markdown
Contributor

cgivre commented Apr 1, 2020

@paul-rogers
I have an unrelated PR (#2040) that was also failing in the same spot. Not sure why.

@arina-ielchiieva
Copy link
Copy Markdown
Member

arina-ielchiieva commented Apr 1, 2020

+1, LGTM. Please squash the commits.

@paul-rogers let's for the future try to address comments without adding new changes, here is were just a couple of files, in other PRs were much more, a little hard for the code review to review new stuff with each commit :)

@cgivre
Copy link
Copy Markdown
Contributor

cgivre commented Apr 1, 2020

@paul-rogers
Thank you very much for submitting this. It's funny, the original REST plugin did this but not very well. This is a really useful functionality.

In any event, I don't want to hold up this PR, but would you mind please writing up a paragraph documenting this functionality?

@cgivre
Copy link
Copy Markdown
Contributor

cgivre commented Apr 3, 2020

@paul-rogers
Regarding the Java 1.8 build failing in TestAggregateFunctions, this problem seemed to go away after rebaseing. Not sure what it was, but the time_bucket PR and the REST plugin both now pass the tests.

@arina-ielchiieva
Copy link
Copy Markdown
Member

@paul-rogers could you please address @cgivre comment about documentation?

@paul-rogers
Copy link
Copy Markdown
Contributor Author

@arina-ielchiieva, thanks for the reminder. Updated the description. Basically, this is the first step toward providing an easier-to-use solution in the context of the REST API. Wanted to get this first step committed so we can try things out and figure out what we need in the next level, which is what developers will actually use.

@paul-rogers
Copy link
Copy Markdown
Contributor Author

Build apparently failed during JDBC unit tests, perhaps due to timeout? Unlikely to be caused by this PR since nothing calls any of this code yet.

@arina-ielchiieva, anything else I should do for this one? Would be helpful if we can first merge this one, then we can rebase the REST API PR on top of the updated master (since that PR uses code from this PR.)

@arina-ielchiieva
Copy link
Copy Markdown
Member

@paul-rogers please squash the commits and see if all tests pass. If not, try to return the jobs, ideally all tests should pass.

Adds the ability to parse "extra" JSON around the data payload,
as often needed for a REST API.
@paul-rogers
Copy link
Copy Markdown
Contributor Author

@arina-ielchiieva, thanks for the review. Squashed commits. Builds cleanly in my environment.

@paul-rogers
Copy link
Copy Markdown
Contributor Author

@arina-ielchiieva, @cgivre, since this is taken a while to review; went ahead and added the simplified form of the message parsing which we'll use in the REST storage plugin. If we can commit this one before the REST plugin, I'll go ahead and update that PR to use this feature.

@arina-ielchiieva
Copy link
Copy Markdown
Member

Changes look ok, but I don't think we should do this since PR has been already reviewed.
I will send follow-up email with the proposal how we can speed up merge process.

+1, LGTM.

@asfgit asfgit closed this in 912698b Apr 11, 2020
@paul-rogers paul-rogers reopened this Apr 18, 2020
cgivre pushed a commit to cgivre/drill that referenced this pull request Mar 25, 2026
Adds the ability to parse "extra" JSON around the data payload,
as often needed for a REST API.

closes apache#2045
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants