DRILL-8204: Allow Provided Schema for HTTP Plugin in JSON Mode #2526
Conversation
We should also support specifying the schema as a table function parameter, and resolve the target schema when both are specified, so that users can customize data types without needing access to the storage configs.
This comment is similar to the last one from @vvysotskyi except that I was experimenting with the metastore. While I cannot see that
Note that the above command works for local CSV but I got a Calcite error for a local JSON file. I did not test it with the HTTP plugin. Some information about provided schema priority, should it be of interest: https://drill.apache.org/docs/using-drill-metastore/#schema-priority.
Metastore with JSON should also work fine, here is the unit test that checks it:
@vvysotskyi I think that the HTTP plugin makes use of the same readers as the easy format plugins (CSV, JSON, XML)? Does that mean that metastore might already work with HTTP, or are there likely to be pieces missing?
@vvysotskyi Thanks for the review. I added the logic to the I created DRILL-8205 to address this. I left the logic and a unit test so once we have a fix for DRILL-8205 it should all work as expected.
@vvysotskyi Thanks for your comments and assistance! I got this to work and the HTTP plugin now supports inline schema!
Thanks for making changes, +1
…e#2526)
* Initial commit
* Map working
* WIP
* Added builder
* Lists in maps working
* Add documentation
* Cleaned up UT
* Final Revision
* Fix checkstyle
* Minor tweak
* removed extra test file
* Removed unused import
* Added inline schema support
* Addressed review comments
* Removed unused import
* Removed json string
* Final Revisions
* Fixed unit test
DRILL-8204: Allow Provided Schema for HTTP Plugin in JSON Mode
Description
See below. 👇
Documentation
Schema Provisioning
One of the challenges of querying APIs is inconsistent data. Drill allows you to provide a schema for individual endpoints. You can do this in one of three ways:
Schema provisioning currently supports the complex types Array and Map at any nesting level.
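To make "at any nesting level" concrete, here is a minimal Python sketch of how a provided schema can be applied recursively to a JSON record. It is an illustration only, not Drill's implementation: the schema notation (`"map"`, `"array"`, and the scalar names as dictionary keys) and the `coerce` helper are hypothetical and invented for this example.

```python
from typing import Any

# Hypothetical schema notation for this sketch only (NOT Drill's format):
#   "bigint" / "double" / "varchar"   -> scalar types
#   {"map": {name: schema, ...}}      -> map with named child fields
#   {"array": schema}                 -> array whose elements share one schema
SCALARS = {"bigint": int, "double": float, "varchar": str}


def coerce(value: Any, schema: Any) -> Any:
    """Recursively convert a parsed JSON value to the shape the schema declares."""
    if value is None:
        return None  # missing or null fields stay null
    if isinstance(schema, str):
        return SCALARS[schema](value)
    if "array" in schema:
        return [coerce(item, schema["array"]) for item in value]
    if "map" in schema:
        return {
            name: coerce(value.get(name), child)
            for name, child in schema["map"].items()
        }
    raise ValueError(f"unknown schema node: {schema!r}")


# A map containing a scalar, an array of scalars, and an array of maps.
schema = {
    "map": {
        "id": "bigint",
        "tags": {"array": "varchar"},
        "points": {"array": {"map": {"x": "double", "y": "double"}}},
    }
}

# The record mixes types: "id" arrives as a string, one tag as a number.
record = {"id": "7", "tags": ["a", 2], "points": [{"x": 1, "y": "2.5"}]}
result = coerce(record, schema)
print(result)
```

Because each array and map node simply recurses into its children, the same function handles lists inside maps and maps inside lists without special cases.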
Example Schema Provisioning:
Dealing With Inconsistent Schemas
One of the major challenges of working with JSON data is an inconsistent schema. Drill has a `UNION` data type, which is marked as experimental. At the time of writing, the HTTP plugin does not support `UNION`; however, supplying a schema can solve many of these issues.
JSON Mode
Drill offers the option of reading all JSON values as strings. While this can complicate downstream analytics, it can also be a more memory-efficient way of reading data with an inconsistent schema. Unfortunately, at the time of writing, JSON mode is only available with a provided schema. However, future work will allow this mode to be enabled for any JSON data.
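The idea of reading a value as a string can be sketched in Python as follows. This is a conceptual illustration only and does not claim to reproduce Drill's exact output: the helper `read_field_as_json_string` and the sample records are hypothetical. A field whose type varies from record to record is kept as its JSON text, so every row ends up with the same string type.

```python
import json


def read_field_as_json_string(record: dict, field: str) -> dict:
    """Return a copy of the record with one field replaced by its JSON text."""
    out = dict(record)
    if field in out:
        # Compact separators keep the serialized text free of extra spaces.
        out[field] = json.dumps(out[field], separators=(",", ":"))
    return out


# "payload" is a number, an object, and a string across three records.
records = [
    {"id": 1, "payload": 42},
    {"id": 2, "payload": {"a": [1, 2]}},
    {"id": 3, "payload": "text"},
]

rows = [read_field_as_json_string(r, "payload") for r in records]
for row in rows:
    print(row["id"], type(row["payload"]).__name__, row["payload"])
```

The trade-off matches the one described above: the column has a single consistent type, but any structure inside it has to be parsed again downstream.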
Enabling JSON Mode:
You can enable JSON mode by adding the `drill.json-mode` property with a value of `json` to a field.
Testing
Added unit tests.