-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Source File: fix schema generation for json files containing an array #16772
Source File: fix schema generation for json files containing an array #16772
Conversation
/test connector=connectors/source-file
Build PassedTest summary info:
|
if "items" in result: | ||
# this means we have a json list e.g. [{...}, {...}] | ||
# but need to emit schema of an inside dict | ||
result = result["items"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, but is there any thing to do / other validation if the json file is an array of anything other than object?
For example, if the json file is just ["apple", "orange", "kiwi"]
this is probably invalid. (maybe a validation in check
should be added here, but feel free to make a different issue for this and tackle it separately)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems to be valid
>>> builder = genson.SchemaBuilder()
>>> builder.add_object(['a', 'b', 'c'])
>>> builder.to_schema()["items"]
{'type': 'string'}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the schema is technically valid for this, but what I mean is this will also cause issues with normalization since we expect each stream to have an object with properties as its schema (in this case it will just be a string). We probably shouldn't accept json files that aren't objects or an array of objects. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I see now
sure it makes sense, I will create a separate issue to work on this problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/publish connector=connectors/source-file
if you have connectors that successfully published but failed definition generation, follow step 4 here |
/publish connector=connectors/source-file-secure
if you have connectors that successfully published but failed definition generation, follow step 4 here |
…airbytehq#16772) * airbytehq#547 oncall Source File: fix schema generation for json files containing arrays * source file: upda changelog * airbytehq#547 oncall: source file - upgrade source-file-secure * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
…airbytehq#16772) * airbytehq#547 oncall Source File: fix schema generation for json files containing arrays * source file: upda changelog * airbytehq#547 oncall: source file - upgrade source-file-secure * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
What
https://github.com/airbytehq/oncall/issues/547
When a json file schema is discovered, it returns a schema of an array if it is a root element of a file.
How
Instead, return a schema of an inside dict elements