
Source File: Discovering a JSON file schema fails #2102

Closed
sherifnada opened this issue Feb 17, 2021 · 0 comments · Fixed by #2118
Labels
type/bug Something isn't working


@sherifnada
Contributor

sherifnada commented Feb 17, 2021

Expected Behavior

Report from a user:

I have a table exported from BigQuery to GCS as JSON files via an EXPORT DATA statement. I believe the format is called newline-delimited JSON (NDJSON).
I'm trying to set up a file source and a Postgres destination. However, when setting up the connection, it fails with "Failed to fetch schema. Please try again".

These are the reader_options:
{"lines":true}

I've also tried adding chunksize=1 and orient=records as additional options, with no luck. The reader option above is enough to read the JSON in pandas.
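The difference matters because a newline-delimited file is not one JSON document: it is one complete JSON document per line. The minimal stdlib sketch below contrasts parsing the whole text at once (what the `json.load(fp)` call in the log does) with parsing one document per line (what `lines=True` asks pandas to do). `parse_whole`, `parse_lines` and the two-record `SAMPLE` are hypothetical illustrations, not connector code.

```python
import json
from typing import Any, List


def parse_whole(text: str) -> Any:
    """Parse text as a single JSON document, the way json.load does."""
    return json.loads(text)


def parse_lines(text: str) -> List[Any]:
    """Parse newline-delimited JSON: one JSON document per non-blank line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical two-record export; the field names are made up.
SAMPLE = '{"id": 1, "title": "a"}\n{"id": 2, "title": "b"}\n'

if __name__ == "__main__":
    print(parse_lines(SAMPLE))  # two dicts, one per line
    try:
        parse_whole(SAMPLE)
    except json.JSONDecodeError as err:
        # The decoder finishes the first object (23 chars), skips the
        # newline, and reports everything after it as "Extra data".
        print(err)  # prints: Extra data: line 2 column 1 (char 24)
```

The `char` offset in the error is simply the length of the first record plus its newline, so `char 59` in the log points at the start of the second record of the export.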

This is the log:

2021-02-17 12:50:41 INFO (/tmp/workspace/41/0) WorkerRun(call):58 - Executing worker wrapper...
2021-02-17 12:50:41 INFO (/tmp/workspace/41/0) LineGobbler(voidCall):69 - Checking if airbyte/source-file:0.1.9 exists...
2021-02-17 12:50:41 DEBUG (/tmp/workspace/41/0) DockerProcessBuilderFactory(create):104 - Preparing command: docker run --rm -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/41/0 --network host airbyte/source-file:0.1.9 discover --config tap_config.json
2021-02-17 12:50:41 INFO (/tmp/workspace/41/0) LineGobbler(voidCall):69 - airbyte/source-file:0.1.9 was found locally.
2021-02-17 12:50:43 INFO (/tmp/workspace/41/0) DefaultAirbyteStreamFactory(internalLog):110 - Discovering schema of article_scoring at gs://mz-dev-recommendations/2021-02-17T09:28:03.184Z_articles_000000000000.json...
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) DefaultAirbyteStreamFactory(internalLog):108 - Failed to discover schemas of article_scoring at gs://mz-dev-recommendations/2021-02-17T09:28:03.184Z_articles_000000000000.json: JSONDecodeError('Extra data: line 2 column 1 (char 59)')
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/source_file/source.py", line 121, in discover
streams = list(client.streams)
File "/usr/local/lib/python3.7/site-packages/source_file/client.py", line 339, in streams
"properties": self._stream_properties(),
File "/usr/local/lib/python3.7/site-packages/source_file/client.py", line 323, in _stream_properties
return self.load_nested_json_schema(fp)
File "/usr/local/lib/python3.7/site-packages/source_file/client.py", line 236, in load_nested_json_schema
builder.add_object(json.load(fp))
File "/usr/local/lib/python3.7/json/__init__.py", line 296, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/usr/local/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.7/json/decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 59)
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - Traceback (most recent call last):
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/bin/base-python", line 8, in <module>
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - sys.exit(main())
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/site-packages/base_python/entrypoint.py", line 135, in main
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - launch(source, sys.argv[1:])
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/site-packages/base_python/entrypoint.py", line 120, in launch
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - AirbyteEntrypoint(source).start(args)
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/site-packages/base_python/entrypoint.py", line 104, in start
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - catalog = self.source.discover(logger, config)
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/site-packages/source_file/source.py", line 125, in discover
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - raise err
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/site-packages/source_file/source.py", line 121, in discover
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - streams = list(client.streams)
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/site-packages/source_file/client.py", line 339, in streams
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - "properties": self._stream_properties(),
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/site-packages/source_file/client.py", line 323, in _stream_properties
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - return self.load_nested_json_schema(fp)
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/site-packages/source_file/client.py", line 236, in load_nested_json_schema
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - builder.add_object(json.load(fp))
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/json/__init__.py", line 296, in load
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/json/__init__.py", line 348, in loads
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - return _default_decoder.decode(s)
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/json/decoder.py", line 340, in decode
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - raise JSONDecodeError("Extra data", s, end)
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 59)
2021-02-17 12:50:44 DEBUG (/tmp/workspace/41/0) DefaultDiscoverCatalogWorker(run):99 - Discover job subprocess finished with exit code 1
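One way schema discovery could honor `{"lines": true}` is to read the file record by record and merge what each record contributes, instead of calling `json.load` once on the whole file. This is a hypothetical sketch: `discover_properties`, `iter_records` and `json_type` are illustrative names, not the connector's API, and it only unions the types of top-level properties rather than building a full nested schema. The `builder.add_object(...)` call in the traceback suggests the real code uses a schema builder that could likewise be fed one record at a time, but that is an assumption.

```python
import io
import json
from typing import Any, Dict, Iterator, List, Set, TextIO


def json_type(value: Any) -> str:
    """Map a decoded JSON value to its JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def iter_records(fp: TextIO, lines: bool) -> Iterator[Any]:
    """Yield records from fp, honoring a pandas-style `lines` flag."""
    if lines:
        # Newline-delimited JSON: each non-blank line is its own document.
        for raw in fp:
            if raw.strip():
                yield json.loads(raw)
        return
    # Otherwise the whole file must be exactly one JSON document; on NDJSON
    # this raises JSONDecodeError("Extra data"), as in the log above.
    document = json.load(fp)
    if isinstance(document, list):
        yield from document  # a top-level array is a list of records
    else:
        yield document


def discover_properties(fp: TextIO, reader_options: Dict[str, Any]) -> Dict[str, Dict[str, List[str]]]:
    """Union the JSON types seen for each top-level key across all records."""
    seen: Dict[str, Set[str]] = {}
    lines = bool(reader_options.get("lines", False))
    for record in iter_records(fp, lines):
        for key, value in record.items():  # assumes each record is an object
            seen.setdefault(key, set()).add(json_type(value))
    return {key: {"type": sorted(types)} for key, types in seen.items()}


if __name__ == "__main__":
    sample = io.StringIO('{"id": 1, "title": "a"}\n{"id": 2, "title": null}\n')
    print(discover_properties(sample, {"lines": True}))
    # prints {'id': {'type': ['integer']}, 'title': {'type': ['null', 'string']}}
```

Merging per record also means a property that is `null` in one row and a string in another ends up with both types, which a single-object inference would miss.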

Steps to Reproduce

TODO

Severity of the bug for you

High - blocking for using airbyte

Airbyte Version

source-file 0.1.9

Additional context

Slack conversation

@sherifnada sherifnada added the type/bug Something isn't working label Feb 17, 2021
@sherifnada sherifnada self-assigned this Feb 23, 2021