I have a table exported from BigQuery to GCS as JSON files using the EXPORT DATA statement. I believe this format is called newline-delimited JSON (NDJSON).
I'm trying to set up a File source with a Postgres destination. However, setting up the connection fails with "Failed to fetch schema. Please try again".
These are the reader_options:
{"lines":true}
I've also tried adding chunksize=1 and orient=records as options, with no luck. The reader option above is enough for pandas to read the JSON file.
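For reference, here is a minimal stdlib-only sketch of what `"lines": true` implies. The file holds one JSON object per line, so each line has to be decoded on its own. Decoding the whole file as one document raises the same `Extra data` error that appears in the log. The two sample records and the helper names `parse_whole` / `parse_lines` are made up for illustration.

```python
import json

# Hypothetical two-record NDJSON payload (one JSON object per line).
ndjson_text = '{"id": 1, "score": 0.5}\n{"id": 2, "score": 0.7}\n'


def parse_whole(text):
    """Decode the text as a single JSON document, like a plain json.load(fp)."""
    return json.loads(text)


def parse_lines(text):
    """Decode one JSON document per non-empty line, which is what lines=True means."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


print(parse_lines(ndjson_text))

try:
    parse_whole(ndjson_text)
except json.JSONDecodeError as err:
    # The first object parses fine; the decoder then rejects the second line.
    print(err)
```

The per-line parse returns both records, while the whole-file parse stops with `Extra data: line 2 column 1`, the same shape as the error in the log.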
This is the log:
2021-02-17 12:50:41 INFO (/tmp/workspace/41/0) WorkerRun(call):58 - Executing worker wrapper...
2021-02-17 12:50:41 INFO (/tmp/workspace/41/0) LineGobbler(voidCall):69 - Checking if airbyte/source-file:0.1.9 exists...
2021-02-17 12:50:41 DEBUG (/tmp/workspace/41/0) DockerProcessBuilderFactory(create):104 - Preparing command: docker run --rm -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/41/0 --network host airbyte/source-file:0.1.9 discover --config tap_config.json
2021-02-17 12:50:41 INFO (/tmp/workspace/41/0) LineGobbler(voidCall):69 - airbyte/source-file:0.1.9 was found locally.
2021-02-17 12:50:43 INFO (/tmp/workspace/41/0) DefaultAirbyteStreamFactory(internalLog):110 - Discovering schema of article_scoring at gs://mz-dev-recommendations/2021-02-17T09:28:03.184Z_articles_000000000000.json...
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) DefaultAirbyteStreamFactory(internalLog):108 - Failed to discover schemas of article_scoring at gs://mz-dev-recommendations/2021-02-17T09:28:03.184Z_articles_000000000000.json: JSONDecodeError('Extra data: line 2 column 1 (char 59)')
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/source_file/source.py", line 121, in discover
streams = list(client.streams)
File "/usr/local/lib/python3.7/site-packages/source_file/client.py", line 339, in streams
"properties": self._stream_properties(),
File "/usr/local/lib/python3.7/site-packages/source_file/client.py", line 323, in _stream_properties
return self.load_nested_json_schema(fp)
File "/usr/local/lib/python3.7/site-packages/source_file/client.py", line 236, in load_nested_json_schema
builder.add_object(json.load(fp))
File "/usr/local/lib/python3.7/json/__init__.py", line 296, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/usr/local/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.7/json/decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 59)
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - Traceback (most recent call last):
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/bin/base-python", line 8, in <module>
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - sys.exit(main())
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/site-packages/base_python/entrypoint.py", line 135, in main
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - launch(source, sys.argv[1:])
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/site-packages/base_python/entrypoint.py", line 120, in launch
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - AirbyteEntrypoint(source).start(args)
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/site-packages/base_python/entrypoint.py", line 104, in start
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - catalog = self.source.discover(logger, config)
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/site-packages/source_file/source.py", line 125, in discover
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - raise err
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/site-packages/source_file/source.py", line 121, in discover
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - streams = list(client.streams)
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/site-packages/source_file/client.py", line 339, in streams
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - "properties": self._stream_properties(),
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/site-packages/source_file/client.py", line 323, in _stream_properties
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - return self.load_nested_json_schema(fp)
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/site-packages/source_file/client.py", line 236, in load_nested_json_schema
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - builder.add_object(json.load(fp))
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/json/__init__.py", line 296, in load
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/json/__init__.py", line 348, in loads
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - return _default_decoder.decode(s)
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - File "/usr/local/lib/python3.7/json/decoder.py", line 340, in decode
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - raise JSONDecodeError("Extra data", s, end)
2021-02-17 12:50:43 ERROR (/tmp/workspace/41/0) LineGobbler(voidCall):69 - json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 59)
2021-02-17 12:50:44 DEBUG (/tmp/workspace/41/0) DefaultDiscoverCatalogWorker(run):99 - Discover job subprocess finished with exit code 1
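The traceback points at `load_nested_json_schema`, which calls `builder.add_object(json.load(fp))` on the whole file. A sketch of one possible fix is to feed each NDJSON line to the builder separately when the lines option is set. The builder below (`MiniSchemaBuilder`) is a hypothetical stand-in, not the connector's actual schema builder. It only records the JSON types seen for each top-level key and assumes every line is a JSON object.

```python
import io
import json


def json_type(value):
    """Map a decoded Python value to its JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


class MiniSchemaBuilder:
    """Hypothetical builder: merges the types seen per top-level key across records."""

    def __init__(self):
        self.properties = {}

    def add_object(self, obj):
        for key, value in obj.items():
            self.properties.setdefault(key, set()).add(json_type(value))

    def to_schema(self):
        props = {}
        for key, types in self.properties.items():
            ordered = sorted(types)
            props[key] = {"type": ordered[0] if len(ordered) == 1 else ordered}
        return {"type": "object", "properties": props}


def load_nested_json_schema(fp, lines=False):
    """Build a schema from a file object; with lines=True, add every NDJSON line separately."""
    builder = MiniSchemaBuilder()
    if lines:
        for line in fp:
            if line.strip():  # skip blank lines, e.g. a trailing newline
                builder.add_object(json.loads(line))
    else:
        builder.add_object(json.load(fp))  # single-document path, as in the traceback
    return builder.to_schema()


sample = io.StringIO('{"id": 1, "title": "a"}\n{"id": 2, "title": null}\n')
print(load_nested_json_schema(sample, lines=True))
```

Merging every line (rather than only the first) lets fields that are `null` in some records pick up both types. Sampling a bounded number of lines would be a reasonable alternative for large exports.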
Expected Behavior
Report from a user: schema discovery should succeed for this newline-delimited JSON file, since pandas reads it with the same {"lines":true} reader option. The traceback above shows that the failure comes from load_nested_json_schema calling json.load(fp) on the whole file, which rejects everything after the first record ("Extra data: line 2 column 1").
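Steps to Reproduce
1. Export a BigQuery table to GCS with EXPORT DATA as JSON (newline-delimited).
2. Create a File source pointing at the exported gs:// file, with JSON format and reader_options {"lines":true}.
3. Create a Postgres destination and set up a connection between the two.
4. Schema discovery fails with "Failed to fetch schema. Please try again".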
Severity of the bug for you
High - blocking for using airbyte
Airbyte Version
source-file 0.1.9
Additional context
Slack conversation