[BEAM-14273] Add integration tests for BQ JSON type for Python BigQueryIO connector #17431

Merged · 17 commits · May 3, 2022
17 changes: 17 additions & 0 deletions sdks/python/apache_beam/io/gcp/bigquery.py
@@ -2190,6 +2190,23 @@ def expand(self, pcoll):
          'A schema must be provided when writing to BigQuery using '
          'Avro based file loads')

    if self.schema and type(self.schema) is dict:

      def find_in_nested_dict(schema):
        for field in schema['fields']:
          if field['type'] == 'JSON':
            raise ValueError(
                'Found JSON type in table schema. JSON data '
                'insertion is currently not supported with '
                'FILE_LOADS write method. This is supported with '
                'STREAMING_INSERTS. For more information: '
                'https://cloud.google.com/bigquery/docs/reference/'
                'standard-sql/json-data#ingest_json_data')
          elif field['type'] == 'STRUCT':
            find_in_nested_dict(field)
Member:
So this is not supported for either AVRO or JSON file loads, is that correct? Can you perhaps post a reference to the BQ docs in the ValueError? That would make it easiest for customers to check when this issue surfaces.

Contributor (author):
That is correct. I can add the BQ JSON type doc#batchloads link to the message. Should I also mention "because Beam doesn't support CSV format"?

Member:
hmmm... maybe not : P - wdyt?

Contributor (author):
I don't like the sound of it either, but I suggested it because the link doesn't explain that nuance (in case the user wants to know why it's not supported yet).

We could just mention that it is supported by the other write methods and add the link for more details?

Member:
right - yeah, let's do that!


      find_in_nested_dict(self.schema)

    from apache_beam.io.gcp import bigquery_file_loads
    # Only cast to int when a value is given.
    # We only use an int for BigQueryBatchFileLoads
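
For illustration (not part of the diff), here is a minimal standalone sketch of the validation added above. The helper name find_json_type, the sample schema, and the shortened message are hypothetical stand-ins; the logic mirrors find_in_nested_dict, including the recursion into STRUCT fields, whose nested sub-fields live under the field's own 'fields' key:

def find_json_type(schema):
    # Walk every field in the schema dict. BQ nests STRUCT sub-fields
    # under the field's own 'fields' key, so the recursion sees the
    # same {'fields': [...]} shape at every level.
    for field in schema['fields']:
        if field['type'] == 'JSON':
            raise ValueError('JSON fields are not supported with FILE_LOADS')
        elif field['type'] == 'STRUCT':
            find_json_type(field)

# A JSON column nested inside a STRUCT is still rejected:
schema = {
    'fields': [
        {'name': 'id', 'type': 'INTEGER'},
        {
            'name': 'meta',
            'type': 'STRUCT',
            'fields': [{'name': 'payload', 'type': 'JSON'}],
        },
    ]
}
find_json_type(schema)  # raises ValueError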
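
As the review thread notes, JSON columns are supported with the STREAMING_INSERTS write method. A minimal pipeline sketch under that assumption; the project, dataset, and table names are hypothetical, and JSON values are passed as JSON-encoded strings:

import json

import apache_beam as beam

with beam.Pipeline() as p:
    _ = (
        p
        | beam.Create([{'id': 1, 'payload': json.dumps({'a': 1})}])
        | beam.io.WriteToBigQuery(
            'my-project:my_dataset.my_table',  # hypothetical table
            schema='id:INTEGER,payload:JSON',
            method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS))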