[BEAM-14273] Add integration tests for BQ JSON type for Python BigQueryIO connector #17431
Conversation
Can one of the admins verify this patch?
Force-pushed a3c47f8 to c1e41bc
Codecov Report
@@ Coverage Diff @@
## master #17431 +/- ##
==========================================
+ Coverage 73.83% 73.88% +0.05%
==========================================
Files 686 689 +3
Lines 90143 90488 +345
==========================================
+ Hits 66555 66857 +302
- Misses 22406 22449 +43
Partials 1182 1182
R: @chamikaramj
Force-pushed 90fd2e2 to 942b94e
Force-pushed 942b94e to 0acd961
@@ -2190,6 +2190,12 @@ def expand(self, pcoll):
          'A schema must be provided when writing to BigQuery using '
          'Avro based file loads')

    if self.schema and 'JSON' in str(self.schema):
Hm, what if there's a string field called dataJSON or something like that? We should probably verify this differently?
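One way to make the check type-based instead of name-based is to walk the parsed schema and look at each field's declared type. A minimal sketch, assuming the schema is in the dict form accepted by WriteToBigQuery; the helper name is hypothetical:

```python
def schema_has_json_field(schema):
  # Sketch only: detect a declared JSON field by its type, not its name,
  # so a STRING field called "dataJSON" is no longer a false positive.
  # Assumes the dict schema form:
  # {'fields': [{'name': ..., 'type': ..., 'fields': [...]}, ...]}
  for field in schema.get('fields', []):
    if field['type'] == 'JSON':
      return True
    # Recurse into nested records so JSON fields inside STRUCTs are found.
    if field['type'] in ('RECORD', 'STRUCT') and schema_has_json_field(field):
      return True
  return False
```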
Can you add a unit test for this particular behavior, please?
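A test along these lines could pin down the behavior, reusing the hypothetical schema_has_json_field sketch above:

```python
import unittest


class JsonFieldDetectionTest(unittest.TestCase):
  def test_string_field_named_dataJSON_is_not_json(self):
    # A STRING field whose name merely contains "JSON" must not trip
    # the JSON-type check.
    schema = {'fields': [{'name': 'dataJSON', 'type': 'STRING'}]}
    self.assertFalse(schema_has_json_field(schema))

  def test_declared_json_field_is_detected(self):
    schema = {'fields': [{'name': 'payload', 'type': 'JSON'}]}
    self.assertTrue(schema_has_json_field(schema))


if __name__ == '__main__':
  unittest.main()
```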
Looking at the test report from the postcommit, it doesn't seem like the test ran. Maybe there's something missing?
Run Python 3.8 PostCommit
From the test report (and also running locally), only a few of the tests that exist in apache_beam/io/gcp are running. For example, only 5 of the 11 BigQuery-related tests are running. Any ideas why this is the case? I don't see a list of included/excluded tests.
          'insertion is currently not supported with '
          'FILE_LOADS write method.')
    elif field['type'] == 'STRUCT':
      find_in_nested_dict(field)
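For context, the recursive walk shown in the diff might look roughly like this; only the lines quoted above are from the PR, the surrounding structure is an assumption:

```python
def find_in_nested_dict(schema):
  # Walk every field; reject declared JSON fields, since FILE_LOADS
  # cannot load them, and recurse into STRUCTs to catch nested ones.
  for field in schema.get('fields', []):
    if field['type'] == 'JSON':
      raise ValueError(
          'Found JSON type in table schema. JSON data '
          'insertion is currently not supported with '
          'FILE_LOADS write method.')
    elif field['type'] == 'STRUCT':
      find_in_nested_dict(field)
```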
So this is not supported for AVRO or JSON file loads. Is that correct? Can you perhaps post a reference to the BQ docs in the ValueError? That would make it easiest for customers to check when this issue surfaces.
That is correct. I can add the bq json type doc#batchloads link to the message. Should I also mention "because Beam doesn't support CSV format"?
Hmmm... maybe not :P - wdyt?
I don't like the sound of it either, but I thought of suggesting it because the link doesn't add that nuance (in case the user wants to know why it's not supported yet).
We could just mention that it is supported for the other write methods and put the link for more details?
right - yeah, let's do that!
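So the error message would roughly become something like the following (a sketch of the agreed wording, not the merged text; the docs link is the one cited in the PR description):

```python
raise ValueError(
    'Found JSON type in table schema. JSON data insertion is currently '
    'not supported with FILE_LOADS write method, but is supported with '
    'the other write methods. For more information, see '
    'https://cloud.google.com/bigquery/docs/reference/standard-sql/json-data')
```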
Run Python PreCommit
There's a complaint from the autoformatter: https://ci-beam.apache.org/job/beam_PreCommit_PythonFormatter_Commit/10956/console You can run the formatter locally.
LGTM
Run Python 3.8 PostCommit
BigQuery now natively supports the JSON data type [1]. This PR adds integration tests for using the JSON data type with the Python BigQueryIO read and write methods.
BigQuery batch loads currently support the JSON type only via the CSV source format, which Beam does not use, so the FILE_LOADS write method is not covered.
[1] https://cloud.google.com/bigquery/docs/reference/standard-sql/json-data#query_json_data
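For reference, a minimal sketch of what the new tests exercise: writing rows with a JSON-typed column through the Python connector. The project/dataset/table names are placeholders, and STREAMING_INSERTS is used since FILE_LOADS is excluded:

```python
import json

import apache_beam as beam

# Hypothetical table; the "data" column is declared as the BQ JSON type.
TABLE = 'my-project:my_dataset.json_table'
SCHEMA = {
    'fields': [
        {'name': 'id', 'type': 'INTEGER'},
        {'name': 'data', 'type': 'JSON'},
    ]
}

with beam.Pipeline() as p:
  _ = (
      p
      # JSON values are passed as strings containing serialized JSON.
      | beam.Create([{'id': 1, 'data': json.dumps({'name': 'a', 'n': 2})}])
      | beam.io.WriteToBigQuery(
          TABLE,
          schema=SCHEMA,
          method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS))
```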