Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-14273] Add integration tests for BQ JSON type for Python BigQueryIO connector #17431

Merged
merged 17 commits into from May 3, 2022

Conversation

ahmedabu98
Copy link
Contributor

BigQuery now natively supports JSON data type [1]. This PR adds integration tests for using the JSON data type with Python BigQueryIO read and write methods.

Batch loads currently only support CSV loads, so FILE_LOADS write method is not included.

[1] https://cloud.google.com/bigquery/docs/reference/standard-sql/json-data#query_json_data

@asf-ci
Copy link

asf-ci commented Apr 21, 2022

Can one of the admins verify this patch?

3 similar comments
@asf-ci
Copy link

asf-ci commented Apr 21, 2022

Can one of the admins verify this patch?

@asf-ci
Copy link

asf-ci commented Apr 21, 2022

Can one of the admins verify this patch?

@asf-ci
Copy link

asf-ci commented Apr 21, 2022

Can one of the admins verify this patch?

@codecov
Copy link

codecov bot commented Apr 21, 2022

Codecov Report

Merging #17431 (bfc5384) into master (e4d2050) will increase coverage by 0.05%.
The diff coverage is 90.00%.

@@            Coverage Diff             @@
##           master   #17431      +/-   ##
==========================================
+ Coverage   73.83%   73.88%   +0.05%     
==========================================
  Files         686      689       +3     
  Lines       90143    90488     +345     
==========================================
+ Hits        66555    66857     +302     
- Misses      22406    22449      +43     
  Partials     1182     1182              
Flag Coverage Δ
python 83.70% <90.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdks/python/apache_beam/io/gcp/bigquery.py 63.87% <90.00%> (+0.23%) ⬆️
...pache_beam/runners/interactive/interactive_beam.py 78.54% <0.00%> (-3.28%) ⬇️
...che_beam/runners/interactive/interactive_runner.py 90.84% <0.00%> (-2.12%) ⬇️
sdks/python/apache_beam/io/gcp/gcsfilesystem.py 88.23% <0.00%> (-1.77%) ⬇️
sdks/python/apache_beam/dataframe/frame_base.py 89.54% <0.00%> (-0.83%) ⬇️
sdks/python/apache_beam/transforms/core.py 91.84% <0.00%> (-0.81%) ⬇️
sdks/python/apache_beam/io/filesystem.py 88.76% <0.00%> (-0.65%) ⬇️
...eam/runners/portability/fn_api_runner/execution.py 92.44% <0.00%> (-0.63%) ⬇️
sdks/python/apache_beam/io/localfilesystem.py 90.97% <0.00%> (-0.50%) ⬇️
sdks/python/apache_beam/transforms/combiners.py 93.03% <0.00%> (-0.39%) ⬇️
... and 25 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e4d2050...bfc5384. Read the comment docs.

@ahmedabu98
Copy link
Contributor Author

R: @chamikaramj
R: @johnjcasey
R: @pabloem

@@ -2190,6 +2190,12 @@ def expand(self, pcoll):
'A schema must be provided when writing to BigQuery using '
'Avro based file loads')

if self.schema and 'JSON' in str(self.schema):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm what if there's a string field called dataJSON or something like that? We should probably verify this differently?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a unit test for this particular behavior please?

@pabloem
Copy link
Member

pabloem commented Apr 26, 2022

Looking at the test report from the postcommit, it doesn't seem like the test ran:
https://ci-beam.apache.org/job/beam_PostCommit_Python38_PR/443/testReport/

Maybe there's something missing?

@ahmedabu98
Copy link
Contributor Author

Run Python 3.8 PostCommit

@ahmedabu98
Copy link
Contributor Author

Looking at the test report from the postcommit, it doesn't seem like the test ran: https://ci-beam.apache.org/job/beam_PostCommit_Python38_PR/443/testReport/

From the test report (and also running locally), only a few of the tests that exist in apache_beam/io/gcp are running. For example, only 5/11 bigquery related tests are running.

Any ideas why this is the case? I don't see a list of included/excluded tests.

'insertion is currently not supported with '
'FILE_LOADS write method.')
elif field['type'] == 'STRUCT':
find_in_nested_dict(field)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is not supported for AVRO nor JSON file loads. Is that correct? Can you perhaps post a reference to BQ docs in the ValueError? This would make it easiest for customers to check when this issue surfaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is correct. I can add the bq json type doc#batchloads to the message. Should i also mention "because Beam doesn't support CSV format"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm... maybe not : P - wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't like the sound of it either, but thought of suggesting because the link doesn't add that nuance (in case the user wants to know why it's not supported yet).

we could just mention that it is supported for the other write methods and put the link for more details?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right - yeah, let's do that!

@pabloem
Copy link
Member

pabloem commented Apr 29, 2022

Run Python PreCommit

@pabloem
Copy link
Member

pabloem commented May 2, 2022

there's a complaint from the autoformatter: https://ci-beam.apache.org/job/beam_PreCommit_PythonFormatter_Commit/10956/console

You can run it locally with tox -e py3-yapf

Copy link
Contributor

@johnjcasey johnjcasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pabloem
Copy link
Member

pabloem commented May 3, 2022

Run Python 3.8 PostCommit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants