[GCP] [BigQuery] Handle totalBytesProcessed NoneType #27474
In addition to the report in #22701, we started seeing the same failure in our pipelines.
Error message from worker:
Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 623, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1571, in apache_beam.runners.common._OutputHandler.handle_process_outputs
  File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1454, in process
    for part, size in self.restriction_provider.split_and_size(
  File "/usr/local/lib/python3.9/site-packages/apache_beam/transforms/core.py", line 331, in split_and_size
    for part in self.split(element, restriction):
  File "/usr/local/lib/python3.9/site-packages/apache_beam/io/iobase.py", line 1641, in split
    estimated_size = restriction.source().estimate_size()
  File "/usr/local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery.py", line 870, in estimate_size
    size = int(job.statistics.totalBytesProcessed)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
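For readers who want to see the failure in isolation, here is a minimal sketch that reproduces the last frame of the traceback without Beam or BigQuery. The `job` object is a hypothetical stand-in built with `SimpleNamespace`, and `unguarded_estimate` is an illustrative name, not a function in Beam; only the `int(job.statistics.totalBytesProcessed)` expression is taken from the traceback.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a BigQuery job whose statistics do not
# include totalBytesProcessed (the attribute is present but None).
job = SimpleNamespace(statistics=SimpleNamespace(totalBytesProcessed=None))


def unguarded_estimate(job):
    # Same expression as the failing line in the traceback.
    return int(job.statistics.totalBytesProcessed)


error = None
try:
    unguarded_estimate(job)
except TypeError as exc:
    # int(None) raises TypeError; the exact wording of the message
    # varies slightly between Python versions.
    error = exc
    print(f"TypeError: {exc}")
```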
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment
Codecov Report
@@            Coverage Diff             @@
##           master   #27474      +/-   ##
==========================================
+ Coverage   71.12%   71.17%   +0.04%
==========================================
  Files         860      861       +1
  Lines      104573   104523      -50
==========================================
+ Hits        74378    74390      +12
+ Misses      28638    28585      -53
+ Partials     1557     1548       -9
... and 28 files with indirect coverage changes
Assigning reviewers. If you would like to opt out of this review, comment R: @AnandInguva for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments).
Reminder, please take a look at this PR: @AnandInguva @ahmedabu98
This LGTM and is in line with the BoundedSource documentation:
beam/sdks/python/apache_beam/io/iobase.py
Lines 156 to 158 in b54bf52
Returns:
  estimated size of the source if the size can be determined, ``None``
  otherwise.
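As a rough illustration of the contract quoted above, the sketch below returns `None` when `totalBytesProcessed` is missing instead of passing it to `int()`. This is not the code merged in this PR; `estimate_size_from_job` and the `make_job` helper are hypothetical names, and the job objects are `SimpleNamespace` stand-ins rather than real BigQuery responses.

```python
from types import SimpleNamespace
from typing import Optional


def estimate_size_from_job(job) -> Optional[int]:
    """Hypothetical helper: return the size in bytes, or None if unknown."""
    total = getattr(job.statistics, "totalBytesProcessed", None)
    if total is None:
        # The size cannot be determined, so follow the documented
        # BoundedSource contract and return None.
        return None
    return int(total)


def make_job(total_bytes):
    # Stand-in for the job object; only the attribute used above is modelled.
    return SimpleNamespace(
        statistics=SimpleNamespace(totalBytesProcessed=total_bytes))


known = estimate_size_from_job(make_job("2048"))
unknown = estimate_size_from_job(make_job(None))
print(known, unknown)
```

A check along these lines could also serve as a starting point for a unit test: construct one job with a value and one without, then assert on the two results.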
@ahmedabu98 @Abacn Thank you for the approval and merge ❤️
Hi, 2.50.0 is scheduled for early September.
* [GCP] [BigQuery] Handle totalBytesProcessed NoneType
* Update CHANGES.md
* lint / whitespace
---------
Co-authored-by: Yi Hu <yathu@google.com>
fixes #22701
Some queries may not have access to totalBytesProcessed as a result of row-level security. Per their docs:
If any maintainer has advice on a good place to implement tests for this, please let me know :)