WriteToBigQuery ignores insert_retry_strategy on HttpErrors #21080

Closed
damccorm opened this issue Jun 4, 2022 · 8 comments

@damccorm
Contributor

damccorm commented Jun 4, 2022

On a streaming pipeline running on 2.31.0, insertAll retries forever, even with insert_retry_strategy=RetryStrategy.RETRY_NEVER and create_disposition=BigQueryDisposition.CREATE_NEVER.
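The behavior the reporter expected from these settings can be modeled as a small decision function. The sketch below is standalone and hypothetical: the three strategy names mirror Beam's RetryStrategy options, but the `should_retry` function and the `NON_TRANSIENT_REASONS` set are illustrative assumptions, not Beam's implementation. Under RETRY_NEVER, every failed insert, including a 404 for a missing table, should go to the failed-rows output instead of being retried.

```python
from enum import Enum


class RetryStrategy(Enum):
    """Mirrors the names of Beam's insert_retry_strategy options (model only)."""

    RETRY_ALWAYS = "RETRY_ALWAYS"
    RETRY_NEVER = "RETRY_NEVER"
    RETRY_ON_TRANSIENT_ERROR = "RETRY_ON_TRANSIENT_ERROR"


# Hypothetical set of error reasons treated as permanent; not Beam's actual list.
NON_TRANSIENT_REASONS = frozenset(
    {"invalid", "invalidQuery", "notImplemented", "notFound"}
)


def should_retry(strategy: RetryStrategy, reason: str) -> bool:
    """Decide whether a failed insert with the given error reason is retried."""
    if strategy is RetryStrategy.RETRY_ALWAYS:
        return True
    if strategy is RetryStrategy.RETRY_NEVER:
        # Never retry: the row should be emitted to the failed-rows output.
        return False
    # RETRY_ON_TRANSIENT_ERROR: retry unless the reason is known to be permanent.
    return reason not in NON_TRANSIENT_REASONS


# A missing table (reason "notFound") evaluated under each strategy.
for strategy in RetryStrategy:
    print(strategy.name, should_retry(strategy, "notFound"))
```

In this model, retrying a notFound error forever would only be the expected outcome under RETRY_ALWAYS.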

Found while testing error handling for a pipeline by writing to a table that doesn't exist: no elements ended up in BigQueryWriteFn.FAILED_ROWS, and these errors were repeated in the logs:


Error message from worker: generic::unknown: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1257, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
  File "apache_beam/runners/common.py", line 510, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
  File "apache_beam/runners/common.py", line 516, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1268, in finish_bundle
    return self._flush_all_batches()
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1278, in _flush_all_batches
    for destination in list(self._rows_buffer.keys())
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1279, in <listcomp>
    if self._rows_buffer[destination]
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1312, in _flush_batch
    skip_invalid_rows=True)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 1125, in insert_rows
    project_id, dataset_id, table_id, final_rows, skip_invalid_rows)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 637, in _insert_all_rows
    response = self.client.tabledata.InsertAll(request)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 795, in InsertAll
    config, request, global_params=global_params)
  File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 731, in _RunMethod
    return self.ProcessHttpResponse(method_config, http_response, request)
  File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
    self.__ProcessHttpResponse(method_config, http_response, request))
  File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 604, in __ProcessHttpResponse
    http_response, method_config=method_config, request=request)
apitools.base.py.exceptions.HttpNotFoundError: HttpError accessing <https://bigquery.googleapis.com/bigquery/v2/projects/<REDACTED>/datasets/testdb__dbo__raw/tables/customers/insertAll?alt=json>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Sat, 21 Aug 2021 10:00:13 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'transfer-encoding': 'chunked', 'status': '404', 'content-length': '344', '-content-encoding': 'gzip'}>, content <{
  "error": {
    "code": 404,
    "message": "Not found: Table <REDACTED>:testdb__dbo__raw.customers",
    "errors": [
      {
        "message": "Not found: Table <REDACTED>:testdb__dbo__raw.customers",
        "domain": "global",
        "reason": "notFound"
      }
    ],
    "status": "NOT_FOUND"
  }
}
...
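The 404 body above carries a machine-readable reason ("notFound") that a retry strategy could key on. As a standalone sketch, the reasons can be pulled out of such a payload with the standard json module; the `extract_reasons` helper is hypothetical and is not part of Beam or apitools.

```python
import json
from typing import List

# The error body from the log above, verbatim apart from layout.
ERROR_BODY = """{
  "error": {
    "code": 404,
    "message": "Not found: Table <REDACTED>:testdb__dbo__raw.customers",
    "errors": [
      {
        "message": "Not found: Table <REDACTED>:testdb__dbo__raw.customers",
        "domain": "global",
        "reason": "notFound"
      }
    ],
    "status": "NOT_FOUND"
  }
}"""


def extract_reasons(body: str) -> List[str]:
    """Return the 'reason' of every entry in a Google API error payload."""
    error = json.loads(body).get("error", {})
    return [entry.get("reason", "") for entry in error.get("errors", [])]


print(extract_reasons(ERROR_BODY))  # ['notFound']
```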

Possibly related to BEAM-12362. The pipeline previously ran on 2.29.0, which repeatedly logged the following error with no traceback:


There were errors inserting to BigQuery. Will not retry. Errors were []

2.31.0 logs the errors but ignores the retry strategy, which prevents the errors from being handled through the FailedRows tag.

Imported from Jira BEAM-12783. Original Jira may contain additional context.
Reported by: ajdub980a.

@nervoussidd

Hey Danny,
Could you please assign this issue to me? I would love to contribute in any way. Also, could you please guide me while I am contributing?

@damccorm
Contributor Author

damccorm commented Feb 1, 2023

Hey @nervoussidd, you can find our contribution guide here: https://beam.apache.org/contribute/

For future reference, you can self-assign issues by commenting ".take-issue", and a bot will assign the issue to you.

@ajdub508
Contributor

Doing some research on this one, I'm finding that the self.gcp_bq_client.insert_rows_json call here won't return a value in errors for errors such as google.api_core.exceptions.NotFound, which is raised when a table doesn't exist.

Those errors do raise a GoogleAPICallError, though, and they are caught and re-raised in the except here, with the expectation that they will be retried appropriately. I haven't yet tracked down where they go from there.

I am having trouble finding the higher-level function that handles this re-raised exception. Would anyone be able to point it out for me? I haven't found where that exception would lead to retry strategy evaluation.
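One way to make such request-level exceptions subject to the retry strategy is to convert them into per-row error entries, so they take the same path as the errors list that insert_rows_json returns (one mapping per failed row, with "index" and "errors" keys). The sketch below is hypothetical and standalone, and not necessarily the approach taken in the eventual fix: `NotFoundError` stands in for google.api_core.exceptions.NotFound, `MissingTableClient` stands in for the BigQuery client, and `insert_with_strategy` is an illustrative helper.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


class NotFoundError(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.NotFound."""

    reason = "notFound"


class MissingTableClient:
    """Hypothetical client whose inserts fail as if the table did not exist."""

    def insert_rows_json(self, table: str, json_rows: List[Row]) -> List[Dict[str, Any]]:
        raise NotFoundError(f"Not found: Table {table}")


def insert_with_strategy(
    client, table: str, rows: List[Row], should_retry: Callable[[str], bool]
) -> Tuple[List[Row], List[Tuple[str, Row]]]:
    """Insert rows and split failures into (rows_to_retry, failed_rows)."""
    try:
        # Row-level problems come back as [{"index": i, "errors": [...]}, ...].
        errors = client.insert_rows_json(table, rows)
    except NotFoundError as exc:
        # Attribute the request-level failure to every row in the batch so it
        # goes through the same retry decision as row-level errors.
        errors = [
            {"index": i, "errors": [{"reason": exc.reason, "message": str(exc)}]}
            for i in range(len(rows))
        ]

    to_retry: List[Row] = []
    failed: List[Tuple[str, Row]] = []
    for entry in errors:
        row = rows[entry["index"]]
        reason = entry["errors"][0]["reason"]
        if should_retry(reason):
            to_retry.append(row)
        else:
            failed.append((table, row))
    return to_retry, failed


# With a RETRY_NEVER-style predicate, both rows land in the failed list.
to_retry, failed = insert_with_strategy(
    MissingTableClient(),
    "proj:testdb__dbo__raw.customers",
    [{"id": 1}, {"id": 2}],
    should_retry=lambda reason: False,
)
print(to_retry, failed)
```

Because the exception is absorbed and turned into data, the caller decides what happens next instead of the bundle failing and being replayed indefinitely.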

ajdub508 added a commit to ajdub508/beam that referenced this issue Aug 22, 2023
@ajdub508
Contributor

.take-issue

@ajdub508
Contributor

Submitted PR 28091, with a comment here describing the fix and requesting feedback.

ajdub508 added a commit to ajdub508/beam that referenced this issue Sep 2, 2023
ajdub508 added a commit to ajdub508/beam that referenced this issue Sep 3, 2023
ajdub508 added a commit to ajdub508/beam that referenced this issue Sep 9, 2023
ajdub508 added a commit to ajdub508/beam that referenced this issue Sep 12, 2023
ajdub508 added a commit to ajdub508/beam that referenced this issue Sep 13, 2023
ajdub508 added a commit to ajdub508/beam that referenced this issue Sep 16, 2023
ajdub508 added a commit to ajdub508/beam that referenced this issue Sep 16, 2023
ajdub508 added a commit to ajdub508/beam that referenced this issue Sep 19, 2023
@ajdub508
Contributor

This issue has been resolved by PR 28091.

@ahmedabu98
Contributor

@ajdub508 you can close this issue by commenting ".close-issue" and a bot will close it

@ajdub508
Contributor

.close-issue
