ThrottlingException from wait_query when running multiple Athena queries in parallel #465

@dhorkel

Describe the bug
When I run more than about 8 queries in parallel with wr.athena.read_sql_query(), I get the following error. (I'm using the multiprocessing package, which accounts for the first few lines of the traceback.)

Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.8/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/Cellar/python/3.7.8/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-19-1ccc4a4f07b9>", line 11, in query_convert_save
    wr.athena.read_sql_query(sql=query,database='sensor-data-ingest',boto3_session=sess).to_csv(f'{eq_type}/{s_id}.csv.gz',index=False,compression='gzip')
  File "/usr/local/lib/python3.7/site-packages/awswrangler/_config.py", line 361, in wrapper
    return function(**args)
  File "/usr/local/lib/python3.7/site-packages/awswrangler/athena/_read.py", line 744, in read_sql_query
    boto3_session=session,
  File "/usr/local/lib/python3.7/site-packages/awswrangler/athena/_read.py", line 500, in _resolve_query_without_cache
    boto3_session=boto3_session,
  File "/usr/local/lib/python3.7/site-packages/awswrangler/athena/_read.py", line 385, in _resolve_query_without_cache_ctas
    categories=categories,
  File "/usr/local/lib/python3.7/site-packages/awswrangler/athena/_utils.py", line 176, in _get_query_metadata
    _query_execution_payload = wait_query(query_execution_id=query_execution_id, boto3_session=boto3_session)
  File "/usr/local/lib/python3.7/site-packages/awswrangler/athena/_utils.py", line 668, in wait_query
    response = client_athena.get_query_execution(QueryExecutionId=query_execution_id)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 676, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the GetQueryExecution operation (reached max retries: 5): Rate exceeded

I suspect that the _QUERY_WAIT_POLLING_DELAY used in wait_query() here https://github.com/awslabs/aws-data-wrangler/blob/c3cd501f3606d7ea1ae41c7446b14cc7468dedc6/awswrangler/athena/_utils.py#L642
results in too many GetQueryExecution requests once more than about 8 queries are being polled in parallel.

As a feature request, it would be useful if I could manually set _QUERY_WAIT_POLLING_DELAY and/or if wait_query() would gracefully handle a ThrottlingException and keep polling, rather than killing the whole function.
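
To illustrate the second option, here is a minimal sketch of a polling loop that keeps going when it is throttled. It is not awswrangler's implementation: `wait_query_with_backoff`, its parameters, and the delay values are hypothetical names and numbers I made up for this example. It only assumes that the client exposes `get_query_execution` and that throttling errors carry the error code in `exc.response["Error"]["Code"]`, as botocore's ClientError does.

```python
import random
import time

# Error codes treated as "slow down and retry" (assumption: these cover Athena throttling).
THROTTLE_CODES = {"ThrottlingException", "TooManyRequestsException"}
FINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def _error_code(exc):
    """Return the AWS error code of a ClientError-like exception, or None."""
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return None
    return response.get("Error", {}).get("Code")


def wait_query_with_backoff(
    client,
    query_execution_id,
    poll_delay=0.5,      # hypothetical: pause between normal polls, in seconds
    max_backoff=10.0,    # hypothetical: upper bound for the throttle backoff
    sleep=time.sleep,    # injectable so the loop can be exercised without waiting
):
    """Poll GetQueryExecution until the query reaches a final state.

    Throttling errors are absorbed with exponential backoff and jitter;
    any other error is re-raised unchanged.
    """
    backoff = poll_delay
    while True:
        try:
            response = client.get_query_execution(QueryExecutionId=query_execution_id)
        except Exception as exc:
            if _error_code(exc) not in THROTTLE_CODES:
                raise
            # Throttled: wait a random time up to the current backoff, then double it.
            sleep(random.uniform(0, backoff))
            backoff = min(backoff * 2, max_backoff)
            continue

        backoff = poll_delay  # a successful call resets the backoff
        execution = response["QueryExecution"]
        if execution["Status"]["State"] in FINAL_STATES:
            return execution
        sleep(poll_delay)
```

With a real session this would be called as `wait_query_with_backoff(sess.client("athena"), query_execution_id)`. The jitter is there so that parallel workers that were throttled at the same moment do not all retry at the same moment again.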

To Reproduce
This was using:
awswrangler 1.10.1
boto3 1.14.56
botocore 1.19.17
all installed using pip.

To reproduce, start more than 8 parallel queries in the same workgroup, each taking a non-trivial amount of time (mine all took ~100s).

Thank you!

Labels

bug (Something isn't working), major release (Will be addressed in the next major release), ready to release