Description
Describe the bug
When I run multiple parallel queries (the threshold appears to be more than 8) using wr.athena.read_sql_query(), I get the following error. I'm using the multiprocessing package, which accounts for the first few lines of the traceback:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.8/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/Cellar/python/3.7.8/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-19-1ccc4a4f07b9>", line 11, in query_convert_save
    wr.athena.read_sql_query(sql=query,database='sensor-data-ingest',boto3_session=sess).to_csv(f'{eq_type}/{s_id}.csv.gz',index=False,compression='gzip')
  File "/usr/local/lib/python3.7/site-packages/awswrangler/_config.py", line 361, in wrapper
    return function(**args)
  File "/usr/local/lib/python3.7/site-packages/awswrangler/athena/_read.py", line 744, in read_sql_query
    boto3_session=session,
  File "/usr/local/lib/python3.7/site-packages/awswrangler/athena/_read.py", line 500, in _resolve_query_without_cache
    boto3_session=boto3_session,
  File "/usr/local/lib/python3.7/site-packages/awswrangler/athena/_read.py", line 385, in _resolve_query_without_cache_ctas
    categories=categories,
  File "/usr/local/lib/python3.7/site-packages/awswrangler/athena/_utils.py", line 176, in _get_query_metadata
    _query_execution_payload = wait_query(query_execution_id=query_execution_id, boto3_session=boto3_session)
  File "/usr/local/lib/python3.7/site-packages/awswrangler/athena/_utils.py", line 668, in wait_query
    response = client_athena.get_query_execution(QueryExecutionId=query_execution_id)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 676, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the GetQueryExecution operation (reached max retries: 5): Rate exceeded
I suspect that the _QUERY_WAIT_POLLING_DELAY used in wait_query() (https://github.com/awslabs/aws-data-wrangler/blob/c3cd501f3606d7ea1ae41c7446b14cc7468dedc6/awswrangler/athena/_utils.py#L642) is short enough that polling more than 8 queries in parallel sends too many GetQueryExecution requests and triggers the rate limit.
As a feature request, it would be useful if I could manually set _QUERY_WAIT_POLLING_DELAY and/or if wait_query() would gracefully handle a ThrottlingException and keep polling, rather than killing the whole function.
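To illustrate the "keep polling" behaviour I have in mind, here is a minimal sketch of a retry helper with exponential backoff and jitter. It is not awswrangler code: call_with_backoff, is_throttling_error, and THROTTLE_CODES are hypothetical names, and the delay values are arbitrary assumptions. It relies only on the fact that botocore's ClientError carries a response dict with an Error/Code entry.

```python
import random
import time

# Error codes treated as "slow down and retry" (assumed set, extend as needed).
THROTTLE_CODES = {"ThrottlingException", "TooManyRequestsException"}


def is_throttling_error(exc):
    """Return True if exc looks like a botocore ClientError caused by throttling."""
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") in THROTTLE_CODES


def call_with_backoff(func, *args, max_attempts=8, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying on throttling errors.

    The wait before each retry is drawn uniformly from
    [0, min(max_delay, base_delay * 2**attempt)] ("full jitter"), so that
    parallel workers do not all retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Re-raise anything that is not throttling, or the final failure.
            if not is_throttling_error(exc) or attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Inside a polling loop this could wrap the call that fails in the traceback, for example call_with_backoff(client_athena.get_query_execution, QueryExecutionId=query_execution_id), so that a ThrottlingException delays the next poll instead of aborting the query.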
To Reproduce
This was using:
awswrangler 1.10.1
boto3 1.14.56
botocore 1.19.17
all installed using pip.
To reproduce, start more than 8 parallel queries in the same workgroup, each of which takes a non-trivial amount of time (mine each took ~100 s).
Thank you!