Fix UnprocessedKeys bug when restoring from dynamodb #977

Merged: 1 commit merged into master on Jun 18, 2024

jfongatyelp (Contributor) commented on Jun 18, 2024

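This fixes an issue where we were appending a list of key dicts (the Keys returned in UnprocessedKeys) as a single element to our list of key dicts, which nested a list inside it. That produced the following error, since boto's batch_get_item expects Keys to be a flat list of key dicts: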

Jun 17 17:07:22 tron3-uswest2bdevc tron[3313674]: ERROR pid=3313674 tid=140237946082880 tron.serialize.runstate.dynamodb_state_store _get_items:102 Encountered issues retrieving data from DynamoDB
Traceback (most recent call last):
  File "/opt/venvs/tron/lib/python3.8/site-packages/tron/serialize/runstate/dynamodb_state_store.py", line 91, in _get_items
    items.extend(resp.result()["Responses"][self.name])
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/venvs/tron/lib/python3.8/site-packages/botocore/client.py", line 565, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/venvs/tron/lib/python3.8/site-packages/botocore/client.py", line 974, in _make_api_call
    request_dict = self._convert_to_request_dict(
  File "/opt/venvs/tron/lib/python3.8/site-packages/botocore/client.py", line 1048, in _convert_to_request_dict
    request_dict = self._serializer.serialize_to_request(
  File "/opt/venvs/tron/lib/python3.8/site-packages/botocore/validate.py", line 381, in serialize_to_request
    raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid type for parameter RequestItems.pnw-devc-tron-state.Keys[0], value: [{'key': {'S': 'job_run_state connectors_batches.revenue_recognition_snapshot_daily.5'}, 'index': {'N': '0'}}, {'key': {'S': 'job_run_state connectors_batches.revenue_recognition_snapshot_daily.9'}, 'index': {'N': '0'}}], type: <class 'list'>, valid types: <class 'dict'>
Jun 17 17:07:22 tron3-uswest2bdevc tron[3313674]: INFO pid=3313674 tid=140237962868288 tron.serialize.runstate.dynamodb_state_store _get_items:90 trying to grab items from resp ['_Future__get_result', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_condition', '_done_callbacks', '_exception', '_invoke_callbacks', '_result', '_state', '_waiters', 'add_done_callback', 'cancel', 'cancelled', 'done', 'exception', 'result', 'running', 'set_exception', 'set_result', 'set_running_or_notify_cancel']
Jun 17 17:07:22 tron3-uswest2bdevc tron[3313674]: ERROR pid=3313674 tid=140238214373824 tron.serialize.runstate.statemanager restore:169 Unable to restore state for connectors_batches.revenue_recognition_snapshot_daily - exiting to avoid corrupting data.
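
A minimal, hypothetical reproduction of the shape problem in plain Python (no boto3 needed). The key dicts mirror the ones in the error above, but the variable names are illustrative and this is not tron's actual code:

```python
# Illustrative key dicts in the same low-level attribute-value format as the
# ones in the ParamValidationError above. Names are hypothetical.
unprocessed_keys = [
    {"key": {"S": "job_run_state example_job.5"}, "index": {"N": "0"}},
    {"key": {"S": "job_run_state example_job.9"}, "index": {"N": "0"}},
]

# Buggy: append() inserts the whole list as ONE element, so Keys[0] is a list.
buggy_keys = []
buggy_keys.append(unprocessed_keys)

# Fixed: extend() inserts each key dict individually, keeping Keys flat.
fixed_keys = []
fixed_keys.extend(unprocessed_keys)

print(type(buggy_keys[0]).__name__)  # prints: list (what botocore rejects)
print([type(k).__name__ for k in fixed_keys])  # prints: ['dict', 'dict']
```

As the traceback shows, botocore's parameter validation (validate.py) rejects the nested list while building the request, so this surfaces as a ParamValidationError before anything is sent to DynamoDB.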

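We suspect we haven't hit this before because we already chunk requests into batches of 100 keys ourselves before calling batch_get_item, which keeps us under BatchGetItem's 100-item limit. This particular job, however, has an extremely large command and therefore a very large amount of state data, so we likely hit the 16 MB response limit first, which makes DynamoDB return the remaining keys in UnprocessedKeys.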
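
The 100-key chunking described above can be sketched as a simple slicing generator. This is a hypothetical helper (`chunk_keys` is not a tron or boto3 function), assuming the keys are already in DynamoDB's attribute-value format:

```python
from typing import Dict, Iterator, List

Key = Dict[str, Dict[str, str]]

# BatchGetItem accepts at most 100 keys per request.
MAX_KEYS_PER_REQUEST = 100


def chunk_keys(keys: List[Key], size: int = MAX_KEYS_PER_REQUEST) -> Iterator[List[Key]]:
    """Yield consecutive slices of `keys`, each holding at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start : start + size]


# Example: 250 hypothetical keys split into request-sized chunks.
keys = [{"key": {"S": f"job_run_state example_job.{i}"}, "index": {"N": "0"}} for i in range(250)]
chunk_sizes = [len(chunk) for chunk in chunk_keys(keys)]
print(chunk_sizes)  # prints: [100, 100, 50]
```

Chunking only bounds the number of keys, not the number of bytes: a single BatchGetItem response is also capped at 16 MB, and when that cap is reached DynamoDB returns a partial result and lists the rest under UnprocessedKeys. That is the path this job appears to have exercised.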

This patch allowed tron to restart cleanly in pnw-devc.

The boto3 docs seem to imply that Keys will always be an array, so extend (rather than append) seems to be the right call here: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/batch_get_item.html

UnprocessedKeys (dict) –

A map of tables and their respective keys that were not processed with the current response. The UnprocessedKeys value is in the same form as RequestItems, so the value can be provided directly to a subsequent BatchGetItem operation. For more information, see RequestItems in the Request Parameters section.

Each element consists of:

Keys - An array of primary key attribute values that define specific items in the table.
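
Because UnprocessedKeys[table]["Keys"] is itself a list of key dicts, a retry loop should fold it back in with extend. A minimal sketch under stated assumptions: `FakeDynamoClient` is a hypothetical stand-in that mimics only the shape of boto3's `batch_get_item(RequestItems=...)` response, and `get_all_items` is illustrative, not tron's implementation:

```python
from typing import Any, Dict, List

Key = Dict[str, Dict[str, str]]


class FakeDynamoClient:
    """Hypothetical stand-in for a boto3 DynamoDB client (makes no AWS calls).

    Serves at most `page_size` keys per call and reports the rest under
    UnprocessedKeys, imitating a response that hit the 16 MB size limit.
    """

    def __init__(self, table: str, page_size: int) -> None:
        self.table = table
        self.page_size = page_size
        self.calls = 0

    def batch_get_item(self, RequestItems: Dict[str, Any]) -> Dict[str, Any]:
        self.calls += 1
        keys = RequestItems[self.table]["Keys"]
        served, rest = keys[: self.page_size], keys[self.page_size :]
        # Pretend each stored item is just a copy of its key.
        unprocessed = {self.table: {"Keys": rest}} if rest else {}
        return {"Responses": {self.table: [dict(k) for k in served]}, "UnprocessedKeys": unprocessed}


def get_all_items(client: Any, table: str, keys: List[Key]) -> List[Dict[str, Any]]:
    """Fetch `keys` from `table`, retrying whatever comes back as unprocessed."""
    items: List[Dict[str, Any]] = []
    pending = list(keys)
    while pending:
        resp = client.batch_get_item(RequestItems={table: {"Keys": pending}})
        items.extend(resp["Responses"].get(table, []))
        retry: List[Key] = []
        # UnprocessedKeys[table]["Keys"] is already a list of key dicts:
        # extend() keeps `retry` flat, append() would nest it (the bug fixed here).
        retry.extend(resp.get("UnprocessedKeys", {}).get(table, {}).get("Keys", []))
        pending = retry
    return items


client = FakeDynamoClient("example-table", page_size=2)
keys = [{"key": {"S": f"job_run_state example_job.{i}"}, "index": {"N": "0"}} for i in range(5)]
items = get_all_items(client, "example-table", keys)
print(len(items), client.calls)  # prints: 5 3
```

A production loop would also back off between retries, as AWS recommends for unprocessed keys. And since UnprocessedKeys has the same form as RequestItems, another option is to pass `resp["UnprocessedKeys"]` straight back as `RequestItems` on the next call.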

jfongatyelp merged commit 0a599d4 into master on Jun 18, 2024
4 checks passed