Retry boto.swf connection to avoid frequent errors when using IAM roles #99

Merged
merged 2 commits into from
Jun 23, 2016

jbbarth
Collaborator

@jbbarth jbbarth commented Jun 23, 2016

This PR implements a simple "retry" solution to #26. Centralizing things is not easy in a multi-process environment, and the built-in boto option depends on configuration that I'm not sure how to handle properly, whereas the proposed solution works in multiple cases.

I tested it by running the following script:

from swf.core import ConnectedSWFObject

for i in xrange(0, 1000):
    conn = ConnectedSWFObject()

I ran it in parallel by launching it from the shell in the background a dozen times, like this:

python foo.py &

Hence I had many concurrent processes trying to connect to SWF.

Without the fix, I got the following error once:

Traceback (most recent call last):
  File "foo.py", line 4, in <module>
    conn = ConnectedSWFObject()
  File "/usr/local/lib/python2.7/dist-packages/swf/core.py", line 40, in __init__
    boto.swf.connect_to_region(self.region, **settings_))
  File "/usr/local/lib/python2.7/dist-packages/boto/swf/__init__.py", line 45, in connect_to_region
    return region.connect(**kw_params)
  File "/usr/local/lib/python2.7/dist-packages/boto/regioninfo.py", line 187, in connect
    return self.connection_cls(region=self, **kw_params)
  File "/usr/local/lib/python2.7/dist-packages/boto/swf/layer1.py", line 85, in __init__
    debug, session_token, profile_name=profile_name)
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 555, in __init__
    profile_name)
  File "/usr/local/lib/python2.7/dist-packages/boto/provider.py", line 200, in __init__
    self.get_credentials(access_key, secret_key, security_token, profile_name)
  File "/usr/local/lib/python2.7/dist-packages/boto/provider.py", line 376, in get_credentials
    self._populate_keys_from_metadata_server()
  File "/usr/local/lib/python2.7/dist-packages/boto/provider.py", line 395, in _populate_keys_from_metadata_server
    self._access_key = security['AccessKeyId']
TypeError: string indices must be integers, not str

And the following error 5 times:

Traceback (most recent call last):
  File "foo.py", line 4, in <module>
    conn = ConnectedSWFObject()
  File "/usr/local/lib/python2.7/dist-packages/swf/core.py", line 40, in __init__
    boto.swf.connect_to_region(self.region, **settings_))
  File "/usr/local/lib/python2.7/dist-packages/boto/swf/__init__.py", line 45, in connect_to_region
    return region.connect(**kw_params)
  File "/usr/local/lib/python2.7/dist-packages/boto/regioninfo.py", line 187, in connect
    return self.connection_cls(region=self, **kw_params)
  File "/usr/local/lib/python2.7/dist-packages/boto/swf/layer1.py", line 85, in __init__
    debug, session_token, profile_name=profile_name)
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 569, in __init__
    host, config, self.provider, self._required_auth_capability())
  File "/usr/local/lib/python2.7/dist-packages/boto/auth.py", line 987, in get_auth_handler
    'Check your credentials' % (len(names), str(names)))
boto.exception.NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV4Handler'] Check your credentials

This is also an error we get from time to time in production.

With the fix, the retries kicked in, so I only got retry notifications on stdout:

2016-06-23T13:32:26 INFO [process=MainProcess, pid=20239]: error "string indices must be integers, not str": retrying in 0 seconds
2016-06-23T13:32:29 INFO [process=MainProcess, pid=20263]: error "string indices must be integers, not str": retrying in 0 seconds
2016-06-23T13:32:33 INFO [process=MainProcess, pid=20263]: error "string indices must be integers, not str": retrying in 0 seconds
2016-06-23T13:32:55 INFO [process=MainProcess, pid=20239]: error "No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV4Handler'] Check your credentials": retrying in 0 seconds
2016-06-23T13:33:06 INFO [process=MainProcess, pid=20248]: error "string indices must be integers, not str": retrying in 0 seconds
2016-06-23T13:33:16 INFO [process=MainProcess, pid=20239]: error "string indices must be integers, not str": retrying in 0 seconds
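
For readers who want a picture of the retry behaviour described above, here is a minimal sketch of a retry decorator. It is not the decorator merged in this PR: the names `retry`, `max_attempts`, `delay`, `backoff` and `connect` are hypothetical, and `connect` is only a stand-in that fails twice with the `TypeError` message seen above instead of calling `boto.swf.connect_to_region`. In a real setup the caught exceptions would presumably include `TypeError` and `boto.exception.NoAuthHandlerFound`.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


def retry(max_attempts=5, delay=0.0, backoff=2.0, exceptions=(Exception,)):
    """Retry the wrapped callable when it raises one of `exceptions`.

    Hypothetical helper for illustration; not simpleflow's actual decorator.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as err:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    logger.info('error "%s": retrying in %.2f seconds', err, wait)
                    time.sleep(wait)
                    wait = wait * backoff  # grow the wait between attempts
        return wrapper
    return decorator


# Hypothetical stand-in for the connection call: fails twice, then succeeds.
calls = {"count": 0}


@retry(max_attempts=5, delay=0.0, exceptions=(TypeError,))
def connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TypeError("string indices must be integers, not str")
    return "connected"


result = connect()
```

Here `connect()` is called three times in total: the first two failures are logged and retried, and the third call returns normally.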

I'm not sure this will be easy to test properly in the test suite, so I'm leaving it without a test (the "retry" decorator itself is tested, of course).

poke @ybastide @benjastudio

@ybastide
Contributor

The retry decorator should use %.2f to display the delay 😉

👍
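
To illustrate the formatting remark, here is a small sketch of how a sub-second delay renders with an integer format versus `%.2f`. That the log message was built with an integer-style format is an assumption based on the "retrying in 0 seconds" lines in the log output; the `delay` value is hypothetical.

```python
delay = 0.37  # hypothetical sub-second delay, in seconds

# An integer format truncates the fractional part.
as_integer = "retrying in %d seconds" % delay

# Two decimal places keep short delays visible.
as_float = "retrying in %.2f seconds" % delay

print(as_integer)
print(as_float)
```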

@jbbarth
Collaborator Author

jbbarth commented Jun 23, 2016

> The retry decorator should use %.2f to display the delay 😉

Oh, yep. I'll add it, thanks.

@jbbarth jbbarth merged commit cf55937 into master Jun 23, 2016
@jbbarth jbbarth deleted the bugfix/26/add-retries-around-boto-connection branch June 23, 2016 14:00
@jbbarth jbbarth mentioned this pull request Jun 23, 2016