[LIBCLOUD-728] Add SSLError to retry decorator exceptions #556
Conversation
Thanks. Can you please provide more context on this change? Why would we want to retry on an SSL exception? An SSL exception usually indicates a deeper issue which shouldn't simply be ignored and retried. As far as the tests go, yeah, they are currently not really testing retries (see #515 (comment)). /cc @cryptickp
Force-pushed from 1499f11 to 1770df0
I see this exception somewhat frequently when using the library.
I see, thanks for the clarification. We should dig in and see which exception it inherits from. We definitely shouldn't catch an overly broad exception. If we can't find a good base exception which doesn't have negative security implications, we could allow the user to pass a callable to the retry decorator.
Last I checked, there isn't a particularly rich hierarchy of SSL exceptions; in fact, it looks as though the base exception is the one being thrown. I don't see that it exacerbates security issues. We're not masking or ignoring the exception, just retrying the call until it times out. A bad cert will always keep throwing exceptions and will eventually time out, raising the original exception.
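For reference, the claim about the shallow hierarchy is easy to verify against the Python 3 standard library (it was even flatter in the Python 2 era this thread dates from): `ssl.SSLError` derives from `OSError`, and its only subclasses describe low-level I/O states rather than distinct failure categories.

```python
import ssl

# ssl.SSLError is an OSError subclass, so catching it does not
# accidentally swallow unrelated application exceptions.
assert issubclass(ssl.SSLError, OSError)

# The only finer-grained subclasses (added in Python 3.3) are
# low-level I/O conditions, not e.g. "cert failure vs. timeout":
for exc in (ssl.SSLZeroReturnError, ssl.SSLWantReadError,
            ssl.SSLWantWriteError, ssl.SSLSyscallError, ssl.SSLEOFError):
    assert issubclass(exc, ssl.SSLError)
```

This is why the discussion below turns to matching on the exception message: the type alone cannot distinguish a transient timeout from a certificate failure.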
Note that there are GCE users reporting this error as well. We're still investigating, but from the user's perspective, the HTTP response never comes back, and after 180 seconds (the default timeout) the call fails. See this thread: https://groups.google.com/forum/#!topic/gce-discussion/LSPnKn9a4zk FWIW, the Google errors are not being reported via any clients other than libcloud. Still researching...
@Kami @thesquelched This test actually tests retries; I got tied up with other stuff, but I will fix the incorrect test soon.
Yeah, the "problem" is (or might be) that on the first call an SSL exception is thrown (e.g. invalid cert or similar), but later on, when we retry, a different exception is thrown (e.g. timeout or connection refused). The retry code right now re-throws the last exception on timeout (by design), which means it could potentially "mask" the original SSL exception. This might not be an issue, but I'm always careful when I touch security-related code, and it's also possible that we are missing some other potentially dangerous edge cases or scenarios.

@pquerna @alex It would be great if you guys could have a look and chime in as well.
That's rather weird, indeed. We don't do anything much different than other libraries. I'm kinda curious which version of the OpenSSL library those people are using. |
Is there no way to retry only on specific SSL errors?
@alex Yeah, that's what I suggested above (also checking the message in addition to the exception type). |
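The callable/message-matching idea discussed here could look roughly like the sketch below (hypothetical `retry_on` and `transient_ssl_error` names; this is not libcloud's actual `retry` implementation): retry only when a user-supplied predicate says the exception is transient, and re-raise the original exception otherwise.

```python
import functools
import ssl
import time

def retry_on(should_retry, attempts=3, delay=0.0):
    """Retry the wrapped function while should_retry(exc) returns True."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_exc = None
            for _ in range(attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc
                    if not should_retry(exc):
                        raise  # non-transient: surface immediately
                    if delay:
                        time.sleep(delay)
            raise last_exc  # attempts exhausted: re-raise the original error
        return wrapper
    return decorator

def transient_ssl_error(exc):
    # Retry only SSL errors whose message looks like a timeout;
    # certificate validation failures are re-raised immediately.
    return isinstance(exc, ssl.SSLError) and 'timed out' in str(exc)
```

A function decorated with `@retry_on(transient_ssl_error)` would then survive a couple of `SSLError('The read operation timed out')` occurrences but fail fast on a `certificate verify failed` error, which addresses the masking concern raised above.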
Force-pushed from 1770df0 to f5470f8
@Kami I added the check we discussed above.
Update on this? |
I saw this mentioned in IRC today as not making the release. AFAICT, @thesquelched has addressed all the issues brought up. If there's additional feedback, please let him have it so we can get this fixed.
I think this is fine, although I don't really understand the test case. That often means it is too complex (too much mocking). I'm used to pretty heavy mocking, but it took me three reads to get it. Maybe add some docs on how you mock the inner request but test the outer request?
Also, sorry about the delay.
And finally, a big thank you for the effort. It can be disheartening when your work doesn't make it into a release, so: thanks.
Shouldn't this have a fallback to raise the original exception? Dunno how I missed that before.
It does now
Force-pushed from f5470f8 to 4c7ab1a
@Kami @allardhoeve Any chance of this getting merged before the next release? Since it's only enabled by configuration, it should be safe even if it's not 100% perfect (although I believe @thesquelched has addressed all the issues now). |
seems fine to me |
Made some minor style changes and merged it into trunk. Sorry for the delay, and thanks!
It looks like this issue could be related to the same underlying root cause described here: https://libcloud.apache.org/blog/2016/01/14/notice-for-linode-users.html

As it only happens sometimes, it could mean that one of the load balancers the request is routed to only supports TLS >= 1.1 (or similar), which is why the exception can't be reproduced consistently. If that's indeed the case, we should revert this change. I was very skeptical about this change from the beginning, since it looks like it's just masking the symptoms rather than addressing the actual root cause.
That's getting
Add `ssl.SSLError` to the list of exceptions that `libcloud.utils.misc.retry` catches, since `SSLError` can be a transient error. Also: rework `retry` a bit, fixing an issue in which keyword argument defaults are not applied.

See: https://issues.apache.org/jira/browse/LIBCLOUD-728
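The "keyword argument defaults are not applied" part of the description refers to a common decorator-factory pitfall: callers that plumb their own kwargs through often pass `None` explicitly, which bypasses the signature defaults. A minimal sketch of the pattern and the fix (hypothetical names and defaults, not libcloud's actual code):

```python
import ssl
import time

DEFAULT_TIMEOUT = 30
DEFAULT_DELAY = 1

def retry(timeout=DEFAULT_TIMEOUT, delay=DEFAULT_DELAY):
    # An explicit `retry(timeout=None)` would otherwise disable the default,
    # so re-apply the defaults for None values inside the factory.
    timeout = DEFAULT_TIMEOUT if timeout is None else timeout
    delay = DEFAULT_DELAY if delay is None else delay

    def decorator(func):
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + timeout
            while True:
                try:
                    return func(*args, **kwargs)
                except ssl.SSLError:
                    if time.monotonic() + delay > deadline:
                        raise  # out of time: surface the original error
                    time.sleep(delay)
        wrapper.timeout = timeout  # exposed for illustration/testing only
        return wrapper
    return decorator
```

With this shape, `retry(timeout=None)(fn)` behaves the same as `retry()(fn)` instead of computing a deadline from `None`, which is the class of bug the PR description mentions.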