time-box the docker client to avoid ever getting stuck #773
|
Would you still have a stuck thread? If so, doesn't this just hide a resource leak, so that you could eventually run out of threads or HTTP connections? |
|
From my testing while things were in the stuck state, interrupting the thread or killing the process still stops it correctly. Maybe "stuck" was a bad word: the docker daemon call just never returns and never times out. |
|
Interesting, maybe we should PR the spotify docker client if it's not respecting timeouts |
|
The spotify docker client just uses the apache http client underneath, and the default timeouts for connect and read are set to 5 and 30 seconds. I think the current bug with the docker daemon is that it keeps the connection open in such a way that the timeouts do not get hit, hence the use of TimeLimiter here instead (docker cli calls and calls to the docker daemon from any source/client hang as well). The bug is a rare case, and the purpose of the PR is more to keep our executor from being pinned by it. If we can't launch something because docker is in a bad state, call it failed and move on. |
Could we include the timeout duration here? (e.g. "Timed out trying to reach docker daemon after N seconds")
You should use the {} placeholder instead of String.format.
time-box the docker client to avoid ever getting stuck
It seems that we can get into an odd state when contacting the docker daemon. Even with the appropriate timeouts set (the apache client defaults in this case), we can still get into a situation where we try to contact the docker daemon and then hang there forever. While the root problem lies in the docker daemon, it shouldn't stop the executor from being able to shut down properly.
This optionally sets a time limit on all calls to the docker daemon, so that a hung call cannot leave an executor process waiting forever doing nothing.
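For readers unfamiliar with the idea, here is a minimal sketch of time-boxing a blocking call. It is not the code from this PR: the conversation mentions TimeLimiter, while this sketch uses only `java.util.concurrent` as a stand-in. The class name `TimeBoxedCall`, the helper `callWithTimeout`, and the lambdas standing in for docker daemon calls are all hypothetical. It also assumes the blocked call responds to thread interruption, which matches what was observed in the conversation above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; names are not taken from the PR.
public class TimeBoxedCall {

  // Runs the call on a separate daemon thread and waits at most timeoutSeconds for it.
  static <T> T callWithTimeout(Callable<T> call, long timeoutSeconds) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
      Thread thread = new Thread(runnable, "time-boxed-call");
      thread.setDaemon(true); // a hung call must not keep the JVM alive
      return thread;
    });
    Future<T> future = executor.submit(call);
    try {
      return future.get(timeoutSeconds, TimeUnit.SECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the worker thread
      // Include the duration in the message, as requested in review.
      throw new TimeoutException(
          "Timed out trying to reach docker daemon after " + timeoutSeconds + " seconds");
    } catch (ExecutionException e) {
      // Surface the original failure from the call where possible.
      Throwable cause = e.getCause();
      if (cause instanceof Exception) {
        throw (Exception) cause;
      }
      throw e;
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    // A call that returns promptly.
    String reply = callWithTimeout(() -> "pong", 1);
    System.out.println(reply);

    // A call that hangs longer than the limit (stand-in for a stuck daemon call).
    try {
      callWithTimeout(() -> {
        Thread.sleep(5_000);
        return "never";
      }, 1);
    } catch (TimeoutException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The caller can then treat the timeout as a failed launch and move on. Note that if the underlying call ignores interruption, the worker thread stays blocked, which is the resource-leak concern raised in the conversation; the time limit only keeps the calling thread from being pinned.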