wait_for: treat broken connections as "unready" #28839
We have observed the following condition while waiting for hosts:

```
Traceback (most recent call last):
  File "/var/folders/f8/23xp00654plcv2b2tcc028680000gn/T/ansible_8hxm4_/ansible_module_wait_for.py", line 585, in <module>
    main()
  File "/var/folders/f8/23xp00654plcv2b2tcc028680000gn/T/ansible_8hxm4_/ansible_module_wait_for.py", line 535, in main
    s.shutdown(socket.SHUT_RDWR)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 57] Socket is not connected
```

This appears to happen while the host is still starting; we believe something is accepting our connection but immediately resetting it. In these cases, we'd prefer to continue waiting instead of immediately failing the play.

This patch has been applied locally for some time, and we have seen no adverse effects.
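A minimal sketch of the behaviour described above, in Python to match the module. This is not the code from the patch itself, and `close_probe` is a hypothetical helper name. It treats `ENOTCONN` from `shutdown()` as "the peer already dropped the connection", so the caller can report "not ready" and keep waiting, while any other socket error is still raised.

```python
import errno
import socket


def close_probe(sock):
    """Shut down and close a probe socket (hypothetical helper).

    Returns True if the shutdown succeeded, or False if the peer had already
    broken the connection (ENOTCONN), which the caller should treat as
    "not ready yet". Any other socket error is re-raised.
    """
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except socket.error as e:  # socket.error is an alias of OSError on Python 3
        if e.errno != errno.ENOTCONN:
            raise
        # The remote end accepted and then reset the connection.
        return False
    finally:
        # Always release the file descriptor, whatever happened above.
        sock.close()
    return True
```

Only `ENOTCONN` is swallowed; an unexpected error such as `EBADF` would still fail loudly.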
We were missing an import and a space after the `#`
Is the wait_for_connection action plugin not what you need?
Ah! That's a neat new addition in 2.3. I'll take a look at that action, but I think we still need this change, at least for the case where we're restarting into a system that doesn't have Python installed yet, so
@sethp-nr From your description, it isn't clear what you are testing or which connection you are talking about.
Sorry for the confusion. We haven't managed to isolate the ultimate cause, but the general flow is as follows:
The issue this patch addresses is that step 2 often fails spuriously: the `shutdown` call in the module races with packets from the remote host that put the connection into a state in which Darwin raises errno 57 (`ENOTCONN`). Since that error just says "we can't close this connection because it's not connected", it seems appropriate to say "ok, thanks Darwin, no problem" and carry on, rather than raising that transient error and failing the play. For the common case, we can migrate to using the (relatively new)
Right, that is the case wait_for_connection was designed for. Testing a TCP port is not sufficient to continue your playbook; you need to know whether the transport works end-to-end. I have an open project to implement a platform-independent reboot module that would reuse exactly the same logic, but would also verify that the system was actually rebooted, and would add some further functionality. You can find some information at #16186, where there's some preliminary code. I wanted to finish it for v2.4, but I have too much on my plate...
Indeed, though I'm not sure the changes here and there are mutually exclusive. Even if I were waiting for a service to become available on a port, I suspect the same race would occur between the socket shutdown and an RST (or similar) from the peer. Wouldn't it make sense to treat those errors as spurious? Further, there is at least one case where we know the transport will not work end-to-end until the play continues (because the next thing the play does is install Python). I'm glad to hear you're working on the reboot module! That would indeed make our lives better. I read through that thread, and it wasn't clear to me how you were planning to handle our complex case (booting into a system that has SSH, but not much else). Should I leave a comment over there describing that difficulty?
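The behaviour argued for here, continuing to wait when the peer resets the connection, can be sketched as a polling loop that probes a port until a deadline. `port_is_ready` and `wait_for_port` are hypothetical names, the defaults are illustrative, and the real `wait_for` module has many more options (`delay`, `search_regex`, `state`, and others) that this sketch omits. It uses `time.time()` so it would also run on Python 2.7.

```python
import errno
import socket
import time


def port_is_ready(host, port, connect_timeout=1.0):
    """One readiness probe: connect, then cleanly shut the connection down."""
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except (socket.timeout, socket.error):
        # Refused, unreachable or timed out: the service is not up yet.
        return False
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except socket.error as e:
        # The peer accepted and then reset the connection (ENOTCONN on
        # Darwin): treat it as "not ready" instead of failing outright.
        if e.errno != errno.ENOTCONN:
            raise
        return False
    finally:
        sock.close()
    return True


def wait_for_port(host, port, timeout=10.0, interval=0.5):
    """Poll until the port is ready or the deadline passes."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if port_is_ready(host, port):
            return True
        time.sleep(interval)
    return False
```

A transient reset then costs one extra polling interval rather than a failed play, while only a timeout is reported as failure.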
@dagwieers Just wanted to check in: this change is still important for us, at least in the case I mentioned in my last comment, and it doesn't seem harmful otherwise. Any thoughts?
@sethp-nr I am not the person who decides whether this gets merged. If you want to bring this up with the core team, feel free to add your PR to the core team's agenda and attend the next core meeting: ansible/community#263
Ah, thank you for the pointer!
* wait_for: treat broken connections as "unready"

  We have observed the following condition while waiting for hosts:

  ```
  Traceback (most recent call last):
    File "/var/folders/f8/23xp00654plcv2b2tcc028680000gn/T/ansible_8hxm4_/ansible_module_wait_for.py", line 585, in <module>
      main()
    File "/var/folders/f8/23xp00654plcv2b2tcc028680000gn/T/ansible_8hxm4_/ansible_module_wait_for.py", line 535, in main
      s.shutdown(socket.SHUT_RDWR)
    File "/usr/local/opt/python/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 228, in meth
      return getattr(self._sock,name)(*args)
  socket.error: [Errno 57] Socket is not connected
  ```

  This appears to happen while the host is still starting; we believe something is accepting our connection but immediately resetting it. In these cases, we'd prefer to continue waiting instead of immediately failing the play.

  This patch has been applied locally for some time, and we have seen no adverse effects.

* wait_for: fixup change

  We were missing an import and a space after the `#`

(cherry picked from commit 402b095)
Cherry-picked to the temp-staging-post-2.4.1 branch for the 2.4.2 release.
SUMMARY
We have observed the following condition while waiting for hosts:
This appears to happen while the host is still starting; we believe something is accepting our connection but immediately resetting it. In these cases, we'd prefer to continue waiting instead of immediately failing the play.

This patch has been applied locally for some time, and we have seen no adverse effects.
ISSUE TYPE
COMPONENT NAME
wait_for module
ANSIBLE VERSION