-
Notifications
You must be signed in to change notification settings - Fork 23.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
winrm - make command input more resiliant #81538
Conversation
I was using the following script to test out the server side behaviour when a send operation failed with an operation timeout. It verifies that the data is actually written to the stdin pipe and the 2nd attempt but need a delayed retry in case it is still busy writing. The changes in the PR have also been tested under a import base64
import time
import winrm
import winrm.exceptions
PWSH_CMD = r'''
if (Test-Path -Path C:\temp\test.txt) {
Remove-Item C:\temp\test.txt -Force
}
Add-Content C:\temp\test.txt -Value start
$input | ForEach-Object { $_ | Add-Content C:\temp\test.txt }
Add-Content C:\temp\test.txt -Value finish
'done'
'''
def main():
p = winrm.Protocol(
"http://192.168.121.151:5985/wsman",
transport="basic",
username="vagrant",
password="vagrant",
)
p.operation_timeout_sec = 2
shell_id = p.open_shell()
try:
cmd_id = p.run_command(
shell_id,
"powershell.exe",
["-EncodedCommand", base64.b64encode(PWSH_CMD.encode("utf-16-le")).decode()],
)
inputs = [
((b"a" * 250000) + b"end1", False),
((b"b" * 250000) + b"end2", True),
]
try:
p.operation_timeout_sec = 2
try:
for in_data, end in inputs:
while True:
try:
print("Sending input")
p.send_command_input(
shell_id,
cmd_id,
in_data,
end=end,
)
except winrm.exceptions.WinRMOperationTimeoutError:
print("Operation timeout, ignoring")
except winrm.exceptions.WinRMError as e:
print(f"Transport error: {e}")
if "'wsmanfault_code': '170'" not in str(e):
raise
time.sleep(5)
continue
break
finally:
p.operation_timeout_sec = winrm.Protocol.DEFAULT_OPERATION_TIMEOUT_SEC
print("Getting output")
stdout, stderr, rc = p.get_command_output(shell_id, cmd_id)
print(f"RC: {rc}\nSTDOUT:\n{stdout.decode('utf-8')}\nSTDERR:\n{stderr.decode('utf-8')}")
finally:
p.cleanup_command(shell_id, cmd_id)
finally:
p.close_shell(shell_id)
if __name__ == '__main__':
main() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the 5s delay was arrived at empirically- it feels somewhat long to me, but definitely not unreasonable, so if 3x 5s makes more sense than like 6-8x 2s, WFM.
Thanks for tracking this one down! Does pypsrp have a similar issue on input sends, or was it already more robust?
In testing I found that 2-3 seconds was typically not long enough for the first operation timeout to have finished before it would fail a second time saying the resource is busy (hence the 2nd exception check). Obviously this is more dependent on the host in question but doing 5 seconds seemed to at least be more "stable" in real world testing. In the end it's mostly there as an exceptional scenario to not fail if we could continue, I would say users should be increasing their operation/read timeout values if this is being hit (hence the warning) if they are receiving this warning multiple times.
While theoretically it sounds like it could suffer from the same problem in testing I never came across it. My guess is that |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
* winrm - make command input more resiliant * Expand warning message (cherry picked from commit f22231d)
* winrm - make command input more resiliant * Expand warning message (cherry picked from commit f22231d)
* winrm - make command input more resiliant * Expand warning message (cherry picked from commit f22231d)
Backport PRs stable-2.14 - https://github.com/ansible/ansible/pull/81792/files |
Thanks for this work. I do hope it gets accepted for backports so that RHEL 9 will see the update. |
SUMMARY
Attempts to improve the handling of failures with the winrm send stdin input functions to better handle timeout conditions on the server when it can be safely ignored.
Related: ansible-collections/ansible.windows#516
ISSUE TYPE
ADDITIONAL INFORMATION
The current behaviour is to treat any errors as being ignored so this new retry behaviour shouldn't introduce any extra issues. It should only improve the chances of being able to recover from these types of failures.