powershell hangs / no timeout #274

Closed
ronansalmon opened this issue Sep 10, 2019 · 4 comments

@ronansalmon

ronansalmon commented Sep 10, 2019

WinRM hangs on some Windows servers, and this can have different causes, e.g.:

  • WMI hangs because of TiWorker
  • The following PowerShell command hangs: Invoke-WmiMethod -EA Ignore -Name GetServerFeature -Namespace root\microsoft\windows\servermanager -Class MSFT_ServerManagerTasks
    Example:
from winrm.protocol import Protocol

p = Protocol(
    endpoint='https://10.0.0.1:5986/wsman',
    transport='ntlm',
    username=r'user',
    password='password',
    server_cert_validation='ignore')
shell_id = p.open_shell()
command_id = p.run_command(shell_id, 'powershell', ['ipconfig /all'])
std_out, std_err, status_code = p.get_command_output(shell_id, command_id)
# Raw string so the backslashes in the WMI namespace are never treated as escape sequences
command_id = p.run_command(shell_id, 'powershell', [r'Invoke-WmiMethod -EA Ignore -Name GetServerFeature -Namespace root\microsoft\windows\servermanager -Class MSFT_ServerManagerTasks'])
std_out, std_err, status_code = p.get_command_output(shell_id, command_id)

ipconfig /all works just fine, but the script hangs on the next command.
I understand that those servers may have stability issues, but why does the command never time out? Isn't it supposed to be killed after operation_timeout_sec?
How can we automatically kill the running command?

I have a server with this issue right now. If you need to run some tests or debugging, please ask as soon as possible, as we are going to reboot it soon.

pywinrm 0.3.0
Python 2.7.5

Thanks

@nitzmahone
Collaborator

This isn't really a pywinrm issue. WinRM operation timeouts are mostly about timing out a WinRM infrastructure call so it doesn't wait forever if WinRM itself is having issues (which in this case it isn't). The application has to decide how to implement logical timeouts on higher-level operations. In the case of Ansible, it sounds like you want this: ansible/ansible#57818, but regardless, we can't just blindly set a timeout on everything, so you'd still have to say somewhere "this operation should fail after X min". We have to default to assuming things can run forever, or we'll break the world. pywinrm doesn't currently support specifying a command timeout in get_command_output, so it will happily wait forever if a command is hung...

It's something that wouldn't be hard to add, but it'd still be up to the application to do the cleanup, at least with the "low-level" API.

@ronansalmon
Author

Totally agreed that the default behaviour should stay the same and should not kill anything.
But could pywinrm provide a mechanism to kill a command on demand, e.g.:

def run_command(self, shell_id, command, arguments=(), console_mode_stdin=True, skip_cmd_shell=False, timeout=False)
If timeout is False, do nothing; otherwise kick off a timeout function.
This could be useful to everyone, not just Ansible.

@ronansalmon
Author

After some digging, the issue seems to come from the except clause for WinRMOperationTimeoutError in get_command_output (protocol.py):

    def get_command_output(self, shell_id, command_id):
        stdout_buffer, stderr_buffer = [], []
        command_done = False
        while not command_done:
            try:
                stdout, stderr, return_code, command_done = \
                    self._raw_get_command_output(shell_id, command_id)
                stdout_buffer.append(stdout)
                stderr_buffer.append(stderr)
            except WinRMOperationTimeoutError:
                # this is an expected error when waiting for a long-running process, just silently retry
                pass
        return b''.join(stdout_buffer), b''.join(stderr_buffer), return_code

If I set operation_timeout_sec to 7, the header is set correctly:

<env:Header><w:OperationTimeout>PT7S</w:OperationTimeout>

But every 7 seconds, the except clause silently ignores the timeout.

Is there a way to honor the operation timeout?
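
For reference, the PT7S value in the header above is an xs:duration meaning seven seconds. A minimal sketch of how a number of seconds maps to that value; the helper name operation_timeout_header is hypothetical and is not part of pywinrm:

```python
def operation_timeout_header(operation_timeout_sec):
    """Format a whole number of seconds as an xs:duration string.

    Hypothetical helper for illustration only; not pywinrm's own code.
    """
    return 'PT{0}S'.format(int(operation_timeout_sec))


print(operation_timeout_header(7))  # prints PT7S
```

The value only bounds how long a single WinRM request waits before returning, which is why it shows up in every request header rather than once per command.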

@nitzmahone
Collaborator

That's how it's supposed to work: the system is already honoring the operation timeout (it's using it for polling). The operation in this case is "wait up to N sec for some output", and the documented pattern from Microsoft is to do exactly what we're doing (keep polling on timeout until the command has exited). It sounds like you want to implement a higher-level "command timeout" that will kill the session if it exceeds the time. WinRM (and pywinrm) don't support such a thing, but you could implement it yourself by sending the low-level output fetch messages yourself, keeping track of the total execution time, then doing $whatever when it has exceeded your threshold. We're unlikely to implement that in pywinrm directly though, because what someone would expect for "what do we do when it times out?" will vary. Do we end the session? Send a kill command? Run some compensating action?
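
A minimal sketch of the application-level approach described above, assuming the Protocol object from the original report. It repeats the polling loop from get_command_output, tracks total elapsed time, and on expiry calls cleanup_command and raises. The function name get_output_with_deadline and the exception CommandDeadlineExceeded are hypothetical, and terminating the command is only one of the possible reactions; whether that actually stops a hung process on an unhealthy server is not guaranteed.

```python
import time

try:
    from winrm.exceptions import WinRMOperationTimeoutError
except ImportError:
    # Stand-in so the sketch can be read and exercised without pywinrm installed.
    class WinRMOperationTimeoutError(Exception):
        pass


class CommandDeadlineExceeded(Exception):
    """Hypothetical exception for this sketch; not part of pywinrm."""


def get_output_with_deadline(protocol, shell_id, command_id, max_seconds,
                             clock=getattr(time, 'monotonic', time.time)):
    """Poll like Protocol.get_command_output, but give up after max_seconds."""
    deadline = clock() + max_seconds
    stdout_buffer, stderr_buffer = [], []
    return_code = None
    command_done = False
    while not command_done:
        try:
            stdout, stderr, return_code, command_done = \
                protocol._raw_get_command_output(shell_id, command_id)
            stdout_buffer.append(stdout)
            stderr_buffer.append(stderr)
        except WinRMOperationTimeoutError:
            # Expected while the command is still running; check the deadline below.
            pass
        if not command_done and clock() >= deadline:
            # The reaction is an application decision. This sketch asks WinRM
            # to terminate the command, then reports the failure to the caller.
            protocol.cleanup_command(shell_id, command_id)
            raise CommandDeadlineExceeded(
                'command did not exit within %s seconds' % max_seconds)
    return b''.join(stdout_buffer), b''.join(stderr_buffer), return_code
```

With the objects from the original report, a call would look like get_output_with_deadline(p, shell_id, command_id, max_seconds=300). The deadline is only checked between polls, so the effective granularity is operation_timeout_sec, and the caller would still need to decide whether to close the shell afterwards.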
