Support a long-running listener process on Windows to speed up Ansible #47707

trondhindenes · 2018-10-27T11:11:30Z

SUMMARY

Ansible currently uses "native" winrm. This means that each task Ansible sends to a windows node results in the target node spinning up a new process/runspace to host the command(s) invoked by Ansible. This is very slow.

Although it might be considered breaking the agentless nature of Ansible, it would be good to have an option where Ansible would self-install an ephemeral process that could accept Ansible tasks without spinning up a new process/runspace all the time. This process could self-destruct after a given time of inactivity.

ISSUE TYPE

Feature Idea

COMPONENT NAME

winrm

ADDITIONAL INFORMATION

Just an idea, I don't have any more information.

ansibot · 2018-10-27T11:19:33Z

Hi @trondhindenes, thank you for submitting this issue!

ansibot · 2018-10-27T11:19:33Z

Files identified in the description:

lib/ansible/plugins/connection/winrm.py

If these files are inaccurate, please update the component name section of the description or use the !component bot command.

ansibot · 2018-10-27T11:19:34Z

cc @jborean93 @nitzmahone

jborean93 · 2018-10-27T20:14:11Z

Hey @trondhindenes this is something we've been wanting to implement with persisted connections. Unfortunately the current persistent code that is used for Network modules (ansible-connection) turned out to be slower than starting a new shell for each task. We were unable to verify at the time why this was the case so we hit a bit of a roadblock for now. I believe @bcoca had some thoughts as to how to persist the connection plugin between tasks but I'm not sure whether they are just thoughts or active plans.

This is definitely something we want to move towards so I'll keep the issue open to start brainstorming ideas.

trondhindenes · 2018-10-28T08:49:49Z

nice, good to hear that there's movement here. I did some research a while ago trying to figure out if it's possible to configure the winrm service in such a way that invocations against it reuse the same process instead of spinning up a new one, but I didn't get anywhere. That said, Powershell has the notion of "sessions", so I guess it is supported (I didn't dig into the protocol spec for how sessions work, but I'm sure there exists a spec).

BTW efforts in this area need to be balanced against the effort of making ssh an alternative to winrm, which could make this whole issue moot.

trondhindenes · 2018-10-28T09:04:23Z

This seems to document the "session" apis in WinRM:
https://docs.microsoft.com/en-us/windows/desktop/api/Wsman/nf-wsman-wsmancreatesession

trondhindenes · 2018-10-28T18:14:16Z

did a bit more digging - I have never really looked at the winrm stuff in Ansible before. I see it persists the shell_id in the winrm Connection in Ansible. Does that mean that the Connection object itself is instantiated again and again - once for each task? If so it's tempting to just hack a "persister" using local files or redis or something (custom connection plugin)

Sorry for this turning into a blog. After some more testing, I notice that persisting the "shell id" between tasks gives perhaps a tiny performance gain, but each "command" still spins up a new powershell.exe process, which seems to be the heaviest part. This is the same behavior as when invoking multiple winrm.Session.run_ps commands one after another with the same shell id.

I'll conclude for now that there's nothing to gain from just persisting the shell_id, we need to create an actual ps session to have a long-living process (e.g. a powershell session) in which we can fire off commands to the target.

Another thing I'm noticing when comparing commands from a "native" powershell client (with or without a session) is that powershell.exe is actually never started when a native client performs an operation - everything seems to be done thru a process called wsmprovhost.exe. Wondering if there are some quick wins to gain simply by making pywinrm behave more like native powershell clients when it invokes commands.

jborean93 · 2018-10-28T20:06:54Z

> Does that mean that the Connection object itself is instantiated again and again - once for each task?

a new connection plugin is created on every task so for WinRM it will send at minimum these messages

Create Shell
Create Process
Send Process input
Receive process output (there could be multiple of these depending on the time and size)
Cleanup Process
Close Shell

If we were to share the connection plugin we could potentially eliminate the first and last message but the remaining will still be there and slow. We could use an agent based process to cache modules and keep PowerShell up and running in a local RunspacePool but this won't happen anytime soon (if it ever does).
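
For illustration, the message sequence above maps roughly onto the low-level `Protocol` methods in pywinrm (`open_shell`, `run_command`, `get_command_output`, `cleanup_command`, `close_shell`). The sketch below is not how the Ansible winrm plugin is structured; the helper name `run_in_shared_shell` and the `RecordingProtocol` stub are hypothetical, and the stdin step is left out. It accepts any object exposing those methods, so it can be tried without a Windows host, and it shows that sharing one shell only removes the open/close pair while the per-command messages remain.

```python
from typing import Iterable, List, Tuple


def run_in_shared_shell(protocol, commands: Iterable[str]) -> List[Tuple[bytes, bytes, int]]:
    """Run several commands in one WinRM shell instead of one shell per command.

    `protocol` is assumed to behave like pywinrm's winrm.protocol.Protocol.
    """
    results = []
    shell_id = protocol.open_shell()                               # Create Shell (once)
    try:
        for command in commands:
            command_id = protocol.run_command(shell_id, command)   # Create Process
            try:
                # Receive process output: (stdout, stderr, return_code)
                results.append(protocol.get_command_output(shell_id, command_id))
            finally:
                protocol.cleanup_command(shell_id, command_id)     # Cleanup Process
    finally:
        protocol.close_shell(shell_id)                             # Close Shell (once)
    return results


class RecordingProtocol:
    """Hypothetical stand-in that only records which calls were made."""

    def __init__(self):
        self.calls = []

    def open_shell(self):
        self.calls.append("open_shell")
        return "shell-1"

    def run_command(self, shell_id, command):
        self.calls.append("run_command")
        return "command-%d" % len(self.calls)

    def get_command_output(self, shell_id, command_id):
        self.calls.append("get_command_output")
        return (b"", b"", 0)

    def cleanup_command(self, shell_id, command_id):
        self.calls.append("cleanup_command")

    def close_shell(self, shell_id):
        self.calls.append("close_shell")
```

With the stub, two commands produce eight calls instead of the ten that two separate shells would need; each command still pays for its own process on the target.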

> I'll conclude for now that there's nothing to gain from just persisting the shell_id, we need to create an actual ps session to have a long-living process (e.g. a powershell session) in which we can fire off commands to the target.

I agree, the original winrm connection plugin would see little benefit from persisting the connection plugin. The newer psrp connection plugin is a bit different, it works along these lines:

Create a Runspace Pool
Get Runspace Pool details
Create Pipeline to execute PS script (could be multiple depending on size of the payload)
Get Pipeline output
Close Runspace Pool

It also has the added benefit of having PowerShell up and running so the step to create the pipeline is just a matter of it creating a new thread and not a whole new PowerShell process. Right now it is slightly faster than winrm due to this behaviour but we could potentially eke out some more performance. I started looking at using ansible-connection to persist the shell/runspace but in my testing the overhead for ansible-connection outweighed the benefits so had to hold off on that.

The wsmprovhost.exe is part of PSRP and we can already take advantage of that as stated above. There's still some work to be done here but a lot would need to have some buy in from Ansible to achieve reliably. We could hack something in but then make this more of a nightmare to maintain and potentially open ourselves up to more server side attacks.

trondhindenes · 2018-10-28T23:06:10Z

Nice, thanks for the input. For us the cpu usage on targets is a bigger problem than speed actually. We try and run t3.micro for nonprod if there isn't any reason for using bigger instances, and they spike hard when we run Ansible against them. So even without the performance (as in speed) gains, a persistent runspace would be enough for us to consider switching connection plugins - at least it's something we'd be more than willing to test.

trondhindenes · 2018-10-28T23:08:28Z

oh and btw: My sincerest respect to you for diving into this. I spent 30 mins investigating winrm soap messages and had to stop because I was feeling dizzy ;-)

jborean93 · 2018-10-28T23:58:07Z

@trondhindenes have you tried ansible_connection: psrp with the t3.micro instances? The docs for this are on the psrp connection plugin page. I never really tested it against a cut-down host so I'm unsure whether that will help the CPU usage.

> oh and btw: My sincerest respect to you for diving into this. I spent 30 mins investigating winrm soap messages and had to stop because I was feeling dizzy ;-)

Start looking into PSRP and you come across XML embedded in XML, does your head in :P. I'm just happy that it isn't ASN.1 encoded data, had to deal with that for CredSSP auth and that's enough for me.

trondhindenes · 2018-10-30T07:48:28Z

will try! That said, I think real perf gains will only be achieved by a long-living session - which is not implemented by psrp today I guess. Anyways, I'll try it!

trondhindenes · 2018-11-08T22:27:59Z

Just wanted to share a little bit more on this one:
I implemented a simple tcp listener using a long-running PS Runspace to see if that would speed things up (code at https://github.com/trondhindenes/AnsibleBackgroundWorker), along with a very rudimentary Ansible connection plugin using regular Python sockets. The speed increase is pretty staggering, it's actually twice as fast as regular winrm.

cpu-wise tho, it's pretty much the same. Which leads me to think that the way Ansible sends in commands to a Windows node might be inefficient - as far as I can see, a simple win_service task sends in a string of 63450 characters - altho the win_service module itself is only 20k characters.

I'm guessing that Ansible will construct and execute the "global function" set upon each task execution (since there's no state to store the functions if using regular winrm), and that this eats up a lot of cpu. We're essentially re-importing the same set of functions again and again for each task that gets executed on a node.
With a stateful listener, such as the one I've crudely managed to carve out, there is actually a persistent runspace that can "hold" these functions, which would make it possible to only send the actual "module code" - my assumption is that this would help speed up executions further.
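
As a rough illustration of the idea (not the code in AnsibleBackgroundWorker, which runs a PowerShell runspace on the target), here is a minimal Python sketch of a long-lived listener and a socket client. The length-prefixed JSON framing, the function names and the `tasks_run` counter are all assumptions made for the example. It has no authentication or encryption, which a real listener would need. The worker keeps state across tasks and exits by itself after a period of inactivity.

```python
import json
import socket
import struct
import threading


def send_frame(sock: socket.socket, payload: dict) -> None:
    """Send one JSON message prefixed with its 4-byte big-endian length."""
    data = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack(">I", len(data)) + data)


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_frame(sock: socket.socket) -> dict:
    """Read one length-prefixed JSON message."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def serve(listener: socket.socket, idle_seconds: float) -> None:
    """Handle tasks until no client connects for `idle_seconds`, then exit."""
    state = {"tasks_run": 0}            # survives across tasks, unlike a fresh process
    listener.settimeout(idle_seconds)
    while True:
        try:
            conn, _ = listener.accept()
        except socket.timeout:
            break                       # self-destruct after inactivity
        with conn:
            task = recv_frame(conn)
            state["tasks_run"] += 1
            send_frame(conn, {"module": task["module"], "tasks_run": state["tasks_run"]})
    listener.close()


def start_worker(idle_seconds: float):
    """Start the worker on a free local port; return its address and thread."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    thread = threading.Thread(target=serve, args=(listener, idle_seconds), daemon=True)
    thread.start()
    return listener.getsockname(), thread


def run_task(address, module: str) -> dict:
    """Client side: send one task to the worker and wait for its reply."""
    with socket.create_connection(address) as sock:
        send_frame(sock, {"module": module})
        return recv_frame(sock)
```

Calling `run_task` twice against the same worker returns an increasing `tasks_run`, which is the property a per-task process cannot offer: whatever was loaded for the first task is still there for the second.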

Also noticing that the actual module seems to be encoded and sent in a module_wrapper object which I'm guessing gets decoded and executed. I don't know if that process has been perf-tested, but something tells me this might not be the most efficient (cpu-wise) way of sending a few lines of PS over to a node. Even on a 2-cpu vm on my fairly new laptop I can see a very noticeable cpu spike when Ansible does its thing - far more than a simple "Get-Service" call should (imho).

jborean93 · 2018-11-08T22:45:39Z

Yep that's pretty much what I expected and something we are looking to do with persistent connections in the future. Using psrp should mean we don't have to worry about managing and securing our own sockets but may take some hits on the WinRM payloads. Having a persistent runspace means we can:

Cache the modules on the remote side
Cache the module utils and exec wrappers
Cache the C# code that is dynamically compiled, no need to compile this for every run
Stop taking hits when starting a new powershell interpreter for normal exec processes
Lower the number of WinRM messages sent
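
A minimal sketch of the caching idea, assuming a worker that keys already-loaded shared code by its SHA-256 digest. `PersistentWorker` and `chars_sent` are hypothetical names, and the sizes used in the usage note come from the rough figures reported earlier in this thread for win_service (about 63450 characters in total, about 20k for the module itself), so the split is an approximation rather than a measurement.

```python
import hashlib


class PersistentWorker:
    """Toy model of a long-lived remote process that remembers loaded code."""

    def __init__(self) -> None:
        self._loaded = {}   # sha256 hex digest -> source text already received

    def has(self, digest: str) -> bool:
        return digest in self._loaded

    def load(self, source: str) -> str:
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        self._loaded[digest] = source
        return digest


def chars_sent(worker: PersistentWorker, shared_code: str, module_code: str) -> int:
    """Return how many characters must cross the wire for one task."""
    digest = hashlib.sha256(shared_code.encode("utf-8")).hexdigest()
    sent = len(module_code)             # the module itself is always sent
    if not worker.has(digest):
        worker.load(shared_code)        # first task: ship the shared code too
        sent += len(shared_code)
    return sent
```

With 43450 characters of shared code and a 20000 character module, the first task sends 63450 characters and every later task against the same worker sends 20000.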

trondhindenes · 2018-11-08T22:52:32Z

Yup. One area that probably needs fine-tuning is environment variable manipulation. This is easy in Ansible today, since envvars get "reloaded" on every task, but with a long-living session of any kind there would probably have to be some special sauce that took care of reloading the session upon changes in envvars. Shouldn't be too bad tho.
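
One possible shape for that "special sauce", sketched in Python under the assumption that the worker can obtain a fresh snapshot of the machine's environment before each task. `sync_environment` is a hypothetical helper, and how the snapshot is read (for example from the registry on Windows) is left out.

```python
from typing import Mapping, MutableMapping


def sync_environment(live: MutableMapping[str, str], fresh: Mapping[str, str]) -> None:
    """Make `live` match `fresh` in place: drop stale keys, add or update the rest."""
    for key in list(live):
        if key not in fresh:
            del live[key]               # variable was removed since the session started
    for key, value in fresh.items():
        if live.get(key) != value:
            live[key] = value           # variable was added or changed
```

A long-lived worker could call this with `os.environ` as `live` before running each task, so a task that changes a machine-level variable is visible to the next task without restarting the session.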

In any case, I welcome this. As we're scaling out our Windows estate and trying to keep instance sizes as small as possible, the current limitations in how winrm is used (both the slow execution and high cpu usage) are really painful for us.

trondhindenes · 2018-11-08T22:58:48Z

btw results from my testing:
same playbook, different connection methods. Target CPU usage is fairly similar for psrp and tcp hack, about 15-20% lower than with winrm. Very noticeable spikes in all 3 cases.

regular winrm:
ansible-playbook test_playbook.yml -i hosts -vvv 1,10s user 0,27s system 10% cpu 12,658 total

psrp:
ansible-playbook test_playbook.yml -i hosts -vvv 2,35s user 0,50s system 21% cpu 13,111 total

trond's tcp hack thingy:
ansible-playbook test_playbook.yml -i hosts -vvv 0,87s user 0,20s system 12% cpu 8,896 total
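
The wall-clock totals above can be compared with a few lines of Python. This sketch only parses the three `time` lines quoted above (which use a decimal comma) and divides the winrm total by each of the others; `wall_seconds` is a hypothetical helper name.

```python
import re

TIMINGS = {
    "winrm": "1,10s user 0,27s system 10% cpu 12,658 total",
    "psrp": "2,35s user 0,50s system 21% cpu 13,111 total",
    "tcp hack": "0,87s user 0,20s system 12% cpu 8,896 total",
}


def wall_seconds(line: str) -> float:
    """Extract the wall-clock 'total' value, accepting a decimal comma."""
    match = re.search(r"(\d+[.,]\d+) total", line)
    if match is None:
        raise ValueError("no total found in %r" % line)
    return float(match.group(1).replace(",", "."))


baseline = wall_seconds(TIMINGS["winrm"])
speedups = {name: round(baseline / wall_seconds(line), 2) for name, line in TIMINGS.items()}
print(speedups)  # {'winrm': 1.0, 'psrp': 0.97, 'tcp hack': 1.42}
```

These are totals for the whole playbook run as seen from the controller, so they include controller-side work and connection setup; the speedup for an individual task on the target may differ.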

jborean93 · 2018-11-08T23:01:50Z

Agreed, we definitely have a few ideas as to how to go about this. It's the next major goal in our minds for improving the performance and this is the area we are targeting.

trondhindenes · 2018-11-08T23:06:19Z

awesome, really good news. I'm fine with closing this if you want, please let me know if there's any testing or other type of input I can be of assistance with.

jborean93 · 2018-11-08T23:07:15Z

Let's keep it open, I don't think we have an issue already for this work so good to have something there. Thanks for your investigations so far, it's been really helpful.

trondhindenes · 2018-11-08T23:15:57Z

great! Will keep an eye on it!

jhawkesworth · 2018-11-23T08:14:45Z

I am happy to test too when there is something to try out.
I don't know enough to make any useful suggestions about implementation. I suppose it's obvious to say that it would be possible to lash up something using a callback plugin. Although it wouldn't be something you'd want for a final implementation, it might be enough to see if there are performance gains to be had.

jhawkesworth · 2018-11-23T08:15:20Z

@dagwieers - I thought you might be interested in this thread too.

Richard-Payne · 2019-08-02T08:54:30Z

8 months down the line, has anything come of this?

trondhindenes · 2019-12-06T11:31:01Z

I would love to see some movement here, we desperately need a more efficient way of communicating with Windows from Ansible.

trondhindenes · 2019-12-11T15:15:41Z

This is a t3.medium instance. We run Ansible against the node, and the ansible playbook takes about 15 minutes even tho it's a fairly limited number of steps. The instance is out of cpu credit balance way before ansible is done. This causes us to have to scale for provisioning instead of payload, which is a noticeable waste. We would be able to run way smaller instances if Ansible was more efficient (used less cpu).

doyl54 · 2020-01-14T10:43:14Z

Any news about this issue? Ansible on Windows is really slower than on Linux. We don't know what to do to improve our timings on Windows.

trondhindenes · 2020-01-14T11:04:36Z

IMHO the first thing that needs to be addressed is the jit compilations that occur on every task. From what I understand it should be possible to do a lot of optimizations there without affecting the overall connection architecture. Maybe @jborean93 has some thoughts?

doyl54 · 2020-01-14T12:56:37Z

I think Kerberos is also part of the time wasted on Windows nodes.

trondhindenes · 2020-01-14T15:51:25Z

@doyl54 I haven't seen any indication of that.

doyl54 · 2020-01-15T10:20:20Z

Actually this is my main issue. I've noticed that every action Ansible needs to perform takes a long time while it asks Kerberos whether it has the rights to do it. (I'm using Kerberos authentication to connect to my nodes.)

jborean93 · 2020-03-11T07:20:41Z

This isn't lost on us, there's just no real way of implementing this in a common way and not just as a hack for Windows. There are tentative plans to work on implementing those ideas for a future Ansible version, but right now a lot of the focus of our development towards 2.10 is to enable collections support in Ansible.

briantist · 2020-03-11T13:41:52Z

I've been working on a side project that will enable Ansible to work against a JEA endpoint, but it works just as well without JEA so it has a fair amount of overlap with this kind of thing (many of the things @jborean93 mentioned like caching the modules, sending fewer WinRM messages, possibility of pre-compiling the C# utils).

I've got working prototypes, just not anything that's quite in a state to share. The split to collections kind of put a wrench in some of the ideas for how to version it and keep it aligned so I'm still waiting to see how a lot of that plays out.

petemounce · 2020-10-02T23:31:41Z

> IMHO the first thing that needs to be addressed is the jit compilations that occur on every task. From what I understand it should be possible to do a lot of optimizations there without affecting the overall connection architecture. Maybe @jborean93 has some thoughts?

@trondhindenes have you seen https://docs.ansible.com/ansible/latest/user_guide/windows_performance.html ? I contributed that a little while ago. We do this inside of our cloud instance's startup script (and google Windows instances have it out of the box since GoogleCloudPlatform/compute-image-windows#174).

mattclay · 2023-08-09T19:51:02Z

While this is a feature we'd like to implement, we have no plans to do so in the near future.

ansibot removed the needs_triage label Oct 27, 2018

trondhindenes mentioned this issue Oct 28, 2018

Powershell over SSH #25344

Closed

mattclay closed this as completed Aug 9, 2023

ansible locked and limited conversation to collaborators Aug 23, 2023
