Support a long-running listener process on Windows to speed up Ansible #47707

Closed
trondhindenes opened this issue Oct 27, 2018 · 34 comments
Labels
affects_2.8 This issue/PR affects Ansible v2.8 feature This issue/PR relates to a feature request. support:core This issue/PR relates to code supported by the Ansible Engineering Team. windows Windows community

Comments

@trondhindenes
Contributor

SUMMARY

Ansible currently uses "native" winrm. This means that each task Ansible sends to a windows node results in the target node spinning up a new process/runspace to host the command(s) invoked by Ansible. This is very slow.

Although it might be considered breaking the agentless nature of Ansible, it would be good to have an option where Ansible would self-install an ephemeral process that could accept Ansible tasks without spinning up a new process/runspace all the time. This process could self-destruct after a given time of inactivity.

ISSUE TYPE
  • Feature Idea
COMPONENT NAME

winrm

ADDITIONAL INFORMATION

Just an idea, I don't have any more information.

@ansibot
Contributor

ansibot commented Oct 27, 2018

Hi @trondhindenes, thank you for submitting this issue!

@ansibot ansibot added affects_2.8 This issue/PR affects Ansible v2.8 feature This issue/PR relates to a feature request. needs_triage Needs a first human triage before being processed. support:core This issue/PR relates to code supported by the Ansible Engineering Team. windows Windows community labels Oct 27, 2018
@jborean93
Contributor

Hey @trondhindenes this is something we've been wanting to implement with persisted connections. Unfortunately the current persistent code that is used for Network modules (ansible-connection) turned out to be slower than starting a new shell for each task. We were unable to verify at the time why this was the case so we hit a bit of a roadblock for now. I believe @bcoca had some thoughts as to how to persist the connection plugin between tasks but I'm not sure whether they are just thoughts or active plans.

This is definitely something we want to move towards so I'll keep the issue open to start brainstorming ideas.

@ansibot ansibot removed the needs_triage Needs a first human triage before being processed. label Oct 27, 2018
@trondhindenes
Contributor Author

nice, good to hear that there's movement here. I did some research a while ago trying to figure out if it's possible to configure the winrm service in such a way that invocations against it reuse the same process instead of spinning up a new one, but I didn't get anywhere. That said, PowerShell has the notion of "sessions", so I guess it is supported (I didn't dig into the protocol spec for how sessions work, but I'm sure a spec exists).

BTW efforts in this area need to be balanced against the effort of making ssh an alternative to winrm, which could make this whole issue moot.

@trondhindenes
Contributor Author

This seems to document the "session" apis in WinRM:
https://docs.microsoft.com/en-us/windows/desktop/api/Wsman/nf-wsman-wsmancreatesession

@trondhindenes
Contributor Author

trondhindenes commented Oct 28, 2018

did a bit more digging - I have never really looked at the winrm stuff in Ansible before. I see it persists the shell_id in the winrm Connection in Ansible. Does that mean that the Connection object itself is instantiated again and again - once for each task? If so it's tempting to just hack a "persister" using local files or redis or something (custom connection plugin)
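For anyone curious what such a file-based "persister" could look like, here is a minimal sketch. It is not Ansible or pywinrm code: the `ShellIdStore` class, its method names and the JSON file layout are all hypothetical, and it does no locking, so parallel forks writing at the same time would need extra care.

```python
import json
import tempfile
from pathlib import Path


class ShellIdStore:
    """Keeps one shell id per host in a local JSON file (hypothetical helper)."""

    def __init__(self, path):
        self.path = Path(path)

    def _load(self):
        # A missing file simply means nothing has been persisted yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text())

    def get(self, host):
        """Return the stored shell id for host, or None if there is none."""
        return self._load().get(host)

    def put(self, host, shell_id):
        """Remember shell_id for host so that the next task can reuse it."""
        data = self._load()
        data[host] = shell_id
        self.path.write_text(json.dumps(data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        store = ShellIdStore(Path(folder) / "shells.json")
        print(store.get("win01"))          # nothing stored yet
        store.put("win01", "example-shell-id")
        print(ShellIdStore(store.path).get("win01"))
```

A connection plugin would call `get` before opening a shell and `put` after creating one; a stale id would still have to be detected and replaced.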

Sorry for this turning into a blog. After some more testing, I notice that when persisting the "shell id" between tasks there is perhaps a tiny performance gain, but each "command" still spins up a new powershell.exe process, which seems to be the heaviest part. This is the same behavior as when invoking multiple winrm.Session.run_ps commands after each other with the same shell id.

I'll conclude for now that there's nothing to gain from just persisting the shell_id, we need to create an actual ps session to have a long-living process (e.g. a powershell session) in which we can fire off commands to the target.

Another thing I'm noticing when comparing commands from a "native" powershell client (with or without a session) is that powershell.exe is actually never started when a native client performs an operation - everything seems to be done thru a process called wsmprovhost.exe. Wondering if there are some quick wins to gain simply by making pywinrm behave more like native powershell clients when it invokes commands.

@jborean93
Contributor

> Does that mean that the Connection object itself is instantiated again and again - once for each task?

A new connection plugin is created on every task, so for WinRM it will send at minimum these messages:

  • Create Shell
  • Create Process
  • Send Process input
  • Receive process output (there could be multiple of these depending on the time and size)
  • Cleanup Process
  • Close Shell

If we were to share the connection plugin we could potentially eliminate the first and last message but the remaining will still be there and slow. We could use an agent based process to cache modules and keep PowerShell up and running in a local RunspacePool but this won't happen anytime soon (if it ever does).
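To put rough numbers on that, here is a small counting model of the message list above. It is only a sketch: the function name and parameters are made up for illustration, it assumes one "Receive process output" message per task unless told otherwise, and it counts messages only, not their size or latency.

```python
# Hypothetical cost model, not Ansible or pywinrm code: it only counts the
# WinRM messages listed above.
PER_SHELL = ("Create Shell", "Close Shell")
PER_TASK_FIXED = ("Create Process", "Send Process input", "Cleanup Process")


def winrm_message_count(tasks, receives_per_task=1, share_shell=False):
    """Return the minimum number of WinRM messages needed for `tasks` tasks."""
    if tasks < 0 or receives_per_task < 1:
        raise ValueError("tasks must be >= 0 and receives_per_task >= 1")
    per_task = len(PER_TASK_FIXED) + receives_per_task
    # A shared connection opens and closes the shell once for the whole run;
    # otherwise every task pays for its own shell.
    if share_shell:
        shells = 1 if tasks else 0
    else:
        shells = tasks
    return shells * len(PER_SHELL) + tasks * per_task


if __name__ == "__main__":
    for count in (1, 10, 100):
        print(count,
              winrm_message_count(count),
              winrm_message_count(count, share_shell=True))
```

Under these assumptions ten tasks need 60 messages today and 42 with a shared shell, so the per-task process messages still dominate.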

> I'll conclude for now that there's nothing to gain from just persisting the shell_id, we need to create an actual ps session to have a long-living process (e.g. a powershell session) in which we can fire off commands to the target.

I agree, the original winrm connection plugin would see little benefit from persisting the connection plugin. The newer psrp connection plugin is a bit different; it works through the following steps:

  • Create a Runspace Pool
  • Get Runspace Pool details
  • Create Pipeline to execute PS script (could be multiple depending on size of the payload)
  • Get Pipeline output
  • Close Runspace Pool

It also has the added benefit of having PowerShell up and running so the step to create the pipeline is just a matter of it creating a new thread and not a whole new PowerShell process. Right now it is slightly faster than winrm due to this behaviour but we could potentially eke out some more performance. I started looking at using ansible-connection to persist the shell/runspace but in my testing the overhead for ansible-connection outweighed the benefits so had to hold off on that.
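The thread-versus-process difference can be felt with a loose Python analogy. This is not PowerShell or PSRP code: a fresh Python interpreter stands in for a new powershell.exe, a worker thread in a live interpreter stands in for a pipeline in an existing runspace, and all names are illustrative.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor

SNIPPET = "print(sum(range(1000)))"  # stand-in for a small task payload


def run_in_new_process():
    """Start a fresh interpreter for the task (analogous to a new powershell.exe)."""
    done = subprocess.run([sys.executable, "-c", SNIPPET],
                          capture_output=True, text=True, check=True)
    return done.stdout.strip()


def run_in_thread(pool):
    """Hand the same work to a worker thread of the already running interpreter."""
    return str(pool.submit(lambda: sum(range(1000))).result())


def timed(func, repeat):
    """Return the seconds taken to call func `repeat` times."""
    start = time.perf_counter()
    for _ in range(repeat):
        func()
    return time.perf_counter() - start


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        print("new process per task :", timed(run_in_new_process, 5))
        print("thread in live process:", timed(lambda: run_in_thread(pool), 5))
```

Both paths produce the same result; the timings only show the start-up overhead on the machine running the sketch and say nothing about real WinRM or PSRP costs.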

The wsmprovhost.exe is part of PSRP and we can already take advantage of that as stated above. There's still some work to be done here but a lot would need to have some buy in from Ansible to achieve reliably. We could hack something in but then make this more of a nightmare to maintain and potentially open ourselves up to more server side attacks.

@trondhindenes
Contributor Author

trondhindenes commented Oct 28, 2018

Nice, thanks for the input. For us the cpu usage on targets is a bigger problem than speed actually. We try and run t3.micro for nonprod if there isn't any reason for using bigger instances, and they spike hard when we run Ansible against them. So even without the performance (as in speed) gains, a persistent runspace would be enough for us to consider switching connection plugins - at least it's something we'd be more than willing to test.

@trondhindenes
Contributor Author

oh and btw: My sincerest respect to you for diving into this. I spent 30 mins investigating winrm soap messages and had to stop because I was feeling dizzy ;-)

@jborean93
Contributor

@trondhindenes have you tried ansible_connection: psrp with the t3.micro instances? The docs for it are on the psrp connection plugin page. I never really tested it against a cut down host so I'm unsure whether that will help the CPU usage.

> oh and btw: My sincerest respect to you for diving into this. I spent 30 mins investigating winrm soap messages and had to stop because I was feeling dizzy ;-)

Start looking into PSRP and you come across XML embedded in XML, does your head in :P. I'm just happy that it isn't ASN.1 encoded data, had to deal with that for CredSSP auth and that's enough for me.

@trondhindenes
Contributor Author

trondhindenes commented Oct 30, 2018

will try! That said, I think real perf gains will only be achieved by a long-living session - which is not implemented by psrp today I guess. Anyways, I'll try it!

@trondhindenes
Contributor Author

trondhindenes commented Nov 8, 2018

Just wanted to share a little bit more on this one:
I implemented a simple tcp listener using a long-running PS Runspace to see if that would speed things up (code at https://github.com/trondhindenes/AnsibleBackgroundWorker), along with a very rudimentary Ansible connection plugin using regular Python sockets. The speed increase is pretty staggering, it's actually twice as fast as regular winrm.
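The general shape of such a listener can be sketched in Python. To be clear, this is not the code from the linked repository (which uses a PowerShell runspace): the class names, the one-JSON-line-per-task protocol and the `set`/`get` operations are assumptions made up for illustration. It shows a process that keeps state between tasks and shuts itself down after a period of inactivity.

```python
import json
import socket
import socketserver
import threading


class TaskHandler(socketserver.StreamRequestHandler):
    """Handles one task per connection; state lives on the server object."""

    def handle(self):
        request = json.loads(self.rfile.readline())
        state = self.server.state
        if request.get("op") == "set":
            state[request["key"]] = request["value"]
            reply = {"ok": True}
        elif request.get("op") == "get":
            reply = {"ok": True, "value": state.get(request["key"])}
        else:
            reply = {"ok": False, "error": "unknown op"}
        self.wfile.write((json.dumps(reply) + "\n").encode())


class IdleListener(socketserver.TCPServer):
    """TCP listener that stops once no task arrives for idle_seconds."""

    allow_reuse_address = True

    def __init__(self, address, idle_seconds):
        super().__init__(address, TaskHandler)
        self.timeout = idle_seconds  # used by handle_request()
        self.state = {}              # survives between tasks
        self.expired = False

    def handle_timeout(self):
        # Called by handle_request() when the idle period elapses.
        self.expired = True

    def serve_until_idle(self):
        while not self.expired:
            self.handle_request()
        self.server_close()


def send_task(port, request, host="127.0.0.1"):
    """Send one JSON task and return the decoded JSON reply."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall((json.dumps(request) + "\n").encode())
        with conn.makefile("r") as reader:
            return json.loads(reader.readline())


if __name__ == "__main__":
    listener = IdleListener(("127.0.0.1", 0), idle_seconds=2.0)
    port = listener.server_address[1]
    worker = threading.Thread(target=listener.serve_until_idle)
    worker.start()
    print(send_task(port, {"op": "set", "key": "helpers_loaded", "value": True}))
    print(send_task(port, {"op": "get", "key": "helpers_loaded"}))
    worker.join()  # returns once the listener has been idle for 2 seconds
```

The sketch binds to loopback and has no authentication or encryption; a listener reachable over the network would need both.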

cpu-wise tho, it's pretty much the same. Which leads me to think that the way Ansible sends in commands to a Windows node might be inefficient - as far as I can see, a simple win_service task sends in a string of 63450 characters - altho the win_service module itself is only 20k characters.

I'm guessing that Ansible will construct and execute the "global function" set upon each task execution (since there's no state to store the functions if using regular winrm), and that this eats up a lot of cpu. We're essentially re-importing the same set of functions again and again for each task that gets executed on a node.
With a stateful listener, such as the one I've crudely managed to carve out, there is actually a persistent runspace that can "hold" these functions, which would make it possible to only send the actual "module code" - my assumption is that this would help speed up executions further.
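A rough model of that idea: the controller sends a hash of the shared helper code with every task and the helper code itself only when the target does not have it yet. Everything here is hypothetical (the `CachingTarget` class, the SHA-256 key, counting characters rather than bytes), and it assumes that everything in the 63450-character payload beyond the 20k module is reusable helper code, which has not been verified.

```python
import hashlib
from typing import Optional


class CachingTarget:
    """Stands in for a long-lived runspace that remembers helper code."""

    def __init__(self):
        self._helpers = {}

    def has_helpers(self, digest: str) -> bool:
        return digest in self._helpers

    def run(self, module: str, digest: str, helpers: Optional[str] = None) -> None:
        if helpers is not None:
            self._helpers[digest] = helpers  # first task stores the helpers
        elif digest not in self._helpers:
            raise KeyError("helpers not cached; the caller must send them")
        # A real runspace would now execute `module` using the cached helpers.


def send_tasks(target: CachingTarget, helpers: str, module: str, tasks: int) -> int:
    """Return the total characters sent for `tasks` runs of one module."""
    digest = hashlib.sha256(helpers.encode()).hexdigest()
    sent = 0
    for _ in range(tasks):
        # Ship the helpers only when the target has not cached them yet.
        payload = None if target.has_helpers(digest) else helpers
        target.run(module, digest, payload)
        sent += len(module) + len(digest) + (len(payload) if payload else 0)
    return sent


if __name__ == "__main__":
    helpers = "h" * 43450   # assumed shared part: 63450 total minus 20000 module
    module = "m" * 20000
    print("stateless :", 10 * (len(helpers) + len(module)))
    print("persistent:", send_tasks(CachingTarget(), helpers, module, 10))
```

With those assumed sizes, ten tasks drop from 634500 characters to 244090, and the saving grows with every further task.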

Also noticing that the actual module seems to be encoded and sent in a module_wrapper object which I'm guessing gets decoded and executed. I don't know if that process has been perf-tested, but something tells me this might not be the most efficient (cpu-wise) way of sending a few lines of PS over to a node. Even on a 2-cpu vm on my fairly new laptop I can see a very noticeable cpu spike when Ansible does its thing - far more than a simple "Get-Service" call should cause (imho).

@jborean93
Contributor

Yep that's pretty much what I expected and something we are looking to do with persistent connections in the future. Using psrp should mean we don't have to worry about managing and securing our own sockets but may take some hits on the WinRM payloads. Having a persistent runspace means we can:

  • Cache the modules on the remote side
  • Cache the module utils and exec wrappers
  • Cache the C# code that is dynamically compiled, no need to compile this for every run
  • Stop taking hits when starting a new powershell interpreter for normal exec processes
  • Lower the number of WinRM messages sent

@trondhindenes
Contributor Author

trondhindenes commented Nov 8, 2018

Yup. One area that probably needs fine-tuning is environment variable manipulation. This is easy in Ansible today, since envvars get "reloaded" on every task, but with a long-living session of any kind there would probably have to be some special sauce that took care of reloading the session upon changes in envvars. Shouldn't be too bad tho.
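One possible shape for that "special sauce" is a diff-and-apply step that runs before each task. The sketch below is an assumption-laden illustration, not anything Ansible does: `refresh_environment` is a made-up name, and where the current values come from (on Windows, the machine-level and user-level variables are stored in the registry) is left out entirely.

```python
import os


def refresh_environment(session_env, current_env):
    """Update session_env in place to match current_env and return the changes.

    The result maps each changed name to an (old, new) tuple, where None
    stands for "not set".
    """
    changes = {}
    for name in set(session_env) | set(current_env):
        old = session_env.get(name)
        new = current_env.get(name)
        if old == new:
            continue
        changes[name] = (old, new)
        if new is None:
            del session_env[name]      # variable was removed
        else:
            session_env[name] = new    # variable was added or changed
    return changes


if __name__ == "__main__":
    # Snapshot taken when the long-lived session starts.
    session_env = dict(os.environ)
    # Pretend a task added a variable and the fresh values were re-read.
    current_env = dict(session_env, EXAMPLE_ADDED_BY_TASK="1")
    print(refresh_environment(session_env, current_env))
```

A long-lived listener could call something like this between tasks, or only after tasks that report a change.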

In any case, I welcome this. As we're scaling out our Windows estate and trying to keep instance sizes as small as possible, the current limitations in how winrm is used (both the slow execution and high cpu usage) are really painful for us.

@trondhindenes
Contributor Author

trondhindenes commented Nov 8, 2018

btw results from my testing:
same playbook, different connection methods. Target CPU usage is fairly similar for psrp and tcp hack, about 15-20% lower than with winrm. Very noticeable spikes in all 3 cases.

regular winrm
ansible-playbook test_playbook.yml -i hosts -vvv 1,10s user 0,27s system 10% cpu 12,658 total

psrp:
ansible-playbook test_playbook.yml -i hosts -vvv 2,35s user 0,50s system 21% cpu 13,111 total

trond's tcp hack thingy:
ansible-playbook test_playbook.yml -i hosts -vvv 0,87s user 0,20s system 12% cpu 8,896 total
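For reading those figures side by side, here is a small calculation that expresses each method relative to regular winrm. The numbers are copied from the three runs above with decimal commas converted to points; they are single runs timed on the controller, so treat the ratios as indicative only. The helper name is made up.

```python
# Wall-clock ("total") and controller-side CPU ("user") seconds taken from
# the three results above.
RESULTS = {
    "winrm": {"user": 1.10, "total": 12.658},
    "psrp": {"user": 2.35, "total": 13.111},
    "tcp hack": {"user": 0.87, "total": 8.896},
}


def relative_to_baseline(results, metric="total", baseline="winrm"):
    """Return each method's metric as a fraction of the baseline's value."""
    base = results[baseline][metric]
    return {name: values[metric] / base for name, values in results.items()}


if __name__ == "__main__":
    for name, ratio in relative_to_baseline(RESULTS).items():
        print(f"{name}: {ratio:.2f}x the winrm wall-clock time")
    for name, ratio in relative_to_baseline(RESULTS, metric="user").items():
        print(f"{name}: {ratio:.2f}x the winrm controller user CPU time")
```

On these runs the tcp hack finishes in roughly 0.70x the winrm wall-clock time (about 30% less), while psrp comes in at roughly 1.04x.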

@jborean93
Contributor

Agreed, we definitely have a few ideas as to how to go about this. It's the next major goal in our minds for improving the performance and this is the area we are targeting.

@trondhindenes
Contributor Author

awesome, really good news. I'm fine with closing this if you want, please let me know if there's any testing or other type of input I can be of assistance with.

@jborean93
Contributor

Let's keep it open, I don't think we have an issue already for this work so good to have something there. Thanks for your investigations so far, it's been really helpful.

@trondhindenes
Contributor Author

great! Will keep an eye on it!

@jhawkesworth
Contributor

I am happy to test too when there is something to try out.
I don't know enough to make any useful suggestions about implementation. I suppose it's obvious to say, but it would be possible to lash something up using a callback plugin. It wouldn't be something you'd want for a final implementation, but it might be enough to see whether there are performance gains to be had.

@jhawkesworth
Contributor

@dagwieers - I thought you might be interested in this thread too.

@Richard-Payne

8 months down the line, has anything come of this?

@trondhindenes
Contributor Author

I would love to see some movement here, we desperately need a more efficient way of communicating with Windows from Ansible.

@trondhindenes
Contributor Author

trondhindenes commented Dec 11, 2019

[Screenshot from 2019-12-11 16-13-36]
This is a t3.medium instance. We run Ansible against the node, and the playbook takes about 15 minutes even though it has a fairly limited number of steps. The instance is out of cpu credit balance way before Ansible is done. This forces us to scale for provisioning instead of payload, which is a noticeable waste. We would be able to run way smaller instances if Ansible were more efficient (used less cpu).

@doyl54

doyl54 commented Jan 14, 2020

Any news about this issue? Ansible on Windows is much slower than on Linux. We don't know what to do to improve our timings on Windows.

@trondhindenes
Contributor Author

IMHO the first thing that needs to be addressed is the jit compilations that occur on every task. From what I understand it should be possible to do a lot of optimizations there without affecting the overall connection architecture. Maybe @jborean93 has some thoughts?

@doyl54

doyl54 commented Jan 14, 2020

I think Kerberos also accounts for part of the time wasted on Windows nodes.

@trondhindenes
Contributor Author

@doyl54 I haven't seen any indication of that.

@doyl54

doyl54 commented Jan 15, 2020

Actually this is my main issue. I've noticed that every action Ansible performs takes a long time whenever it asks Kerberos whether it has the rights to do it. (I'm using Kerberos authentication to connect to my nodes.)

@jborean93
Contributor

This isn't lost on us; there's just no real way of implementing this in a common way rather than as a hack for Windows. There are tentative plans to start working on those ideas for a future Ansible version, but right now a lot of our development focus for 2.10 is on enabling collections support in Ansible.

@briantist
Contributor

I've been working on a side project that will enable Ansible to work against a JEA endpoint, but it works just as well without JEA so it has a fair amount of overlap with this kind of thing (many of the things @jborean93 mentioned like caching the modules, sending fewer WinRM messages, possibility of pre-compiling the C# utils).

I've got working prototypes, just not anything that's quite in a state to share. The split to collections kind of put a wrench in some of the ideas for how to version it and keep it aligned so I'm still waiting to see how a lot of that plays out.

@petemounce
Contributor

> IMHO the first thing that needs to be addressed is the jit compilations that occur on every task. From what I understand it should be possible to do a lot of optimizations there without affecting the overall connection architecture. Maybe @jborean93 has some thoughts?

@trondhindenes have you seen https://docs.ansible.com/ansible/latest/user_guide/windows_performance.html ? I contributed that a little while ago. We do this inside of our cloud instance's startup script (and google Windows instances have it out of the box since GoogleCloudPlatform/compute-image-windows#174).

@mattclay
Member

mattclay commented Aug 9, 2023

While this is a feature we'd like to implement, we have no plans to do so in the near future.

@mattclay mattclay closed this as completed Aug 9, 2023
@ansible ansible locked and limited conversation to collaborators Aug 23, 2023
9 participants