
ansible: poor performance with multiple targets #250

Open
dw opened this Issue May 15, 2018 · 4 comments

dw (Owner) commented May 15, 2018

Per the user report in #249, now that >100 targets are possible, growing pains are starting to rear their heads.

Likely low hanging fruit:

  • 1-syscall-per-msg overhead that can be coalesced easily (see the sketch at the end of this comment)
  • 5-syscalls-per-Latch.get() overhead due to the work in #249
  • 1 connection multiplexer per CPU core on the controller
  • ModuleFinder inefficiencies that are hidden by Ansiballz caching
  • dict(hostvars) in connection.py

Hard mode:

  • VariableManager is one of the most expensive parts of Ansible (the code is from hell itself); we call it repeatedly to form the chain of configurations for connection delegation.
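
For the first bullet, a minimal sketch of what "coalescing" could look like: queue outgoing frames instead of issuing one write() per message, then emit the whole batch with a single writev(). BufferedStream and its method names are illustrative only, not Mitogen's actual Stream API, and partial writes are ignored for brevity.

```python
# Toy illustration of coalescing one-write-per-message into one syscall
# per flush. Hypothetical names; POSIX-only (os.writev).
import os


class BufferedStream(object):
    def __init__(self, fd):
        self.fd = fd
        self._pending = []          # frames queued since the last flush

    def enqueue(self, frame):
        # Instead of os.write(self.fd, frame) here (one syscall per
        # message), just remember the frame.
        self._pending.append(frame)

    def flush(self):
        # Emit everything queued so far with a single writev() call
        # (partial writes ignored for brevity).
        if self._pending:
            os.writev(self.fd, self._pending)
            del self._pending[:]


if __name__ == '__main__':
    r, w = os.pipe()
    stream = BufferedStream(w)
    for i in range(3):
        stream.enqueue(b'msg%d\n' % i)
    stream.flush()                  # 3 messages, 1 syscall
    print(os.read(r, 64))
```
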
dw (Owner) commented May 17, 2018

Hi @sharph, if you're feeling adventurous, I have 3 Ansible patches in https://github.com/dw/ansible/tree/dmw that reduce runtime by almost 40% for 10-host runs. These don't explain why Mitogen is slower, but I'm still struggling to profile that, because Ansible sucks so much in these cases :)

Still "work in progress", but if you're interested in kicking the tires, it would be very useful.

Edit: whups, the complete branch is https://github.com/dw/ansible/tree/dmw .. devel is just a doc patch

dw (Owner) commented May 29, 2018

Hi @sharph, can you please let me know the latency between your controller machine and the target machines? Thanks!

sharph commented May 29, 2018

It varies, but can be as good as 20-30ms and as bad as 300-400ms roundtrip depending on the target.

@dw dw added the user-reported label Jun 10, 2018

@dw dw added the target:v0.3 label Aug 11, 2018

dw (Owner) commented Nov 1, 2018

Hi @sharph,

Long time no update :) I have not stopped working on very large run performance, but unfortunately most of the important work cannot exist in a third-party extension.

Meanwhile, over the past months quite a few big fixes have been found -- two in particular were addressed in 0.2.3: one relating to context-switching overhead due to the use of threading, and one that concentrated a lot of redundant CPU usage in the connection multiplexer process.

I also have a separate branch that supports per-CPU multiplexer processes, however it is a little stale at this point. If you are still interested in this kind of work, and the newer fixes don't help much -- please say! I can at least see if the per-CPU work helps in your case, and if so, we could see about merging it into the stable series.
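
For reference, here is a rough sketch of how connections could be spread across per-CPU multiplexer processes by hashing each target to a fixed worker. handle_connections, pick_mux and the queue-based protocol are hypothetical and only illustrate the distribution idea, not the actual branch.

```python
# Rough sketch: one multiplexer worker per core, each target routed to a
# fixed worker by hashing its name. Hypothetical names throughout.
import hashlib
import multiprocessing


def handle_connections(queue):
    # Stand-in for a multiplexer main loop: service whatever targets
    # were routed to this process, until the None sentinel arrives.
    for target in iter(queue.get, None):
        print('%s handling %s' % (multiprocessing.current_process().name, target))


def pick_mux(target, n):
    # Stable hash so the same host always lands on the same multiplexer.
    return int(hashlib.md5(target.encode()).hexdigest(), 16) % n


if __name__ == '__main__':
    ncpus = multiprocessing.cpu_count()
    queues = [multiprocessing.Queue() for _ in range(ncpus)]
    workers = [
        multiprocessing.Process(target=handle_connections, args=(q,), name='mux-%d' % i)
        for i, q in enumerate(queues)
    ]
    for w in workers:
        w.start()

    targets = ['host%03d' % i for i in range(10)]
    for t in targets:
        queues[pick_mux(t, ncpus)].put(t)

    for q in queues:
        q.put(None)                 # sentinel: tell each mux to exit
    for w in workers:
        w.join()
```
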
