Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Performance Work

Draft
wants to merge 46 commits into
base: devel
Choose a base branch
from
Draft

WIP Performance Work #64961

wants to merge 46 commits into from

Conversation

jimi-c
Copy link
Member

@jimi-c jimi-c commented Nov 17, 2019

This is an experimental branch for testing performance related improvements.

Depends-On: ansible/ansible-zuul-jobs#245

@jimi-c jimi-c self-assigned this Nov 17, 2019
@jimi-c jimi-c requested review from sivel, bcoca and nitzmahone Nov 17, 2019
@jimi-c jimi-c added code_cleanup performance roadmap_candidate WIP labels Nov 17, 2019
@jimi-c jimi-c force-pushed the ansible_30 branch 4 times, most recently from de82b46 to d2defe0 Compare Nov 20, 2019
@dw
Copy link
Contributor

dw commented Nov 20, 2019

Rather than send task_vars down to worker (massive single-threaded O(config) generation+serialization bottleneck, and in this case filesystem IO -- good luck with NFS), send template-expanded values up from worker to parent (fully parallelized generation and basically free, aside from fiddly code). Take care not to block task scheduling loop while waiting for processed values

@jimi-c
Copy link
Member Author

jimi-c commented Nov 20, 2019

@dw the problem is we need the task_vars in the main proc for a few templated fields (the task name and now the throttle value), so we have to generate them there. I did test just throwing that away and having the worker regenerate them, however that was actually quite a bit slower. Actually just typing that I realize now that since I've moved callbacks to the result side we don't need to template the task name but the throttle field stands :/

@dw
Copy link
Contributor

dw commented Nov 20, 2019

Given throttle is inherently 1 for each host-task combination (Ansible runs no more than one task per host simultaneously), does it make sense for its template to have host-task binding? Even if host binding is desired, IIRC there is precedent elsewhere to template using only the first matched host, although this might be something from an ancient version. For 1000 hosts, eliminating 999 get_vars() per task would be a big deal, especially with a slow vars configuration.

The binding of throttle isn't documented, only that it can be set on a per-task basis, so making it play+task binding and calculating it once might be something that could live in the changelog, if mentioned at all. There is also the option (as for run_once) of verifying throttle is a template before calling get_vars().

For callbacks I use a buffer where events are reordered to ensure on_task_start for the first host is always reported before subsequent callbacks, by delaying them until the worker has reported the template-expanded name. That way plugins always see the 'correct' event ordering

Regardless of strategy the important thing is that dreaded get_vars() call, there is no escaping its blast radius

@jimi-c
Copy link
Member Author

jimi-c commented Nov 20, 2019

@dw I will give that a try, though it's a little more complicated for non-linear strategies (though we can cache the task_vars and look them up per uuid or something). I had originally wanted to move the get_vars call to the worker side, but earlier in my testing it was slower than sending them over via file. I'll try it again though as only generating vars once for the (play, task) context instead of per-host may help.

@jimi-c
Copy link
Member Author

jimi-c commented Nov 20, 2019

Also, with static worker sets, we'd have to ship facts/other vars includes to each worker, which may eat up some of the gain. As is, keeping the call to get_vars in the main proc simplifies things.

@dw
Copy link
Contributor

dw commented Nov 21, 2019

I have a replacement vars manager that solves your facts problem and quite a bit more.. it's something that should be upstream. My experiences with fielding proposals and PRs has been abysmal to this point, primarily because most of your communication does not happen in open spaces. Is there some other way I could approach contributing this to you?

pabelanger added a commit to pabelanger/ansible-zuul-jobs-1 that referenced this issue Nov 23, 2019
We don't want to merge this code, but disable it until we know

  ansible/ansible#64961

is working properly.

Signed-off-by: Paul Belanger <pabelanger@redhat.com>
@pabelanger
Copy link
Contributor

pabelanger commented Nov 23, 2019

recheck

@jimi-c jimi-c force-pushed the ansible_30 branch 3 times, most recently from 0f7fcf9 to 065b348 Compare Dec 6, 2019
@jimi-c jimi-c force-pushed the ansible_30 branch 2 times, most recently from 0f2187f to 530fe3b Compare Dec 7, 2019
@ansible ansible deleted a comment from ansible-zuul bot Dec 7, 2019
@ansible ansible deleted a comment from ansible-zuul bot Dec 7, 2019
jimi-c added 15 commits Dec 8, 2019
One of the performance improvements from perf_idea was to make the DataLoader
class init with the basedir unfracked. This prevents a LOT of os.path calls
later, but it does mean the DataLoader will never deal with relative paths in
methods like the dwim searches. Overall this should be better as it removes a
potential bug with relative pathing, but it does have implications for testing.

These tests were disabled because they're probably not required anymore. They
could be shifted to test against the $CWD of the tests but we may not care.
* Temporarily disabled direct use of send_callback as there are serialization
  issues with several of the objects being sent across the Queues
* args was not set to always post validate?
* Fixing botmeta since we replaced the old plugin loader
* Fixing role duplication detection and handling
* Cleaning up pep and other sanity issues
This also cleans up some other code, which was used to set the preferred
extension type based on the connection plugin.
jimi-c added 3 commits Dec 8, 2019
* Modifies include_vars to use new role data fields
* Fixing templating of loop control fields
* Feeding include_vars info back to workers via updates
@ansibot
Copy link
Contributor

ansibot commented Feb 8, 2021

@jimi-c This PR was evaluated as a potentially problematic PR for the following reasons:

  • More than 50 changed files.

Such PR can only be merged by human. Contact a Core team member to review this PR on IRC: #ansible-devel on irc.freenode.net

click here for bot help

@ansibot ansibot added affects_2.10 has_issue needs_rebase needs_triage pre_azp labels Feb 8, 2021
@samdoran samdoran removed the needs_triage label Feb 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects_2.10 code_cleanup has_issue needs_rebase performance pre_azp roadmap_candidate WIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants