Prototype vars and inventory dump performance tweaks #79687
base: devel
Demo elimination of problematic disk I/O that slows down host/group vars resolution and inventory dumps.

PluginLoader.all():
* makes heavy use of globbing to find builtin and legacy plugins
* does not cache path traversal info or missing paths, only source for loaded plugins
* does not cache config defs for loaded plugins

host_group_vars vars plugin:
* overuses `os.path.realpath()`, which is incredibly expensive; normalize paths earlier, or maybe not at all (the penalty for duplicates is likely much less than the aggregate cost of the calls)
* does not cache missing paths/files

utils/path.py basedir() helper:
* normalize with `abspath()` earlier or cache the normalized result; `abspath()` calls are very expensive and occur on every `get_vars` call

utils/vars.py _validate_mutable_mappings() helper:
* this gets called millions of times during combine_vars; normalize things earlier or catch the immutability problems at the call sites (the `isinstance` calls are surprisingly expensive when called so frequently)

vars/clean.py clean_facts():
* excessive (and incomplete for collections!) use of PluginLoader.all() with connection plugins to get var prefixes to clean
* cache the needed dynamic info

vars/plugins.py get_vars_from_path():
* excessive use of PluginLoader.all() for vars plugins (cache findings)
* excessive creation of vars plugin instances; builtin host_group_vars is basically stateless. Either cache instances or cache vars_plugins configs; the plugin setup and config is what's killing us when this is called millions of times.
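The `_validate_mutable_mappings()` point can be illustrated with a minimal sketch, assuming the caller already holds every layer it is about to combine: check each layer's type once at the call site, then merge in a tight loop that does no `isinstance()` work. `combine_layers` is a hypothetical helper, not Ansible's `combine_vars()`, and it models only shallow "replace" semantics.

```python
from collections.abc import MutableMapping


def combine_layers(layers):
    """Combine var layers left to right; later layers win on key conflicts.

    Hypothetical helper: each layer's type is checked exactly once, up front,
    so the merge loop itself does no isinstance() work. Only shallow
    "replace" semantics are modeled, not a recursive hash merge.
    """
    for index, layer in enumerate(layers):
        if not isinstance(layer, MutableMapping):
            raise TypeError(
                f"layer {index} must be a mutable mapping, not {type(layer).__name__}"
            )
    result = {}
    for layer in layers:
        result.update(layer)  # no per-merge validation in the hot loop
    return result
```

Compared with a pairwise helper that validates both arguments on every call, this performs one check per layer, and the intermediate result (a plain dict built here) is never re-checked.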
this is based on ansible#32609, but now the cache is cleared when a new directory is added to the loader

* avoid calling `glob.glob()` when looking for py files in variable folders for each host listed in the inventory; this improves performance by 30% (in 2017, for an unspecified example)

I added a global cache so all plugin loaders stay in sync, like the other caches; not sure if this is wanted. ansible#79687 also identifies this as a hot code path.
this is based on ansible#32609

* avoid calling `glob.glob()` when looking for py files in variable folders for each host listed in the inventory

In addition, I added a global cache so all plugin loaders stay in sync (not sure if this is wanted), and reset the cache when new directories are added to the loader, like the other caches. ansible#79687 also identifies this as a hot code path.

Co-authored-by: Sloane Hertel <19572925+s-hertel@users.noreply.github.com>
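A rough sketch of the glob-free lookup described in these commits, assuming a plain directory scan is an acceptable replacement for `glob.glob()`: each folder is listed once with `os.scandir()`, the result (including an empty result for a missing folder) is kept in a module-level cache shared by all loaders, and adding a directory clears that cache. `py_files_in`, `_PY_FILES_CACHE`, and `Loader` are illustrative names, not the code from ansible#32609.

```python
import os

# Hypothetical module-level cache shared by every loader instance, so all
# loaders stay in sync (the "global cache" idea from the commits above).
_PY_FILES_CACHE: dict = {}


def py_files_in(dirpath: str) -> tuple:
    """Return sorted names of regular .py files in dirpath, scanning it at most once."""
    if dirpath in _PY_FILES_CACHE:
        return _PY_FILES_CACHE[dirpath]
    try:
        with os.scandir(dirpath) as entries:
            found = tuple(sorted(
                entry.name for entry in entries
                if entry.name.endswith(".py") and entry.is_file()
            ))
    except OSError:
        # Missing or unreadable folder: cache the empty result as well, so it
        # is not probed again for every host in the inventory.
        found = ()
    _PY_FILES_CACHE[dirpath] = found
    return found


class Loader:
    """Hypothetical stand-in for the directory handling of a plugin loader."""

    def __init__(self):
        self.dirs = []

    def add_directory(self, path: str) -> None:
        self.dirs.append(path)
        # Reset the shared cache whenever a search directory is added,
        # like the loader's other caches.
        _PY_FILES_CACHE.clear()
```

The trade-off is staleness: files created after a folder has been scanned are not seen until the cache is reset.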
SUMMARY
Demo elimination of unnecessary disk I/O that slows down host/group vars resolution and inventory dumps.
fixes #79664 (poorly: these changes are not production-ready and only show the issues and suggest fixes)
Running `ansible-inventory` with these changes against this inventory script with 100 groups of 500 hosts each (50k hosts total) reduces the inventory dump wall clock time by roughly an order of magnitude:

devel:

this branch:
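The linked inventory script is not reproduced here; as an assumption, a dynamic inventory of the same shape (100 groups of 500 hosts) could look like the sketch below. It follows the usual inventory-script contract: `--list` prints every group plus `_meta.hostvars`, so Ansible does not need a separate `--host` call per host. Group and host names are made up.

```python
#!/usr/bin/env python3
"""Hypothetical benchmark inventory: NUM_GROUPS groups of HOSTS_PER_GROUP hosts."""
import json
import sys

NUM_GROUPS = 100
HOSTS_PER_GROUP = 500


def build_inventory(num_groups=NUM_GROUPS, hosts_per_group=HOSTS_PER_GROUP):
    """Return the --list JSON structure, including _meta.hostvars."""
    hostvars = {}
    inventory = {"_meta": {"hostvars": hostvars}}
    for g in range(num_groups):
        hosts = [f"host{g}_{h}" for h in range(hosts_per_group)]
        inventory[f"group{g}"] = {"hosts": hosts}
        for host in hosts:
            hostvars[host] = {}  # empty vars; _meta avoids one --host call per host
    return inventory


def main(args):
    if args == ["--list"]:
        print(json.dumps(build_inventory()))
        return 0
    if len(args) == 2 and args[0] == "--host":
        print(json.dumps({}))  # everything was already provided via _meta
        return 0
    print("usage: inventory.py --list | --host <hostname>", file=sys.stderr)
    return 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Ansible always passes --list or --host; running it bare does nothing.
    sys.exit(main(sys.argv[1:]))
```

To compare branches, make the file executable and time something like `ansible-inventory -i inventory.py --list > /dev/null` on each checkout.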
Some notes about the rationale for the changes:
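PluginLoader.all():
* makes heavy use of globbing to find builtin and legacy plugins
* does not cache path traversal info or missing paths, only source for loaded plugins
* does not cache config defs for loaded plugins
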
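One way to cache path traversal info and missing paths is sketched below, under the assumption that plugins are resolved by name against an ordered list of search directories. The cache remembers misses as well as hits, so repeated lookups of a plugin that does not exist cost nothing after the first scan, and adding a search directory invalidates it. `CachingFinder` is hypothetical and much simpler than `PluginLoader`.

```python
import os
from typing import Dict, List, Optional


class CachingFinder:
    """Hypothetical plugin finder that caches hits *and* misses by name."""

    def __init__(self, search_paths: List[str]):
        self._search_paths = list(search_paths)
        self._cache: Dict[str, Optional[str]] = {}
        self.probes = 0  # counts filesystem checks, for illustration only

    def find(self, name: str) -> Optional[str]:
        """Return the first <path>/<name>.py that exists, or None."""
        if name in self._cache:
            return self._cache[name]  # None here is a remembered miss
        found = None
        for base in self._search_paths:
            candidate = os.path.join(base, name + ".py")
            self.probes += 1
            if os.path.isfile(candidate):
                found = candidate
                break
        self._cache[name] = found
        return found

    def add_directory(self, path: str) -> None:
        """Add a search path; a cached miss may now be a hit, so reset."""
        self._search_paths.append(path)
        self._cache.clear()
```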
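host_group_vars vars plugin:
* overuses `os.path.realpath()`, which is incredibly expensive; normalize paths earlier, or maybe not at all (the penalty for duplicates is likely much less than the aggregate cost of the calls)
* does not cache missing paths/files

utils/path.py basedir() helper:
* normalize with `abspath()` earlier or cache the normalized result; `abspath()` calls are very expensive and occur on every `get_vars` call

utils/vars.py _validate_mutable_mappings() helper:
* this gets called millions of times during combine_vars; normalize things earlier or catch the immutability problems at the call sites (the `isinstance` calls are surprisingly expensive when called so frequently)

vars/clean.py clean_facts():
* excessive (and incomplete for collections!) use of PluginLoader.all() with connection plugins to get var prefixes to clean
* cache the needed dynamic info
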
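For the `os.path.realpath()` and `basedir()` items, one option besides normalizing earlier is to memoize the normalization with `functools.lru_cache`. This is only a sketch: `cached_realpath` and `cached_basedir` are hypothetical helpers (the latter is not `ansible.utils.path.basedir()`), and the caches are only valid if the working directory and the symlinks involved do not change during the run.

```python
import functools
import os


@functools.lru_cache(maxsize=None)
def cached_realpath(path: str) -> str:
    """Resolve symlinks once per distinct input string."""
    return os.path.realpath(path)


@functools.lru_cache(maxsize=None)
def cached_basedir(source: str) -> str:
    """Absolute directory for an inventory source, computed once per input.

    Hypothetical simplification: a directory maps to itself, anything else
    to its parent directory. Symlinks are not resolved.
    """
    path = os.path.abspath(source)
    return path if os.path.isdir(path) else os.path.dirname(path)
```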
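vars/plugins.py get_vars_from_path():
* excessive use of PluginLoader.all() for vars plugins (cache findings)
* excessive creation of vars plugin instances; builtin host_group_vars is basically stateless. Either cache instances or cache vars_plugins configs; the plugin setup and config is what's killing us when this is called millions of times.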
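Because builtin host_group_vars is basically stateless, its instance can be built once and reused. The sketch below assumes that holds for every plugin it caches; `StatelessVarsPlugin`, `get_plugin`, and `collect_vars` are hypothetical stand-ins, and the real vars plugin `get_vars()` signature differs.

```python
from typing import Callable, Dict, List


class StatelessVarsPlugin:
    """Hypothetical stand-in for a stateless vars plugin such as host_group_vars."""

    instances_created = 0

    def __init__(self, name: str):
        StatelessVarsPlugin.instances_created += 1
        self.name = name  # in a real plugin, option/config setup would cost time here

    def get_vars(self, path: str, entity: str) -> dict:
        return {"source": f"{self.name}:{path}", "entity": entity}


_INSTANCES: Dict[str, StatelessVarsPlugin] = {}


def get_plugin(name: str,
               factory: Callable[[str], StatelessVarsPlugin] = StatelessVarsPlugin
               ) -> StatelessVarsPlugin:
    """Create each named plugin once, then reuse the same instance."""
    plugin = _INSTANCES.get(name)
    if plugin is None:
        plugin = factory(name)
        _INSTANCES[name] = plugin
    return plugin


def collect_vars(plugin_names: List[str], path: str, entities: List[str]) -> List[dict]:
    """Hypothetical hot loop: many entities, but no per-call plugin construction."""
    results = []
    for entity in entities:
        for name in plugin_names:
            results.append(get_plugin(name).get_vars(path, entity))
    return results
```

A plugin that keeps per-call state, or whose options can change between calls, would need its cache key to include that configuration, or no instance cache at all.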
ISSUE TYPE
COMPONENT NAME
ansible-inventory, vars plugins