Large inventory loading performance enhancement: avoid calling glob.glob() to search for .py files in the loader.py all() function for each host #32609
Commit: "… for each host listed in the inventory. Improves performance by 30%." (force-pushed from e6748e5 to bd70ab9)
This should already be solved by our directory cache at this point: each task should not retrigger glob.glob(); that should only happen when an include/import adds a new directory to the search path. Assigned to verify.
This is based on ansible#32609, but now the cache is cleared when a new directory is added to the loader.
* Avoid calling glob.glob() when looking for .py files in variable folders for each host listed in the inventory. This improved performance by 30% (measured in 2017 on an unspecified example).
* I added a global cache so all plugin loaders stay in sync, like the other caches; not sure if this is wanted.
ansible#79687 also identifies this as a hot code path.
This is based on ansible#32609.
* Avoid calling glob.glob() when looking for .py files in variable folders for each host listed in the inventory.
In addition, I added a global cache so all plugin loaders stay in sync (not sure if this is wanted), and reset the cache when new directories are added to the loader, like the other caches. ansible#79687 also identifies this as a hot code path.
Co-authored-by: Sloane Hertel <19572925+s-hertel@users.noreply.github.com>
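The "global cache, reset when a directory is added" idea from this commit message can be sketched roughly as follows. `PluginPathLoader`, `_PY_FILES_CACHE` and the helpers are hypothetical names for illustration; this is a simplified loader, not Ansible's actual PluginLoader. The cache is a module-level dict, so every loader instance reuses the same directory scans, and it is cleared whenever any loader gains a new directory.

```python
import glob
import os
import tempfile

# Hypothetical module-level cache shared by every loader instance:
# maps a directory path to the sorted list of .py files found there.
_PY_FILES_CACHE = {}


class PluginPathLoader:
    """Simplified, hypothetical loader; not Ansible's PluginLoader."""

    def __init__(self, directories=()):
        self._directories = list(directories)

    def add_directory(self, directory):
        # The search path changed: conservatively drop every cached scan so
        # the next lookup sees the filesystem as it is now.
        if directory not in self._directories:
            self._directories.append(directory)
            _PY_FILES_CACHE.clear()

    def py_files(self):
        found = []
        for d in self._directories:
            if d not in _PY_FILES_CACHE:
                # Only hit the filesystem on a cache miss.
                _PY_FILES_CACHE[d] = sorted(glob.glob(os.path.join(d, "*.py")))
            found.extend(_PY_FILES_CACHE[d])
        return found


def touch(path):
    open(path, "w").close()


def names(paths):
    return [os.path.basename(p) for p in paths]


with tempfile.TemporaryDirectory() as root:
    a, b = os.path.join(root, "a"), os.path.join(root, "b")
    os.mkdir(a)
    os.mkdir(b)
    touch(os.path.join(a, "x.py"))

    loader1 = PluginPathLoader([a])
    loader2 = PluginPathLoader([a])

    first = names(loader1.py_files())      # scans a
    touch(os.path.join(a, "y.py"))
    stale = names(loader2.py_files())      # served from the shared cache
    touch(os.path.join(b, "z.py"))
    loader1.add_directory(b)               # clears the shared cache
    refreshed = names(loader2.py_files())  # rescans a
    combined = names(loader1.py_files())   # a from cache, b scanned

    print(first, stale, refreshed, combined)
```

The `stale` result shows the trade-off: between resets, a file created in an already-scanned directory is not seen. The sketch assumes that is acceptable because plugin directories only change during a run through add_directory().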
I confirmed this is still happening, and I've opened a refreshed version of this PR, #82448; sorry for not doing it sooner. The example inventory you provided now takes only about 10 seconds on devel, and 5 seconds on the refreshed PR!
Closing in favour of #82448.
SUMMARY
I'm working with a customer who has a very large inventory. Recent fixes to devel have reduced inventory load time from 4 minutes to 30 seconds. I ran vmprof to look for additional performance savings and noticed that glob.glob() is being called several times in the all() function in lib/ansible/plugins/loader.py. There seems to be no need for this, since the .py files being polled are the same for each host listed in the inventory. Preventing glob.glob() from being called for each host listed in the inventory improves performance by 30%.
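A minimal sketch of why this matters, assuming the per-host work can be modelled as a function that lists the .py files in a fixed set of plugin directories. `py_files_in`, `load_vars_for_host` and the directory names are hypothetical, not Ansible's code. Memoizing the scan with functools.lru_cache makes the number of glob.glob() calls depend on the number of directories rather than the number of hosts.

```python
import functools
import glob
import os
import tempfile

# Counts how many times the (expensive) filesystem scan actually runs.
scan_count = 0


@functools.lru_cache(maxsize=None)
def py_files_in(directory):
    """Return the sorted .py files in `directory`, scanning the disk once per directory."""
    global scan_count
    scan_count += 1
    return tuple(sorted(glob.glob(os.path.join(directory, "*.py"))))


def load_vars_for_host(host, plugin_dirs):
    # Hypothetical stand-in for the per-host work in the inventory loader:
    # every host asks for the same plugin files, so cached results are reused.
    files = []
    for d in plugin_dirs:
        files.extend(py_files_in(d))
    return files


with tempfile.TemporaryDirectory() as root:
    plugin_dirs = []
    for name in ("vars_a", "vars_b", "vars_c"):
        d = os.path.join(root, name)
        os.mkdir(d)
        for mod in ("one.py", "two.py"):
            open(os.path.join(d, mod), "w").close()
        plugin_dirs.append(d)

    hosts = [f"host{i:05d}" for i in range(1000)]
    for h in hosts:
        found = load_vars_for_host(h, plugin_dirs)

    print(scan_count, len(found))
```

With 1000 hosts and 3 directories this performs 3 scans instead of 3000. A real loader would still need to invalidate the cache (here, `py_files_in.cache_clear()`) whenever a new directory is added, as the review comments point out.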
ISSUE TYPE
COMPONENT NAME
lib/ansible/plugins/loader.py
ANSIBLE VERSION
ADDITIONAL INFORMATION
Because I cannot use the customer's inventory output, I created a simulation of their output. Here is the script to generate the inventory used to test this patch. The script generates 1001 groups with an average group size of 19 and a total host count of 19999. Each host has 99 host variables defined.
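The original generator script is not reproduced on this page. A hypothetical stand-in that produces an inventory of roughly the same shape (1001 groups, 19999 hosts, 99 variables per host) is sketched below. The INI layout, the host, group and variable names, and the round-robin group assignment are all assumptions, so group sizes (about 20 here) will not exactly match the original script's average of 19.

```python
def generate_inventory(n_groups=1001, n_hosts=19999, n_vars=99):
    """Return an Ansible INI-format inventory as a string.

    Hosts are assigned round-robin to groups (each host in exactly one group),
    and every host carries n_vars inline host variables.
    """
    groups = [[] for _ in range(n_groups)]
    for i in range(n_hosts):
        groups[i % n_groups].append(f"host{i:05d}")

    # The variable string is identical for every host, so build it once.
    host_vars = " ".join(f"var{v:02d}=value{v:02d}" for v in range(n_vars))

    lines = []
    for g, members in enumerate(groups):
        lines.append(f"[group{g:04d}]")
        for host in members:
            lines.append(f"{host} {host_vars}")
        lines.append("")  # blank line between groups
    return "\n".join(lines)


print(generate_inventory(n_groups=2, n_hosts=3, n_vars=1))
```

Writing the default-sized result to a file and loading it with `ansible-inventory -i FILE --list` would be one way to time inventory parsing; the measurements on this page were taken with backport.py instead.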
The vmprof profile before the patch looks as follows:
Before the patch, loading this inventory using backport.py takes

25.01s user 2.61s system 99% cpu 27.824 total

on my laptop.

After the patch, the inventory loads in

19.25s user 1.56s system 99% cpu 20.948 total
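To put the two `time` lines in perspective, this sketch computes both the fraction of wall-clock time saved and the speedup factor from the `total` fields. The parsing helper is hypothetical and assumes the exact `time` output format shown above. For this run it gives about 24.7% less wall-clock time, or roughly a 1.33x speedup.

```python
def parse_time_total(line):
    """Extract the wall-clock seconds that precede the word 'total' in a `time` line."""
    fields = line.split()
    return float(fields[fields.index("total") - 1])


before = parse_time_total("25.01s user 2.61s system 99% cpu 27.824 total")
after = parse_time_total("19.25s user 1.56s system 99% cpu 20.948 total")

reduction = 1 - after / before  # fraction of wall-clock time saved
speedup = before / after        # how many times faster the patched run is

print(f"wall-clock time reduced by {reduction:.1%}")
print(f"speedup: {speedup:.2f}x")
```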