Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

large inventory loading performance enhancement. avoid calling glob.glob() search for .py files in loader.py all() function for each host. #32609

Conversation

skamithi
Copy link
Contributor

@skamithi skamithi commented Nov 7, 2017

SUMMARY

Working a customer who has a very large inventory. Recent fixes to devel help reduce inventory load time from 4 minutes down to 30 seconds. Ran vmprof to see how to get any additional performance savings and noticed that glob.glob() is been called several times in the lib/ansible/plugins/loader.py all() function. did not see the need for this as the .py files been polled are the same for each host listed in the inventory. preventing glob.glob() from been called for each host listed in the inventory improves performance by 30%
-->

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

lib/ansible/plugins/loader.py

ANSIBLE VERSION
ansible 2.5.0
  config file = None
  configured module search path = [u'/home/ansible/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible
  executable location = /home/ansible/inv-test/bin/ansible
  python version = 2.7.12 (default, Nov 19 2016, 06:48:10) [GCC 5.4.0 20160609]
ADDITIONAL INFORMATION

Because I cannot use the customer's inventory output, I created a simulation of their output. Here is the script to generate the inventory used to test this patch. The script generates 1001 groups with an average group size of 19 and a total host count of 19999. Each host has 99 host variables defined.

#!/usr/bin/python

import json

host_count = 20000
json_output = {
    'all': {'hosts': []},
    '_meta': {'hostvars': {}}
}
base_name = 'server'

metadata = {}

for i in range(1, 100):
    _key = "blah" + str(i)
    metadata[_key] = "blahblahblah"

groups = 1000
host_count = 20000
group_size = host_count / groups
group_number = 0
_group_size = 0
_new_group_name = 'group_' + str(group_number)

for i in range(1, host_count):
    if _group_size >= group_size:
        group_number += 1
        _new_group_name = 'group_' + str(group_number)
        _group_size = 0
    servername = base_name + str(i)
    json_output['all']['hosts'].append(servername)
    json_output['_meta']['hostvars'][servername] = metadata

    if not json_output.get(_new_group_name):
        json_output[_new_group_name] = {'hosts': []}
    json_output.get(_new_group_name).get('hosts').append(servername)
    _group_size += 1
print(json.dumps(json_output))

The vmprofile before the patch looks as follows

vmprof output:
%:      name:                                       location:
100.0%  run_path                                    /usr/lib/python2.7/runpy.py:235
100.0%  _run_module_code                            /usr/lib/python2.7/runpy.py:75
100.0%  _run_module_as_main                         /usr/lib/python2.7/runpy.py:147
100.0%  _run_code                                   /usr/lib/python2.7/runpy.py:62
100.0%  main                                        /home/ansible/inv-test/lib/python2.7/site-packages/vmprof/__main__.py:30
100.0%  <module>                                    backport.py:19
100.0%  <module>                                    /home/ansible/inv-test/lib/python2.7/site-packages/vmprof/__main__.py:1
98.7%   <module>                                    /home/ansible/inv-test/bin/ansible:21
98.2%   run                                         backport.py.py:121
65.7%   json_inventory                              backport.py.py:257
65.1%   _get_host_variables                         backport.py.py:194
65.0%   get_vars                                    /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/manager.py:204
28.9%   all                                         /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/plugins/loader.py:405
28.0%   _plugins_play                               /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/manager.py:297
27.6%   _plugins_inventory                          /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/manager.py:284
22.6%   _get_plugin_vars                            /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/manager.py:265
22.4%   get_vars                                    /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/plugins/vars/host_group_vars.py:60
20.1%   glob                                        /usr/lib/python2.7/glob.py:18
18.7%   iglob                                       /usr/lib/python2.7/glob.py:29
16.0%   parse_sources                               /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/inventory/manager.py:194
16.0%   __init__                                    /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/inventory/manager.py:121
16.0%   _play_prereqs                               /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/cli/__init__.py:776
15.3%   parse_source                                /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/inventory/manager.py:218
15.2%   parse                                       /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/plugins/inventory/script.py:64
14.7%   n:PyObject_Call:0:-                        
13.2%   dump                                        backport.py.py:182
13.2%   dumps                                       /usr/lib/python2.7/json/__init__.py:193
12.8%   realpath                                    /home/ansible/inv-test/lib/python2.7/posixpath.py:372
12.8%   encode                                      /usr/lib/python2.7/json/encoder.py:186
12.1%   glob1                                       /usr/lib/python2.7/glob.py:71
11.9%   _iterencode                                 /usr/lib/python2.7/json/encoder.py:417
11.0%   n:PyEval_EvalCodeEx:0:-                    
10.6%   _iterencode_dict                            /usr/lib/python2.7/json/encoder.py:341
10.0%   _joinrealpath                               /home/ansible/inv-test/lib/python2.7/posixpath.py:380
9.8%    all_plugins_play                            /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/manager.py:312
9.2%    all_plugins_inventory                       /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/manager.py:309
9.2%    groups_plugins_play                         /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/manager.py:323
9.0%    groups_plugins_inventory                    /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/manager.py:319
8.8%    populate_host_vars                          /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/plugins/inventory/__init__.py:83
7.2%    set_variable                                /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/inventory/data.py:212
7.2%    n:<native symbol 0x4ddd31>:0:-             
6.4%    n:<native symbol 0x51e191>:0:-             

Before the patch

this inventory using backport.py takes 25.01s user 2.61s system 99% cpu 27.824 total on my laptop.

After the patch

Inventory loads in 19.25s user 1.56s system 99% cpu 20.948 total

@skamithi skamithi changed the title large inventory loading performance enhancement. avoid calling glob.glob() search for .py files in inventory var module search for each host. large inventory loading performance enhancement. avoid calling glob.glob() search for .py files in loader.py all() function for each host. Nov 7, 2017
@ansibot ansibot added affects_2.5 This issue/PR affects Ansible v2.5 bugfix_pull_request needs_triage Needs a first human triage before being processed. python3 support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels Nov 7, 2017
@ansibot
Copy link
Contributor

ansibot commented Nov 7, 2017

The test ansible-test sanity --test pep8 [?] failed with the following error:

lib/ansible/plugins/loader.py:417:5: E303 too many blank lines (2)

click here for bot help

@ansibot ansibot added the needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. label Nov 7, 2017
for each host listed in the inventory. improves performance by 30%.
@skamithi skamithi force-pushed the inventory_script_performance_enhancement branch from e6748e5 to bd70ab9 Compare November 7, 2017 11:57
@jborean93 jborean93 removed the needs_triage Needs a first human triage before being processed. label Nov 8, 2017
@ansibot ansibot removed the needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. label Nov 16, 2017
@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Nov 24, 2017
@ansibot ansibot added bug This issue/PR relates to a bug. performance and removed bugfix_pull_request labels Mar 2, 2018
@abadger abadger removed the python3 label Mar 20, 2018
@ansibot ansibot added needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. labels Mar 28, 2018
@ansibot ansibot added the new_plugin This PR includes a new plugin. label May 23, 2018
@ansibot ansibot added support:community This issue/PR relates to code supported by the Ansible community. and removed support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels Sep 20, 2018
@ansibot ansibot added support:core This issue/PR relates to code supported by the Ansible Engineering Team. and removed support:community This issue/PR relates to code supported by the Ansible community. labels Nov 26, 2018
@bcoca bcoca requested a review from s-hertel August 23, 2019 15:24
@bcoca
Copy link
Member

bcoca commented Aug 23, 2019

this should be solved by our directory cache at this point, each task should not retrigger glob.glob, only when an include/import happens and adds new directory to pathing.

assigned to verify

@bcoca bcoca added the needs_verified This issue needs to be verified/reproduced by maintainer label Aug 23, 2019
@ansibot ansibot removed the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Dec 6, 2020
@ansibot ansibot added the pre_azp This PR was last tested before migration to Azure Pipelines. label Dec 6, 2020
@ansibot ansibot removed the has_issue label Jul 12, 2023
@ansibot ansibot added needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html and removed needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html labels Oct 24, 2023
s-hertel pushed a commit to s-hertel/ansible that referenced this pull request Dec 20, 2023
this is based on ansible#32609, but now
the cache is cleared when a new directory is added to the loader

* avoid calling glob.glob() when looking for py files in variable folders

* for each host listed in the inventory. improves performance by 30%.
  (in 2017, for unspecified example)

I added a global cache so all plugin loaders stay in sync like the other
caches, not sure if this is wanted.

ansible#79687 also identifies this as a hot code path.
s-hertel added a commit to s-hertel/ansible that referenced this pull request Dec 20, 2023
this is based on ansible#32609

* avoid calling glob.glob() when looking for py files in variable folders for each host listed in the inventory.

In addition, I added a global cache so all plugin loaders stay in sync
(not sure if this is wanted), and reset the cache when new directories
are added to the loader, like the other caches.

ansible#79687 also identifies this as a hot code path.

Co-authored-by: Sloane Hertel <19572925+s-hertel@users.noreply.github.com>
@s-hertel
Copy link
Contributor

I confirmed this is still happening, and I've opened a refreshed version of this PR #82448, sorry for not doing it sooner. The example inventory you provided now only takes about 10 seconds on devel, and 5 seconds on the refreshed PR!

s-hertel added a commit to s-hertel/ansible that referenced this pull request Jan 31, 2024
this is based on ansible#32609

* avoid calling glob.glob() when looking for py files in variable folders for each host listed in the inventory.

In addition, I added a global cache so all plugin loaders stay in sync
(not sure if this is wanted), and reset the cache when new directories
are added to the loader, like the other caches.

ansible#79687 also identifies this as a hot code path.

Co-authored-by: Sloane Hertel <19572925+s-hertel@users.noreply.github.com>
@Akasurde
Copy link
Member

closing in favour of #82448

@Akasurde Akasurde closed this Jun 13, 2024
@sivel sivel removed the needs_verified This issue needs to be verified/reproduced by maintainer label Jun 18, 2024
@ansible ansible locked and limited conversation to collaborators Jul 11, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
affects_2.5 This issue/PR affects Ansible v2.5 bug This issue/PR relates to a bug. needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. new_plugin This PR includes a new plugin. performance pre_azp This PR was last tested before migration to Azure Pipelines. support:core This issue/PR relates to code supported by the Ansible Engineering Team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants