large inventory loading performance enhancement. avoid calling glob.glob() search for .py files in loader.py all() function for each host. #32609

Open: wants to merge 1 commit into base branch devel

skamithi
Contributor

@skamithi skamithi commented Nov 7, 2017

SUMMARY

I am working with a customer who has a very large inventory. Recent fixes to devel helped reduce inventory load time from 4 minutes to 30 seconds. I ran vmprof to look for additional savings and noticed that glob.glob() is called repeatedly in the all() function of lib/ansible/plugins/loader.py. I did not see the need for this, because the .py files being looked up are the same for every host in the inventory. Preventing glob.glob() from being called for each host listed in the inventory improves performance by 30%.
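A minimal sketch of the idea, not the actual diff in this PR: glob each plugin directory once and reuse the listing for every later call. The names `_PY_FILE_CACHE`, `list_plugin_files` and `all_plugin_files` are hypothetical and do not exist in loader.py.

```python
import glob
import os

# Hypothetical module-level cache: directory path -> sorted list of .py files.
_PY_FILE_CACHE = {}


def list_plugin_files(path):
    """Return the .py files directly under `path`, globbing each directory once."""
    if path not in _PY_FILE_CACHE:
        _PY_FILE_CACHE[path] = sorted(glob.glob(os.path.join(path, "*.py")))
    return _PY_FILE_CACHE[path]


def all_plugin_files(search_paths):
    """Yield plugin files for every search path, reusing cached listings."""
    for path in search_paths:
        for full_path in list_plugin_files(path):
            yield full_path
```

The trade-off in this sketch is that a .py file added to a directory after its first lookup is not seen until the cache is cleared, which is acceptable only if the plugin directories do not change during a run.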

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

lib/ansible/plugins/loader.py

ANSIBLE VERSION
ansible 2.5.0
  config file = None
  configured module search path = [u'/home/ansible/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible
  executable location = /home/ansible/inv-test/bin/ansible
  python version = 2.7.12 (default, Nov 19 2016, 06:48:10) [GCC 5.4.0 20160609]
ADDITIONAL INFORMATION

Because I cannot share the customer's inventory output, I created a simulation of it. Here is the script that generates the inventory used to test this patch. The script generates 1001 groups with an average group size of 19 and a total host count of 19999. Each host has 99 host variables defined.

#!/usr/bin/python
# Generate a synthetic dynamic-inventory JSON document for load testing.

import json

json_output = {
    'all': {'hosts': []},
    '_meta': {'hostvars': {}}
}
base_name = 'server'

# 99 identical host variables (blah1 .. blah99), shared by every host.
metadata = {}
for i in range(1, 100):
    _key = "blah" + str(i)
    metadata[_key] = "blahblahblah"

groups = 1000
host_count = 20000
group_size = host_count // groups  # integer division on both Python 2 and 3
group_number = 0
_group_size = 0
_new_group_name = 'group_' + str(group_number)

# range(1, host_count) yields server1 .. server19999, i.e. 19999 hosts.
for i in range(1, host_count):
    # Start a new group once the current one is full.
    if _group_size >= group_size:
        group_number += 1
        _new_group_name = 'group_' + str(group_number)
        _group_size = 0
    servername = base_name + str(i)
    json_output['all']['hosts'].append(servername)
    json_output['_meta']['hostvars'][servername] = metadata

    if _new_group_name not in json_output:
        json_output[_new_group_name] = {'hosts': []}
    json_output[_new_group_name]['hosts'].append(servername)
    _group_size += 1
print(json.dumps(json_output))
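
To sanity-check the generated inventory against the figures quoted above, a small helper like the following could count groups, hosts and per-host variables. This is only a sketch: `summarize_inventory` and the `sample` data are hypothetical, and every top-level key other than `_meta` is assumed to be a group with a `hosts` list.

```python
def summarize_inventory(inventory):
    """Count groups, unique hosts, and distinct per-host variable counts."""
    hostvars = inventory.get('_meta', {}).get('hostvars', {})
    # Every top-level key except '_meta' is treated as a group (assumption).
    groups = [name for name in inventory if name != '_meta']
    hosts = set()
    for name in groups:
        hosts.update(inventory[name].get('hosts', []))
    return {
        'groups': len(groups),
        'hosts': len(hosts),
        'vars_per_host': sorted(set(len(v) for v in hostvars.values())),
    }


# Tiny hand-written sample in the same shape as the generator's output.
sample = {
    'all': {'hosts': ['server1', 'server2', 'server3']},
    'group_0': {'hosts': ['server1', 'server2']},
    'group_1': {'hosts': ['server3']},
    '_meta': {'hostvars': {
        'server1': {'blah1': 'x', 'blah2': 'x'},
        'server2': {'blah1': 'x', 'blah2': 'x'},
        'server3': {'blah1': 'x', 'blah2': 'x'},
    }},
}
summary = summarize_inventory(sample)
print(summary)
```

Feeding the generator's real output through `json.load` and into the same function should report the group, host and variable counts stated in the description, with `all` counted as one of the groups.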

The vmprof profile before the patch looks as follows:

vmprof output:
%:      name:                                       location:
100.0%  run_path                                    /usr/lib/python2.7/runpy.py:235
100.0%  _run_module_code                            /usr/lib/python2.7/runpy.py:75
100.0%  _run_module_as_main                         /usr/lib/python2.7/runpy.py:147
100.0%  _run_code                                   /usr/lib/python2.7/runpy.py:62
100.0%  main                                        /home/ansible/inv-test/lib/python2.7/site-packages/vmprof/__main__.py:30
100.0%  <module>                                    backport.py:19
100.0%  <module>                                    /home/ansible/inv-test/lib/python2.7/site-packages/vmprof/__main__.py:1
98.7%   <module>                                    /home/ansible/inv-test/bin/ansible:21
98.2%   run                                         backport.py.py:121
65.7%   json_inventory                              backport.py.py:257
65.1%   _get_host_variables                         backport.py.py:194
65.0%   get_vars                                    /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/manager.py:204
28.9%   all                                         /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/plugins/loader.py:405
28.0%   _plugins_play                               /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/manager.py:297
27.6%   _plugins_inventory                          /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/manager.py:284
22.6%   _get_plugin_vars                            /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/manager.py:265
22.4%   get_vars                                    /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/plugins/vars/host_group_vars.py:60
20.1%   glob                                        /usr/lib/python2.7/glob.py:18
18.7%   iglob                                       /usr/lib/python2.7/glob.py:29
16.0%   parse_sources                               /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/inventory/manager.py:194
16.0%   __init__                                    /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/inventory/manager.py:121
16.0%   _play_prereqs                               /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/cli/__init__.py:776
15.3%   parse_source                                /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/inventory/manager.py:218
15.2%   parse                                       /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/plugins/inventory/script.py:64
14.7%   n:PyObject_Call:0:-                        
13.2%   dump                                        backport.py.py:182
13.2%   dumps                                       /usr/lib/python2.7/json/__init__.py:193
12.8%   realpath                                    /home/ansible/inv-test/lib/python2.7/posixpath.py:372
12.8%   encode                                      /usr/lib/python2.7/json/encoder.py:186
12.1%   glob1                                       /usr/lib/python2.7/glob.py:71
11.9%   _iterencode                                 /usr/lib/python2.7/json/encoder.py:417
11.0%   n:PyEval_EvalCodeEx:0:-                    
10.6%   _iterencode_dict                            /usr/lib/python2.7/json/encoder.py:341
10.0%   _joinrealpath                               /home/ansible/inv-test/lib/python2.7/posixpath.py:380
9.8%    all_plugins_play                            /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/manager.py:312
9.2%    all_plugins_inventory                       /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/manager.py:309
9.2%    groups_plugins_play                         /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/manager.py:323
9.0%    groups_plugins_inventory                    /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/manager.py:319
8.8%    populate_host_vars                          /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/plugins/inventory/__init__.py:83
7.2%    set_variable                                /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/inventory/data.py:212
7.2%    n:<native symbol 0x4ddd31>:0:-             
6.4%    n:<native symbol 0x51e191>:0:-             

Before the patch

Loading this inventory using backport.py takes 25.01s user 2.61s system 99% cpu 27.824 total on my laptop.

After the patch

The inventory loads in 19.25s user 1.56s system 99% cpu 20.948 total.
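
For reference, the two wall-clock totals above can be turned into percentages in two ways, and the result depends on which way the ratio is taken. The short calculation below uses only the 27.824 s and 20.948 s totals; the variable names are illustrative.

```python
before_s = 27.824  # wall-clock total before the patch
after_s = 20.948   # wall-clock total after the patch

# Share of the original run time that was eliminated.
time_saved_pct = (before_s - after_s) / before_s * 100
# How much faster the patched run is relative to its own duration.
speedup_pct = (before_s / after_s - 1) * 100

print("time saved: %.1f%%" % time_saved_pct)  # time saved: 24.7%
print("speedup:    %.1f%%" % speedup_pct)     # speedup:    32.8%
```

So on this test inventory the patch removes about a quarter of the run time, which is the same as saying the run is about a third faster; the 30% figure in the summary falls between the two.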

@skamithi skamithi changed the title large inventory loading performance enhancement. avoid calling glob.glob() search for .py files in inventory var module search for each host. large inventory loading performance enhancement. avoid calling glob.glob() search for .py files in loader.py all() function for each host. Nov 7, 2017
@ansibot ansibot added affects_2.5 bugfix_pull_request needs_triage python3 support:core labels Nov 7, 2017
@ansibot
Contributor

@ansibot ansibot commented Nov 7, 2017

The test ansible-test sanity --test pep8 failed with the following error:

lib/ansible/plugins/loader.py:417:5: E303 too many blank lines (2)


@ansibot ansibot added the needs_revision label Nov 7, 2017
@skamithi skamithi force-pushed the inventory_script_performance_enhancement branch from e6748e5 to bd70ab9 Compare Nov 7, 2017
@jborean93 jborean93 removed the needs_triage label Nov 8, 2017
@ansibot ansibot removed the needs_revision label Nov 16, 2017
@ansibot ansibot added the stale_ci label Nov 24, 2017
@ansibot ansibot added bug performance and removed bugfix_pull_request labels Mar 2, 2018
@abadger abadger removed the python3 label Mar 20, 2018
@ansibot ansibot added needs_rebase needs_revision labels Mar 28, 2018
@ansibot ansibot added the new_plugin label May 23, 2018
@ansibot ansibot added support:community and removed support:core labels Sep 20, 2018
@ansibot ansibot added support:core and removed support:community labels Nov 26, 2018
@bcoca bcoca requested a review from s-hertel Aug 23, 2019
@bcoca
Member

@bcoca bcoca commented Aug 23, 2019

This should be solved by our directory cache at this point. Each task should not retrigger glob.glob; that should only happen when an include/import adds a new directory to the search path.

Assigned to verify.

@bcoca bcoca added the needs_verified label Aug 23, 2019
@ansibot ansibot removed the stale_ci label Dec 6, 2020
@ansibot ansibot added the pre_azp label Dec 6, 2020
Labels
affects_2.5 bug has_issue needs_rebase needs_revision needs_verified new_plugin performance pre_azp support:core
5 participants