large inventory loading performance enhancement. avoid calling glob.glob() search for .py files in all() function for each host. #32609

base: devel
skamithi commented Nov 7, 2017


I am working with a customer who has a very large inventory. Recent fixes to devel helped reduce the inventory load time from 4 minutes to 30 seconds. I ran vmprof to look for additional performance savings and noticed that glob.glob() is being called several times in the lib/ansible/plugins/ all() function. I did not see the need for this, as the .py files being polled are the same for each host listed in the inventory. Preventing glob.glob() from being called for each host listed in the inventory improves performance by 30%.
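The idea can be illustrated with a minimal sketch. This is not the actual patch: the names `_PY_FILE_CACHE`, `list_plugin_files`, and `plugin_files_for_hosts` are hypothetical and only show how memoizing the glob result per directory removes the per-host filesystem scan.

```python
import glob
import os

# Hypothetical module-level cache: directory path -> sorted list of .py files.
_PY_FILE_CACHE = {}


def list_plugin_files(directory):
    """Return the .py files in `directory`, globbing at most once per directory."""
    if directory not in _PY_FILE_CACHE:
        pattern = os.path.join(directory, "*.py")
        _PY_FILE_CACHE[directory] = sorted(glob.glob(pattern))
    return _PY_FILE_CACHE[directory]


def plugin_files_for_hosts(hosts, directories):
    """Yield (host, path) pairs; the glob cost is paid per directory, not per host."""
    for host in hosts:
        for directory in directories:
            for path in list_plugin_files(directory):
                yield host, path
```

With 20,000 hosts and a fixed set of plugin directories, a cache like this would call glob.glob() once per directory instead of once per host per directory. The trade-off is that files added to a directory after the first lookup are not seen until the cache entry is dropped.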

  • Bugfix Pull Request


ansible 2.5.0
  config file = None
  configured module search path = [u'/home/ansible/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible
  executable location = /home/ansible/inv-test/bin/ansible
  python version = 2.7.12 (default, Nov 19 2016, 06:48:10) [GCC 5.4.0 20160609]

Because I cannot use the customer's inventory output, I created a simulation of their output. Here is the script to generate the inventory used to test this patch. The script generates 1001 groups with an average group size of 19 and a total host count of 19999. Each host has 99 host variables defined.


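import json

base_name = 'server'
groups = 1000
host_count = 20000
group_size = host_count // groups  # integer division on Python 2 and 3

json_output = {
    'all': {'hosts': []},
    '_meta': {'hostvars': {}}
}

# Host variables shared by every host: 99 dummy key/value pairs.
metadata = {}
for i in range(1, 100):
    _key = "blah" + str(i)
    metadata[_key] = "blahblahblah"

group_number = 0
_group_size = 0
_new_group_name = 'group_' + str(group_number)

# range(1, host_count) yields 19999 hosts: server1 .. server19999.
for i in range(1, host_count):
    # Start a new group once the current one is full.
    if _group_size >= group_size:
        group_number += 1
        _new_group_name = 'group_' + str(group_number)
        _group_size = 0
    servername = base_name + str(i)
    json_output['_meta']['hostvars'][servername] = metadata

    if not json_output.get(_new_group_name):
        json_output[_new_group_name] = {'hosts': []}
    # Assumed step, missing from the pasted script: add the host to its group.
    json_output[_new_group_name]['hosts'].append(servername)
    _group_size += 1

# Assumed step, missing from the pasted script: emit the inventory as JSON.
print(json.dumps(json_output))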

The vmprof profile before the patch looks as follows:

vmprof output:
%:      name:                                       location:
100.0%  run_path                                    /usr/lib/python2.7/
100.0%  _run_module_code                            /usr/lib/python2.7/
100.0%  _run_module_as_main                         /usr/lib/python2.7/
100.0%  _run_code                                   /usr/lib/python2.7/
100.0%  main                                        /home/ansible/inv-test/lib/python2.7/site-packages/vmprof/
100.0%  <module>                          
100.0%  <module>                                    /home/ansible/inv-test/lib/python2.7/site-packages/vmprof/
98.7%   <module>                                    /home/ansible/inv-test/bin/ansible:21
98.2%   run                               
65.7%   json_inventory                    
65.1%   _get_host_variables               
65.0%   get_vars                                    /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/
28.9%   all                                         /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/plugins/
28.0%   _plugins_play                               /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/
27.6%   _plugins_inventory                          /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/
22.6%   _get_plugin_vars                            /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/
22.4%   get_vars                                    /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/plugins/vars/
20.1%   glob                                        /usr/lib/python2.7/
18.7%   iglob                                       /usr/lib/python2.7/
16.0%   parse_sources                               /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/inventory/
16.0%   __init__                                    /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/inventory/
16.0%   _play_prereqs                               /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/cli/
15.3%   parse_source                                /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/inventory/
15.2%   parse                                       /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/plugins/inventory/
14.7%   n:PyObject_Call:0:-                        
13.2%   dump                              
13.2%   dumps                                       /usr/lib/python2.7/json/
12.8%   realpath                                    /home/ansible/inv-test/lib/python2.7/
12.8%   encode                                      /usr/lib/python2.7/json/
12.1%   glob1                                       /usr/lib/python2.7/
11.9%   _iterencode                                 /usr/lib/python2.7/json/
11.0%   n:PyEval_EvalCodeEx:0:-                    
10.6%   _iterencode_dict                            /usr/lib/python2.7/json/
10.0%   _joinrealpath                               /home/ansible/inv-test/lib/python2.7/
9.8%    all_plugins_play                            /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/
9.2%    all_plugins_inventory                       /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/
9.2%    groups_plugins_play                         /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/
9.0%    groups_plugins_inventory                    /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/vars/
8.8%    populate_host_vars                          /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/plugins/inventory/
7.2%    set_variable                                /home/ansible/inv-test/local/lib/python2.7/site-packages/ansible/inventory/
7.2%    n:<native symbol 0x4ddd31>:0:-             
6.4%    n:<native symbol 0x51e191>:0:-             

Before the patch

Loading this inventory takes 25.01s user 2.61s system 99% cpu 27.824 total on my laptop.

After the patch

Inventory loads in 19.25s user 1.56s system 99% cpu 20.948 total
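
As a quick check on the two timings above, the following sketch works out the improvement from the reported wall-clock totals. The variable names are illustrative; only the two totals come from the measurements.

```python
before_total = 27.824  # seconds, wall-clock total before the patch
after_total = 20.948   # seconds, wall-clock total after the patch

time_saved = before_total - after_total
reduction_pct = 100.0 * time_saved / before_total
speedup = before_total / after_total

print("time saved: %.3f s" % time_saved)
print("reduction in wall-clock time: %.1f%%" % reduction_pct)
print("speedup factor: %.2fx" % speedup)
```

On these numbers the load is about 6.9 seconds shorter, which is roughly a 25% reduction in wall-clock time, or equivalently about a 1.33x speedup, depending on which way the ratio is expressed.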

skamithi changed the title from "large inventory loading performance enhancement. avoid calling glob.glob() search for .py files in inventory var module search for each host." to "large inventory loading performance enhancement. avoid calling glob.glob() search for .py files in all() function for each host." on Nov 7, 2017
This should be solved by our directory cache at this point. Each task should not retrigger glob.glob(); that should happen only when an include/import adds a new directory to the search path.

assigned to verify

