Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance issue with large number of groups #13957

Merged
merged 1 commit into from
May 27, 2016

Conversation

towolf
Copy link
Contributor

@towolf towolf commented Jan 18, 2016

Ansible excessively checks the file system for the potential presence of
group_vars and host_vars files.

For large numbers of groups this leads to combinatorial performance
issues.

This commit generates a set of group_vars and host_vars filenames for
every possible location and then checks against the sets before making a
stat of the file system.

Also included in this commit is caching of the base directory lookup
for the inventory.

related to issue #13956

Ansible excessively checks the file system for the potential presence of
`group_vars` and `host_vars` files.

For large numbers of groups this leads to combinatorial performance
issues.

This commit generates a set of group_vars and host_vars filenames using
`os.listdir()` in every possible location and then checks against the sets
before making a stat of the file system.

Also included in this commit is caching of the base directory lookup
for the inventory.
@jimi-c jimi-c added this to the stable-2.0 milestone Jan 20, 2016
@jimi-c
Copy link
Member

jimi-c commented Mar 31, 2016

cc @srvg

@srgvg
Copy link
Contributor

srgvg commented Apr 1, 2016

First test of this PR shows me an improvement in parse time around 10-20% - compared to the parent commit. I don't have as many groups as @towolf however, but the difference is real enough.

@towolf
Copy link
Contributor Author

towolf commented Apr 1, 2016

We have prepared test case which is structurally identical to our live inventory (just sanitized the json with random strings, cf issue #13956). If you need "more pathological" test data check out the repo:

https://github.com/towolf/ansible-large-inventory-testcase

@hloeffler
Copy link
Contributor

@jimi-c any chance we see this before the next freeze?
( we still stick to 1.9 )

@towolf
Copy link
Contributor Author

towolf commented May 3, 2016

@bcoca maybe you could chime in here as well? We'd really love to have this in before stable2.1 is released as Ansible 2.1.

@alikins
Copy link
Contributor

alikins commented May 9, 2016

I did a little bit of profiling with this patch against the test data (git@github.com:towolf/ansible-large-inventory-testcase.git) and tried a few tweaks of my own at https://github.com/alikins/ansible/tree/inventory_profiling

The code in this (pr #13957) makes the most difference. Running the test case mentioned above went from ~15 seconds to around 2 seconds.

[fedora-23:ansible-large-inventory-testcase (master % u=)]$ time ansible-playbook -i inventory.sh playbook.yml

PLAY [Hi] **********************************************************************

TASK [debug msg=Hi] ************************************************************
ok: [localhost] => {
    "msg": "Hi"
}

PLAY RECAP *********************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0   


real    0m6.968s
user    0m6.577s
sys 0m0.402s
[fedora-23:ansible-large-inventory-testcase (master % u=)]$ # Now with the patch from pr 13957 applied
[fedora-23:ansible-large-inventory-testcase (master % u=)]$ time ansible-playbook -i inventory.sh playbook.yml

PLAY [Hi] **********************************************************************

TASK [debug] *******************************************************************
ok: [localhost] => {
    "msg": "Hi"
}

PLAY RECAP *********************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0   


real    0m2.826s
user    0m2.689s
sys 0m0.144s
[fedora-23:ansible-large-inventory-testcase (master % u=)]$ time ansible-playbook -i inventory.sh playbook.yml

PLAY [Hi] **********************************************************************

TASK [debug] *******************************************************************
ok: [localhost] => {
    "msg": "Hi"
}

PLAY RECAP *********************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0   


real    0m2.735s
user    0m2.595s
sys 0m0.155s
[fedora-23:ansible-large-inventory-testcase (master % u=)]$ time ansible-playbook -i inventory.sh playbook.yml

PLAY [Hi] **********************************************************************

TASK [debug msg=Hi] ************************************************************
ok: [localhost] => {
    "msg": "Hi"
}

PLAY RECAP *********************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0   


real    0m1.327s
user    0m1.180s
sys 0m0.160s
[fedora-23:ansible-large-inventory-testcase (master % u=)]$ time ansible-playbook -i inventory.sh playbook.yml

PLAY [Hi] **********************************************************************

TASK [debug msg=Hi] ************************************************************
ok: [localhost] => {
    "msg": "Hi"
}

PLAY RECAP *********************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0   


real    0m1.193s
user    0m1.040s
sys 0m0.166s

@alikins
Copy link
Contributor

alikins commented May 9, 2016

I pushed some SVG flamegraphs of before after to https://github.com/alikins/ansible/tree/inventory_profiling/images (though github doesn't seem to want to display them...)

(test to see if inline works)
before svg
after svg

@towolf
Copy link
Contributor Author

towolf commented May 9, 2016

Thanks for the corroboration of my findings, @alikins.

Do the tweaks you mention work in conjunction with my own? Do you have them separated out as a patch or branch somewhere?

Your graphs somehow contain stuff that looks like javascript and text and not SVG graphics.

@alikins
Copy link
Contributor

alikins commented May 10, 2016

Your graphs somehow contain stuff that looks like javascript and text and not SVG graphics.

Oh, indeed. Did something weird uploading those. Should be fixed now.
At least at

before: http://alikins.github.io/ansible/images/playbook-inventory-flame.svg
before

after: http://alikins.github.io/ansible/images/playbook-inventory-flame-with-fixes-finer.svg
after

@alikins
Copy link
Contributor

alikins commented May 27, 2016

Another note about the graphs that isn't obvious. The total time for the first one was in the 12-15s range, while the total time for the second one was in the 1-2s range. (The second graph was ran with higher sampling rate...)

@bcoca bcoca modified the milestones: 2.2.0, stable-2.0 May 27, 2016
@bcoca bcoca merged commit 328b423 into ansible:devel May 27, 2016
@towolf towolf deleted the group_vars_perf_issue branch May 29, 2016 12:01
jimi-c pushed a commit that referenced this pull request Jun 6, 2016
Ansible excessively checks the file system for the potential presence of
`group_vars` and `host_vars` files.

For large numbers of groups this leads to combinatorial performance
issues.

This commit generates a set of group_vars and host_vars filenames using
`os.listdir()` in every possible location and then checks against the sets
before making a stat of the file system.

Also included in this commit is caching of the base directory lookup
for the inventory.
@brandond
Copy link
Contributor

I added a comment to 328b423, but wanted to speak up in this PR as well. The commit introduces a breaking change that prevents Ansible from looking in the working directory for group_vars and host_vars.

@bcoca
Copy link
Member

bcoca commented Jun 22, 2016

@brandond ansible is NOT supposed to look in the working directory, ONLY in the directories in which the inventory and the playbook reside

@brandond
Copy link
Contributor

brandond commented Jun 22, 2016

@bcoca there must have been a regression introduced at some point then. I started using Ansible fairly recently (2.1) and have structured my workspace such that the top-level directory isn't cluttered with playbooks. If there's a better pattern for keeping a clean top-level folder structure, I'm all ears.

I can provide a strace of the group_vars lookup before and after that commit to show that it was indeed checking the working dir; this stopped working after I updated from git, and a bisect isolated the issue to that commit.

@bcoca
Copy link
Member

bcoca commented Jun 22, 2016

if it was checking it at any point, it was a bug, not a feature, I tested with 1.9 and 2.1 and it behaves in both as I describe, so i'm guessing the bug was introduces somewhere in 2.0 due to the rewrite.

@brandond
Copy link
Contributor

I just tested branches stable-1.9, stable-2.0.0.1, and stable-2.1.1. All stable releases starting with 2.0 have checked the working directory.

Documented or not, this is a breaking change that disables functionality that folks may have depended on working for the last two stable releases.

@brandond
Copy link
Contributor

brandond commented Jun 22, 2016

I can even tell you why it was doing that. The Inventory class __init__ sets self._playbook_basedir to None, and then calls parse_inventory. This eventually ends up in a call to self._get_hostgroup_vars with self.playbook_basedir still None, as the playbook has not yet loaded and called set_playbook_basedir to populate the playbook path.

This is demonstrated by adding the following patch to stable-2.1.1:

--- a/lib/ansible/inventory/__init__.py
+++ b/lib/ansible/inventory/__init__.py
@@ -722,10 +722,13 @@ class Inventory(object):
         if not new_pb_basedir:
             basedirs = [_basedir, self._playbook_basedir]
         else:
             basedirs = [self._playbook_basedir]

+        if None in basedirs:
+            raise RuntimeError('basedirs contains None, which is interpreted as working directory! Basedirs: %s' % (basedirs))
+
         for basedir in basedirs:
             # this can happen from particular API usages, particularly if not run
             # from /usr/bin/ansible-playbook
             if basedir in ('', None):
                 basedir = './'

Gives this result:

(ansible-env)[ec2-user@l257387-vm infosec-ansible]$ ansible-playbook playbooks/test.yml -vvv
Using /home/ec2-user/ansible/ansible.cfg as config file
ERROR! Unexpected Exception: basedirs contains None, which is interpreted as working directory! Basedirs: ['/home/ec2-user/ansible/inventory', None]
the full traceback was:

Traceback (most recent call last):
  File "/home/ec2-user/ansible-env/bin/ansible-playbook", line 92, in <module>
    exit_code = cli.run()
  File "/home/ec2-user/ansible-env/lib/python2.7/site-packages/ansible/cli/playbook.py", line 132, in run
    inventory = Inventory(loader=loader, variable_manager=variable_manager, host_list=self.options.inventory)
  File "/home/ec2-user/ansible-env/lib/python2.7/site-packages/ansible/inventory/__init__.py", line 85, in __init__
    self.parse_inventory(host_list)
  File "/home/ec2-user/ansible-env/lib/python2.7/site-packages/ansible/inventory/__init__.py", line 144, in parse_inventory
    group.vars = combine_vars(group.vars, self.get_group_variables(group.name))
  File "/home/ec2-user/ansible-env/lib/python2.7/site-packages/ansible/inventory/__init__.py", line 509, in get_group_variables
    self._vars_per_group[groupname] = self._get_group_variables(groupname, vault_password=vault_password)
  File "/home/ec2-user/ansible-env/lib/python2.7/site-packages/ansible/inventory/__init__.py", line 527, in _get_group_variables
    vars = combine_vars(vars, self.get_group_vars(group))
  File "/home/ec2-user/ansible-env/lib/python2.7/site-packages/ansible/inventory/__init__.py", line 707, in get_group_vars
    return self._get_hostgroup_vars(host=None, group=group, new_pb_basedir=new_pb_basedir)
  File "/home/ec2-user/ansible-env/lib/python2.7/site-packages/ansible/inventory/__init__.py", line 728, in _get_hostgroup_vars
    raise RuntimeError('basedirs contains None, which is interpreted as working directory! Basedirs: %s' % (basedirs))
RuntimeError: basedirs contains None, which is interpreted as working directory! Basedirs: ['/home/ec2-user/ansible/inventory', None]

@towolf
Copy link
Contributor Author

towolf commented Jun 23, 2016

https://xkcd.com/1172/

@brandond
Copy link
Contributor

If it's got to be this way, there should probably be a warning in the 2.2 release notes to the effect that paths that were searched in 2.0 and 2.1 will no longer be checked.

@ansibot ansibot added bug This issue/PR relates to a bug. and removed bugfix_pull_request labels Mar 5, 2018
@ansible ansible locked and limited conversation to collaborators Apr 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug This issue/PR relates to a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants