
fact gathering optimization for scalability #2553

Closed
bengland2 opened this issue May 2, 2018 · 4 comments

@bengland2 Contributor
The `gather and delegate facts` task is not scalable: it requires O(N^2) fact-gathering operations in total, because every host gathers facts about every other host, and it is not clear to me why that should be necessary. I have run tests with the patch below, which gathers facts on each host ONCE instead of N times, and ceph-ansible succeeds just the same. The effect on memory consumption and on the time to execute this task is dramatic. Particularly for HCI configurations, this could make a big difference in the scalability of ceph-ansible.

--- /usr/share/ceph-ansible/site-docker.yml.sample.v31_orig	2018-04-30 19:53:30.539611319 +0000
+++ /usr/share/ceph-ansible/site-docker.yml.sample	2018-04-30 22:05:00.561611319 +0000
@@ -24,26 +24,16 @@
     - name: gather facts
       setup:
       when:
-        - not delegate_facts_host | bool or inventory_hostname in groups.get('clients', [])
+        - not delegate_facts_host | bool
 
     - name: gather and delegate facts
       setup:
-      delegate_to: "{{ item }}"
-      delegate_facts: True
-      with_items: "{{ groups['all'] | difference(groups.get('clients', [])) }}"
       when:
         - delegate_facts_host | bool

Here's a graph of ansible RSS and virtual memory consumption before the patch, with --forks 25, 3 [osds] hosts, 3 [mons] hosts, and 80 [clients] hosts, running ansible 2.4.3 and ceph-ansible-3.1.0-0.1.beta8 starting at 18:35:
[graph: cephans31-fork25]

And here's the same exact run with the patch, starting at 19:56:

[graph: cephans31-factpatch-fork25]

Elapsed time for this one fact-gathering task drops from 5 minutes to about 1 minute, and RSS drops from 8 GB to about 2 GB.

@bengland2 Contributor Author

The above patch should not work in general, as @guits explained to me, because tasks running against a remote host A may reference vars (facts) about another host B, so facts about B have to be gathered and made available before those variables can be used on A (his example below).

However, it may be possible to cut down on the (A, B) combinations that have to be fact-gathered, lowering the work done in this task from O(N^2) to O(N), where N is the number of hosts in the ansible inventory file.

  • RADOS clients and OSDs find out about other OSDs, storage pools, etc. by talking to the monitors, not through ceph-ansible. So in theory it should be sufficient for each non-monitor host to gather facts about itself and about the 3 monitor hosts (so that ceph.conf can be constructed).
  • We don't need all facts about the monitor hosts, probably just a subset including network information (see the sketch after this list).
  • Monitors have to know about all other hosts so that they can construct crush maps, build MDS filesystems, etc. But there are only 3 of these hosts, so it's not so bad.
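
As a rough sketch of the fact-subset idea: the setup module accepts a gather_subset parameter that limits which facts are collected. Whether the 'network' subset covers everything the templates need is an assumption here, not something tested:

    # hedged sketch: gather only the network fact subset about the monitors;
    # 'network' being sufficient for ceph.conf is an assumption, not verified
    # against everything the templates reference
    - name: everyone gathers network facts about monitors
      setup:
        gather_subset:
          - network
      delegate_to: "{{ item }}"
      delegate_facts: true
      with_items: "{{ groups['mons'] }}"
      when:
        - inventory_hostname not in groups.get('mons', [])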

This might not be that hard to verify -- i.e., search the roles for references to the Ansible "groups" var. A preliminary search seemed to confirm this.

So the resulting patch might look something like this (only limited testing so far):

    # fact gathering delegation allows modules on remote hosts
    # to access information about other hosts
    
    # this step allows monitors to know cluster configuration
    
    - name: monitors gather facts about everyone
      setup:
      delegate_to: "{{ item }}"
      delegate_facts: True
      with_items: "{{ groups['all'] }}"
      when:
        - inventory_hostname in groups.get('mons', [])

    # this step allows any host to construct a ceph.conf, for example
    
    - name: everyone gathers facts about monitors
      setup:
      delegate_to: "{{ item }}"
      delegate_facts: True
      with_items: "{{ groups['mons'] }}"
      when:
        - inventory_hostname not in groups.get('mons', [])

I am testing this on a 3-node cluster with a single monitor and all other roles defined on the other 2 nodes, using site.yml, and I verified that no fact gathering was done between pairs of non-monitor hosts, therefore it is now O(N) computational complexity.
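
To make the count explicit: with M monitor hosts and N hosts total, the two tasks above perform about M*N (monitors about everyone) plus (N-M)*M (everyone about monitors) delegated setup calls, i.e. 2MN - M^2, which is linear in N for fixed M (typically 3), versus roughly N^2 for the original all-pairs scheme.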

Background: Guillaume Abrioux provided this example illustrating why ceph-ansible requires remote hosts to have facts about other hosts on hand:

    # site.yml
    ---
    - hosts:
        - osds
      gather_facts: true

      roles:
        - role: defaults
        - role: osd

    # roles/osd/tasks/main.yml
    - name: test
      debug:
        msg: "{{ hostvars['mon0']['ansible_all_ipv4_addresses'] }}"

results in output like this:

TASK [osd : test] **************************************************************************************************************************************************************************************************************************************************************
fatal: [osd0]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'ansible_all_ipv4_addresses'\n\nThe error appears to have been in '/Users/guits/GIT/playbook-test/roles/osd/tasks/main.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: test\n  ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'dict object' has no attribute 'ansible_all_ipv4_addresses'"}
fatal: [osd1]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'ansible_all_ipv4_addresses'\n\nThe error appears to have been in '/Users/guits/GIT/playbook-test/roles/osd/tasks/main.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: test\n  ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'dict object' has no attribute 'ansible_all_ipv4_addresses'"}
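
For contrast, here is a minimal sketch of how delegated fact gathering makes that reference resolve, using the same mon0/osds inventory as the example above:

    # site.yml (sketch): populate hostvars['mon0'] before referencing it
    - hosts: osds
      gather_facts: true
      tasks:
        # gather mon0's facts once and store them under hostvars['mon0'],
        # so tasks running against the osd hosts can reference them
        - name: gather facts about mon0
          setup:
          delegate_to: mon0
          delegate_facts: true
          run_once: true

        - name: test
          debug:
            msg: "{{ hostvars['mon0']['ansible_all_ipv4_addresses'] }}"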

guits added a commit that referenced this issue May 3, 2018
there is no need to gather facts in an O(N^2) way; only one node needs to gather facts from the other nodes.

Fixes: #2553

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
@bengland2
Copy link
Contributor Author

The method in the 2nd comment seems to work. I do not totally understand why it works yet, particularly with respect to ceph.conf.j2, which has all sorts of references to vars in other groups besides 'mons'. Guillaume has proposed a much smaller patch that may accomplish the same thing -- what does the "run_once" addition actually do?

@guits Collaborator

guits commented May 4, 2018

@bengland2 I think I was a bit confused about the fact-gathering topic, but after some tests it appears that as soon as you've gathered facts once for a node, they are available from any other node.

run_once: true makes the task run only once.
It's just nicer than having something like when: inventory_hostname == groups['all'] | first, which would produce a lot of 'skipping' lines in the log, and I think run_once: true also behaves better when using --limit.
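
For reference, that patch would presumably look something like this (a sketch inferred from this discussion, not necessarily the exact merged change):

    - name: gather and delegate facts
      setup:
      delegate_to: "{{ item }}"
      delegate_facts: true
      with_items: "{{ groups['all'] }}"
      run_once: true
      when:
        - delegate_facts_host | bool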

leseb pushed a commit that referenced this issue May 4, 2018
there is no need to gather facts in an O(N^2) way; only one node needs to gather facts from the other nodes.

Fixes: #2553

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 75733da)
Signed-off-by: Sébastien Han <seb@redhat.com>
@bengland2 Contributor Author

@guits @jtaleric @fultonj I tried it with 80 computes, 4 OSDs, and 3 mons using ceph-ansible-3.1.0-0.1.rc3.el7cp.noarch and ansible 2.4.3. Memory consumption for this task dropped to near zero, very cool. But it still takes 10 minutes to get through this one fact-gathering task, not as cool. I think it's doing 1 host at a time, presumably because the with_items loop issues its delegated setup calls serially within the single run_once task. I'm leaving the issue closed because the memory problem and the O(N^2) behavior are fixed; maybe in a future release I'll take another look at the speed aspect. At least we now have a test methodology.

guits added a commit that referenced this issue May 24, 2018
Since we fixed the `gather and delegate facts` task, this exception is
not needed anymore. It's a leftover that should be removed to save some
time when deploying a cluster with a large number of clients.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
guits added a commit that referenced this issue Jun 5, 2018
this is kind of a follow-up to what was done in #2560.
See #2560 and #2553 for details.

Closes: #2708

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>