
play, block, task: New attribute forks #42528

Closed
wants to merge 3 commits

Conversation

@t-woerner commented Jul 9, 2018

With this patch it is possible to limit the number of concurrent runs of the task on which the attribute has been added. This is similar to forks, but affects only a single task. It can be used to serialize one or more tasks.

SUMMARY

The ability to limit concurrent task executions for a specific task would help ansible-freeipa deploy a larger number of replicas and dependent clients in a cluster environment in a nearly parallel way.

Right now it is not possible to install several FreeIPA replicas at the same time due to conflicts in the CA configuration step. For ansible-freeipa the installers have already been split up into smaller pieces, so that some of the parts can be replaced with new roles later on. With this patch in ansible it is possible to have a nearly parallel installation, as only the affected step needs to be executed serially. The remaining tasks in the role can be done in parallel.

Creating all replicas with forks: 1 results in a far longer deployment time. The proposals from issue #12170 also do not work here, as the registered output of previous tasks is needed.

This fixes #12170
This fixes #24037

ISSUE TYPE
  • Feature Pull Request
ANSIBLE VERSION
ansible 2.5.2
ADDITIONAL INFORMATION

Excerpt from the tasks/install.yml file used to install the replica with the new forks directive:

...
  - name: Install - Setup CA
    ipareplica_setup_ca:
      ### server ###
      setup_ca: "{{ ipareplica_setup_ca }}"
      setup_kra: "{{ result_ipareplica_test.setup_kra }}"
      no_pkinit: "{{ ipareplica_no_pkinit }}"
      no_ui_redirect: "{{ ipareplica_no_ui_redirect }}"
      ### certificate system ###
      subject_base: "{{ result_ipareplica_prepare.subject_base }}"
      ### additional ###
      ccache: "{{ result_ipareplica_prepare.ccache }}"
      _ca_enabled: "{{ result_ipareplica_prepare._ca_enabled }}"
      _ca_file: "{{ result_ipareplica_prepare._ca_file }}"
      _ca_subject: "{{ result_ipareplica_prepare._ca_subject }}"
      _subject_base: "{{ result_ipareplica_prepare._subject_base }}"
      _pkinit_pkcs12_info: "{{ result_ipareplica_prepare._pkinit_pkcs12_info }}"
      _top_dir: "{{ result_ipareplica_prepare._top_dir }}"
      dirman_password: "{{ ipareplica_dirman_password }}"
      config_setup_ca: "{{ result_ipareplica_prepare.config_setup_ca }}"
      config_master_host_name: "{{ result_ipareplica_install_ca_certs.config_master_host_name }}"
      config_ca_host_name: "{{ result_ipareplica_install_ca_certs.config_ca_host_name }}"
      config_ips: "{{ result_ipareplica_prepare.config_ips }}"
    when: result_ipareplica_prepare._ca_enabled
    forks: 1
...

Parallel execution without forks: 1:

...
TASK [ipareplica : Install - Setup CA] *****************************************
task path: /root/ansible/ansible-freeipa/roles/ipareplica/tasks/install.yml:419
fatal: [ipareplica3.test3.local]: FAILED! => {"changed": false, "module_stderr": "Shared connection to ipareplica3.test3.local closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n  File \"/tmp/ansible_MJSMjy/ansible_module_ipareplica_setup_ca.py\", line 229, in <module>\r\n    main()\r\n  File \"/tmp/ansible_MJSMjy/ansible_module_ipareplica_setup_ca.py\", line 222, in main\r\n    ca.install(False, config, options, custodia=custodia)\r\n  File \"/usr/lib/python2.7/site-packages/ipaserver/install/ca.py\", line 223, in install\r\n    install_step_0(standalone, replica_config, options, custodia=custodia)\r\n  File \"/usr/lib/python2.7/site-packages/ipaserver/install/ca.py\", line 303, in install_step_0\r\n    use_ldaps=standalone)\r\n  File \"/usr/lib/python2.7/site-packages/ipaserver/install/cainstance.py\", line 467, in configure_instance\r\n    self.start_creation(runtime=runtime)\r\n  File \"/usr/lib/python2.7/site-packages/ipaserver/install/service.py\", line 520, in start_creation\r\n    run_step(full_msg, method)\r\n  File \"/usr/lib/python2.7/site-packages/ipaserver/install/service.py\", line 510, in run_step\r\n    method()\r\n  File \"/usr/lib/python2.7/site-packages/ipaserver/install/dogtaginstance.py\", line 473, in setup_admin\r\n    master_conn.simple_bind(self.admin_dn, self.admin_password)\r\n  File \"/usr/lib/python2.7/site-packages/ipapython/ipaldap.py\", line 1142, in simple_bind\r\n    bind_dn, bind_password, server_controls, client_controls)\r\n  File \"/usr/lib64/python2.7/contextlib.py\", line 35, in __exit__\r\n    self.gen.throw(type, value, traceback)\r\n  File \"/usr/lib/python2.7/site-packages/ipapython/ipaldap.py\", line 1030, in error_handler\r\n    raise errors.ACIError(info=\"%s %s\" % (info, desc))\r\nipalib.errors.ACIError: Insufficient access:  Invalid credentials\r\n", "msg": "MODULE FAILURE", "rc": 1}
fatal: [ipareplica4.test3.local]: FAILED! => (same ACIError "Insufficient access: Invalid credentials" traceback as above)
fatal: [ipareplica1.test3.local]: FAILED! => (same ACIError "Insufficient access: Invalid credentials" traceback as above)
changed: [ipareplica2.test3.local] => {"changed": true}
...

Serial execution with forks: 1:

...
TASK [ipareplica : Install - Setup CA] *****************************************
task path: /root/ansible/ansible-freeipa/roles/ipareplica/tasks/install.yml:419
changed: [ipareplica1.test3.local] => {"changed": true}
changed: [ipareplica2.test3.local] => {"changed": true}
changed: [ipareplica3.test3.local] => {"changed": true}
changed: [ipareplica4.test3.local] => {"changed": true}
...

@dagwieers (Contributor)

I would call this option forks, make it override the global forks option, and make it configurable per task, per block, per role, and per play; see #24037.
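
A sketch of what that proposal could look like (hypothetical syntax; neither the keyword name nor the supported levels were settled, and all names below are made up for illustration):

  - hosts: databases
    forks: 10               # hypothetical per-play override of the global forks limit
    tasks:
      - block:
          - name: Run schema migration
            command: /usr/local/bin/migrate   # hypothetical command
        forks: 2            # hypothetical per-block limit: at most 2 hosts at once
      - name: Restart the service one host at a time
        service:
          name: myservice   # hypothetical service name
          state: restarted
        forks: 1            # hypothetical per-task serialization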

t-woerner added a commit to freeipa/ansible-freeipa that referenced this pull request Jul 19, 2018
…f CA

There is a pull request and also a proposal for ansible to be able to limit the
number of concurrent executions of a single task:

- ansible/proposals#129
- ansible/ansible#42528

The keyword is currently named max_concurrent, but might be renamed later on.
If the keyword is present but not supported by ansible, it is simply ignored,
so there is no harm in adding it here early.
@kustodian (Contributor)

This looks like a really simple change for a feature that could be really useful. I have a few situations where this would make things a lot simpler.

If the core devs don't have anything against it, how can we get them to discuss the name of the option and get it merged?

@bcoca (Member) commented Jan 14, 2019

@kustodian simple to add, but not when dealing with implications:

  • how is this affected by global fork limits?
  • how does this interact with 'serial' ?
  • how does this affect max_fail_percentage?
  • what is the expected result with run_once or 'bypass host loop' actions?
  • how is delegation affected?
  • at which failure point do we trigger block error handling (rescue)?
  • how do we handle variable/value precedence when they return facts?
  • how do we handle precedence when they update inventory or other shared resources (add_host/group_by)?

And I'm sure I'm missing quite a few considerations; the list above is just off the top of my head. These, and more, have been brought up in adhoc and core IRC meetings discussing this topic (though possibly not this specific ticket), and we have not found good answers for most of them. We need to be able to answer all of the above before we consider adding this to the code base.

@kustodian (Contributor)

@bcoca then it's best to just close this ticket with the explanation that it complicates a lot of things and, because of that, will not be added to Ansible anytime in the near future.

@bcoca (Member) commented Jan 16, 2019

@kustodian we are still considering the feature; we just need to figure out how to deal with all the associated issues it entails.

@t-woerner (Author)

@bcoca here are answers for the open questions where I am able to provide them.

@kustodian simple to add, but not when dealing with implications:

* how is this affected by global fork limits?

The minimum of the global fork limit and the number of inventory hosts defines how many workers are created. The current implementation of max_concurrent lives in StrategyBase._queue_task. It is only effective if max_concurrent is greater than 0 and smaller than the number of workers. If the current worker index (self._cur_worker) is greater than or equal to task.max_concurrent, the index is reset to 0. This is the same mechanism used to switch back to worker 0 once the maximum number of workers has been reached.

That means max_concurrent cannot increase the number of workers. Decreasing the global fork limit also decreases the maximum value at which max_concurrent still has an effect. max_concurrent can be seen as a task-local limitation of the number of workers.
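
To illustrate the described behaviour with concrete numbers (a hypothetical setup, not taken from the PR's test suite):

  # Hypothetical setup: inventory has host1 .. host8, forks = 5 in ansible.cfg
  - hosts: all
    tasks:
      - name: Heavily throttled step
        command: /usr/local/bin/heavy_setup   # hypothetical command
        max_concurrent: 2   # the worker index wraps at 2 instead of 5, so only
                            # workers 0 and 1 ever run this task:
                            # host1->0, host2->1, host3->0, host4->1, ...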

* how does this interact with 'serial' ?

Batches are created from the hosts that need to be processed, with the number of hosts per batch limited to the serial count. These batches are processed one after the other, and while a batch is being processed the inventory is restricted to its hosts. Since the number of hosts in the restricted inventory also limits the number of workers, max_concurrent only has an effect if it is smaller than serial. In other words, serial limits the number of workers, and max_concurrent can limit them further.
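
A minimal sketch of that interaction (hypothetical playbook, based on the behaviour described above):

  # Hypothetical playbook, run with: ansible-playbook -f 10 site.yml
  - hosts: webservers       # 12 hosts in the inventory
    serial: 4               # batches of 4; at most 4 workers per batch
    tasks:
      - name: Rolling restart with extra throttling
        service:
          name: httpd
          state: restarted
        max_concurrent: 2   # effective concurrency: min(forks, serial, max_concurrent) = 2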

* how does this affect max_fail_percentage?

The calculation of the percentage of failures is based on iterator.batch_size and the number of failed hosts. As neither iterator.batch_size nor failed_hosts is changed by the implementation of max_concurrent, I do not see how this could be affected.

* what is the expected result with run_once or 'bypass host loop' actions?

As max_concurrent only temporarily limits the number of workers, without changing the task or the batch size, I do not see how this could affect run_once; there is no way to use max_concurrent to force the use of no workers at all. The expected result is the same as without max_concurrent.

* how is delegation affected?

I do not see how max_concurrent, with its temporary limitation of the number of workers, should affect this. Does the global fork limit affect it?

* at which failure point do we trigger block error handling (rescue)?

I do not see how max_concurrent, with its temporary limitation of the number of workers, should affect this. Does the global fork limit affect it?

* how do we handle variable/value precedence when they return facts?

I do not see how max_concurrent, with its temporary limitation of the number of workers, should affect this. Does the global fork limit affect it?

* how do we handle precedence when they update inventory or other shared resources (add_host/group_by)?

I do not see how max_concurrent, with its temporary limitation of the number of workers, should affect this. Does the global fork limit affect it?

And I'm sure I'm missing quite a few considerations; the list above is just off the top of my head. These, and more, have been brought up in adhoc and core IRC meetings discussing this topic (though possibly not this specific ticket), and we have not found good answers for most of them. We need to be able to answer all of the above before we consider adding this to the code base.

Please let us start an open discussion.

@bcoca (Member) commented Jan 31, 2019

@t-woerner that is mostly how this implementation works, but the discussion is more about 'how should it work?'

I do think yours is one of the simplest approaches, by having it be a subset of the existing workers, but I expect users to do something like the following: serial: 3, forks: 5, max_concurrent: 5 and expect an expansion of serial. Even with the current code, they should at least get a warning that you can never exceed min(serial, forks)... though I'm not as sure as you are about the actual behaviour (need to test).
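
A sketch of the combination bcoca describes (hypothetical playbook; forks is given on the command line here, since the per-play forks keyword from #24037 is not implemented):

  # Hypothetical playbook, run with: ansible-playbook -f 5 site.yml
  - hosts: all
    serial: 3               # batches of 3 hosts
    tasks:
      - name: Step the user hopes will run 5-wide
        command: /usr/local/bin/step   # hypothetical command
        max_concurrent: 5   # no effect: concurrency is already capped at
                            # min(serial, forks) = 3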

Another example: when combining max_concurrent: X with run_once, I would also expect a warning or error, since those settings are in conflict and we can only obey one. It's simpler with serial/forks, since run_once is clearly more specific (on the task vs. play and config), but when both are on the task there is no clear way to set expectations (see the loop/when issues).
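
A sketch of that conflict (hypothetical task, illustrating the two keywords pulling in opposite directions):

  - name: Gather cluster state once
    command: /usr/local/bin/cluster_state   # hypothetical command
    run_once: yes
    max_concurrent: 3       # conflicting intent: run_once means a single
                            # execution, so there is nothing left to throttle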

It is not only about mapping out 'desired behaviour'; it is also about responding to conflicts and giving the user feedback about incompatibilities.

FYI, the discussion is always open, it just has lacked someone pushing it to the forefront.

@dagwieers (Contributor) commented Jan 31, 2019

/me still thinks this ought to be a per-play, per-role, per-block, per-task directive named forks, which would influence the global/command-line forks option. It could even be ansible_forks if need be.

@t-woerner (Author)

@t-woerner that is mostly how this implementation works, but the discussion is more about 'how should it work?'

I do think yours is one of the simplest approaches, by having it be a subset of the existing workers, but I expect users to do something like the following: serial: 3, forks: 5, max_concurrent: 5 and expect an expansion of serial. Even with the current code, they should at least get a warning that you can never exceed min(serial, forks)... though I'm not as sure as you are about the actual behaviour (need to test).

I used the name max_concurrent exactly because of the current behaviour. The number of workers is limited by forks and also by the number of hosts in the current batch; I would not expect a simple expansion in this case. The serial option might have been added because of an issue with the controller, the nodes, the software in use, or the environment, and the same applies to max_concurrent. I added it to the role to be able to deploy replicas successfully. Right now 2 replicas can use the same master without getting into trouble, but with some enhancements this might increase to higher values; it can then be adapted to the version of the software, for example. But it should not be increased automatically.

A warning about a low serial setting combined with a high forks setting might be good to have, and the same might be useful for settings at other levels. But most likely people will only see it while debugging.

Another example: when combining max_concurrent: X with run_once, I would also expect a warning or error, since those settings are in conflict and we can only obey one. It's simpler with serial/forks, since run_once is clearly more specific (on the task vs. play and config), but when both are on the task there is no clear way to set expectations (see the loop/when issues).

I think it should be possible to add a conflict test with run_once at the task level, but it will not be simple to add such a test that also looks at other levels.

It is not only about mapping out 'desired behaviour'; it is also about responding to conflicts and giving the user feedback about incompatibilities.

Testing and giving feedback on all settings at all levels (command line, play, blocks, tasks, ...) that might conflict with or influence each other is far more work, and far more invasive, than this PR is supposed to be; to be honest, it is out of scope for this PR in my opinion. It is something global that needs to be tackled independently.

FYI, the discussion is always open, it just has lacked someone pushing it to the forefront.

@t-woerner (Author)

What priority do you see for this PR? If it is going to be part of 2.8, then it has high priority and I will be able to spend more time on it. But if it needs to wait until 2.9 or later, then it is low priority and will move down to the bottom of my work plan.

@dagwieers (Contributor)

@t-woerner I hope this can ship with Ansible v2.8.

@sivel (Member) commented Feb 25, 2019

To be completely honest, this may not ship with 2.8. We are evaluating its further impact together with a new feature for selecting the process model (threading vs. forking), which is in progress and intended to land in 2.9. This work could actually cause issues for that planned feature, which would be negative if it were accepted first.

Review comment on the test playbook (diff excerpt):

    run_once: yes
  - block:
      - name: "Test 1 (max forks: 3)"
        command: python

Member:

Instead of python this should be utilizing ansible_python_interpreter, potentially falling back to another python.

Member:

ansible_playbook_python, since you are executing on "localhosts" (I assume connection = local).

Author:

I have fixed this in the test case.

@bcoca added this to TODO: Backlog in the Ansible 2.9 project Mar 14, 2019, moved it to In Progress May 6, 2019, and moved it back to TODO: Backlog the same day.
@kustodian (Contributor)

Any updates about threading vs forking?

@sivel (Member) commented Jun 7, 2019

Any updates about threading vs forking?

To my knowledge, we are working through some decisions about how to properly test both process models without overloading the CI infrastructure while still getting enough coverage.

With ansible_python_interpreter defined in the inventory file using ansible_playbook_python.

Signed-off-by: Thomas Woerner <twoerner@redhat.com>
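
A sketch of what that inventory line might look like (hypothetical host entry; the actual fix is in the PR's test target):

  # Hypothetical inventory entry
  testhost ansible_connection=local ansible_python_interpreter="{{ ansible_playbook_python }}"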
@t-woerner (Author)

According to https://docs.ansible.com/ansible/latest/roadmap/ROADMAP_2_9.html, the Beta 1 feature freeze (no new functionality, including modules/plugins, in any code) is less than a month away.

Will the threading and forking work still get merged for 2.9? There has not been an update on #44280 since June 21st.

@sivel previously requested changes Aug 5, 2019 on test/integration/targets/forks/test_forks.yml, then dismissed their stale review August 5, 2019 18:43.

@t-woerner (Author)

@gundalow

If this is accepted we will need docs.

When is this needed and what is the format?

Could test it via runme.sh to run a Playbook and count the number of forks found in the debug logs

There are tests for the linear and free strategies: e31cc8d

@jimi-c closed this Aug 28, 2019
@jimi-c (Member) commented Aug 28, 2019

Merged #60702, which supersedes this.

@ansible locked and limited conversation to collaborators Oct 1, 2019