Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"retries" without "until": retry until success #20802

Closed
candlerb opened this issue Jan 29, 2017 · 15 comments
Closed

"retries" without "until": retry until success #20802

candlerb opened this issue Jan 29, 2017 · 15 comments
Labels
affects_2.16 c:executor/playbook_executor c:executor/task_executor c:playbook/task feature This issue/PR relates to a feature request. has_pr This issue has an associated PR. support:core This issue/PR relates to code supported by the Ansible Engineering Team.

Comments

@candlerb
Copy link
Contributor

candlerb commented Jan 29, 2017

ISSUE TYPE
  • Feature Idea
COMPONENT NAME

retries

ANSIBLE VERSION
ansible 2.2.1.0
  config file =
  configured module search path = Default w/o overrides
CONFIGURATION

N/A

OS / ENVIRONMENT

N/A

SUMMARY

At the moment, giving "retries:" without an "until:" condition has no effect.

It would be useful if this could be implemented to mean "retry until success"

(Or if not, at least generate an error saying that retries has no effect without until)

STEPS TO REPRODUCE
- hosts: localhost
  tasks:
  - file: path=/tmp/zzz state=absent

  - name: testing timeout
    get_url:
      url: http://1.2.3.4/zzz
      dest: /tmp/zzz
    retries: 2
    delay: 3
EXPECTED RESULTS

The module is retried 2 times.

ACTUAL RESULTS

The module is run only once.

PLAY [localhost] ***************************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [file] ********************************************************************
ok: [localhost]

TASK [testing timeout] *********************************************************
fatal: [localhost]: FAILED! => {"changed": false, "dest": "/tmp/zzz", "failed": true, "msg": "Request failed: <urlopen error timed out>", "state": "absent", "url": "http://1.2.3.4/zzz"}
	to retry, use: --limit @/home/ubuntu/ert.retry

To get the desired behaviour you have to add three lines:

    register: result
    until: result|succeeded
    ignore_errors: yes

But an unfortunate side-effect of ignore_errors is that a failure even after all the retries is ignored:

TASK [testing timeout] *********************************************************
FAILED - RETRYING: TASK: testing timeout (2 retries left).
FAILED - RETRYING: TASK: testing timeout (1 retries left).
fatal: [localhost]: FAILED! => {"attempts": 2, "changed": false, "dest": "/tmp/zzz", "failed": true, "msg": "Request failed: <urlopen error timed out>", "state": "absent", "url": "http://1.2.3.4/zzz"}
...ignoring

So you need to add another task, something like this:

  - name: report error
    fail:
      msg: 'Unable to download {{result.url}}: {{result.response|default(result.msg)}}'
    when: not result|succeeded

This is pretty messy when all you wanted to do is say "if this task fails, retry it up to N times and then give up"

@ansibot ansibot added affects_2.2 This issue/PR affects Ansible v2.2 feature_idea needs_triage Needs a first human triage before being processed. labels Jan 29, 2017
@alikins alikins added c:executor/playbook_executor c:executor/task_executor c:playbook/task and removed needs_triage Needs a first human triage before being processed. labels Jan 30, 2017
@ansibot ansibot added the support:core This issue/PR relates to code supported by the Ansible Engineering Team. label Jun 29, 2017
feanil added a commit to openedx-unsupported/configuration that referenced this issue Sep 5, 2017
The `retries` statement is just ignored unless an `until` clause
is provided.

ansible/ansible#20802
feanil added a commit to openedx-unsupported/configuration that referenced this issue Sep 6, 2017
The `retries` statement is just ignored unless an `until` clause
is provided.

ansible/ansible#20802
feanil added a commit to openedx-unsupported/configuration that referenced this issue Sep 6, 2017
The `retries` statement is just ignored unless an `until` clause
is provided.

ansible/ansible#20802
@wvidana
Copy link
Contributor

wvidana commented Oct 12, 2017

Yup, this just happened to me on Ansible 2.3. The run failed and aborted instead of retrying (I didn't set until neither, I expected it to retry until task succeded)

ansible 2.3.2.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.13 (default, Feb 13 2017, 15:15:09) [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)]

@ansibot ansibot added feature This issue/PR relates to a feature request. and removed feature_idea labels Mar 2, 2018
@MPV
Copy link

MPV commented May 15, 2018

Is this maybe solved now using the failed_when feature?

See:
http://docs.ansible.com/ansible/latest/user_guide/playbooks_error_handling.html#controlling-what-defines-failure

I have (so far) successfully migrated my tasks into failed_when from the above mentioned "fail: when: not result|succeeded" method.

@MPV
Copy link

MPV commented May 15, 2018

A complete example of what I changed it into to get it to work:

- name: check nexus
  get_url:
    url: "http://{{ nexus_host }}/nexus/"
    dest: /tmp/nexus
    force: yes
  register: nexus_result
  until: nexus_result|succeeded
  retries: 10
  delay: 15
  failed_when: nexus_result is failure

@candlerb
Copy link
Contributor Author

I have retested with ansible 2.5, and now the "ignore_errors:" part is no longer needed.

So the following works correctly, in both success and failure situations:

- hosts: localhost
  tasks:
  - file: path=/tmp/zzz state=absent

  - name: testing timeout
    get_url:
      url: http://1.2.3.4/zzz
      dest: /tmp/zzz
    retries: 2
    delay: 3
    register: result            # <<<
    until: result is succeeded  # <<<

  - debug:
      msg: finished

There is no benefit from adding failed_when: result is failed (although it doesn't stop it working).

Specifying retries: without until: still does nothing though, meaning you need to add the two highlighted lines.

It's better than it was, but I still think the default for retries: should be "until success", or at least emit a warning that retries: without until: does nothing.

@wvidana
Copy link
Contributor

wvidana commented May 15, 2018

I believe this is still happening on 2.5

@candlerb
Copy link
Contributor Author

@woqer: what exactly is still happening? Can you provide a sample playbook?

The playbook I just posted was tested with ansible 2.5.0-1ppa~xenial

It shows that you still need an until: condition with retries:, but you no longer need ignore_errors: nor a subsequent playbook task to catch errors.

@wvidana
Copy link
Contributor

wvidana commented May 15, 2018

@candlerb you still need until: with retries: on 2.5.1, sorry I wasn't specific before

@jeking3
Copy link
Contributor

jeking3 commented Dec 10, 2018

retries: without any other conditions should check for succeeded by default; please implement this!

@SJrX
Copy link

SJrX commented Feb 22, 2019

Yes please do, it was very confusing. At the very least implement an error: For example the api.ipify.org fails sometimes so I tried:

- name: "Determine current public IP Address"
  ipify_facts:
  register: ipify_facts
  retries: 10
  delay: 5

That doesn't work but doesn't generate an error or anything.

@captainjs
Copy link

Hello,
It's been a while (over a year) since that problem was reported.
The situation is the same today: the retry was not working because we haven't provided an "until".
That's documented, but still, on operational level we are expecting to have a warning displayed, "warning: a reply without until is present, so the retry is ignored" or something like that.

A user just wasted a few hours on that until we finally realised what was happening.

At this point, it seems to be lost in merging, various blabla about it, and the issue is not addressed.

Can someone clarify the status?

@jeking3
Copy link
Contributor

jeking3 commented May 15, 2019

The point of this issue is to determine what happens when until is removed. You provided it for both tests.

@captainjs
Copy link

The point of this issue is to determine what happens when until is removed. You provided it for both tests.

Ok but is there anyone working on ensuring that there's at least a warning when there's only "reply" and no "until"? So that the user can see it's ignored?

@ansibot ansibot added the has_pr This issue has an associated PR. label Jul 27, 2019
@froblesmartin
Copy link

This will be addressed with this PR: #43128

karlmdavis added a commit to CMSgov/beneficiary-fhir-data that referenced this issue Sep 26, 2019
Trying a different approach from the previous wait_timeout, which wasn't
working. This approach was suggested here:
<ansible/ansible#20802>.

BLUEBUTTON-1112
@s-hertel
Copy link
Contributor

s-hertel commented Dec 4, 2020

Removing the register and until boilerplate for retries seems convenient.

@MallocArray
Copy link

I just ran into the same thing where I added retries expecting it to retry if there was a failure, but since I didn't have an until: parameter, the retry was skipped without warning.

@jborean93 jborean93 added affects_2.16 and removed affects_2.2 This issue/PR affects Ansible v2.2 affects_2.13 affects_2.14 labels Apr 12, 2023
@ansible ansible locked and limited conversation to collaborators Aug 31, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
affects_2.16 c:executor/playbook_executor c:executor/task_executor c:playbook/task feature This issue/PR relates to a feature request. has_pr This issue has an associated PR. support:core This issue/PR relates to code supported by the Ansible Engineering Team.
Projects
None yet
Development

Successfully merging a pull request may close this issue.