Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execution engine not backward compatible in 2.0 #15395

Closed
vroy opened this issue Apr 12, 2016 · 15 comments
Closed

Execution engine not backward compatible in 2.0 #15395

vroy opened this issue Apr 12, 2016 · 15 comments
Labels
bug This issue/PR relates to a bug.
Milestone

Comments

@vroy
Copy link
Contributor

vroy commented Apr 12, 2016

ISSUE TYPE
  • Bug Report
ANSIBLE VERSION

Used for actual results:

ansible-playbook 2.0.1.0
  config file =
  configured module search path = Default w/o overrides

Used for expected results:

$ ansible-playbook --version
ansible-playbook 1.9.5
  configured module search path = None
SUMMARY

We've recently upgraded from ansible 1.9.5 to 2.0.1.0 with much success but we ran into issues where failures are not handled as expected:

  1. If any_errors_fatal is true and there's an unreachable host, task errors will be ignored: https://gist.github.com/vroy/e98bd1c552d9a7633ca0fda88a9176c5

  2. Hosts that failed are removed from the current play but restored in the next play: https://gist.github.com/vroy/4b877a7bc9bb996de96a862e82b98e62

PATCH

I'm submitting this as a bug first to get some feedback before opening a pull request. Please let me know if you have any questions or if there's anything I can do to help.

I've been working on a patch (against 2.0.1.0 simply because that's what we use in production) that resolves these two issues: vroy@63471cd

The results after my patch is applied were included in the gists and doesn't seem to cause issues with some of our other happy path playbooks that we use for testing.

One thing to note is that even with my patch, we see more plays being logged though they are not running any tasks as expected. I think this could be improved.

@vroy
Copy link
Contributor Author

vroy commented Apr 25, 2016

An updated set of patches can be seen in this diff: stable-2.0...vroy:backward-compatible-executor

I believe that it fixes most backward compatibility issues with 1.9.

@mattmonkey83
Copy link

The diff above appears to help with an issue I recently logged - #15523

There's an improvement in behaviour where all hosts are unreachable.

Without patch all unreachable -

$ ansible-playbook -i test_inv test.yml

PLAY [test] ********************************************************************

TASK [ping] ********************************************************************
fatal: [ec2-54-204-28-247.compute-1.amazonaws.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}

PLAY [localhost] ***************************************************************

TASK [debug] *******************************************************************
ok: [localhost] => {
    "msg": "Should not run"
}
        to retry, use: --limit @test.retry

PLAY RECAP *********************************************************************
ec2-54-204-28-247.compute-1.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0
localhost                  : ok=1    changed=0    unreachable=0    failed=0

With the patch all unreachable -

$ ansible-playbook -i test_inv test.yml

PLAY [test] ********************************************************************

TASK [ping] ********************************************************************
fatal: [ec2-54-204-28-247.compute-1.amazonaws.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}
        to retry, use: --limit @test.retry

PLAY RECAP *********************************************************************
ec2-54-204-28-247.compute-1.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0

It doesn't seem to help when only a subset of the hosts are unreachable however.

Without the patch subset unreachable -

$ ansible-playbook -i test_inv test.yml

PLAY [test] ********************************************************************

TASK [ping] ********************************************************************
fatal: [ec2-54-204-28-247.compute-1.amazonaws.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}
ok: [ec2-54-242-223-5.compute-1.amazonaws.com]

NO MORE HOSTS LEFT *************************************************************

PLAY [localhost] ***************************************************************

TASK [debug] *******************************************************************
ok: [localhost] => {
    "msg": "Should not run"
}
        to retry, use: --limit @test.retry

PLAY RECAP *********************************************************************
ec2-54-204-28-247.compute-1.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0
ec2-54-242-223-5.compute-1.amazonaws.com : ok=1    changed=0    unreachable=0    failed=0
localhost                  : ok=1    changed=0    unreachable=0    failed=0

With the patch subset unreachable -

$ ansible-playbook -i test_inv test.yml

PLAY [test] ********************************************************************

TASK [ping] ********************************************************************
fatal: [ec2-54-204-28-247.compute-1.amazonaws.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}
ok: [ec2-54-242-223-5.compute-1.amazonaws.com]

PLAY [localhost] ***************************************************************

TASK [debug] *******************************************************************
ok: [localhost] => {
    "msg": "Should not run"
}
        to retry, use: --limit @test.retry

PLAY RECAP *********************************************************************
ec2-54-204-28-247.compute-1.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0
ec2-54-242-223-5.compute-1.amazonaws.com : ok=1    changed=0    unreachable=0    failed=0
localhost                  : ok=1    changed=0    unreachable=0    failed=0

@jimi-c
Copy link
Member

jimi-c commented May 16, 2016

@vroy I'm looking at merging these for 2.1 and would like you to share any sample playbooks you may have. I was going to tackle preserving the failed hosts from the iterator as I mentioned directly on the commit earlier.

@jimi-c jimi-c added this to the 2.1.0 milestone May 16, 2016
jimi-c added a commit that referenced this issue May 17, 2016
This was re-added by 63471cd (and modified by me to use iterator again),
it simply needs to be removed.

Fixes #15395
@jimi-c
Copy link
Member

jimi-c commented May 17, 2016

@vroy created a feature branch (above) against devel with your patches and one fix of my own which actually removes the check on failed hosts when compiling the list of active hosts in the linear strategy (so no need to use the iterator there at all).

Could you please test this against your reproducers and let me know if the issues are still resolved? I'll also be checking #15523, #13750, and #14665 to see if they're resolved.

@jimi-c
Copy link
Member

jimi-c commented May 17, 2016

Oops, the link above is just for the last commit, the branch is here: https://github.com/ansible/ansible/compare/vroy_backward-compatible-executor

@vroy
Copy link
Contributor Author

vroy commented May 17, 2016

@jimi-c Thanks for looking into this.

I have a lot of test playbooks but unfortunately they're tied in our test framework at the moment. It might be a little longer until I can make this easily reproducible.

I've tried your branch and am still seeing hosts being included in future plays: https://gist.github.com/vroy/db8d2bfa18e249b79947a540ce128720

I've tried with my patch against 2.0.1.0 and it's behaving the same as 1.9.5.

@jimi-c
Copy link
Member

jimi-c commented May 17, 2016

That's really odd, because my version of your branch didn't add anything other than to remove the check on failed hosts within the linear strategy, which shouldn't affect future plays.

@jimi-c
Copy link
Member

jimi-c commented May 17, 2016

@vroy I just pushed an additional commit to that branch which makes your example gist function as expected.

jimi-c added a commit that referenced this issue May 17, 2016
This was re-added by 63471cd (and modified by me to use iterator again),
it simply needs to be removed.

Fixes #15395
@jimi-c
Copy link
Member

jimi-c commented May 17, 2016

Just to follow up, it looks like all three of those issues are resolved by this branch in my testing.

@vroy
Copy link
Contributor Author

vroy commented May 17, 2016

I'll run your updated branch against my little test suite again tomorrow morning and let you know how it turns out.

@vroy
Copy link
Contributor Author

vroy commented May 18, 2016

I'm still seeing some issues around how max_fail_percentage is handled: https://gist.github.com/vroy/4c0d7ced6e73a39b6f4ae2d18c293c15

This was using your updated branch:

bash-3.2$ ansible-playbook --version
ansible-playbook 2.2.0 (vroy_backward-compatible-executor 887850bbc7) last updated 2016/05/18 08:19:19 (GMT -300)
  lib/ansible/modules/core: (detached HEAD 92bf802cb8) last updated 2016/05/18 08:19:24 (GMT -300)
  lib/ansible/modules/extras: (detached HEAD e710dc47fe) last updated 2016/05/18 08:19:24 (GMT -300)
  config file =
  configured module search path = Default w/o overrides

@jimi-c
Copy link
Member

jimi-c commented May 18, 2016

Ok, that one is an odd corner-case, so I feel good about merging this work in now for 2.1 rc3 (which we're aiming to release today). I'll continue looking into the max fail % issue above as well.

jimi-c added a commit that referenced this issue May 18, 2016
This was re-added by 63471cd (and modified by me to use iterator again),
it simply needs to be removed.

Fixes #15395
@jimi-c jimi-c closed this as completed in 0f659d6 May 18, 2016
@jimi-c jimi-c reopened this May 18, 2016
@jimi-c
Copy link
Member

jimi-c commented May 18, 2016

@vroy the max_fail_pct thing should be fixed in devel/stable-2.1 as well (tested against your gist above).

# ansible-playbook -vv -i inv max_fail_percentage_does_not_affect_future_plays.yml 
Using /etc/ansible/ansible.cfg as config file
PLAYBOOK: max_fail_percentage_does_not_affect_future_plays.yml *****************
3 plays in max_fail_percentage_does_not_affect_future_plays.yml
PLAY [Fail on first host] ******************************************************
TASK [setup] *******************************************************************
ok: [host3]
ok: [host4]
ok: [host1]
ok: [host2]
TASK [Fail on first host] ******************************************************
task path: /root/testing/15395/max_fail_percentage_does_not_affect_future_plays.yml:4
fatal: [host1]: FAILED! => {"changed": false, "failed": true, "msg": "Halting"}
skipping: [host2] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
skipping: [host3] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
skipping: [host4] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
TASK [Will not run on first host] **********************************************
task path: /root/testing/15395/max_fail_percentage_does_not_affect_future_plays.yml:7
changed: [host4] => {"changed": true, "cmd": ["echo", "should not run on first host"], "delta": "0:00:00.003134", "end": "2016-05-18 14:10:07.782730", "rc": 0, "start": "2016-05-18 14:10:07.779596", "stderr": "", "stdout": "should not run on first host", "stdout_lines": ["should not run on first host"], "warnings": []}
changed: [host2] => {"changed": true, "cmd": ["echo", "should not run on first host"], "delta": "0:00:00.003843", "end": "2016-05-18 14:10:07.787135", "rc": 0, "start": "2016-05-18 14:10:07.783292", "stderr": "", "stdout": "should not run on first host", "stdout_lines": ["should not run on first host"], "warnings": []}
changed: [host3] => {"changed": true, "cmd": ["echo", "should not run on first host"], "delta": "0:00:00.004004", "end": "2016-05-18 14:10:07.789357", "rc": 0, "start": "2016-05-18 14:10:07.785353", "stderr": "", "stdout": "should not run on first host", "stdout_lines": ["should not run on first host"], "warnings": []}
NO MORE HOSTS LEFT *************************************************************
PLAY [Continues running] *******************************************************
TASK [setup] *******************************************************************
ok: [host2]
ok: [host4]
ok: [host3]
TASK [command] *****************************************************************
task path: /root/testing/15395/max_fail_percentage_does_not_affect_future_plays.yml:14
changed: [host3] => {"changed": true, "cmd": ["echo", "Continuing"], "delta": "0:00:00.003061", "end": "2016-05-18 14:10:11.182399", "rc": 0, "start": "2016-05-18 14:10:11.179338", "stderr": "", "stdout": "Continuing", "stdout_lines": ["Continuing"], "warnings": []}
changed: [host4] => {"changed": true, "cmd": ["echo", "Continuing"], "delta": "0:00:00.003795", "end": "2016-05-18 14:10:11.186598", "rc": 0, "start": "2016-05-18 14:10:11.182803", "stderr": "", "stdout": "Continuing", "stdout_lines": ["Continuing"], "warnings": []}
changed: [host2] => {"changed": true, "cmd": ["echo", "Continuing"], "delta": "0:00:00.003885", "end": "2016-05-18 14:10:11.187053", "rc": 0, "start": "2016-05-18 14:10:11.183168", "stderr": "", "stdout": "Continuing", "stdout_lines": ["Continuing"], "warnings": []}
NO MORE HOSTS LEFT *************************************************************
PLAY [Continues running some more] *********************************************
TASK [setup] *******************************************************************
ok: [host2]
ok: [host4]
ok: [host3]
TASK [command] *****************************************************************
task path: /root/testing/15395/max_fail_percentage_does_not_affect_future_plays.yml:19
changed: [host2] => {"changed": true, "cmd": ["echo", "Continuing"], "delta": "0:00:00.006797", "end": "2016-05-18 14:10:14.449458", "rc": 0, "start": "2016-05-18 14:10:14.442661", "stderr": "", "stdout": "Continuing", "stdout_lines": ["Continuing"], "warnings": []}
changed: [host3] => {"changed": true, "cmd": ["echo", "Continuing"], "delta": "0:00:00.007048", "end": "2016-05-18 14:10:14.465126", "rc": 0, "start": "2016-05-18 14:10:14.458078", "stderr": "", "stdout": "Continuing", "stdout_lines": ["Continuing"], "warnings": []}
changed: [host4] => {"changed": true, "cmd": ["echo", "Continuing"], "delta": "0:00:00.006773", "end": "2016-05-18 14:10:14.466279", "rc": 0, "start": "2016-05-18 14:10:14.459506", "stderr": "", "stdout": "Continuing", "stdout_lines": ["Continuing"], "warnings": []}
NO MORE HOSTS LEFT *************************************************************
PLAY RECAP *********************************************************************
host1                      : ok=1    changed=0    unreachable=0    failed=1   
host2                      : ok=6    changed=3    unreachable=0    failed=0   
host3                      : ok=6    changed=3    unreachable=0    failed=0   
host4                      : ok=6    changed=3    unreachable=0    failed=0   

@vroy
Copy link
Contributor Author

vroy commented May 19, 2016

@jimi-c Confirmed. All of my reproducers are passing against the devel branch.

Thanks again for looking into this!

@vroy vroy closed this as completed May 19, 2016
@jimi-c
Copy link
Member

jimi-c commented May 19, 2016

No problem, thank you for the excellent testing!

@ansibot ansibot added bug This issue/PR relates to a bug. and removed bug_report labels Mar 7, 2018
@ansible ansible locked and limited conversation to collaborators Apr 25, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug This issue/PR relates to a bug.
Projects
None yet
Development

No branches or pull requests

5 participants