Bug: timeout on static node in ansibler doesn't move InputManifest to the ERROR state #1339

Closed
JKBGIT1 opened this issue Apr 18, 2024 · 2 comments
Assignees
Labels: bug (Something isn't working), groomed (Task that everybody agrees to pass the gatekeeper)

Comments

@JKBGIT1
Contributor

JKBGIT1 commented Apr 18, 2024

Current Behaviour

ansibler failed due to a timeout on the static node. The same errors also appeared in the builder logs and the workflow didn't move forward. However, the InputManifest didn't end up in an ERROR state; it was in a DONE state.
AFAIK the InputManifest state wasn't updated manually in Mongo (see)

builder logs

2024-04-10T13:06:00Z INF Calling InstallVPN on Ansibler cluster=wox01-cluster-qy5w5zl module=builder project=default-wox01
2024-04-10T13:42:11Z ERR Failed to build cluster error="error in Ansibler for cluster wox01-cluster project default-wox01 : error while calling InstallVPN on Ansibler: rpc error: code = Unknown desc = error encountered while installing VPN for cluster wox01-cluster project default-wox01 : error while running ansible for services/ansibler/server/clusters/wox01-cluster-qy5w5zl-l2ddj12 : exit status 2:\n\tdb-us-1: failed\n\ttask: Wait 300 seconds for target connection to become reachable/usable\n\tsummary: timed out waiting for ping module test: timed out\n\n\tus01-1: failed\n\ttask: Wait 300 seconds for target connection to become reachable/usable\n\tsummary: timed out waiting for ping module test: timed out\n\n\tus02-1: failed\n\ttask: Wait 300 seconds for target connection to become reachable/usable\n\tsummary: timed out waiting for ping module test: timed out\n\n\tus03-1: failed\n\ttask: Wait 300 seconds for target connection to become reachable/usable\n\tsummary: timed out waiting for ping module test: timed out" cluster=wox01-cluster module=builder
2024-04-10T13:42:11Z ERR Error encountered while processing config error="error in Ansibler for cluster wox01-cluster project default-wox01 : error while calling InstallVPN on Ansibler: rpc error: code = Unknown desc = error encountered while installing VPN for cluster wox01-cluster project default-wox01 : error while running ansible for services/ansibler/server/clusters/wox01-cluster-qy5w5zl-l2ddj12 : exit status 2:\n\tdb-us-1: failed\n\ttask: Wait 300 seconds for target connection to become reachable/usable\n\tsummary: timed out waiting for ping module test: timed out\n\n\tus01-1: failed\n\ttask: Wait 300 seconds for target connection to become reachable/usable\n\tsummary: timed out waiting for ping module test: timed out\n\n\tus02-1: failed\n\ttask: Wait 300 seconds for target connection to become reachable/usable\n\tsummary: timed out waiting for ping module test: timed out\n\n\tus03-1: failed\n\ttask: Wait 300 seconds for target connection to become reachable/usable\n\tsummary: timed out waiting for ping module test: timed out" module=builder project=default-wox01

ansibler logs

2024-04-10T13:29:11Z WRN Retrying command ansible-playbook ../../ansible-playbooks/wireguard.yml -i inventory.ini -f 15 ... (4/5) module=ansibler
2024-04-10T13:34:21Z WRN Error encountered while executing ansible-playbook ../../ansible-playbooks/wireguard.yml -i inventory.ini -f 15  : exit status 2 module=ansibler
2024-04-10T13:34:21Z ERR failed to execute cmd: ansible-playbook ../../ansible-playbooks/wireguard.yml -i inventory.ini -f 15 : 
        db-us-1: failed
        task: Wait 300 seconds for target connection to become reachable/usable
        summary: timed out waiting for ping module test: timed out

        us03-1: failed
        task: Wait 300 seconds for target connection to become reachable/usable
        summary: timed out waiting for ping module test: timed out

        us02-1: failed
        task: Wait 300 seconds for target connection to become reachable/usable
        summary: timed out waiting for ping module test: timed out

        us01-1: failed
        task: Wait 300 seconds for target connection to become reachable/usable
        summary: timed out waiting for ping module test: timed out module=ansibler
2024-04-10T13:34:21Z INF Next retry in 160s... module=ansibler
2024-04-10T13:37:01Z WRN Retrying command ansible-playbook ../../ansible-playbooks/wireguard.yml -i inventory.ini -f 15 ... (5/5) module=ansibler
2024-04-10T13:42:11Z WRN Error encountered while executing ansible-playbook ../../ansible-playbooks/wireguard.yml -i inventory.ini -f 15  : exit status 2 module=ansibler
2024-04-10T13:42:11Z ERR Command ansible-playbook ../../ansible-playbooks/wireguard.yml -i inventory.ini -f 15  was not successful after 5 retries module=ansibler
2024-04-10T13:42:11Z ERR Error encountered while installing VPN error="error while running ansible for services/ansibler/server/clusters/wox01-cluster-qy5w5zl-l2ddj12 : exit status 2:\n\tdb-us-1: failed\n\ttask: Wait 300 seconds for target connection to become reachable/usable\n\tsummary: timed out waiting for ping module test: timed out\n\n\tus01-1: failed\n\ttask: Wait 300 seconds for target connection to become reachable/usable\n\tsummary: timed out waiting for ping module test: timed out\n\n\tus02-1: failed\n\ttask: Wait 300 seconds for target connection to become reachable/usable\n\tsummary: timed out waiting for ping module test: timed out\n\n\tus03-1: failed\n\ttask: Wait 300 seconds for target connection to become reachable/usable\n\tsummary: timed out waiting for ping module test: timed out" cluster=wox01-cluster module=ansibler project=default-wox01
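
The timing in the logs above can be checked with a few lines of arithmetic. This is only a sketch that reads timestamps copied from the log lines; the helper name `seconds_between` and the variable names are made up for illustration. It shows that each of the two visible playbook runs took a little over the 300 seconds the wait task allows, and that the pause between them matches the logged "Next retry in 160s".

```python
from datetime import datetime

# Timestamp layout used by the builder and ansibler log lines.
FMT = "%Y-%m-%dT%H:%M:%SZ"


def seconds_between(start: str, end: str) -> int:
    """Return the whole seconds elapsed between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# Timestamps copied from the logs in this issue.
install_vpn_called = "2024-04-10T13:06:00Z"  # builder: Calling InstallVPN
retry_4_started = "2024-04-10T13:29:11Z"     # ansibler: Retrying ... (4/5)
retry_4_failed = "2024-04-10T13:34:21Z"      # ansibler: exit status 2
retry_5_started = "2024-04-10T13:37:01Z"     # ansibler: Retrying ... (5/5)
retry_5_failed = "2024-04-10T13:42:11Z"      # ansibler: exit status 2

retry_4_duration = seconds_between(retry_4_started, retry_4_failed)
retry_5_duration = seconds_between(retry_5_started, retry_5_failed)
pause_before_retry_5 = seconds_between(retry_4_failed, retry_5_started)
total_elapsed = seconds_between(install_vpn_called, retry_5_failed)

print(retry_4_duration, retry_5_duration, pause_before_retry_5, total_elapsed)
```

The earlier retries are not shown in the excerpt, so nothing here says how the earlier pauses were chosen; only the 160-second pause is confirmed by the log.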

Expected Behaviour

InputManifest should end up in an ERROR state.

Steps To Reproduce

  1. Apply an InputManifest with static nodes that ansibler cannot connect to.

JKBGIT1 added the bug label Apr 18, 2024
Despire added the groomed label Apr 19, 2024
Despire self-assigned this Apr 29, 2024

@Despire
Contributor

Despire commented Apr 29, 2024

I was not able to replicate this issue at all. The input manifest always ends up in the ERROR state in the DB. The only explanation I can see is that a new workflow ran after the error and finished successfully.
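
The scenario described here can be illustrated with a toy model. This is a hypothetical sketch, not Claudie's actual code: the class `ManifestRecord`, its fields, and the method `finish_workflow` are invented for illustration, and it assumes that each finished workflow simply overwrites the stored state.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ManifestRecord:
    """Hypothetical stand-in for the InputManifest state kept in the DB."""

    state: str = "PENDING"
    history: list = field(default_factory=list)

    def finish_workflow(self, error: Optional[str] = None) -> None:
        # Assumption: the latest workflow result overwrites the stored state.
        self.state = "ERROR" if error else "DONE"
        self.history.append(self.state)


record = ManifestRecord()

# First workflow: ansibler times out, so the state becomes ERROR.
record.finish_workflow(error="timed out waiting for ping module test")
state_after_failure = record.state

# A later workflow succeeds and replaces the ERROR state with DONE.
record.finish_workflow()

print(state_after_failure, record.state, record.history)
```

Under that assumption, anyone inspecting only the final state would see DONE even though an ERROR was recorded earlier.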

@bernardhalas
Member

Unable to reproduce.

bernardhalas closed this as not planned (won't fix, can't repro, duplicate, stale) May 3, 2024