
The running ansible process received a shutdown signal. #15245

Open
Peter1295 opened this issue Jun 3, 2024 · 5 comments

@Peter1295

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that AWX is open source software provided for free and that I might not receive a timely response.
  • I am NOT reporting a (potential) security vulnerability. (These should be emailed to security@ansible.com instead.)

Bug Summary

Random crashes with the message The running ansible process received a shutdown signal.
After the crash, the awx-task pod that was running the job disappears from the Instances page, but the pod is still running in the cluster.

Attaching logs from ArgoCD: awx-task.txt

AWX version

AWX 24.4.0

Select the relevant components

  • UI
  • UI (tech preview)
  • API
  • Docs
  • Collection
  • CLI
  • Other

Installation method

kubernetes

Modifications

no

Ansible version

v2.17.0

Operating system

k8s cluster on OL9

Web browser

Firefox, Chrome, Edge

Steps to reproduce

Created a playbook with six 5-minute pause tasks and ran the template; a sketch of such a playbook is below.
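
A minimal sketch of such a reproducer, assuming the intent is simply a ~30-minute job made of pauses (the hosts group and task names are placeholders, not the exact playbook from this report):

```yaml
# Hypothetical reproducer: six sequential 5-minute pauses (~30 minutes total).
# "all" and the task names are placeholders.
- name: Long-running pause reproducer
  hosts: all
  gather_facts: false
  tasks:
    - name: "Pause {{ item }} of 6"
      ansible.builtin.pause:
        minutes: 5
      loop: "{{ range(1, 7) | list }}"
```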

Expected results

The template finishes in about 30 minutes.

Actual results

The job failed within 10 minutes with the error The running ansible process received a shutdown signal.

Additional information

The issue behaves the same as #14948, but that should have been resolved in version 23.8.1 and I am using the newest version of AWX.

@TheRealHaoLiu (Member) commented Jun 12, 2024

Random crashes with message The running ansible process received a shutdown signal.

Where are you seeing this? Please provide some context.

Currently we do not have enough information to understand what is happening here.

@Peter1295 (Author)

The AWX template fails with that message. The time is random, mostly between 7 and 12 minutes into the job, and I can see it happens with jobs that make changes on multiple hosts (patching, VM customization, etc.).

Unfortunately, the awx-task logs do not show anything helpful, just a message that the job/workflow failed:
Workflow job 18542 failed due to reason: No error handling path for workflow job node(s) [(26156,failed)]. Workflow job node(s) missing unified job template and error handling path [].

The cluster runs on k8s with 2 control planes and 4 worker nodes; maximum CPU and memory usage based on the command kubectl top node is around 20%/80% (CPU/MEM), and all nodes have at least 40% free disk space.

The AWX database is running on an external Postgres server.

@Peter1295 (Author)

Attaching logs from the automation pod that failed in the middle. Absolutely no info about what is happening, not from awx-task, awx-web, nor the automation pod. Any suggestion what to look for?
task3.log

AWX is really important for us; we use it for managing, deploying, patching, etc. on a daily basis. It runs at least 50 templates a day, and I cannot stay permanently connected to it to check whether it is still working.
We have another instance in the production environment, still on 23.3.1, that runs properly, but unfortunately downgrading no longer works; it cannot use the upgraded database.

@Peter1295 (Author)

Another update: the issue is not version related; I was able to downgrade AWX to version 23.8.1 (which should not have this problem).
The issue is not with the database either; I tried both the current Postgres and the older one from before the migration.
Sometimes it fails in 5 minutes, sometimes a job runs for almost 1 hour.

@Peter1295 (Author)

The issue persists on 24.6.0 as well.
The Kubernetes logs show only info about a successful shutdown of the automation pod, not what is happening or why.
