Jobs using ssh credentials get stuck when loading passphrase #11051
Comments
We can reproduce the same issue after upgrading from 17.1.0 to 19.X.X in Kubernetes.
I'm seeing this issue on the latest 19.X.X, but it only started happening once our job ID number exceeded 10k.
I can confirm this is also the case for our issue. I lowered the auto-increment value for the jobs in the db (don't do that in production!), and after that the issue was gone. So our issue really does seem to be connected to #10489. Does anyone know of a "safe way" to remove all job runs and reset the counter as a workaround for now?
Much of the implementation of this is in ansible-runner, like the writing to the pipe, which can be seen here: The natural speculation is that the It jumps out to me that you show Provided that these jobs were started after the migration finished, this shouldn't happen. Anywhere you see "pdd_wrapper" is a red flag to me, and suggests something stale from prior to the migration.
Any solution to this will probably resolve #11453 as well, even if the original cause is different.
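To make the "pdd_wrapper" check above concrete, here is a minimal sketch that walks a directory tree and lists named pipes whose path contains a given marker. The function name `find_fifos` and its parameters are hypothetical, not part of AWX or ansible-runner, and the root directory to scan (for example /tmp inside the awx-ee container) is an assumption based on the path shown in the report.

```python
import os
import stat
from typing import List


def find_fifos(root: str, marker: str = "pdd_wrapper") -> List[str]:
    """Return sorted paths of named pipes under root whose path contains marker.

    Hypothetical helper for illustration only; it inspects the filesystem
    and never opens the pipes, so it does not disturb a waiting reader.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                # The entry vanished or is unreadable; skip it.
                continue
            if stat.S_ISFIFO(mode) and marker in path:
                hits.append(path)
    return sorted(hits)
```

Calling `find_fifos("/tmp")` would list candidate leftover pipes; an empty list only means no FIFO with that marker was found under the given root.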
Hello @AlanCoding, @rbicker, I have the same issue. I migrated from a local docker installation of 17.x to the kubernetes version 19.2.0, and since then none of my projects can sync anymore; they stay in the running state forever. I opened an issue for that, #11518. The private key of my ssh credential has no passphrase, so the issue happens even without a passphrase on the credential. Moreover, my job IDs are around 2791... Thank you very much for your help. Could it be related to this issue, ansible/awx-operator#376?
@rbicker, where in the awx-ee container did you find the log? I can't find anything about the running job, and I'm wondering whether it is linked to ansible/awx-operator#376. Also, my job IDs are around 2791, so I don't think it's linked to #10489.
This looks the same as a large number of issues open right now, and I would like to start consolidating them soon. I would favor #11518 as the primary issue.
Closing in favor of #11518
Summary
After migrating our docker-compose based awx 16.0.0 installation to a kubernetes based installation using awx-operator 0.13 (while providing the old postgres database and the ansible secret as instructed in the migration guide), jobs using ssh credentials have stopped working.
In the web interface, these jobs proceed to the "running" state and stay stuck there forever. No output is shown.
Using `awx-manage shell_plus`, I was able to verify that credentials can be decrypted successfully. Other credential types are working fine.

I can see in the awx-ee container that the jobs seem to be stuck on ssh-add processes (like `ssh-add /tmp/pdd_wrapper_50404_25q104h6/awx_50404_br45qk20/artifacts/50404/ssh_key_data` for example). From what I can tell, the named pipe "ssh_key_data" is never receiving the passphrase, which is why the job gets stuck. When I manually write the passphrase to the named pipe, the job proceeds! How do the passphrases normally get passed to the named pipes?

I have tried running our migrated awx installation on minikube and k3s, and I have also tried using awx 18.0.0. We are facing this issue either way.
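For readers unfamiliar with named pipes, the hang described above matches ordinary FIFO behaviour: a process that opens a FIFO for reading blocks until some other process opens it for writing. The sketch below reproduces that handoff in isolation. It is not how AWX or ansible-runner implement it; the function name `demo_fifo_handoff` is hypothetical, the reader thread is only a stand-in for ssh-add, and the file name is borrowed from the report.

```python
import os
import tempfile
import threading


def demo_fifo_handoff(payload: bytes) -> bytes:
    """Create a FIFO, block a reader on it, then release the reader by writing."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "ssh_key_data")  # name taken from the report
    os.mkfifo(fifo_path, 0o600)

    received = []

    def reader() -> None:
        # Stand-in for ssh-add: open() blocks until a writer opens the FIFO,
        # and read() returns once that writer closes its end.
        with open(fifo_path, "rb") as f:
            received.append(f.read())

    t = threading.Thread(target=reader)
    t.start()

    # Stand-in for the manual write from the report. If this step never
    # happened, the reader thread would stay blocked indefinitely.
    with open(fifo_path, "wb") as f:
        f.write(payload)

    t.join(timeout=5)
    os.remove(fifo_path)
    os.rmdir(workdir)
    return received[0]
```

If the writer side never runs, nothing times out on its own, which would be consistent with a job sitting in "running" with no output.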
I am not sure if #10489 is connected to this issue as our job IDs are over 50000.
AWX version
19.3.0
Installation method
kubernetes
Modifications
no
Ansible version
No response
Operating system
Rocky 8.4
Web browser
No response
Steps to reproduce
Unfortunately, I am not sure how the issue can be reproduced, as we only face it when migrating our docker-compose based awx 16.0.0 installation. I am happy to help troubleshoot the issue on our installation in any way.
Expected results
Jobs should start running after successfully loading ssh credentials with passphrases
Actual results
Jobs with ssh credentials that have passphrases are stuck in "running" state forever without actually starting.
Additional information
No response