Missing job output and log lines #14003
We're seeing the same thing on 22.2.0 on a new k8s multi-node cluster. (We did not see this issue on an older 21.7.0, single-node k8s. Not sure what the critical difference is.) I do see the output in the automation-job pod log, but various errors in the awx-ee log (at the time the job ends):
What distro of k8s are y'all using? Are you running jobs in the control plane's namespace or externally via a container group?
We're using k8s 1.26.1 via kubeadm (on-prem). AWX is deployed via awx-operator 2.1.0 with Helm, without anything special; jobs run in the "default" container group instance (not control plane), with the AWX EE.
I'm seeing this, at least, on
I can't remember if it was under
For the output that I began this issue with, it happened under the
Hrm, when using
At first I thought this was exclusively a UI issue, but since you indicated that the downloaded output was also missing the same lines, I'm going to flip this over to API. Downloaded output comes straight through the API.
@mamercad Sorry for the confusion here. By default AWX will run pods in the same namespace the control plane is running in. The "default" container group uses the k8s API to run jobs in the same namespace AWX is running in. The instance group "controlplane" forces certain types of tasks (like project updates) to run within the AWX pod rather than as an external pod. We also support running jobs in remote Kubernetes clusters using the container group mechanism. Based on your comments and image here it looks like you're just running things in the local cluster, which is what I was curious about.
Ah, okay. Yes, I'm running everything in the same cluster, and in the same namespace.
For the new "Job terminated due to error" that I'm seeing on 22.3.0, I opened #14057.
I'm testing this version, 21.11.14, with the resolution in #14057 to see if it does the trick for this version as well.
Seems good, going to close this (10k lines worked just fine):
❯ wc -l ~/Downloads/job_126.txt
10019 /Users/mark/Downloads/job_126.txt
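Since the reproduction playbook just prints an increasing counter, missing output can also be detected mechanically rather than by eyeballing line counts. Below is a hypothetical checker (not part of the issue; the file format and number-extraction regex are assumptions) that scans a downloaded `job_NNN.txt` for gaps in the counter sequence:

```python
import re

def find_gaps(lines, lo, hi):
    """Return (start, end) ranges of counter values in [lo, hi]
    that never appear on any line of the job output."""
    seen = set()
    for line in lines:
        # Assumption: the counter appears as a bare integer somewhere
        # on each output line (e.g. 'msg: 12345').
        for tok in re.findall(r"\d+", line):
            n = int(tok)
            if lo <= n <= hi:
                seen.add(n)
    gaps, start = [], None
    for n in range(lo, hi + 1):
        if n not in seen:
            if start is None:
                start = n
        elif start is not None:
            gaps.append((start, n - 1))
            start = None
    if start is not None:
        gaps.append((start, hi))
    return gaps

if __name__ == "__main__":
    # Simulated output with values 40-60 missing.
    sample = [f"msg: {i}" for i in range(1, 101) if not 40 <= i <= 60]
    print(find_gaps(sample, 1, 100))  # -> [(40, 60)]
```

Running it over a real download would be something like `find_gaps(open("job_126.txt"), 1, 40000)`, which reports every missing span at once instead of just a total line count.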
Please confirm the following
(Report security issues to security@ansible.com instead.)
Bug Summary
I've run into this rather strange situation where I'm missing output from jobs. In an attempt to reproduce it, I created a relatively simple Ansible playbook which runs for a few hours and generates a few thousand lines of output. For the most recent test, it ran for about 4 hours and 20 minutes and the playbook simply counted to 40,000.
The output in the UI and when downloaded looks like this (abbreviated):
Notice the huge swath of missing lines.
The playbook that it's running is quite simple:
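(The playbook itself was not captured in this copy of the issue. As a hypothetical sketch of what an equivalent reproducer could look like, assuming a simple debug-in-a-loop form, the long runtime in the original suggests it also paused or did real work on each iteration:)

```yaml
# Hypothetical reproduction playbook (not the reporter's original):
# print a counter to generate tens of thousands of output lines.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Count to 40,000
      ansible.builtin.debug:
        msg: "{{ item }}"
      loop: "{{ range(1, 40001) | list }}"
```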
AWX version
21.11.14
Select the relevant components
Installation method
kubernetes
Modifications
no
Ansible version
2.9
Operating system
Linux
Web browser
Chrome
Steps to reproduce
This should be simple to reproduce; it's a very simple playbook.
Expected results
That all of the output is there.
Actual results
Missing many thousands of lines.
Additional information
No response