
Job goes into error state, but finishes successfully in the background #10889

Closed
3 tasks done
lukasertl opened this issue Aug 16, 2021 · 2 comments


@lukasertl

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that AWX is open source software provided for free and that I am not entitled to status updates or other assurances.

Summary

With a fairly complex playbook that produces a large amount of output (11000+ lines), the job output stops updating at a random point (usually somewhere past line 10000). The job is still in the "running" state and is still executing tasks successfully, as the pod logs in OpenShift confirm.

At some point the job finishes and goes into the "error" state, but there is no indication of what error occurred or where (the job output is still not updated).

Checking the OpenShift logs of the automation-job pod I can see that the playbook itself finished without error.

In the awx-task pod logs I can find these messages:

2021-08-16 12:33:49,737 DEBUG    [ba396ebbe875409e82f26174b109d351] awx.main.tasks job 320 (running) finished running, producing 20346 events.
2021-08-16 12:33:49,751 DEBUG    [ba396ebbe875409e82f26174b109d351] awx.analytics.job_lifecycle job-320 post run
2021-08-16 12:33:49,975 DEBUG    [ba396ebbe875409e82f26174b109d351] awx.analytics.job_lifecycle job-320 finalize run
2021-08-16 12:33:49,976 DEBUG    [ba396ebbe875409e82f26174b109d351] awx.analytics.job_lifecycle job-320 finish job fact cache
[...]
2021-08-16 12:33:58,454 WARNING  [ba396ebbe875409e82f26174b109d351] awx.main.dispatch job 320 (error) encountered an error (rc=None), please see task stdout for details.

If I remove several tasks from the playbook to reduce overall output the job finishes successfully.

AWX version

19.3

Installation method

openshift

Modifications

yes

Ansible version

No response

Operating system

No response

Web browser

Chrome

Steps to reproduce

Run a playbook that produces lots of output events.
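A minimal playbook along the following lines may be enough to trigger the problem. This is a hypothetical reproducer, not the reporter's actual playbook: it simply emits one event per loop item, and the loop count of 20000 is an assumption chosen to exceed the roughly 11000 lines of output (20346 events) at which the truncation was observed above.

```yaml
# Hypothetical reproducer: generates a large number of job events.
- name: Produce lots of output events
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Emit one debug event per loop item (count is an assumption)
      ansible.builtin.debug:
        msg: "output line {{ item }}"
      loop: "{{ range(0, 20000) | list }}"
```

If the bug is present, the job output in the UI would be expected to stop updating partway through the loop while the task keeps running in the automation-job pod.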

Expected results

Job output should not be truncated.
Job should not go into error state.

Actual results

Job output is truncated.
Job goes into error state.

Additional information

Modified Execution Environment (built on AWX-EE 0.6.0)

@coudenysj

I can confirm this behaviour on AWX 19.2.2. We have even larger outputs (last failure stopped at 58000+ lines).

The AWX UI just shows "Status: Error", but in the logs we see:

2021-08-17 12:16:39,681 DEBUG    [0cc9b41e49cd4ac6830e99efac57cff5] awx.main.tasks job 40102 (error) is no longer active, reaping orphaned k8s pod
2021-08-17 12:16:42,884 WARNING  [0825ada3842f4e64b4b5e579829d34ce] awx.main.dispatch job 40102 (error) encountered an error (rc=None), please see task stdout for details.
2021-08-17 12:16:42,887 DEBUG    [0825ada3842f4e64b4b5e579829d34ce] awx.main.dispatch task 7073b9f6-57f0-41bf-9f41-24ca41f55f4a starting awx.main.tasks.handle_work_error(*['7073b9f6-57f0-41bf-9f41-24ca41f55f4a'])
2021-08-17 12:16:42,888 DEBUG    [0825ada3842f4e64b4b5e579829d34ce] awx.main.tasks Executing error task id 7073b9f6-57f0-41bf-9f41-24ca41f55f4a, subtasks: [{'type': 'job', 'id': 40102}]

@shanemcd
Member

Duplicate of issue #9961
