
Result_traceback should not include job stdout #12961

Merged (1 commit) on Jan 3, 2023

Conversation

@fosterseth (Member) commented Sep 27, 2022

related #12644

SUMMARY

If a job fails, we fetch the receptor work results and put that output into result_traceback.

We should only do this if

  1. Receptor unit has failed
  2. Runner callback processed 0 events

Otherwise we risk putting too much data into this field.
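The two conditions above can be sketched as a single guard. This is a hedged illustration only; the names `unit_failed`, `event_count`, and `should_capture_traceback` are stand-ins, not the actual AWX fields or functions:

```python
# Illustrative sketch of the proposed guard: fetch receptor work results
# into result_traceback only when BOTH conditions hold. All names here
# are hypothetical, not the real AWX/receptor API.
def should_capture_traceback(unit_failed: bool, event_count: int) -> bool:
    # 1. the receptor work unit itself failed, AND
    # 2. the runner callback processed zero events, meaning stdout was
    #    never captured as job events elsewhere
    return unit_failed and event_count == 0
```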

Would require this receptor change ansible/receptor#672

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME
  • API
AWX VERSION
awx: 21.5.1.dev221+gaeb614e45d

# contain useful information about why the job failed. In case stdout is
# massive, only ask for last 1000 bytes
startpos = max(stdout_size - 1000, 0)
resultsock = receptor_ctl.get_work_results(self.unit_id, startpos=startpos, return_sockfile=True)
lines = resultsock.readlines()
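The `startpos` arithmetic in the snippet can be checked in isolation; `tail_start` below is a hypothetical helper for illustration, not part of the PR:

```python
# Minimal restatement of the offset math above: for a large stdout,
# start reading 1000 bytes from the end; for anything shorter than
# 1000 bytes, start at 0 and read the whole thing.
def tail_start(stdout_size: int, tail_bytes: int = 1000) -> int:
    return max(stdout_size - tail_bytes, 0)
```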
Member

I'm worried about this hanging if the stream doesn't end with a \n. We may need to put the socket in non-blocking mode before doing this and just read the raw bytes.
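The suggestion can be sketched with a local socketpair standing in for the receptor results socket (the receptor side is not reproduced here): switch the socket to non-blocking mode and read raw bytes, so a stream with no trailing `\n` (and no EOF yet) cannot hang a line-oriented reader.

```python
import select
import socket

# A local socketpair stands in for the receptor results socket.
a, b = socket.socketpair()
a.sendall(b"partial output with no newline")

b.setblocking(False)
# wait until data is readable, then grab whatever bytes are buffered
select.select([b], [], [], 1.0)
chunk = b.recv(4096)

a.close()
b.close()
```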

Member Author

I did some local testing, and readlines() seems to be okay with files that don't end in \n. If it gets an EOF it just stops there.

We could also just do read() or read(1000).

As for non-blocking, I added resultsock.setblocking(False). It's hard to test this exactly, but I will make sure this code path doesn't throw errors.
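The readlines-at-EOF behavior described above can be checked with a quick self-contained snippet, again using a local socketpair in place of the receptor socket: readlines() stops cleanly at EOF even when the data does not end with `\n`.

```python
import socket

a, b = socket.socketpair()
a.sendall(b"line one\nline two without newline")
a.close()  # closing the writer gives the reader an EOF

f = b.makefile("rb")
lines = f.readlines()
f.close()
b.close()
```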

@AlanCoding (Member) left a comment

@shanemcd's comment may be a valid concern, but this doesn't seem to make it any worse than it was before in that respect, and should prevent some other disaster cases.

@fosterseth force-pushed the fix_results_traceback branch 2 times, most recently from dd8458a to f6cffa7 on September 28, 2022 at 04:51
# contain useful information about why the job failed. In case stdout is
# massive, only ask for last 1000 bytes
startpos = max(stdout_size - 1000, 0)
resultsock, resultfile = receptor_ctl.get_work_results(self.unit_id, startpos=startpos, return_socket=True, return_sockfile=True)
Member

What are the version compatibility issues with this? Does this require an updated receptor, and does that make it unreasonable to backport? Is there any plan for a different fix for backports?

Member Author

The receptor change is minimal, so backporting shouldn't be too hard.

A more minimal version of this that doesn't require a receptor change is to skip get_work_results entirely if stdout size > 1000.

Member

I see, so we could merge that and develop that solution you mention and test it separately. Fine by me.

@AlanCoding (Member)

I really want to get this merged!

If a job fails, we do receptor work results and put that output
into result_traceback.

We should only do this if
1. Receptor unit has failed
2. Runner callback processed 0 events

Otherwise we risk putting too much data into this field.