Add try/except on release of work Unit and add force to workunit reaper #15129
base: devel
Hi @fosterseth,
@fosterseth with an external and unstable execution node (I rebooted it multiple times) this happened one more time, and with my last commit I should cover all cases.
So this case happens when the work unit does not exist on the remote execution node. I think receptor should handle this more gracefully: when it confirms that the work unit doesn't exist on the remote node, why not just terminate the local work unit? @tanganellilore let's bring this to ansible/receptor?
Or maybe just do this for a specific exception on the AWX side? Doing this on every exception might have too large a blast radius.
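A rough sketch of what the narrower option could look like, assuming the failure surfaces as a RuntimeError whose message contains "unknown work unit". Both the exception type and the message text are assumptions made for illustration, and `release_or_ignore_missing` is a hypothetical helper, not existing AWX code:

```python
def is_unknown_unit_error(exc):
    # Assumption: the error message identifies a missing work unit this way.
    return "unknown work unit" in str(exc).lower()


def release_or_ignore_missing(release, unit_id):
    """Call release(unit_id); swallow only the 'unit does not exist' failure.

    Returns True if the release succeeded, False if the unit was already gone.
    Any other error is re-raised so that it is not silently hidden.
    """
    try:
        release(unit_id)
    except RuntimeError as exc:
        if not is_unknown_unit_error(exc):
            raise
        return False
    return True
```

Matching on the message keeps the blast radius small, at the cost of depending on the exact wording of the error.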
I think that even if this type of release is managed by receptor, it still needs to be handled in some way on the AWX side. I understood from @fosterseth that this behavior is intentional in receptor, which raises an error when the work unit does not exist, even with force-release. From my tests, I hit the exception only when there are issues between AWX and the execution node and/or the execution node is rebooted ungracefully.
SUMMARY
When there is a problem between an execution node and AWX, and AWX does not detect that the execution node is unhealthy or unreachable, or the work unit has simply been deleted (I could not identify the exact use case, but it happened to me on 24.2.0 with an execution node and ansible runner 1.4.3), the workflow keeps waiting in the running state.
If we try to cancel the job/workflow via the UI, we receive the error below on the awx-task pod and the job is never cancelled/stopped.
In this PR I simply wrap the for loop in a try/except and delegate the release to the work unit reaper, where I use the force-release command instead of a plain release.
I think that we need to force the release inside the for loop, because administrative_workunit_reaper checks many things on the work unit side, which does not make much sense to me because we already filter by ACTIVE_STATES in the UnifiedJob filter.
If this is true, I can change it by adding a force-release command on exception, so that we are sure the work units will be released when cancel is clicked in the UI.
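A minimal sketch of the alternative described above (force-release on exception inside the loop), not the actual diff in this PR. `FakeReceptorControl` is a hypothetical stand-in used only so the example runs on its own; the command strings and the `release_work_units` helper are assumptions for illustration:

```python
import logging

logger = logging.getLogger(__name__)


class FakeReceptorControl:
    """Hypothetical stand-in for a receptor control client (illustration only)."""

    def __init__(self, missing_units=(), unreleasable_units=()):
        self.missing_units = set(missing_units)            # plain release fails
        self.unreleasable_units = set(unreleasable_units)  # force-release fails too
        self.commands = []

    def simple_command(self, command):
        self.commands.append(command)
        unit_id = command.split()[-1]
        if unit_id in self.unreleasable_units:
            raise RuntimeError(f"unknown work unit {unit_id}")
        if command.startswith("work release") and unit_id in self.missing_units:
            raise RuntimeError(f"unknown work unit {unit_id}")
        return {"released": unit_id}


def release_work_units(receptor_ctl, unit_ids):
    """Release every unit; on failure retry with force-release and keep looping.

    Returns the unit ids that could not be released at all, so that a later
    cleanup pass (for example a reaper) can deal with them.
    """
    failed = []
    for unit_id in unit_ids:
        try:
            receptor_ctl.simple_command(f"work release {unit_id}")
        except Exception:
            logger.exception("release of %s failed, trying force-release", unit_id)
            try:
                receptor_ctl.simple_command(f"work force-release {unit_id}")
            except Exception:
                logger.exception("force-release of %s failed", unit_id)
                failed.append(unit_id)
    return failed
```

The point of the try/except is that one failing work unit no longer aborts the whole loop, so the remaining units are still released when the job is cancelled.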
ISSUE TYPE
COMPONENT NAME
AWX VERSION
ADDITIONAL INFORMATION