Add missing trigger for failed-to-start nodes #13802

Merged
merged 1 commit into from Jul 24, 2023

AlanCoding
Member

SUMMARY

Connect #2766

For the same scenario, the numbers with this patch are:

Started 4/5/2023, 8:40:01 AM
Finished 4/5/2023, 8:40:02 AM

So this is 1 second, compared to 50 seconds before the patch.

Looking at the code, I tried to pin down exactly which case is missing the trigger. I believe it is the scenario where the start checks fail. In that case we are not waiting for a job to finish (the job never starts in the first place), so unless we re-schedule right away, the workflow sits idle until the workflow manager's scheduler timer runs out. That accounts for the 50 seconds we were hitting before.
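To make the missing case concrete, here is a minimal sketch of the idea, not the actual patch. `Scheduler`, `Node`, and `spawn_node` are hypothetical stand-ins invented for illustration; they only model the rule that a node which fails its start checks must request a scheduler run itself, because no finishing job will do it on its behalf.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Hypothetical stand-in for the workflow manager scheduler."""
    requests: list = field(default_factory=list)

    def schedule(self, reason: str) -> None:
        # Record a request for an immediate workflow manager run.
        self.requests.append(reason)


@dataclass
class Node:
    """Hypothetical workflow node with a precomputed start-check result."""
    name: str
    passes_start_checks: bool
    status: str = "pending"


def spawn_node(node: Node, scheduler: Scheduler) -> bool:
    """Try to start a node; re-schedule right away if it cannot start."""
    if node.passes_start_checks:
        node.status = "running"
        # Assumption: a running job triggers the scheduler when it finishes,
        # so no extra request is needed here.
        return True
    node.status = "failed"
    # The case described above: no job will ever finish, so without this
    # request nothing wakes the scheduler before its periodic timer.
    scheduler.schedule(f"{node.name} failed to start")
    return False


scheduler = Scheduler()
started = spawn_node(Node("node-a", passes_start_checks=False), scheduler)
print(started, scheduler.requests)
```

In this model a node that passes its start checks adds nothing to `scheduler.requests`; only the failed-to-start path does.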

This is a very simple patch, and I don't see any risk of over-scheduling. Spawning a node and having it fail to start is a processing action that corresponds to the completion of that node. In general we do need to worry about infinite scheduling loops, but as long as each scheduling request corresponds to a tangible, finite form of progress in processing jobs, that shouldn't happen.
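The termination argument can be checked with a toy model. This is a sketch under simplifying assumptions, not the real scheduler: nodes are processed one at a time, and a scheduler run is requested only when a node reaches a final state. The function name `count_scheduler_runs` and the outcome labels are hypothetical.

```python
def count_scheduler_runs(node_outcomes: list[str]) -> int:
    """Count scheduler runs when each run is tied to one node finishing.

    Assumed model: a run is requested only when a node reaches a final
    state, whether it ran to completion or failed to start.
    """
    remaining = list(node_outcomes)
    pending_requests = 1  # the initial launch of the workflow
    runs = 0
    while pending_requests > 0:
        pending_requests -= 1
        runs += 1
        if remaining:
            remaining.pop(0)       # one node reaches a final state
            pending_requests += 1  # its completion requests the next run
    return runs


# Three nodes that all fail to start: one run per node, plus a final
# run that finds nothing left to do and requests nothing further.
print(count_scheduler_runs(["failed_to_start"] * 3))
```

Because every request consumes one node from a finite list, the number of runs in this model is bounded by the node count plus one, so the loop cannot go on forever.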

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME
  • API

@AlanCoding AlanCoding merged commit 98bfe3f into ansible:devel Jul 24, 2023
14 checks passed