Fix race condition between job cancelling and activating #6521
Comments
While writing this analysis, I changed my answer a couple of times, which shows how difficult the situation was to determine. Some parts of how job timeouts and activations work could definitely be improved. We no longer need the clean-up for 2 reasons:
The indirection I experienced while looking into this comes from the clean-up. It feels like a workaround for the 'normal' operation making it possible to leave orphaned deadlines behind in the first place. Consider again the race condition: a
We should probably still do what was proposed earlier:
I also think we should write a test to make sure we don't have this race condition, and then remove the clean-up logic to reduce the indirection.
Hello, kindly consider #8949. We have hundreds of
I'm not 100% sure but it looks like we have seen this again on SaaS. We see several:
I think it can be reproduced with the recent game day. I think it is related to jobs that can't be activated, or are not sent to the client, because the payload is too large. That seems to cause these issues.
Might also be related to #12778.
It might be that this is also me, from a different cluster.
Closed as resolved by:
Description
This is a follow up to #2816.
The fix for #2816 was on the symptom level. It added some extra logic in DbJobState, including a clean-up state change in case the job could not be found:
As part of #6176 the clean-up was disabled to make the migration easier.
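To make the terms concrete, here is a minimal, self-contained sketch of what an orphaned deadline is and what a clean-up on "job not found" could look like. It is not the code from the fix for #2816 and it does not use Zeebe's DbJobState; the class `JobDeadlineSketch` and all of its methods are hypothetical, and the orphan is created artificially rather than through the actual race.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a job state with a deadline index. */
class JobDeadlineSketch {
  // Primary data: key of each activated job mapped to its deadline.
  private final Map<Long, Long> activatedJobs = new HashMap<>();
  // Secondary index: deadline mapped to the keys of the jobs that expire then.
  private final TreeMap<Long, List<Long>> deadlines = new TreeMap<>();

  void activate(long jobKey, long deadline) {
    activatedJobs.put(jobKey, deadline);
    deadlines.computeIfAbsent(deadline, d -> new ArrayList<>()).add(jobKey);
  }

  /** Assumption for illustration: removes the job but not its deadline entry. */
  void simulateOrphan(long jobKey) {
    activatedJobs.remove(jobKey);
  }

  /**
   * Returns the keys of jobs whose deadline is before {@code now}. Deadline
   * entries that point to a job that no longer exists are removed (the clean-up).
   */
  List<Long> findTimedOut(long now) {
    List<Long> timedOut = new ArrayList<>();
    List<long[]> orphans = new ArrayList<>();
    for (Map.Entry<Long, List<Long>> entry : deadlines.headMap(now, false).entrySet()) {
      for (long jobKey : entry.getValue()) {
        if (activatedJobs.containsKey(jobKey)) {
          timedOut.add(jobKey);
        } else {
          orphans.add(new long[] {entry.getKey(), jobKey});
        }
      }
    }
    // Remove orphans only after iterating, so the index is not modified mid-loop.
    for (long[] orphan : orphans) {
      removeDeadline(orphan[0], orphan[1]);
    }
    return timedOut;
  }

  int deadlineEntryCount() {
    return deadlines.values().stream().mapToInt(List::size).sum();
  }

  private void removeDeadline(long deadline, long jobKey) {
    List<Long> keys = deadlines.get(deadline);
    if (keys == null) {
      return;
    }
    keys.remove(Long.valueOf(jobKey));
    if (keys.isEmpty()) {
      deadlines.remove(deadline);
    }
  }

  public static void main(String[] args) {
    JobDeadlineSketch state = new JobDeadlineSketch();
    state.activate(1L, 100L);
    state.activate(2L, 100L);
    state.simulateOrphan(1L); // job 1 is gone, its deadline entry stays behind
    System.out.println(state.findTimedOut(200L));
    System.out.println(state.deadlineEntryCount());
  }
}
```

In this sketch the clean-up hides the orphan instead of preventing it, which is the kind of symptom-level handling the description refers to. A regression test for the race itself would instead assert that no orphaned entry exists after cancelling, without relying on the clean-up.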
At the end of the migration, we need to check the following: