[V2V] Skip the infra conversion job if it's already in a running state #19255
@fdupont-redhat What do you think?
Checked commits https://github.com/djberg96/manageiq/compare/f130bcc4736e06773986237364e33c33df37e12e~...b7769b465690ee6a232719c606213aae043ada0b with ruby 2.4.6, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0 lib/infra_conversion_throttler.rb
@djberg96 why not signal the job synchronously?
@fdupont-redhat wouldn't that significantly slow down the process?
@djberg96 not sure. It only synchronously executes the InfraConversionJob.start method, which then queues the next transition. That method only makes 2 other calls, but they update the DB, so it might be costly :)
@fdupont-redhat I had Ilanit restart an appliance and run a migration of 20 VMs using my change, and none of those errors appeared.
@djberg96 is this a race condition because some other process is starting these conversion jobs at the same time as this is, or because this started some but they weren't marked as started yet?
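To make the second possibility in that question concrete, here is a minimal plain-Ruby sketch of a job being signaled twice because its state has not been updated yet. Everything in it is hypothetical: `SketchJob`, `simulate_double_signal`, and the `"waiting_to_start"` state name are stand-ins for illustration, not the actual ManageIQ classes, and the thread never confirmed that this is the real cause.

```ruby
# Hypothetical stand-in for a conversion job; NOT the real InfraConversionJob.
class SketchJob
  attr_reader :id, :state

  def initialize(id)
    @id = id
    @state = "waiting_to_start" # assumed initial state name
  end

  # Only allowed from the initial state, mirroring the logged error.
  def start
    raise "start is not permitted at state #{state}" unless state == "waiting_to_start"
    @state = "running"
  end
end

# Simulates two throttler passes that both queue a start signal before
# the first signal has been delivered, then delivers both signals.
# Returns the error messages collected during delivery.
def simulate_double_signal(job)
  queue = []
  # Both passes see the job as not started, so it is queued twice.
  2.times { queue << job if job.state == "waiting_to_start" }

  errors = []
  queue.each do |queued_job|
    begin
      queued_job.start
    rescue RuntimeError => e
      errors << "Job #{queued_job.id}: #{e.message}"
    end
  end
  errors
end
```

In this sketch the first delivery succeeds and the second raises, which reproduces the shape of the logged message without any second process being involved.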
@agrare I'm afraid I never found out before switching over to the Platform team. @fdupont-redhat did you ever dig into this by chance? @fdupont-redhat should I close this and let you take it?
@agrare, my preferred assumption is that it is caused by async |
But without a
@agrare @fdupont-redhat What's your suggestion for this, then? Add an explicit reload somewhere? Or should I just close this?
I think we need to find out how this is happening. The throttler should only be running on one appliance (right?), so jobs shouldn't be getting started out from under us.
Going to close this since I'm not doing V2V any more and Fabien is better equipped to figure this one out, and it's his BZ now. |
There appears to be a race condition where a job can already be in a running state by the time we try to start it. The result is that we see this in the log:
start is not permitted at state running
This PR just skips the job if it's already in a running state. I've also added the job ID to the error message for easier debugging.
BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1724040
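The approach described above can be sketched roughly as follows. This is not the code from the PR; `FakeJob`, `start_pending`, the state names, and the message formats are all assumptions made for illustration. It only shows the two ideas from the description: skip a job that is already running, and include the job ID in any error message.

```ruby
# Hypothetical stand-in for a conversion job; NOT the real InfraConversionJob.
FakeJob = Struct.new(:id, :state) do
  def start
    # Assumed rule: a job may only be started from its initial state.
    raise "start is not permitted at state #{state}" unless state == "waiting_to_start"
    self.state = "running"
  end
end

# Tries to start each job, skipping any that are already running.
# Returns an array of log lines, each prefixed with the job ID.
def start_pending(jobs)
  log = []
  jobs.each do |job|
    if job.state == "running"
      log << "Job #{job.id}: already running, skipped"
      next
    end

    begin
      job.start
      log << "Job #{job.id}: started"
    rescue StandardError => e
      # Including the ID makes the failing job easy to find.
      log << "Job #{job.id}: #{e.message}"
    end
  end
  log
end
```

Note that a check like this hides the symptom rather than explaining it, which is the concern raised in the thread: a job in any other unexpected state would still hit the rescue branch.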