Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[V2V] Skip the infra conversion job if it's already in a running state #19255

Closed
wants to merge 2 commits into from
Closed

[V2V] Skip the infra conversion job if it's already in a running state #19255

wants to merge 2 commits into from

Conversation

djberg96
Copy link
Contributor

@djberg96 djberg96 commented Sep 4, 2019

There appears to be a race condition where a job can already be in a running state by the time we try to start it. The result is that we see this in the log:

start is not permitted at state running

This PR just skips the job if it's already in a running state. I've also added the job ID to the error message for easier debugging.

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1724040

@djberg96
Copy link
Contributor Author

djberg96 commented Sep 4, 2019

@fdupont-redhat What do you think?

@miq-bot
Copy link
Member

miq-bot commented Sep 4, 2019

Checked commits https://github.com/djberg96/manageiq/compare/f130bcc4736e06773986237364e33c33df37e12e~...b7769b465690ee6a232719c606213aae043ada0b with ruby 2.4.6, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0
2 files checked, 1 offense detected

lib/infra_conversion_throttler.rb

@ghost
Copy link

ghost commented Sep 4, 2019

@djberg96 why not synchronously signaling the job?

job.signal(:start)

@djberg96
Copy link
Contributor Author

djberg96 commented Sep 4, 2019

@fdupont-redhat wouldn't that significantly slow down the process?

@ghost
Copy link

ghost commented Sep 4, 2019

@djberg96 not sure. It only synchronously executes InfraConversionJob.start method, that then queues the next transition. And that method only does 2 others calls, but they update the DB, so it might be costly :)

@djberg96
Copy link
Contributor Author

djberg96 commented Sep 4, 2019

@fdupont-redhat I had Ilanit restart an appliance and run a migration of 20 VM's using my change, and there were none of those errors.

@djberg96
Copy link
Contributor Author

@miq-bot add_reviewer @agrare

@miq-bot miq-bot requested a review from agrare September 10, 2019 13:28
@agrare
Copy link
Member

agrare commented Sep 12, 2019

@djberg96 is this a race condition because some other process is starting these conversion jobs at the same time as this is, or because this started some but they weren't marked as started yet?

@djberg96
Copy link
Contributor Author

@agrare I'm afraid I never found out before switching over to the Platform team. @fdupont-redhat did you ever dig into this by chance?

@fdupont-redhat should I close this and let you take it?

@ghost
Copy link

ghost commented Nov 12, 2019

@agrare, my preferred assumption is that it is caused by async queue_signal that allows the InfraConversionThrottler.pending_conversion_jobs to retrieve jobs that are transitioning, i.e. still in waiting_to_start state in the DB, but the signal has been queued and will be honored before the next call to queue_signal(:start).

@agrare
Copy link
Member

agrare commented Nov 20, 2019

But without a .reload it will use the cached job so I doubt this would fix it if a job changed between the query and that main loop.

@djberg96
Copy link
Contributor Author

djberg96 commented Dec 5, 2019

@agrare @fdupont-redhat What's your suggestion then for this? Add an explicit reload somewhere? Or should I just close this?

@agrare
Copy link
Member

agrare commented Dec 5, 2019

I think we need to find out how this is happening, the throttler should only be running on one appliance (right?) so jobs shouldn't be getting started out from under us.

@djberg96
Copy link
Contributor Author

Going to close this since I'm not doing V2V any more and Fabien is better equipped to figure this one out, and it's his BZ now.

@djberg96 djberg96 closed this Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants