[V2V] Skip the infra conversion job if it's already in a running state #19255

djberg96 · 2019-09-04T13:53:04Z

There appears to be a race condition where a job can already be in a running state by the time we try to start it. The result is that we see this in the log:

start is not permitted at state running

This PR just skips the job if it's already in a running state. I've also added the job ID to the error message for easier debugging.

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1724040
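A minimal sketch of the skip described above. It assumes a throttler loop that walks a list of job objects and signals :start on each one. FakeJob and start_pending_jobs are hypothetical stand-ins, not the ManageIQ classes. The error text only imitates the log line quoted above, with the job ID appended.

```ruby
# Hypothetical stand-in for a conversion job; NOT the ManageIQ Job class.
class FakeJob
  attr_reader :id, :state

  def initialize(id, state)
    @id = id
    @state = state
  end

  # Mimics a state-machine signal: :start is only allowed from waiting_to_start.
  # The job ID is included in the message to make log lines traceable.
  def signal(event)
    unless event == :start && state == "waiting_to_start"
      raise "#{event} is not permitted at state #{state} (job #{id})"
    end

    @state = "running"
  end
end

# Hypothetical throttler loop: skip jobs that are already running instead of
# signaling them and hitting the "not permitted" error.
def start_pending_jobs(jobs)
  started = []
  jobs.each do |job|
    next if job.state == "running"

    job.signal(:start)
    started << job.id
  end
  started
end
```

With one waiting job (ID 1) and one running job (ID 2), start_pending_jobs returns [1] and never signals job 2. The blank line after the `next if` guard is the layout that Rubocop's Layout/EmptyLineAfterGuardClause cop asks for.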

djberg96 · 2019-09-04T13:55:35Z

@fdupont-redhat What do you think?

miq-bot · 2019-09-04T14:02:10Z

Checked commits https://github.com/djberg96/manageiq/compare/f130bcc4736e06773986237364e33c33df37e12e~...b7769b465690ee6a232719c606213aae043ada0b with ruby 2.4.6, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0
2 files checked, 1 offense detected

lib/infra_conversion_throttler.rb

❗ - Line 14, Col 9 - Layout/EmptyLineAfterGuardClause - Add empty line after guard clause.

ghost · 2019-09-04T14:30:29Z

@djberg96 why not signal the job synchronously?

job.signal(:start)

djberg96 · 2019-09-04T16:04:02Z

@fdupont-redhat wouldn't that significantly slow down the process?

ghost · 2019-09-04T16:17:18Z

@djberg96 not sure. It only synchronously executes the InfraConversionJob.start method, which then queues the next transition. That method makes only two other calls, but they update the DB, so it might be costly :)
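To make the trade-off concrete, here is a hypothetical model of the two styles. signal performs the transition immediately. queue_signal only enqueues a message that a worker runs later. SketchJob and the plain array used as a queue are illustrative assumptions. In the real system the queued work is picked up by a queue worker, and the transition writes to the DB.

```ruby
# Hypothetical model of synchronous vs queued signalling; not the ManageIQ API.
class SketchJob
  attr_reader :state

  def initialize(queue)
    @queue = queue
    @state = "waiting_to_start"
  end

  # Synchronous: the transition happens now, in the caller's process.
  def signal(event)
    raise "#{event} is not permitted at state #{@state}" unless event == :start && @state == "waiting_to_start"

    @state = "running"
  end

  # Asynchronous: only a message is enqueued; state is unchanged until a worker runs it.
  def queue_signal(event)
    @queue << -> { signal(event) }
  end
end
```

After queue_signal(:start), the job still reports waiting_to_start until the queued lambda runs. After signal(:start), it reports running at once, at the cost of doing the start work inline.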

djberg96 · 2019-09-04T18:42:36Z

@fdupont-redhat I had Ilanit restart an appliance and run a migration of 20 VMs using my change, and there were none of those errors.

djberg96 · 2019-09-10T13:25:54Z

@miq-bot add_reviewer @agrare

agrare · 2019-09-12T12:32:53Z

@djberg96 is this a race condition because some other process is starting these conversion jobs at the same time as this is, or because this started some but they weren't marked as started yet?

djberg96 · 2019-11-11T13:34:37Z

@agrare I'm afraid I never found out before switching over to the Platform team. @fdupont-redhat did you ever dig into this by chance?

@fdupont-redhat should I close this and let you take it?

ghost · 2019-11-12T08:02:15Z

@agrare, my preferred hypothesis is that it is caused by the asynchronous queue_signal. It allows InfraConversionThrottler.pending_conversion_jobs to retrieve jobs that are transitioning, i.e. still in the waiting_to_start state in the DB, even though the signal has been queued and will be honored before the next call to queue_signal(:start).
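The hypothesis can be reproduced in a toy simulation. It assumes two throttler passes run before the worker processes the first queued :start. RaceJob, pending_jobs and throttle are hypothetical stand-ins for the job, InfraConversionThrottler.pending_conversion_jobs and the throttler loop.

```ruby
# Hypothetical simulation of the suspected race; not ManageIQ code.
class RaceJob
  attr_reader :id, :state

  def initialize(id)
    @id = id
    @state = "waiting_to_start"
  end

  def signal(event)
    raise "#{event} is not permitted at state #{state}" unless state == "waiting_to_start"

    @state = "running"
  end
end

# Stand-in for pending_conversion_jobs: selects jobs still waiting to start.
def pending_jobs(jobs)
  jobs.select { |j| j.state == "waiting_to_start" }
end

# One throttler pass: queue a :start for every pending job (asynchronously).
def throttle(jobs, queue)
  pending_jobs(jobs).each { |job| queue << [job, :start] }
end

jobs  = [RaceJob.new(1)]
queue = []
throttle(jobs, queue) # first pass queues :start
throttle(jobs, queue) # second pass runs before the worker; job still looks pending

# The worker now drains the queue in order.
errors = []
queue.each do |job, event|
  begin
    job.signal(event)
  rescue RuntimeError => e
    errors << e.message
  end
end
```

Both passes select the same job, so the queue holds two :start messages. The second one fails with the same "start is not permitted at state running" message seen in the log.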

agrare · 2019-11-20T18:02:57Z

But without a .reload it will use the cached job so I doubt this would fix it if a job changed between the query and that main loop.
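A sketch of the point about caching. It assumes a model that reads its attributes once when loaded, as an ActiveRecord object does until reload is called. CachedJob and the DB hash are hypothetical. The state check only sees the other process's change after an explicit reload.

```ruby
# Hypothetical in-memory "table" standing in for the jobs table.
DB = { 1 => "waiting_to_start" }

# Minimal stand-in for a model: attributes are read once and then cached.
class CachedJob
  attr_reader :id, :state

  def initialize(id)
    @id = id
    reload
  end

  # Re-read the row, analogous to ActiveRecord's reload.
  def reload
    @state = DB.fetch(id)
    self
  end
end

job = CachedJob.new(1)   # loaded by the pending-jobs query
DB[1] = "running"        # another process starts the job in the meantime
stale = job.state        # cached value, still "waiting_to_start"
fresh = job.reload.state # re-read value, "running"
```

Even with a reload, the check and the subsequent signal are separate steps. A reload therefore narrows the window for the race rather than closing it.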

djberg96 · 2019-12-05T17:53:31Z

@agrare @fdupont-redhat What's your suggestion then for this? Add an explicit reload somewhere? Or should I just close this?

agrare · 2019-12-05T18:15:06Z

I think we need to find out how this is happening, the throttler should only be running on one appliance (right?) so jobs shouldn't be getting started out from under us.

djberg96 · 2019-12-19T13:39:57Z

Going to close this since I'm not doing V2V any more and Fabien is better equipped to figure this one out, and it's his BZ now.

djberg96 added 2 commits September 4, 2019 09:24

Skip the infra conversion job if it's already in a running state.

f130bcc

Add job ID to the signal error message.

b7769b4

ghost approved these changes Sep 4, 2019


miq-bot requested a review from agrare September 10, 2019 13:28

djberg96 closed this Dec 19, 2019
