Commit
Add Heartbeat Thread to SmartProxy Worker
Long-running Smartstate jobs were being killed on the mistaken assumption that they were unresponsive when they were in fact busy. To fix this, a separate thread is added to the SmartProxy Worker whose only job is to heartbeat every 30 seconds. Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1519538
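The idea in the commit message can be sketched in isolation: a background thread keeps reporting liveness while the main thread is busy, and a flag tells it when to stop. This is a minimal sketch, not the ManageIQ implementation; `HeartbeatDemo`, `beats`, `start`, and `stop` are hypothetical names, and the interval is shortened from 30 seconds so the example finishes quickly.

```ruby
# Hypothetical stand-in for the worker; not the ManageIQ class.
class HeartbeatDemo
  attr_reader :beats

  def initialize(interval)
    @interval = interval       # seconds between heartbeats (the commit uses 30)
    @beats = 0
    @exit_requested = false
  end

  # Start the background thread that heartbeats until asked to stop.
  def start
    @thread = Thread.new do
      until @exit_requested
        @beats += 1            # stands in for the worker's real heartbeat call
        sleep @interval
      end
    end
  end

  # Request shutdown, then wait up to `timeout` seconds for the thread to finish.
  def stop(timeout)
    @exit_requested = true
    @thread&.join(timeout)     # join returns nil if the timeout expires first
  end
end

demo = HeartbeatDemo.new(0.01)
demo.start
sleep 0.1                      # simulate a long, busy job on the main thread
demo.stop(1)
puts demo.beats                # several heartbeats were recorded while "busy"
```

Because the loop only checks the flag between sleeps, shutdown can take up to one full interval, which is presumably why the commit joins the thread with a configurable timeout.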
Showing 2 changed files with 68 additions and 1 deletion.
    @@ -1,3 +1,69 @@
    class MiqSmartProxyWorker::Runner < MiqQueueWorkerBase::Runner
      self.delay_startup_for_vim_broker = true # NOTE: For smartproxy role

      def do_before_work_loop
        @tid = start_heartbeat_thread
      end

      def before_exit(message, _exit_code)
        @exit_requested = true
        #
        # Stop the Heartbeat Thread
        #
        safe_log("#{message} Stopping Heartbeat Thread.")

        #
        # Wait for the Heartbeat Thread to stop
        #
        unless @tid.nil?
          safe_log("#{message} Waiting for Heartbeat Thread to Stop.")
          @tid.join(worker_settings[:heartbeat_thread_shutdown_timeout]) rescue nil
        end
      end

      def start_heartbeat_thread
        @exit_requested = false
        @heartbeat_started = Concurrent::Event.new
        _log.info("#{log_prefix} Starting Heartbeat Thread")

        tid = Thread.new do
          begin
            heartbeat_thread
          rescue => err
            _log.error("#{log_prefix} Heartbeat Thread aborted because [#{err.message}]")
            _log.log_backtrace(err)
            Thread.exit
          ensure
            @heartbeat_started.set
          end
        end

        @heartbeat_started.wait
        _log.info("#{log_prefix} Started Heartbeat Thread")

        tid
      end

      def heartbeat_thread
        @heartbeat_started.set
        until @exit_requested do
          heartbeat
          sleep 30
        end
      end

      def do_work
        if @tid.nil? || !@tid.alive?
          if !@tid.nil? && @tid.status.nil?
            dead_tid, @tid = @tid, nil
            _log.info("#{log_prefix} Waiting for the Heartbeat Thread to exit...")
            dead_tid.join # raise the exception the dead thread failed with
          end

          _log.info("#{log_prefix} Heartbeat Thread gone. Restarting...")
          @tid = start_heartbeat_thread
        end

        super
      end
    end
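The restart check in `do_work` relies on how Ruby reports a finished thread: `Thread#status` is `false` after a normal exit and `nil` after an unhandled exception, and `Thread#join` re-raises that exception in the caller. A minimal standalone sketch of those rules follows; the thread bodies and the names `clean`, `failed`, and `rejoined` are illustrative and not taken from the commit.

```ruby
Thread.report_on_exception = false  # keep the demo output quiet

clean  = Thread.new { :done }           # finishes normally
failed = Thread.new { raise "boom" }    # dies with an unhandled exception

clean.join

# Joining a thread that died with an exception re-raises it in the caller.
rejoined = nil
begin
  failed.join
rescue RuntimeError => e
  rejoined = e.message
end

puts clean.alive?          # false: the thread has finished
puts clean.status.inspect  # false: terminated normally
puts failed.status.inspect # nil: terminated by an exception
puts rejoined              # the message carried by the re-raised exception
```

Note that the heartbeat thread in the diff rescues `StandardError` itself and then calls `Thread.exit`, so that path appears to end with a `false` status; the `status.nil?` branch would then apply only to exceptions that escape that rescue.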