Add Heartbeat Thread to SmartProxy Worker
Long-running Smartstate jobs were being killed under the mistaken assumption
that the worker was unresponsive when it was actually busy. To fix this, add a
separate thread to the SmartProxy Worker that does nothing but heartbeat
every 30 seconds.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1519538
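
The idea in the commit message can be sketched in plain Ruby as follows. This is a minimal illustration, not the ManageIQ code: `SketchWorker`, `beats`, `long_running_job`, and the shortened interval are all hypothetical names and values, and `heartbeat` here only counts calls instead of reporting to a server.

```ruby
# Hypothetical stand-in for the worker; none of these names come from ManageIQ.
class SketchWorker
  attr_reader :beats

  def initialize(interval)
    @interval = interval       # seconds between heartbeats (30 in the commit)
    @beats = 0
    @exit_requested = false
  end

  # Stand-in for the real heartbeat call: just count invocations.
  def heartbeat
    @beats += 1
  end

  # Heartbeat from a separate thread so the main thread is free to stay busy.
  def start_heartbeat_thread
    @thread = Thread.new do
      until @exit_requested
        heartbeat
        sleep @interval
      end
    end
  end

  # Ask the loop to stop, then wait up to `timeout` seconds for it.
  def stop_heartbeat_thread(timeout)
    @exit_requested = true
    @thread.join(timeout)
  end

  # Simulates a long job that keeps the main thread occupied.
  def long_running_job(seconds)
    sleep seconds
  end
end

worker = SketchWorker.new(0.05)
worker.start_heartbeat_thread
worker.long_running_job(0.3)
worker.stop_heartbeat_thread(1)
puts "heartbeats during job: #{worker.beats}"
```

The main thread never calls `heartbeat` itself in this sketch, yet heartbeats keep arriving while the job runs, which is the behavior the commit is after.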
jerryk55 committed Dec 21, 2017
1 parent bb9f922 commit 906ed99
Showing 2 changed files with 68 additions and 1 deletion.
66 changes: 66 additions & 0 deletions app/models/miq_smart_proxy_worker/runner.rb
@@ -1,3 +1,69 @@
class MiqSmartProxyWorker::Runner < MiqQueueWorkerBase::Runner
  self.delay_startup_for_vim_broker = true # NOTE: For smartproxy role

  def do_before_work_loop
    @tid = start_heartbeat_thread
  end

  def before_exit(message, _exit_code)
    @exit_requested = true
    #
    # Stop the Heartbeat Thread
    #
    safe_log("#{message} Stopping Heartbeat Thread.")

    #
    # Wait for the Heartbeat Thread to stop
    #
    unless @tid.nil?
      safe_log("#{message} Waiting for Heartbeat Thread to Stop.")
      @tid.join(worker_settings[:heartbeat_thread_shutdown_timeout]) rescue nil
    end
  end

  def start_heartbeat_thread
    @exit_requested = false
    @heartbeat_started = Concurrent::Event.new
    _log.info("#{log_prefix} Starting Heartbeat Thread")

    tid = Thread.new do
      begin
        heartbeat_thread
      rescue => err
        _log.error("#{log_prefix} Heartbeat Thread aborted because [#{err.message}]")
        _log.log_backtrace(err)
        Thread.exit
      ensure
        @heartbeat_started.set
      end
    end

    @heartbeat_started.wait
    _log.info("#{log_prefix} Started Heartbeat Thread")

    tid
  end

  def heartbeat_thread
    @heartbeat_started.set
    until @exit_requested do
      heartbeat
      sleep 30
    end
  end

  def do_work
    if @tid.nil? || !@tid.alive?
      if !@tid.nil? && @tid.status.nil?
        dead_tid, @tid = @tid, nil
        _log.info("#{log_prefix} Waiting for the Heartbeat Thread to exit...")
        dead_tid.join # raise the exception the dead thread failed with
      end

      _log.info("#{log_prefix} Heartbeat Thread gone. Restarting...")
      @tid = start_heartbeat_thread
    end

    super
  end
end
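
The restart check in `do_work` relies on how Ruby reports a finished thread: `Thread#status` is `false` after a normal exit and `nil` after an exit caused by an exception, and `Thread#join` re-raises that exception in the caller. The standalone sketch below shows only those stdlib behaviors; the variable names are illustrative. Note that in the diff above a `StandardError` is rescued and followed by `Thread.exit`, which counts as a normal exit, so the `status.nil?` branch would apply only to exceptions that `rescue => err` does not catch.

```ruby
# Thread that finishes normally.
clean = Thread.new { :done }

# Thread that dies with an exception; silence the default stderr report.
failed = Thread.new do
  Thread.current.report_on_exception = false
  raise "boom"
end

clean.join

reraised = nil
begin
  failed.join # join re-raises the thread's exception in the caller
rescue RuntimeError => err
  reraised = err.message
end

clean_status  = clean.status  # false: terminated normally
failed_status = failed.status # nil: terminated with an exception

puts "clean:  alive=#{clean.alive?} status=#{clean_status.inspect}"
puts "failed: alive=#{failed.alive?} status=#{failed_status.inspect}"
puts "re-raised: #{reraised}"
```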
3 changes: 2 additions & 1 deletion config/settings.yml
@@ -972,7 +972,7 @@
:prefetch_stale_threshold: 30.seconds
:rails_server: puma
:remote_console_type: VMRC
- :role: database_operations,event,reporting,scheduler,smartstate,ems_operations,ems_inventory,user_interface,websocket,web_services,automate
+ :role: database_operations,event,reporting,scheduler,smartstate,ems_operations,ems_inventory,user_interface,websocket,web_services,automate,smartproxy
:server_dequeue_frequency: 5.seconds
:server_log_frequency: 5.minutes
:server_log_timings_threshold: 1.second
@@ -1165,6 +1165,7 @@
:memory_threshold: 2.gigabytes
:queue_timeout: 20.minutes
:restart_interval: 6.hours
+ :heartbeat_thread_shutdown_timeout: 10.seconds
:schedule_worker:
:container_entities_purge_interval: 1.day
:binary_blob_purge_interval: 1.hour
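
One consequence of pairing a 10-second shutdown timeout with a loop that sleeps 30 seconds per cycle: `Thread#join(limit)` returns `nil` when the limit expires before the thread finishes, so the wait in `before_exit` may give up while the heartbeat thread is still asleep. The sketch below scales the durations down (0.3 s sleep, 0.1 s timeout) to show that stdlib behavior; the variable names and numbers are illustrative and not taken from the commit.

```ruby
exit_requested = false

# Scaled-down stand-in for the heartbeat loop: 0.3 s per cycle
# (the commit sleeps 30 s per cycle).
thread = Thread.new do
  loop do
    sleep 0.3
    break if exit_requested
  end
end

sleep 0.05 # give the thread time to enter its sleep
exit_requested = true

# Timeout shorter than one sleep cycle: join gives up and returns nil.
timed_out = thread.join(0.1).nil?

# A generous timeout lets the thread wake, see the flag, and finish.
finished = !thread.join(2).nil?

puts "first join timed out: #{timed_out}"
puts "second join finished: #{finished}"
```

A timed-out `join` does not stop the thread; it only stops waiting, which is presumably acceptable here because the worker process is exiting anyway.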