Kill workers that don't stop after a configurable time #13805
@@ -99,6 +99,12 @@ def check_not_responding(class_name = nil)
    processed_workers.collect(&:id)
  end

  NOT_RESPONDING = :not_responding
  MEMORY_EXCEEDED = :memory_exceeded
carbonin commented Feb 8, 2017 (Member)
I feel like this information belongs somewhere else. Maybe MiqServer::WorkerManagement::Monitor::Reason? Then ideally callers of worker_set_monitor_reason will also use the same constants. Not sure if making that kind of change is in this PR's scope though.
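A rough sketch of what that suggestion could look like, for illustration only. The two constants and the name worker_set_monitor_reason come from this PR; MonitorReason, FakeServer, and monitor_reason_for are hypothetical stand-ins, not ManageIQ code.

```ruby
# Hypothetical sketch: one module owns the reason constants so that every
# caller of worker_set_monitor_reason can share them.
module MonitorReason
  NOT_RESPONDING  = :not_responding
  MEMORY_EXCEEDED = :memory_exceeded
end

# Stand-in for the server class that mixes the module in (assumption).
class FakeServer
  include MonitorReason

  def initialize
    @reasons = {}
  end

  # Simplified stand-in: records why a worker was flagged, keyed by pid.
  def worker_set_monitor_reason(pid, reason)
    @reasons[pid] = reason
  end

  # Hypothetical reader used only to inspect the recorded reason.
  def monitor_reason_for(pid)
    @reasons[pid]
  end
end

server = FakeServer.new
# Callers pass the shared constant instead of a bare symbol literal.
server.worker_set_monitor_reason(1234, FakeServer::MEMORY_EXCEEDED)
```

With the constants in one place, a typo in a reason name fails loudly as a NameError instead of silently creating a new symbol.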
Will merge after @jrafanie makes a couple of small changes.
Previously, we'd gracefully ask workers to exit. If the queue work a worker was doing took 1 hour, a worker that had exceeded memory thresholds would keep running until the work was done and only then respond to the exit request. Now, we mark workers as 'stopping' when they exceed a threshold, and they have up to 10 minutes to finish before we kill them. This value is configurable in the 'stopping_timeout' field in each worker's advanced settings. https://bugzilla.redhat.com/show_bug.cgi?id=1395736
Checked commits jrafanie/manageiq@e5f4bd3~...b60a5f0 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0 spec/models/miq_server/worker_management/monitor_spec.rb
Ok, I think I got your good suggestion in. Take another look @carbonin
Looks good!
cc @jcarter12 (this is the stopping workers PR)
Due to module inclusion spaghetti, it's easier and less confusing to reference the Reason constants consistently in the MiqServer class, which is the ultimate destination for all of these modules. Fixes ManageIQ#13901 introduced in ManageIQ#13805 https://bugzilla.redhat.com/show_bug.cgi?id=1395736
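To illustrate why qualifying the constants through the including class can be less confusing, here is a minimal sketch of Ruby constant lookup across mixed-in modules. Reason, Validation, and Server are hypothetical names, and this is not claimed to reproduce the actual failure behind ManageIQ#13901.

```ruby
# Hypothetical sketch of constant lookup with mixed-in modules.
module Reason
  NOT_RESPONDING  = :not_responding
  MEMORY_EXCEEDED = :memory_exceeded
end

module Validation
  # A bare constant is resolved lexically from Validation, which does not
  # include Reason, so calling this method raises NameError.
  def bare_reason
    MEMORY_EXCEEDED
  end

  # Qualifying through the class that includes every module always resolves,
  # because Reason is among Server's ancestors.
  def qualified_reason
    Server::MEMORY_EXCEEDED
  end
end

# Stand-in for the class that is the final destination of all the modules.
class Server
  include Reason
  include Validation
end

server = Server.new
server.qualified_reason # resolves via Server's ancestors
```

The bare reference depends on which module the method happens to be written in, while the qualified form behaves the same from any of them.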
Kill workers that don't stop after a configurable time (cherry picked from commit 9764870) https://bugzilla.redhat.com/show_bug.cgi?id=1395736
Backported to Euwe via #13949
Backported to Darga via #13950
jrafanie commented Feb 7, 2017
Previously, we'd gracefully ask workers to exit. If the queue work a worker
was doing took 1 hour, a worker that had exceeded memory thresholds would
keep running until the work was done and only then respond to the exit
request.
Now, we mark workers as 'stopping' when they exceed a threshold, and they
have up to 10 minutes to finish before we kill them. This value is
configurable in the 'stopping_timeout' field in each worker's advanced
settings.
https://bugzilla.redhat.com/show_bug.cgi?id=1395736
@gtanzillo @carbonin Please review
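The behavior described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: Worker, STOPPING_TIMEOUT, flag_if_over_threshold, and kill_overdue? are hypothetical names, and only the 'stopping' status and the 10 minute default come from the description.

```ruby
# Hypothetical sketch of the "mark as stopping, then kill after a timeout" flow.
Worker = Struct.new(:pid, :status, :stopping_since, :memory_usage)

# Seconds; stands in for the per-worker 'stopping_timeout' setting (assumption).
STOPPING_TIMEOUT = 10 * 60

# Mark a running worker as stopping once it exceeds its memory threshold,
# and remember when the grace period started.
def flag_if_over_threshold(worker, threshold, now)
  return unless worker.status == :started && worker.memory_usage > threshold
  worker.status = :stopping
  worker.stopping_since = now
end

# A stopping worker is overdue once the grace period has fully elapsed.
def kill_overdue?(worker, now, timeout = STOPPING_TIMEOUT)
  worker.status == :stopping && (now - worker.stopping_since) > timeout
end

now = Time.now
w = Worker.new(42, :started, nil, 2_000)
flag_if_over_threshold(w, 1_000, now)

kill_overdue?(w, now + 5 * 60)  # false: still inside the 10 minute window
kill_overdue?(w, now + 11 * 60) # true: past the timeout, eligible to be killed
```

Recording the time of the transition rather than a countdown keeps the check stateless, so the monitor can evaluate it on each pass with whatever timeout is currently configured.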