Disaster detection and disabled actions #1230
name: 'type',
label: 'Type',
isRequired: true,
options: DISABLED_ACTION_TYPES.map((type) => ({
It would be nice to exclude already-disabled actions from this list.
(It would also be nice to be able to select multiple. But I don't think we have a multiselect component in FormModal yet, and I don't know if the backend even supports that.)
Yeah, I didn't write the backend to support multiple yet; it might be a good idea at some point. However, it does support overriding the message for an action that is already disabled (i.e. if you issue a new request to disable bounces when they are already disabled, it will take the message from the newer one). So I might leave it be for the moment.
@tpetr Updated the PR a bit to better take into account the time range over which events are occurring:
- Task lag takes into account how long the calculated value has been over the specified threshold. A disaster will trigger if it has been over for a certain amount of time (45s default).
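To illustrate the "over the threshold for a certain amount of time" rule described above, here is a minimal sketch. The class and method names are hypothetical and are not taken from this PR; the only detail from the comment is the 45s default, which the caller passes in as the required duration.

```java
import java.time.Duration;

// Hypothetical sketch, not the code in this PR: reports a disaster only once
// task lag has stayed above the threshold continuously for the required duration.
final class SustainedLagDetector {
  private final long thresholdMillis;      // lag above this counts as "over"
  private final Duration requiredDuration; // how long lag must stay over (e.g. 45s)
  private long overSinceMillis = -1;       // -1 means lag is currently not over

  SustainedLagDetector(long thresholdMillis, Duration requiredDuration) {
    this.thresholdMillis = thresholdMillis;
    this.requiredDuration = requiredDuration;
  }

  // Records one lag sample taken at nowMillis. Returns true when lag has been
  // over the threshold for at least requiredDuration without dropping back.
  boolean record(long nowMillis, long lagMillis) {
    if (lagMillis <= thresholdMillis) {
      overSinceMillis = -1; // dropped back under the threshold: reset the clock
      return false;
    }
    if (overSinceMillis < 0) {
      overSinceMillis = nowMillis; // first sample over the threshold
    }
    return nowMillis - overSinceMillis >= requiredDuration.toMillis();
  }
}
```

A single spike over the threshold does not trigger anything in this sketch; only a run of samples that stays over it for the whole window does.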
@tpetr added a more comprehensive
FREEZE_SLAVE(true), ACTIVATE_SLAVE(true), DECOMMISSION_SLAVE(true), VIEW_SLAVES(false),
FREEZE_RACK(true), ACTIVATE_RACK(true), DECOMMISSION_RACK(true), VIEW_RACKS(false);

private final boolean disableable;
Can we word this better? `canDisable`, maybe?
@@ -2,6 +2,8 @@

import static com.hubspot.singularity.data.transcoders.SingularityJsonTranscoderBinder.bindTranscoder;

import javax.ws.rs.HEAD;
this snuck in from merge conflicts
In a critical situation it is helpful to limit the amount of task churn in Singularity. This PR adds the ability for an admin to globally disable certain actions. So far it is implemented for `BOUNCE`, `DEPLOY`, `SCALE`, `REMOVE`, and `DECOMMISSION`, but it's easy to add more if needed.
- A `POST` to `/disasters/disabled-actions/{action}` adds an action to the list of disabled ones, with an optional message
- A `DELETE` to `/disasters/disabled-actions/{action}` removes it from that list

Singularity will respond to requests for a disabled action with a 423 (Locked) and the message given when disabling (or a default message).
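The behavior described above can be sketched as a small in-memory registry. This is an illustration only: the class, method names, and default message text are assumptions and do not come from the PR. It covers the three points stated in the description and review thread: disabling with an optional message, a repeated disable replacing the earlier message, and a 423 for anything currently disabled.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Singularity's actual implementation.
final class DisabledActionRegistry {
  static final int OK = 200;
  static final int LOCKED = 423; // HTTP 423 Locked
  // Assumed wording; the PR only says that a default message exists.
  static final String DEFAULT_MESSAGE = "This action is currently disabled";

  // Action type (e.g. "BOUNCE") mapped to the message returned to callers.
  private final Map<String, String> disabled = new ConcurrentHashMap<>();

  // Models POST /disasters/disabled-actions/{action}.
  // Disabling an already-disabled action replaces its message with the newer one.
  void disable(String action, Optional<String> message) {
    disabled.put(action, message.orElse(DEFAULT_MESSAGE));
  }

  // Models DELETE /disasters/disabled-actions/{action}.
  void enable(String action) {
    disabled.remove(action);
  }

  // Status a request for this action would receive.
  int statusFor(String action) {
    return disabled.containsKey(action) ? LOCKED : OK;
  }

  // Message to send along with a 423, or empty if the action is allowed.
  Optional<String> lockedMessage(String action) {
    return Optional.ofNullable(disabled.get(action));
  }
}
```

Keying the map by action type is what makes the "newer message wins" behavior fall out naturally: a second `disable` call for the same action simply overwrites the stored message.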
In this PR I am also adding an automated way of disabling actions based on things such as task lag and the frequency of lost slaves or lost tasks.
TODO for this PR:
/cc @tpetr