For example, a deployment job in a CI server like Jenkins could invoke this tool to set up a monitor downtime before killing a Tomcat process that is being monitored for health. Once the deployment is successful and Tomcat has restarted, the tool can be invoked again to delete the downtime and resume Tomcat's monitoring.
This is a stand-alone script and can be placed and run from anywhere on disk. The recommended way to run it is inside a container. The following Python dependencies must be installed on the system before running the script:
Any downtimes managed by this script are tracked in its local state - a JSON file kept on disk. State initialization must be performed before using this script to manage downtimes. Multiple instances of this script can run simultaneously and operate on a shared state.
It is highly recommended that the state file be continuously backed up. If it is lost, the script cannot track the downtimes created by it.
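One simple way to keep backups is to copy the state file to a timestamped file after each run. The sketch below is only an illustration and is not part of the tool: the helper name `backup_state_file` and the backup directory layout are assumptions, and it makes no attempt to coordinate with other instances of the script that may be writing to the same state file.

```python
import os
import shutil
import time


def backup_state_file(state_path, backup_dir):
    """Copy the state file into backup_dir under a timestamped name.

    Hypothetical helper, not part of dd-monitor-downtime.py.
    Returns the path of the backup copy.
    """
    if not os.path.isfile(state_path):
        raise IOError("state file not found: %s" % state_path)
    if not os.path.isdir(backup_dir):
        os.makedirs(backup_dir)
    # UTC timestamp keeps backup names sortable and unambiguous.
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime())
    name = "%s.%s.bak" % (os.path.basename(state_path), stamp)
    target = os.path.join(backup_dir, name)
    shutil.copy2(state_path, target)
    return target
```

A CI job could call `backup_state_file(".mdstate.json", "/some/backup/dir")` after every schedule or cancel step; the directory shown here is a placeholder.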
The CLI is intended to be invoked using Python 2.7.
To get help on the tool's usage, run:
python dd-monitor-downtime.py -help
For help on a specific command, run:
python dd-monitor-downtime.py [command] -help
To get version information, run:
python dd-monitor-downtime.py version
When reporting issues, include the output of this command in the description.
The init command must be used to initialize state. By default, the state file is created in the process's current working directory as .mdstate.json. This can be changed using the -statefile option.
python dd-monitor-downtime.py init
python dd-monitor-downtime.py init -statefile /opt/downtime-manager/state.json
When the script is run with a downtime management command, it tries to load the state from the default path described above. The -state option can be used to specify the file from which to load the state.
python dd-monitor-downtime.py [command] [args]
python dd-monitor-downtime.py -state /opt/downtime-manager/state.json [command] [args]
The commands described below are the downtime management commands and require Datadog credentials. These can be supplied either via the command line or via the environment. Command-line arguments take precedence over environment variables.
|CLI option|Environment variable|Description|
|-dd-api-key| |Datadog account API key|
|-dd-app-key| |Datadog account APP key|
Below is an example of how to supply Datadog credentials through the command line.
python dd-monitor-downtime.py -dd-api-key "xxxxxxx" -dd-app-key "xxxxxxx" [command] [args]
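The precedence rule (command-line value first, environment variable second) can be illustrated with a short sketch. This is not the script's actual implementation; `resolve_credential` is a hypothetical helper, and `MY_DD_API_KEY` is a placeholder variable name used only for the example.

```python
import os


def resolve_credential(cli_value, env_name, environ=None):
    """Return the CLI value if one was given, else the environment value.

    Hypothetical helper illustrating the precedence rule only.
    Returns None when neither source provides a value.
    """
    environ = os.environ if environ is None else environ
    if cli_value:
        return cli_value
    return environ.get(env_name)


# "MY_DD_API_KEY" is a placeholder, not a variable name the tool is known to read.
fake_env = {"MY_DD_API_KEY": "from-env"}
api_key = resolve_credential(None, "MY_DD_API_KEY", fake_env)
overridden = resolve_credential("from-cli", "MY_DD_API_KEY", fake_env)
```

In a wrapper script, a missing credential (a `None` result) is best treated as an error before any downtime command is invoked.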
Scheduling a downtime
The schedule command must be used to schedule a new downtime. The user must supply a name for this new downtime using
-md-name. This name must be unique across the local state being used by the script. For example, in build systems, the combination of job name and build number can guarantee a unique name (eg-
python dd-monitor-downtime.py schedule -help
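The suggestion to combine job name and build number into a unique downtime name can be sketched as follows. The helper names are hypothetical; `JOB_NAME` and `BUILD_NUMBER` are the environment variables Jenkins sets for a build, and other build systems will use different names. Whether the tool restricts the characters allowed in a name is not stated here, so the sketch joins the two values without further sanitizing.

```python
import os


def build_downtime_name(job_name, build_number):
    """Join job name and build number into one downtime name."""
    return "%s-%s" % (job_name, build_number)


def downtime_name_from_env(environ=None):
    """Build the name from Jenkins-style environment variables.

    Raises KeyError if either variable is missing, so a misconfigured
    job fails early instead of reusing a non-unique name.
    """
    environ = os.environ if environ is None else environ
    return build_downtime_name(environ["JOB_NAME"], environ["BUILD_NUMBER"])
```

The resulting string would then be passed to the script as the value of -md-name.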
- Create a downtime on the dev environment for an Elasticsearch cluster, assuming that env and service are valid scopes in the user's account.
python dd-monitor-downtime.py schedule -md-name "es-dev" -scope "env:dev,service:elasticsearch"
- Schedule a downtime for a specific time in the future
python dd-monitor-downtime.py schedule -md-name "all-apps" -scope "env:prod" -start "1543600009" -end "1543603609" -timezone "UTC"
- Create downtime for a specific monitor
python dd-monitor-downtime.py schedule -md-name "java-apps" -scope "env:stage" -monitor-id "7376587"
- Create a recurring downtime
python dd-monitor-downtime.py schedule -md-name "everything" -scope "env:prod" -recur-type days -recur-period 3 -recur-weekdays "Mon,Fri"
Note that if the -end option is supplied, the downtime will delete itself from Datadog at the specified time. Even so, the cancel command must still be run with this downtime's name so that the script also removes it from local state and frees the name for re-use.
Supplying a value for -end is highly recommended. This ensures that if the script is unable to delete a scheduled downtime, the downtime will delete itself at some point in the future and won't permanently hide the target alerts.
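A wrapper that always supplies -end might compute the window as shown below. This is a sketch under one assumption: that -start and -end take POSIX timestamps in seconds, which is what the values in the scheduling example above suggest. The helper names `downtime_window` and `schedule_args` are hypothetical.

```python
import time


def downtime_window(duration_seconds, start=None):
    """Return (start, end) as POSIX timestamp strings.

    If start is None, the window begins at the current time.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    start = int(time.time()) if start is None else int(start)
    return str(start), str(start + duration_seconds)


def schedule_args(name, scope, duration_seconds, start=None):
    """Build the argument list for a schedule call that always sets -end."""
    start_ts, end_ts = downtime_window(duration_seconds, start)
    return ["python", "dd-monitor-downtime.py", "schedule",
            "-md-name", name, "-scope", scope,
            "-start", start_ts, "-end", end_ts]
```

For example, `schedule_args("all-apps", "env:prod", 3600, start=1543600009)` yields the same one-hour window as the earlier example, with -end set to 1543603609. Choosing a duration slightly longer than the expected maintenance gives the cancel step time to run first.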
Cancelling a downtime
To cancel a downtime managed by this script, use the cancel command and supply the unique downtime name using -md-name.
python dd-monitor-downtime.py cancel -md-name "video-stream"
The above example deletes the downtime from Datadog as well as from local state and frees the name
video-stream for re-use.
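Putting schedule and cancel together, a deployment job could wrap its deploy step as sketched below. This is an illustration, not part of the tool: `deploy_with_downtime` and its parameters are hypothetical, and the sketch assumes that credentials are supplied through the environment and that the default state file is used.

```python
import subprocess

SCRIPT = ["python", "dd-monitor-downtime.py"]


def schedule_cmd(name, scope):
    """Argument list for scheduling a downtime."""
    return SCRIPT + ["schedule", "-md-name", name, "-scope", scope]


def cancel_cmd(name):
    """Argument list for cancelling a downtime by name."""
    return SCRIPT + ["cancel", "-md-name", name]


def deploy_with_downtime(name, scope, deploy, run=subprocess.check_call):
    """Schedule a downtime, run deploy(), then cancel the downtime.

    `deploy` is any callable that performs the deployment.
    `run` executes a command list; it is injectable for testing.
    """
    run(schedule_cmd(name, scope))
    try:
        return deploy()
    finally:
        # Cancel even when deploy() raises, so the name is freed
        # and monitoring resumes.
        run(cancel_cmd(name))
```

Cancelling in the `finally` block is a design choice: after a failed deployment, monitoring resumes and alerts can fire for the broken service. A job that prefers to keep alerts muted after a failure would cancel only on success and rely on -end as the safety net.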
This script internally uses datadogpy to communicate with Datadog. This library logs the message "No agent or invalid configuration file found" if the script is run on a machine that doesn't have the Datadog Agent installed. This message can be safely ignored if the intention is to run the tool from such a machine. The tool itself doesn't require the Datadog Agent to be present on the host.