This check monitors Velero through the Datadog Agent. It collects data about Velero's backup, restore and snapshot operations. This allows users to gain insight into the health, performance and reliability of their disaster recovery processes.
The Velero check is included in the Datadog Agent package. No additional installation is needed on your server.
Follow the instructions below to install and configure this check for an Agent running on a host.
- Edit the velero.d/conf.yaml file, in the conf.d/ folder at the root of your Agent's configuration directory, to start collecting your Velero performance data. See the sample velero.d/conf.yaml for all available configuration options.
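As a rough illustration, a minimal velero.d/conf.yaml might look like the sketch below. It assumes the check is OpenMetrics-based and that Velero serves metrics on port 8085 of the same host. Both the option name and the endpoint are assumptions here, so confirm them against the sample velero.d/conf.yaml.

```yaml
# Minimal sketch, not the authoritative configuration.
# Assumption: Velero exposes Prometheus metrics at port 8085 on localhost.
init_config:

instances:
  - openmetrics_endpoint: http://localhost:8085/metrics
```

Restart the Agent after editing the file so the change is picked up.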
See the Autodiscovery Integration Templates for guidance on configuring this integration in a containerized environment.
Note that two types of pods need to be queried for all metrics to be collected: velero and node-agent. Therefore, make sure to update the annotations of the velero deployment as well as the node-agent daemonset.
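The annotation for the velero deployment could be sketched as follows. This is an assumption-laden example, not the official template: it presumes the container is named velero, that metrics are served on port 8085, and that the Agent version supports the `checks` annotation format.

```yaml
# Hypothetical pod-template annotation for the velero deployment.
# Assumptions: container name "velero", metrics port 8085.
metadata:
  annotations:
    ad.datadoghq.com/velero.checks: |
      {
        "velero": {
          "init_config": {},
          "instances": [
            {"openmetrics_endpoint": "http://%%host%%:8085/metrics"}
          ]
        }
      }
```

For the node-agent daemonset, the same structure would apply with the container name in the annotation key changed to match that pod's container (assumed here to be node-agent).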
Run the Agent's status subcommand and look for velero under the Checks section.
This integration collects various Velero metrics, including:
- Backup: Success/failure rates, durations, and data sizes.
- Restore: Success/failure counts and validation failures.
- Snapshot: CSI and volume snapshot attempts, successes, and failures.
- Pod volume data: Upload/download success and failure rates. These are exposed by the node-agent pods.
See metadata.csv for a list of metrics provided by this integration.
The Velero integration does not include any events.
The Velero integration does not include any service checks.
Make sure that your Velero server is exposing metrics by checking that the feature is enabled in the deployment configuration:
# Settings for Velero's prometheus metrics. Enabled by default.
metrics:
  enabled: true
  scrapeInterval: 30s
  scrapeTimeout: 10s
Need help? Contact Datadog support.