This check monitors Ray through the Datadog Agent. Ray is an open-source unified compute framework that makes it easy to scale AI and Python workloads, from reinforcement learning to deep learning to tuning, and model serving.
Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.
Starting from Agent release 7.49.0, the Ray check is included in the Datadog Agent package. No additional installation is needed on your server.
WARNING: This check uses OpenMetrics to collect metrics from the OpenMetrics endpoint Ray can expose, which requires Python 3.
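For reference, the endpoint this check scrapes is the Prometheus/OpenMetrics endpoint exposed by the Ray head node. Below is a minimal, hedged sketch of starting a local Ray instance with a fixed metrics port so that it matches the `openmetrics_endpoint` used in the examples that follow; the `_metrics_export_port` argument and the port value 8080 are assumptions based on Ray's metrics documentation, so confirm them against the Ray version you run (on a cluster, the equivalent is typically a `--metrics-export-port` flag passed to `ray start`).

```python
import ray

# Minimal sketch (assumption): pin Ray's metrics export port so the Agent can
# scrape http://<RAY_ADDRESS>:8080. Verify the argument name for your Ray version.
ray.init(_metrics_export_port=8080)
```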
- Edit the `ray.d/conf.yaml` file, in the `conf.d/` folder at the root of your Agent's configuration directory, to start collecting your Ray performance data. See the sample configuration file for all available configuration options. This example demonstrates the configuration:

      init_config:
        ...
      instances:
        - openmetrics_endpoint: http://<RAY_ADDRESS>:8080
- Restart the Agent after modifying the configuration.
This example demonstrates the configuration as a Docker label inside `docker-compose.yml`. See the sample configuration file for all available configuration options.
labels:
  com.datadoghq.ad.checks: '{"ray":{"instances":[{"openmetrics_endpoint":"http://%%host%%:8080"}]}}'
This example demonstrates the configuration as Kubernetes annotations on your Ray pods. See the sample configuration file for all available configuration options.
apiVersion: v1
kind: Pod
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/ray.checks: |-
      {
        "ray": {
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8080"
            }
          ]
        }
      }
  # (...)
spec:
  containers:
    - name: 'ray'
# (...)
Ray metrics are available on the OpenMetrics endpoint. Additionally, Ray allows you to export custom application-level metrics. You can configure the Ray integration to collect these metrics using the `extra_metrics` option. All Ray metrics, including your custom metrics, use the `ray.` prefix.

Note: Custom Ray metrics are considered standard metrics in Datadog.
This example demonstrates a configuration leveraging the `extra_metrics` option:

init_config:
  ...
instances:
  - openmetrics_endpoint: http://<RAY_ADDRESS>:8080
    # Also collect your own Ray metrics
    extra_metrics:
      - my_custom_ray_metric
For more information on how to configure this option, see the sample `ray.d/conf.yaml` configuration file.
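On the Ray side, application-level metrics such as the `my_custom_ray_metric` entry above are typically defined with Ray's `ray.util.metrics` API. The following is a hedged sketch rather than part of the integration: the metric name, description, and tag are placeholders, and depending on your Ray version the raw name exposed on the OpenMetrics endpoint may carry a `ray_` prefix, so verify the exact exposed name before listing it under `extra_metrics`.

```python
import ray
from ray.util.metrics import Counter

ray.init()  # metrics are recorded by the Ray driver or workers, so Ray must be running

# Hypothetical application-level metric; the name mirrors the extra_metrics
# entry in the configuration example above.
request_counter = Counter(
    "my_custom_ray_metric",
    description="Requests processed by the application.",
    tag_keys=("route",),
)

# Increment the counter from your Ray tasks or actors; Ray exports it on the
# same OpenMetrics endpoint as its built-in metrics.
request_counter.inc(1.0, tags={"route": "/predict"})
```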
Run the Agent's status subcommand and look for `ray` under the Checks section.
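If `ray` does not show up there, it can also help to confirm that the configured endpoint is reachable from the Agent host. The sketch below is only an illustrative troubleshooting step, not part of the integration: it assumes the `requests` library is installed, and the exact URL (root versus a `/metrics` path) may differ depending on how your Ray deployment exposes metrics.

```python
import requests

# Replace <RAY_ADDRESS> with the address used in ray.d/conf.yaml.
response = requests.get("http://<RAY_ADDRESS>:8080", timeout=5)
print(response.status_code)
print(response.text[:500])  # first few lines of the exposed metrics, if any
```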
See metadata.csv for a list of metrics provided by this integration.
The Ray integration does not include any events.
See service_checks.json for a list of service checks provided by this integration.
The Ray integration can collect logs from the Ray service and forward them to Datadog.
- Collecting logs is disabled by default in the Datadog Agent. Enable it in your `datadog.yaml` file:

      logs_enabled: true
- Uncomment and edit the logs configuration block in your `ray.d/conf.yaml` file. Here's an example:

      logs:
        - type: file
          path: /tmp/ray/session_latest/logs/dashboard.log
          source: ray
          service: ray
        - type: file
          path: /tmp/ray/session_latest/logs/gcs_server.out
          source: ray
          service: ray
Collecting logs is disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.
Then, set Log Integrations as pod annotations. This can also be configured with a file, a configmap, or a key-value store. For more information, see the configuration section of Kubernetes Log Collection.
Annotations v1/v2
apiVersion: v1
kind: Pod
metadata:
  name: ray
  annotations:
    ad.datadoghq.com/ray.logs: '[{"source":"ray","service":"ray"}]'
spec:
  containers:
    - name: ray
For more information about the logging configuration with Ray and all the log files, see the official Ray documentation.
Need help? Contact Datadog support.