Agent Check: Ray

Overview

This check monitors Ray through the Datadog Agent. Ray is an open-source unified compute framework that makes it easy to scale AI and Python workloads, from reinforcement learning and deep learning to tuning and model serving.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.

Installation

Starting from Agent release 7.49.0, the Ray check is included in the Datadog Agent package. No additional installation is needed on your server.

WARNING: This check uses OpenMetrics to collect metrics from the OpenMetrics endpoint that Ray can expose. The check requires Python 3.

Configuration

Host

Metric collection
  1. Edit the ray.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory to start collecting your Ray performance data. See the sample configuration file for all available configuration options.

    This example demonstrates the configuration:

    init_config:
      ...
    instances:
      - openmetrics_endpoint: http://<RAY_ADDRESS>:8080
  2. Restart the Agent after modifying the configuration.
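
If the check reports no data, it can help to confirm that the endpoint configured above is actually serving metrics. The following is a minimal sketch, assuming the endpoint returns Prometheus/OpenMetrics text at the configured URL. The function names metric_names and fetch_metric_names are hypothetical helpers for illustration, not part of the Agent or of Ray.

```python
import urllib.request


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus/OpenMetrics text."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment/metadata lines (# HELP, # TYPE, # EOF).
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{label="v"} 1.0` or `name 1.0`;
        # the metric name is everything before the first "{" or space.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def fetch_metric_names(url, timeout=5):
    """Fetch the endpoint (hypothetical helper) and list the metric names it exposes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return metric_names(resp.read().decode("utf-8"))
```

Calling fetch_metric_names with the same URL used for openmetrics_endpoint should return a non-empty set when the endpoint is reachable. An empty set or a connection error suggests checking the address and port before troubleshooting the Agent itself.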

Docker

Metric collection

This example demonstrates the configuration as a Docker label inside docker-compose.yml. See the sample configuration file for all available configuration options.

labels:
  com.datadoghq.ad.checks: '{"ray":{"instances":[{"openmetrics_endpoint":"http://%%host%%:8080"}]}}'

Kubernetes

Metric collection

This example demonstrates the configuration as Kubernetes annotations on your Ray pods. See the sample configuration file for all available configuration options.

apiVersion: v1
kind: Pod
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/ray.checks: |-
      {
        "ray": {
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8080"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: 'ray'
# (...)

Ray metrics are available on the OpenMetrics endpoint. Additionally, Ray allows you to export custom application-level metrics. You can configure the Ray integration to collect these metrics using the extra_metrics option. All Ray metrics, including your custom metrics, use the ray. prefix.

Note: Custom Ray metrics are considered standard metrics in Datadog.

This example demonstrates a configuration leveraging the extra_metrics option:

init_config:
  ...
instances:
  - openmetrics_endpoint: http://<RAY_ADDRESS>:8080
    # Also collect your own Ray metrics
    extra_metrics:
      - my_custom_ray_metric

For more information on configuring this option, see the sample ray.d/conf.yaml configuration file.
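
For containerized setups, the same option can be carried in the Autodiscovery JSON used in the Docker and Kubernetes examples. Hand-writing that JSON is error-prone, so the following is a minimal sketch that generates it, assuming Autodiscovery accepts extra_metrics alongside openmetrics_endpoint in the instance object. The function name ray_check_template is hypothetical and not part of the Agent.

```python
import json


def ray_check_template(endpoint="http://%%host%%:8080", extra_metrics=None):
    """Build the Autodiscovery check JSON for the Ray integration (illustrative helper)."""
    instance = {"openmetrics_endpoint": endpoint}
    # Only add extra_metrics when custom metric names are supplied.
    if extra_metrics:
        instance["extra_metrics"] = list(extra_metrics)
    # Compact separators produce the single-line form used in a Docker label.
    return json.dumps({"ray": {"instances": [instance]}}, separators=(",", ":"))


if __name__ == "__main__":
    # my_custom_ray_metric is the placeholder name from the example above.
    print(ray_check_template(extra_metrics=["my_custom_ray_metric"]))
```

The resulting string can be pasted as the value of the com.datadoghq.ad.checks label or the ad.datadoghq.com/ray.checks annotation.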

Validation

Run the Agent's status subcommand and look for ray under the Checks section.

Data Collected

Metrics

See metadata.csv for a list of metrics provided by this integration.

Events

The Ray integration does not include any events.

Service Checks

See service_checks.json for a list of service checks provided by this integration.

Logs

The Ray integration can collect logs from the Ray service and forward them to Datadog.

  1. Collecting logs is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

    logs_enabled: true
  2. Uncomment and edit the logs configuration block in your ray.d/conf.yaml file. Here's an example:

    logs:
      - type: file
        path: /tmp/ray/session_latest/logs/dashboard.log
        source: ray
        service: ray
      - type: file
        path: /tmp/ray/session_latest/logs/gcs_server.out
        source: ray
        service: ray

In Kubernetes environments, collecting logs is also disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.

Then, set Log Integrations as pod annotations. This can also be configured with a file, a configmap, or a key-value store. For more information, see the configuration section of Kubernetes Log Collection.

Annotations v1/v2

apiVersion: v1
kind: Pod
metadata:
  name: ray
  annotations:
    ad.datadoghq.com/ray.logs: '[{"source":"ray","service":"ray"}]'
spec:
  containers:
    - name: ray

For more information about logging configuration in Ray and the log files it produces, see the official Ray documentation.

Troubleshooting

Need help? Contact Datadog support.