## Observability

Ray Serve provides a rich set of observability tools to help you understand the behavior of your service.

<div class="alert alert-info">

<b>Here is the roadmap for this notebook:</b>

<ol>
    <li>Metrics</li>
    <li>Logs</li>
    <li>Health Checks</li>
    <li>Alerts</li>
    <li>Tracing</li>
</ol>
</div>

**Imports**


In [None]:
import logging
import time
from pathlib import Path

import requests
from ray import serve
from ray.serve import metrics
from starlette.requests import Request

## 1. Metrics

Ray Serve provides the following metrics:

- **Throughput metrics:**
    - Queries per second (QPS)
    - Error QPS
    - Error by error code QPS

Shown are the throughput metrics for the MNIST application.

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/ray-serve/throughput_per_application.png" alt="Ray Serve Metrics" width="800">
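To make the throughput metrics above concrete, here is a minimal standalone sketch of how QPS, error QPS, and error QPS by code relate to each other. It is not how Ray Serve computes them; the `throughput` helper, the sample log, and the choice to treat status codes of 400 and above as errors are all assumptions made for illustration.

```python
from collections import Counter


def throughput(requests_log, window_s):
    """Summarize a list of (timestamp_s, status_code) pairs seen in one window."""
    total = len(requests_log)
    # Assumption: any status code >= 400 counts as an error.
    errors = Counter(code for _, code in requests_log if code >= 400)
    return {
        "qps": total / window_s,
        "error_qps": sum(errors.values()) / window_s,
        "error_qps_by_code": {code: n / window_s for code, n in errors.items()},
    }


# Hypothetical 10-second window: 40 successes, eight 500s, and two 404s.
log = (
    [(i * 0.2, 200) for i in range(40)]
    + [(float(i), 500) for i in range(8)]
    + [(float(i), 404) for i in range(2)]
)
stats = throughput(log, window_s=10)
print(stats)
```

The three numbers are nested: the per-code error rates sum to the error QPS, which in turn is a subset of the total QPS.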

- **Latency metrics:**
    - P50, P90, P99 latencies

Shown are the latency metrics for the MNIST application.

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/ray-serve/latency_per_application.png" alt="Ray Serve Latency Metrics" width="800">
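As a reminder of what P50, P90, and P99 mean, the sketch below computes them from raw latency samples using the nearest-rank definition. This is only one common definition: dashboards usually estimate percentiles from histogram buckets, so their values can differ slightly. The `percentile` helper and the sample data are hypothetical.

```python
def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p / 100 * n)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 1..99 plus one slow 2000 ms outlier.
latencies_ms = list(range(1, 100)) + [2000]
p50, p90, p99 = (percentile(latencies_ms, p) for p in (50, 90, 99))
mean_ms = sum(latencies_ms) / len(latencies_ms)
print(p50, p90, p99, mean_ms)
```

In this data set the single outlier pulls the mean well above the median, while the percentiles stay put. That is one reason tail percentiles are preferred over averages for latency monitoring.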

- **Latency and throughput metrics are available at different levels of granularity:**
    - Per-application metrics
    - Per-deployment metrics
    - Per-replica metrics

Shown are the latency metrics on the deployment level.

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/ray-serve/latency_per_deployment.png" alt="Ray Serve Latency Metrics" width="800">

- **Deployment-specific metrics:**
    - Number of replicas
    - Queue size (TODO - explain which queue)

Shown are the number of replicas and queue size for the MNIST application.

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/ray-serve/replicas_per_deployment.png" alt="Ray Serve Deployment Metrics" width="400">

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/ray-serve/queue_size_per_deploymnet.png" alt="Ray Serve Deployment Metrics" width="400">

### Define custom metrics

It is a good practice to define custom metrics to track the performance of your application.

To do so, use `serve.metrics`.


In [None]:
@serve.deployment(num_replicas=2)
class MyDeployment:
    def __init__(self):
        self.my_counter = metrics.Counter(
            "my_counter",
            description=("The number of requests to this deployment."),
            tag_keys=("model",),
        )
        self.my_counter.set_default_tags({"model": "123"})

    async def __call__(self):
        self.my_counter.inc()


my_deployment = MyDeployment.bind()
serve.run(my_deployment)

We can then send requests to the deployment and see the custom metric in action.


In [None]:
start = time.time()
while time.time() - start < 120:
    requests.get("http://localhost:8000/")
    time.sleep(1)

Here is what the custom metric looks like in the Anyscale dashboard.

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/ray-serve/custom_metric.png" alt="Ray Serve Custom Metric" width="500">




<div class="alert alert-info">

**Note:**

Prometheus scrapes metrics at regular intervals. This is why we don't see the counter incrementing in real-time.

Configuring a shorter scrape interval will improve the resolution of the metrics but will also increase the load on the server.

</div>
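The effect of the scrape interval can be illustrated with a small simulation. This is not Prometheus code: the `scrape` and `counter_at` functions are hypothetical, and the counter is assumed to grow by one per second, matching the request loop above.

```python
def counter_at(t_s):
    # Assumption: one request per second, so the counter equals elapsed seconds.
    return t_s


def scrape(read_counter, duration_s, interval_s):
    """Return the (time, value) samples a scraper would record."""
    return [(t, read_counter(t)) for t in range(0, duration_s + 1, interval_s)]


coarse = scrape(counter_at, duration_s=120, interval_s=30)
fine = scrape(counter_at, duration_s=120, interval_s=10)

print(len(coarse), coarse)
print(len(fine))
```

Both series end at the same value, but the 30-second series only has five points to draw from, so the dashboard shows the counter jumping in steps of 30 rather than climbing smoothly.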


## 2. Logs

To understand system-level behavior and to surface application-level details during runtime, you can leverage Ray Serve's logging.

**Implementation:**
- Uses Python's standard logging module
- Logger name is "ray.serve"

**Log Output Locations:**
- Logs are sent to stderr
- Logs are written to disk at `/tmp/ray/session_latest/logs/serve/`

**Types of Logs Captured:**
- System-level logs (from Serve controller and proxy)
- Access logs
- Custom user logs from deployment replicas

**Development Environment Behavior:**
- Logs are streamed to the driver Ray program
- Driver program can be either:
    - A Python script calling `serve.run()`
    - The `serve run` CLI command

Here is how to use logging in a deployment.


In [None]:
@serve.deployment()
class SayHelloDefaultLogging:
    async def __call__(self):
        logger = logging.getLogger("ray.serve")
        logger.info("hello world")


serve.run(SayHelloDefaultLogging.bind())

resp = requests.get("http://localhost:8000/")

<div class="alert alert-warning">

**Note:**
Because Ray Serve uses Python's standard logging module, aggressive logging inside your application incurs a performance penalty. Use logging levels to control the verbosity of your logs and to avoid this penalty in production.

</div>
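Two standard-library techniques help keep logging cheap: passing arguments lazily instead of pre-formatting the message, and guarding expensive work with `isEnabledFor`. The sketch below runs without Ray; the `handle` and `expensive_summary` functions are hypothetical stand-ins for deployment code.

```python
import logging

logger = logging.getLogger("ray.serve")  # same logger name used in this section
logger.setLevel(logging.INFO)

calls = {"expensive": 0}


def expensive_summary(payload):
    # Stand-in for costly work, such as serializing a large request body.
    calls["expensive"] += 1
    return ", ".join(f"{k}={v}" for k, v in sorted(payload.items()))


def handle(payload):
    # Lazy %-style arguments: the string is only built if the record is emitted.
    logger.info("received %d fields", len(payload))
    # Guard costly work so it is skipped entirely when DEBUG is disabled.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("payload summary: %s", expensive_summary(payload))


handle({"a": 1, "b": 2})
calls_at_info = calls["expensive"]

logger.setLevel(logging.DEBUG)
handle({"a": 1, "b": 2})
calls_at_debug = calls["expensive"]
```

At `INFO` level the expensive helper is never called; it only runs once the level is lowered to `DEBUG`.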

### Logging Configuration

Here are the common configurations for logging.

- `enable_access_log`: Controls whether access logs are written to the Replica and Proxy logs. By default, it is `True`.
- `log_level`: Sets the log level. By default, it is `INFO`.
- `encoding`: Sets the encoding format of the log file, either `TEXT` or `JSON`. By default, it is `TEXT`.

You can set the logging configuration:
- At the deployment level
- At the serve instance level

In both cases, you can set it either programmatically or via a configuration file.



In [None]:
@serve.deployment(logging_config={"log_level": "DEBUG"})
class SayHelloDebugLogging:
    async def __call__(self):
        logger = logging.getLogger("ray.serve")
        logger.debug("hello world")


serve.run(
    SayHelloDebugLogging.bind(),
    logging_config={
        "encoding": "JSON",
        "log_level": "INFO",
        "enable_access_log": False,
    },
)

resp = requests.get("http://localhost:8000/")

## 3. Health Checks
You can configure health checks for your deployments to help detect when a replica is unhealthy.

<div class="alert alert-block alert-info">

**Best practice** If a Replica is using a stateful resource like a database, it is important to check the health of the resource periodically.

</div>



Here is an example that simulates a replica that relies on a database:

In [None]:
def connect_to_db(con_url):
    # Simulate a database connection
    time.sleep(1)
    return f"Connected to {con_url}"


# Write a status file that stands in for the database's health
path = Path("/mnt/cluster_storage/db_status.txt")
path.write_text("alive")


def is_alive(db):
    # `db` is unused in this simulation; the status file decides the outcome
    return path.read_text() == "alive"


@serve.deployment
class DataFetcher:
    def __init__(self, con_url):
        self.db = connect_to_db(con_url)

    def __call__(self, request: Request):
        return "ok"

    def check_health(self):
        # Serve calls this periodically; raising marks the check as failed
        if not is_alive(self.db):
            raise Exception("Database connection lost")


We run the deployment and expect the health check to pass.


In [None]:
app_handle = serve.run(DataFetcher.bind("db_url"), name="data-fetcher", blocking=False)

We can simulate a database connection failure by modifying the status file. 


In [None]:
path.write_text("dead")

Observe the following key log lines:

```
2025-09-26 18:01:45,513 WARNING Health check for Replica(id='f4yawq8r', deployment='DataFetcher', app='data-fetcher') failed: ray::ServeReplica:data-fetcher:DataFetcher.check_health() (pid=2634, ip=10.0.43.216, actor_id=7b2a9e3a4039907963dc2cb208000000, repr=<ray.serve._private.replica.ServeReplica:data-fetcher:DataFetcher object at 0x7f79addc2810>) ...
2025-09-26 18:01:54,722 WARNING Health check for Replica(id='f4yawq8r', deployment='DataFetcher', app='data-fetcher') failed: ray::ServeReplica:data-fetcher:DataFetcher.check_health() (pid=2634, ip=10.0.43.216, actor_id=7b2a9e3a4039907963dc2cb208000000, repr=<ray.serve._private.replica.ServeReplica:data-fetcher:DataFetcher object at 0x7f79addc2810>) ...
2025-09-26 18:02:04,364 WARNING Health check for Replica(id='f4yawq8r', deployment='DataFetcher', app='data-fetcher') failed: ray::ServeReplica:data-fetcher:DataFetcher.check_health() (pid=2634, ip=10.0.43.216, actor_id=7b2a9e3a4039907963dc2cb208000000, repr=<ray.serve._private.replica.ServeReplica:data-fetcher:DataFetcher object at 0x7f79addc2810>) ...
2025-09-26 18:02:04,364 WARNING Replica Replica(id='f4yawq8r', deployment='DataFetcher', app='data-fetcher') failed the health check 3 times in a row, marking it unhealthy.
2025-09-26 18:02:04,365 WARNING Replica Replica(id='f4yawq8r', deployment='DataFetcher', app='data-fetcher') failed health check, stopping it.
2025-09-26 18:02:04,365 INFO Stopping Replica(id='f4yawq8r', deployment='DataFetcher', app='data-fetcher') (currently ReplicaState.RUNNING).
2025-09-26 18:02:04,367 INFO Adding 1 replica to Deployment(name='DataFetcher', app='data-fetcher').
```

As expected, the health check was triggered and the replica was replaced.
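The "3 times in a row" behavior in the logs can be pictured as a consecutive-failure counter. The sketch below is a simplified model, not Ray Serve's implementation; the `ReplicaHealthTracker` class and its `failure_threshold` parameter are hypothetical names chosen for illustration.

```python
class ReplicaHealthTracker:
    """Marks a replica unhealthy after N consecutive failed checks."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.healthy = True

    def record(self, check):
        try:
            check()  # a health check signals failure by raising
        except Exception:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.healthy = False
        else:
            self.consecutive_failures = 0  # any success resets the streak
        return self.healthy


status = {"db": "alive"}


def check_health():
    if status["db"] != "alive":
        raise RuntimeError("Database connection lost")


tracker = ReplicaHealthTracker()
results = [tracker.record(check_health)]  # passes while the database is alive
status["db"] = "dead"
results += [tracker.record(check_health) for _ in range(3)]
print(results)
```

Requiring several consecutive failures keeps a single transient error from triggering a replica restart.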

Because the health check will continue to fail, the replica will be replaced again and again. We shut down the Serve application to avoid endless retries.


In [None]:
serve.shutdown()

## 4. Alerts

Ray integrates with Prometheus and Grafana for an enhanced observability experience. For alerts, the common route is to rely on Grafana alerting.

**Alert Types:**
Grafana [can alert](https://grafana.com/docs/grafana/v7.5/alerting/) based on:
- Metric values
- Rate of change
- Metric absence
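
To show how the three alert types differ, here is a toy evaluator written in plain Python. Real alert rules are configured in Grafana, not in application code; the `evaluate_alerts` function, its parameters, and the sample queue-size data are all hypothetical.

```python
def evaluate_alerts(samples, now_s, max_value, max_rate_per_s, max_staleness_s):
    """samples: list of (timestamp_s, value), oldest first. Returns firing condition names."""
    # Metric absence: no data at all, or the newest sample is too old.
    if not samples or now_s - samples[-1][0] > max_staleness_s:
        return ["metric_absent"]

    firing = []
    # Metric value: the latest sample exceeds a fixed threshold.
    if samples[-1][1] > max_value:
        firing.append("value_above_threshold")
    # Rate of change: average slope across the window exceeds a limit.
    if len(samples) >= 2:
        (t0, v0), (t1, v1) = samples[0], samples[-1]
        if t1 > t0 and (v1 - v0) / (t1 - t0) > max_rate_per_s:
            firing.append("rate_of_change")
    return firing


# Hypothetical queue-size samples taken 30 seconds apart.
queue_size = [(0, 2), (30, 5), (60, 40)]
limits = dict(max_value=20, max_rate_per_s=0.5, max_staleness_s=120)

fresh = evaluate_alerts(queue_size, now_s=65, **limits)
stale = evaluate_alerts(queue_size, now_s=300, **limits)
print(fresh, stale)
```

The same samples fire the value and rate conditions while they are fresh, and only the absence condition once no new data has arrived within the staleness limit.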

**Notification Options:**
- Supports multiple [notification channels](https://grafana.com/docs/grafana/v7.5/alerting/notifications/#add-a-notification-channel) (Slack, PagerDuty, etc.)
- Email support planned for future
- Configurable through notification channels

**Documentation:** Full setup details available in Grafana's [official documentation](https://grafana.com/docs/grafana/v7.5/alerting/)

## 5. Tracing

To perform end-to-end distributed tracing of requests, you can use the Anyscale Tracing integration.

See the [tracing guide](https://docs.anyscale.com/monitoring/tracing/) for details.