3 changes: 3 additions & 0 deletions content/en/monitors/types/synthetic_monitoring.md
Original file line number Diff line number Diff line change
@@ -47,6 +47,8 @@ Depending on your incident management strategy, you may want to involve multiple

To enable renotification, toggle **Enable renotification** and select a time interval from the dropdown menu.

**Note**: To learn how Synthetic monitors evaluate test results and trigger alerts, see the [Understanding Synthetic Monitor Alerting][7] guide.

## Enhanced notifications

Use and enrich Synthetic monitors to send more detailed notifications when a Synthetic Monitoring test is failing. The following features are available:
@@ -81,3 +83,4 @@ For more information, see [Synthetic Monitoring notifications][6].
[4]: /monitors/notify/#notification-recipients
[5]: /monitors/notify/#renotify
[6]: /synthetics/notifications
[7]: /synthetics/guide/how-synthetics-monitors-trigger-alerts/
1 change: 1 addition & 0 deletions content/en/synthetics/guide/_index.md
@@ -25,6 +25,7 @@ cascade:
{{< nextlink href="monitors/types/synthetic_monitoring/" >}}Use Synthetic Test Monitors{{< /nextlink >}}
{{< nextlink href="synthetics/guide/synthetic-test-retries-monitor-status/" >}}Understand how Synthetic test retries determine monitor status{{< /nextlink >}}
{{< nextlink href="synthetics/guide/uptime-percentage-widget" >}}Monitor website uptime with SLOs{{< /nextlink >}}
{{< nextlink href="synthetics/guide/how-synthetics-monitors-trigger-alerts" >}}Understand how Synthetic monitors trigger an alert{{< /nextlink >}}
{{< /whatsnext >}}

{{< whatsnext desc="API:" >}}
@@ -0,0 +1,218 @@
---
title: Understanding Synthetic Monitor Alerting
description: Understand how alerting rules, retries, and location thresholds determine when a Synthetic monitor alerts or recovers.

further_reading:
- link: '/synthetics/browser_tests'
tag: 'Documentation'
text: 'Learn about Browser Tests'
- link: '/synthetics/guide/synthetic-test-retries-monitor-status/'
tag: 'Guide'
text: 'Understand test retries and monitor status'
- link: '/synthetics/guide/uptime-percentage-widget/'
tag: 'Guide'
text: 'Monitor website uptime with SLOs'
---

Synthetic Monitoring monitors evaluate test results **over time**, not individual test executions.
This page explains how Datadog determines when a Synthetic monitor triggers an alert or recovers, and why alerts may behave differently than expected.

Use this page to understand:

- Why a monitor alerted later than expected
- Why a monitor recovered even though failures are still visible
- Why a test failure did not trigger an alert

## How alert evaluation works

Synthetic Monitoring does not trigger alerts based on a single failed run. Instead, it continuously evaluates test results through the following steps:

1. The test runs based on its configured schedule.
2. [Fast retries](#fast-retries) are applied, if configured.
3. Test results are aggregated across locations.
4. Failures are evaluated over time using the alerting rules.
5. The monitor transitions between **OK**, **Alert**, or **No Data** [status](#status-descriptions) as conditions are met or no longer met.

A monitor transitions to **Alert** only when all alerting rules are satisfied.
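
The evaluation flow above can be sketched as a small state machine. This is an illustrative simplification, not Datadog's implementation; the function and field names are hypothetical:

```python
# Illustrative sketch of monitor alert evaluation; not Datadog's implementation.
# `final_result_failed` is True only after all fast retries are exhausted
# and the configured location scope condition is met.

def evaluate(state, now_minutes, final_result_failed, min_duration_minutes):
    """Update and return the monitor status after one aggregated result."""
    if final_result_failed:
        # The minimum duration timer starts at the first qualifying failure.
        if state.get("failing_since") is None:
            state["failing_since"] = now_minutes
        if now_minutes - state["failing_since"] >= min_duration_minutes:
            state["status"] = "ALERT"
    else:
        # Any break in the alerting conditions resets the timer and status.
        state["failing_since"] = None
        state["status"] = "OK"
    return state["status"]
```

Replaying a failure at t15 with a 13-minute minimum duration through this sketch yields `OK` at t15 and `ALERT` at t28, matching the examples later on this page.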

## Test runs that generate alerts
| Test run type | Evaluated for alerting |
|-----------------------------------------|------------------------|
| Scheduled runs | Yes |
| CI/CD-triggered runs | No |
| Manually triggered runs (unpaused test) | Yes, if state changes |
| Manually triggered runs (paused test) | No |

## Fast retries

Fast retries automatically re-run failed test executions.

{{< img src="synthetics/guide/monitors_trigger_alerts/fast_retry_2.png" alt="Retry conditions step of a synthetics test" style="width:80%;" >}}

**Example behaviors of fast retries:**

- A test configured with *n* retries can execute up to *n + 1* times per scheduled run (including the original attempt).
- If you have a [minimum duration](#alerting-rules) configured as an alerting rule, the timer starts when the final fast retry execution fails.
- Fast retry runs appear in test results with a `(fast retry)` label in the **Run Type** column. <br></br>

{{< img src="synthetics/guide/monitors_trigger_alerts/fast_retry_test_runs_2.png" alt="Test runs screen of a Synthetics test, highlighting the Scheduled (fast retry) run type" style="width:100%;" >}}

## Alerting rules

Alerting rules define when a monitor is allowed to change state based on test failures over time. When fast retries are enabled, a test run is not considered failed, and alerting evaluation does not begin, until all retries have been exhausted. An alert triggers only when all alerting conditions are met continuously for the configured duration.

Alerting rules typically include:

- **Minimum duration (alerting delay)**
How long failures must persist before triggering an alert.

- **Location scope**
For example, *any 1 of N locations* or *all locations*. <br></br>

{{< img src="synthetics/guide/monitors_trigger_alerts/schedule_and_alert_2.png" alt="Schedule and alert conditions step of a Synthetics test" style="width:80%;" >}}

<div class="alert alert-info">If any part of the alerting rule stops being true during the evaluation window, the minimum duration timer resets.</div>

## Test frequency and minimum duration

Two commonly confused settings are:

- **Test frequency**: How often the test runs
- **Minimum duration**: How long the test must continuously fail before alerting
<br>**Note**: If you have [fast retries](#fast-retries) enabled, the minimum duration timer starts when the final fast retry test execution fails.

### Example: Alerts triggered immediately

- Fast retries: not configured
- Test frequency: 15 minutes
- Minimum duration: 13 minutes
- Location scope: 1 of 1

With the above settings, the alert triggers 13 minutes after the scheduled test run fails:

| Time | Event | Result | Monitor status |
|------|-------|--------|----------------|
| t0 | Scheduled test run | Pass | OK |
| t15 | Scheduled test run | Fail | OK (minimum duration timer starts) |
| t28 | N/A | Fail | ALERT (13 minutes elapsed) |

To alert after a single failed test execution, this is a recommended configuration.
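
The timeline above reduces to simple arithmetic; the following sketch (variable names are illustrative) checks when the alert fires:

```python
# With no fast retries, the minimum duration timer starts at the
# failed scheduled run, so the alert time is simply their sum.
minimum_duration = 13  # minutes of continuous failure required
first_failure = 15     # t15, the first failed scheduled run

alert_time = first_failure + minimum_duration
print(f"Alert at t{alert_time}")  # Alert at t28
```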

### Example: Fast retries

- Fast retries: 2 retries, with 1 minute between retries
- Test frequency: 30 minutes
- Minimum duration: 5 minutes
- Location scope: 1 of 1

With the above settings, the minimum duration timer starts when the second fast retry fails:

| Time | Event | Result | Monitor status |
|------|-------|--------|----------------|
| t0 | Scheduled test run | Pass | OK |
| t30 | Scheduled test run | Fail | OK |
| t31 | First fast retry for the scheduled run at t30 | Fail | OK |
| t32 | Second fast retry for the scheduled run at t30 | Fail | OK (minimum duration timer starts) |
| t37 | N/A | Fail | ALERT (5 minutes elapsed) |
| t60 | Scheduled test run | Pass | OK |

**Note**: Because fast retries were configured, the alert triggered at t37 instead of t35, adding a 2-minute delay.
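
The two-minute delay can be verified with a quick back-of-the-envelope calculation. This sketch assumes each retry attempt is instantaneous and retries are spaced exactly by the configured interval:

```python
# Fast retries delay the start of the minimum duration timer.
retries = 2
retry_interval = 1      # minutes between retries
minimum_duration = 5    # minutes
scheduled_failure = 30  # t30, the failed scheduled run

timer_start = scheduled_failure + retries * retry_interval  # t32
alert_time = timer_start + minimum_duration                 # t37

delay_vs_no_retries = alert_time - (scheduled_failure + minimum_duration)
print(delay_vs_no_retries)  # 2
```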

### Best practices

- If you want immediate alerting, set the minimum duration to `0` to alert as soon as a failure occurs.
- Enable fast retries to handle transient issues like network blips. For frequently running tests, pair this with a longer minimum duration to reduce alert noise.
- Avoid [overlapping fast retries with scheduled test runs][3] so you can determine which fast retries belong to which scheduled test run.

## Location-based evaluation

Location rules determine **how many locations must fail** for an alert to trigger.

Common patterns include:

- Fail from any 1 of _N_ locations
- Fail from all locations
- All locations failing at the same moment

A monitor can recover even if **some locations are still failing**, as long as the configured alerting rules are no longer satisfied during the evaluation window.

## Alert and recovery behavior

A recovery does not require all test runs to pass, only that the alerting conditions are no longer true.

- **Alert** notifications are sent when alerting rules are met.
- **Recovery** notifications are sent when alerting rules are no longer met.

## Global uptime and alert state

**Global uptime** represents the percentage of time your monitor was healthy (`OK` status) during the selected time period.

It is based on how long the monitor stayed in an `OK` state compared to the total monitoring period. Any time the monitor spends in an `ALERT` state lowers the global uptime.

Because this metric is based on the duration of the monitor's status and not on the status of a test execution, it cannot be reliably calculated based on the ratio of successful test results to the total number of test executions over the same period.

Depending on the test frequency, the ratio may roughly approximate the global uptime. For example, in a basic alerting configuration, such as a test that runs every minute with a minimum duration of 0, the two values are usually close.

The formula for calculating global uptime is:

```
Global Uptime = ((Total Period - Time in Alert) / Total Period) × 100
```

### Example calculation

The following example demonstrates how a 95.83% global uptime is calculated.

1. Identify the monitoring period.

The monitor is scoped to `Jan 12, 10:56 AM - Jan 12, 4:56 PM`, a 360-minute period:

{{< img src="synthetics/guide/monitors_trigger_alerts/global_uptime.png" alt="A synthetics test run showing global uptime of 95.83%" style="width:100%;" >}}

2. Determine the time spent in alert status.

Zoom into the time range to identify when the monitor was in an alert state:

{{< img src="synthetics/guide/monitors_trigger_alerts/global_uptime_video.mp4" alt="Video of a Synthetics test run, scoping into the datetime period of the alert" video=true >}}

The alert period is `Jan 12, 3:46 PM - Jan 12, 4:01 PM`, approximately 15 minutes.

3. Apply the formula.

```text {hl_lines=[3]}
Total Period = 360 minutes
Time in Alert = 15 minutes
Global Uptime = ((360 - 15) / 360) × 100 = 95.83%
```
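
The worked example above can be reproduced with a short helper (a sketch; the function name is illustrative):

```python
def global_uptime(total_minutes, alert_minutes):
    """Percentage of the period the monitor spent in OK status."""
    return (total_minutes - alert_minutes) / total_minutes * 100

print(round(global_uptime(360, 15), 2))  # 95.83
```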

## Status descriptions

OK
: The monitor is healthy. Either all test runs are passing, or failures have not met the alerting conditions (minimum duration and location requirements).

ALERT
: The alerting conditions have been met. The test has been failing continuously for the configured minimum duration across the required number of locations.

NO DATA
: The monitor has not received any test results from any location (managed, private, or Datadog Agent) during the queried time period. Common causes include: <br></br>

- **The test is paused**: Paused tests do not execute and produce no data.
- **Advanced schedule configuration**: The queried time period falls outside the test's configured schedule windows.
- **Delay in test execution**: The test has not yet run during the selected time period. This typically occurs with overloaded private locations, which can cause intermittent timeouts, missed runs, gaps in the test schedule, or a private location that has stopped reporting.
When these symptoms are present, too many tests are assigned to the private location for it to handle. To resolve this, add workers, increase concurrency, or add compute resources. See [Dimensioning Private Locations][4] for more information.
- **Delay in data ingestion**: Test results have not yet been processed and are not available for the queried time period.

## Why alerts may behave unexpectedly

If a monitor does not alert or recovers unexpectedly, check for the following:

- Minimum duration and test frequency alignment
- Fast retry configuration
- Location scope
- Test execution results within the evaluation window
- Whether the test was paused

## Further Reading

{{< partial name="whats-next/whats-next.html" >}}

[3]: /synthetics/guide/synthetic-test-retries-monitor-status/#retries-that-overlap-with-other-test-runs
[4]: /synthetics/platform/private_locations/dimensioning
29 changes: 17 additions & 12 deletions content/en/synthetics/notifications/_index.md
Expand Up @@ -27,7 +27,7 @@ You can customize notifications using:
- **[Custom notification display](#display-custom-notifications-message)**: Show only your custom message without default enriched content.
- **[Simulate notifications](#simulate-notifications)**: Test your notification messages by sending simulated notifications.

**Note**: For information about accessing local (config) variables, see the [Variables][6] section.
**Note**: To learn how Synthetic monitors evaluate test results and trigger alerts, see the [Understanding Synthetic Monitor Alerting][6] guide.

## Pre-filled monitor messages

@@ -48,8 +48,10 @@ These values appear by default in most notification channels. You can override o
{{< tabs >}}
{{% tab "API request response" %}}

Display the HTTP request and response details from an API test, including the method, URL, headers, body, status code, and any redirects.

**Request:**
```handlebars
```shell
{{#with synthetics.attributes.result.request}}
We made a {{method}} request to `{{{url}}}`{{#if headers}} with the following headers:

@@ -67,7 +69,7 @@ We made a {{method}} request to `{{{url}}}`{{#if headers}} with the following headers:
```

**Response:**
```handlebars
```shell
{{#with synthetics.attributes.result.response}}
We received an HTTP {{httpVersion}} response with a {{statusCode}} status code{{#if headers}} with the following headers:

@@ -93,7 +95,9 @@ The body's size was {{eval "humanize_bytes(bodySize)"}}{{#if body}} and containe
{{% /tab %}}
{{% tab "WebSocket tests" %}}

```handlebars
Display WebSocket test details including the handshake status, request message, and response close status with reason.

```shell
{{! Websocket request and response details }}
{{#with synthetics.attributes.result}}
{{#if handshake }}
@@ -119,10 +123,10 @@ and the response closed with status code {{response.close.statusCode}} and reaso
{{% /tab %}}
{{% tab "API tests variables" %}}

Iterate over extracted variables for API tests:
List all config and extracted variables from an API test, showing their names, types, and values. Obfuscated values are hidden for security.

**Config variables:**
```handlebars
```shell
{{#each synthetics.attributes.result.variables.config}}
* **Name:** {{name}}
Type: {{type}}
@@ -131,7 +135,7 @@ Iterate over extracted variables for API tests:
```

**Extracted Variables (Only visible for recovery notifications):**
```handlebars
```shell
{{#each synthetics.attributes.result.variables.extracted}}
* **Name:** {{name}}
Global Variable ID: {{id}}
@@ -142,9 +146,9 @@ Iterate over extracted variables for API tests:
{{% /tab %}}
{{% tab "Multistep API variables" %}}

Iterate over steps extracting variables for multistep API tests:
Loop through all steps in a multistep API test and display variables extracted by each successful step.

```handlebars
```shell
{{! List extracted variables across all successful steps }}
# Extracted Variables
{{#each synthetics.attributes.result.steps}}
@@ -158,9 +162,9 @@ Iterate over steps extracting variables for multistep API tests:
{{% /tab %}}
{{% tab "Browser and mobile test variables" %}}

Iterate over steps extracting variables for browser and mobile tests:
Loop through all steps in a browser or mobile test and display variables extracted by steps that use the "Extract variable" action.

```handlebars
```shell
{{#each synthetics.attributes.result.steps}}
{{#if extractedValue}}
* **Name**: {{extractedValue.name}}
@@ -249,6 +253,7 @@ Simulated notifications include **[TEST]** in their subject lines and use a defa
[3]: /synthetics/notifications/conditional_alerting
[4]: /synthetics/notifications/advanced_notifications
[5]: /monitors/notifications
[6]: /synthetics/notifications/template_variables/?tab=testinfo#variables
[6]: /synthetics/guide/how-synthetics-monitors-trigger-alerts/



3 changes: 3 additions & 0 deletions content/en/synthetics/notifications/template_variables.md
@@ -9,6 +9,9 @@ further_reading:
- link: "/monitors/templates/"
tag: "Documentation"
text: "Learn more about monitor templates"
- link: "/synthetics/guide/how-synthetics-monitors-trigger-alerts/"
tag: "Guide"
text: "Understanding Synthetic Monitor Alerting"
---

## Overview
3 changes: 3 additions & 0 deletions content/en/synthetics/test_suites/_index.md
@@ -14,6 +14,9 @@ further_reading:
- link: "https://www.datadoghq.com/blog/test-suites/"
tag: "Blog"
text: "Get organized, actionable insights from complex test environments"
- link: "/synthetics/guide/how-synthetics-monitors-trigger-alerts/"
tag: "Guide"
text: "Understanding Synthetic Monitor Alerting"
---

## Overview