Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
235 changes: 235 additions & 0 deletions docs/reference/sql/functions/anomaly.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,235 @@
---
keywords: [anomaly detection, anomaly score, window functions, zscore, MAD, IQR, statistical functions]
description: Lists and describes the anomaly detection window functions available in GreptimeDB, including Z-Score, MAD, and IQR-based scoring.
---

# Anomaly Detection Functions

GreptimeDB provides a set of statistical anomaly-scoring **window functions** that compute a numeric score reflecting how anomalous each row is relative to its window of values.
All three functions must be used with an `OVER` clause (window function syntax).

:::tip
These functions return `NULL` when the window does not contain enough valid (non-NULL) data points.
A score of `0.0` means the value is not anomalous; a larger value indicates a stronger anomaly.
When the spread (stddev / MAD / IQR) is zero but the current value deviates from the window center, the returned score is `+inf`, meaning the deviation is infinitely anomalous.
:::

## `anomaly_score_zscore`

Computes a Z-Score-based anomaly score for each row in a window.

**Formula:** `|x − mean| / stddev`

**Minimum valid samples:** 2 (uses population stddev, i.e. dividing by n)

```sql
anomaly_score_zscore(value) OVER (window_spec)
```

**Arguments:**

- **value**: A numeric column or expression to evaluate.

**Return type:** `DOUBLE`

**Degenerate cases:**

| Condition | Result |
|---|---|
| Fewer than 2 valid points in window | `NULL` |
| `stddev = 0` and `value = mean` | `0.0` |
| `stddev = 0` and `value ≠ mean` | `+inf` |
| Normal case | Finite positive `DOUBLE` |

**Example:**

```sql
SELECT
ts,
val,
anomaly_score_zscore(val) OVER (
ORDER BY ts
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
) AS zscore
FROM metrics
ORDER BY ts;
```

## `anomaly_score_mad`

Computes a Median Absolute Deviation (MAD)-based anomaly score for each row in a window.
MAD is more robust than Z-Score because it is not influenced by extreme outliers.

**Formula:** `|x − median| / (MAD × 1.4826)`

The constant 1.4826 is a consistency factor that makes the MAD-based score asymptotically equivalent to the Z-Score for normally distributed data.

**Minimum valid samples:** 3 (with ≤ 2 samples, MAD is almost always 0, which yields spurious `+inf` scores)

```sql
anomaly_score_mad(value) OVER (window_spec)
```

**Arguments:**

- **value**: A numeric column or expression to evaluate.

**Return type:** `DOUBLE`

**Degenerate cases:**

| Condition | Result |
|---|---|
| Fewer than 3 valid points in window | `NULL` |
| `MAD = 0` and `value = median` | `0.0` |
| `MAD = 0` and `value ≠ median` | `+inf` |
| Normal case | Finite positive `DOUBLE` |

**Example:**

```sql
SELECT
ts,
val,
anomaly_score_mad(val) OVER (
PARTITION BY host
ORDER BY ts
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS mad_score
FROM metrics
ORDER BY host, ts;
```

## `anomaly_score_iqr`

Computes an IQR (Interquartile Range / Tukey Fences)-based anomaly score for each row in a window.
The score measures the distance of the value beyond the lower fence (`Q1 − k × IQR`) or upper fence (`Q3 + k × IQR`).
Values within the fences receive a score of `0.0`.

**Formula:**
- If `value < Q1 − k × IQR`: score = `(Q1 − k × IQR − value) / IQR`
- If `value > Q3 + k × IQR`: score = `(value − Q3 − k × IQR) / IQR`
- Otherwise: score = `0.0`

**Minimum valid samples:** 3 (linear-interpolated Q1 ≠ Q3 is only possible at n ≥ 3)

```sql
anomaly_score_iqr(value, k) OVER (window_spec)
```

**Arguments:**

- **value**: A numeric column or expression to evaluate.
- **k**: A non-negative `DOUBLE` multiplier for the IQR fences (e.g., `1.5` for standard Tukey fences, `3.0` for far-out fences). Returns `NULL` if `k < 0`.

**Return type:** `DOUBLE`

**Degenerate cases:**

| Condition | Result |
|---|---|
| Fewer than 3 valid points in window | `NULL` |
| `IQR = 0` and value is within fences | `0.0` |
| `IQR = 0` and value is outside fences | `+inf` |
| Normal case | Finite non-negative `DOUBLE` |

**Example:**

```sql
SELECT
ts,
val,
anomaly_score_iqr(val, 1.5) OVER (
PARTITION BY host
ORDER BY ts
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS iqr_score
FROM metrics
ORDER BY host, ts;
```

## Full Usage Example

This example creates a sample table, inserts time-series data with an injected outlier, and then uses all three anomaly functions together.

```sql
CREATE TABLE sensor_data (
host STRING,
val DOUBLE,
ts TIMESTAMP TIME INDEX,
PRIMARY KEY (host)
);

INSERT INTO sensor_data VALUES
('web-1', 10.0, '2025-01-01 00:00:00'),
('web-1', 11.0, '2025-01-01 00:01:00'),
('web-1', 10.5, '2025-01-01 00:02:00'),
('web-1', 10.8, '2025-01-01 00:03:00'),
('web-1', 80.0, '2025-01-01 00:04:00'), -- outlier
('web-1', 10.3, '2025-01-01 00:05:00'),
('web-1', 11.2, '2025-01-01 00:06:00');
```

Use a shared named window and round results for readability:

```sql
SELECT
ts,
val,
ROUND(anomaly_score_zscore(val) OVER w, 2) AS zscore,
ROUND(anomaly_score_mad(val) OVER w, 2) AS mad,
ROUND(anomaly_score_iqr(val, 1.5) OVER w, 2) AS iqr
FROM sensor_data
WINDOW w AS (
PARTITION BY host
ORDER BY ts
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
ORDER BY ts;
```

Expected output — the outlier row (`val = 80.0`) produces significantly higher scores across all three metrics:

```
+---------------------+------+--------+--------+-------+
| ts | val | zscore | mad | iqr |
+---------------------+------+--------+--------+-------+
| 2025-01-01 00:00:00 | 10 | NULL | NULL | NULL |
| 2025-01-01 00:01:00 | 11 | 1 | NULL | NULL |
| 2025-01-01 00:02:00 | 10.5 | 0 | 0 | 0 |
| 2025-01-01 00:03:00 | 10.8 | 0.6 | 0.4 | 0 |
| 2025-01-01 00:04:00 | 80 | 2 | 155.58 | 136.5 |
| 2025-01-01 00:05:00 | 10.3 | 0.46 | 0.67 | 0 |
| 2025-01-01 00:06:00 | 11.2 | 0.38 | 0.67 | 0 |
+---------------------+------+--------+--------+-------+
```

### Filter Anomalous Rows

You can wrap the window query in a subquery to keep only rows whose score exceeds a threshold:

```sql
SELECT * FROM (
SELECT
host,
ts,
val,
ROUND(anomaly_score_mad(val) OVER (
PARTITION BY host
ORDER BY ts
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
), 2) AS mad
FROM sensor_data
) WHERE mad > 3.0
ORDER BY host, ts;
```

Expected output:

```
+-------+---------------------+------+--------+
| host | ts | val | mad |
+-------+---------------------+------+--------+
| web-1 | 2025-01-01 00:04:00 | 80 | 155.58 |
+-------+---------------------+------+--------+
```
1 change: 1 addition & 0 deletions docs/reference/sql/functions/overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,3 +17,4 @@ Use this page as a quick index to GreptimeDB function references.
- [JSON Functions](./json.md)
- [Vector Functions](./vector.md)
- [Approximate Functions](./approximate.md)
- [Anomaly Detection Functions](./anomaly.md)
Loading