GreptimeTeam · evenyag · Mar 5, 2026 · Feb 25, 2026 · Feb 25, 2026 · Feb 25, 2026
@@ -0,0 +1,235 @@
+---
+keywords: [anomaly detection, anomaly score, window functions, zscore, MAD, IQR, statistical functions]
+description: Lists and describes the anomaly detection window functions available in GreptimeDB, including Z-Score, MAD, and IQR-based scoring.
+---
+
+# Anomaly Detection Functions
+
+GreptimeDB provides a set of statistical anomaly-scoring **window functions** that compute a numeric score reflecting how anomalous each row is relative to its window of values.
+All three functions must be used with an `OVER` clause (window function syntax).
+
+:::tip
+These functions return `NULL` when the window does not contain enough valid (non-NULL) data points.
+A score of `0.0` means the value is not anomalous; a larger value indicates a stronger anomaly.
+When the spread (stddev / MAD / IQR) is zero but the current value deviates from the window center, the returned score is `+inf`, meaning the deviation is infinitely anomalous.
+:::
+
+## `anomaly_score_zscore`
+
+Computes a Z-Score-based anomaly score for each row in a window.
+
+**Formula:** `|x − mean| / stddev`
+
+**Minimum valid samples:** 2 (uses population stddev, i.e. dividing by n)
+
+```sql
+anomaly_score_zscore(value) OVER (window_spec)
+```
+
+**Arguments:**
+
+- **value**: A numeric column or expression to evaluate.
+
+**Return type:** `DOUBLE`
+
+**Degenerate cases:**
+
+| Condition | Result |
+|---|---|
+| Fewer than 2 valid points in window | `NULL` |
+| `stddev = 0` and `value = mean` | `0.0` |
+| `stddev = 0` and `value ≠ mean` | `+inf` |
+| Normal case | Finite positive `DOUBLE` |
+
+**Example:**
+
+```sql
+SELECT
+    ts,
+    val,
+    anomaly_score_zscore(val) OVER (
+        ORDER BY ts
+        ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
+    ) AS zscore
+FROM metrics
+ORDER BY ts;
+```
+
+## `anomaly_score_mad`
+
+Computes a Median Absolute Deviation (MAD)-based anomaly score for each row in a window.
+MAD is more robust than Z-Score because it is not influenced by extreme outliers.
+
+**Formula:** `|x − median| / (MAD × 1.4826)`
+
+The constant 1.4826 is a consistency factor that makes the MAD-based score asymptotically equivalent to the Z-Score for normally distributed data.
+
+**Minimum valid samples:** 3 (with ≤ 2 samples, MAD is almost always 0, which yields spurious `+inf` scores)
+
+```sql
+anomaly_score_mad(value) OVER (window_spec)
+```
+
+**Arguments:**
+
+- **value**: A numeric column or expression to evaluate.
+
+**Return type:** `DOUBLE`
+
+**Degenerate cases:**
+
+| Condition | Result |
+|---|---|
+| Fewer than 3 valid points in window | `NULL` |
+| `MAD = 0` and `value = median` | `0.0` |
+| `MAD = 0` and `value ≠ median` | `+inf` |
+| Normal case | Finite positive `DOUBLE` |
+
+**Example:**
+
+```sql
+SELECT
+    ts,
+    val,
+    anomaly_score_mad(val) OVER (
+        PARTITION BY host
+        ORDER BY ts
+        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+    ) AS mad_score
+FROM metrics
+ORDER BY host, ts;
+```
+
+## `anomaly_score_iqr`
+
+Computes an IQR (Interquartile Range / Tukey Fences)-based anomaly score for each row in a window.
+The score measures the distance of the value beyond the lower fence (`Q1 − k × IQR`) or upper fence (`Q3 + k × IQR`).
+Values within the fences receive a score of `0.0`.
+
+**Formula:**
+- If `value < Q1 − k × IQR`: score = `(Q1 − k × IQR − value) / IQR`
+- If `value > Q3 + k × IQR`: score = `(value − Q3 − k × IQR) / IQR`
+- Otherwise: score = `0.0`
+
+**Minimum valid samples:** 3 (linear-interpolated Q1 ≠ Q3 is only possible at n ≥ 3)
+
+```sql
+anomaly_score_iqr(value, k) OVER (window_spec)
+```
+
+**Arguments:**
+
+- **value**: A numeric column or expression to evaluate.
+- **k**: A non-negative `DOUBLE` multiplier for the IQR fences (e.g., `1.5` for standard Tukey fences, `3.0` for far-out fences). Returns `NULL` if `k < 0`.
+
+**Return type:** `DOUBLE`
+
+**Degenerate cases:**
+
+| Condition | Result |
+|---|---|
+| Fewer than 3 valid points in window | `NULL` |
+| `IQR = 0` and value is within fences | `0.0` |
+| `IQR = 0` and value is outside fences | `+inf` |
+| Normal case | Finite non-negative `DOUBLE` |
+
+**Example:**
+
+```sql
+SELECT
+    ts,
+    val,
+    anomaly_score_iqr(val, 1.5) OVER (
+        PARTITION BY host
+        ORDER BY ts
+        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+    ) AS iqr_score
+FROM metrics
+ORDER BY host, ts;
+```
+
+## Full Usage Example
+
+This example creates a sample table, inserts time-series data with an injected outlier, and then uses all three anomaly functions together.
+
+```sql
+CREATE TABLE sensor_data (
+    host   STRING,
+    val    DOUBLE,
+    ts     TIMESTAMP TIME INDEX,
+    PRIMARY KEY (host)
+);
+
+INSERT INTO sensor_data VALUES
+    ('web-1', 10.0, '2025-01-01 00:00:00'),
+    ('web-1', 11.0, '2025-01-01 00:01:00'),
+    ('web-1', 10.5, '2025-01-01 00:02:00'),
+    ('web-1', 10.8, '2025-01-01 00:03:00'),
+    ('web-1', 80.0, '2025-01-01 00:04:00'),  -- outlier
+    ('web-1', 10.3, '2025-01-01 00:05:00'),
+    ('web-1', 11.2, '2025-01-01 00:06:00');
+```
+
+Use a shared named window and round results for readability:
+
+```sql
+SELECT
+    ts,
+    val,
+    ROUND(anomaly_score_zscore(val) OVER w, 2) AS zscore,
+    ROUND(anomaly_score_mad(val)    OVER w, 2) AS mad,
+    ROUND(anomaly_score_iqr(val, 1.5) OVER w, 2) AS iqr
+FROM sensor_data
+WINDOW w AS (
+    PARTITION BY host
+    ORDER BY ts
+    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+)
+ORDER BY ts;
+```
+
+Expected output — the outlier row (`val = 80.0`) produces significantly higher scores across all three metrics:
+
+```
++---------------------+------+--------+--------+-------+
+| ts                  | val  | zscore | mad    | iqr   |
++---------------------+------+--------+--------+-------+
+| 2025-01-01 00:00:00 |   10 |   NULL |   NULL |  NULL |
+| 2025-01-01 00:01:00 |   11 |      1 |   NULL |  NULL |
+| 2025-01-01 00:02:00 | 10.5 |      0 |      0 |     0 |
+| 2025-01-01 00:03:00 | 10.8 |    0.6 |    0.4 |     0 |
+| 2025-01-01 00:04:00 |   80 |      2 | 155.58 | 136.5 |
+| 2025-01-01 00:05:00 | 10.3 |   0.46 |   0.67 |     0 |
+| 2025-01-01 00:06:00 | 11.2 |   0.38 |   0.67 |     0 |
++---------------------+------+--------+--------+-------+
+```
+
+### Filter Anomalous Rows
+
+You can wrap the window query in a subquery to keep only rows whose score exceeds a threshold:
+
+```sql
+SELECT * FROM (
+    SELECT
+        host,
+        ts,
+        val,
+        ROUND(anomaly_score_mad(val) OVER (
+            PARTITION BY host
+            ORDER BY ts
+            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+        ), 2) AS mad
+    FROM sensor_data
+) WHERE mad > 3.0
+ORDER BY host, ts;
+```
+
+Expected output:
+
+```
++-------+---------------------+------+--------+
+| host  | ts                  | val  | mad    |
++-------+---------------------+------+--------+
+| web-1 | 2025-01-01 00:04:00 |   80 | 155.58 |
++-------+---------------------+------+--------+
+```
@@ -17,3 +17,4 @@ Use this page as a quick index to GreptimeDB function references.
   - [JSON Functions](./json.md)
   - [Vector Functions](./vector.md)
   - [Approximate Functions](./approximate.md)
+  - [Anomaly Detection Functions](./anomaly.md)