Skip to content

Commit

Permalink
docs: Add ref for range query
Browse files Browse the repository at this point in the history
  • Loading branch information
Taylor-lagrange committed Oct 12, 2023
1 parent 87a2749 commit 96d6386
Show file tree
Hide file tree
Showing 2 changed files with 622 additions and 0 deletions.
311 changes: 311 additions & 0 deletions docs/en/v0.4/reference/sql/range.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,311 @@
# RANGE QUERY

Querying and aggregating data within a range of time is a common query pattern for time series data, such as the `Range selector` in `PromQL`. GreptimeDB supports Range queries in SQL, which is used to summarize time series data into time chunks and aggregate data on time chunks. As part of the `SELECT` statement, Range query can be flexibly combined with SQL to provide more powerful time series data query capabilities in SQL.

## Syntax


The following `host_cpu` table records the CPU consumed by two machines `host1` and `host2` at a certain time. This table is used as an example to introduce how to perform Range query.

```sql
+---------------------+-------+------+
| ts | host | cpu |
+---------------------+-------+------+
| 1970-01-01 08:00:00 | host1 | 1.1 |
| 1970-01-01 08:00:05 | host1 | 2.2 |
| 1970-01-01 08:00:00 | host2 | 3.3 |
| 1970-01-01 08:00:05 | host2 | 4.4 |
+---------------------+-------+------+
```

Create a data table using the following SQL statement:

```sql
CREATE TABLE host_cpu (
ts timestamp(3) time index,
host STRING PRIMARY KEY,
cpu Float64,
);

INSERT INTO TABLE host_cpu VALUES
(0, 'host1', 1.1),
(5000, 'host1', 2.2),
(0, 'host2', 3.3),
(5000, 'host2', 4.4);
```

Run this Range query:

```sql
SELECT
ts,
host,
min(cpu) RANGE '10s',
max(cpu) RANGE '10s' FILL LINEAR
FROM host_cpu
ALIGN '5s' by (host) FILL PREV;
```

Get after running

```sql
+---------------------+-------+---------------------------------------+-----------------------------------------+
| ts | host | MIN(host_cpu.cpu) RANGE 10s FILL PREV | MAX(host_cpu.cpu) RANGE 10s FILL LINEAR |
+---------------------+-------+---------------------------------------+-----------------------------------------+
| 1970-01-01 08:00:00 | host2 | 3.3 | 3.3 |
| 1970-01-01 08:00:05 | host2 | 3.3 | 4.4 |
| 1970-01-01 08:00:10 | host2 | 4.4 | 4.4 |
| 1970-01-01 08:00:00 | host1 | 1.1 | 1.1 |
| 1970-01-01 08:00:05 | host1 | 1.1 | 2.2 |
| 1970-01-01 08:00:10 | host1 | 2.2 | 2.2 |
+---------------------+-------+---------------------------------------+-----------------------------------------+
```


The above RANGE query counts the minimum and maximum CPU usage of a machine within 10 seconds every 5 seconds:
1. `ALIGN '5s'` specifies the that data statistics should be performed in steps of 5s. The step is aligned to the calendar.
2. `by (host)` specifies the aggregate key, and the `by` keyword supports omission. If the `by` keyword is omitted, the primary key of the data table is used as the aggregate key by default.
3. `max(cpu) RANGE '10s' FILL LINEAR` is a Range expression. `RANGE '10s'` specifies that the time span of the aggregation is 10s, and `FILL LINEAR` specifies that if there is no data within a certain aggregation time, use the `LINEAR` method to fill it.
4. The `FILL` keyword can follow the `RANGE` keyword to indicate the filling method of this Range expression. The `FILL` keyword can also follow the `by` keyword, as if the `FILL` key is not given The default value for a word's Range expression. `min(cpu) RANGE '10s'` This Range expression does not provide `FILL`, so the default FILL filling method is used, which is `PREV`. If the default FILL is not given, `NULL` is used to fill.
5. The time duration parameters (e.g. `5s`) of the `RANGE` and `ALIGN` keywords follow the `Time Durations` type of `PromQL`, visit [Prometheus Documentation](https://prometheus.io/docs/ prometheus/latest/querying/basics/#time-durations)
Get more detailed instructions.

If you want to use Range query, there must be a column of data in the data table declared as `time index` when creating the table. Range query uses this column data as the timeline basis for aggregation. If the data table does not specify a primary key, the keyword `by` cannot be omitted. Users can also use the `by` keyword to declare other columns as the basis for data aggregation. For example, the following RANGE query uses the string length `length(host)` of the `host` column as the basis for data aggregation.

```sql
SELECT
ts,
length(host),
min(cpu) RANGE '10s'
FROM host_cpu ALIGN '5s' by (length(host));
```

Get after running

```sql
+---------------------+---------------------------------+---------------------------------------+
| ts | character_length(host_cpu.host) | MIN(host_cpu.cpu) RANGE 10s FILL NULL |
+---------------------+---------------------------------+---------------------------------------+
| 1970-01-01 08:00:00 | 5 | 1.1 |
| 1970-01-01 08:00:05 | 5 | 1.1 |
| 1970-01-01 08:00:10 | 5 | 2.2 |
+---------------------+---------------------------------+---------------------------------------+
```



## FILL OPTION

FILL has the following filling methods:

| FILL | DESCRIPTION |
| :------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| `NULL` | Fill directly with `NULL` |
| `PREV` | Fill with data from previous point |
| `LINEAR` | Use the data of the two previous points to average. If an integer type is filled with `LINEAR`, the variable type of the column will be implicitly converted to a floating point type during calculation |
| `X` | Fill in a constant, the data type of the constant must be consistent with the variable type of the Range expression |


Take the following table as an example

```sql
+---------------------+-------+------+
| ts | host | cpu |
+---------------------+-------+------+
| 1970-01-01 08:00:00 | host1 | 4.5 |
| 1970-01-01 08:00:05 | host1 | NULL |
| 1970-01-01 08:00:10 | host1 | 6.5 |
+---------------------+-------+------+
```

Use `NULL` for FILL

```sql
SELECT ts, min(cpu) RANGE '5s' FILL NULL FROM host_cpu ALIGN '5s';
```

Get after running

```sql
+---------------------+--------------------------------------+
| ts | MIN(host_cpu.cpu) RANGE 5s FILL NULL |
+---------------------+--------------------------------------+
| 1970-01-01 08:00:00 | 4.5 |
| 1970-01-01 08:00:05 | NULL |
| 1970-01-01 08:00:10 | 6.5 |
+---------------------+--------------------------------------+
```

Use `PREV` for FILL

```sql
SELECT ts, min(cpu) RANGE '5s' FILL PREV FROM host_cpu ALIGN '5s';
```

Get after running

```sql
+---------------------+--------------------------------------+
| ts | MIN(host_cpu.cpu) RANGE 5s FILL PREV |
+---------------------+--------------------------------------+
| 1970-01-01 08:00:00 | 4.5 |
| 1970-01-01 08:00:05 | 4.5 |
| 1970-01-01 08:00:10 | 6.5 |
+---------------------+--------------------------------------+
```

FILL using `LINEAR`

```sql
SELECT ts, min(cpu) RANGE '5s' FILL LINEAR FROM host_cpu ALIGN '5s';
```

Get after running

```sql
+---------------------+----------------------------------------+
| ts | MIN(host_cpu.cpu) RANGE 5s FILL LINEAR |
+---------------------+----------------------------------------+
| 1970-01-01 08:00:00 | 4.5 |
| 1970-01-01 08:00:05 | 5.5 |
| 1970-01-01 08:00:10 | 6.5 |
+---------------------+----------------------------------------+
```

Use constant value `6.0` for FILL

```sql
SELECT ts, min(cpu) RANGE '5s' FILL 6.0 FROM host_cpu ALIGN '5s';
```

Get after running

```sql
+---------------------+-----------------------------------+
| ts | MIN(host_cpu.cpu) RANGE 5s FILL 6 |
+---------------------+-----------------------------------+
| 1970-01-01 08:00:00 | 4.5 |
| 1970-01-01 08:00:05 | 6 |
| 1970-01-01 08:00:10 | 6.5 |
+---------------------+-----------------------------------+
```

## Nested Range Expressions

Range expressions support flexible nesting, and range expressions can be combined with various operations to provide more powerful query capabilities.

Take the following table as an example:

```sql
+---------------------+-------+------+
| ts | host | cpu |
+---------------------+-------+------+
| 1970-01-01 08:00:00 | host1 | 1.1 |
| 1970-01-01 08:00:05 | host1 | 2.2 |
| 1970-01-01 08:00:00 | host2 | 3.3 |
| 1970-01-01 08:00:05 | host2 | 4.4 |
+---------------------+-------+------+
```

1. Aggregation functions support calculations both internally and externally:

```sql
SELECT ts, host, 2.0 * min(cpu * 2.0) RANGE '10s' FROM host_cpu ALIGN '5s';
```

Get after running

```sql
+---------------------+-------+-----------------------------------------------------------------+
| ts | host | Float64(2) * MIN(host_cpu.cpu * Float64(2)) RANGE 10s FILL NULL |
+---------------------+-------+-----------------------------------------------------------------+
| 1970-01-01 08:00:00 | host2 | 13.2 |
| 1970-01-01 08:00:10 | host2 | 17.6 |
| 1970-01-01 08:00:05 | host2 | 13.2 |
| 1970-01-01 08:00:10 | host1 | 8.8 |
| 1970-01-01 08:00:05 | host1 | 4.4 |
| 1970-01-01 08:00:00 | host1 | 4.4 |
+---------------------+-------+-----------------------------------------------------------------+
```


2. Scalar functions are supported both inside and outside aggregate functions:
- `min(round(cpu)) RANGE '10s'` means that each value is rounded using the `round` function before aggregation
- `round(min(cpu) RANGE '10s')` means rounding the result of each aggregation using the `round` function

```sql
SELECT ts, host, min(round(cpu)) RANGE '10s' FROM host_cpu ALIGN '5s';
```
Get after running

```sql
+---------------------+-------+----------------------------------------------+
| ts | host | MIN(round(host_cpu.cpu)) RANGE 10s FILL NULL |
+---------------------+-------+----------------------------------------------+
| 1970-01-01 08:00:05 | host1 | 1 |
| 1970-01-01 08:00:00 | host1 | 1 |
| 1970-01-01 08:00:10 | host1 | 2 |
| 1970-01-01 08:00:00 | host2 | 3 |
| 1970-01-01 08:00:10 | host2 | 4 |
| 1970-01-01 08:00:05 | host2 | 3 |
+---------------------+-------+----------------------------------------------+
```


```sql
SELECT ts, host, round(min(cpu) RANGE '10s') FROM host_cpu ALIGN '5s';
```

Get after running

```sql
+---------------------+-------+----------------------------------------------+
| ts | host | round(MIN(host_cpu.cpu) RANGE 10s FILL NULL) |
+---------------------+-------+----------------------------------------------+
| 1970-01-01 08:00:00 | host2 | 3 |
| 1970-01-01 08:00:10 | host2 | 4 |
| 1970-01-01 08:00:05 | host2 | 3 |
| 1970-01-01 08:00:05 | host1 | 1 |
| 1970-01-01 08:00:00 | host1 | 1 |
| 1970-01-01 08:00:10 | host1 | 2 |
+---------------------+-------+----------------------------------------------+
```

3. Multiple Range expressions can also evaluate together, and Range expressions support the distributive law. The following two Range query are legal and equivalent:

```sql
SELECT ts, host, max(cpu) RANGE '10s' - min(cpu) RANGE '10s' FROM host_cpu ALIGN '5s';

SELECT ts, host, (max(cpu) - min(cpu)) RANGE '10s' FROM host_cpu ALIGN '5s';
```

Get after running

```sql
+---------------------+-------+-------------------------------------------------------------------------------+
| ts | host | MAX(host_cpu.cpu) RANGE 10s FILL NULL - MIN(host_cpu.cpu) RANGE 10s FILL NULL |
+---------------------+-------+-------------------------------------------------------------------------------+
| 1970-01-01 08:00:10 | host1 | 0 |
| 1970-01-01 08:00:05 | host1 | 1.1 |
| 1970-01-01 08:00:00 | host1 | 0 |
| 1970-01-01 08:00:05 | host2 | 1.1 |
| 1970-01-01 08:00:10 | host2 | 0 |
| 1970-01-01 08:00:00 | host2 | 0 |
+---------------------+-------+-------------------------------------------------------------------------------+
```

But note that the `RANGE` keyword apply to the expression before the `RANGE` keyword. The following Range query is illegal because the `RANGE` keyword apply to the expression `2`, not the expression `min(cpu * 2) * 2`

```sql
SELECT ts, host, min(cpu * 2) * 2 RANGE '10s' FROM host_cpu ALIGN '5s';

ERROR 1815 (HY000): sql parser error: Can't use the RANGE keyword in Expr 2 without function
```
Nesting of Range query is not allowed, and nested Range queries are illegal:
```sql
SELECT ts, host, max(min(cpu) RANGE '10s') RANGE '10s' FROM host_cpu ALIGN '5s';
ERROR 1815 (HY000): Range Query: Nest Range Query is not allowed
```

0 comments on commit 96d6386

Please sign in to comment.