Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Add ref for range query #618

Merged
merged 3 commits into from
Oct 18, 2023
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
332 changes: 332 additions & 0 deletions docs/en/v0.4/reference/sql/range.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,332 @@
# RANGE QUERY

Querying and aggregating data within a range of time is a common query pattern for time series data, such as the `Range selector` in `PromQL`. GreptimeDB supports Range queries in SQL, which is used to summarize time series data into time chunks and aggregate data on time chunks. As part of the `SELECT` statement, Range query can be flexibly combined with SQL to provide more powerful time series data query capabilities in SQL.

## Syntax

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The syntax should be something like (just an example):

SELECT
  RANGE_EXPRESSION(column1) RANGE TIME_INTERVAL,
  ....
FROM table_name
TIME_DURATION BY(column2) FILLING_METHOD

Then explain every concepts in the syntax:

  • RANGE_EXPRESSION
  • RANGE keyword
  • TIME_INTERVAL
    .....


The following `host_cpu` table records the CPU consumed by two machines `host1` and `host2` at a certain time. This table is used as an example to introduce how to perform Range query.

```sql
+---------------------+-------+------+
| ts | host | cpu |
+---------------------+-------+------+
| 1970-01-01 08:00:00 | host1 | 1.1 |
| 1970-01-01 08:00:05 | host1 | 2.2 |
| 1970-01-01 08:00:00 | host2 | 3.3 |
| 1970-01-01 08:00:05 | host2 | 4.4 |
+---------------------+-------+------+
```

Create a data table using the following SQL statement:

```sql
CREATE TABLE host_cpu (
ts timestamp(3) time index,
host STRING PRIMARY KEY,
cpu Float64,
);

INSERT INTO TABLE host_cpu VALUES
(0, 'host1', 1.1),
(5000, 'host1', 2.2),
(0, 'host2', 3.3),
(5000, 'host2', 4.4);
```

Run this Range query:

```sql
SELECT
ts,
host,
min(cpu) RANGE '10s',
max(cpu) RANGE '10s' FILL LINEAR
FROM host_cpu
ALIGN '5s' BY (host) FILL PREV;
```

Get after running

```sql
+---------------------+-------+---------------------------------------+-----------------------------------------+
| ts | host | MIN(host_cpu.cpu) RANGE 10s FILL PREV | MAX(host_cpu.cpu) RANGE 10s FILL LINEAR |
+---------------------+-------+---------------------------------------+-----------------------------------------+
| 1970-01-01 08:00:00 | host2 | 3.3 | 3.3 |
| 1970-01-01 08:00:05 | host2 | 3.3 | 4.4 |
| 1970-01-01 08:00:10 | host2 | 4.4 | 4.4 |
| 1970-01-01 08:00:00 | host1 | 1.1 | 1.1 |
| 1970-01-01 08:00:05 | host1 | 1.1 | 2.2 |
| 1970-01-01 08:00:10 | host1 | 2.2 | 2.2 |
+---------------------+-------+---------------------------------------+-----------------------------------------+
```


The above RANGE query counts the minimum and maximum CPU usage of a machine within 10 seconds every 5 seconds:
1. `ALIGN '5s'` specifies the that data statistics should be performed in steps of 5s. The step is aligned to the calendar.
2. `BY (host)` specifies the aggregate key, and the `BY` keyword supports omission. If the `BY` keyword is omitted, the primary key of the data table is used as the aggregate key by default.
3. `max(cpu) RANGE '10s' FILL LINEAR` is a Range expression. `RANGE '10s'` specifies that the time span of the aggregation is 10s, and `FILL LINEAR` specifies that if there is no data within a certain aggregation time, use the `LINEAR` method to fill it.
4. The `FILL` keyword can follow the `RANGE` keyword to indicate the filling method of this Range expression. The `FILL` keyword can also follow the `BY` keyword, as if the `FILL` key is not given The default value for a word's Range expression. `min(cpu) RANGE '10s'` This Range expression does not provide `FILL`, so the default FILL filling method is used, which is `PREV`. If the default FILL is not given, `NULL` is used to fill.
5. The time duration parameters (e.g. `5s`) of the `RANGE` and `ALIGN` keywords follow the `Time Durations` type of `PromQL`, visit [Prometheus Documentation](https://prometheus.io/docs/ prometheus/latest/querying/basics/#time-durations)
Get more detailed instructions.

If you want to use Range query, there must be a column of data in the data table declared as `time index` when creating the table. Range query uses this column data as the timeline basis for aggregation. If the data table does not specify a primary key, the keyword `BY` cannot be omitted. Users can also use the `BY` keyword to declare other columns as the basis for data aggregation. For example, the following RANGE query uses the string length `length(host)` of the `host` column as the basis for data aggregation.

```sql
SELECT
ts,
length(host),
min(cpu) RANGE '10s'
FROM host_cpu ALIGN '5s' BY (length(host));
```

Get after running

```sql
+---------------------+---------------------------------+---------------------------------------+
| ts | character_length(host_cpu.host) | MIN(host_cpu.cpu) RANGE 10s FILL NULL |
+---------------------+---------------------------------+---------------------------------------+
| 1970-01-01 08:00:00 | 5 | 1.1 |
| 1970-01-01 08:00:05 | 5 | 1.1 |
| 1970-01-01 08:00:10 | 5 | 2.2 |
+---------------------+---------------------------------+---------------------------------------+
```



## FILL OPTION

FILL has the following filling methods:

| FILL | DESCRIPTION |
| :------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| `NULL` | Fill directly with `NULL` |
| `PREV` | Fill with data from previous point |
| `LINEAR` | Use the data of the two previous points to average. If an integer type is filled with `LINEAR`, the variable type of the column will be implicitly converted to a floating point type during calculation |
| `X` | Fill in a constant, the data type of the constant must be consistent with the variable type of the Range expression |


Take the following table as an example

```sql
+---------------------+-------+------+
| ts | host | cpu |
+---------------------+-------+------+
| 1970-01-01 08:00:00 | host1 | 4.5 |
| 1970-01-01 08:00:05 | host1 | NULL |
| 1970-01-01 08:00:10 | host1 | 6.5 |
+---------------------+-------+------+
```

Use `NULL` for FILL

```sql
SELECT ts, min(cpu) RANGE '5s' FILL NULL FROM host_cpu ALIGN '5s';
```

Get after running

```sql
+---------------------+--------------------------------------+
| ts | MIN(host_cpu.cpu) RANGE 5s FILL NULL |
+---------------------+--------------------------------------+
| 1970-01-01 08:00:00 | 4.5 |
| 1970-01-01 08:00:05 | NULL |
| 1970-01-01 08:00:10 | 6.5 |
+---------------------+--------------------------------------+
```

Use `PREV` for FILL

```sql
SELECT ts, min(cpu) RANGE '5s' FILL PREV FROM host_cpu ALIGN '5s';
```

Get after running

```sql
+---------------------+--------------------------------------+
| ts | MIN(host_cpu.cpu) RANGE 5s FILL PREV |
+---------------------+--------------------------------------+
| 1970-01-01 08:00:00 | 4.5 |
| 1970-01-01 08:00:05 | 4.5 |
| 1970-01-01 08:00:10 | 6.5 |
+---------------------+--------------------------------------+
```

FILL using `LINEAR`

```sql
SELECT ts, min(cpu) RANGE '5s' FILL LINEAR FROM host_cpu ALIGN '5s';
```

Get after running

```sql
+---------------------+----------------------------------------+
| ts | MIN(host_cpu.cpu) RANGE 5s FILL LINEAR |
+---------------------+----------------------------------------+
| 1970-01-01 08:00:00 | 4.5 |
| 1970-01-01 08:00:05 | 5.5 |
| 1970-01-01 08:00:10 | 6.5 |
+---------------------+----------------------------------------+
```

Use constant value `6.0` for FILL

```sql
SELECT ts, min(cpu) RANGE '5s' FILL 6.0 FROM host_cpu ALIGN '5s';
```

Get after running

```sql
+---------------------+-----------------------------------+
| ts | MIN(host_cpu.cpu) RANGE 5s FILL 6 |
+---------------------+-----------------------------------+
| 1970-01-01 08:00:00 | 4.5 |
| 1970-01-01 08:00:05 | 6 |
| 1970-01-01 08:00:10 | 6.5 |
+---------------------+-----------------------------------+
```

## Nested Range Expressions

Range expressions support flexible nesting, and range expressions can be combined with various operations to provide more powerful query capabilities.

Take the following table as an example:

```sql
+---------------------+-------+------+
| ts | host | cpu |
+---------------------+-------+------+
| 1970-01-01 08:00:00 | host1 | 1.1 |
| 1970-01-01 08:00:05 | host1 | 2.2 |
| 1970-01-01 08:00:00 | host2 | 3.3 |
| 1970-01-01 08:00:05 | host2 | 4.4 |
+---------------------+-------+------+
```

1. Aggregation functions support calculations both internally and externally:

```sql
SELECT ts, host, 2.0 * min(cpu * 2.0) RANGE '10s' FROM host_cpu ALIGN '5s';
```

Get after running

```sql
+---------------------+-------+-----------------------------------------------------------------+
| ts | host | Float64(2) * MIN(host_cpu.cpu * Float64(2)) RANGE 10s FILL NULL |
+---------------------+-------+-----------------------------------------------------------------+
| 1970-01-01 08:00:00 | host2 | 13.2 |
| 1970-01-01 08:00:10 | host2 | 17.6 |
| 1970-01-01 08:00:05 | host2 | 13.2 |
| 1970-01-01 08:00:10 | host1 | 8.8 |
| 1970-01-01 08:00:05 | host1 | 4.4 |
| 1970-01-01 08:00:00 | host1 | 4.4 |
+---------------------+-------+-----------------------------------------------------------------+
```


2. Scalar functions are supported both inside and outside aggregate functions:
- `min(round(cpu)) RANGE '10s'` means that each value is rounded using the `round` function before aggregation
- `round(min(cpu) RANGE '10s')` means rounding the result of each aggregation using the `round` function

```sql
SELECT ts, host, min(round(cpu)) RANGE '10s' FROM host_cpu ALIGN '5s';
```
Get after running

```sql
+---------------------+-------+----------------------------------------------+
| ts | host | MIN(round(host_cpu.cpu)) RANGE 10s FILL NULL |
+---------------------+-------+----------------------------------------------+
| 1970-01-01 08:00:05 | host1 | 1 |
| 1970-01-01 08:00:00 | host1 | 1 |
| 1970-01-01 08:00:10 | host1 | 2 |
| 1970-01-01 08:00:00 | host2 | 3 |
| 1970-01-01 08:00:10 | host2 | 4 |
| 1970-01-01 08:00:05 | host2 | 3 |
+---------------------+-------+----------------------------------------------+
```


```sql
SELECT ts, host, round(min(cpu) RANGE '10s') FROM host_cpu ALIGN '5s';
```

Get after running

```sql
+---------------------+-------+----------------------------------------------+
| ts | host | round(MIN(host_cpu.cpu) RANGE 10s FILL NULL) |
+---------------------+-------+----------------------------------------------+
| 1970-01-01 08:00:00 | host2 | 3 |
| 1970-01-01 08:00:10 | host2 | 4 |
| 1970-01-01 08:00:05 | host2 | 3 |
| 1970-01-01 08:00:05 | host1 | 1 |
| 1970-01-01 08:00:00 | host1 | 1 |
| 1970-01-01 08:00:10 | host1 | 2 |
+---------------------+-------+----------------------------------------------+
```

3. Multiple Range expressions can also evaluate together, and Range expressions support the distributive law. The following two Range query are legal and equivalent:

```sql
SELECT ts, host, max(cpu) RANGE '10s' - min(cpu) RANGE '10s' FROM host_cpu ALIGN '5s';

SELECT ts, host, (max(cpu) - min(cpu)) RANGE '10s' FROM host_cpu ALIGN '5s';
```

Get after running

```sql
+---------------------+-------+-------------------------------------------------------------------------------+
| ts | host | MAX(host_cpu.cpu) RANGE 10s FILL NULL - MIN(host_cpu.cpu) RANGE 10s FILL NULL |
+---------------------+-------+-------------------------------------------------------------------------------+
| 1970-01-01 08:00:10 | host1 | 0 |
| 1970-01-01 08:00:05 | host1 | 1.1 |
| 1970-01-01 08:00:00 | host1 | 0 |
| 1970-01-01 08:00:05 | host2 | 1.1 |
| 1970-01-01 08:00:10 | host2 | 0 |
| 1970-01-01 08:00:00 | host2 | 0 |
+---------------------+-------+-------------------------------------------------------------------------------+
```

But note that the `RANGE` keyword apply to the expression before the `RANGE` keyword. The following Range query is illegal because the `RANGE` keyword apply to the expression `2.0`, not the expression `min(cpu * 2.0) * 2.0`

```sql
SELECT ts, host, min(cpu * 2.0) * 2.0 RANGE '10s' FROM host_cpu ALIGN '5s';

ERROR 1815 (HY000): sql parser error: Can't use the RANGE keyword in Expr 2.0 without function
```

Expressions can be bracketed, and the `RANGE` keyword is automatically applied to any aggregate functions contained within the brackets:

```sql
SELECT ts, host, (min(cpu * 2.0) * 2.0) RANGE '10s' FROM host_cpu ALIGN '5s';
```

After running, we get:

```sql
+---------------------+-------+-----------------------------------------------------------------+
| ts | host | MIN(host_cpu.cpu * Float64(2)) RANGE 10s FILL NULL * Float64(2) |
+---------------------+-------+-----------------------------------------------------------------+
| 1970-01-01 08:00:00 | host2 | 13.2 |
| 1970-01-01 08:00:10 | host2 | 17.6 |
| 1970-01-01 08:00:05 | host2 | 13.2 |
| 1970-01-01 08:00:00 | host1 | 4.4 |
| 1970-01-01 08:00:10 | host1 | 8.8 |
| 1970-01-01 08:00:05 | host1 | 4.4 |
+---------------------+-------+-----------------------------------------------------------------+
```

Nesting of Range expressions is not allowed. Nested Range queries are illegal:

```sql
SELECT ts, host, max(min(cpu) RANGE '10s') RANGE '10s' FROM host_cpu ALIGN '5s';

ERROR 1815 (HY000): Range Query: Nest Range Query is not allowed
```
1 change: 1 addition & 0 deletions docs/en/v0.4/summary.yml
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,7 @@
- limit
- join
- functions
- range
- delete
- show
- tql
Expand Down