DOCAPI-7415: Docs for the -Resample aggregate function combinator. #5972

Merged: 3 commits, merged Aug 15, 2019.
66 changes: 66 additions & 0 deletions docs/en/query_language/agg_functions/combinators.md
@@ -44,5 +44,71 @@ Merges the intermediate aggregation states in the same way as the -Merge combina

Converts an aggregate function for tables into an aggregate function for arrays that aggregates the corresponding array items and returns an array of results. For example, `sumForEach` for the arrays `[1, 2]`, `[3, 4, 5]` and `[6, 7]` returns the result `[10, 13, 5]` after adding together the corresponding array items.
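
As a rough illustration (a Python model of the semantics, not ClickHouse code), the `sumForEach` example can be reproduced by summing the items at each array index. The name `sum_for_each` is hypothetical. Padding missing positions with `0` is an assumption of this model; it gives the same result for a sum, because an array shorter than the longest one contributes nothing at the positions it lacks.

```python
from itertools import zip_longest


def sum_for_each(arrays):
    """Hypothetical model of sumForEach: add up the items at each array index.

    zip_longest pads shorter arrays with 0, which leaves a sum unchanged, so
    each position is the total of the items that actually exist there.
    """
    return [sum(column) for column in zip_longest(*arrays, fillvalue=0)]


print(sum_for_each([[1, 2], [3, 4, 5], [6, 7]]))  # [10, 13, 5]
```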

## -Resample

Lets you divide data into groups and then aggregate the data in each group separately. Groups are created by splitting the values of one column into intervals.

```
<aggFunction>Resample(start, end, step)(<aggFunction_params>, resampling_key)
```

**Parameters**

- `start` — Starting value of the whole required interval for the values of `resampling_key`.
- `end` — Ending value of the whole required interval for the values of `resampling_key`. The whole interval doesn't include the `end` value: `[start, end)`.
- `step` — Step for splitting the whole interval into subintervals. The `aggFunction` is executed over each of those subintervals independently.
- `resampling_key` — Column whose values are used to split data into intervals.
- `aggFunction_params` — Parameters of `aggFunction`.
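
The interval layout described by these parameters can be sketched in a few lines of Python. This is a model under stated assumptions (integer `resampling_key` values, a positive integer `step`), not ClickHouse's implementation, and `resample_intervals` and `interval_index` are hypothetical names. Keys outside `[start, end)` fall into no subinterval, and the last subinterval is cut off at `end`, which matches the `[60,75)` interval used in the example.

```python
def resample_intervals(start, end, step):
    """Hypothetical helper: list the half-open subintervals [lo, hi) of [start, end).

    Assumes integer bounds and a positive integer step. The last subinterval
    is truncated at `end` when (end - start) is not a multiple of `step`.
    """
    count = (end - start + step - 1) // step  # ceiling division
    return [(start + i * step, min(start + (i + 1) * step, end)) for i in range(count)]


def interval_index(key, start, end, step):
    """Hypothetical helper: index of the subinterval containing `key`, or None.

    Keys outside [start, end) belong to no subinterval and are ignored.
    """
    if key < start or key >= end:
        return None
    return (key - start) // step


print(resample_intervals(30, 75, 30))  # [(30, 60), (60, 75)]
print(interval_index(48, 30, 75, 30))  # 0
print(interval_index(62, 30, 75, 30))  # 1
print(interval_index(16, 30, 75, 30))  # None
```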


**Returned values**

- Array of `aggFunction` results, one for each subinterval.

**Example**

Consider the `people` table with the following data:

```text
┌─name───┬─age─┬─wage─┐
│ John   │  16 │   10 │
│ Alice  │  30 │   15 │
│ Mary   │  35 │    8 │
│ Evelyn │  48 │ 11.5 │
│ David  │  62 │  9.9 │
│ Brian  │  60 │   16 │
└────────┴─────┴──────┘
```

Let's get the names of the people whose age lies in the intervals `[30,60)` and `[60,75)`. Since age is stored as an integer, these intervals contain the ages `[30, 59]` and `[60, 74]`.

To aggregate names into an array, we use the [groupArray](reference.md#agg_function-grouparray) aggregate function. It takes a single argument, which in our case is the `name` column. The `groupArrayResample` function uses the `age` column as the resampling key to group names by age. To define the required intervals, we pass `(30, 75, 30)` as the parameters of `groupArrayResample`.

```sql
SELECT groupArrayResample(30, 75, 30)(name, age) FROM people
```
```text
┌─groupArrayResample(30, 75, 30)(name, age)─────┐
│ [['Alice','Mary','Evelyn'],['David','Brian']] │
└───────────────────────────────────────────────┘
```

Consider the results.

`John` is excluded from the sample because he is too young: his age, 16, is below the `start` value of 30. Other people are distributed according to the specified age intervals.
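
The same grouping can be sketched in Python to make the interval assignment explicit. The `PEOPLE` list and the `group_array_resample` function are hypothetical, and the sketch assumes integer ages. The model keeps names in table order, whereas `groupArray` itself does not guarantee any particular order.

```python
# Hypothetical in-memory copy of the (name, age) columns of the `people` table.
PEOPLE = [
    ("John", 16),
    ("Alice", 30),
    ("Mary", 35),
    ("Evelyn", 48),
    ("David", 62),
    ("Brian", 60),
]


def group_array_resample(start, end, step, rows):
    """Hypothetical model of groupArrayResample(start, end, step)(name, age).

    Assumes integer keys and a positive integer step.
    """
    # Ceiling division: number of subintervals in [start, end).
    count = (end - start + step - 1) // step
    buckets = [[] for _ in range(count)]
    for name, age in rows:
        if start <= age < end:  # rows outside [start, end) are dropped
            buckets[(age - start) // step].append(name)
    return buckets


print(group_array_resample(30, 75, 30, PEOPLE))
# [['Alice', 'Mary', 'Evelyn'], ['David', 'Brian']]
```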

Now, let's count the total number of people and their average wage in the specified age intervals.

```sql
SELECT
    countResample(30, 75, 30)(name, age) AS amount,
    avgResample(30, 75, 30)(wage, age) AS avg_wage
FROM people
```
```text
┌─amount─┬─avg_wage──────────────────┐
│ [3,2]  │ [11.5,12.949999809265137] │
└────────┴───────────────────────────┘
```
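
Both aggregations can be sketched in Python with one generic helper. `resample` is a hypothetical function that applies an aggregate to each subinterval (`len` stands in for `count`, and a plain mean stands in for `avg`). The example does not show the table schema; the sketch assumes `wage` is stored as `Float32` and rounds it accordingly, because that assumption reproduces the `12.949999809265137` in the output.

```python
import struct

# Hypothetical in-memory copy of the `people` table: (name, age, wage).
PEOPLE = [
    ("John", 16, 10.0),
    ("Alice", 30, 15.0),
    ("Mary", 35, 8.0),
    ("Evelyn", 48, 11.5),
    ("David", 62, 9.9),
    ("Brian", 60, 16.0),
]


def to_float32(x):
    """Round a Python float to the nearest 32-bit float (assumes `wage` is Float32)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def resample(agg, start, end, step, values, keys):
    """Hypothetical model of <agg>Resample(start, end, step)(value, key)."""
    count = (end - start + step - 1) // step  # ceiling division
    buckets = [[] for _ in range(count)]
    for value, key in zip(values, keys):
        if start <= key < end:  # keys outside [start, end) are ignored
            buckets[(key - start) // step].append(value)
    return [agg(bucket) for bucket in buckets]


def mean(xs):
    # Assumes a non-empty subinterval, which holds for this data.
    return sum(xs) / len(xs)


names = [name for name, _, _ in PEOPLE]
ages = [age for _, age, _ in PEOPLE]
wages = [to_float32(wage) for _, _, wage in PEOPLE]

amount = resample(len, 30, 75, 30, names, ages)
avg_wage = resample(mean, 30, 75, 30, wages, ages)
print(amount)    # [3, 2]
print(avg_wage)  # [11.5, 12.949999809265137]
```

Without the `to_float32` rounding, the second average would come out as `12.95` (up to ordinary double-precision rounding), which is why the sketch makes the `Float32` assumption explicit.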

[Original article](https://clickhouse.yandex/docs/en/query_language/agg_functions/combinators/) <!--hide-->
2 changes: 1 addition & 1 deletion docs/en/query_language/agg_functions/reference.md
@@ -650,7 +650,7 @@ The function takes a variable number of parameters. Parameters can be `Tuple`, `
- [uniqHLL12](#agg_function-uniqhll12)


## groupArray(x), groupArray(max_size)(x)
## groupArray(x), groupArray(max_size)(x) {#agg_function-grouparray}

Creates an array of argument values.
Values can be added to the array in any (indeterminate) order.