Merge remote-tracking branch 'ClickHouse/master' into improve_pr_62592
rschu1ze committed Apr 29, 2024
2 parents 5946ff0 + 8fd38f2 commit 619d030
Showing 32 changed files with 667 additions and 49 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/debug.yml
@@ -8,4 +8,4 @@ jobs:
DebugInfo:
runs-on: ubuntu-latest
steps:
-- uses: hmarr/debug-action@a701ed95a46e6f2fb0df25e1a558c16356fae35a
+- uses: hmarr/debug-action@f7318c783045ac39ed9bb497e22ce835fdafbfe6
2 changes: 1 addition & 1 deletion .github/workflows/master.yml
@@ -16,7 +16,7 @@ jobs:
data: ${{ steps.runconfig.outputs.CI_DATA }}
steps:
- name: DebugInfo
-uses: hmarr/debug-action@a701ed95a46e6f2fb0df25e1a558c16356fae35a
+uses: hmarr/debug-action@f7318c783045ac39ed9bb497e22ce835fdafbfe6
- name: Check out repository code
uses: ClickHouse/checkout@v1
with:
2 changes: 1 addition & 1 deletion .github/workflows/pull_request.yml
@@ -22,7 +22,7 @@ jobs:
data: ${{ steps.runconfig.outputs.CI_DATA }}
steps:
- name: DebugInfo
-uses: hmarr/debug-action@a701ed95a46e6f2fb0df25e1a558c16356fae35a
+uses: hmarr/debug-action@f7318c783045ac39ed9bb497e22ce835fdafbfe6
- name: Check out repository code
uses: ClickHouse/checkout@v1
with:
2 changes: 1 addition & 1 deletion .github/workflows/reusable_simple_job.yml
@@ -63,7 +63,7 @@ jobs:
GITHUB_JOB_OVERRIDDEN: ${{inputs.test_name}}
steps:
- name: DebugInfo
-uses: hmarr/debug-action@a701ed95a46e6f2fb0df25e1a558c16356fae35a
+uses: hmarr/debug-action@f7318c783045ac39ed9bb497e22ce835fdafbfe6
- name: Check out repository code
uses: ClickHouse/checkout@v1
with:
6 changes: 3 additions & 3 deletions docs/en/engines/table-engines/mergetree-family/mergetree.md
@@ -287,9 +287,9 @@ The number of columns in the primary key is not explicitly limited. Depending on

A long primary key will negatively affect the insert performance and memory consumption, but extra columns in the primary key do not affect ClickHouse performance during `SELECT` queries.

-You can create a table without a primary key using the `ORDER BY tuple()` syntax. In this case, ClickHouse stores data in insertion order. To preserve the data order when inserting with `INSERT ... SELECT` queries, set [max_insert_threads = 1](/docs/en/operations/settings/settings.md/#settings-max-insert-threads).
+You can create a table without a primary key using the `ORDER BY tuple()` syntax. In this case, ClickHouse stores data in insertion order. To preserve the data order when inserting with `INSERT ... SELECT` queries, set [max_insert_threads = 1](/docs/en/operations/settings/settings.md/#max-insert-threads).

-To select data in the initial order, use [single-threaded](/docs/en/operations/settings/settings.md/#settings-max_threads) `SELECT` queries.
+To select data in the initial order, use [single-threaded](/docs/en/operations/settings/settings.md/#max_threads) `SELECT` queries.
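
For illustration, a minimal sketch of these settings in use (table and column names such as `events` and `source_events` are hypothetical):

```sql
-- A table without a primary key: rows are stored in insertion order.
CREATE TABLE events
(
    ts DateTime,
    message String
)
ENGINE = MergeTree
ORDER BY tuple();

-- Preserve the source order when copying data with INSERT ... SELECT.
SET max_insert_threads = 1;
INSERT INTO events SELECT ts, message FROM source_events;

-- Read the data back in the stored order with a single-threaded SELECT.
SELECT * FROM events SETTINGS max_threads = 1;
```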

### Choosing a Primary Key that Differs from the Sorting Key {#choosing-a-primary-key-that-differs-from-the-sorting-key}

@@ -344,7 +344,7 @@ In the example below, the index can’t be used.
SELECT count() FROM table WHERE CounterID = 34 OR URL LIKE '%upyachka%'
```

-To check whether ClickHouse can use the index when running a query, use the settings [force_index_by_date](/docs/en/operations/settings/settings.md/#settings-force_index_by_date) and [force_primary_key](/docs/en/operations/settings/settings.md/#force-primary-key).
+To check whether ClickHouse can use the index when running a query, use the settings [force_index_by_date](/docs/en/operations/settings/settings.md/#force_index_by_date) and [force_primary_key](/docs/en/operations/settings/settings.md/#force-primary-key).
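
As an illustrative sketch only (the `SET` statement and the outcome comments are assumptions, not part of the example above):

```sql
-- With force_primary_key enabled, queries that cannot use the primary key
-- are rejected instead of silently scanning the whole table.
SET force_primary_key = 1;

-- OK: the condition restricts the leading primary key column.
SELECT count() FROM table WHERE CounterID = 34;

-- Throws: the OR branch on URL prevents use of the primary key.
SELECT count() FROM table WHERE CounterID = 34 OR URL LIKE '%upyachka%';
```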

The key for partitioning by month allows reading only those data blocks which contain dates from the proper range. In this case, the data block may contain data for many dates (up to an entire month). Within a block, data is sorted by primary key, which might not contain the date as the first column. Because of this, using a query with only a date condition that does not specify the primary key prefix will cause more data to be read than for a single date.
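
A hedged sketch of this pruning behavior, assuming the table is partitioned by `toYYYYMM(Date)`:

```sql
-- Only the parts of partition 201401 are read; within each part, the primary
-- index may still scan many rows if Date is not a prefix of the primary key.
SELECT count() FROM table WHERE Date BETWEEN '2014-01-01' AND '2014-01-31';
```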

@@ -113,7 +113,7 @@ You can specify any existing ZooKeeper cluster and the system will use a directo

If ZooKeeper is not set in the config file, you can’t create replicated tables, and any existing replicated tables will be read-only.

-ZooKeeper is not used in `SELECT` queries because replication does not affect the performance of `SELECT` and queries run just as fast as they do for non-replicated tables. When querying distributed replicated tables, ClickHouse behavior is controlled by the settings [max_replica_delay_for_distributed_queries](/docs/en/operations/settings/settings.md/#settings-max_replica_delay_for_distributed_queries) and [fallback_to_stale_replicas_for_distributed_queries](/docs/en/operations/settings/settings.md/#settings-fallback_to_stale_replicas_for_distributed_queries).
+ZooKeeper is not used in `SELECT` queries because replication does not affect the performance of `SELECT` and queries run just as fast as they do for non-replicated tables. When querying distributed replicated tables, ClickHouse behavior is controlled by the settings [max_replica_delay_for_distributed_queries](/docs/en/operations/settings/settings.md/#max_replica_delay_for_distributed_queries) and [fallback_to_stale_replicas_for_distributed_queries](/docs/en/operations/settings/settings.md/#fallback_to_stale_replicas_for_distributed_queries).
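
A minimal sketch of applying these settings at query time (`dist_table` is a hypothetical `Distributed` table over replicated shards):

```sql
-- Treat replicas lagging more than 300 seconds as unavailable...
SET max_replica_delay_for_distributed_queries = 300;
-- ...and fail the query instead of falling back to a stale replica.
SET fallback_to_stale_replicas_for_distributed_queries = 0;

SELECT count() FROM dist_table;
```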

For each `INSERT` query, approximately ten entries are added to ZooKeeper through several transactions. (To be more precise, this applies to each inserted block of data; an `INSERT` query contains one block, or one block per `max_insert_block_size = 1048576` rows.) This leads to slightly longer latencies for `INSERT` compared to non-replicated tables. But if you follow the recommendations to insert data in batches of no more than one `INSERT` per second, it does not create any problems. A single ZooKeeper cluster coordinating an entire ClickHouse cluster handles a total of several hundred `INSERT`s per second. The throughput on data inserts (the number of rows per second) is just as high as for non-replicated data.
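
As a sketch of the batching recommendation (`replicated_events` is a hypothetical `ReplicatedMergeTree` table):

```sql
-- One batched INSERT forms a single block and pays one round of ZooKeeper
-- entries; issuing each row as its own INSERT would pay that cost per row.
INSERT INTO replicated_events (d, id) VALUES
    ('2024-01-01', 1),
    ('2024-01-01', 2),
    ('2024-01-01', 3);
```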

2 changes: 1 addition & 1 deletion docs/en/engines/table-engines/special/join.md
@@ -83,7 +83,7 @@ When creating a table, the following settings are applied:

#### join_any_take_last_row

-[join_any_take_last_row](/docs/en/operations/settings/settings.md/#settings-join_any_take_last_row)
+[join_any_take_last_row](/docs/en/operations/settings/settings.md/#join_any_take_last_row)
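
A minimal sketch of applying this setting when creating a Join-engine table (the table and column names are hypothetical):

```sql
-- Keep the last inserted row per key instead of the first one.
CREATE TABLE any_left_join
(
    id UInt32,
    val String
)
ENGINE = Join(ANY, LEFT, id)
SETTINGS join_any_take_last_row = 1;
```
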
#### join_use_nulls

#### persistent
112 changes: 110 additions & 2 deletions docs/en/sql-reference/aggregate-functions/parametric-functions.md
@@ -505,9 +505,117 @@ HAVING uniqUpTo(4)(UserID) >= 5

`uniqUpTo(4)(UserID)` calculates the number of unique `UserID` values for each `SearchPhrase`, but it only counts up to 4 unique values. If there are more than 4 unique `UserID` values for a `SearchPhrase`, the function returns 5 (4 + 1). The `HAVING` clause then filters out the `SearchPhrase` values for which the number of unique `UserID` values is less than 5. This will give you a list of search keywords that were used by at least 5 unique users.
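
For reference, a sketch of the full query this paragraph describes (the `hits` table and its columns are assumed from the usual web-analytics examples, not shown in this diff):

```sql
SELECT SearchPhrase
FROM hits
GROUP BY SearchPhrase
HAVING uniqUpTo(4)(UserID) >= 5;
```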

-## sumMapFiltered(keys_to_keep)(keys, values)
+## sumMapFiltered

-Same behavior as [sumMap](../../sql-reference/aggregate-functions/reference/summap.md#agg_functions-summap) except that an array of keys is passed as a parameter. This can be especially useful when working with a high cardinality of keys.
+This function behaves the same as [sumMap](../../sql-reference/aggregate-functions/reference/summap.md#agg_functions-summap) except that it also accepts an array of keys to filter with as a parameter. This can be especially useful when working with a high cardinality of keys.

**Syntax**

`sumMapFiltered(keys_to_keep)(keys, values)`

**Parameters**

- `keys_to_keep`: [Array](../data-types/array.md) of keys to filter with.
- `keys`: [Array](../data-types/array.md) of keys.
- `values`: [Array](../data-types/array.md) of values.

**Returned Value**

- Returns a tuple of two arrays: keys in sorted order, and values summed for the corresponding keys.

**Example**

Query:

```sql
CREATE TABLE sum_map
(
`date` Date,
`timeslot` DateTime,
`statusMap` Nested(status UInt16, requests UInt64)
)
ENGINE = Log;

INSERT INTO sum_map VALUES
('2000-01-01', '2000-01-01 00:00:00', [1, 2, 3], [10, 10, 10]),
('2000-01-01', '2000-01-01 00:00:00', [3, 4, 5], [10, 10, 10]),
('2000-01-01', '2000-01-01 00:01:00', [4, 5, 6], [10, 10, 10]),
('2000-01-01', '2000-01-01 00:01:00', [6, 7, 8], [10, 10, 10]);
```

```sql
SELECT sumMapFiltered([1, 4, 8])(statusMap.status, statusMap.requests) FROM sum_map;
```

Result:

```response
┌─sumMapFiltered([1, 4, 8])(statusMap.status, statusMap.requests)─┐
1. │ ([1,4,8],[10,20,10]) │
└─────────────────────────────────────────────────────────────────┘
```

## sumMapFilteredWithOverflow

This function behaves the same as [sumMap](../../sql-reference/aggregate-functions/reference/summap.md#agg_functions-summap) except that it also accepts an array of keys to filter with as a parameter. This can be especially useful when working with a high cardinality of keys. It differs from the [sumMapFiltered](#summapfiltered) function in that it does summation with overflow - i.e. it returns the same data type for the sum as the data type of the arguments.

**Syntax**

`sumMapFilteredWithOverflow(keys_to_keep)(keys, values)`

**Parameters**

- `keys_to_keep`: [Array](../data-types/array.md) of keys to filter with.
- `keys`: [Array](../data-types/array.md) of keys.
- `values`: [Array](../data-types/array.md) of values.

**Returned Value**

- Returns a tuple of two arrays: keys in sorted order, and values summed for the corresponding keys.

**Example**

In this example we create a table `sum_map`, insert some data into it, and then use both `sumMapFilteredWithOverflow` and `sumMapFiltered` together with the `toTypeName` function to compare the results. Because `requests` was of type `UInt8` in the created table, `sumMapFiltered` promotes the type of the summed values to `UInt64` to avoid overflow, whereas `sumMapFilteredWithOverflow` keeps the type as `UInt8`, which may not be large enough to store the sum - i.e. overflow can occur.

Query:

```sql
CREATE TABLE sum_map
(
`date` Date,
`timeslot` DateTime,
`statusMap` Nested(status UInt8, requests UInt8)
)
ENGINE = Log;

INSERT INTO sum_map VALUES
('2000-01-01', '2000-01-01 00:00:00', [1, 2, 3], [10, 10, 10]),
('2000-01-01', '2000-01-01 00:00:00', [3, 4, 5], [10, 10, 10]),
('2000-01-01', '2000-01-01 00:01:00', [4, 5, 6], [10, 10, 10]),
('2000-01-01', '2000-01-01 00:01:00', [6, 7, 8], [10, 10, 10]);
```

```sql
SELECT sumMapFilteredWithOverflow([1, 4, 8])(statusMap.status, statusMap.requests) AS summap_overflow, toTypeName(summap_overflow) FROM sum_map;
```

```sql
SELECT sumMapFiltered([1, 4, 8])(statusMap.status, statusMap.requests) AS summap, toTypeName(summap) FROM sum_map;
```

Result:

```response
┌─summap_overflow──────┬─toTypeName(summap_overflow)───────┐
1. │ ([1,4,8],[10,20,10]) │ Tuple(Array(UInt8), Array(UInt8)) │
└──────────────────────┴───────────────────────────────────┘
```

```response
┌─summap───────────────┬─toTypeName(summap)─────────────────┐
1. │ ([1,4,8],[10,20,10]) │ Tuple(Array(UInt8), Array(UInt64)) │
└──────────────────────┴────────────────────────────────────┘
```

## sequenceNextNode

5 changes: 5 additions & 0 deletions docs/en/sql-reference/aggregate-functions/reference/index.md
@@ -16,7 +16,9 @@ Standard aggregate functions:
- [avg](/docs/en/sql-reference/aggregate-functions/reference/avg.md)
- [any](/docs/en/sql-reference/aggregate-functions/reference/any.md)
- [stddevPop](/docs/en/sql-reference/aggregate-functions/reference/stddevpop.md)
+- [stddevPopStable](/docs/en/sql-reference/aggregate-functions/reference/stddevpopstable.md)
- [stddevSamp](/docs/en/sql-reference/aggregate-functions/reference/stddevsamp.md)
+- [stddevSampStable](/docs/en/sql-reference/aggregate-functions/reference/stddevsampstable.md)
- [varPop](/docs/en/sql-reference/aggregate-functions/reference/varpop.md)
- [varSamp](/docs/en/sql-reference/aggregate-functions/reference/varsamp.md)
- [corr](./corr.md)
@@ -65,6 +67,9 @@ ClickHouse-specific aggregate functions:
- [groupBitmapXor](/docs/en/sql-reference/aggregate-functions/reference/groupbitmapxor.md)
- [sumWithOverflow](/docs/en/sql-reference/aggregate-functions/reference/sumwithoverflow.md)
- [sumMap](/docs/en/sql-reference/aggregate-functions/reference/summap.md)
+- [sumMapWithOverflow](/docs/en/sql-reference/aggregate-functions/reference/summapwithoverflow.md)
+- [sumMapFiltered](/docs/en/sql-reference/aggregate-functions/parametric-functions.md/#summapfiltered)
+- [sumMapFilteredWithOverflow](/docs/en/sql-reference/aggregate-functions/parametric-functions.md/#summapfilteredwithoverflow)
- [minMap](/docs/en/sql-reference/aggregate-functions/reference/minmap.md)
- [maxMap](/docs/en/sql-reference/aggregate-functions/reference/maxmap.md)
- [skewSamp](/docs/en/sql-reference/aggregate-functions/reference/skewsamp.md)
50 changes: 45 additions & 5 deletions docs/en/sql-reference/aggregate-functions/reference/stddevpop.md
@@ -7,10 +7,50 @@ sidebar_position: 30

The result is equal to the square root of [varPop](../../../sql-reference/aggregate-functions/reference/varpop.md).

-Alias:
-- `STD`
-- `STDDEV_POP`
+Aliases: `STD`, `STDDEV_POP`.

:::note
-This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `stddevPopStable` function. It works slower but provides a lower computational error.
-:::
+This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the [`stddevPopStable`](../reference/stddevpopstable.md) function. It works slower but provides a lower computational error.
+:::

**Syntax**

```sql
stddevPop(x)
```

**Parameters**

- `x`: Population of values to find the standard deviation of. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).

**Returned value**

Standard deviation of `x`, i.e. the square root of its variance. [Float64](../../data-types/float.md).


**Example**

Query:

```sql
DROP TABLE IF EXISTS test_data;
CREATE TABLE test_data
(
population UInt8
)
ENGINE = Log;

INSERT INTO test_data VALUES (3),(3),(3),(4),(4),(5),(5),(7),(11),(15);

SELECT
stddevPop(population) AS stddev
FROM test_data;
```

Result:

```response
┌────────────stddev─┐
│ 3.794733192202055 │
└───────────────────┘
```
@@ -0,0 +1,49 @@
---
slug: /en/sql-reference/aggregate-functions/reference/stddevpopstable
sidebar_position: 30
---

# stddevPopStable

The result is equal to the square root of [varPop](../../../sql-reference/aggregate-functions/reference/varpop.md). Unlike [`stddevPop`](../reference/stddevpop.md), this function uses a numerically stable algorithm. It works slower but provides a lower computational error.

**Syntax**

```sql
stddevPopStable(x)
```

**Parameters**

- `x`: Population of values to find the standard deviation of. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).

**Returned value**

Standard deviation of `x`, i.e. the square root of its variance. [Float64](../../data-types/float.md).

**Example**

Query:

```sql
DROP TABLE IF EXISTS test_data;
CREATE TABLE test_data
(
population Float64
)
ENGINE = Log;

INSERT INTO test_data SELECT randUniform(5.5, 10) FROM numbers(1000000);

SELECT
stddevPopStable(population) AS stddev
FROM test_data;
```

Result:

```response
┌─────────────stddev─┐
│ 1.2999977786592576 │
└────────────────────┘
```
45 changes: 43 additions & 2 deletions docs/en/sql-reference/aggregate-functions/reference/stddevsamp.md
@@ -10,5 +10,46 @@ The result is equal to the square root of [varSamp](../../../sql-reference/aggre
Alias: `STDDEV_SAMP`.

:::note
-This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `stddevSampStable` function. It works slower but provides a lower computational error.
-:::
+This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the [`stddevSampStable`](../reference/stddevsampstable.md) function. It works slower but provides a lower computational error.
+:::

**Syntax**

```sql
stddevSamp(x)
```

**Parameters**

- `x`: Values for which to find the square root of sample variance. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal*](../../data-types/decimal.md).

**Returned value**

Square root of sample variance of `x`. [Float64](../../data-types/float.md).

**Example**

Query:

```sql
DROP TABLE IF EXISTS test_data;
CREATE TABLE test_data
(
population UInt8
)
ENGINE = Log;

INSERT INTO test_data VALUES (3),(3),(3),(4),(4),(5),(5),(7),(11),(15);

SELECT
stddevSamp(population)
FROM test_data;
```

Result:

```response
┌─stddevSamp(population)─┐
│ 4 │
└────────────────────────┘
```
