Merged
20 changes: 16 additions & 4 deletions docs/cloud/bestpractices/asyncinserts.md
@@ -4,6 +4,10 @@ sidebar_label: Asynchronous Inserts
title: Asynchronous Inserts (async_insert)
---

import asyncInsert01 from '@site/static/images/cloud/bestpractices/async-01.png';
import asyncInsert02 from '@site/static/images/cloud/bestpractices/async-02.png';
import asyncInsert03 from '@site/static/images/cloud/bestpractices/async-03.png';

Inserting data into ClickHouse in large batches is a best practice. It saves compute cycles and disk I/O, and therefore it saves money. If your use case allows you to batch your inserts external to ClickHouse, then that is one option. If you would like ClickHouse to create the batches, then you can use the asynchronous INSERT mode described here.

As an alternative to batching data on the client side while keeping the insert rate at around one insert query per second, you can enable the [async_insert](/operations/settings/settings.md/#async_insert) setting, which causes ClickHouse to handle the batching on the server side.
@@ -12,7 +16,10 @@ By default, ClickHouse is writing data synchronously.
Each insert sent to ClickHouse causes ClickHouse to immediately create a part containing the data from the insert.
This is the default behavior when the async_insert setting is set to its default value of 0:

![compression block diagram](images/async-01.png)
<img src={asyncInsert01}
class="image"
alt="Asynchronous insert process - default synchronous inserts"
style={{width: '100%', background: 'none'}} />

By setting async_insert to 1, ClickHouse first stores the incoming inserts into an in-memory buffer before flushing them regularly to disk.
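
As a minimal sketch of this mode, asynchronous inserts can be enabled per query with a `SETTINGS` clause. The `events` table and its schema here are hypothetical, not taken from this guide:

```sql
-- Hypothetical example table
CREATE TABLE IF NOT EXISTS events
(
    timestamp DateTime,
    message   String
)
ENGINE = MergeTree
ORDER BY timestamp;

-- With async_insert = 1, ClickHouse buffers the rows server-side and
-- flushes them to a part later, instead of creating one part per insert
INSERT INTO events SETTINGS async_insert = 1
VALUES (now(), 'page_view');
```

The setting can also be enabled at the user-profile or connection level rather than on each individual query.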

@@ -30,10 +37,15 @@ With the [wait_for_async_insert](/operations/settings/settings.md/#wait_for_asyn

The following two diagrams illustrate the two settings for async_insert and wait_for_async_insert:

![compression block diagram](images/async-02.png)

![compression block diagram](images/async-03.png)
<img src={asyncInsert02}
class="image"
alt="Asynchronous insert process - async_insert=1, wait_for_async_insert=1"
style={{width: '100%', background: 'none'}} />

<img src={asyncInsert03}
class="image"
alt="Asynchronous insert process - async_insert=1, wait_for_async_insert=0"
style={{width: '100%', background: 'none'}} />
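
The two combinations shown in the diagrams above can be sketched as follows, assuming a hypothetical table `events(timestamp DateTime, message String)` already exists:

```sql
-- async_insert = 1, wait_for_async_insert = 1 (the default):
-- the INSERT is acknowledged only after the buffer is flushed to storage
INSERT INTO events SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (now(), 'ack_after_flush');

-- async_insert = 1, wait_for_async_insert = 0 ("fire and forget"):
-- the INSERT is acknowledged as soon as the data reaches the buffer
INSERT INTO events SETTINGS async_insert = 1, wait_for_async_insert = 0
VALUES (now(), 'ack_on_buffer');
```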

### Enabling asynchronous inserts {#enabling-asynchronous-inserts}

13 changes: 11 additions & 2 deletions docs/cloud/bestpractices/partitioningkey.md
@@ -4,16 +4,25 @@ sidebar_label: Choose a Low Cardinality Partitioning Key
title: Choose a Low Cardinality Partitioning Key
---

import partitioning01 from '@site/static/images/cloud/bestpractices/partitioning-01.png';
import partitioning02 from '@site/static/images/cloud/bestpractices/partitioning-02.png';

When you send an insert statement (which should contain many rows - see the [section above](/optimize/bulk-inserts)) to a table in ClickHouse Cloud, and that
table does not use a [partitioning key](/engines/table-engines/mergetree-family/custom-partitioning-key.md), then all row data from that insert is written into a new part on storage:

![compression block diagram](images/partitioning-01.png)
<img src={partitioning01}
class="image"
alt="Insert without partitioning key - one part created"
style={{width: '100%', background: 'none'}} />

However, when you send an insert statement to a table in ClickHouse Cloud, and that table has a partitioning key, then ClickHouse:
- checks the partitioning key values of the rows contained in the insert
- creates one new part on storage per distinct partitioning key value
- places the rows in the corresponding parts by partitioning key value

![compression block diagram](images/partitioning-02.png)
<img src={partitioning02}
class="image"
alt="Insert with partitioning key - multiple parts created based on partitioning key values"
style={{width: '100%', background: 'none'}} />

Therefore, to minimize the number of write requests to the ClickHouse Cloud object storage, use a low cardinality partitioning key or avoid using any partitioning key for your table.
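
As a sketch of this recommendation, a time-based partitioning expression keeps the partition count small and predictable; the table and column names here are hypothetical:

```sql
CREATE TABLE page_views
(
    event_date Date,
    user_id    UInt64,
    url        String
)
ENGINE = MergeTree
-- Low cardinality: at most 12 new partition values per year,
-- so a typical bulk insert creates only one or a few parts
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id);
```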
63 changes: 32 additions & 31 deletions docs/cloud/manage/jan2025_faq/dimensions.md
@@ -5,6 +5,9 @@ keywords: [new pricing, dimensions]
description: Pricing dimensions for data transfer and ClickPipes
---

import clickpipesPricingFaq1 from '@site/static/images/cloud/manage/jan2025_faq/external_clickpipes_pricing_faq_1.png';
import clickpipesPricingFaq2 from '@site/static/images/cloud/manage/jan2025_faq/external_clickpipes_pricing_faq_2.png';
import clickpipesPricingFaq3 from '@site/static/images/cloud/manage/jan2025_faq/external_clickpipes_pricing_faq_3.png';
import NetworkPricing from '@site/docs/cloud/manage/_snippets/_network_transfer_rates.md';


@@ -34,60 +37,60 @@ Data transfer prices will **not** be tiered as usage increases. Note that the pr
### Why are we introducing a pricing model for ClickPipes now? {#why-are-we-introducing-a-pricing-model-for-clickpipes-now}

We decided to initially launch ClickPipes for free with the idea to gather feedback, refine features,
and ensure it meets user needs.
As the GA platform has grown and effectively stood the test of time by moving trillions of rows,
introducing a pricing model allows us to continue improving the service,
maintaining the infrastructure, and providing dedicated support and new connectors.

### What are ClickPipes replicas? {#what-are-clickpipes-replicas}

ClickPipes ingests data from remote data sources via a dedicated infrastructure
that runs and scales independently of the ClickHouse Cloud service.
For this reason, it uses dedicated compute replicas.
The diagrams below show a simplified architecture.

For streaming ClickPipes, ClickPipes replicas access the remote data sources (e.g., a Kafka broker),
pull the data, process and ingest it into the destination ClickHouse service.

![ClickPipes Replicas - Streaming ClickPipes](images/external_clickpipes_pricing_faq_1.png)
<img src={clickpipesPricingFaq1} alt="ClickPipes Replicas - Streaming ClickPipes" />

In the case of object storage ClickPipes,
the ClickPipes replica orchestrates the data loading task
(identifying files to copy, maintaining the state, and moving partitions),
while the data is pulled directly from the ClickHouse service.

![ClickPipes Replicas - Object Storage ClickPipes](images/external_clickpipes_pricing_faq_2.png)
<img src={clickpipesPricingFaq2} alt="ClickPipes Replicas - Object Storage ClickPipes" />

### What is the default number of replicas and their size? {#what-is-the-default-number-of-replicas-and-their-size}

Each ClickPipe defaults to 1 replica that is provided with 2 GiB of RAM and 0.5 vCPU.
This corresponds to **0.25** ClickHouse compute units (1 unit = 8 GiB RAM, 2 vCPUs).

### Can ClickPipes replicas be scaled? {#can-clickpipes-replicas-be-scaled}

Currently, only ClickPipes for streaming can be scaled horizontally
by adding more replicas each with a base unit of **0.25** ClickHouse compute units.
Vertical scaling is also available on demand for specific use cases (adding more CPU and RAM per replica).

### How many ClickPipes replicas do I need? {#how-many-clickpipes-replicas-do-i-need}

It depends on the workload throughput and latency requirements.
We recommend starting with the default value of 1 replica, measuring your latency, and adding replicas if needed.
Keep in mind that for Kafka ClickPipes, you also have to scale the Kafka broker partitions accordingly.
The scaling controls are available under “settings” for each streaming ClickPipe.

![ClickPipes Replicas - How many ClickPipes replicas do I need?](images/external_clickpipes_pricing_faq_3.png)
<img src={clickpipesPricingFaq3} alt="ClickPipes Replicas - How many ClickPipes replicas do I need?" />

### What does the ClickPipes pricing structure look like? {#what-does-the-clickpipes-pricing-structure-look-like}

It consists of two dimensions:
- **Compute**: Price per unit per hour
Compute represents the cost of running the ClickPipes replica pods whether they actively ingest data or not.
It applies to all ClickPipes types.
- **Ingested data**: per GB pricing
The ingested data rate applies to all streaming ClickPipes
(Kafka, Confluent, Amazon MSK, Amazon Kinesis, Redpanda, WarpStream,
Azure Event Hubs) for the data transferred via the replica pods.
The ingested data size (GB) is charged based on the bytes received from the source, whether compressed or uncompressed.

### What are the ClickPipes public prices? {#what-are-the-clickpipes-public-prices}
@@ -103,8 +106,8 @@ $$
(0.25 \times 0.20 \times 24) + (0.04 \times 1000) = \$41.2
$$
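
Reading the example term by term, with the per-unit-hour compute rate and per-GB ingest rate implied by the formula above:

$$
\underbrace{0.25 \times 0.20 \times 24}_{\text{compute} \,=\, \$1.20} + \underbrace{0.04 \times 1000}_{\text{ingested data} \,=\, \$40.00} = \$41.20
$$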

For object storage connectors (S3 and GCS),
only the ClickPipes compute cost is incurred since the ClickPipes pod is not processing data
but only orchestrating the transfer which is operated by the underlying ClickHouse service:

$$
@@ -117,13 +120,11 @@ The new pricing model will take effect for all organizations created after Janua

### What happens to current users? {#what-happens-to-current-users}

Existing users will have a **60-day grace period** where the ClickPipes service continues to be offered for free.
Billing will automatically start for ClickPipes for existing users on **March 24th, 2025.**

### How does ClickPipes pricing compare to the market? {#how-does-clickpipes-pricing-compare-to-the-market}

The philosophy behind ClickPipes pricing is
to cover the operating costs of the platform while offering an easy and reliable way to move data to ClickHouse Cloud.
From that angle, our market analysis revealed that we are positioned competitively.


22 changes: 14 additions & 8 deletions docs/dictionary/index.md
@@ -5,6 +5,9 @@ keywords: [dictionary, dictionaries]
description: A dictionary provides a key-value representation of data for fast lookups.
---

import dictionaryUseCases from '@site/static/images/dictionary/dictionary-use-cases.png';
import dictionaryLeftAnyJoin from '@site/static/images/dictionary/dictionary-left-any-join.png';

# Dictionary

A dictionary in ClickHouse provides an in-memory [key-value](https://en.wikipedia.org/wiki/Key%E2%80%93value_database) representation of data from various [internal and external sources](/sql-reference/dictionaries#dictionary-sources), optimizing for super-low latency lookup queries.
@@ -13,15 +16,18 @@ Dictionaries are useful for:
- Improving the performance of queries, especially when used with `JOIN`s
- Enriching ingested data on the fly without slowing down the ingestion process

![Uses cases for Dictionary in ClickHouse](./images/dictionary-use-cases.png)
<img src={dictionaryUseCases}
class="image"
alt="Use cases for Dictionary in ClickHouse"
style={{width: '100%', background: 'none'}} />

## Speeding up joins using a Dictionary {#speeding-up-joins-using-a-dictionary}

Dictionaries can be used to speed up a specific type of `JOIN`: the [`LEFT ANY` type](/sql-reference/statements/select/join#supported-types-of-join) where the join key needs to match the key attribute of the underlying key-value storage.

<img src={require('./images/dictionary-left-any-join.png').default}
class='image'
alt='Using Dictionary with LEFT ANY JOIN'
<img src={dictionaryLeftAnyJoin}
class="image"
alt="Using Dictionary with LEFT ANY JOIN"
style={{width: '300px', background: 'none'}} />

If this is the case, ClickHouse can exploit the dictionary to perform a [Direct Join](https://clickhouse.com/blog/clickhouse-fully-supports-joins-direct-join-part4#direct-join). This is ClickHouse's fastest join algorithm and is applicable when the underlying [table engine](/engines/table-engines) for the right-hand side table supports low-latency key-value requests. ClickHouse has three table engines providing this: [Join](/engines/table-engines/special/join) (that is basically a pre-calculated hash table), [EmbeddedRocksDB](/engines/table-engines/integrations/embedded-rocksdb) and [Dictionary](/engines/table-engines/special/dictionary). We will describe the dictionary-based approach, but the mechanics are the same for all three engines.
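
As a sketch of the pattern (the `users_dict` dictionary, its `users` source table, and the column names here are hypothetical, not taken from the worked example that follows):

```sql
-- Dictionary over a hypothetical key-value source table
CREATE DICTIONARY users_dict
(
    id   UInt64,
    name String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'users'))
LAYOUT(FLAT())
LIFETIME(MIN 600 MAX 900);

-- A LEFT ANY JOIN whose join key matches the dictionary key
-- is eligible for the direct join algorithm
SELECT p.post_id, u.name
FROM posts AS p
LEFT ANY JOIN users_dict AS u ON p.user_id = u.id
SETTINGS join_algorithm = 'direct';
```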
@@ -49,7 +55,7 @@ SELECT
Title,
UpVotes,
DownVotes,
abs(UpVotes - DownVotes) AS Controversial_ratio
FROM posts
INNER JOIN
(
@@ -80,7 +86,7 @@ Peak memory usage: 3.18 GiB.

>**Use smaller datasets on the right side of `JOIN`**: This query may seem more verbose than is required, with the filtering on `PostId`s occurring in both the outer and sub queries. This is a performance optimization which ensures the query response time is fast. For optimal performance, always ensure the right side of the `JOIN` is the smaller set and as small as possible. For tips on optimizing JOIN performance and understanding the algorithms available, we recommend [this series of blog articles](https://clickhouse.com/blog/clickhouse-fully-supports-joins-part1).

While this query is fast, it relies on us to write the `JOIN` carefully to achieve good performance. Ideally, we would simply filter the posts to those containing "SQL", before looking at the `UpVote` and `DownVote` counts for the subset of blogs to compute our metric.

#### Applying a dictionary {#applying-a-dictionary}

@@ -114,7 +120,7 @@ FROM votes
GROUP BY PostId
```

Creating our dictionary requires the following DDL - note the use of our query from above:

```sql
CREATE DICTIONARY votes_dict
@@ -328,7 +334,7 @@ For database sources such as ClickHouse and Postgres, you can set up a query tha

### Other dictionary types {#other-dictionary-types}

ClickHouse also supports [Hierarchical](/sql-reference/dictionaries#hierarchical-dictionaries), [Polygon](/sql-reference/dictionaries#polygon-dictionaries) and [Regular Expression](/sql-reference/dictionaries#regexp-tree-dictionary) dictionaries.

### More reading {#more-reading}

Binary file not shown.
4 changes: 3 additions & 1 deletion docs/guides/sre/configuring-ssl.md
@@ -4,6 +4,7 @@ sidebar_label: Configuring SSL-TLS
sidebar_position: 20
---
import SelfManaged from '@site/docs/_snippets/_self_managed_only_automated.md';
import configuringSsl01 from '@site/static/images/guides/sre/configuring-ssl_01.png';

# Configuring SSL-TLS

@@ -450,7 +451,8 @@ The typical [4 letter word (4lW)](/guides/sre/keeper/index.md#four-letter-word-c

5. Log into the Play UI using the `https` interface at `https://chnode1.marsnet.local:8443/play`.

![Play UI](images/configuring-ssl_01.png)
<img src={configuringSsl01}
alt="Configuring SSL" />

:::note
The browser will flag the certificate as untrusted since the page is being reached from a workstation and the certificates are not in the root CA stores on the client machine.
Binary file added static/images/interfaces/mysql0.png
Binary file added static/images/interfaces/mysql1.png
Binary file added static/images/interfaces/mysql2.png
Binary file added static/images/interfaces/mysql3.png