Commit 97ce61c

Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs into clickstack-nginx-guide
2 parents: b60b7c1 + a214d55

45 files changed, +347 -105 lines changed


docs/cloud/features/08_backups/backups-to-own-cloud/backup_restore_from_ui.md

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ AWS uses role based authentication, so create an IAM role that the ClickHouse Cl
         "AWS": "arn:aws:iam::463754717262:role/CH-S3-bordeaux-ar-90-ue2-29-Role"
       },
       "Action": "sts:AssumeRole"
-    },
+    }
   ]
 }
 ```
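For orientation, a minimal sketch of how a trust policy like the one above can be attached when creating the role with the AWS CLI; the role name and the local policy file name are illustrative assumptions:

```bash
# Illustrative sketch: create the IAM role from a local copy of the trust policy shown above.
# The role name and the trust-policy.json file name are placeholders, not values from this repo.
aws iam create-role \
  --role-name clickhouse-backup-access \
  --assume-role-policy-document file://trust-policy.json
```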

docs/guides/best-practices/skipping-indexes-examples.md

Lines changed: 1 addition & 1 deletion
@@ -128,7 +128,7 @@ See [parameter docs](/engines/table-engines/mergetree-family/mergetree#n-gram-bl
 
 ## Token Bloom filter (tokenbf\_v1) for word-based search {#token-bloom-filter-tokenbf-v1-for-word-based-search}
 
-`tokenbf_v1` indexes tokens separated by non-alphanumeric characters. You should use it with [`hasToken`](/sql-reference/functions/string-search-functions#hastoken), `LIKE` word patterns or equals/IN. It supports `String`/`FixedString`/`Map` types.
+`tokenbf_v1` indexes tokens separated by non-alphanumeric characters. You should use it with [`hasToken`](/sql-reference/functions/string-search-functions#hasToken), `LIKE` word patterns or equals/IN. It supports `String`/`FixedString`/`Map` types.
 
 See [Token bloom filter](/engines/table-engines/mergetree-family/mergetree#token-bloom-filter) and [Bloom filter types](/optimize/skipping-indexes#skip-index-types) pages for more details.
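As a quick illustration of the `hasToken` usage behind the corrected link anchor, here is a minimal sketch of a table with a `tokenbf_v1` skip index queried through `clickhouse-client`; the table name, index parameters, and search term are illustrative assumptions:

```bash
# Illustrative sketch: a tokenbf_v1 skip index queried with hasToken.
# Table name, index parameters (filter size, hash count, seed), and the token are placeholders.
clickhouse-client --multiquery <<'_EOF'
CREATE TABLE token_demo
(
    `message` String,
    INDEX message_tokens message TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY tuple();

INSERT INTO token_demo VALUES ('disk error on node-3'), ('backup completed');

-- hasToken can use the token Bloom filter to skip granules that cannot contain the token.
SELECT count() FROM token_demo WHERE hasToken(message, 'error');
_EOF
```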

docs/guides/inserting-data.md

Lines changed: 136 additions & 31 deletions
@@ -11,34 +11,6 @@ doc_type: 'guide'
 import postgres_inserts from '@site/static/images/guides/postgres-inserts.png';
 import Image from '@theme/IdealImage';
 
-## Basic example {#basic-example}
-
-You can use the familiar `INSERT INTO TABLE` command with ClickHouse. Let's insert some data into the table that we created in the start guide ["Creating Tables in ClickHouse"](./creating-tables).
-
-```sql
-INSERT INTO helloworld.my_first_table (user_id, message, timestamp, metric) VALUES
-(101, 'Hello, ClickHouse!', now(), -1.0 ),
-(102, 'Insert a lot of rows per batch', yesterday(), 1.41421 ),
-(102, 'Sort your data based on your commonly-used queries', today(), 2.718 ),
-(101, 'Granules are the smallest chunks of data read', now() + 5, 3.14159 )
-```
-
-To verify that worked, we'll run the following `SELECT` query:
-
-```sql
-SELECT * FROM helloworld.my_first_table
-```
-
-Which returns:
-
-```response
-user_id message timestamp metric
-101 Hello, ClickHouse! 2024-11-13 20:01:22 -1
-101 Granules are the smallest chunks of data read 2024-11-13 20:01:27 3.14159
-102 Insert a lot of rows per batch 2024-11-12 00:00:00 1.41421
-102 Sort your data based on your commonly-used queries 2024-11-13 00:00:00 2.718
-```
-
 ## Inserting into ClickHouse vs. OLTP databases {#inserting-into-clickhouse-vs-oltp-databases}
 
 As an OLAP (Online Analytical Processing) database, ClickHouse is optimized for high performance and scalability, allowing potentially millions of rows to be inserted per second.
@@ -143,16 +115,149 @@ The native protocol does allow query progress to be easily tracked.
 
 See [HTTP Interface](/interfaces/http) for further details.
 
+## Basic example {#basic-example}
+
+You can use the familiar `INSERT INTO TABLE` command with ClickHouse. Let's insert some data into the table that we created in the start guide ["Creating Tables in ClickHouse"](./creating-tables).
+
+```sql
+INSERT INTO helloworld.my_first_table (user_id, message, timestamp, metric) VALUES
+(101, 'Hello, ClickHouse!', now(), -1.0 ),
+(102, 'Insert a lot of rows per batch', yesterday(), 1.41421 ),
+(102, 'Sort your data based on your commonly-used queries', today(), 2.718 ),
+(101, 'Granules are the smallest chunks of data read', now() + 5, 3.14159 )
+```
+
+To verify that worked, we'll run the following `SELECT` query:
+
+```sql
+SELECT * FROM helloworld.my_first_table
+```
+
+Which returns:
+
+```response
+user_id message timestamp metric
+101 Hello, ClickHouse! 2024-11-13 20:01:22 -1
+101 Granules are the smallest chunks of data read 2024-11-13 20:01:27 3.14159
+102 Insert a lot of rows per batch 2024-11-12 00:00:00 1.41421
+102 Sort your data based on your commonly-used queries 2024-11-13 00:00:00 2.718
+```
+
 ## Loading data from Postgres {#loading-data-from-postgres}
 
 For loading data from Postgres, users can use:
 
-- `PeerDB by ClickHouse`, an ETL tool specifically designed for PostgreSQL database replication. This is available in both:
-  - ClickHouse Cloud - available through our [new connector](/integrations/clickpipes/postgres) in ClickPipes, our managed ingestion service.
-  - Self-managed - via the [open-source project](https://github.com/PeerDB-io/peerdb).
+- `ClickPipes`, an ETL tool specifically designed for PostgreSQL database replication. This is available in both:
+  - ClickHouse Cloud - available through our [managed ingestion service](/integrations/clickpipes/postgres) in ClickPipes.
+  - Self-managed - via the [PeerDB open-source project](https://github.com/PeerDB-io/peerdb).
 - The [PostgreSQL table engine](/integrations/postgresql#using-the-postgresql-table-engine) to read data directly as shown in previous examples. Typically appropriate if batch replication based on a known watermark, e.g., timestamp, is sufficient or if it's a one-off migration. This approach can scale to 10's of millions of rows. Users looking to migrate larger datasets should consider multiple requests, each dealing with a chunk of the data. Staging tables can be used for each chunk prior to its partitions being moved to a final table. This allows failed requests to be retried. For further details on this bulk-loading strategy, see here.
 - Data can be exported from PostgreSQL in CSV format. This can then be inserted into ClickHouse from either local files or via object storage using table functions.
 
 :::note Need help inserting large datasets?
 If you need help inserting large datasets or encounter any errors when importing data into ClickHouse Cloud, please contact us at support@clickhouse.com and we can assist.
 :::
+
+## Inserting data from the command line {#inserting-data-from-command-line}
+
+**Prerequisites**
+- You have [installed](/install) ClickHouse
+- `clickhouse-server` is running
+- You have access to a terminal with `wget`, `zcat` and `curl`
+
+In this example you'll see how to insert a CSV file into ClickHouse from the command line using clickhouse-client in batch mode. For more information and examples of inserting data via command line using clickhouse-client in batch mode, see ["Batch mode"](/interfaces/cli#batch-mode).
+
+We'll be using the [Hacker News dataset](/getting-started/example-datasets/hacker-news) for this example, which contains 28 million rows of Hacker News data.
+
+<VerticalStepper headerLevel="h3">
+
+### Download the CSV {#download-csv}
+
+Run the following command to download a CSV version of the dataset from our public S3 bucket:
+
+```bash
+wget https://datasets-documentation.s3.eu-west-3.amazonaws.com/hackernews/hacknernews.csv.gz
+```
+
+At 4.6GB, and 28m rows, this compressed file should take 5-10 minutes to download.
+
+### Create the table {#create-table}
+
+With `clickhouse-server` running, you can create an empty table with the following schema directly from the command line using `clickhouse-client` in batch mode:
+
+```bash
+clickhouse-client <<'_EOF'
+CREATE TABLE hackernews(
+    `id` UInt32,
+    `deleted` UInt8,
+    `type` Enum('story' = 1, 'comment' = 2, 'poll' = 3, 'pollopt' = 4, 'job' = 5),
+    `by` LowCardinality(String),
+    `time` DateTime,
+    `text` String,
+    `dead` UInt8,
+    `parent` UInt32,
+    `poll` UInt32,
+    `kids` Array(UInt32),
+    `url` String,
+    `score` Int32,
+    `title` String,
+    `parts` Array(UInt32),
+    `descendants` Int32
+)
+ENGINE = MergeTree
+ORDER BY id
+_EOF
+```
+
+If there are no errors, then the table has been successfully created. In the command above single quotes are used around the heredoc delimiter (`_EOF`) to prevent any interpolation. Without single quotes it would be necessary to escape the backticks around the column names.
+
+### Insert the data from the command line {#insert-data-via-cmd}
+
+Next run the command below to insert the data from the file you downloaded earlier into your table:
+
+```bash
+zcat < hacknernews.csv.gz | ./clickhouse client --query "INSERT INTO hackernews FORMAT CSV"
+```
+
+As our data is compressed, we need to first decompress the file using a tool like `gzip`, `zcat`, or similar, and then pipe the decompressed data into `clickhouse-client` with the appropriate `INSERT` statement and `FORMAT`.
+
+:::note
+When inserting data with clickhouse-client in interactive mode, it is possible to let ClickHouse handle the decompression for you on insert using the `COMPRESSION` clause. ClickHouse can automatically detect the compression type from the file extension, but you can also specify it explicitly.
+
+The query to insert would then look like this:
+
+```bash
+clickhouse-client --query "INSERT INTO hackernews FROM INFILE 'hacknernews.csv.gz' COMPRESSION 'gzip' FORMAT CSV;"
+```
+:::
+
+When the data has finished inserting you can run the following command to see the number of rows in the `hackernews` table:
+
+```bash
+clickhouse-client --query "SELECT formatReadableQuantity(count(*)) FROM hackernews"
+28.74 million
+```
+
+### Inserting data via command line with curl {#insert-using-curl}
+
+In the previous steps you first downloaded the csv file to your local machine using `wget`. It is also possible to directly insert the data from the remote URL using a single command.
+
+Run the following command to truncate the data from the `hackernews` table so that you can insert it again without the intermediate step of downloading to your local machine:
+
+```bash
+clickhouse-client --query "TRUNCATE hackernews"
+```
+
+Now run:
+
+```bash
+curl https://datasets-documentation.s3.eu-west-3.amazonaws.com/hackernews/hacknernews.csv.gz | zcat | clickhouse-client --query "INSERT INTO hackernews FORMAT CSV"
+```
+
+You can now run the same command as previously to verify that the data was inserted again:
+
+```bash
+clickhouse-client --query "SELECT formatReadableQuantity(count(*)) FROM hackernews"
+28.74 million
+```
+
+</VerticalStepper>
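For reference alongside the native-client steps above, here is a minimal sketch of the same CSV insert streamed to the HTTP interface mentioned earlier in the page; the host, default port 8123, and absence of credentials are assumptions:

```bash
# Illustrative sketch: stream the decompressed CSV to the HTTP interface with curl.
# localhost:8123 with no authentication is an assumption; adjust host, port, and credentials as needed.
zcat < hacknernews.csv.gz | curl 'http://localhost:8123/?query=INSERT%20INTO%20hackernews%20FORMAT%20CSV' --data-binary @-
```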

docs/integrations/data-ingestion/clickpipes/mongodb/lifecycle.md

Lines changed: 0 additions & 6 deletions
@@ -57,9 +57,3 @@ This state applies to snapshot-only pipes and indicates that the snapshot has be
 ## Failed {#failed}
 
 If there is an irrecoverable error in the pipe, it will enter the `Failed` state. You can reach out to support or [resync](./resync) your pipe to recover from this state.
-
-## Degraded {#degraded}
-
-:::note
-This state is coming soon. If you're using our [OpenAPI](https://clickhouse.com/docs/cloud/manage/openapi), consider adding support for it now to ensure your integration continues working when it's released.
-:::

docs/integrations/data-ingestion/clickpipes/mysql/lifecycle.md

Lines changed: 0 additions & 6 deletions
@@ -57,9 +57,3 @@ This state applies to snapshot-only pipes and indicates that the snapshot has be
 ## Failed {#failed}
 
 If there is an irrecoverable error in the pipe, it will enter the `Failed` state. You can reach out to support or [resync](./resync) your pipe to recover from this state.
-
-## Degraded {#degraded}
-
-:::note
-This state is coming soon. If you're using our [OpenAPI](https://clickhouse.com/docs/cloud/manage/openapi), consider adding support for it now to ensure your integration continues working when it's released.
-:::

docs/integrations/data-ingestion/clickpipes/postgres/lifecycle.md

Lines changed: 0 additions & 6 deletions
@@ -57,9 +57,3 @@ This state applies to snapshot-only pipes and indicates that the snapshot has be
 ## Failed {#failed}
 
 If there is an irrecoverable error in the pipe, it will enter the `Failed` state. You can reach out to support or [resync](./resync) your pipe to recover from this state.
-
-## Degraded {#degraded}
-
-:::note
-This state is coming soon. If you're using our [OpenAPI](https://clickhouse.com/docs/cloud/manage/openapi), consider adding support for it now to ensure your integration continues working when it's released.
-:::

docs/integrations/data-ingestion/insert-local-files.md

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ ENGINE = MergeTree
 ORDER BY toYYYYMMDD(timestamp)
 ```
 
-3. We want to lowercase the `author` column, which is easily done with the [`lower` function](/sql-reference/functions/string-functions#lower). We also want to split the `comment` string into tokens and store the result in the `tokens` column, which can be done using the [`extractAll` function](/sql-reference/functions/string-search-functions#extractall). You do all of this in one `clickhouse-client` command - notice how the `comments.tsv` file is piped into the `clickhouse-client` using the `<` operator:
+3. We want to lowercase the `author` column, which is easily done with the [`lower` function](/sql-reference/functions/string-functions#lower). We also want to split the `comment` string into tokens and store the result in the `tokens` column, which can be done using the [`extractAll` function](/sql-reference/functions/string-search-functions#extractAll). You do all of this in one `clickhouse-client` command - notice how the `comments.tsv` file is piped into the `clickhouse-client` using the `<` operator:
 
 ```bash
 clickhouse-client \

docs/integrations/data-ingestion/kafka/kafka-clickhouse-connect-sink.md

Lines changed: 79 additions & 6 deletions
@@ -310,20 +310,93 @@ For additional details check out the official [tutorial](https://docs.confluent.
 
 ClickHouse Kafka Connect reports runtime metrics via [Java Management Extensions (JMX)](https://www.oracle.com/technical-resources/articles/javase/jmx.html). JMX is enabled in Kafka Connector by default.
 
-ClickHouse Connect `MBeanName`:
+#### ClickHouse-Specific Metrics {#clickhouse-specific-metrics}
+
+The connector exposes custom metrics via the following MBean name:
 
 ```java
 com.clickhouse:type=ClickHouseKafkaConnector,name=SinkTask{id}
 ```
 
-ClickHouse Kafka Connect reports the following metrics:
-
-| Name | Type | Description |
-|----------------------|------|-----------------------------------------------------------------------------------------|
-| `receivedRecords` | long | The total number of records received. |
+| Metric Name | Type | Description |
+|-----------------------|------|-----------------------------------------------------------------------------------------|
+| `receivedRecords` | long | The total number of records received. |
 | `recordProcessingTime` | long | Total time in nanoseconds spent grouping and converting records to a unified structure. |
 | `taskProcessingTime` | long | Total time in nanoseconds spent processing and inserting data into ClickHouse. |
 
+#### Kafka Producer/Consumer Metrics {#kafka-producer-consumer-metrics}
+
+The connector exposes standard Kafka producer and consumer metrics that provide insights into data flow, throughput, and performance.
+
+**Topic-Level Metrics:**
+- `records-sent-total`: Total number of records sent to the topic
+- `bytes-sent-total`: Total bytes sent to the topic
+- `record-send-rate`: Average rate of records sent per second
+- `byte-rate`: Average bytes sent per second
+- `compression-rate`: Compression ratio achieved
+
+**Partition-Level Metrics:**
+- `records-sent-total`: Total records sent to the partition
+- `bytes-sent-total`: Total bytes sent to the partition
+- `records-lag`: Current lag in the partition
+- `records-lead`: Current lead in the partition
+- `replica-fetch-lag`: Lag information for replicas
+
+**Node-Level Connection Metrics:**
+- `connection-creation-total`: Total connections created to the Kafka node
+- `connection-close-total`: Total connections closed
+- `request-total`: Total requests sent to the node
+- `response-total`: Total responses received from the node
+- `request-rate`: Average request rate per second
+- `response-rate`: Average response rate per second
+
+These metrics help monitor:
+- **Throughput**: Track data ingestion rates
+- **Lag**: Identify bottlenecks and processing delays
+- **Compression**: Measure data compression efficiency
+- **Connection Health**: Monitor network connectivity and stability
+
+#### Kafka Connect Framework Metrics {#kafka-connect-framework-metrics}
+
+The connector integrates with the Kafka Connect framework and exposes metrics for task lifecycle and error tracking.
+
+**Task Status Metrics:**
+- `task-count`: Total number of tasks in the connector
+- `running-task-count`: Number of tasks currently running
+- `paused-task-count`: Number of tasks currently paused
+- `failed-task-count`: Number of tasks that have failed
+- `destroyed-task-count`: Number of destroyed tasks
+- `unassigned-task-count`: Number of unassigned tasks
+
+Task status values include: `running`, `paused`, `failed`, `destroyed`, `unassigned`
+
+**Error Metrics:**
+- `deadletterqueue-produce-failures`: Number of failed DLQ writes
+- `deadletterqueue-produce-requests`: Total DLQ write attempts
+- `last-error-timestamp`: Timestamp of the last error
+- `records-skip-total`: Total number of records skipped due to errors
+- `records-retry-total`: Total number of records that were retried
+- `errors-total`: Total number of errors encountered
+
+**Performance Metrics:**
+- `offset-commit-failures`: Number of failed offset commits
+- `offset-commit-avg-time-ms`: Average time for offset commits
+- `offset-commit-max-time-ms`: Maximum time for offset commits
+- `put-batch-avg-time-ms`: Average time to process a batch
+- `put-batch-max-time-ms`: Maximum time to process a batch
+- `source-record-poll-total`: Total records polled
+
+#### Monitoring Best Practices {#monitoring-best-practices}
+
+1. **Monitor Consumer Lag**: Track `records-lag` per partition to identify processing bottlenecks
+2. **Track Error Rates**: Watch `errors-total` and `records-skip-total` to detect data quality issues
+3. **Observe Task Health**: Monitor task status metrics to ensure tasks are running properly
+4. **Measure Throughput**: Use `record-send-rate` and `byte-rate` to track ingestion performance
+5. **Monitor Connection Health**: Check node-level connection metrics for network issues
+6. **Track Compression Efficiency**: Use `compression-rate` to optimize data transfer
+
+For detailed JMX metric definitions and Prometheus integration, see the [jmx-export-connector.yml](https://github.com/ClickHouse/clickhouse-kafka-connect/blob/main/jmx-export-connector.yml) configuration file.
+
 ### Limitations {#limitations}
 
 - Deletes are not supported.
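As a practical companion to the JMX metrics documented above, here is a minimal sketch of exposing them to Prometheus with the JMX exporter Java agent; the agent jar path, port 9404, and worker script locations are assumptions, while the YAML file is the one linked above:

```bash
# Illustrative sketch: start a Kafka Connect worker with the Prometheus JMX exporter agent
# so the connector metrics described above can be scraped over HTTP.
# The agent jar path, port 9404, and script/config paths are placeholders.
export KAFKA_OPTS="-javaagent:/opt/jmx_prometheus_javaagent.jar=9404:/opt/jmx-export-connector.yml"
./bin/connect-distributed.sh config/connect-distributed.properties

# Verify the exporter is serving metrics:
curl -s http://localhost:9404/metrics | grep -i clickhouse
```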

docs/integrations/data-sources/redis.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ sidebar_label: 'Redis'
 title: 'Redis'
 description: 'Page describing the Redis table function'
 doc_type: 'reference'
-show_title: false
+hide_title: true
 keywords: ['redis', 'cache', 'integration', 'data source', 'key-value store']
 ---

docs/use-cases/observability/build-your-own/schema-design.md

Lines changed: 1 addition & 1 deletion
@@ -115,7 +115,7 @@ Body: 151.233.185.144 - - [22/Jan/2019:19:08:54 +0330] "GET /image/105
 LogAttributes: {'log.file.name':'access-unstructured.log'}
 ```
 
-A similar query for the unstructured logs requires the use of regular expressions via the [`extractAllGroupsVertical` function](/sql-reference/functions/string-search-functions#extractallgroupsvertical).
+A similar query for the unstructured logs requires the use of regular expressions via the `extractAllGroupsVertical` function.
 
 ```sql
 SELECT
