docs/guides/best-practices/skipping-indexes-examples.md (1 addition & 1 deletion)

@@ -128,7 +128,7 @@ See [parameter docs](/engines/table-engines/mergetree-family/mergetree#n-gram-bl
## Token Bloom filter (tokenbf\_v1) for word-based search {#token-bloom-filter-tokenbf-v1-for-word-based-search}

-`tokenbf_v1` indexes tokens separated by non-alphanumeric characters. You should use it with [`hasToken`](/sql-reference/functions/string-search-functions#hastoken), `LIKE` word patterns or equals/IN. It supports `String`/`FixedString`/`Map` types.
+`tokenbf_v1` indexes tokens separated by non-alphanumeric characters. You should use it with [`hasToken`](/sql-reference/functions/string-search-functions#hasToken), `LIKE` word patterns or equals/IN. It supports `String`/`FixedString`/`Map` types.

See [Token bloom filter](/engines/table-engines/mergetree-family/mergetree#token-bloom-filter) and [Bloom filter types](/optimize/skipping-indexes#skip-index-types) pages for more details.
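To make the guidance above concrete, here is a minimal, hypothetical sketch; the `logs` table, its columns, and the index parameters are illustrative rather than taken from the linked pages:

```sql
-- Hypothetical table with a token Bloom filter skip index on the message column.
-- The tokenbf_v1(bytes, hash_functions, seed) values are illustrative starting points.
CREATE TABLE logs
(
    `timestamp` DateTime,
    `message` String,
    INDEX message_tokens message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY timestamp;

-- hasToken matches whole tokens, so the index can skip granules whose
-- Bloom filter rules out the token 'timeout'.
SELECT count()
FROM logs
WHERE hasToken(message, 'timeout');
```

Equality, `IN`, and whole-word `LIKE` patterns on `message` can use the same index; an arbitrary mid-word substring cannot, which is the distinction the paragraph above draws.
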
import postgres_inserts from '@site/static/images/guides/postgres-inserts.png';
import Image from '@theme/IdealImage';

## Inserting into ClickHouse vs. OLTP databases {#inserting-into-clickhouse-vs-oltp-databases}
As an OLAP (Online Analytical Processing) database, ClickHouse is optimized for high performance and scalability, allowing potentially millions of rows to be inserted per second.
@@ -143,16 +115,149 @@ The native protocol does allow query progress to be easily tracked.
See [HTTP Interface](/interfaces/http) for further details.
## Basic example {#basic-example}

You can use the familiar `INSERT INTO TABLE` command with ClickHouse. Let's insert some data into the table that we created in the start guide ["Creating Tables in ClickHouse"](./creating-tables).

```sql
INSERT INTO helloworld.my_first_table (user_id, message, timestamp, metric) VALUES
    (101, 'Hello, ClickHouse!', now(), -1.0),
    (102, 'Insert a lot of rows per batch', yesterday(), 1.41421),
    (102, 'Sort your data based on your commonly-used queries', today(), 2.718),
    (101, 'Granules are the smallest chunks of data read', now() + 5, 3.14159)
```

To verify that worked, we'll run the following `SELECT` query:

```sql
SELECT * FROM helloworld.my_first_table
```

Which returns:

```response
user_id  message                                              timestamp            metric
101      Hello, ClickHouse!                                   2024-11-13 20:01:22       -1
101      Granules are the smallest chunks of data read        2024-11-13 20:01:27  3.14159
102      Insert a lot of rows per batch                       2024-11-12 00:00:00  1.41421
102      Sort your data based on your commonly-used queries   2024-11-13 00:00:00    2.718
```
## Loading data from Postgres {#loading-data-from-postgres}

For loading data from Postgres, users can use:

-- `PeerDB by ClickHouse`, an ETL tool specifically designed for PostgreSQL database replication. This is available in both:
-  - ClickHouse Cloud - available through our [new connector](/integrations/clickpipes/postgres) in ClickPipes, our managed ingestion service.
-  - Self-managed - via the [open-source project](https://github.com/PeerDB-io/peerdb).
+- `ClickPipes`, an ETL tool specifically designed for PostgreSQL database replication. This is available in both:
+  - ClickHouse Cloud - available through our [managed ingestion service](/integrations/clickpipes/postgres) in ClickPipes.
+  - Self-managed - via the [PeerDB open-source project](https://github.com/PeerDB-io/peerdb).
- The [PostgreSQL table engine](/integrations/postgresql#using-the-postgresql-table-engine) to read data directly, as shown in previous examples. Typically appropriate if batch replication based on a known watermark, e.g., a timestamp, is sufficient, or if it's a one-off migration. This approach can scale to tens of millions of rows. Users looking to migrate larger datasets should consider multiple requests, each dealing with a chunk of the data. Staging tables can be used for each chunk prior to its partitions being moved to a final table; this allows failed requests to be retried (a minimal sketch follows this list). For further details on this bulk-loading strategy, see here.
- Data can be exported from PostgreSQL in CSV format. This can then be inserted into ClickHouse from either local files or via object storage using table functions.
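As referenced above, here is a minimal sketch of the watermark-plus-staging-table approach using the `postgresql` table function; the connection details, table names, and the `updated_at` watermark column are illustrative assumptions, not values from this guide:

```sql
-- Hypothetical staging table with the same structure as the final table.
CREATE TABLE staging_orders AS final_orders;

-- Load one bounded chunk from Postgres, filtered on a known watermark column.
-- The host, credentials, database, table, and column names are placeholders.
INSERT INTO staging_orders
SELECT *
FROM postgresql('postgres-host:5432', 'mydb', 'orders', 'pg_user', 'pg_password')
WHERE updated_at >= '2024-01-01 00:00:00'
  AND updated_at <  '2024-02-01 00:00:00';

-- Once the chunk is verified, move its partition(s) into the final table;
-- a failed chunk can be retried by re-running it against the staging table.
ALTER TABLE staging_orders MOVE PARTITION tuple() TO TABLE final_orders;
```

Repeating this per chunk keeps each request bounded and lets a failed chunk be retried in isolation.
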

:::note Need help inserting large datasets?
If you need help inserting large datasets or encounter any errors when importing data into ClickHouse Cloud, please contact us at support@clickhouse.com and we can assist.
:::

## Inserting data from the command line {#inserting-data-from-command-line}

**Prerequisites**
- You have [installed](/install) ClickHouse
- `clickhouse-server` is running
- You have access to a terminal with `wget`, `zcat` and `curl`

In this example, you'll see how to insert a CSV file into ClickHouse from the command line using `clickhouse-client` in batch mode. For more information and further examples, see ["Batch mode"](/interfaces/cli#batch-mode).

We'll be using the [Hacker News dataset](/getting-started/example-datasets/hacker-news) for this example, which contains 28 million rows of Hacker News data.

<VerticalStepper headerLevel="h3">

### Download the CSV {#download-csv}

Run the following command to download a CSV version of the dataset from our public S3 bucket:
At 4.6 GB and 28 million rows, this compressed file should take 5-10 minutes to download.

### Create the table {#create-table}

With `clickhouse-server` running, you can create an empty table with the following schema directly from the command line using `clickhouse-client` in batch mode:
If there are no errors, then the table has been successfully created. In the command above, single quotes are used around the heredoc delimiter (`_EOF`) to prevent any interpolation. Without single quotes, it would be necessary to escape the backticks around the column names.

### Insert the data from the command line {#insert-data-via-cmd}

Next, run the command below to insert the data from the file you downloaded earlier into your table:

```bash
zcat < hacknernews.csv.gz | ./clickhouse client --query "INSERT INTO hackernews FORMAT CSV"
```

As our data is compressed, we need to first decompress the file using a tool like `gzip`, `zcat`, or similar, and then pipe the decompressed data into `clickhouse-client` with the appropriate `INSERT` statement and `FORMAT`.

:::note
When inserting data with clickhouse-client in interactive mode, it is possible to let ClickHouse handle the decompression for you on insert using the `COMPRESSION` clause. ClickHouse can automatically detect the compression type from the file extension, but you can also specify it explicitly.

The query to insert would then look like this:

```bash
clickhouse-client --query "INSERT INTO hackernews FROM INFILE 'hacknernews.csv.gz' COMPRESSION 'gzip' FORMAT CSV;"
```
:::
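For comparison, a minimal sketch of the interactive-mode statement that relies on the compression type being inferred from the `.gz` extension rather than specifying `COMPRESSION 'gzip'` explicitly (assuming the same `hackernews` table and downloaded file):

```sql
-- Run from an interactive clickhouse-client session.
-- The gzip codec is inferred from the .gz extension, so COMPRESSION can be omitted.
INSERT INTO hackernews
FROM INFILE 'hacknernews.csv.gz'
FORMAT CSV;
```
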

When the data has finished inserting, you can run the following command to see the number of rows in the `hackernews` table:

```bash
clickhouse-client --query "SELECT formatReadableQuantity(count(*)) FROM hackernews"
28.74 million
```

### Inserting data via command line with curl {#insert-using-curl}

In the previous steps, you first downloaded the CSV file to your local machine using `wget`. It is also possible to insert the data directly from the remote URL with a single command.

Run the following command to truncate the data from the `hackernews` table so that you can insert it again without the intermediate step of downloading to your local machine:

```bash
clickhouse-client --query "TRUNCATE hackernews"
```

Now run:

```bash
curl https://datasets-documentation.s3.eu-west-3.amazonaws.com/hackernews/hacknernews.csv.gz | zcat | clickhouse-client --query "INSERT INTO hackernews FORMAT CSV"
```

You can now run the same command as before to verify that the data was inserted again:

```bash
clickhouse-client --query "SELECT formatReadableQuantity(count(*)) FROM hackernews"
```

docs/integrations/data-ingestion/clickpipes/mongodb/lifecycle.md (0 additions & 6 deletions)

@@ -57,9 +57,3 @@ This state applies to snapshot-only pipes and indicates that the snapshot has be

## Failed {#failed}

If there is an irrecoverable error in the pipe, it will enter the `Failed` state. You can reach out to support or [resync](./resync) your pipe to recover from this state.

-## Degraded {#degraded}
-
-:::note
-This state is coming soon. If you're using our [OpenAPI](https://clickhouse.com/docs/cloud/manage/openapi), consider adding support for it now to ensure your integration continues working when it's released.

docs/integrations/data-ingestion/clickpipes/mysql/lifecycle.md (0 additions & 6 deletions)

@@ -57,9 +57,3 @@ This state applies to snapshot-only pipes and indicates that the snapshot has be

## Failed {#failed}

If there is an irrecoverable error in the pipe, it will enter the `Failed` state. You can reach out to support or [resync](./resync) your pipe to recover from this state.

-## Degraded {#degraded}
-
-:::note
-This state is coming soon. If you're using our [OpenAPI](https://clickhouse.com/docs/cloud/manage/openapi), consider adding support for it now to ensure your integration continues working when it's released.

docs/integrations/data-ingestion/clickpipes/postgres/lifecycle.md (0 additions & 6 deletions)

@@ -57,9 +57,3 @@ This state applies to snapshot-only pipes and indicates that the snapshot has be

## Failed {#failed}

If there is an irrecoverable error in the pipe, it will enter the `Failed` state. You can reach out to support or [resync](./resync) your pipe to recover from this state.

-## Degraded {#degraded}
-
-:::note
-This state is coming soon. If you're using our [OpenAPI](https://clickhouse.com/docs/cloud/manage/openapi), consider adding support for it now to ensure your integration continues working when it's released.

docs/integrations/data-ingestion/insert-local-files.md (1 addition & 1 deletion)

@@ -43,7 +43,7 @@ ENGINE = MergeTree
ORDER BY toYYYYMMDD(timestamp)
```

-3. We want to lowercase the `author` column, which is easily done with the [`lower` function](/sql-reference/functions/string-functions#lower). We also want to split the `comment` string into tokens and store the result in the `tokens` column, which can be done using the [`extractAll` function](/sql-reference/functions/string-search-functions#extractall). You do all of this in one `clickhouse-client` command - notice how the `comments.tsv` file is piped into the `clickhouse-client` using the `<` operator:
+3. We want to lowercase the `author` column, which is easily done with the [`lower` function](/sql-reference/functions/string-functions#lower). We also want to split the `comment` string into tokens and store the result in the `tokens` column, which can be done using the [`extractAll` function](/sql-reference/functions/string-search-functions#extractAll). You do all of this in one `clickhouse-client` command - notice how the `comments.tsv` file is piped into the `clickhouse-client` using the `<` operator:
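As a standalone illustration of the two functions used in that step, here is a minimal sketch; the literal string and the tokenization pattern are invented for the example:

```sql
-- lower() normalizes case; extractAll() returns every non-overlapping match
-- of the pattern, splitting the text into lowercase alphanumeric tokens.
SELECT extractAll(lower('ClickHouse is FAST, really fast!'), '[a-z0-9]+') AS tokens
-- Returns: ['clickhouse', 'is', 'fast', 'really', 'fast']
```
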

docs/integrations/data-ingestion/kafka/kafka-clickhouse-connect-sink.md (79 additions & 6 deletions)

@@ -310,20 +310,93 @@ For additional details check out the official [tutorial](https://docs.confluent.
ClickHouse Kafka Connect reports runtime metrics via [Java Management Extensions (JMX)](https://www.oracle.com/technical-resources/articles/javase/jmx.html). JMX is enabled in Kafka Connector by default.
The connector integrates with the Kafka Connect framework and exposes metrics for task lifecycle and error tracking.

**Task Status Metrics:**
- `task-count`: Total number of tasks in the connector
- `running-task-count`: Number of tasks currently running
- `paused-task-count`: Number of tasks currently paused
- `failed-task-count`: Number of tasks that have failed
- `destroyed-task-count`: Number of destroyed tasks
- `unassigned-task-count`: Number of unassigned tasks

Task status values include: `running`, `paused`, `failed`, `destroyed`, `unassigned`

**Error Metrics:**
- `deadletterqueue-produce-failures`: Number of failed DLQ writes
- `deadletterqueue-produce-requests`: Total DLQ write attempts
- `last-error-timestamp`: Timestamp of the last error
- `records-skip-total`: Total number of records skipped due to errors
- `records-retry-total`: Total number of records that were retried
- `errors-total`: Total number of errors encountered

**Performance Metrics:**
- `offset-commit-failures`: Number of failed offset commits
- `offset-commit-avg-time-ms`: Average time for offset commits
- `offset-commit-max-time-ms`: Maximum time for offset commits
- `put-batch-avg-time-ms`: Average time to process a batch
- `put-batch-max-time-ms`: Maximum time to process a batch
- `source-record-poll-total`: Total records polled

#### Monitoring Best Practices {#monitoring-best-practices}

1. **Monitor Consumer Lag**: Track `records-lag` per partition to identify processing bottlenecks
2. **Track Error Rates**: Watch `errors-total` and `records-skip-total` to detect data quality issues
3. **Observe Task Health**: Monitor task status metrics to ensure tasks are running properly
4. **Measure Throughput**: Use `records-send-rate` and `byte-rate` to track ingestion performance
5. **Monitor Connection Health**: Check node-level connection metrics for network issues
6. **Track Compression Efficiency**: Use `compression-rate` to optimize data transfer

For detailed JMX metric definitions and Prometheus integration, see the [jmx-export-connector.yml](https://github.com/ClickHouse/clickhouse-kafka-connect/blob/main/jmx-export-connector.yml) configuration file.

-A similar query for the unstructured logs requires the use of regular expressions via the [`extractAllGroupsVertical` function](/sql-reference/functions/string-search-functions#extractallgroupsvertical).
+A similar query for the unstructured logs requires the use of regular expressions via the `extractAllGroupsVertical` function.
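For illustration, a minimal hypothetical example of `extractAllGroupsVertical` pulling key-value pairs out of an unstructured log line; the log text and the regular expression are invented:

```sql
-- Each element of the result is one match, with its capture groups kept together:
-- [['status','200'], ['method','GET'], ['path','/index.html']]
SELECT extractAllGroupsVertical(
    'status=200 method=GET path=/index.html',
    '(\\w+)=([\\w/\\.]+)'
) AS groups
```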
0 commit comments