Commit 97ce61c

Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs into clickstack-nginx-guide
2 parents: b60b7c1 + a214d55

45 files changed, +347 -105 lines changed


docs/cloud/features/08_backups/backups-to-own-cloud/backup_restore_from_ui.md

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ AWS uses role based authentication, so create an IAM role that the ClickHouse Cl
         "AWS": "arn:aws:iam::463754717262:role/CH-S3-bordeaux-ar-90-ue2-29-Role"
       },
       "Action": "sts:AssumeRole"
-    },
+    }
   ]
 }
 ```
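For orientation, a minimal sketch of how a trust policy like the one above can be attached when creating the role with the AWS CLI; the role name and the local policy file name are illustrative assumptions:

```bash
# Illustrative sketch: create the IAM role from a local copy of the trust policy shown above.
# The role name and the trust-policy.json file name are placeholders, not values from this repo.
aws iam create-role \
  --role-name clickhouse-backup-access \
  --assume-role-policy-document file://trust-policy.json
```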

docs/guides/best-practices/skipping-indexes-examples.md

Lines changed: 1 addition & 1 deletion
@@ -128,7 +128,7 @@ See [parameter docs](/engines/table-engines/mergetree-family/mergetree#n-gram-bl
 
 ## Token Bloom filter (tokenbf\_v1) for word-based search {#token-bloom-filter-tokenbf-v1-for-word-based-search}
 
-`tokenbf_v1` indexes tokens separated by non-alphanumeric characters. You should use it with [`hasToken`](/sql-reference/functions/string-search-functions#hastoken), `LIKE` word patterns or equals/IN. It supports `String`/`FixedString`/`Map` types.
+`tokenbf_v1` indexes tokens separated by non-alphanumeric characters. You should use it with [`hasToken`](/sql-reference/functions/string-search-functions#hasToken), `LIKE` word patterns or equals/IN. It supports `String`/`FixedString`/`Map` types.
 
 See [Token bloom filter](/engines/table-engines/mergetree-family/mergetree#token-bloom-filter) and [Bloom filter types](/optimize/skipping-indexes#skip-index-types) pages for more details.
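As a quick illustration of the `hasToken` usage behind the corrected link anchor, here is a minimal sketch of a table with a `tokenbf_v1` skip index queried through `clickhouse-client`; the table name, index parameters, and search term are illustrative assumptions:

```bash
# Illustrative sketch: a tokenbf_v1 skip index queried with hasToken.
# Table name, index parameters (filter size, hash count, seed), and the token are placeholders.
clickhouse-client --multiquery <<'_EOF'
CREATE TABLE token_demo
(
    `message` String,
    INDEX message_tokens message TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY tuple();

INSERT INTO token_demo VALUES ('disk error on node-3'), ('backup completed');

-- hasToken can use the token Bloom filter to skip granules that cannot contain the token.
SELECT count() FROM token_demo WHERE hasToken(message, 'error');
_EOF
```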

docs/guides/inserting-data.md

Lines changed: 136 additions & 31 deletions
@@ -11,34 +11,6 @@ doc_type: 'guide'
 import postgres_inserts from '@site/static/images/guides/postgres-inserts.png';
 import Image from '@theme/IdealImage';
 
-## Basic example {#basic-example}
-
-You can use the familiar `INSERT INTO TABLE` command with ClickHouse. Let's insert some data into the table that we created in the start guide ["Creating Tables in ClickHouse"](./creating-tables).
-
-```sql
-INSERT INTO helloworld.my_first_table (user_id, message, timestamp, metric) VALUES
-(101, 'Hello, ClickHouse!', now(), -1.0 ),
-(102, 'Insert a lot of rows per batch', yesterday(), 1.41421 ),
-(102, 'Sort your data based on your commonly-used queries', today(), 2.718 ),
-(101, 'Granules are the smallest chunks of data read', now() + 5, 3.14159 )
-```
-
-To verify that worked, we'll run the following `SELECT` query:
-
-```sql
-SELECT * FROM helloworld.my_first_table
-```
-
-Which returns:
-
-```response
-user_id message timestamp metric
-101 Hello, ClickHouse! 2024-11-13 20:01:22 -1
-101 Granules are the smallest chunks of data read 2024-11-13 20:01:27 3.14159
-102 Insert a lot of rows per batch 2024-11-12 00:00:00 1.41421
-102 Sort your data based on your commonly-used queries 2024-11-13 00:00:00 2.718
-```
-
 ## Inserting into ClickHouse vs. OLTP databases {#inserting-into-clickhouse-vs-oltp-databases}
 
 As an OLAP (Online Analytical Processing) database, ClickHouse is optimized for high performance and scalability, allowing potentially millions of rows to be inserted per second.
@@ -143,16 +115,149 @@ The native protocol does allow query progress to be easily tracked.
 
 See [HTTP Interface](/interfaces/http) for further details.
 
+## Basic example {#basic-example}
+
+You can use the familiar `INSERT INTO TABLE` command with ClickHouse. Let's insert some data into the table that we created in the start guide ["Creating Tables in ClickHouse"](./creating-tables).
+
+```sql
+INSERT INTO helloworld.my_first_table (user_id, message, timestamp, metric) VALUES
+(101, 'Hello, ClickHouse!', now(), -1.0 ),
+(102, 'Insert a lot of rows per batch', yesterday(), 1.41421 ),
+(102, 'Sort your data based on your commonly-used queries', today(), 2.718 ),
+(101, 'Granules are the smallest chunks of data read', now() + 5, 3.14159 )
+```
+
+To verify that worked, we'll run the following `SELECT` query:
+
+```sql
+SELECT * FROM helloworld.my_first_table
+```
+
+Which returns:
+
+```response
+user_id message timestamp metric
+101 Hello, ClickHouse! 2024-11-13 20:01:22 -1
+101 Granules are the smallest chunks of data read 2024-11-13 20:01:27 3.14159
+102 Insert a lot of rows per batch 2024-11-12 00:00:00 1.41421
+102 Sort your data based on your commonly-used queries 2024-11-13 00:00:00 2.718
+```
+
 ## Loading data from Postgres {#loading-data-from-postgres}
 
 For loading data from Postgres, users can use:
 
-- `PeerDB by ClickHouse`, an ETL tool specifically designed for PostgreSQL database replication. This is available in both:
-  - ClickHouse Cloud - available through our [new connector](/integrations/clickpipes/postgres) in ClickPipes, our managed ingestion service.
-  - Self-managed - via the [open-source project](https://github.com/PeerDB-io/peerdb).
+- `ClickPipes`, an ETL tool specifically designed for PostgreSQL database replication. This is available in both:
+  - ClickHouse Cloud - available through our [managed ingestion service](/integrations/clickpipes/postgres) in ClickPipes.
+  - Self-managed - via the [PeerDB open-source project](https://github.com/PeerDB-io/peerdb).
 - The [PostgreSQL table engine](/integrations/postgresql#using-the-postgresql-table-engine) to read data directly as shown in previous examples. Typically appropriate if batch replication based on a known watermark, e.g., timestamp, is sufficient or if it's a one-off migration. This approach can scale to 10's of millions of rows. Users looking to migrate larger datasets should consider multiple requests, each dealing with a chunk of the data. Staging tables can be used for each chunk prior to its partitions being moved to a final table. This allows failed requests to be retried. For further details on this bulk-loading strategy, see here.
 - Data can be exported from PostgreSQL in CSV format. This can then be inserted into ClickHouse from either local files or via object storage using table functions.
 
 :::note Need help inserting large datasets?
 If you need help inserting large datasets or encounter any errors when importing data into ClickHouse Cloud, please contact us at support@clickhouse.com and we can assist.
 :::
+
+## Inserting data from the command line {#inserting-data-from-command-line}
+
+**Prerequisites**
+- You have [installed](/install) ClickHouse
+- `clickhouse-server` is running
+- You have access to a terminal with `wget`, `zcat` and `curl`
+
+In this example you'll see how to insert a CSV file into ClickHouse from the command line using clickhouse-client in batch mode. For more information and examples of inserting data via command line using clickhouse-client in batch mode, see ["Batch mode"](/interfaces/cli#batch-mode).
+
+We'll be using the [Hacker News dataset](/getting-started/example-datasets/hacker-news) for this example, which contains 28 million rows of Hacker News data.
+
+<VerticalStepper headerLevel="h3">
+
+### Download the CSV {#download-csv}
+
+Run the following command to download a CSV version of the dataset from our public S3 bucket:
+
+```bash
+wget https://datasets-documentation.s3.eu-west-3.amazonaws.com/hackernews/hacknernews.csv.gz
+```
+
+At 4.6GB, and 28m rows, this compressed file should take 5-10 minutes to download.
+
+### Create the table {#create-table}
+
+With `clickhouse-server` running, you can create an empty table with the following schema directly from the command line using `clickhouse-client` in batch mode:
+
+```bash
+clickhouse-client <<'_EOF'
+CREATE TABLE hackernews(
+    `id` UInt32,
+    `deleted` UInt8,
+    `type` Enum('story' = 1, 'comment' = 2, 'poll' = 3, 'pollopt' = 4, 'job' = 5),
+    `by` LowCardinality(String),
+    `time` DateTime,
+    `text` String,
+    `dead` UInt8,
+    `parent` UInt32,
+    `poll` UInt32,
+    `kids` Array(UInt32),
+    `url` String,
+    `score` Int32,
+    `title` String,
+    `parts` Array(UInt32),
+    `descendants` Int32
+)
+ENGINE = MergeTree
+ORDER BY id
+_EOF
+```
+
+If there are no errors, then the table has been successfully created. In the command above single quotes are used around the heredoc delimiter (`_EOF`) to prevent any interpolation. Without single quotes it would be necessary to escape the backticks around the column names.
+
+### Insert the data from the command line {#insert-data-via-cmd}
+
+Next run the command below to insert the data from the file you downloaded earlier into your table:
+
+```bash
+zcat < hacknernews.csv.gz | ./clickhouse client --query "INSERT INTO hackernews FORMAT CSV"
+```
+
+As our data is compressed, we need to first decompress the file using a tool like `gzip`, `zcat`, or similar, and then pipe the decompressed data into `clickhouse-client` with the appropriate `INSERT` statement and `FORMAT`.
+
+:::note
+When inserting data with clickhouse-client in interactive mode, it is possible to let ClickHouse handle the decompression for you on insert using the `COMPRESSION` clause. ClickHouse can automatically detect the compression type from the file extension, but you can also specify it explicitly.
+
+The query to insert would then look like this:
+
+```bash
+clickhouse-client --query "INSERT INTO hackernews FROM INFILE 'hacknernews.csv.gz' COMPRESSION 'gzip' FORMAT CSV;"
+```
+:::
+
+When the data has finished inserting you can run the following command to see the number of rows in the `hackernews` table:
+
+```bash
+clickhouse-client --query "SELECT formatReadableQuantity(count(*)) FROM hackernews"
+28.74 million
+```
+
+### Inserting data via command line with curl {#insert-using-curl}
+
+In the previous steps you first downloaded the csv file to your local machine using `wget`. It is also possible to directly insert the data from the remote URL using a single command.
+
+Run the following command to truncate the data from the `hackernews` table so that you can insert it again without the intermediate step of downloading to your local machine:
+
+```bash
+clickhouse-client --query "TRUNCATE hackernews"
+```
+
+Now run:
+
+```bash
+curl https://datasets-documentation.s3.eu-west-3.amazonaws.com/hackernews/hacknernews.csv.gz | zcat | clickhouse-client --query "INSERT INTO hackernews FORMAT CSV"
+```
+
+You can now run the same command as previously to verify that the data was inserted again:
+
+```bash
+clickhouse-client --query "SELECT formatReadableQuantity(count(*)) FROM hackernews"
+28.74 million
+```
+
+</VerticalStepper>
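For reference alongside the native-client steps above, here is a minimal sketch of the same CSV insert streamed to the HTTP interface mentioned earlier in the page; the host, default port 8123, and absence of credentials are assumptions:

```bash
# Illustrative sketch: stream the decompressed CSV to the HTTP interface with curl.
# localhost:8123 with no authentication is an assumption; adjust host, port, and credentials as needed.
zcat < hacknernews.csv.gz | curl 'http://localhost:8123/?query=INSERT%20INTO%20hackernews%20FORMAT%20CSV' --data-binary @-
```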

docs/integrations/data-ingestion/clickpipes/mongodb/lifecycle.md

Lines changed: 0 additions & 6 deletions
@@ -57,9 +57,3 @@ This state applies to snapshot-only pipes and indicates that the snapshot has be
 ## Failed {#failed}
 
 If there is an irrecoverable error in the pipe, it will enter the `Failed` state. You can reach out to support or [resync](./resync) your pipe to recover from this state.
-
-## Degraded {#degraded}
-
-:::note
-This state is coming soon. If you're using our [OpenAPI](https://clickhouse.com/docs/cloud/manage/openapi), consider adding support for it now to ensure your integration continues working when it's released.
-:::

docs/integrations/data-ingestion/clickpipes/mysql/lifecycle.md

Lines changed: 0 additions & 6 deletions
@@ -57,9 +57,3 @@ This state applies to snapshot-only pipes and indicates that the snapshot has be
 ## Failed {#failed}
 
 If there is an irrecoverable error in the pipe, it will enter the `Failed` state. You can reach out to support or [resync](./resync) your pipe to recover from this state.
-
-## Degraded {#degraded}
-
-:::note
-This state is coming soon. If you're using our [OpenAPI](https://clickhouse.com/docs/cloud/manage/openapi), consider adding support for it now to ensure your integration continues working when it's released.
-:::

docs/integrations/data-ingestion/clickpipes/postgres/lifecycle.md

Lines changed: 0 additions & 6 deletions
@@ -57,9 +57,3 @@ This state applies to snapshot-only pipes and indicates that the snapshot has be
 ## Failed {#failed}
 
 If there is an irrecoverable error in the pipe, it will enter the `Failed` state. You can reach out to support or [resync](./resync) your pipe to recover from this state.
-
-## Degraded {#degraded}
-
-:::note
-This state is coming soon. If you're using our [OpenAPI](https://clickhouse.com/docs/cloud/manage/openapi), consider adding support for it now to ensure your integration continues working when it's released.
-:::

docs/integrations/data-ingestion/insert-local-files.md

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ ENGINE = MergeTree
 ORDER BY toYYYYMMDD(timestamp)
 ```
 
-3. We want to lowercase the `author` column, which is easily done with the [`lower` function](/sql-reference/functions/string-functions#lower). We also want to split the `comment` string into tokens and store the result in the `tokens` column, which can be done using the [`extractAll` function](/sql-reference/functions/string-search-functions#extractall). You do all of this in one `clickhouse-client` command - notice how the `comments.tsv` file is piped into the `clickhouse-client` using the `<` operator:
+3. We want to lowercase the `author` column, which is easily done with the [`lower` function](/sql-reference/functions/string-functions#lower). We also want to split the `comment` string into tokens and store the result in the `tokens` column, which can be done using the [`extractAll` function](/sql-reference/functions/string-search-functions#extractAll). You do all of this in one `clickhouse-client` command - notice how the `comments.tsv` file is piped into the `clickhouse-client` using the `<` operator:
 
 ```bash
 clickhouse-client \

docs/integrations/data-ingestion/kafka/kafka-clickhouse-connect-sink.md

Lines changed: 79 additions & 6 deletions
@@ -310,20 +310,93 @@ For additional details check out the official [tutorial](https://docs.confluent.
 
 ClickHouse Kafka Connect reports runtime metrics via [Java Management Extensions (JMX)](https://www.oracle.com/technical-resources/articles/javase/jmx.html). JMX is enabled in Kafka Connector by default.
 
-ClickHouse Connect `MBeanName`:
+#### ClickHouse-Specific Metrics {#clickhouse-specific-metrics}
+
+The connector exposes custom metrics via the following MBean name:
 
 ```java
 com.clickhouse:type=ClickHouseKafkaConnector,name=SinkTask{id}
 ```
 
-ClickHouse Kafka Connect reports the following metrics:
-
-| Name | Type | Description |
-|----------------------|------|-----------------------------------------------------------------------------------------|
-| `receivedRecords` | long | The total number of records received. |
+| Metric Name | Type | Description |
+|-----------------------|------|-----------------------------------------------------------------------------------------|
+| `receivedRecords` | long | The total number of records received. |
 | `recordProcessingTime` | long | Total time in nanoseconds spent grouping and converting records to a unified structure. |
 | `taskProcessingTime` | long | Total time in nanoseconds spent processing and inserting data into ClickHouse. |
 
+#### Kafka Producer/Consumer Metrics {#kafka-producer-consumer-metrics}
+
+The connector exposes standard Kafka producer and consumer metrics that provide insights into data flow, throughput, and performance.
+
+**Topic-Level Metrics:**
+- `records-sent-total`: Total number of records sent to the topic
+- `bytes-sent-total`: Total bytes sent to the topic
+- `record-send-rate`: Average rate of records sent per second
+- `byte-rate`: Average bytes sent per second
+- `compression-rate`: Compression ratio achieved
+
+**Partition-Level Metrics:**
+- `records-sent-total`: Total records sent to the partition
+- `bytes-sent-total`: Total bytes sent to the partition
+- `records-lag`: Current lag in the partition
+- `records-lead`: Current lead in the partition
+- `replica-fetch-lag`: Lag information for replicas
+
+**Node-Level Connection Metrics:**
+- `connection-creation-total`: Total connections created to the Kafka node
+- `connection-close-total`: Total connections closed
+- `request-total`: Total requests sent to the node
+- `response-total`: Total responses received from the node
+- `request-rate`: Average request rate per second
+- `response-rate`: Average response rate per second
+
+These metrics help monitor:
+- **Throughput**: Track data ingestion rates
+- **Lag**: Identify bottlenecks and processing delays
+- **Compression**: Measure data compression efficiency
+- **Connection Health**: Monitor network connectivity and stability
+
+#### Kafka Connect Framework Metrics {#kafka-connect-framework-metrics}
+
+The connector integrates with the Kafka Connect framework and exposes metrics for task lifecycle and error tracking.
+
+**Task Status Metrics:**
+- `task-count`: Total number of tasks in the connector
+- `running-task-count`: Number of tasks currently running
+- `paused-task-count`: Number of tasks currently paused
+- `failed-task-count`: Number of tasks that have failed
+- `destroyed-task-count`: Number of destroyed tasks
+- `unassigned-task-count`: Number of unassigned tasks
+
+Task status values include: `running`, `paused`, `failed`, `destroyed`, `unassigned`
+
+**Error Metrics:**
+- `deadletterqueue-produce-failures`: Number of failed DLQ writes
+- `deadletterqueue-produce-requests`: Total DLQ write attempts
+- `last-error-timestamp`: Timestamp of the last error
+- `records-skip-total`: Total number of records skipped due to errors
+- `records-retry-total`: Total number of records that were retried
+- `errors-total`: Total number of errors encountered
+
+**Performance Metrics:**
+- `offset-commit-failures`: Number of failed offset commits
+- `offset-commit-avg-time-ms`: Average time for offset commits
+- `offset-commit-max-time-ms`: Maximum time for offset commits
+- `put-batch-avg-time-ms`: Average time to process a batch
+- `put-batch-max-time-ms`: Maximum time to process a batch
+- `source-record-poll-total`: Total records polled
+
+#### Monitoring Best Practices {#monitoring-best-practices}
+
+1. **Monitor Consumer Lag**: Track `records-lag` per partition to identify processing bottlenecks
+2. **Track Error Rates**: Watch `errors-total` and `records-skip-total` to detect data quality issues
+3. **Observe Task Health**: Monitor task status metrics to ensure tasks are running properly
+4. **Measure Throughput**: Use `record-send-rate` and `byte-rate` to track ingestion performance
+5. **Monitor Connection Health**: Check node-level connection metrics for network issues
+6. **Track Compression Efficiency**: Use `compression-rate` to optimize data transfer
+
+For detailed JMX metric definitions and Prometheus integration, see the [jmx-export-connector.yml](https://github.com/ClickHouse/clickhouse-kafka-connect/blob/main/jmx-export-connector.yml) configuration file.
+
 ### Limitations {#limitations}
 
 - Deletes are not supported.
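As a practical companion to the JMX metrics documented above, here is a minimal sketch of exposing them to Prometheus with the JMX exporter Java agent; the agent jar path, port 9404, and worker script locations are assumptions, while the YAML file is the one linked above:

```bash
# Illustrative sketch: start a Kafka Connect worker with the Prometheus JMX exporter agent
# so the connector metrics described above can be scraped over HTTP.
# The agent jar path, port 9404, and script/config paths are placeholders.
export KAFKA_OPTS="-javaagent:/opt/jmx_prometheus_javaagent.jar=9404:/opt/jmx-export-connector.yml"
./bin/connect-distributed.sh config/connect-distributed.properties

# Verify the exporter is serving metrics:
curl -s http://localhost:9404/metrics | grep -i clickhouse
```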

docs/integrations/data-sources/redis.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ sidebar_label: 'Redis'
 title: 'Redis'
 description: 'Page describing the Redis table function'
 doc_type: 'reference'
-show_title: false
+hide_title: true
 keywords: ['redis', 'cache', 'integration', 'data source', 'key-value store']
 ---

docs/use-cases/observability/build-your-own/schema-design.md

Lines changed: 1 addition & 1 deletion
@@ -115,7 +115,7 @@ Body: 151.233.185.144 - - [22/Jan/2019:19:08:54 +0330] "GET /image/105
 LogAttributes: {'log.file.name':'access-unstructured.log'}
 ```
 
-A similar query for the unstructured logs requires the use of regular expressions via the [`extractAllGroupsVertical` function](/sql-reference/functions/string-search-functions#extractallgroupsvertical).
+A similar query for the unstructured logs requires the use of regular expressions via the `extractAllGroupsVertical` function.
 
 ```sql
 SELECT
