[Doc][Improve] connector-v2 clickhouse/hbase/pulsar sink support chinese (#6811)

* [Doc][Improve] connector-v2 clickhouse/hbase/pulsar sink support chinese
* update doc style

---------

Co-authored-by: fanchengbo <fanchengbo@dobest.com>
apache · May 15, 2024 · 874f904
1 parent 4b6c13e
commit 874f904
Showing 3 changed files with 469 additions and 0 deletions.
diff --git a/docs/zh/connector-v2/sink/Clickhouse.md b/docs/zh/connector-v2/sink/Clickhouse.md
@@ -0,0 +1,179 @@
+# Clickhouse
+
+> Clickhouse data connector
+
+## Supported engines
+
+> Spark<br/>
+> Flink<br/>
+> SeaTunnel Zeta<br/>
+
+## Key features
+
+- [ ] [exactly-once](../../concept/connector-v2-features.md)
+- [x] [cdc](../../concept/connector-v2-features.md)
+
+> The Clickhouse sink plugin can achieve exactly-once semantics through idempotent writes; this requires a table engine that supports deduplication, such as AggregatingMergeTree.
+
+## Description
+
+Writes data to Clickhouse.
+
+## Supported data source info
+
+The following dependencies are required to use the Clickhouse connector. They can be downloaded via install-plugin.sh or from the Maven central repository.
+
+| Data source | Supported versions |                                                 Dependency                                                 |
+|-------------|--------------------|------------------------------------------------------------------------------------------------------------|
+| Clickhouse  | universal          | [Download](https://mvnrepository.com/artifact/org.apache.seatunnel/seatunnel-connectors-v2/connector-clickhouse) |
+
+## Data type mapping
+
+| SeaTunnel data type |                                                             Clickhouse data type                                                              |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------------------|
+| STRING         | String / Int128 / UInt128 / Int256 / UInt256 / Point / Ring / Polygon / MultiPolygon                                                          |
+| INT            | Int8 / UInt8 / Int16 / UInt16 / Int32                                                                                                         |
+| BIGINT         | UInt64 / Int64 / IntervalYear / IntervalQuarter / IntervalMonth / IntervalWeek / IntervalDay / IntervalHour / IntervalMinute / IntervalSecond |
+| DOUBLE         | Float64                                                                                                                                       |
+| DECIMAL        | Decimal                                                                                                                                       |
+| FLOAT          | Float32                                                                                                                                       |
+| DATE           | Date                                                                                                                                          |
+| TIME           | DateTime                                                                                                                                      |
+| ARRAY          | Array                                                                                                                                         |
+| MAP            | Map                                                                                                                                           |
+
+## Sink options
+
+| Name                                  | Type    | Required | Default | Description |
+|---------------------------------------|---------|----------|---------|-------------|
+| host                                  | String  | Yes      | -       | `ClickHouse` cluster address in `host:port` format. Multiple hosts may be specified, for example `"host1:8123,host2:8123"`. |
+| database                              | String  | Yes      | -       | `ClickHouse` database name. |
+| table                                 | String  | Yes      | -       | Table name. |
+| username                              | String  | Yes      | -       | `ClickHouse` username. |
+| password                              | String  | Yes      | -       | `ClickHouse` password. |
+| clickhouse.config                     | Map     | No       |         | In addition to the required parameters above, which are passed to `clickhouse-jdbc`, users can specify any number of optional parameters, covering all [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) provided by `clickhouse-jdbc`. |
+| bulk_size                             | String  | No       | 20000   | Number of rows written per batch through [Clickhouse-jdbc](https://github.com/ClickHouse/clickhouse-jdbc); the default is 20000. |
+| split_mode                            | String  | No       | false   | This mode only supports `clickhouse` tables whose engine is `Distributed`, and the `internal_replication` option should be `true`. SeaTunnel splits the distributed table's data and writes directly to each shard. The shard weights defined in `clickhouse` are taken into account. |
+| sharding_key                          | String  | No       | -       | With `split_mode`, the node that receives each row must be chosen. By default it is chosen at random, but the `sharding_key` parameter can name the field used by the sharding algorithm. This option only takes effect when `split_mode` is `true`. |
+| primary_key                           | String  | No       | -       | Marks the primary key column of the `clickhouse` table; INSERT/UPDATE/DELETE operations on the `clickhouse` table are executed based on this key. |
+| support_upsert                        | Boolean | No       | false   | Support upserting rows by querying the primary key. |
+| allow_experimental_lightweight_delete | Boolean | No       | false   | Allow experimental lightweight delete on `MergeTree` table engines. |
+| common-options                        |         | No       | -       | Common sink plugin parameters; see [Sink Common Options](common-options.md). |
+
+## How to create a Clickhouse sync job
+
+The following example shows how to create a data synchronization job that writes randomly generated data to a Clickhouse database.
+
+```hocon
+# Set the basic configuration of the task to be performed
+env {
+  parallelism = 1
+  job.mode = "BATCH"
+  checkpoint.interval  = 1000
+}
+
+source {
+  FakeSource {
+      row.num = 2
+      bigint.min = 0
+      bigint.max = 10000000
+      split.num = 1
+      split.read-interval = 300
+      schema {
+        fields {
+          c_bigint = bigint
+        }
+      }
+    }
+}
+
+sink {
+  Clickhouse {
+    host = "127.0.0.1:8123"
+    database = "default"
+    table = "test"
+    username = "xxxxx"
+    password = "xxxxx"
+  }
+}
+```
+
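+### Tips
+
+> 1.[SeaTunnel deployment document](../../start-v2/locally/deployment.md). <br/>
+> 2.The target table must be created before synchronization.<br/>
+> 3.When writing to a ClickHouse table, there is no need to configure its schema, because the connector queries ClickHouse for the current table's schema before writing.<br/>
+
+## Clickhouse sink configuration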
+
+```hocon
+sink {
+  Clickhouse {
+    host = "localhost:8123"
+    database = "default"
+    table = "fake_all"
+    username = "xxxxx"
+    password = "xxxxx"
+    clickhouse.config = {
+      max_rows_to_read = "100"
+      read_overflow_mode = "throw"
+    }
+  }
+}
+```
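+
+The `bulk_size` option from the options table is not used in any of the examples here. A minimal sketch of how it might be set, reusing the placeholder host, table, and credentials from the block above; the value 50000 is an assumed, purely illustrative number, not a recommendation:
+
+```hocon
+sink {
+  Clickhouse {
+    host = "localhost:8123"
+    database = "default"
+    table = "fake_all"
+    username = "xxxxx"
+    password = "xxxxx"
+
+    # assumed illustrative value; the documented default is 20000 rows per write
+    bulk_size = 50000
+  }
+}
+```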
+
+## Split mode
+
+```hocon
+sink {
+  Clickhouse {
+    host = "localhost:8123"
+    database = "default"
+    table = "fake_all"
+    username = "xxxxx"
+    password = "xxxxx"
+    
+    # split mode options
+    split_mode = true
+    sharding_key = "age"
+  }
+}
+```
+
+## CDC(Change data capture) Sink
+
+```hocon
+sink {
+  Clickhouse {
+    host = "localhost:8123"
+    database = "default"
+    table = "fake_all"
+    username = "xxxxx"
+    password = "xxxxx"
+    
+    # cdc options
+    primary_key = "id"
+    support_upsert = true
+  }
+}
+```
+
+## CDC(Change data capture) for *MergeTree engine
+
+```hocon
+sink {
+  Clickhouse {
+    host = "localhost:8123"
+    database = "default"
+    table = "fake_all"
+    username = "xxxxx"
+    password = "xxxxx"
+    
+    # cdc options
+    primary_key = "id"
+    support_upsert = true
+    allow_experimental_lightweight_delete = true
+  }
+}
+```
+
diff --git a/docs/zh/connector-v2/sink/Hbase.md b/docs/zh/connector-v2/sink/Hbase.md
@@ -0,0 +1,122 @@
+# Hbase
+
+> Hbase data connector
+
+## Description
+
+Writes data to Hbase.
+
+## Key features
+
+- [ ] [exactly-once](../../concept/connector-v2-features.md)
+
+## Options
+
+|        name        |  type   | required |  default value  |
+|--------------------|---------|------|-----------------|
+| zookeeper_quorum   | string  | yes  | -               |
+| table              | string  | yes  | -               |
+| rowkey_column      | list    | yes  | -               |
+| family_name        | config  | yes  | -               |
+| rowkey_delimiter   | string  | no   | ""              |
+| version_column     | string  | no   | -               |
+| null_mode          | string  | no   | skip            |
+| wal_write          | boolean | no   | false           |
+| write_buffer_size  | int     | no   | 8 * 1024 * 1024 |
+| encoding           | string  | no   | utf8            |
+| hbase_extra_config | config  | no   | -               |
+| common-options     |         | no   | -               |
+
+### zookeeper_quorum [string]
+
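+The ZooKeeper cluster hosts of Hbase, for example: "hadoop001:2181,hadoop002:2181,hadoop003:2181"
+
+### table [string]
+
+The name of the table to write to, for example: "seatunnel"
+
+### rowkey_column [list]
+
+The list of column names that form the row key, for example: ["id", "uuid"]
+
+### family_name [config]
+
+The mapping from fields to column family names. For example, suppose an upstream row looks like this:
+
+| id |     name      | age |
+|----|---------------|-----|
+| 1  | tyrantlucifer | 27  |
+
+To use id as the row key and write the other fields to different column families, you can configure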
+
+family_name {
+name = "info1"
+age = "info2"
+}
+
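+This means that name is written to column family info1 and age is written to column family info2.
+
+To write all the other fields to the same column family, you can configure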
+
+family_name {
+all_columns = "info"
+}
+
+This means that all fields are written to column family info.
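+
+Put together, the per-column mapping could appear in a complete sink block as follows. This is a sketch only: the ZooKeeper hosts and table name are the placeholder values used elsewhere in this document, and the fields `id`, `name`, and `age` come from the sample row above.
+
+```hocon
+sink {
+  Hbase {
+    zookeeper_quorum = "hadoop001:2181,hadoop002:2181,hadoop003:2181"
+    table = "seatunnel_test"
+    # id is used as the row key
+    rowkey_column = ["id"]
+    # name and age go to separate column families
+    family_name {
+      name = "info1"
+      age = "info2"
+    }
+  }
+}
+```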
+
+### rowkey_delimiter [string]
+
+The delimiter used to join multiple row key columns; the default is ""
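+
+As an illustration, a composite row key built from two columns might be configured like this. The `_` delimiter is an assumed value chosen for the example, and the `id` and `uuid` column names are taken from the `rowkey_column` description; the connection values are placeholders.
+
+```hocon
+sink {
+  Hbase {
+    zookeeper_quorum = "hadoop001:2181,hadoop002:2181,hadoop003:2181"
+    table = "seatunnel_test"
+    # two columns make up the row key
+    rowkey_column = ["id", "uuid"]
+    # assumed delimiter for illustration; the default is an empty string
+    rowkey_delimiter = "_"
+    family_name {
+      all_columns = "info"
+    }
+  }
+}
+```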
+
+### version_column [string]
+
+The version column name; you can use it to assign the timestamp of Hbase records.
+
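+### null_mode [string]
+
+The mode for writing null values. Supports [skip, empty]; the default is skip.
+
+- skip: when a field is null, the connector does not write the field to Hbase
+- empty: when a field is null, the connector writes an empty value for the field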
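+
+A sketch of a sink that writes empty values for null fields instead of skipping them. Only `null_mode` differs from the default behaviour; the connection values are placeholders.
+
+```hocon
+sink {
+  Hbase {
+    zookeeper_quorum = "hadoop001:2181,hadoop002:2181,hadoop003:2181"
+    table = "seatunnel_test"
+    rowkey_column = ["name"]
+    family_name {
+      all_columns = "info"
+    }
+    # write an empty value when a field is null (the default, skip, omits the field)
+    null_mode = "empty"
+  }
+}
+```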
+
+### wal_write [boolean]
+
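+The WAL (write-ahead log) write flag; the default is false.
+
+### write_buffer_size [int]
+
+The write buffer size of the Hbase client; the default is 8 * 1024 * 1024.
+
+### encoding [string]
+
+The encoding of string fields. Supports [utf8, gbk]; the default is utf8.
+
+### hbase_extra_config [config]
+
+Extra Hbase configuration.
+
+### Common options
+
+Common sink plugin parameters; see [Sink Common Options](common-options.md)
+
+## Example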
+
+```hocon
+
+Hbase {
+  zookeeper_quorum = "hadoop001:2181,hadoop002:2181,hadoop003:2181"
+  table = "seatunnel_test"
+  rowkey_column = ["name"]
+  family_name {
+    all_columns = seatunnel
+  }
+}
+
+```
+
+## Changelog
+
+### next version
+
+- Add Hbase sink connector ([4049](https://github.com/apache/seatunnel/pull/4049))
+