Skip to content

Commit 2f96f2e

Browse files
authored
[Feature][Connector-V2] Support databend source/sink connector (#9331)
1 parent 6cf2581 commit 2f96f2e

File tree

49 files changed

+5470
-2
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

49 files changed

+5470
-2
lines changed

.github/workflows/labeler/label-scope-conf.yml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -93,6 +93,11 @@ clickhouse:
9393
- changed-files:
9494
- any-glob-to-any-file: seatunnel-connectors-v2/connector-clickhouse/**
9595
- all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(clickhouse)/**'
96+
databend:
97+
- all:
98+
- changed-files:
99+
- any-glob-to-any-file: seatunnel-connectors-v2/connector-databend/**
100+
- all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(databend)/**'
96101
datahub:
97102
- all:
98103
- changed-files:

config/plugin_config

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,7 @@ connector-cdc-oracle
3131
connector-cdc-tidb
3232
connector-clickhouse
3333
connector-datahub
34+
connector-databend
3435
connector-dingtalk
3536
connector-doris
3637
connector-elasticsearch
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
<details><summary> Change Log </summary>
2+
3+
| Change | Commit | Version |
4+
|--------------------------------------------------------------| --- |---------|
5+
| [Feature][Connector-V2]Support Databend sink/source (#9331) |https://github.com/apache/seatunnel/pull/9331| TODO |
6+
7+
</details>
Lines changed: 162 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,162 @@
1+
import ChangeLog from '../changelog/connector-databend.md';
2+
3+
# Databend
4+
5+
> Databend sink connector
6+
7+
## Supported Engines
8+
9+
> Spark<br/>
10+
> Flink<br/>
11+
> SeaTunnel Zeta<br/>
12+
13+
## Key Features
14+
15+
- [ ] [Exactly-Once](../../concept/connector-v2-features.md)
16+
- [ ] [Support Multi-table Writing](../../concept/connector-v2-features.md)
17+
- [ ] [CDC](../../concept/connector-v2-features.md)
18+
- [x] [Parallelism](../../concept/connector-v2-features.md)
19+
20+
## Description
21+
22+
A sink connector for writing data to Databend. Supports both batch and streaming processing modes.
23+
The Databend sink internally implements bulk data import through stage attachment.
24+
25+
## Dependencies
26+
27+
### For Spark/Flink
28+
29+
> 1. You need to download the [Databend JDBC driver jar package](https://github.com/databendlabs/databend-jdbc/) and add it to the directory `${SEATUNNEL_HOME}/plugins/`.
30+
31+
### For SeaTunnel Zeta
32+
33+
> 1. You need to download the [Databend JDBC driver jar package](https://github.com/databendlabs/databend-jdbc/) and add it to the directory `${SEATUNNEL_HOME}/lib/`.
34+
35+
## Sink Options
36+
37+
| Name | Type | Required | Default Value | Description |
38+
|------|------|----------|---------------|---------------------------------------------|
39+
| url | String | Yes | - | Databend JDBC connection URL |
40+
| username | String | Yes | - | Databend database username |
41+
| password | String | Yes | - | Databend database password |
42+
| database | String | No | - | Databend database name, defaults to the database name specified in the connection URL |
43+
| table | String | No | - | Databend table name |
44+
| batch_size | Integer | No | 1000 | Number of records for batch writing |
45+
| auto_commit | Boolean | No | true | Whether to auto-commit transactions |
46+
| max_retries | Integer | No | 3 | Maximum retry attempts on write failure |
47+
| schema_save_mode | Enum | No | CREATE_SCHEMA_WHEN_NOT_EXIST | Schema save mode |
48+
| data_save_mode | Enum | No | APPEND_DATA | Data save mode |
49+
| custom_sql | String | No | - | Custom write SQL, typically used for complex write scenarios |
50+
| execute_timeout_sec | Integer | No | 300 | SQL execution timeout (seconds) |
51+
| jdbc_config | Map | No | - | Additional JDBC connection configuration, such as connection timeout parameters |
52+
53+
### schema_save_mode[Enum]
54+
55+
Before starting the synchronization task, choose different processing schemes for existing table structures.
56+
Option descriptions:
57+
`RECREATE_SCHEMA`: Create when table doesn't exist, drop and recreate when table exists.
58+
`CREATE_SCHEMA_WHEN_NOT_EXIST`: Create when table doesn't exist, skip when table exists.
59+
`ERROR_WHEN_SCHEMA_NOT_EXIST`: Report error when table doesn't exist.
60+
`IGNORE`: Ignore table processing.
61+
62+
### data_save_mode[Enum]
63+
64+
Before starting the synchronization task, choose different processing schemes for existing data on the target side.
65+
Option descriptions:
66+
`DROP_DATA`: Retain database structure and delete data.
67+
`APPEND_DATA`: Retain database structure and data.
68+
`CUSTOM_PROCESSING`: User-defined processing.
69+
`ERROR_WHEN_DATA_EXISTS`: Report error when data exists.
70+
71+
## Data Type Mapping
72+
73+
| SeaTunnel Data Type | Databend Data Type |
74+
|-----------------|---------------|
75+
| BOOLEAN | BOOLEAN |
76+
| TINYINT | TINYINT |
77+
| SMALLINT | SMALLINT |
78+
| INT | INT |
79+
| BIGINT | BIGINT |
80+
| FLOAT | FLOAT |
81+
| DOUBLE | DOUBLE |
82+
| DECIMAL | DECIMAL |
83+
| STRING | STRING |
84+
| BYTES | VARBINARY |
85+
| DATE | DATE |
86+
| TIME | TIME |
87+
| TIMESTAMP | TIMESTAMP |
88+
89+
## Task Examples
90+
91+
### Simple Example
92+
93+
```hocon
94+
env {
95+
execution.parallelism = 1
96+
job.mode = "BATCH"
97+
}
98+
99+
source {
100+
FakeSource {
101+
row.num = 10
102+
schema = {
103+
fields {
104+
name = string
105+
age = int
106+
score = double
107+
}
108+
}
109+
}
110+
}
111+
112+
sink {
113+
Databend {
114+
url = "jdbc:databend://localhost:8000"
115+
username = "root"
116+
password = ""
117+
database = "default"
118+
table = "target_table"
119+
batch_size = 1000
120+
}
121+
}
122+
```
123+
124+
### Writing with Custom SQL
125+
126+
```hocon
127+
sink {
128+
Databend {
129+
url = "jdbc:databend://localhost:8000"
130+
username = "root"
131+
password = ""
132+
database = "default"
133+
table = "target_table"
134+
custom_sql = "INSERT INTO default.target_table(name, age, score) VALUES(?, ?, ?)"
135+
}
136+
}
137+
```
138+
139+
### Using Schema Save Mode
140+
141+
```hocon
142+
sink {
143+
Databend {
144+
url = "jdbc:databend://localhost:8000"
145+
username = "root"
146+
password = ""
147+
database = "default"
148+
table = "target_table"
149+
schema_save_mode = "RECREATE_SCHEMA"
150+
data_save_mode = "APPEND_DATA"
151+
}
152+
}
153+
```
154+
155+
## Related Links
156+
157+
- [Databend Official Website](https://databend.rs/)
158+
- [Databend JDBC Driver](https://github.com/databendlabs/databend-jdbc/)
159+
160+
## Changelog
161+
162+
<ChangeLog />
Lines changed: 133 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,133 @@
1+
import ChangeLog from '../changelog/connector-databend.md';
2+
3+
# Databend
4+
5+
> Databend source connector
6+
7+
## Supported Engines
8+
9+
> Spark<br/>
10+
> Flink<br/>
11+
> SeaTunnel Zeta<br/>
12+
13+
14+
## Key Features
15+
16+
- [x] [Batch Processing](../../concept/connector-v2-features.md)
17+
- [ ] [Stream Processing](../../concept/connector-v2-features.md)
18+
- [x] [Parallelism](../../concept/connector-v2-features.md)
19+
- [ ] [Support User-defined Sharding](../../concept/connector-v2-features.md)
20+
- [ ] [Support Multi-table Reading](../../concept/connector-v2-features.md)
21+
22+
## Description
23+
24+
A source connector for reading data from Databend.
25+
26+
## Dependencies
27+
28+
### For Spark/Flink
29+
30+
> 1. You need to download the [Databend JDBC driver jar package](https://github.com/databendlabs/databend-jdbc/) and add it to the directory `${SEATUNNEL_HOME}/plugins/`.
31+
32+
### For SeaTunnel Zeta
33+
34+
> 1. You need to download the [Databend JDBC driver jar package](https://github.com/databendlabs/databend-jdbc/) and add it to the directory `${SEATUNNEL_HOME}/lib/`.
35+
36+
## Supported Data Source Information
37+
38+
| Data Source | Supported Version | Driver | URL | Maven |
39+
|-------------|-------------------|--------|-----|-------|
40+
| Databend | 1.2.x and above | - | - | - |
41+
42+
## Data Type Mapping
43+
44+
| Databend Data Type | SeaTunnel Data Type |
45+
|-------------------|-------------------|
46+
| BOOLEAN | BOOLEAN |
47+
| TINYINT | TINYINT |
48+
| SMALLINT | SMALLINT |
49+
| INT | INT |
50+
| BIGINT | BIGINT |
51+
| FLOAT | FLOAT |
52+
| DOUBLE | DOUBLE |
53+
| DECIMAL | DECIMAL |
54+
| STRING | STRING |
55+
| VARCHAR | STRING |
56+
| CHAR | STRING |
57+
| TIMESTAMP | TIMESTAMP |
58+
| DATE | DATE |
59+
| TIME | TIME |
60+
| BINARY | BYTES |
61+
62+
## Source Options
63+
64+
Basic Configuration:
65+
66+
| Name | Type | Required | Default Value | Description |
67+
|------|------|----------|---------------|-------------|
68+
| url | String | Yes | - | Databend JDBC connection URL |
69+
| username | String | Yes | - | Databend database username |
70+
| password | String | Yes | - | Databend database password |
71+
| database | String | No | - | Databend database name, defaults to the database name specified in the connection URL |
72+
| table | String | No | - | Databend table name |
73+
| query | String | No | - | Databend query statement, if set will override database and table settings |
74+
| fetch_size | Integer | No | 0 | Number of records to fetch from database at once, set to 0 to use JDBC driver default value |
75+
| jdbc_config | Map | No | - | Additional JDBC connection configuration, such as load balancing strategies |
76+
77+
Table List Configuration:
78+
79+
| Name | Type | Required | Default Value | Description |
80+
|------|------|----------|---------------|-------------|
81+
| database | String | Yes | - | Database name |
82+
| table | String | Yes | - | Table name |
83+
| query | String | No | - | Custom query statement |
84+
| fetch_size | Integer | No | 0 | Number of records to fetch from database at once |
85+
86+
Note: When this configuration corresponds to a single table, you can flatten the configuration items from table_list to the outer level.
87+
88+
## Task Examples
89+
90+
### Single Table Reading
91+
92+
```hocon
93+
env {
94+
parallelism = 2
95+
job.mode = "BATCH"
96+
}
97+
98+
source {
99+
Databend {
100+
url = "jdbc:databend://localhost:8000"
101+
username = "root"
102+
password = ""
103+
database = "default"
104+
table = "users"
105+
}
106+
}
107+
108+
sink {
109+
Console {}
110+
}
111+
```
112+
113+
### Using Custom Query
114+
115+
```hocon
116+
source {
117+
Databend {
118+
url = "jdbc:databend://localhost:8000"
119+
username = "root"
120+
password = ""
121+
query = "SELECT id, name, age FROM default.users WHERE age > 18"
122+
}
123+
}
124+
```
125+
126+
## Related Links
127+
128+
- [Databend Official Website](https://databend.rs/)
129+
- [Databend JDBC Driver](https://github.com/databendlabs/databend-jdbc/)
130+
131+
## Changelog
132+
133+
<ChangeLog />
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
<details><summary> Change Log </summary>
2+
3+
| Change | Commit | Version |
4+
|--------------------------------------------------------------| --- |---------|
5+
| [Feature][Connector-V2]Support Databend sink/source (#9331) |https://github.com/apache/seatunnel/pull/9331| TODO |
6+
7+
</details>

0 commit comments

Comments
 (0)