Merge branch 'dev' into jdbc-gbase8a
Hisoka-X committed Oct 18, 2022
2 parents b14234c + 44ee9a8 commit d0430dd
Showing 120 changed files with 1,979 additions and 733 deletions.
52 changes: 52 additions & 0 deletions docs/en/command/usage.mdx
@@ -11,6 +8 @@ import TabItem from '@theme/TabItem';
values={[
{label: 'Spark', value: 'spark'},
{label: 'Flink', value: 'flink'},
{label: 'Spark V2', value: 'spark V2'},
{label: 'Flink V2', value: 'flink V2'},
]}>
<TabItem value="spark">

@@ -25,6 +27 @@ bin/start-seatunnel-spark.sh
bin/start-seatunnel-flink.sh
```

</TabItem>
<TabItem value="spark V2">

```bash
bin/start-seatunnel-spark-connector-v2.sh
```

</TabItem>
<TabItem value="flink V2">

```bash
bin/start-seatunnel-flink-connector-v2.sh
```

</TabItem>
</Tabs>

@@ -37,6 +53,8 @@ bin/start-seatunnel-flink.sh
values={[
{label: 'Spark', value: 'spark'},
{label: 'Flink', value: 'flink'},
{label: 'Spark V2', value: 'spark V2'},
{label: 'Flink V2', value: 'flink V2'},
]}>
<TabItem value="spark">

@@ -52,6 +70,24 @@ bin/start-seatunnel-spark.sh \

- Use `-e` or `--deploy-mode` to specify the deployment mode

</TabItem>
<TabItem value="spark V2">

```bash
bin/start-seatunnel-spark-connector-v2.sh \
-c config-path \
-m master \
-e deploy-mode \
-i city=beijing \
-n spark-test
```

- Use `-m` or `--master` to specify the cluster manager

- Use `-e` or `--deploy-mode` to specify the deployment mode

- Use `-n` or `--name` to specify the app name

</TabItem>
<TabItem value="flink">

@@ -65,6 +101,22 @@ bin/start-seatunnel-flink.sh \

- Use `-r` or `--run-mode` to specify the flink job run mode, you can use `run-application` or `run` (default value)

</TabItem>
<TabItem value="flink V2">

```bash
bin/start-seatunnel-flink-connector-v2.sh \
-c config-path \
-i key=value \
-r run-application \
-n flink-test \
[other params]
```

- Use `-r` or `--run-mode` to specify the flink job run mode, you can use `run-application` or `run` (default value)

- Use `-n` or `--name` to specify the app name

</TabItem>
</Tabs>

1 change: 1 addition & 0 deletions docs/en/connector-v2/sink/Jdbc.md
@@ -117,6 +117,7 @@ there are some reference value for params above.
| sqlserver | com.microsoft.sqlserver.jdbc.SQLServerDriver | jdbc:microsoft:sqlserver://localhost:1433 | com.microsoft.sqlserver.jdbc.SQLServerXADataSource | https://mvnrepository.com/artifact/com.microsoft.sqlserver/mssql-jdbc |
| oracle | oracle.jdbc.OracleDriver | jdbc:oracle:thin:@localhost:1521/xepdb1 | oracle.jdbc.xa.OracleXADataSource | https://mvnrepository.com/artifact/com.oracle.database.jdbc/ojdbc8 |
| gbase8a | com.gbase.jdbc.Driver | jdbc:gbase://e2e_gbase8aDb:5258/test | / | https://www.gbase8.cn/wp-content/uploads/2020/10/gbase-connector-java-8.3.81.53-build55.5.7-bin_min_mix.jar |
| starrocks | com.mysql.cj.jdbc.Driver | jdbc:mysql://localhost:3306/test | / | https://mvnrepository.com/artifact/mysql/mysql-connector-java |
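
For instance, a minimal sink sketch built from the starrocks row above (the connection values and query are illustrative; `url`, `driver`, `user`, `password`, and `query` are the standard options of this sink):

```bash
sink {
  Jdbc {
    # illustrative connection values, taken from the starrocks row above
    url = "jdbc:mysql://localhost:3306/test"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"
    password = "123456"
    # illustrative insert statement; adjust to your table
    query = "insert into test_table(name, age) values(?, ?)"
  }
}
```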

## Example

24 changes: 22 additions & 2 deletions docs/en/connector-v2/sink/Kafka.md
@@ -21,6 +21,7 @@ By default, we will use 2pc to guarantee the message is sent to kafka exactly on
| bootstrap.servers | string | yes | - |
| kafka.* | kafka producer config | no | - |
| semantic | string | no | NON |
| partition_key | string | no | - |
| partition | int | no | - |
| assign_partitions | list | no | - |
| transaction_prefix | string | no | - |
@@ -50,6 +51,23 @@ In AT_LEAST_ONCE, producer will wait for all outstanding messages in the Kafka b

NON does not provide any guarantees: messages may be lost in case of issues on the Kafka broker and messages may be duplicated.

### partition_key [string]

Configure which field is used as the key of the kafka message.

For example, if you want to use the value of a field from the upstream data as the key, set this option to that field's name.

Upstream data is the following:

| name | age | data |
| ---- | ---- | ------------- |
| Jack | 16 | data-example1 |
| Mary | 23 | data-example2 |

If `name` is set as the key, then the hash value of the `name` column will determine which partition the message is sent to.

If the field name does not exist in the upstream data, the configured parameter will be used as the key.
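
A minimal sink sketch keying messages by the upstream `name` field (the topic and broker address are illustrative):

```bash
sink {
  kafka {
    # illustrative topic and broker address
    topic = "test_topic"
    bootstrap.servers = "localhost:9092"
    # the hash of the upstream "name" field decides the target partition
    partition_key = "name"
  }
}
```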

### partition [int]

We can specify the partition, all messages will be sent to this partition.
@@ -93,7 +111,9 @@ sink {

### change log
#### next version

- Add kafka sink doc
- New feature : Kafka specified partition to send
- New feature : Determine the partition that kafka sends the message to based on the message content
- New feature : Configure which field is used as the key of the kafka message

10 changes: 10 additions & 0 deletions docs/en/connector-v2/sink/common-options.md
@@ -5,18 +5,27 @@
| name | type | required | default value |
| ----------------- | ------ | -------- | ------------- |
| source_table_name | string | no | - |
| parallelism | int | no | - |


### source_table_name [string]

When `source_table_name` is not specified, the current plugin processes the data set output by the previous plugin in the configuration file;

When `source_table_name` is specified, the current plugin processes the data set corresponding to this parameter.

### parallelism [int]

When `parallelism` is not specified, the `parallelism` in env is used by default.

When `parallelism` is specified, it will override the `parallelism` in env.

## Examples

```bash
source {
FakeSourceStream {
parallelism = 2
result_table_name = "fake"
field_name = "name,age"
}
@@ -37,6 +46,7 @@ transform {

sink {
console {
parallelism = 3
source_table_name = "fake_name"
}
}
30 changes: 20 additions & 10 deletions docs/en/connector-v2/source/FakeSource.md
@@ -18,15 +18,17 @@ just for some test cases such as type conversion or connector new feature testin

## Options

| name                | type   | required | default value |
|---------------------|--------|----------|---------------|
| schema              | config | yes      | -             |
| row.num             | int    | no       | 5             |
| split.num           | int    | no       | 1             |
| split.read-interval | long   | no       | 1             |
| map.size            | int    | no       | 5             |
| array.size          | int    | no       | 5             |
| bytes.length        | int    | no       | 5             |
| string.length       | int    | no       | 5             |
| common-options      |        | no       | -             |

### schema [config]

@@ -81,7 +83,15 @@ Source plugin common parameters, please refer to [Source Common Options](common-

### row.num

The total number of rows generated per degree of parallelism

### split.num

The number of splits generated by the enumerator for each degree of parallelism

### split.read-interval

The interval (in milliseconds) between two split reads in a reader
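
A minimal source sketch tying these three options together (the schema is illustrative):

```bash
source {
  FakeSource {
    row.num = 100              # 100 rows generated per degree of parallelism
    split.num = 4              # 4 splits per degree of parallelism
    split.read-interval = 500  # 500 ms between two split reads in a reader
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}
```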

### map.size

43 changes: 29 additions & 14 deletions docs/en/connector-v2/source/FtpFile.md
@@ -21,20 +21,21 @@ Read data from ftp file server.

## Options

| name                       | type    | required | default value       |
|----------------------------|---------|----------|---------------------|
| host                       | string  | yes      | -                   |
| port                       | int     | yes      | -                   |
| user                       | string  | yes      | -                   |
| password                   | string  | yes      | -                   |
| path                       | string  | yes      | -                   |
| type                       | string  | yes      | -                   |
| delimiter                  | string  | no       | \001                |
| parse_partition_from_path  | boolean | no       | true                |
| date_format                | string  | no       | yyyy-MM-dd          |
| datetime_format            | string  | no       | yyyy-MM-dd HH:mm:ss |
| time_format                | string  | no       | HH:mm:ss            |
| schema                     | config  | no       | -                   |
| common-options             |         | no       | -                   |

### host [string]

@@ -62,6 +63,20 @@ Field delimiter, used to tell connector how to slice and dice fields when readin

default `\001`, the same as hive's default delimiter

### parse_partition_from_path [boolean]

Controls whether to parse the partition keys and values from the file path.

For example, if you read a file from the path `ftp://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`, every record read from the file will have these two fields added:

| name          | age |
|---------------|-----|
| tyrantlucifer | 26  |

Tip: **Do not define partition fields in the schema option**
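
A minimal source sketch for the path above (host and credentials are illustrative):

```bash
source {
  FtpFile {
    # illustrative connection values
    host = "hadoop-cluster"
    port = 21
    user = "seatunnel"
    password = "pass"
    # name=.../age=... directories below this path are parsed into record fields
    path = "/tmp/seatunnel/parquet"
    type = "parquet"
    parse_partition_from_path = true
  }
}
```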

### date_format [string]

Date type format, used to tell connector how to convert string to date, supported as the following formats:
37 changes: 26 additions & 11 deletions docs/en/connector-v2/source/HdfsFile.md
@@ -26,17 +26,18 @@ Read all the data in a split in a pollNext call. What splits are read will be sa

## Options

| name                       | type    | required | default value       |
|----------------------------|---------|----------|---------------------|
| path                       | string  | yes      | -                   |
| type                       | string  | yes      | -                   |
| fs.defaultFS               | string  | yes      | -                   |
| delimiter                  | string  | no       | \001                |
| parse_partition_from_path  | boolean | no       | true                |
| date_format                | string  | no       | yyyy-MM-dd          |
| datetime_format            | string  | no       | yyyy-MM-dd HH:mm:ss |
| time_format                | string  | no       | HH:mm:ss            |
| schema                     | config  | no       | -                   |
| common-options             |         | no       | -                   |

### path [string]

@@ -48,6 +49,20 @@ Field delimiter, used to tell connector how to slice and dice fields when readin

default `\001`, the same as hive's default delimiter

### parse_partition_from_path [boolean]

Controls whether to parse the partition keys and values from the file path.

For example, if you read a file from the path `hdfs://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`, every record read from the file will have these two fields added:

| name          | age |
|---------------|-----|
| tyrantlucifer | 26  |

Tip: **Do not define partition fields in the schema option**
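
A minimal source sketch for the path above (the cluster address is illustrative):

```bash
source {
  HdfsFile {
    fs.defaultFS = "hdfs://hadoop-cluster"
    # name=.../age=... directories below this path are parsed into record fields
    path = "/tmp/seatunnel/parquet"
    type = "parquet"
    parse_partition_from_path = true
  }
}
```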

### date_format [string]

Date type format, used to tell connector how to convert string to date, supported as the following formats:
1 change: 1 addition & 0 deletions docs/en/connector-v2/source/Jdbc.md
@@ -99,6 +99,7 @@ there are some reference value for params above.
| sqlserver | com.microsoft.sqlserver.jdbc.SQLServerDriver | jdbc:microsoft:sqlserver://localhost:1433 | https://mvnrepository.com/artifact/com.microsoft.sqlserver/mssql-jdbc |
| oracle | oracle.jdbc.OracleDriver | jdbc:oracle:thin:@localhost:1521/xepdb1 | https://mvnrepository.com/artifact/com.oracle.database.jdbc/ojdbc8 |
| gbase8a | com.gbase.jdbc.Driver | jdbc:gbase://e2e_gbase8aDb:5258/test | https://www.gbase8.cn/wp-content/uploads/2020/10/gbase-connector-java-8.3.81.53-build55.5.7-bin_min_mix.jar |
| starrocks | com.mysql.cj.jdbc.Driver | jdbc:mysql://localhost:3306/test | https://mvnrepository.com/artifact/mysql/mysql-connector-java |

## Example

35 changes: 25 additions & 10 deletions docs/en/connector-v2/source/LocalFile.md
@@ -26,16 +26,17 @@ Read all the data in a split in a pollNext call. What splits are read will be sa

## Options

| name                       | type    | required | default value       |
|----------------------------|---------|----------|---------------------|
| path                       | string  | yes      | -                   |
| type                       | string  | yes      | -                   |
| delimiter                  | string  | no       | \001                |
| parse_partition_from_path  | boolean | no       | true                |
| date_format                | string  | no       | yyyy-MM-dd          |
| datetime_format            | string  | no       | yyyy-MM-dd HH:mm:ss |
| time_format                | string  | no       | HH:mm:ss            |
| schema                     | config  | no       | -                   |
| common-options             |         | no       | -                   |

### path [string]

@@ -47,6 +48,20 @@ Field delimiter, used to tell connector how to slice and dice fields when readin

default `\001`, the same as hive's default delimiter

### parse_partition_from_path [boolean]

Controls whether to parse the partition keys and values from the file path.

For example, if you read a file from the path `file://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`, every record read from the file will have these two fields added:

| name          | age |
|---------------|-----|
| tyrantlucifer | 26  |

Tip: **Do not define partition fields in the schema option**
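
A minimal source sketch for the path above (the directory is illustrative):

```bash
source {
  LocalFile {
    # name=.../age=... directories below this path are parsed into record fields
    path = "/tmp/seatunnel/parquet"
    type = "parquet"
    parse_partition_from_path = true
  }
}
```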

### date_format [string]

Date type format, used to tell connector how to convert string to date, supported as the following formats:
