
Commit 973a2fa

ruanwenjun authored and ashulin committed
Change file type to file_format_type in file source/sink (#4249)
# Conflicts:
#	docs/en/connector-v2/sink/OssFile.md
#	docs/en/connector-v2/sink/OssJindoFile.md
#	docs/en/connector-v2/sink/S3-Redshift.md
#	docs/en/connector-v2/sink/SftpFile.md
#	docs/en/connector-v2/source/FtpFile.md
#	docs/en/connector-v2/source/LocalFile.md
#	docs/en/connector-v2/source/SftpFile.md
#	seatunnel-connectors-v2/connector-file/connector-file-base-hadoop/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/hdfs/source/BaseHdfsFileSource.java
#	seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/BaseFileSinkConfig.java
#	seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/BaseSinkConfig.java
#	seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/BaseSourceConfig.java
#	seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/source/reader/TextReadStrategy.java
#	seatunnel-connectors-v2/connector-file/connector-file-ftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/ftp/sink/FtpFileSinkFactory.java
#	seatunnel-connectors-v2/connector-file/connector-file-ftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/ftp/source/FtpFileSource.java
#	seatunnel-connectors-v2/connector-file/connector-file-ftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/ftp/source/FtpFileSourceFactory.java
#	seatunnel-connectors-v2/connector-file/connector-file-hadoop/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/hdfs/sink/HdfsFileSinkFactory.java
#	seatunnel-connectors-v2/connector-file/connector-file-hadoop/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/hdfs/source/HdfsFileSourceFactory.java
#	seatunnel-connectors-v2/connector-file/connector-file-local/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/local/sink/LocalFileSinkFactory.java
#	seatunnel-connectors-v2/connector-file/connector-file-local/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/local/source/LocalFileSource.java
#	seatunnel-connectors-v2/connector-file/connector-file-local/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/local/source/LocalFileSourceFactory.java
#	seatunnel-connectors-v2/connector-file/connector-file-oss-jindo/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/oss/sink/OssFileSinkFactory.java
#	seatunnel-connectors-v2/connector-file/connector-file-oss-jindo/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/oss/source/OssFileSource.java
#	seatunnel-connectors-v2/connector-file/connector-file-oss-jindo/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/oss/source/OssFileSourceFactory.java
#	seatunnel-connectors-v2/connector-file/connector-file-oss/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/oss/sink/OssFileSinkFactory.java
#	seatunnel-connectors-v2/connector-file/connector-file-oss/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/oss/source/OssFileSource.java
#	seatunnel-connectors-v2/connector-file/connector-file-oss/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/oss/source/OssFileSourceFactory.java
#	seatunnel-connectors-v2/connector-file/connector-file-s3/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/s3/catalog/S3Catalog.java
#	seatunnel-connectors-v2/connector-file/connector-file-s3/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/s3/sink/S3FileSinkFactory.java
#	seatunnel-connectors-v2/connector-file/connector-file-s3/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/s3/source/S3FileSource.java
#	seatunnel-connectors-v2/connector-file/connector-file-s3/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/s3/source/S3FileSourceFactory.java
#	seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/sink/SftpFileSinkFactory.java
#	seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/source/SftpFileSource.java
#	seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/source/SftpFileSourceFactory.java
#	seatunnel-connectors-v2/connector-hive/src/main/java/org/apache/seatunnel/connectors/seatunnel/hive/sink/HiveSink.java
#	seatunnel-connectors-v2/connector-hive/src/main/java/org/apache/seatunnel/connectors/seatunnel/hive/source/HiveSource.java
#	seatunnel-connectors-v2/connector-s3-redshift/src/main/java/org/apache/seatunnel/connectors/seatunnel/redshift/sink/S3RedshiftFactory.java
1 parent 10447ae commit 973a2fa
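
The practical effect of this commit is a configuration-key rename: file sinks that previously set `file_format` now set `file_format_type` (sources rename `type` the same way; see the note after the last diff). A minimal before/after sketch of a sink block, assuming a LocalFile sink with an illustrative path:

```
# Before this commit (hypothetical job config)
sink {
  LocalFile {
    path = "/tmp/seatunnel/output"
    file_format = "text"
  }
}

# After this commit
sink {
  LocalFile {
    path = "/tmp/seatunnel/output"
    file_format_type = "text"
  }
}
```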

File tree

60 files changed: +285 -193 lines


docs/en/connector-v2/sink/HdfsFile.md

Lines changed: 5 additions & 5 deletions
@@ -20,7 +20,7 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you
 
 By default, we use 2PC commit to ensure `exactly-once`
 
-- [x] file format
+- [x] file format type
 - [x] text
 - [x] csv
 - [x] parquet
@@ -39,7 +39,7 @@ By default, we use 2PC commit to ensure `exactly-once`
 | custom_filename | boolean | no | false | Whether you need custom the filename |
 | file_name_expression | string | no | "${transactionId}" | Only used when custom_filename is true |
 | filename_time_format | string | no | "yyyy.MM.dd" | Only used when custom_filename is true |
-| file_format | string | no | "csv" | |
+| file_format_type | string | no | "csv" | |
 | field_delimiter | string | no | '\001' | Only used when file_format is text |
 | row_delimiter | string | no | "\n" | Only used when file_format is text |
 | have_partition | boolean | no | false | Whether you need processing partitions. |
@@ -95,7 +95,7 @@ When the format in the `file_name_expression` parameter is `xxxx-${now}` , `file
 | m | Minute in hour |
 | s | Second in minute |
 
-### file_format [string]
+### file_format_type [string]
 
 We supported as the following file types:
 
@@ -198,7 +198,7 @@ For text file format with `have_partition` and `custom_filename` and `sink_colum
 HdfsFile {
     fs.defaultFS = "hdfs://hadoopcluster"
     path = "/tmp/hive/warehouse/test2"
-    file_format = "text"
+    file_format_type = "text"
     field_delimiter = "\t"
     row_delimiter = "\n"
     have_partition = true
@@ -228,7 +228,7 @@ HdfsFile {
     custom_filename = true
     file_name_expression = "${transactionId}_${now}"
     filename_time_format = "yyyy.MM.dd"
-    file_format = "parquet"
+    file_format_type = "parquet"
     sink_columns = ["name","age"]
     is_enable_transaction = true
 }

docs/en/connector-v2/sink/OssFile.md

Lines changed: 6 additions & 6 deletions
@@ -23,7 +23,7 @@ It only supports hadoop version **2.9.X+**.
 
 By default, we use 2PC commit to ensure `exactly-once`
 
-- [x] file format
+- [x] file format type
 - [x] text
 - [x] csv
 - [x] parquet
@@ -42,7 +42,7 @@ By default, we use 2PC commit to ensure `exactly-once`
 | custom_filename | boolean | no | false | Whether you need custom the filename |
 | file_name_expression | string | no | "${transactionId}" | Only used when custom_filename is true |
 | filename_time_format | string | no | "yyyy.MM.dd" | Only used when custom_filename is true |
-| file_format | string | no | "csv" | |
+| file_format_type | string | no | "csv" | |
 | field_delimiter | string | no | '\001' | Only used when file_format is text |
 | row_delimiter | string | no | "\n" | Only used when file_format is text |
 | have_partition | boolean | no | false | Whether you need processing partitions. |
@@ -103,7 +103,7 @@ When the format in the `file_name_expression` parameter is `xxxx-${now}` , `file
 | m | Minute in hour |
 | s | Second in minute |
 
-### file_format [string]
+### file_format_type [string]
 
 We supported as the following file types:
 
@@ -188,7 +188,7 @@ For text file format with `have_partition` and `custom_filename` and `sink_colum
     access_key = "xxxxxxxxxxx"
     access_secret = "xxxxxxxxxxx"
     endpoint = "oss-cn-beijing.aliyuncs.com"
-    file_format = "text"
+    file_format_type = "text"
     field_delimiter = "\t"
     row_delimiter = "\n"
     have_partition = true
@@ -218,7 +218,7 @@ For parquet file format with `have_partition` and `sink_columns`
     partition_by = ["age"]
     partition_dir_expression = "${k0}=${v0}"
     is_partition_field_write_in_file = true
-    file_format = "parquet"
+    file_format_type = "parquet"
     sink_columns = ["name","age"]
 }
 
@@ -234,7 +234,7 @@ For orc file format simple config
     access_key = "xxxxxxxxxxx"
     access_secret = "xxxxxxxxxxx"
     endpoint = "oss-cn-beijing.aliyuncs.com"
-    file_format = "orc"
+    file_format_type = "orc"
 }
 
 ```

docs/en/connector-v2/sink/OssJindoFile.md

Lines changed: 6 additions & 6 deletions
@@ -23,7 +23,7 @@ It only supports hadoop version **2.9.X+**.
 
 By default, we use 2PC commit to ensure `exactly-once`
 
-- [x] file format
+- [x] file format type
 - [x] text
 - [x] csv
 - [x] parquet
@@ -42,7 +42,7 @@ By default, we use 2PC commit to ensure `exactly-once`
 | custom_filename | boolean | no | false | Whether you need custom the filename |
 | file_name_expression | string | no | "${transactionId}" | Only used when custom_filename is true |
 | filename_time_format | string | no | "yyyy.MM.dd" | Only used when custom_filename is true |
-| file_format | string | no | "csv" | |
+| file_format_type | string | no | "csv" | |
 | field_delimiter | string | no | '\001' | Only used when file_format is text |
 | row_delimiter | string | no | "\n" | Only used when file_format is text |
 | have_partition | boolean | no | false | Whether you need processing partitions. |
@@ -103,7 +103,7 @@ When the format in the `file_name_expression` parameter is `xxxx-${now}` , `file
 | m | Minute in hour |
 | s | Second in minute |
 
-### file_format [string]
+### file_format_type [string]
 
 We supported as the following file types:
 
@@ -188,7 +188,7 @@ For text file format with `have_partition` and `custom_filename` and `sink_colum
     access_key = "xxxxxxxxxxx"
     access_secret = "xxxxxxxxxxx"
     endpoint = "oss-cn-beijing.aliyuncs.com"
-    file_format = "text"
+    file_format_type = "text"
     field_delimiter = "\t"
     row_delimiter = "\n"
     have_partition = true
@@ -214,7 +214,7 @@ For parquet file format with `sink_columns`
     access_key = "xxxxxxxxxxx"
     access_secret = "xxxxxxxxxxxxxxxxx"
    endpoint = "oss-cn-beijing.aliyuncs.com"
-    file_format = "parquet"
+    file_format_type = "parquet"
     sink_columns = ["name","age"]
 }
 
@@ -230,7 +230,7 @@ For orc file format simple config
     access_key = "xxxxxxxxxxx"
     access_secret = "xxxxxxxxxxx"
     endpoint = "oss-cn-beijing.aliyuncs.com"
-    file_format = "orc"
+    file_format_type = "orc"
 }
 
 ```

docs/en/connector-v2/sink/S3-Redshift.md

Lines changed: 6 additions & 6 deletions
@@ -17,7 +17,7 @@ Output data to AWS Redshift.
 
 By default, we use 2PC commit to ensure `exactly-once`
 
-- [x] file format
+- [x] file format type
 - [x] text
 - [x] csv
 - [x] parquet
@@ -38,7 +38,7 @@ By default, we use 2PC commit to ensure `exactly-once`
 | access_secret | string | no | - |
 | hadoop_s3_properties | map | no | - |
 | file_name_expression | string | no | "${transactionId}" |
-| file_format | string | no | "text" |
+| file_format_type | string | no | "text" |
 | filename_time_format | string | no | "yyyy.MM.dd" |
 | field_delimiter | string | no | '\001' |
 | row_delimiter | string | no | "\n" |
@@ -118,7 +118,7 @@ hadoop_s3_properties {
 
 Please note that, If `is_enable_transaction` is `true`, we will auto add `${transactionId}_` in the head of the file.
 
-### file_format [string]
+### file_format_type [string]
 
 We supported as the following file types:
 
@@ -206,7 +206,7 @@ For text file format
     partition_dir_expression="${k0}=${v0}"
     is_partition_field_write_in_file=true
     file_name_expression="${transactionId}_${now}"
-    file_format="text"
+    file_format_type = "text"
     filename_time_format="yyyy.MM.dd"
     is_enable_transaction=true
     hadoop_s3_properties {
@@ -234,7 +234,7 @@ For parquet file format
     partition_dir_expression="${k0}=${v0}"
     is_partition_field_write_in_file=true
     file_name_expression="${transactionId}_${now}"
-    file_format="parquet"
+    file_format_type = "parquet"
     filename_time_format="yyyy.MM.dd"
     is_enable_transaction=true
     hadoop_s3_properties {
@@ -262,7 +262,7 @@ For orc file format
     partition_dir_expression="${k0}=${v0}"
     is_partition_field_write_in_file=true
     file_name_expression="${transactionId}_${now}"
-    file_format="orc"
+    file_format_type = "orc"
     filename_time_format="yyyy.MM.dd"
     is_enable_transaction=true
     hadoop_s3_properties {

docs/en/connector-v2/sink/S3File.md

Lines changed: 6 additions & 6 deletions
@@ -22,7 +22,7 @@ To use this connector you need put hadoop-aws-3.1.4.jar and aws-java-sdk-bundle-
 
 By default, we use 2PC commit to ensure `exactly-once`
 
-- [x] file format
+- [x] file format type
 - [x] text
 - [x] csv
 - [x] parquet
@@ -42,7 +42,7 @@ By default, we use 2PC commit to ensure `exactly-once`
 | custom_filename | boolean | no | false | Whether you need custom the filename |
 | file_name_expression | string | no | "${transactionId}" | Only used when custom_filename is true |
 | filename_time_format | string | no | "yyyy.MM.dd" | Only used when custom_filename is true |
-| file_format | string | no | "csv" | |
+| file_format_type | string | no | "csv" | |
 | field_delimiter | string | no | '\001' | Only used when file_format is text |
 | row_delimiter | string | no | "\n" | Only used when file_format is text |
 | have_partition | boolean | no | false | Whether you need processing partitions. |
@@ -120,7 +120,7 @@ When the format in the `file_name_expression` parameter is `xxxx-${now}` , `file
 | m | Minute in hour |
 | s | Second in minute |
 
-### file_format [string]
+### file_format_type [string]
 
 We supported as the following file types:
 
@@ -205,7 +205,7 @@ For text file format with `have_partition` and `custom_filename` and `sink_colum
     path="/seatunnel/text"
     fs.s3a.endpoint="s3.cn-north-1.amazonaws.com.cn"
     fs.s3a.aws.credentials.provider="com.amazonaws.auth.InstanceProfileCredentialsProvider"
-    file_format="text"
+    file_format_type = "text"
     field_delimiter = "\t"
     row_delimiter = "\n"
     have_partition = true
@@ -237,7 +237,7 @@ For parquet file format simple config with `org.apache.hadoop.fs.s3a.SimpleAWSCr
     fs.s3a.aws.credentials.provider="org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
     access_key = "xxxxxxxxxxxxxxxxx"
     secret_key = "xxxxxxxxxxxxxxxxx"
-    file_format="parquet"
+    file_format_type = "parquet"
     hadoop_s3_properties {
       "fs.s3a.buffer.dir" = "/data/st_test/s3a"
       "fs.s3a.fast.upload.buffer" = "disk"
@@ -258,7 +258,7 @@ For orc file format simple config with `org.apache.hadoop.fs.s3a.SimpleAWSCreden
     fs.s3a.aws.credentials.provider="org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
     access_key = "xxxxxxxxxxxxxxxxx"
     secret_key = "xxxxxxxxxxxxxxxxx"
-    file_format="orc"
+    file_format_type = "orc"
 }
 
 ```

docs/en/connector-v2/sink/SftpFile.md

Lines changed: 4 additions & 4 deletions
@@ -20,7 +20,7 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you
 
 By default, we use 2PC commit to ensure `exactly-once`
 
-- [x] file format
+- [x] file format type
 - [x] text
 - [x] csv
 - [x] parquet
@@ -39,7 +39,7 @@ By default, we use 2PC commit to ensure `exactly-once`
 | custom_filename | boolean | no | false | Whether you need custom the filename |
 | file_name_expression | string | no | "${transactionId}" | Only used when custom_filename is true |
 | filename_time_format | string | no | "yyyy.MM.dd" | Only used when custom_filename is true |
-| file_format | string | no | "csv" | |
+| file_format_type | string | no | "csv" | |
 | field_delimiter | string | no | '\001' | Only used when file_format is text |
 | row_delimiter | string | no | "\n" | Only used when file_format is text |
 | have_partition | boolean | no | false | Whether you need processing partitions. |
@@ -100,7 +100,7 @@ When the format in the `file_name_expression` parameter is `xxxx-${now}` , `file
 | m | Minute in hour |
 | s | Second in minute |
 
-### file_format [string]
+### file_format_type [string]
 
 We supported as the following file types:
 
@@ -185,7 +185,7 @@ SftpFile {
     username = "username"
     password = "password"
     path = "/data/sftp"
-    file_format = "text"
+    file_format_type = "text"
     field_delimiter = "\t"
     row_delimiter = "\n"
     have_partition = true

docs/en/connector-v2/source/FtpFile.md

Lines changed: 4 additions & 4 deletions
@@ -22,7 +22,7 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you
 - [x] [column projection](../../concept/connector-v2-features.md)
 - [x] [parallelism](../../concept/connector-v2-features.md)
 - [ ] [support user-defined split](../../concept/connector-v2-features.md)
-- [x] file format
+- [x] file format type
 - [x] text
 - [x] csv
 - [x] json
@@ -36,7 +36,7 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you
 | user | string | yes | - |
 | password | string | yes | - |
 | path | string | yes | - |
-| type | string | yes | - |
+| file_format_type | string | yes | - |
 | read_columns | list | no | - |
 | delimiter | string | no | \001 |
 | parse_partition_from_path | boolean | no | true |
@@ -139,7 +139,7 @@ The file type supported column projection as the following shown:
 
 **Tips: If the user wants to use this feature when reading `text` `json` `csv` files, the schema option must be configured**
 
-### type [string]
+### file_format_type [string]
 
 File type, supported as the following file types:
 
@@ -230,7 +230,7 @@ Source plugin common parameters, please refer to [Source Common Options](common-
     port = 21
     user = tyrantlucifer
     password = tianchao
-    type = "text"
+    file_format_type = "text"
     schema = {
       name = string
       age = int

docs/en/connector-v2/source/HdfsFile.md

Lines changed: 4 additions & 4 deletions
@@ -25,7 +25,7 @@ Read all the data in a split in a pollNext call. What splits are read will be sa
 - [x] [column projection](../../concept/connector-v2-features.md)
 - [x] [parallelism](../../concept/connector-v2-features.md)
 - [ ] [support user-defined split](../../concept/connector-v2-features.md)
-- [x] file format
+- [x] file format type
 - [x] text
 - [x] csv
 - [x] parquet
@@ -37,7 +37,7 @@ Read all the data in a split in a pollNext call. What splits are read will be sa
 | name | type | required | default value |
 |---------------------------|---------|----------|---------------------|
 | path | string | yes | - |
-| type | string | yes | - |
+| file_format_type | string | yes | - |
 | fs.defaultFS | string | yes | - |
 | read_columns | list | yes | - |
 | hdfs_site_path | string | no | - |
@@ -110,7 +110,7 @@ For example, set like following:
 
 then Seatunnel will skip the first 2 lines from source files
 
-### type [string]
+### file_format_type [string]
 
 File type, supported as the following file types:
 
@@ -244,7 +244,7 @@ Source plugin common parameters, please refer to [Source Common Options](common-
 
 HdfsFile {
     path = "/apps/hive/demo/student"
-    type = "parquet"
+    file_format_type = "parquet"
     fs.defaultFS = "hdfs://namenode001"
 }
 
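
On the source side the renamed key was `type` rather than `file_format`, as the FtpFile and HdfsFile diffs above show. A minimal sketch of the same migration for a source block, assuming a LocalFile source with illustrative values:

```
# Before this commit (hypothetical job config)
source {
  LocalFile {
    path = "/tmp/seatunnel/input"
    type = "parquet"
  }
}

# After this commit
source {
  LocalFile {
    path = "/tmp/seatunnel/input"
    file_format_type = "parquet"
  }
}
```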
