[Feature][Connector-V2] Add support for XML file type to various file connectors #6327

Merged
merged 1 commit into from
Mar 7, 2024
18 changes: 17 additions & 1 deletion docs/en/connector-v2/sink/CosFile.md
@@ -29,6 +29,7 @@ By default, we use 2PC commit to ensure `exactly-once`
- [x] orc
- [x] json
- [x] excel
- [x] xml

## Options

@@ -57,6 +58,9 @@ By default, we use 2PC commit to ensure `exactly-once`
| common-options | object | no | - | |
| max_rows_in_memory | int | no | - | Only used when file_format is excel. |
| sheet_name | string | no | Sheet${Random number} | Only used when file_format is excel. |
| xml_root_tag | string | no | RECORDS | Only used when file_format is xml. |
| xml_row_tag | string | no | RECORD | Only used when file_format is xml. |
| xml_use_attr_format | boolean | no | - | Only used when file_format is xml. |

### path [string]

@@ -110,7 +114,7 @@ When the format in the `file_name_expression` parameter is `xxxx-${now}` , `file

The following file types are supported:

`text` `json` `csv` `orc` `parquet` `excel`
`text` `json` `csv` `orc` `parquet` `excel` `xml`

Note that the final file name ends with the suffix of the chosen file_format; for text files the suffix is `txt`.

@@ -189,6 +193,18 @@ When File Format is Excel,The maximum number of data items that can be cached in

The name of the workbook sheet to write.

### xml_root_tag [string]

Specifies the tag name of the root element within the XML file.

### xml_row_tag [string]

Specifies the tag name of the data rows within the XML file.

### xml_use_attr_format [boolean]

Specifies whether to process data using the tag attribute format.
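
To make the three XML options concrete, here is a minimal Python sketch of the two output shapes. It is not the connector's implementation: `render_rows` is a hypothetical helper, and it assumes that the attribute format places each field on the row element as an attribute, while the default format writes one child element per field.

```python
import xml.etree.ElementTree as ET


def render_rows(rows, root_tag="RECORDS", row_tag="RECORD", use_attr_format=False):
    """Render a list of dict rows as an XML string (illustrative only)."""
    root = ET.Element(root_tag)  # plays the role of xml_root_tag
    for row in rows:
        record = ET.SubElement(root, row_tag)  # plays the role of xml_row_tag
        for name, value in row.items():
            if use_attr_format:
                # Assumed attribute format: <RECORD id="1" name="alice" />
                record.set(name, str(value))
            else:
                # Assumed element format: <RECORD><id>1</id>...</RECORD>
                ET.SubElement(record, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


rows = [{"id": 1, "name": "alice"}]
print(render_rows(rows))
print(render_rows(rows, use_attr_format=True))
```

The defaults in the sketch mirror the documented defaults `RECORDS` and `RECORD`.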

## Example

For text file format with `have_partition` and `custom_filename` and `sink_columns`
18 changes: 17 additions & 1 deletion docs/en/connector-v2/sink/FtpFile.md
@@ -27,6 +27,7 @@ By default, we use 2PC commit to ensure `exactly-once`
- [x] orc
- [x] json
- [x] excel
- [x] xml

## Options

@@ -56,6 +57,9 @@ By default, we use 2PC commit to ensure `exactly-once`
| common-options | object | no | - | |
| max_rows_in_memory | int | no | - | Only used when file_format_type is excel. |
| sheet_name | string | no | Sheet${Random number} | Only used when file_format_type is excel. |
| xml_root_tag | string | no | RECORDS | Only used when file_format_type is xml. |
| xml_row_tag | string | no | RECORD | Only used when file_format_type is xml. |
| xml_use_attr_format | boolean | no | - | Only used when file_format_type is xml. |

### host [string]

@@ -115,7 +119,7 @@ When the format in the `file_name_expression` parameter is `xxxx-${now}` , `file

The following file types are supported:

`text` `json` `csv` `orc` `parquet` `excel`
`text` `json` `csv` `orc` `parquet` `excel` `xml`

Note that the final file name ends with the suffix of the chosen file_format_type; for text files the suffix is `txt`.

@@ -194,6 +198,18 @@ When File Format is Excel,The maximum number of data items that can be cached in

The name of the workbook sheet to write.

### xml_root_tag [string]

Specifies the tag name of the root element within the XML file.

### xml_row_tag [string]

Specifies the tag name of the data rows within the XML file.

### xml_use_attr_format [boolean]

Specifies whether to process data using the tag attribute format.
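
One way to sanity-check a file written with these options is to read it back with the same tag settings. The following Python sketch is an assumption-laden illustration, not connector code: `parse_rows` is a hypothetical helper, and it assumes that rows are `xml_row_tag` elements whose fields are either attributes (attribute format) or child elements (default format).

```python
import xml.etree.ElementTree as ET


def parse_rows(xml_text, row_tag="RECORD", use_attr_format=False):
    """Read rows back from an XML document (illustrative only)."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(row_tag):  # every element named like xml_row_tag
        if use_attr_format:
            # Assumed attribute format: fields live on the row element itself.
            rows.append(dict(record.attrib))
        else:
            # Assumed element format: one child element per field.
            rows.append({child.tag: child.text for child in record})
    return rows


element_doc = "<RECORDS><RECORD><id>1</id><name>alice</name></RECORD></RECORDS>"
attr_doc = '<RECORDS><RECORD id="1" name="alice" /></RECORDS>'
print(parse_rows(element_doc))
print(parse_rows(attr_doc, use_attr_format=True))
```

All values come back as strings here; any type conversion would depend on the schema configured for the job.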

## Example

For text file format simple config
6 changes: 5 additions & 1 deletion docs/en/connector-v2/sink/HdfsFile.md
@@ -21,6 +21,7 @@ By default, we use 2PC commit to ensure `exactly-once`
- [x] orc
- [x] json
- [x] excel
- [x] xml
- [x] compress codec
- [x] lzo

@@ -45,7 +46,7 @@ Output data to hdfs file
| custom_filename | boolean | no | false | Whether to customize the filename. |
| file_name_expression | string | no | "${transactionId}" | Only used when `custom_filename` is `true`. `file_name_expression` describes the file expression which will be created into the `path`. We can add the variable `${now}` or `${uuid}` in the `file_name_expression`, like `test_${uuid}_${now}`; `${now}` represents the current time, and its format can be defined by specifying the option `filename_time_format`. Please note that if `is_enable_transaction` is `true`, `${transactionId}_` is automatically added to the head of the file name. |
| filename_time_format | string | no | "yyyy.MM.dd" | Only used when `custom_filename` is `true`. When the format in the `file_name_expression` parameter is `xxxx-${now}`, `filename_time_format` can specify the time format of the path, and the default value is `yyyy.MM.dd`. The commonly used time formats are: [y: Year, M: Month, d: Day of month, H: Hour in day (0-23), m: Minute in hour, s: Second in minute]. |
| file_format_type | string | no | "csv" | The following file types are supported: `text` `json` `csv` `orc` `parquet` `excel`. Note that the final file name ends with the file_format_type's suffix; the suffix of the text file is `txt`. |
| file_format_type | string | no | "csv" | The following file types are supported: `text` `json` `csv` `orc` `parquet` `excel` `xml`. Note that the final file name ends with the file_format_type's suffix; the suffix of the text file is `txt`. |
| field_delimiter | string | no | '\001' | Only used when file_format_type is text. The separator between columns in a row of data. |
| row_delimiter | string | no | "\n" | Only used when file_format_type is text. The separator between rows in a file. |
| have_partition | boolean | no | false | Whether to process partitions. |
@@ -63,6 +64,9 @@ Output data to hdfs file
| common-options | object | no | - | Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details |
| max_rows_in_memory | int | no | - | Only used when file_format_type is excel. The maximum number of data items that can be cached in memory. |
| sheet_name | string | no | Sheet${Random number} | Only used when file_format_type is excel. The name of the workbook sheet to write. |
| xml_root_tag | string | no | RECORDS | Only used when file_format_type is xml. Specifies the tag name of the root element within the XML file. |
| xml_row_tag | string | no | RECORD | Only used when file_format_type is xml. Specifies the tag name of the data rows within the XML file. |
| xml_use_attr_format | boolean | no | - | Only used when file_format_type is xml. Specifies whether to process data using the tag attribute format. |
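
The table's "only used when" and default columns can be read as a small resolution rule. The Python sketch below illustrates that reading; `resolve_xml_options` and `XML_DEFAULTS` are hypothetical names, and the real connector may resolve its options differently. Because the table lists no default for `xml_use_attr_format`, the sketch leaves it unset unless the user provides it.

```python
# Defaults taken from the options table; xml_use_attr_format has none listed.
XML_DEFAULTS = {"xml_root_tag": "RECORDS", "xml_row_tag": "RECORD"}
XML_KEYS = ("xml_root_tag", "xml_row_tag", "xml_use_attr_format")


def resolve_xml_options(config):
    """Return the effective XML options for a sink config dict (illustrative only)."""
    if config.get("file_format_type") != "xml":
        # The XML options are only used when the file format is xml.
        return {}
    resolved = dict(XML_DEFAULTS)
    for key in XML_KEYS:
        if key in config:
            resolved[key] = config[key]  # user-supplied value wins over the default
    return resolved


print(resolve_xml_options({"file_format_type": "xml", "xml_row_tag": "row"}))
print(resolve_xml_options({"file_format_type": "csv", "xml_row_tag": "row"}))
```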

### Tips

18 changes: 17 additions & 1 deletion docs/en/connector-v2/sink/LocalFile.md
@@ -27,6 +27,7 @@ By default, we use 2PC commit to ensure `exactly-once`
- [x] orc
- [x] json
- [x] excel
- [x] xml

## Options

@@ -51,6 +52,9 @@ By default, we use 2PC commit to ensure `exactly-once`
| common-options | object | no | - | |
| max_rows_in_memory | int | no | - | Only used when file_format_type is excel. |
| sheet_name | string | no | Sheet${Random number} | Only used when file_format_type is excel. |
| xml_root_tag | string | no | RECORDS | Only used when file_format_type is xml. |
| xml_row_tag | string | no | RECORD | Only used when file_format_type is xml. |
| xml_use_attr_format | boolean | no | - | Only used when file_format_type is xml. |
| enable_header_write | boolean | no | false | Only used when file_format_type is text or csv.<br/> false: don't write the header; true: write the header. |

### path [string]
@@ -89,7 +93,7 @@ When the format in the `file_name_expression` parameter is `xxxx-${now}` , `file

The following file types are supported:

`text` `json` `csv` `orc` `parquet` `excel`
`text` `json` `csv` `orc` `parquet` `excel` `xml`

Note that the final file name ends with the suffix of the chosen file_format_type; for text files the suffix is `txt`.

@@ -168,6 +172,18 @@ When File Format is Excel,The maximum number of data items that can be cached in

The name of the workbook sheet to write.

### xml_root_tag [string]

Specifies the tag name of the root element within the XML file.

### xml_row_tag [string]

Specifies the tag name of the data rows within the XML file.

### xml_use_attr_format [boolean]

Specifies whether to process data using the tag attribute format.

### enable_header_write [boolean]

Only used when file_format_type is text or csv. false: don't write the header; true: write the header.
18 changes: 17 additions & 1 deletion docs/en/connector-v2/sink/OssFile.md
@@ -32,6 +32,7 @@ By default, we use 2PC commit to ensure `exactly-once`
- [x] orc
- [x] json
- [x] excel
- [x] xml

## Data Type Mapping

@@ -108,6 +109,9 @@ If write to `csv`, `text` file type, All column will be string.
| common-options | object | no | - | |
| max_rows_in_memory | int | no | - | Only used when file_format_type is excel. |
| sheet_name | string | no | Sheet${Random number} | Only used when file_format_type is excel. |
| xml_root_tag | string | no | RECORDS | Only used when file_format_type is xml. |
| xml_row_tag | string | no | RECORD | Only used when file_format_type is xml. |
| xml_use_attr_format | boolean | no | - | Only used when file_format_type is xml. |

### path [string]

@@ -161,7 +165,7 @@ When the format in the `file_name_expression` parameter is `xxxx-${Now}` , `file

The following file types are supported:

`text` `json` `csv` `orc` `parquet` `excel`
`text` `json` `csv` `orc` `parquet` `excel` `xml`

Note that the final file name ends with the suffix of the chosen file_format_type; for text files the suffix is `txt`.

@@ -240,6 +244,18 @@ When File Format is Excel,The maximum number of data items that can be cached in

The name of the workbook sheet to write.

### xml_root_tag [string]

Specifies the tag name of the root element within the XML file.

### xml_row_tag [string]

Specifies the tag name of the data rows within the XML file.

### xml_use_attr_format [boolean]

Specifies whether to process data using the tag attribute format.

## How to Create an Oss Data Synchronization Job

The following example demonstrates how to create a data synchronization job that reads data from Fake Source and writes it to Oss: