From 1611470488aa53bec5646e5c19c782b8c8cfb5eb Mon Sep 17 00:00:00 2001 From: Anonymitaet <50226895+Anonymitaet@users.noreply.github.com> Date: Thu, 10 Oct 2019 11:47:51 +0800 Subject: [PATCH] [Doc] Add *HDFS2 sink connector guide* (#5226) * Add *HDFS2 sink connector guide* * Update * Update --- site2/docs/io-connectors.md | 4 +-- site2/docs/io-hdfs.md | 26 ------------------ site2/docs/io-hdfs2-sink.md | 53 +++++++++++++++++++++++++++++++++++++ 3 files changed, 55 insertions(+), 28 deletions(-) delete mode 100644 site2/docs/io-hdfs.md create mode 100644 site2/docs/io-hdfs2-sink.md diff --git a/site2/docs/io-connectors.md b/site2/docs/io-connectors.md index 03e0685198311..9ffffb59da332 100644 --- a/site2/docs/io-connectors.md +++ b/site2/docs/io-connectors.md @@ -50,9 +50,9 @@ Pulsar has various sink connectors, which are sorted alphabetically as below. - [HBase sink connector](io-hbase.md) -- [HDFS2 sink connector](io-hdfs2.md) +- [HDFS2 sink connector](io-hdfs2-sink.md) -- [HDFS3 sink connector](io-hdfs3.md) +- [HDFS3 sink connector](io-hdfs3-sink.md) - [InfluxDB sink connector](io-influxdb-sink.md) diff --git a/site2/docs/io-hdfs.md b/site2/docs/io-hdfs.md deleted file mode 100644 index e5c70f2ade79c..0000000000000 --- a/site2/docs/io-hdfs.md +++ /dev/null @@ -1,26 +0,0 @@ ---- -id: io-hdfs -title: Hdfs Connector -sidebar_label: Hdfs Connector ---- - -## Sink - -The Hdfs Sink Connector is used to pull messages from Pulsar topics and persist the messages -to an hdfs file. - -## Sink Configuration Options - -| Name | Default | Required | Description | -|------|---------|----------|-------------| -| `hdfsConfigResources` | `null` | `true` | A file or comma separated list of files which contains the Hadoop file system configuration, e.g. 'core-site.xml', 'hdfs-site.xml'. | -| `directory` | `null` | `true` | The HDFS directory from which files should be read from or written to. | -| `encoding` | `null` | `false` | The character encoding for the files, e.g. 
UTF-8, ASCII, etc. | -| `compression` | `null` | `false` | The compression codec used to compress/de-compress the files on HDFS. | -| `kerberosUserPrincipal` | `null` | `false` | The Kerberos user principal account to use for authentication. | -| `keytab` | `null` | `false` | The full pathname to the Kerberos keytab file to use for authentication. | -| `filenamePrefix` | `null` | `false` | The prefix of the files to create inside the HDFS directory, i.e. a value of "topicA" will result in files named topicA-, topicA-, etc being produced. | -| `fileExtension` | `null` | `false` | The extension to add to the files written to HDFS, e.g. '.txt', '.seq', etc. | -| `separator` | `null` | `false` | The character to use to separate records in a text file. If no value is provided then the content from all of the records will be concatenated together in one continuous byte array. | -| `syncInterval` | `null` | `false` | The interval (in milliseconds) between calls to flush data to HDFS disk. | -| `maxPendingRecords` | `Integer.MAX_VALUE` | `false` | The maximum number of records that we hold in memory before acking. Default is `Integer.MAX_VALUE`. Setting this value to one, results in every record being sent to disk before the record is acked, while setting it to a higher values allows us to buffer records before flushing them all to disk. | \ No newline at end of file diff --git a/site2/docs/io-hdfs2-sink.md b/site2/docs/io-hdfs2-sink.md new file mode 100644 index 0000000000000..976969553df6e --- /dev/null +++ b/site2/docs/io-hdfs2-sink.md @@ -0,0 +1,53 @@ +--- +id: io-hdfs2-sink +title: HDFS2 sink connector +sidebar_label: HDFS2 sink connector +--- + +The HDFS2 sink connector pulls the messages from Pulsar topics +and persists the messages to HDFS files. + +## Configuration + +The configuration of the HDFS2 sink connector has the following properties. 
+ +### Property + +| Name | Type | Required | Default | Description | +|------|------|----------|---------|-------------| +| `hdfsConfigResources` | String | true | None | A file or a comma-separated list of files containing the Hadoop file system configuration.

**Example**
'core-site.xml'
'hdfs-site.xml' | +| `directory` | String | true | None | The HDFS directory where files are read from or written to. | +| `encoding` | String | false | None | The character encoding for the files.

**Example**
UTF-8
ASCII | +| `compression` | Compression | false | None | The compression codec used to compress or decompress the files on HDFS.

Below are the available options:
  • BZIP2
  • DEFLATE
  • GZIP
  • LZ4
  • SNAPPY | +| `kerberosUserPrincipal` | String | false | None | The Kerberos user principal account used for authentication. | +| `keytab` | String | false | None | The full pathname of the Kerberos keytab file used for authentication. | +| `filenamePrefix` | String | false | None | The prefix of the files created inside the HDFS directory.

    **Example**
    A value of topicA results in files named topicA-. | +| `fileExtension` | String | false | None | The extension added to the files written to HDFS.

    **Example**
    '.txt'
    '.seq' | +| `separator` | char | false | None | The character used to separate records in a text file.

    If no value is provided, the contents from all records are concatenated together in one continuous byte array. | +| `syncInterval` | long | false | 0 | The interval (in milliseconds) between calls to flush data to the HDFS disk. | +| `maxPendingRecords` | int | false | Integer.MAX_VALUE | The maximum number of records held in memory before acking.

    Setting this property to 1 ensures that every record is sent to disk before it is acked.

    Setting this property to a higher value allows buffering records before flushing them to disk. + +### Example + +Before using the HDFS2 sink connector, you need to create a configuration file through one of the following methods. + +* JSON + + ```json + { + "hdfsConfigResources": "core-site.xml", + "directory": "/foo/bar", + "filenamePrefix": "prefix", + "compression": "SNAPPY" + } + ``` + +* YAML + + ```yaml + configs: + hdfsConfigResources: "core-site.xml" + directory: "/foo/bar" + filenamePrefix: "prefix" + compression: "SNAPPY" + ```
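Once the configuration file is ready, the sink can be created with the `pulsar-admin` CLI. The archive path, sink name, and input topic below are example values, not values defined by the connector; adjust them to your deployment:

```shell
# Create an HDFS2 sink from the YAML configuration file above.
# The NAR path, sink name, and input topic are placeholders.
bin/pulsar-admin sinks create \
  --archive connectors/pulsar-io-hdfs2.nar \
  --sink-config-file hdfs2-sink.yaml \
  --name hdfs2-sink \
  --inputs my-topic
```

Messages published to `my-topic` are then written to files under the configured HDFS `directory`.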