
[FLINK-20180][fs-connector][translation] Translate FileSink document into Chinese #14077

Closed
wants to merge 5 commits

Conversation

gaoyunhaii
Contributor

What is the purpose of the change

Translating the FileSink document into Chinese.

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@gaoyunhaii gaoyunhaii changed the title Pr14061 add doc zh [FLINK-20141][fs-connector] Translate FileSink document into Chinese Nov 16, 2020
@flinkbot
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 605e600 (Mon Nov 16 06:36:55 UTC 2020)

✅ no warnings

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Nov 16, 2020

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build


- 桶目录中的实际输出数据会被划分为多个部分文件(part file),每一个接收桶数据的 Sink Subtask ,至少包含一个部分文件(part file)。额外的部分文件(part file)将根据滚动策略创建,滚动策略是可以配置的。默认的策略是根据文件大小和超时时间来滚动文件。超时时间指打开文件的最长持续时间,以及文件关闭前的最长非活动时间。
+ 桶目录中的实际输出数据会被划分为多个部分文件(part file),每一个接收桶数据的 Sink Subtask ,至少包含一个部分文件(part file)。额外的部分文件(part file)将根据滚动策略创建,滚动策略是可以配置的。对于行编码格式(参考 [File Formats](#file-formats) )默认的策略是根据文件大小和超时时间来滚动文件。超时时间指打开文件的最长持续时间,以及文件关闭前的最长非活动时间。对于批量编码格式我们需要在每次 Checkpoint 时切割文件,但是用户也可以指定额外的基于文件大小和超时时间的条件
Contributor

Maybe we could change "我们" to something more specific. For example: "批量编码格式的默认策略是每次在 checkpoint 时滚动文件。"

Contributor Author

A slight difference is that the English version specifies "must roll on checkpoint", hence it is translated as "必须切割文件"; the other part has been changed.


Contributor

@guoweiM guoweiM Nov 16, 2020

"时间的条件"

Maybe we could change "条件" to "策略" to keep the terminology consistent.
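For context, the row-encoded rolling behavior discussed above can be sketched in Java roughly as follows; the output path and the use of SimpleStringEncoder are illustrative assumptions, and the exact builder methods may differ between Flink versions:

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

import java.util.concurrent.TimeUnit;

// Row-encoded FileSink whose part files roll on size, maximum open duration, or inactivity.
final FileSink<String> sink = FileSink
    .forRowFormat(new Path("/base/output-path"), new SimpleStringEncoder<String>("UTF-8"))
    .withRollingPolicy(
        DefaultRollingPolicy.builder()
            .withRolloverInterval(TimeUnit.MINUTES.toMillis(15))  // longest time a file may stay open
            .withInactivityInterval(TimeUnit.MINUTES.toMillis(5)) // longest idle time before closing
            .withMaxPartSize(1024 * 1024 * 1024)                  // roll once a part file reaches 1 GB
            .build())
    .build();
```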

@kl0u
Contributor

kl0u commented Nov 16, 2020

Hi @gaoyunhaii, just to avoid doing redundant work, it may make sense to wait for the English docs to be finalised before merging this one. I hope this will be done soon (#14061).

@@ -603,7 +604,7 @@ Flink 有两个内置的 BucketAssigners :

## 滚动策略

- 滚动策略 [RollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/RollingPolicy.html) 定义了指定的文件在何时关闭(closed)并将其变为 Pending 状态,随后变为 Finished 状态。处于 Pending 状态的文件会在下一次 Checkpoint 时变为 Finished 状态,通过设置 Checkpoint 间隔时间,可以控制部分文件(part file)对下游读取者可用的速度、大小和数量。
+ 在流模式下,滚动策略 [RollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/RollingPolicy.html) 定义了指定的文件在何时关闭(closed)并将其变为 Pending 状态,随后变为 Finished 状态。处于 Pending 状态的文件会在下一次 Checkpoint 时变为 Finished 状态,通过设置 Checkpoint 间隔时间,可以控制部分文件(part file)对下游读取者可用的速度、大小和数量。在批模式下,临时文件只会在作业处理完所有输入数据后提交,此时滚动策略可以用来控制每个文件的大小
Contributor

@guoweiM guoweiM Nov 16, 2020

Maybe we should not introduce the new concept "临时文件". So maybe we could change it to the following:
在批模式下,所有文件只会在作业处理完所有输入数据后才变为 Finished 状态
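For the bulk-encoded side of this discussion, a hedged sketch: bulk formats roll at least on every checkpoint, which corresponds to OnCheckpointRollingPolicy. The Person type and `parquetWriterFactory` below are hypothetical, pre-built assumptions.

```java
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;

// Bulk-encoded FileSink: part files are rolled on every checkpoint.
final FileSink<Person> bulkSink = FileSink
    .forBulkFormat(new Path("/base/output-path"), parquetWriterFactory) // hypothetical BulkWriter factory
    .withRollingPolicy(OnCheckpointRollingPolicy.build())
    .build();
```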


Users who want to add user metadata to the ORC files can do so by calling `addUserMetadata(...)` inside the overriding
`vectorize(...)` method.
给 ORC 文件添加自定义元数据可以通过在覆盖的 `vectorize(...)` 方法中调用 `addUserMetadata(...)` 实现:
Contributor

"覆盖" -> "重载"

Contributor Author

This should be 覆盖 (override) here, not 重载 (overload), right?

Contributor Author

Changed it to 实现, same as below.
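To ground the wording discussion above, a hedged sketch of the pattern the sentence describes, calling addUserMetadata(...) inside the overriding vectorize(...); the Person type, metadata key, and value are illustrative assumptions:

```java
import org.apache.flink.orc.vector.Vectorizer;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

import java.io.IOException;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class PersonVectorizer extends Vectorizer<Person> implements Serializable {

  public PersonVectorizer(String schema) {
    super(schema);
  }

  @Override
  public void vectorize(Person element, VectorizedRowBatch batch) throws IOException {
    // Populate the column vectors for this element (omitted here; see the later sketch in this thread).

    // Attach custom user metadata to the ORC file currently being written.
    String metadataKey = "writer.origin";                                    // assumed key
    ByteBuffer metadataValue =
        ByteBuffer.wrap("flink-file-sink".getBytes(StandardCharsets.UTF_8)); // assumed value
    this.addUserMetadata(metadataKey, metadataValue);
  }
}
```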

@@ -454,8 +449,7 @@ input.sinkTo(sink)
</div>
</div>

OrcBulkWriterFactory can also take Hadoop `Configuration` and `Properties` so that a custom Hadoop configuration and ORC
writer properties can be provided.
用户还可以通过 Hadoop `Configuration` 和 `Properties` 来设置 OrcBulkWriterFactory 中涉及的 Hadoop 属性和 Writer 属性:
Contributor

Writer -> ORC Writer
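Regarding the sentence under review, a hedged Java sketch of passing a custom Hadoop Configuration and ORC writer Properties to OrcBulkWriterFactory; the schema string and the chosen property value are placeholders:

```java
import org.apache.flink.orc.writer.OrcBulkWriterFactory;
import org.apache.hadoop.conf.Configuration;

import java.util.Properties;

String schema = "struct<_col0:string,_col1:int>";    // placeholder ORC schema

Configuration hadoopConf = new Configuration();      // custom Hadoop configuration
Properties writerProperties = new Properties();      // ORC writer properties
writerProperties.setProperty("orc.compress", "LZ4");

// Both the Hadoop configuration and the ORC writer properties are handed to the factory.
final OrcBulkWriterFactory<Person> writerFactory =
    new OrcBulkWriterFactory<>(new PersonVectorizer(schema), writerProperties, hadoopConf);
```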

@@ -404,7 +399,7 @@ class PersonVectorizer(schema: String) extends Vectorizer[Person](schema) {
</div>
</div>

To use the ORC bulk encoder in an application, users need to add the following dependency:
为了在应用使用 ORC 批量编码,用户需要添加如下依赖:
Contributor

为了在应用


Like any other columnar format that encodes data in bulk fashion, Flink's `OrcBulkWriter` writes the input elements in batches. It uses
ORC's `VectorizedRowBatch` to achieve this.
和其它基于列式存储的批量编码格式类似,Flink中的 `OrcBulkWriter` 将数据按批写出,它通过 ORC 的 VectorizedRowBatch 来实现这一点。
Contributor

"," -> "。"

class and override the `vectorize(T element, VectorizedRowBatch batch)` method. As you can see, the method provides an
instance of `VectorizedRowBatch` to be used directly by the users so users just have to write the logic to transform the
input `element` to `ColumnVectors` and set them in the provided `VectorizedRowBatch` instance.
由于输入数据必须先缓存为一个完整的 `VectorizedRowBatch` ,用户需要继承 `Vectorizer` 抽像类并且覆盖其中的 `vectorize(T element, VectorizedRowBatch batch)` 方法。方法参数中传入的 `VectorizedRowBatch` 使用户只需将输入 `element` 转化为 `ColumnVectors` 并将它存储到所提供的 `VectorizedRowBatch` 实例中。
Contributor

覆盖 -> 实现
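To make the vectorize(...) contract described above concrete, a hedged sketch of a Vectorizer that converts a hypothetical Person(name, age) element into ColumnVectors of the provided VectorizedRowBatch:

```java
import org.apache.flink.orc.vector.Vectorizer;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

import java.io.IOException;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class PersonVectorizer extends Vectorizer<Person> implements Serializable {

  public PersonVectorizer(String schema) {
    super(schema);
  }

  @Override
  public void vectorize(Person element, VectorizedRowBatch batch) throws IOException {
    BytesColumnVector nameColVector = (BytesColumnVector) batch.cols[0];
    LongColumnVector ageColVector = (LongColumnVector) batch.cols[1];

    // Write this element into the next free row of the buffered batch.
    int row = batch.size++;
    nameColVector.setVal(row, element.getName().getBytes(StandardCharsets.UTF_8));
    ageColVector.vector[row] = element.getAge();
  }
}
```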

@gaoyunhaii
Contributor Author

Hi @gaoyunhaii, just to avoid doing redundant work, it may make sense to wait for the English docs to be finalised before merging this one. I hope this will be done soon (#14061).

+1 for waiting till the English doc gets merged; then I would also modify the translation to reflect the changes.

@gaoyunhaii
Contributor Author

Many thanks @guoweiM for the review, I will update the PR~

@kl0u
Contributor

kl0u commented Nov 16, 2020

@gaoyunhaii I merged the English version.

@gaoyunhaii gaoyunhaii changed the title [FLINK-20141][fs-connector] Translate FileSink document into Chinese [FLINK-20180][fs-connector][translation] Translate FileSink document into Chinese Nov 17, 2020
@gaoyunhaii
Contributor Author

@guoweiM @kl0u Many thanks, I have rebased and updated the PR, and modified it according to the latest English version and the review comments.

Contributor

@guoweiM guoweiM left a comment

Thanks @gaoyunhaii for resolving the comments. All parts LGTM except the one I commented on below.


File Sink 会将数据写入到桶中。由于输入流可能是无界的,因此每个桶中的数据被划分为多个有限大小的文件。如何分桶是可以配置的,默认使用基于时间的分桶策略,这种策略每个小时创建一个新的桶,桶中包含的文件将记录所有该小时内从流中接收到的数据。

- 桶目录中的实际输出数据会被划分为多个部分文件(part file),每一个接收桶数据的 Sink Subtask ,至少包含一个部分文件(part file)。额外的部分文件(part file)将根据滚动策略创建,滚动策略是可以配置的。对于行编码格式(参考 [File Formats](#file-formats) )默认的策略是根据文件大小和超时时间来滚动文件。超时时间指打开文件的最长持续时间,以及文件关闭前的最长非活动时间。对于批量编码格式我们需要在每次 Checkpoint 时切割文件,但是用户也可以指定额外的基于文件大小和超时时间的条件
+ 桶目录中的实际输出数据会被划分为多个部分文件(part file),每一个接收桶数据的 Sink Subtask ,至少包含一个部分文件(part file)。额外的部分文件(part file)将根据滚动策略创建,滚动策略是可以配置的。对于行编码格式(参考 [File Formats](#file-formats) )默认的策略是根据文件大小和超时时间来滚动文件。超时时间指打开文件的最长持续时间,以及文件关闭前的最长非活动时间。批量编码格式必须在每次 Checkpoint 时切割文件,但是用户也可以指定额外的基于文件大小和超时时间的策略
Contributor

Why do we use "切割文件" rather than "滚动文件", which would be consistent with the rest of this section?
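As for the bucketing paragraph quoted at the top of this hunk (one bucket per hour by default), a hedged sketch of selecting the equivalent time-based assigner explicitly; the path, encoder, and date-time pattern are illustrative assumptions:

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;

// Hourly, time-based bucketing: bucket directories are named like "2020-11-16--06".
final FileSink<String> sink = FileSink
    .forRowFormat(new Path("/base/output-path"), new SimpleStringEncoder<String>("UTF-8"))
    .withBucketAssigner(new DateTimeBucketAssigner<>("yyyy-MM-dd--HH"))
    .build();
```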

Contributor

@guoweiM guoweiM left a comment

Thanks for resolving the comments. Looks good to me. +1 for merging.

@gaoyunhaii
Contributor Author

Many thanks @guoweiM!
