
[FLINK-19247][docs-zh] Update Chinese documentation after removal of Kafka 0.10 and 0.11 #13410

Merged
merged 4 commits into apache:master from Shawn-Hx:FLINK-19247 on Oct 12, 2020

Conversation

Shawn-Hx
Contributor

What is the purpose of the change

Update Chinese documentation after removal of Kafka 0.10 and 0.11.

Brief change log

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit d785011 (Thu Sep 17 12:29:10 UTC 2020)

✅ no warnings

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Sep 17, 2020

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@Shawn-Hx Shawn-Hx changed the title [FLINK-19247] Update Chinese documentation after removal of Kafka 0.10 and 0.11 [FLINK-19247][docs-zh] Update Chinese documentation after removal of Kafka 0.10 and 0.11 Sep 17, 2020
@Shawn-Hx
Contributor Author

Hi, @klion26
This PR is ready to be reviewed. Could you help to review it at your convenience?
Thanks~

Member

@klion26 klion26 left a comment

@Shawn-Hx thanks for your contribution. I left some comments, please take a look.

@@ -23,90 +23,32 @@ specific language governing permissions and limitations
under the License.
-->

Flink 提供了 [Apache Kafka](https://kafka.apache.org) 连接器,用于向 Kafka topic 中读取或者写入数据,可提供精确一次的处理语义。
Member

Regarding "用于向 Kafka topic 中读取或者写入" (for reading or writing to Kafka topics): should this sentence be split into "read from xxx" and "write to xxx"? As written it reads more like "read to xxx", which is a bit odd.

要使用通用的 Kafka 连接器,请为它添加依赖关系:
Apache Flink 集成了通用的 Kafka 连接器,它会尽力与 Kafka client 的最新版本保持同步。
该连接器使用的 Kafka client 版本可能会在 Flink 版本之间发生变化。
当前 Kafka client 向后兼容 0.10.0 或更高版本的 Kafka broker。

This comment was marked as resolved.

Contributor Author

I'm not entirely sure here.
The original sentence in the English documentation is: "Modern Kafka clients are backwards compatible with broker versions 0.10.0 or later". A literal translation of "backwards compatible" would be "向后兼容"?


要使用通用的 Kafka 连接器,请为它添加依赖关系:
Apache Flink 集成了通用的 Kafka 连接器,它会尽力与 Kafka client 的最新版本保持同步。
该连接器使用的 Kafka client 版本可能会在 Flink 版本之间发生变化。
Member

I suggest putting these on one line; otherwise there will be a space between the two lines after rendering.


然后,实例化 source( `FlinkKafkaConsumer`)和 sink( `FlinkKafkaProducer`)。除了从模块和类名中删除了特定的 Kafka 版本外,这个 API 向后兼容 Kafka 0.11 版本的 connector。
Flink 目前的流连接器还不是二进制发行版的一部分。
[在此处]({{ site.baseurl }}/zh/dev/project-configuration.html)可以了解到如何链接它们以实现在集群中执行。
Member

1. For the link, I suggest using the {% link %} tag; see the mailing list [1].
2. Could "如何链接它们以实现在集群中执行" (how to link them for execution in a cluster) be polished further? The meaning is clear now, but it still reads a bit awkwardly.

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Reminder-Prefer-link-tag-in-documentation-td42362.html
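
For readers following along, a minimal Java sketch of the instantiation the excerpt describes, assuming a local broker and placeholder topic and group id:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder connection settings; adjust to your environment.
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "demo-group");

        // The universal connector drops the Kafka version from the class name:
        // FlinkKafkaConsumer / FlinkKafkaProducer instead of FlinkKafkaConsumer011 etc.
        FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("demo-topic", new SimpleStringSchema(), props);

        DataStream<String> stream = env.addSource(consumer);
        stream.print();

        env.execute("Kafka source sketch");
    }
}
```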

用户如果要自己去实现一个`DeserializationSchema`,需要自己去实现 `getProducedType(...)`方法。

为了访问 Kafka 消息的 key、value 和元数据,`KafkaDeserializationSchema` 具有以下反序列化方法 `T deserialize(ConsumerRecord<byte[], byte[]> record)`。
Flink Kafka Consumer 需要知道如何将 Kafka 中的二进制数据转换为 Java 或者 Scala 对象。`KafkaDeserializationSchema` 允许用户指定这样的 schema,为每条 Kafka 消息调用 `T deserialize(ConsumerRecord<byte[], byte[]> record)` 方法,传递来自 Kafka 的值。
Member

Would it read better to change "为每条 Kafka 消息调用 T deserialize(ConsumerRecord<byte[], byte[]> record) 方法,传递来自 Kafka 的值" to "每条 Kafka 中的消息会调用 T deserialize(ConsumerRecord<byte[], byte[]> record) 反序列化" (every message from Kafka is deserialized by calling T deserialize(ConsumerRecord<byte[], byte[]> record))?
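
For context, a sketch of what a user-implemented `KafkaDeserializationSchema` could look like; the key:value String mapping and the class name are illustrative assumptions, not part of the PR:

```java
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerRecord;

/** Illustrative schema that exposes key and value of each Kafka record as a String. */
public class KeyValueDeserializationSchema implements KafkaDeserializationSchema<String> {

    @Override
    public boolean isEndOfStream(String nextElement) {
        return false; // read the topic indefinitely
    }

    @Override
    public String deserialize(ConsumerRecord<byte[], byte[]> record) {
        String key = record.key() == null ? "" : new String(record.key(), StandardCharsets.UTF_8);
        String value = new String(record.value(), StandardCharsets.UTF_8);
        return key + ":" + value;
    }

    @Override
    public TypeInformation<String> getProducedType() {
        // As the excerpt notes, getProducedType() must be implemented as well.
        return Types.STRING;
    }
}
```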

在内部,每个 Kafka 分区执行一个 assigner 实例。当指定了这样的 assigner 时,对于从 Kafka 读取的每条消息,调用 `extractTimestamp(T element, long previousElementTimestamp)` 来为记录分配时间戳,并为 `Watermark getCurrentWatermark()`(定期形式)或 `Watermark checkAndGetNextWatermark(T lastElement, long extractedTimestamp)`(打点形式)以确定是否应该发出新的 watermark 以及使用哪个时间戳。

**请注意**:如果 watermark assigner 依赖于从 Kafka 读取的消息来上涨其 watermark (通常就是这种情况),那么所有主题和分区都需要有连续的消息流。否则,整个应用程序的 watermark 将无法上涨,所有基于时间的算子(例如时间窗口或带有计时器的函数)也无法运行。单个的 Kafka 分区也会导致这种反应。这是一个已在计划中的 Flink 改进,目的是为了防止这种情况发生(请见[FLINK-5479: Per-partition watermarks in FlinkKafkaConsumer should consider idle partitions](https://issues.apache.org/jira/browse/FLINK-5479))。同时,可能的解决方法是将*心跳消息*发送到所有 consumer 的分区里,从而上涨空闲分区的 watermark。
**请注意**:如果 watermark assigner 依赖于从 Kafka 读取的消息来上涨其 watermark (通常就是这种情况),那么所有主题和分区都需要有连续的消息流。否则,整个应用程序的 watermark 将无法上涨,所有基于时间的算子(例如时间窗口或带有计时器的函数)也无法运行。单个的 Kafka 分区也会导致这种反应。考虑设置适当的 [idelness timeouts]({{ site.baseurl }}/dev/event_timestamps_watermarks.html#dealing-with-idle-sources) 来缓解这个问题。
Member

Please change the link to the {% link %} form.
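
A sketch of the idleness-timeout approach the new excerpt points to, assuming the Flink 1.11 `WatermarkStrategy` API; the 20-second out-of-orderness bound and one-minute idle timeout are arbitrary example values:

```java
import java.time.Duration;
import java.util.Properties;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class IdleAwareKafkaSource {

    /** Builds a consumer whose per-partition watermarks tolerate idle partitions. */
    public static FlinkKafkaConsumer<String> create(Properties kafkaProps) {
        FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("demo-topic", new SimpleStringSchema(), kafkaProps);

        consumer.assignTimestampsAndWatermarks(
                WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(20))
                        // Partitions without traffic are marked idle after one minute,
                        // so they no longer hold back the application-wide watermark.
                        .withIdleness(Duration.ofMinutes(1)));
        return consumer;
    }
}
```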

@@ -302,8 +217,6 @@ Flink Kafka Consumer 支持发现动态创建的 Kafka 分区,并使用精准

默认情况下,是禁用了分区发现的。若要启用它,请在提供的属性配置中为 `flink.partition-discovery.interval-millis` 设置大于 0 的值,表示发现分区的间隔是以毫秒为单位的。

<span class="label label-danger">局限性</span> 当从 Flink 1.3.x 之前的 Flink 版本的 savepoint 恢复 consumer 时,分区发现无法在恢复运行时启用。如果启用了,那么还原将会失败并且出现异常。在这种情况下,为了使用分区发现,请首先在 Flink 1.3.x 中使用 savepoint,然后再从 savepoint 中恢复。

#### Topic 发现
Member

Since this document is being updated anyway, please also add anchor tags to all headings, following the wiki convention; they can mirror the anchors in the English version's URLs.
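
A minimal sketch of enabling partition discovery via the property named in the excerpt; the 10-second interval, topic, and connection settings are placeholder assumptions:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class PartitionDiscoveryExample {

    public static FlinkKafkaConsumer<String> create() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "demo-group");
        // Check for newly created partitions every 10 seconds (in milliseconds; disabled by default).
        props.setProperty("flink.partition-discovery.interval-millis", "10000");

        // A topic pattern could be used instead of a fixed topic to also discover new topics.
        return new FlinkKafkaConsumer<>("demo-topic", new SimpleStringSchema(), props);
    }
}
```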


默认情况下,如果没有为 Flink Kafka Producer 指定自定义分区程序,则 producer 将使用 `FlinkFixedPartitioner` 为每个 Flink Kafka Producer 并行子任务映射到单个 Kafka 分区(即,接收子任务接收到的所有消息都将位于同一个 Kafka 分区中)。
用户可以对如何将数据写到Kafka进行细粒度的控制。你可以通过 producer record:
Member

Suggested change
用户可以对如何将数据写到Kafka进行细粒度的控制。你可以通过 producer record:
用户可以对如何将数据写到 Kafka 进行细粒度的控制。你可以通过 producer record:
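
A sketch of the fine-grained control the excerpt refers to, using a `KafkaSerializationSchema` that builds the `ProducerRecord` itself; topic, key derivation, and the fixed partition are illustrative assumptions:

```java
import java.nio.charset.StandardCharsets;

import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

/** Illustrative schema that chooses key, partition and payload of each ProducerRecord. */
public class RoutedStringSchema implements KafkaSerializationSchema<String> {

    @Override
    public ProducerRecord<byte[], byte[]> serialize(String element, Long timestamp) {
        byte[] key = element.split(",", 2)[0].getBytes(StandardCharsets.UTF_8);
        byte[] value = element.getBytes(StandardCharsets.UTF_8);
        // Explicit partition 0 here just to show the knob; pass null instead to fall
        // back to key-based partitioning on the Kafka side.
        return new ProducerRecord<>("demo-topic", 0, key, value);
    }
}
```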


### Kafka Producer 和容错

启用 Flink 的checkpointing 后,`FlinkKafkaProducer` 可以提供精确一次的语义保证。
Member

Suggested change
启用 Flink 的checkpointing 后,`FlinkKafkaProducer` 可以提供精确一次的语义保证。
启用 Flink 的 checkpointing 后,`FlinkKafkaProducer` 可以提供精确一次的语义保证。
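
A sketch of how the pieces might fit together: checkpointing enabled and a `FlinkKafkaProducer` constructed with `Semantic.EXACTLY_ONCE`; the checkpoint interval, topic, and reuse of the schema sketched above are assumptions:

```java
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class ExactlyOnceSinkSketch {

    public static void attachSink(StreamExecutionEnvironment env, DataStream<String> stream) {
        // Exactly-once output requires checkpointing to be enabled.
        env.enableCheckpointing(60_000);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        // Must be lower than the broker's transaction.max.timeout.ms.
        props.setProperty("transaction.timeout.ms", "900000");

        FlinkKafkaProducer<String> producer = new FlinkKafkaProducer<>(
                "demo-topic",
                new RoutedStringSchema(),          // KafkaSerializationSchema from the sketch above
                props,
                FlinkKafkaProducer.Semantic.EXACTLY_ONCE);

        stream.addSink(producer);
    }
}
```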

@@ -512,6 +434,17 @@ Flink 通过 Kafka 连接器提供了一流的支持,可以对 Kerberos 配置

有关 Kerberos 安全性 Flink 配置的更多信息,请参见[这里]({{ site.baseurl }}/zh/ops/config.html)。你也可以在[这里]({{ site.baseurl }}/zh/ops/security-kerberos.html)进一步了解 Flink 如何在内部设置基于 kerberos 的安全性。

## 升级到最近的连接器版本

通用的升级步骤概述见 [升级 Jobs 和 Flink 版本指南]({{ site.baseurl }}/ops/upgrading.html)。对于 Kafka,你还需要遵循这些步骤:
Member

Please replace the link with {% link %}.
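
The generic upgrade guide linked in the excerpt recommends giving operators stable uids so that savepoint state (such as Kafka offsets) can be mapped back after an upgrade; a hypothetical sketch, with the uid string chosen only for illustration:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class StableUidSketch {

    public static DataStream<String> addKafkaSource(
            StreamExecutionEnvironment env, FlinkKafkaConsumer<String> consumer) {
        // A stable uid lets the Kafka offsets stored in a savepoint be matched back
        // to this operator after the connector or Flink version is upgraded.
        return env.addSource(consumer)
                .uid("kafka-source")   // illustrative uid
                .name("Kafka source");
    }
}
```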

@Shawn-Hx
Contributor Author

Hi, @klion26
Thanks for your good suggestions!
I have made some changes according to your advice. Please take a look.

* `Semantic.NONE`:Flink 不会有任何语义的保证,产生的记录可能会丢失或重复。
* `Semantic.AT_LEAST_ONCE`(默认设置):可以保证不会丢失任何记录(但是记录可能会重复)
* `Semantic.EXACTLY_ONCE`:使用 Kafka 事务提供精确一次语义。无论何时,在使用事务写入 Kafka 时,都要记得为所有消费 Kafka 消息的应用程序设置所需的 `isolation.level`(`read_committed` 或 `read_uncommitted` - 后者是默认值)。

##### 注意事项
Member

Is the tag missing here?

Contributor Author

Thanks for the reminder. I have added the tag.
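
As a companion to the `Semantic.EXACTLY_ONCE` bullet quoted above, a sketch of the consumer-side `isolation.level` setting it mentions; topic, group id, and broker address are placeholders:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ReadCommittedConsumer {

    public static FlinkKafkaConsumer<String> create() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "downstream-app");
        // Only consume records from committed Kafka transactions;
        // the Kafka default is read_uncommitted.
        props.setProperty("isolation.level", "read_committed");

        return new FlinkKafkaConsumer<>("demo-topic", new SimpleStringSchema(), props);
    }
}
```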

@Shawn-Hx
Contributor Author

Hi, @klion26.
I have made changes according to your advice. Please take a look.
Thanks~

Member

@klion26 klion26 left a comment

@Shawn-Hx thanks for the update. LGTM now
cc @wuchong @rmetzger

@wuchong wuchong merged commit f234fad into apache:master Oct 12, 2020
@Shawn-Hx Shawn-Hx deleted the FLINK-19247 branch October 14, 2020 12:22