[FLINK-25198][docs] Add doc about name and description of operator #18400

Closed
wants to merge 2 commits into from


wenlong88
Contributor

What is the purpose of the change

This PR is part of FLIP-195 (https://cwiki.apache.org/confluence/display/FLINK/FLIP-195%3A+Improve+the+name+and+structure+of+vertex+and+operator+name+for+job) and aims to add documentation about operator names and descriptions.

Brief change log

Add docs about name and description to the operators section.

Verifying this change

This change is trivial documentation work without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)

@flinkbot
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 0ee72a0 (Wed Jan 19 03:59:29 UTC 2022)

Warnings:

  • This pull request references an unassigned Jira ticket. According to the code contribution guide, tickets need to be assigned before starting with the implementation work.

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Jan 19, 2022

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

Contributor

gaoyunhaii left a comment

Thanks @wenlong88 for the PR! I have left some comments.

@@ -755,5 +755,43 @@ someStream.filter(...).slotSharingGroup("name")
```python
some_stream.filter(...).slot_sharing_group("name")
```


Hi @wenlong88, if convenient, could you also add the following two lines to fix the previously incomplete labels? Alternatively, this could go into a separate hotfix commit in this PR.

{{< /tab >}}
{{< /tabs>}}

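The description is mainly used in the execution plan and in the web UI. The description of a vertex is likewise built from the descriptions of the operators in it.
The description can include detailed information about the operator's behavior, so that we can debug and analyze at runtime.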

{{< tabs slotsharing >}}

Do not reuse the same tabs name, otherwise it will cause issues. Maybe change it to namedescription? Same for the English version.

@@ -755,3 +755,38 @@ some_stream.filter(...).slot_sharing_group("name")
```
{{< /tab >}}
{{< /tabs>}}

## Name and Description

Add an empty line after the title? Same for the English version.

{{< /tab >}}
{{< /tabs>}}

The format of description of a job vertex would be a tree format string by default.

would be a tree -> is a tree ?


The name of operator and job vertex will be used in web ui, thread name, logging, metrics, etc.
The name of a job vertex is constructed based on the name of operators in it.
The name need to be as concise as possible to avoid big pressure on external systems.

need -> needs ?
big pressure -> high pressure ?
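
To make the point about concise names concrete, here is a minimal toy sketch of a vertex name being assembled from the names of its operators. It is not Flink's implementation: the `vertex_name` helper, the ` -> ` separator, and the operator names are all assumptions chosen for illustration.

```python
from typing import List


def vertex_name(operator_names: List[str], separator: str = " -> ") -> str:
    """Toy model: join the names of chained operators into one vertex name.

    The separator is an assumption for illustration, not taken from this PR.
    """
    return separator.join(operator_names)


# Hypothetical operator names: a concise variant and a verbose variant.
concise = vertex_name(["Source: orders", "Filter", "Sink: print"])
verbose = vertex_name([
    "Source: orders(table=[[default_catalog, default_database, orders]], fields=[id, amount])",
    "Filter(where=[(amount > 100)])",
    "Sink: print",
])

print(concise)
# The verbose name is several times longer; every log line, thread name,
# and metric identifier that embeds the vertex name would carry that length.
print(len(concise), len(verbose))
```

Because the vertex name is derived from every operator in it, a long operator name is repeated wherever the vertex name is used, which is why the docs ask for concise names.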

Operators generated by Flink SQL will have a name consisted by type of operator and id, and a detailed description, by default.
Users can set `table.optimizer.simplify-operator-name-enabled` to be `false`, if they want to set name to be the detailed description as what it is in former versions.

When the topology of the pipeline is complex, users can add a topological index in the name of vertex by set `pipeline.vertex-name-include-index-prefix` to be `true`,

set ... to be -> set ... to ?
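
As a rough sketch of how the two options quoted above might be written down, the snippet below renders them as `key: value` lines. The option keys and values come from the quoted docs; the assumption that they are placed in a `flink-conf.yaml`-style file, and the `render_conf` helper itself, are illustrative only.

```python
from typing import Dict

# Keys and values as quoted in the docs under review. Placing them in a
# flink-conf.yaml-style file is an assumption made for this sketch.
NAMING_OPTIONS: Dict[str, str] = {
    # Keep the detailed description in SQL operator names, as in former versions.
    "table.optimizer.simplify-operator-name-enabled": "false",
    # Prefix vertex names with a topological index.
    "pipeline.vertex-name-include-index-prefix": "true",
}


def render_conf(options: Dict[str, str]) -> str:
    """Render options as 'key: value' lines, one per option."""
    return "\n".join(f"{key}: {value}" for key, value in options.items())


print(render_conf(NAMING_OPTIONS))
```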

Operators and job vertices in Flink have a name and a description. Both describe what an operator or a vertex is doing, but they are used in different places.

The name is used in the web UI, thread names, logs, metrics, and similar places. The name of a vertex is built from the names of the operators in it.
The name needs to be as concise (简介) as possible, to avoid putting high pressure on external systems.

简介 -> 简洁 ?

{{< /tab >}}
{{< /tabs>}}

By default, the description of a vertex is built as a multi-line tree structure. Users can set `pipeline.vertex-description-mode` to `CASCADING` to change (改外) the description to the single-line recursive (cascading) format of older versions.

改外 -> 改为?
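
The difference between a multi-line tree description and a single-line cascading one can be pictured with a small toy sketch. The exact strings Flink produces are not shown in this thread, so the layout below (the `+- ` markers, the indentation, the ` -> ` separator) and the sample operator descriptions are assumptions for illustration only.

```python
from typing import List


def cascading_description(operator_descriptions: List[str]) -> str:
    """Toy single-line form: all operator descriptions on one line."""
    return " -> ".join(operator_descriptions)


def tree_description(operator_descriptions: List[str]) -> str:
    """Toy multi-line form for a linear chain: one operator per line,
    each level indented further than its upstream operator."""
    if not operator_descriptions:
        return ""
    lines = [operator_descriptions[0]]
    for depth, text in enumerate(operator_descriptions[1:]):
        lines.append("   " * depth + "+- " + text)
    return "\n".join(lines)


# Hypothetical operator descriptions for a simple linear chain.
chain = ["Source: orders", "Filter(amount > 100)", "Sink: print"]
print(cascading_description(chain))
print(tree_description(chain))
```

With long operator descriptions, the one-operator-per-line form stays readable where the single-line form does not, which matches the motivation for making the tree format the default.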

Operators generated by the Flink SQL framework get a simplified name consisting of the operator type and id; the detailed information goes into the description.
Users can set `table.optimizer.simplify-operator-name-enabled` to `false` to turn off name simplification; the name then carries the detailed description, as in previous versions.

When the topology of a job is complex (很复杂是), users can set `pipeline.vertex-name-include-index-prefix` to `true` to add a (一个一个) topological-order index prefix to the vertex name, which makes it easy to find the matching vertex in the topology graph from metrics and logs.

一个一个 -> 一个 ?


很复杂是 -> 很复杂时

@wenlong88
Contributor Author

@gaoyunhaii thanks for the detailed review, I have addressed the comments

Contributor

gaoyunhaii left a comment

Thanks @wenlong88 for the update! LGTM

MrWhiteSike pushed a commit to MrWhiteSike/flink that referenced this pull request Mar 3, 2022