[FLINK-15639][k8s] Support to set tolerations for jobmanager and taskmanger pod #11606

Closed

wangyang0918
Contributor

What is the purpose of the change

Tolerations are used to separate a K8s cluster into several individual partitions, usually one per business group. I treat this as a first-class feature because, at least in our production environment, every Flink job needs to specify tolerations so that it can be scheduled to the corresponding resource pool.

Moreover, this is very similar to YARN partitions.

Brief change log

  • Introduce the config options and set the tolerations on the JobManager and TaskManager pods

Verifying this change

  • Covered by unit test
  • Manually tested in a real K8s cluster

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (docs)

@flinkbot
Collaborator

flinkbot commented Apr 1, 2020

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit f95d4be (Wed Apr 01 15:15:00 UTC 2020)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Apr 1, 2020

CI report:

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@wangyang0918
Contributor Author

@tisonkun @zhengcanbin Would you mind taking a look at your convenience?

Contributor

@zhengcanbin zhengcanbin left a comment

Thanks, @wangyang0918! The PR generally looks good to me. I have left two comments.

@wangyang0918
Contributor Author

@zhengcanbin I have integrated your suggestions and rebased onto the latest master. Please take another look at your convenience.

Contributor

@zhengcanbin zhengcanbin left a comment

Thanks, @wangyang0918! The PR looks good to me now.

@wangyang0918
Contributor Author

wangyang0918 commented Apr 8, 2020

@tisonkun Would you mind also taking a look and helping with merging? I think the changes in this PR are quite straightforward.

Member

@tisonkun tisonkun left a comment

Thanks for the contribution, @wangyang0918! One minor comment: for the list-of-map config options, the description should say that the value "should be" in the strict form, rather than "could be".

.mapType()
.asList()
.noDefaultValue()
.withDescription("The user-specified tolerations to be set to the JobManager pod. The value could be " +
Member

Suggested change
- .withDescription("The user-specified tolerations to be set to the JobManager pod. The value could be " +
+ .withDescription("The user-specified tolerations to be set to the JobManager pod. The value should be " +

Contributor Author

Makes sense. I will fix it.

.mapType()
.asList()
.noDefaultValue()
.withDescription("The user-specified tolerations to be set to the TaskManager pod. The value could be " +
Member

Suggested change
- .withDescription("The user-specified tolerations to be set to the TaskManager pod. The value could be " +
+ .withDescription("The user-specified tolerations to be set to the TaskManager pod. The value should be " +
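
For readers unfamiliar with the list-of-map form discussed in this thread, here is a minimal sketch of how such a value could be split into one map per toleration. The format used (`key:value` pairs separated by commas, tolerations separated by semicolons) is an assumption, because the description string in the snippets above is truncated. `TolerationConfigSketch` and `parse` are hypothetical names, and this is not Flink's actual parsing code (it handles no quoting or escaping).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Flink. */
final class TolerationConfigSketch {

    /**
     * Parses a value such as "k1:v1,k2:v2;k3:v3" into a list of maps.
     * Assumed format: ';' separates tolerations, ',' separates fields,
     * and the first ':' in a field separates its name from its value.
     */
    static List<Map<String, String>> parse(String raw) {
        List<Map<String, String>> result = new ArrayList<>();
        if (raw == null || raw.trim().isEmpty()) {
            return result;
        }
        for (String entry : raw.split(";")) {
            if (entry.trim().isEmpty()) {
                continue; // ignore an empty segment, e.g. a trailing ';'
            }
            Map<String, String> toleration = new LinkedHashMap<>();
            for (String pair : entry.split(",")) {
                int idx = pair.indexOf(':');
                if (idx <= 0) {
                    // Reject malformed fields instead of silently skipping them.
                    throw new IllegalArgumentException(
                            "Expected name:value but got '" + pair + "'");
                }
                toleration.put(
                        pair.substring(0, idx).trim(),
                        pair.substring(idx + 1).trim());
            }
            result.add(toleration);
        }
        return result;
    }

    public static void main(String[] args) {
        // Example value; the field names mirror the Kubernetes toleration fields.
        String raw = "key:key1,operator:Equal,value:value1,effect:NoSchedule;"
                + "key:key2,operator:Exists,effect:NoExecute,tolerationSeconds:6000";
        for (Map<String, String> toleration : parse(raw)) {
            System.out.println(toleration);
        }
    }
}
```

Each resulting map would then be translated into one toleration on the JobManager or TaskManager pod spec.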

…manger pod

Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints. Tolerations are applied to pods, and allow (but do not require) the pods to schedule onto nodes with matching taints.
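
The matching rule described above can be sketched in plain Java. This is a simplified model of the Kubernetes semantics, not the scheduler's code: `TaintTolerationSketch`, `tolerates`, and `canSchedule` are hypothetical names, `tolerationSeconds` is ignored, and only the hard effects `NoSchedule` and `NoExecute` are treated as blocking.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified model of taint/toleration matching. */
final class TaintTolerationSketch {

    static final class Taint {
        final String key;
        final String value;
        final String effect;

        Taint(String key, String value, String effect) {
            this.key = key;
            this.value = value;
            this.effect = effect;
        }
    }

    static final class Toleration {
        final String key;      // empty key with operator Exists matches every key
        final String operator; // "Equal" (default when empty) or "Exists"
        final String value;
        final String effect;   // empty effect matches every effect

        Toleration(String key, String operator, String value, String effect) {
            this.key = key;
            this.operator = operator;
            this.value = value;
            this.effect = effect;
        }
    }

    private static boolean isEmpty(String s) {
        return s == null || s.isEmpty();
    }

    private static String nullToEmpty(String s) {
        return s == null ? "" : s;
    }

    /** Returns true if the single toleration tolerates the single taint. */
    static boolean tolerates(Toleration tol, Taint taint) {
        if (!isEmpty(tol.effect) && !tol.effect.equals(taint.effect)) {
            return false;
        }
        if (!isEmpty(tol.key) && !tol.key.equals(taint.key)) {
            return false;
        }
        String op = isEmpty(tol.operator) ? "Equal" : tol.operator;
        switch (op) {
            case "Exists":
                return true;
            case "Equal":
                return nullToEmpty(tol.value).equals(nullToEmpty(taint.value));
            default:
                return false;
        }
    }

    /** A pod may land on a node only if every hard taint is tolerated. */
    static boolean canSchedule(List<Taint> nodeTaints, List<Toleration> podTolerations) {
        for (Taint taint : nodeTaints) {
            boolean hard = "NoSchedule".equals(taint.effect)
                    || "NoExecute".equals(taint.effect);
            if (!hard) {
                continue; // soft taints such as PreferNoSchedule do not block
            }
            boolean tolerated = false;
            for (Toleration tol : podTolerations) {
                if (tolerates(tol, taint)) {
                    tolerated = true;
                    break;
                }
            }
            if (!tolerated) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Taint> nodeTaints =
                Arrays.asList(new Taint("pool", "flink", "NoSchedule"));
        List<Toleration> podTolerations =
                Arrays.asList(new Toleration("pool", "Equal", "flink", "NoSchedule"));
        System.out.println(canSchedule(nodeTaints, podTolerations));
        System.out.println(canSchedule(nodeTaints, Collections.<Toleration>emptyList()));
    }
}
```

Note that a matching toleration only permits scheduling onto the tainted node; it does not force the pod there, which is why resource pools are usually combined with node selectors or affinity.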
@wangyang0918
Contributor Author

@tisonkun Thanks for your review. I have integrated your comments and force pushed the PR. Please have another look.

@tisonkun tisonkun closed this in 30311ec Apr 13, 2020