Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-20664][k8s] Support setting service account for TaskManager pod. #14447

Closed
wants to merge 4 commits into from
Closed

[FLINK-20664][k8s] Support setting service account for TaskManager pod. #14447

wants to merge 4 commits into from

Conversation

blublinsky
Copy link

What is the purpose of the change

Currently, we only set the service account for JobManager. The TaskManager is using the default service account. Before the KubernetesHAService is introduced, it works because the TaskManager does not need to access the K8s resource(e.g. ConfigMap) directly. But now the TaskManager needs to watch ConfigMap and retrieve leader address. So if the default service account does not have enough permission, users could not specify a valid service account for TaskManager.

Brief change log

  • New configuration parameter TASK_MANAGER_SERVICE_ACCOUNT created
  • Method getServiceAccount added to KubernetesTaskManagerParameters
  • Service account added to the pod creation in InitTaskManagerDecorator
  • InitTaskManagerDecoratorTest is extended to incorporate service account test

Verifying this change

This change added tests and can be verified by running unit tests

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no) no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no) no
  • The serializers: (yes / no / don't know) no
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know) no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't know) yes
  • The S3 file system connector: (yes / no / don't know) no

Documentation

  • Does this pull request introduce a new feature? (yes / no) yes
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented) not documented

@flinkbot
Copy link
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit d07b609 (Mon Dec 21 14:49:33 UTC 2020)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Copy link
Collaborator

flinkbot commented Dec 21, 2020

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

Copy link
Contributor

@wangyang0918 wangyang0918 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blublinsky Thanks for creating this PR. Generally, it looks good to me. I left some comments and let's check it before merging.

Moreover, I have verified this change in minikube with K8s HA configured. It works well.

@@ -60,6 +60,13 @@
.withDescription("Service account that is used by jobmanager within kubernetes cluster. " +
"The job manager uses this service account when requesting taskmanager pods from the API server.");

public static final ConfigOption<String> TASK_MANAGER_SERVICE_ACCOUNT =
key("kubernetes.taskmanager.service-account")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we also need to update the native-kubernetes.md#RBAC section. At least, we need to tell users they could set the service account for TaskManager via kubernetes.taskmanager.service-account. And it is necessary only Kubernetes HA mode,.

@wangyang0918
Copy link
Contributor

The commit message also needs to be refined. I think it could be [FLINK-20664][k8s] Support setting service account for TaskManager pod.

@wangyang0918
Copy link
Contributor

@tillrohrmann is suggesting to introduce a common config option kubernetes.service-account as per default. @blublinsky would you mind to integrate this suggestion?

@blublinsky blublinsky changed the title [FLINK-20664][k8s] Support setting service account for TaskManager pod [FLINK-20664][k8s] Support setting service account for TaskManager pod. Dec 22, 2020
@blublinsky
Copy link
Author

Fixed requested. Not sure about Till's request

@wangyang0918
Copy link
Contributor

@blublinsky I think what till means is to introduce a common config option for service account. It could be kubernetes.service-account. It should only take effect when the jobmanager/taskmanager specific config option is not configured.

Then the KubernetesJobManagerParameters#getServiceAccount() could be like following. Same is for KubernetesTaskManagerParameters.

public String getServiceAccount() {
    return flinkConfig.getOptional(KubernetesConfigOptions.JOB_MANAGER_SERVICE_ACCOUNT)
	.orElse(flinkConfig.getString(KubernetesConfigOptions.SERVICE_ACCOUNT));
}

Moreover, we need a test for such fallback logics.

@blublinsky
Copy link
Author

Even with this approach the sequence for job manager and task manager should be different. otherwise, if a job manager account is defined it will always take precedence for both

@wangyang0918
Copy link
Contributor

@blublinsky Why the sequence for job manager and task manager should be different? In my opinion, the specific config option for jobmanager/taskmanager should always have the precedence.

@blublinsky
Copy link
Author

If I want to define tm and jm accounts differently, then sequence definitely matters.

@xintongsong
Copy link
Contributor

@blublinsky
I think what @tillrohrmann and @wangyang0918 mean is to have a general config option in addition to the jm/tm specific options.

There are three options in total:

  • kubernetes.service-account
  • kubernetes.service-account.jobmanager
  • kubernetes.service-account.taskmanager

JM uses the account defined by kubernetes.service-account.jobmanager, and if that is not defined fallback to kubernetes.service-account.
TM uses the account defined by kubernetes.service-account.taskmanager, and if that is not defined fallback to kubernetes.service-account.

In this way, a user can choose to either define a common account for JM/TM via kubernetes.service-account, or separate accounts via kubernetes.service-account.jobmanager and kubernetes.service-account.taskmanager.

@xintongsong
Copy link
Contributor

You may take a look at how the following config options work as an example.

  • env.java.opts
  • env.java.opts.jobmanager
  • env.java.opts.taskmanager
  • env.java.opts.historyserver
  • env.java.opts.client

@blublinsky
Copy link
Author

Oh, thats not what the code snipet reads. Thats fine

@blublinsky
Copy link
Author

Introduced a common config option for service account and added unit tests

Copy link
Contributor

@xintongsong xintongsong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @blublinsky for preparing and updating the PR, and thank @wangyang0918 for the review.
Then changes LGTM. I have only a few minor comments, which I'll address myself during merging.

.stringType()
.defaultValue("default")
.withDescription("Service account that is used by jobmanager and taskmanger within kubernetes cluster. " +
"if specific JOB_MANAGER_SERVICE_ACCOUNT and/or TASK_MANAGER_SERVICE_ACCOUNT are not defined.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be more user friendly to mention the configuration keys rather than the java field names.

And I believe we should also mention in descriptions of JOB_MANAGER_SERVICE_ACCOUNT and TASK_MANAGER_SERVICE_ACCOUNT that they will by default fallback to KUBERNETES_SERVICE_ACCOUNT.

import static org.junit.Assert.assertThat;

/**
* General tests for the {@link InitJobManagerDecorator}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent JavaDoc.

import static org.junit.Assert.assertEquals;

/**
* General tests for the {@link InitJobManagerDecorator}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent JavaDoc.

private static final String JOB_MANGER_SERVICE_ACCOUNT_NAME = "jm-service-test";

private Pod resultPod;
private Container resultMainContainer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused

private static final String TASK_MANAGER_SERVICE_ACCOUNT_NAME = "tm-service-test";

private Pod resultPod;
private Container resultMainContainer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused


@Test
public void testPodServiceAccountName() {
assertThat(TASK_MANAGER_SERVICE_ACCOUNT_NAME, is(this.resultPod.getSpec().getServiceAccountName()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertThat(TASK_MANAGER_SERVICE_ACCOUNT_NAME, is(this.resultPod.getSpec().getServiceAccountName()));
assertThat(this.resultPod.getSpec().getServiceAccountName(), is(TASK_MANAGER_SERVICE_ACCOUNT_NAME));

The util method is defined as follows. Out-of-ordered arguments could lead to improper error messages.

public static <T> void assertThat(T actual, Matcher<? super T> matcher)


@Test
public void testPodServiceAccountName() {
assertEquals(JOB_MANGER_SERVICE_ACCOUNT_NAME, this.resultPod.getSpec().getServiceAccountName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertThat(actual, is(expected)) is preferred.

Copy link
Contributor

@wangyang0918 wangyang0918 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

key("kubernetes.service-account")
.stringType()
.defaultValue("default")
.withDescription("Service account that is used by jobmanager and taskmanger within kubernetes cluster. " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo taskmanger -> taskmanager

.defaultValue("default")
.withDescription("Service account that is used by jobmanager and taskmanger within kubernetes cluster. " +
"Notice that this can be overwritten by config options '" + JOB_MANAGER_SERVICE_ACCOUNT.key() +
"' and '" + TASK_MANAGER_SERVICE_ACCOUNT.key() + "' for jobmanager and taskmanager respectively.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"' and '" + TASK_MANAGER_SERVICE_ACCOUNT.key() + "' for jobmanager and taskmanager respectively.");
"' and '" + TASK_MANAGER_SERVICE_ACCOUNT.key() + "' for jobmanager and taskmanager respectively.");

A useless space here.

@@ -187,6 +189,11 @@ public void testPodAnnotations() {
assertThat(resultAnnotations, is(equalTo(ANNOTATIONS)));
}

@Test
public void testPodServiceAccountName() {
assertThat(SERVICE_ACCOUNT_NAME, is(this.resultPod.getSpec().getServiceAccountName()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertThat(SERVICE_ACCOUNT_NAME, is(this.resultPod.getSpec().getServiceAccountName()));
assertThat(this.resultPod.getSpec().getServiceAccountName(), is(SERVICE_ACCOUNT_NAME));

@blublinsky
Copy link
Author

Thanks guys, can we do the same with other PRs?

@xintongsong
Copy link
Contributor

@blublinsky
Which other PRs do you mean?

@blublinsky
Copy link
Author

@wangyang0918
Copy link
Contributor

@blublinsky For FLINK-20359, I will have a look in this week. And the other two tickets, let's do the pod template first.

@xintongsong
Copy link
Contributor

xintongsong commented Dec 25, 2020

FLINK-20359,
FLINK-20324,
FLINK-15649

@blublinsky

I'm sure we can continue with the discussions and code reviews for these issues, in the corresponding JIRA tickets and pull requests.

I also noticed that, for the three PRs, non of the corresponding JIRA tickets are assigned. I'd like to kindly remind that, the Apache Flink community suggests to reaching consensus on the approach and getting assigned to the ticket by a committer before start working on the implementation. Please take a close look at How to Contribute Code.

meijies pushed a commit to meijies/flink that referenced this pull request Dec 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants