[FLINK-9955] Add Kubernetes ClusterDescriptor to support deploying session cluster. #9973

Closed
wants to merge 2 commits into from

Contributor

@wangyang0918 wangyang0918 commented Oct 23, 2019

What is the purpose of the change

This PR is part of FLINK-9953. It adds KubernetesClusterDescriptor to support deploying a session cluster.

This PR is based on #9957, #9965, and #9986.

Brief change log

  • Add Kubernetes ClusterDescriptor to support deploying session cluster

Verifying this change

This change adds related unit tests.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@wangyang0918 wangyang0918 changed the title [FLINK-9955] Add Kubernetes ClusterDescriptor to support deploy session cluster. [FLINK-9955] Add Kubernetes ClusterDescriptor to support deploying session cluster. Oct 23, 2019
@flinkbot
Collaborator

flinkbot commented Oct 23, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit bce759c (Wed Dec 04 15:57:18 UTC 2019)

Warnings:

  • 1 pom.xml files were touched: Check for build and licensing issues.
  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Oct 23, 2019

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build

Contributor

@KarmaGYZ KarmaGYZ left a comment

Other than the definition of logException, it generally LGTM. +1 for merging.


private static final Logger LOG = LoggerFactory.getLogger(KubernetesClusterDescriptor.class);

private static final String CLUSTER_DESCRIPTION = "Kubernetes cluster";
Member

As a follow-up, we could give a detailed description as on YARN. Just a thought, not a requirement for this patch :-) It would be better to track it as a ticket on JIRA.

Contributor Author

Thanks for your suggestion. I have created a ticket FLINK-14986 to track this.

@wangyang0918 wangyang0918 force-pushed the FLINK-9955 branch 3 times, most recently from bce759c to 63bc2b0 Compare December 5, 2019 06:16
@wangyang0918
Contributor Author

@tisonkun I have rebased onto master. Could you please take another look?

flinkConfig.setString(KubernetesConfigOptionsInternal.ENTRY_POINT_CLASS, entryPoint);

// Rpc(6123), blob(6124), rest(8081) taskManagerRpc(6122) port need to be exposed, so update them to fixed port.
flinkConfig.setString(BlobServerOptions.PORT, String.valueOf(Constants.BLOB_SERVER_PORT));
Member

So user settings for these ports have no effect? Is that a hard constraint?

Contributor Author

Nice catch.
A fixed port is required because we need to expose the port in Kubernetes. I will update the PR so that it respects the user-specified port values. If BlobServerOptions.PORT and TaskManagerOptions.RPC_PORT are not set, the default values will be used.

@wangyang0918 wangyang0918 force-pushed the FLINK-9955 branch 3 times, most recently from 5bfa111 to e8267bc Compare December 7, 2019 03:01
flinkConfig.setString(KubernetesConfigOptionsInternal.ENTRY_POINT_CLASS, entryPoint);

// Rpc(6123), blob(6124), rest(8081) taskManagerRpc(6122) port need to be exposed, so update them to fixed port.
if (Integer.valueOf(flinkConfig.get(BlobServerOptions.PORT)) == 0) {
Member

We could catch the parse exception and throw or log a specific message saying that port ranges are not supported in K8S deployments. Also, when the user configures 0 and we fall back to the default port, we should log a warning so that the user knows the original intent ("choose a random port") isn't respected.

		{
			final int blobServerPort;
			try {
				blobServerPort = Integer.parseInt(flinkConfig.get(BlobServerOptions.PORT));
			} catch (NumberFormatException e) {
				// log...
				throw new ClusterDeploymentException("...");
			}

			if (blobServerPort == 0) {
				flinkConfig.setString(BlobServerOptions.PORT, String.valueOf(Constants.BLOB_SERVER_PORT));
				// log ...
			}
		}

		{
			final int taskManagerRpcPort;
			try {
				// Read the TaskManager RPC port here, not the blob server port.
				taskManagerRpcPort = Integer.parseInt(flinkConfig.get(TaskManagerOptions.RPC_PORT));
			} catch (NumberFormatException e) {
				// log...
				throw new ClusterDeploymentException("...");
			}

			if (taskManagerRpcPort == 0) {
				flinkConfig.setString(TaskManagerOptions.RPC_PORT, String.valueOf(Constants.TASK_MANAGER_RPC_PORT));
				// log ...
			}
		}

Contributor Author

I will add a method parsePort in KubernetesUtils so that it can be reused by FlinkMasterDeploymentDecorator and TaskManagerPodDecorator.
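
For illustration only, here is a minimal, self-contained sketch of what such a shared helper might look like. It is not the code merged in this PR. A plain Map stands in for Flink's Configuration, and the names PortSketch, parsePort and resolveFixedPort are hypothetical. The fallback ports 6124 and 6122 are the ones named in the code comment quoted above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map<String, String> stands in for Flink's Configuration.
final class PortSketch {

    // Assumed stand-ins for Constants.BLOB_SERVER_PORT and Constants.TASK_MANAGER_RPC_PORT.
    static final int DEFAULT_BLOB_SERVER_PORT = 6124;
    static final int DEFAULT_TASK_MANAGER_RPC_PORT = 6122;

    // Parses a single port value; ranges ("50100-50200") and lists are rejected
    // because a fixed port has to be exposed in Kubernetes.
    static int parsePort(Map<String, String> config, String key) {
        final String value = config.getOrDefault(key, "0").trim();
        final int port;
        try {
            port = Integer.parseInt(value);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                key + " must be a single port, port ranges are not supported: " + value, e);
        }
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException(key + " is out of range: " + port);
        }
        return port;
    }

    // Falls back to a fixed port when the user left the option at 0 ("pick a random port"),
    // and warns so that the user knows the original intent is not respected.
    static int resolveFixedPort(Map<String, String> config, String key, int fallback) {
        final int port = parsePort(config, key);
        if (port != 0) {
            return port;
        }
        System.err.println("WARN: " + key + " is 0, using fixed port " + fallback + " instead.");
        config.put(key, String.valueOf(fallback));
        return fallback;
    }

    public static void main(String[] args) {
        final Map<String, String> config = new HashMap<>();
        config.put("taskmanager.rpc.port", "50100");
        // blob.server.port is unset, so the fixed fallback is used.
        System.out.println(resolveFixedPort(config, "blob.server.port", DEFAULT_BLOB_SERVER_PORT));
        // taskmanager.rpc.port was set by the user, so it is respected.
        System.out.println(resolveFixedPort(config, "taskmanager.rpc.port", DEFAULT_TASK_MANAGER_RPC_PORT));
    }
}
```

Keeping the parsing in one place means both decorators would reject port ranges with the same message, and the warning for a configured 0 is emitted consistently.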

…ager pod respect user defined config

BlobServerOptions.PORT and TaskManagerOptions.RPC_PORT need to be respected.
Member

@tisonkun tisonkun left a comment

LGTM. Merging...

@tisonkun tisonkun closed this in ef5314a Dec 8, 2019