[FLINK-15816][k8s] Prevent labels using kubernetes.cluster-id to exceed the limit of 63 characters #17819

alpreu · 2021-11-17T10:29:51Z

What is the purpose of the change

Kubernetes labels have a maximum length of 63 characters. We generate a variety of labels by applying suffixes and prefixes to represent certain components. Previously they have often exceeded the maximum length because the label generation happened by just concatenating strings together. This PR aims to provide a safe mechanism for the label creation that also keeps the documentation up to date.

Brief change log

Introduced KubernetesLabel enum with helper methods for label generation
Updated docs to always include the current maximum length

Verifying this change

This change is already covered by existing tests, such as (please describe tests).

AbstractKubernetesParametersTest#testClusterIdLengthLimitation

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
The serializers: (no)
The runtime per-record code paths (performance sensitive): (no)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes - kubernetes)
The S3 file system connector: (no)

Documentation

Does this pull request introduce a new feature? (no)
If yes, how is the feature documented? (docs + JavaDocs)

flinkbot · 2021-11-17T10:33:28Z

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 7f0824c (Wed Nov 17 10:33:28 UTC 2021)

Warnings:

No documentation files were touched! Remember to keep the Flink docs up to date!

_{Mention the bot in a comment to re-run the automated checks.}

Review Progress

❓ 1. The [description] looks good.
❓ 2. There is [consensus] that the contribution should go into to Flink.
❓ 3. Needs [attention] from.
❓ 4. The change fits into the overall [architecture].
❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
@flinkbot approve all to approve all aspects
@flinkbot approve-until architecture to approve everything until architecture
@flinkbot attention @username1 [@username2 ..] to require somebody's attention
@flinkbot disapprove architecture to remove an approval you gave earlier

flinkbot · 2021-11-17T10:33:42Z

CI report:

2a61cd8 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure re-run the last Azure build

XComp

Thanks for the contribution, @alpreu. 👍 There's only once concern, I want to share (related to the comment @tillrohrmann shared in the corresponding ticket): Shouldn't we make this threshold dynamic? It's hard to grasp why we set it to 40.

Instead, we could replace AbstractKubernetesParameters.getClusterId() by something like AbstractKubernetesParameters.generateBasedOnClusterId(String) and do the check internally. That way, we could fail if the generated String is longer than 63 characters. The clusterId can be hidden in AbstractKubernetesParameters as a private field (of course, we're not preventing cases where somebody access the configuration directly. But that's not covered in the current code, either.

The flaw of that approach would be that we wouldn't be able to set a specific value in the cluster-id parameter description. A way to fix that could be to move the entire label generation into AbstractKubernetesParameters. That way, the class would have knowledge about all available suffixes and could derive the maximum length of cluster ID from them. This would lower the risk of becoming inconsistent again when introducing additional k8s suffixes. WDYT?
@wangyang0918 is the case of adding new k8s prefixes unlikely?

XComp · 2021-12-07T15:20:15Z

...ernetes/src/main/java/org/apache/flink/kubernetes/configuration/KubernetesConfigOptions.java

@@ -220,7 +220,7 @@
                    .withDescription(
                            Description.builder()
                                    .text(
-                                            "The cluster-id, which should be no more than 45 characters, is used for identifying a unique Flink cluster. "
+                                            "The cluster-id, which should be no more than 40 characters, is used for identifying a unique Flink cluster. "


Is there anything preventing us from using Constants.MAXIMUM_CHARACTERS_OF_CLUSTER_ID here? 🤔

@alpreu I'm just puzzled. Why did we change the limit to 40 initially? The current version of the PR sets the limit to 45 again? I double-checked the code that we didn't miss any other label and couldn't find anything else. But it looks like I'm missing something. I remember that the initial change was valid in terms of the number but we just wanted to make it dynamic. Do you see what I'm missing here, @alpreu ? 🤔

Yes, it changed back to the original value. The reason is that the 40 was derived from the "-resourcemanager-leader" in KubernetesHaServices. In that class only configmap names are generated, for which the 63 character limit does not apply. I don't know if it makes sense to move this generation also into KubernetesLabel. Especially because the JobManager one does contain the JobID which would restrict the clusterId length extremely.

wangyang0918 · 2021-12-11T15:12:26Z

IIRC, not only the labels, but also pod names, services names, have the most 63 characters limitation. So we might need to have generateBasedOnClusterId for all the specific cases.

is the case of adding new k8s prefixes unlikely?

Yes, we might introduce new or change some labels, names in the future, which will make the length pre-check for cluster-id not work again. So I agree with you that moving all the generation of labels, pod names, service
names to AbstractKubernetesParameters is a right direction. But it might not be very easy.

XComp · 2021-12-13T11:23:50Z

So I agree with you that moving all the generation of labels, pod names, service
names to AbstractKubernetesParameters is a right direction. But it might not be very easy.

Why is it not that easy? As far as I can see, AbstractKubernetesParameters.getClusterId() is always called in the context of a static method that adds some prefix to it. Moving these static methods into AbstractKubernetesParameters shouldn't be too complicated. Or am I missing something here, @wangyang0918 ?

wangyang0918 · 2021-12-15T10:58:48Z

@XComp Hmm. You are right. It is not too complicated and we just need to take some time to find out the K8s resource names generation(e.g. ExternalServiceDecorator.getExternalServiceName()) and move them to the AbstractKubernetesParameters.

XComp · 2021-12-16T21:56:42Z

@alpreu would that be something you could do in this PR?

alpreu · 2021-12-17T07:55:55Z

Sure, I can do this, just have to deprioritize it for some SDK stuff for now

alpreu · 2021-12-20T13:51:26Z

I was looking into this again and came up with the idea of using an Enum to store the prefixes and suffixes. The cluster-id parameter description could then find the longest string value by iterating over the enum values.
However, this breaks in case a label would contain both a suffix and prefix. Currently it would not be a problem but I'm not sure if it makes sense to do it this way then. What do you think?

XComp · 2021-12-21T11:20:02Z

What's the benefit of having an enum? Initially, I thought of just implementing factory methods in AbstractKubernetesParameters which generate a String based on the internally stored cluster ID. I'm wondering whether there is some reason where we would benefit from adding an enum here.

alpreu · 2021-12-21T12:02:32Z

What's the benefit of having an enum? Initially, I thought of just implementing factory methods in AbstractKubernetesParameters which generate a String based on the internally stored cluster ID. I'm wondering whether there is some reason where we would benefit from adding an enum here.

But then once again you have to touch the maximum length every time a parameter is added in the future, that was what I wanted to avoid

XComp · 2021-12-21T13:21:39Z

not if you iterate over the factory methods to determine the maximum length. But yeah, you are right. You have to have some collection (in your case an enum) to be able to iterate over the generated labels to determine the maximum length. In this sense, an enum would be a viable solution. 👍

wangyang0918 · 2021-12-22T03:43:01Z

It is a good idea to introduce such a enum. And I do not think we will have the prefix and suffix for the K8s resource names or labels. +1

alpreu · 2021-12-22T14:09:01Z

I just pushed a work in progress commit that illustrates the idea.
One thing I really don't like is that getClusterId() cannot be hidden/private because:

it needs the reference to the flink Configuration object
it is used to retrieve the (unmodified) clusterId in a lot of places

I tried to change that but almost all of the classes extending AbstractKubernetesStepDecorator provide some static method to retrieve the label (e.g. getFlinkConfigMapName that takes the clusterId as string input because they are also called with the string input provided by the CLI.
TLDR: I'm not really happy with the solution right now and the problem is more tricky than it looks. Do you guys have any ideas how we can clean this up?

XComp

Thanks @alpreu . The changes look good. I have some minor suggestions that might improve the code a bit more.

flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/KubernetesLabelAffix.java

...ain/java/org/apache/flink/kubernetes/kubeclient/parameters/AbstractKubernetesParameters.java

alpreu · 2022-01-07T08:20:12Z

I updated the PR, PTAL :)

XComp

Thanks @alpreu. I guess we're almost there. Good work. There are still a few minor things that need to be fixed. Additionally, I added some nitpicking comments which might be worth considering as well.

flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/KubernetesLabel.java

flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/Constants.java

flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/KubernetesLabel.java

.../src/main/java/org/apache/flink/kubernetes/kubeclient/decorators/KerberosMountDecorator.java

...ernetes/src/main/java/org/apache/flink/kubernetes/configuration/KubernetesConfigOptions.java

…ed the limit of 63 characters

XComp · 2022-01-07T14:05:19Z

@flinkbot run azure

XComp · 2022-01-10T09:46:56Z

@flinkbot run azure

XComp

Ok, looks good from my end now. Thanks for the changes.

@wangyang0918 can you have a second pass over it to verify that we're incorporating the correct Strings as KubernetesLabel which fall under the restriction of having a maximum length of 63 characters?

I'm gonna merge the PR after the 👍 from @wangyang0918 and AzureCI

wangyang0918 · 2022-01-10T11:41:48Z

Really thanks @alpreu for working on this PR and @XComp helping with review.

I believe it is the right direction to use the KubernetesLabel for generating and validating labels and names. I have one more concern about this PR. IIRC, only the labels, pod names, service names have the limitation of 63 characters. The ConfigMap name is not one of them.

Another thing is about the pod name. The names of JobManager pod(e.g. <clusterId>-fd5c8ffc4-c6dfw) and TaskManager pod(e.g. <clusterId>-taskmanager-<attemptId>-<index>) are generated dynamically. Do we also need to add the limitation check for them?

alpreu · 2022-01-10T13:18:45Z

I believe it is the right direction to use the KubernetesLabel for generating and validating labels and names. I have one more concern about this PR. IIRC, only the labels, pod names, service names have the limitation of 63 characters. The ConfigMap name is not one of them.

You are right about the configmap-affixes, we could remove them from the Enum. I think the question is if we want to have the generation in one place or not.

Another thing is about the pod name. The names of JobManager pod(e.g. <clusterId>-fd5c8ffc4-c6dfw) and TaskManager pod(e.g. <clusterId>-taskmanager-<attemptId>-<index>) are generated dynamically. Do we also need to add the limitation check for them?

As far as I could see both of those only relate to configmaps or podnames, which should support a maximum of 253 characters. So if we discount the -taskmanger- (13 chars) and the current maximum for the clusterId (40chars) we still have 200 characters left so it should not be an issue. What do you think?

wangyang0918 · 2022-01-11T03:59:37Z

I think the question is if we want to have the generation in one place or not.

Having the generation in one place make more senses to me if we could get this done.

As far as I could see both of those only relate to configmaps or podnames, which should support a maximum of 253 characters. So if we discount the -taskmanger- (13 chars) and the current maximum for the clusterId (40chars) we still have 200 characters left so it should not be an issue. What do you think?

Yes, my bad. The pod names also do not have such limitation of 63 characters, but 253 characters. We do not need to add the check for them.

alpreu · 2022-01-12T09:10:36Z

Hi @wangyang0918 could you clarify once more please what code changes would like to see for the PR to be merged? I think it would be great if we can still get this into the 1.15 release

wangyang0918 · 2022-01-12T09:43:30Z

@alpreu In current implementation, the ConfigMap name could not be logger than 63 characters. This is my only concern about this PR. Before this change, I think we do not have such assumption. This also make the max length of cluster-id smaller.

I think we could have the names/labels generation in one place but without always check the 63 characters limit. Right?

wangyang0918 · 2022-01-12T09:49:59Z

Another thing is that maybe we could also add the pre-condition check for KubernetesUtils#getCommonLabels. This is the root cause why we reopen FLINK-15816. Not the ConfigMap name, but the labels exceed the 63 characters limit.

alpreu · 2022-01-12T14:25:39Z

I checked again and I am not sure we are on the right track to the root cause. The enum has these values currently:

    CONFIG_MAP("flink-config-", ""),
    HADOOP_CONF_CONFIG_MAP("hadoop-config-", ""),
    KERBEROS_KEYTAB_SECRET("kerberos-keytab-", ""),
    KERBEROS_KRB5CONF_CONFIG_MAP("kerberos-krb5conf-", ""),
    POD_TEMPLATE_CONFIG_MAP("pod-template-", ""),
    FLINK_REST_SERVICE("", "-rest");

If we remove all the ones with CONFIG_MAP in their name we are left with:

KERBEROS_KEYTAB_SECRET("kerberos-keytab-", ""),
FLINK_REST_SERVICE("", "-rest");

The KERBEROS_KEYTAB_SECRET is used as a a secret name which should allow 253 characters, so we can also remove that. Therefore the only limiting factor is the FLINK_REST_SERVICE which is used in a service name, which only allows 63 characters.
This would mean we only need to limit the clusterId to 58 characters (63 - len("-rest")). But are we really sure we are this is the case? Because then #11708 should have already fixed this.

wangyang0918 · 2022-01-13T03:57:16Z

@alpreu Currently, the 45 characters pre-check could only work for native K8s HA/non-HA mode. But for standalone cluster with K8s HA enabled, we do not have the pre-check and will cause the exception in FLINK-15816 if the cluster-id is longer than 63 characters.

I have verified that cluster id with 45 characters works well.

wangyang-pc:scripts danrtsey.wy$ kubectl get pods
NAME                                                             READY   STATUS    RESTARTS   AGE
xxxxx1234567890123456789012345678901234567890-77895db8dd-mrcnm   1/1     Running   0          22m
xxxxx1234567890123456789012345678901234567890-taskmanager-1-1    1/1     Running   0          21m
wangyang-pc:scripts danrtsey.wy$ kubectl get cm
NAME                                                                                               DATA   AGE
flink-config-xxxxx1234567890123456789012345678901234567890                                         3      22m
kube-root-ca.crt                                                                                   1      22d
xxxxx1234567890123456789012345678901234567890-00000000000000000000000000000000-jobmanager-leader   4      21m
xxxxx1234567890123456789012345678901234567890-dispatcher-leader                                    4      21m
xxxxx1234567890123456789012345678901234567890-resourcemanager-leader                               2      21m
xxxxx1234567890123456789012345678901234567890-restserver-leader                                    2      21m

XComp · 2022-01-21T07:12:33Z

Have we come up with a conclusion on this PR? I don't have the overview over what names are actually used as labels which fall under the 63 characters limit? @wangyang0918 do I understand you correctly that we have to integrate the KubernetesLabel at some other places to make the check also work for the standalone k8s cluster with HA enabled?

wangyang0918 · 2022-01-21T07:42:09Z

do I understand you correctly that we have to integrate the KubernetesLabel at some other places to make the check also work for the standalone k8s cluster with HA enabled?

Exactly.

Maybe I need to make myself clearer.

ConfigMap name does not have the limitation of 63 characters. Then only the KubernetesLables#FLINK_REST_SERVICE makes sense.
IIRC, we do not add prefix/suffix of cluster-id for the labels. So we just need to check whether the name and value of labels meet the 63 characters.

WDYT?

XComp · 2022-01-24T18:23:39Z

Ok, then I misinterpreted the ticket. Essentially, the issue is not that we have to guess the right value for this configuration parameter but that the validation never happens for standalone clusters with HA enabled. But I don't understand you claim that it make sense to have the REST suffix included in KubernetesLabel. I cannot even find a place where we use the clusterId + -rest as a label key or value.

Why do we have such an odd value (i.e. 45) for MAXIMUM_CHARACTERS_OF_CLUSTER_ID in the first place. Shouldn't it be 63 then? As a consequence, that would mean that we only have to have a single accessor for KubernetesConfigOptions.CLUSTER_ID that checks whether the specified clusterId is 63 characters or less, wouldn't it?

Are there references to where the labels (i.e. the resources limited to 63 characters) are set?

wangyang0918 · 2022-01-25T02:43:14Z

@XComp

But I don't understand you claim that it make sense to have the REST suffix included in KubernetesLabel. I cannot even find a place where we use the clusterId + -rest as a label key or value.

clusterId + -rest is not a label, but the service name. Currently, I know the label key/value and service name both have the 63 characters limitation.

Why do we have such an odd value (i.e. 45) for MAXIMUM_CHARACTERS_OF_CLUSTER_ID in the first place. Shouldn't it be 63 then? As a consequence, that would mean that we only have to have a single accessor for KubernetesConfigOptions.CLUSTER_ID that checks whether the specified clusterId is 63 characters or less, wouldn't it?

Just like the rest service name(clusterId + -rest), we still have other resources need to check. Right?

The reason why set the MAXIMUM_CHARACTERS_OF_CLUSTER_ID to 45, not 58(63 - 5-rest) is trying to leave some potential spaces for the future introduced service/labels.

Are there references to where the labels (i.e. the resources limited to 63 characters) are set?

https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set

XComp · 2022-01-26T08:42:48Z

Ok, my understanding was that we're using service names in labels which forces us to set some limitation on the cluster ID. Reiterating over the comments of FLINK-15816, I conclude that it's really only about adding the check for the clusterIds maximum length also for standalone mode. Essentially, the only change that's needed is to add a check similar to AbstractKubernetesParameters:73 in KubernetesHaServices:99.

I'm just wondering why we should limit ourselves with 45 characters. Don't we have to restart the Flink cluster anyway in case of a version update? That would give the user the possibility to update the cluster ID if it becomes necessary.

In any case, we should add some in-code documentation about this to the MAXIMUM_CHARACTERS_OF_CLUSTER_ID declaration in Constants:86 to explain the reason for this constant being set in that way (i.e. that it's left at 45 to be prepared for future.

Sorry for guiding you into the wrong direction, @alpreu .

rmetzger added the component=Deployment/Kubernetes label Nov 17, 2021

XComp reviewed Dec 7, 2021

View reviewed changes

XComp requested changes Jan 3, 2022

View reviewed changes

alpreu force-pushed the FLINK-15816 branch 2 times, most recently from 317d5e9 to fb128de Compare January 7, 2022 08:19

XComp requested changes Jan 7, 2022

View reviewed changes

[FLINK-15816][k8s] Prevent labels using kubernetes.cluster-id to exce…

2a61cd8

…ed the limit of 63 characters

alpreu force-pushed the FLINK-15816 branch from fb128de to 2a61cd8 Compare January 7, 2022 09:42

alpreu changed the title ~~[FLINK-15816][k8s] Limit kubernetes.cluster-id to a maximum of 40 characters~~ [FLINK-15816][k8s] Prevent labels using kubernetes.cluster-id to exceed the limit of 63 characters Jan 7, 2022

alpreu requested a review from XComp January 7, 2022 13:20

XComp approved these changes Jan 10, 2022

View reviewed changes

lizhimins approved these changes Mar 2, 2023

View reviewed changes

[FLINK-15816][k8s] Prevent labels using kubernetes.cluster-id to exceed the limit of 63 characters #17819

Are you sure you want to change the base?

[FLINK-15816][k8s] Prevent labels using kubernetes.cluster-id to exceed the limit of 63 characters #17819

Conversation

alpreu commented Nov 17, 2021 • edited

What is the purpose of the change

Brief change log

Verifying this change

Does this pull request potentially affect one of the following parts:

Documentation

flinkbot commented Nov 17, 2021

Automated Checks

Review Progress

flinkbot commented Nov 17, 2021 • edited

CI report:

XComp left a comment • edited

Choose a reason for hiding this comment

XComp Dec 7, 2021

Choose a reason for hiding this comment

XComp Jan 7, 2022

Choose a reason for hiding this comment

alpreu Jan 7, 2022

Choose a reason for hiding this comment

wangyang0918 commented Dec 11, 2021

XComp commented Dec 13, 2021

wangyang0918 commented Dec 15, 2021

XComp commented Dec 16, 2021

alpreu commented Dec 17, 2021

alpreu commented Dec 20, 2021

XComp commented Dec 21, 2021

alpreu commented Dec 21, 2021

XComp commented Dec 21, 2021

wangyang0918 commented Dec 22, 2021

alpreu commented Dec 22, 2021

XComp left a comment

Choose a reason for hiding this comment

alpreu commented Jan 7, 2022

XComp left a comment

Choose a reason for hiding this comment

XComp commented Jan 7, 2022

XComp commented Jan 10, 2022

XComp left a comment

Choose a reason for hiding this comment

wangyang0918 commented Jan 10, 2022

alpreu commented Jan 10, 2022

wangyang0918 commented Jan 11, 2022

alpreu commented Jan 12, 2022

wangyang0918 commented Jan 12, 2022

wangyang0918 commented Jan 12, 2022

alpreu commented Jan 12, 2022 • edited

wangyang0918 commented Jan 13, 2022

XComp commented Jan 21, 2022

wangyang0918 commented Jan 21, 2022

XComp commented Jan 24, 2022 • edited

wangyang0918 commented Jan 25, 2022

XComp commented Jan 26, 2022 • edited

alpreu commented Nov 17, 2021 •

edited

flinkbot commented Nov 17, 2021 •

edited

XComp left a comment •

edited

alpreu commented Jan 12, 2022 •

edited

XComp commented Jan 24, 2022 •

edited

XComp commented Jan 26, 2022 •

edited