[FLINK-20944][k8s] Do not resolve the rest endpoint address when the service exposed type is ClusterIP #14692
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
Automated Checks
Last check on commit 48ef644 (Tue Jan 19 07:33:12 UTC 2021)
Warnings:
Mention the bot in a comment to re-run the automated checks.
Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands
The @flinkbot bot supports the following commands:
Thanks for creating this PR @wangyang0918. I had a question concerning the change. How does the client communicate with the cluster if ClusterIP is chosen? I think you mentioned that we are using a namespaced service for it. Can't the RestClusterClient be directly initialized with this service's address instead of trying to resolve the web monitor address?
private String getWebMonitorAddress(Configuration configuration) throws Exception {
    HighAvailabilityServicesUtils.AddressResolution resolution =
            HighAvailabilityServicesUtils.AddressResolution.TRY_ADDRESS_RESOLUTION;
    if (configuration.get(KubernetesConfigOptions.REST_SERVICE_EXPOSED_TYPE)
            == KubernetesConfigOptions.ServiceExposedType.ClusterIP) {
        resolution = HighAvailabilityServicesUtils.AddressResolution.NO_ADDRESS_RESOLUTION;
    }
    return HighAvailabilityServicesUtils.getWebMonitorAddress(configuration, resolution);
}
Instead of suppressing the web monitor address resolution, can't we give the RestClusterClient the address of the namespaced service? I think you mentioned in the ticket that we will communicate with the cluster through this service if we have chosen ClusterIP. It just feels wrong that we still retrieve the web monitor's address even though we don't need it.
The Flink client communicates with the cluster via the namespaced service if ClusterIP is chosen.
I assume you mean directly returning a RestClusterClient that uses the namespaced service (i.e. restEndpoint.get().getAddress()). In that case, we would still need to check whether SSL is enabled and add the http/https protocol, which is exactly what HighAvailabilityServicesUtils.getWebMonitorAddress already does.
Moreover, I do not think we are retrieving the web monitor's address. It is more like constructing the address in a specific schema (i.e. protocol://address:port). The retrieval has already been done in flinkKubeClient.getRestEndpoint.
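A minimal sketch of the address construction described in this comment, assuming (as the comment says) that the helper's job is to pick http or https depending on whether SSL is enabled and to format protocol://address:port. The optional DNS lookup mirrors the TRY_ADDRESS_RESOLUTION / NO_ADDRESS_RESOLUTION switch in the reviewed snippet. The class, enum, and method names below are hypothetical stand-ins, not Flink's API, and the assumption that "resolution" means a lookup via InetAddress.getByName is made for illustration only.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class WebMonitorAddressSketch {

    // Hypothetical stand-in for the two resolution modes discussed in this PR.
    enum AddressResolution { TRY_ADDRESS_RESOLUTION, NO_ADDRESS_RESOLUTION }

    // Hypothetical helper: builds protocol://address:port and, if requested,
    // fails fast when the host name cannot be resolved from where the client runs.
    static String webMonitorAddress(
            String address, int port, boolean sslEnabled, AddressResolution resolution)
            throws UnknownHostException {
        if (resolution == AddressResolution.TRY_ADDRESS_RESOLUTION) {
            // DNS lookup; throws UnknownHostException for names that are only
            // resolvable inside the cluster, e.g. a namespaced K8s service name.
            InetAddress.getByName(address);
        }
        String protocol = sslEnabled ? "https" : "http";
        return String.format("%s://%s:%d", protocol, address, port);
    }

    public static void main(String[] args) throws Exception {
        // ClusterIP case: resolution is skipped, so no lookup is attempted.
        System.out.println(webMonitorAddress(
                "my-flink-cluster-rest.my-namespace", 8081, false,
                AddressResolution.NO_ADDRESS_RESOLUTION));
    }
}
```

With NO_ADDRESS_RESOLUTION the (hypothetical) namespaced name is passed through unchanged, so deployment no longer fails outside the cluster; whether the resulting URL is reachable from the client becomes a separate question, which is the trade-off debated in this thread.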
In the K8s case, where exactly do we retrieve the address of the service from? If I understood you correctly, then RestOptions.ADDRESS contains an address that is not resolvable from the outside. Hence, I am wondering why we should try to construct the web monitor address from this configuration at all.
The Flink application submission could happen inside or outside the K8s cluster. The reason why we set RestOptions.ADDRESS to the namespaced service is that it can be used directly inside the K8s cluster. However, when the submission happens outside the K8s cluster, the namespaced service cannot be used to contact the cluster. In such a situation, users usually need to create an ingress for the communication.
Assuming the submission happens outside the K8s cluster with ClusterIP configured, does this mean that the RestClusterClient cannot talk to the cluster? If yes, then there is not really a need to create it, is there? But then there is the problem that we could also deploy the cluster from within the K8s cluster, where the RestClusterClient can talk to the Flink cluster, right?
Should we maybe say that users have to submit the cluster in detached mode if they cannot connect to it? The problem with the current solution is that we might start an attached per-job cluster with which we cannot talk. Hence, we cannot request the final job result either and, consequently, the cluster will never shut down. So maybe we should fail with a better exception message. What do you think?
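The fail-fast idea suggested here could look roughly like the following sketch: if the client is expected to stay attached (for example, to fetch the final job result) but the rest endpoint cannot be resolved from where the client runs, fail immediately with an explanatory message instead of deploying a cluster the client can never talk to. The attached flag, the method name, and the message text are all hypothetical, and "reachable" is approximated by a DNS lookup via InetAddress.getByName; this is not what the PR implements.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class AttachedDeploymentCheck {

    // Hypothetical pre-deployment check; not part of Flink.
    static void checkClientCanReachCluster(String restAddress, boolean attached) {
        if (!attached) {
            // Detached submissions do not need to talk to the cluster afterwards.
            return;
        }
        try {
            // Approximates reachability by resolving the host name.
            InetAddress.getByName(restAddress);
        } catch (UnknownHostException e) {
            throw new IllegalStateException(
                    "Cannot resolve the rest endpoint address '" + restAddress
                            + "' from this client. Deploy in detached mode or expose the"
                            + " rest service so that it is reachable from the client.",
                    e);
        }
    }

    public static void main(String[] args) {
        // Detached: no lookup, so a cluster-internal name does not fail.
        checkClientCanReachCluster("my-flink-cluster-rest.my-namespace", false);
        // Attached with an IP literal, which needs no DNS lookup.
        checkClientCanReachCluster("127.0.0.1", true);
        System.out.println("checks passed");
    }
}
```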
Updated the branch from f65e390 to 5f74749.
After a little more consideration, maybe we never need to resolve the rest endpoint address. I have updated the PR accordingly.
The idea of the address resolution was to check that the client can actually talk to the cluster and to fail early if this is not the case. Hence, I am not sure whether we should change this behaviour, because it can lead to the following problem: Let's assume we deploy a per-job cluster with
@tillrohrmann In a lot of cases, organizations are reluctant to open up NodePorts for security reasons and instead expose a secured ingress through which all external systems talk to the cluster. Would it be possible to document specifically that, if ClusterIP is chosen, the cluster admin should connect the service to their own ingress?
@tillrohrmann I had the same concern before implementing this PR. But now I think this change makes sense, since the Flink client will not retrieve the result in application mode. BTW, per-job mode is not supported by the native K8s integration, and we do not plan to support it. The benefit of not resolving the rest endpoint address is that we could deploy the Flink session/application cluster with The limitation is that
You are right about per-job mode and K8s, @wangyang0918. Hence, this was a bad example ;-) But the example of canceling the job/application is actually a good one. If we know that we cannot talk to the cluster, then we probably shouldn't create a For what do we actually need the
In both detached and attached application mode, the submission process does not need to create a The problem is that the Flink client does not know whether it is running inside or outside the K8s cluster. So it is hard to not create the
You mean when not using I think it is ok that Flink does not know where it runs (outside or inside the K8s cluster). What should matter is whether we have to be able to talk to the cluster or not. If we have to talk to the cluster, then we have to be able to create a Would it make sense to say that we don't create a Another question, why does the
No, I mean adding a log to remind users that the rest endpoint can only be used inside the K8s cluster when using
I agree with you that the Flink client should fail if we cannot connect to the cluster (e.g. when the address cannot be resolved).
Printing the web interface URL helps users quickly access the dashboard. I think we have a similar log in the Yarn deployment.
All in all, do you think the following suggestion makes sense?
Or do you insist on not creating the
I don't think that this would work without other changes, because when you deploy an attached Yarn per-job cluster, the client will be used to query the job result, which requires access to the cluster. What one maybe needs to do is to separate Don't get me wrong here. I think your solution works as a quick fix and maybe that is what we should do. But I think that this problem shows that there is more of a conceptual problem with the overall design, because we need special-case logic for
Given that this is a bigger change, I would be in favour of applying your initial solution: Do not resolve address if
Hmm. Maybe I did not make myself clear. The above solution is only for Kubernetes. For Yarn, we indeed have this issue with per-job clusters. Just like you said, at a high level, I think we do not have a very good abstraction for all I will update the PR to the initial solution.
Updated the branch from 08992db to df11e96.
cc @tillrohrmann I have updated this PR. After this change, we can start a native K8s Flink application/session cluster from inside or outside the K8s cluster when using
Thanks for updating this PR @wangyang0918. I have one last comment that would be great to resolve before we merge this PR.
"Please note that Flink client operation(e.g. cancel, list, stop,"
        + " savepoint, etc.) won't work from outside the Kubernetes cluster.");
Let's add that this is due to having chosen ClusterIP for the KubernetesConfigOptions.REST_SERVICE_EXPOSED_TYPE.
This log only shows up when ClusterIP has been chosen, so I did not add this information here. But I also think it is harmless to show more information.
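One way to include the reason the reviewer asked for is to build the warning only when the exposed type is ClusterIP and to name the option in the message itself. This is a hypothetical sketch: the enum, the method, and the exact wording are assumptions and are not the message that was finally merged. Only the option key kubernetes.rest-service.exposed.type and the first part of the message (with its grammar tidied) are taken from this PR.

```java
import java.util.Optional;

public class ClusterIpWarningSketch {

    // Hypothetical mirror of the rest service exposed types.
    enum ServiceExposedType { ClusterIP, NodePort, LoadBalancer }

    // Returns the warning to log, or Optional.empty() if no warning is needed.
    static Optional<String> restEndpointWarning(ServiceExposedType exposedType) {
        if (exposedType != ServiceExposedType.ClusterIP) {
            return Optional.empty();
        }
        return Optional.of(
                "Please note that Flink client operations (e.g. cancel, list, stop,"
                        + " savepoint, etc.) won't work from outside the Kubernetes cluster"
                        + " since 'kubernetes.rest-service.exposed.type' has been set to "
                        + exposedType + ".");
    }

    public static void main(String[] args) {
        restEndpointWarning(ServiceExposedType.ClusterIP).ifPresent(System.out::println);
    }
}
```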
…service exposed type is ClusterIP
Updated the branch from df11e96 to 05f8747.
Addressed the last comments. Now the log looks like the following.
…service exposed type is ClusterIP This closes #14692.
What is the purpose of the change
If kubernetes.rest-service.exposed.type is ClusterIP, then we do not need to resolve the rest endpoint address (i.e. the namespaced service name). Otherwise, we will always get an UnknownHostException when deploying a Flink application from outside the K8s cluster.
Brief change log
Verifying this change
testDeployApplicationClusterWithClusterIP, which should fail with an UnknownHostException before this change and pass after this PR.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)
Documentation