[SPARK-23146][WIP] Support client mode for Kubernetes in Out-Cluster mode #20451
Conversation
/cc
I am just getting started with k8s, so apologies for a basic question: can you explain how you have in-cluster in client mode? Sorry, I didn't see it described in the linked doc (the original SPIP doc says "The driver may run outside the cluster (client mode) or within (cluster mode)"). But feel free to point me at some other docs I should look at.
private[spark] object OptionRequirements {

  def requireBothOrNeitherDefined(
Not used; and if it were, it would perhaps be clearer as:
(opt1, opt2) match {
  case (Some(val1), None) =>
    ...
  case (None, Some(val2)) =>
    ...
  case _ => // ok
}
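For reference, here is a minimal, self-contained sketch of how such a match-based check could look as a complete helper. The object name, parameter names and error messages are illustrative only, not the actual Spark implementation:

```scala
// Illustrative sketch of a pair-validation helper written with the suggested
// pattern match. Names and error messages are hypothetical.
object OptionPairChecks {
  def requireBothOrNeitherDefined(
      opt1: Option[_],
      opt2: Option[_],
      errMessageWhenFirstIsMissing: String,
      errMessageWhenSecondIsMissing: String): Unit = {
    (opt1, opt2) match {
      case (Some(_), None) =>
        throw new IllegalArgumentException(errMessageWhenSecondIsMissing)
      case (None, Some(_)) =>
        throw new IllegalArgumentException(errMessageWhenFirstIsMissing)
      case _ => // ok: both defined, or neither
    }
  }
}
```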
This is part of unused code that could be fixed in another PR (e.g. #21462). I will double-check after merging master, but normally client mode does not impact this.
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test starting
Kubernetes integration test status failure
Current state works for out-cluster client mode. For in-cluster, we miss the ...; running in cluster gives an exception:
Kubernetes integration test starting
Kubernetes integration test status success
May I ask, what's the release plan for this?
Now that #20910 has been merged, I will update this PR to take the refactoring into account. @inetfuture Once these changes are pushed, there is still the review process which needs to occur, so it is difficult to give you a plan.
I have updated this branch and successfully tested client mode OutCluster. Happy to get confirmation of this from anyone here (cc @foxish). For InCluster (which was working fine before #20910), the worker starts but fails with an UnknownHostException due to the lack of ...
Kubernetes integration test starting
Kubernetes integration test status success
@echarles I'm wondering whether you have time to sync this with master and get it into a shape that's ready for review? Thanks!
@rayburgemeestre This is cool! Apache Toree configuration for K8s is on my todo list, but I would be happy to copy your conf... IMHO both client and cluster mode should be fine with Toree. Did you try cluster mode?
Yes, absolutely, but I made those changes in a hardcoded way, just to try it out, so I'm not sure how helpful it is. Near the end of this file I added these options:
So the exec call looks like this:
A few notes:
EDIT: To answer your question, I remember I tried ...
  .map(new File(_))
  .orElse(maybeServiceAccountToken)
val oauthTokenValue = sparkConf.getOption(oauthTokenConf)
OptionRequirements.requireNandDefined(
Since it's only used once, I'm not sure it warrants a separate file/method for checking Options.
Also, the method signature doesn't make it clear to me what it does (especially requireNandDefined).
How about just the simple match that @squito suggested here?
I did not introduce OptionRequirements.requireNandDefined but reused what was already in place.
@@ -88,6 +103,56 @@ private[spark] object SparkKubernetesClientFactory {
    new DefaultKubernetesClient(httpClientWithCustomDispatcher, config)
  }

  def createOutClusterKubernetesClient(
      master: String,
Fix indentation.
Done.
@@ -140,13 +140,6 @@ private[spark] class Client(
      throw e
    }

    if (waitForAppCompletion) {
Why is this not needed anymore? If we enable cluster mode, we still want the same behavior defined here, right?
Hmm, I didn't change that behavior. Let's wait for the next push and see the diff. master is evolving, and the numerous refactorings do not make this PR easy to merge.
Right, I have added back that section... Thanks @tnachen
SparkKubernetesClientFactory.createKubernetesClient(
  KUBERNETES_MASTER_INTERNAL_URL,
  Some(sparkConf.get(KUBERNETES_NAMESPACE)),
  APISERVER_AUTH_DRIVER_MOUNTED_CONF_PREFIX,
Why do we need a separate conf prefix as well?
I just took what was existing. I will double-check to ensure the latest merge does not diverge.
@liyinan926 I pushed the merge with master and tested OutCluster (I didn't test InCluster but, as mentioned previously, we would need something similar to the ...). @squito @tnachen Thanks for your reviews and comments, which make sense. My approach is to change only what is needed for client mode; I prefer leaving constructs, code removal and enhancement considerations to other PRs.
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test starting
Kubernetes integration test status success
@@ -0,0 +1,40 @@
/*
We have KubernetesUtils, which is supposed to be the place for all K8s-related utility methods.
done
@@ -58,6 +58,8 @@ private[spark] object Config extends Logging {
    "spark.kubernetes.authenticate.driver"
  val KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX =
    "spark.kubernetes.authenticate.driver.mounted"
  private[spark] val APISERVER_AUTH_DRIVER_MOUNTED_CONF_PREFIX =
APISERVER_AUTH_DRIVER_MOUNTED_CONF_PREFIX was replaced by KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX. Please revert this addition.
done
@@ -88,6 +103,60 @@ private[spark] object SparkKubernetesClientFactory {
    new DefaultKubernetesClient(httpClientWithCustomDispatcher, config)
  }

  def createOutClusterKubernetesClient(
      master: String,
Wrong indentation.
SparkKubernetesClientFactory.createKubernetesClient(
  sparkConf.get("spark.master").replace("k8s://", ""),
  Some(sparkConf.get(KUBERNETES_NAMESPACE)),
  KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX,
The name of this prefix, spark.kubernetes.authenticate.driver.mounted, sounds weird in this case given that the client is running outside the cluster. BTW: can we alternatively use the config at $HOME/.kube/config to build a Kubernetes client instead? I think this is a common approach for building clients outside a cluster.
The call to createKubernetesClient is now used in two different ways:
- KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX is used in KubernetesClusterManager
- KUBERNETES_AUTH_SUBMISSION_CONF_PREFIX is used in KubernetesClientApplication
I would favor the second and remove the first.
For the config location, I remember that the fabric8 k8s client also does some inspection to see whether it is in or out of the cluster, and loads the config from the default place (depending on the case), with the possibility to specify other locations for the cert, token, etc. (this is what we expose as properties to the end user).
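For context, a minimal sketch (assuming a recent fabric8 kubernetes-client; this is not the code from this PR) of the two ways of building a client discussed above: letting fabric8 auto-configure from the environment (kubeconfig out of cluster, service account in cluster), versus building the Config explicitly from user-supplied values:

```scala
import io.fabric8.kubernetes.client.{ConfigBuilder, DefaultKubernetesClient}

// Option A: let fabric8 auto-configure itself. Out of cluster it reads
// $KUBECONFIG / $HOME/.kube/config; in cluster it uses the mounted
// service account token and CA cert.
val autoClient = new DefaultKubernetesClient()

// Option B: build the Config explicitly from values the user supplied
// (e.g. resolved from SparkConf properties). The URL, token and cert
// path below are placeholders.
val explicitConfig = new ConfigBuilder()
  .withMasterUrl("https://my-apiserver:6443")
  .withNamespace("default")
  .withOauthToken(sys.env.getOrElse("K8S_TOKEN", ""))
  .withCaCertFile("/path/to/ca.crt")
  .build()
val explicitClient = new DefaultKubernetesClient(explicitConfig)
```

The auto-configure path is what makes the $HOME/.kube/config approach mentioned above work without extra Spark properties.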
@mccheah, who mentioned a plan to create a separate change for client mode support.
@liyinan926 Thanks for your comments (see my answers). I have merged the latest changes, added a tiny doc section, and pushed again.
@echarles I'm planning to push a separate change that takes a different approach from this.
Kubernetes integration test starting
Kubernetes integration test status success
See #21748
What changes were proposed in this pull request?
The changes add support for the Kubernetes resource manager in client mode (on top of the existing cluster mode).
How was this patch tested?
The initial changes were done on the latest commits in the spark-k8s fork (https://github.com/apache-spark-on-k8s/spark) and have been tested on AWS with real data processing.
In an effort to merge the latest features back into Apache master, I am opening here untested changes subject to feedback and discussion.
Documentation will be updated once the code has been discussed, but in the meantime there is a dense design document that can be read to learn more about the changes. In- and out-cluster K8s considerations, as well as dependencies and HDFS access, are discussed there.
Given the current design and implementation constructs, an open point I have is about the way we want to configure the path of the k8s config in the OutCluster case. Options are: check for /var/run/secrets/kubernetes.io/serviceaccount/token (which is there for InCluster), fall back automatically to the given property, or, if no property has been given, fall back to $HOME/.kube/config (in this latter case there is no separate cacert nor key file; those details are all bundled in the single $HOME/.kube/config file). The tests so far have been done with separate config, cacert and key files (I guess the single config file should not give any issue).
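To make that fallback order concrete, here is a small sketch of the resolution being described. The property name used here (spark.kubernetes.kubeConfig.file) is purely illustrative, not an actual Spark config key:

```scala
import java.io.File

// Sketch of the discussed resolution order for locating the k8s credentials:
// 1. if the service account token exists, we are InCluster and need no kubeconfig;
// 2. otherwise prefer an explicitly configured path;
// 3. otherwise fall back to $HOME/.kube/config.
def resolveKubeConfig(getProperty: String => Option[String]): Option[File] = {
  val serviceAccountToken =
    new File("/var/run/secrets/kubernetes.io/serviceaccount/token")
  val defaultKubeConfig = new File(sys.props("user.home"), ".kube/config")

  if (serviceAccountToken.exists()) {
    None // InCluster: rely on the mounted service account token and CA cert
  } else {
    getProperty("spark.kubernetes.kubeConfig.file") // illustrative property name
      .map(new File(_))
      .orElse(Some(defaultKubeConfig).filter(_.exists()))
  }
}
```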
A last important point is how we move forward with the merge. To get better client-mode coverage, it would be interesting to also have downstream apache-spark-on-k8s#540, which covers not only Kerberos but also the Hadoop steps needed to mount the Hadoop conf so the driver/executors can connect to HDFS.
Also, to avoid a mess in future merges, I list here the changes I had to deal with when applying the patch on the Apache master repo: