[SPARK-17209][YARN] Add the ability to manually update credentials for Spark running on YARN #14789
Conversation
What or who exactly is calling updateCredentials here? The users themselves are calling a developer API in SparkHadoopUtil? I don't know if we ever finished the discussion on DeveloperApi, but that doesn't seem like a very stable interface for users.
The user themselves will call this API. Another option in SPARK-14743 is to add this API in
So for the use case, you are saying a user has a shell/process running, possibly accessing one Hive or HBase cluster (or maybe none), and then they want to access another one. How does the user add/change the confs for the other HBase or Hive cluster, which the trigger would then know how to go fetch? What are all the steps required?
Test build #64352 has finished for PR 14789 at commit
@tgravescs, with SPARK-14743, credentials/tokens can be managed outside of Spark with a user-supplied credential provider. In that case the user could update the credentials based on configuration changes or something else. For example, we could write a dynamic HBase credential provider which loads its configuration from HDFS; once the user wants to switch to another HBase cluster, they could update the configuration files in HDFS and trigger the credential update manually using the API provided in this PR. It mainly depends on how the user implements their own credential provider. What is provided here is just a manual credential update mechanism; when and how to trigger it is up to the user.
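To make the idea above concrete, here is a rough sketch of what such a dynamic provider could look like, assuming the ServiceCredentialProvider extension point added by SPARK-14743; the class name, the HDFS path, and the token-fetching helper are illustrative only and not part of this PR.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.Credentials
import org.apache.spark.SparkConf
import org.apache.spark.deploy.yarn.security.ServiceCredentialProvider

// Illustrative provider: re-reads the HBase configuration from a file on HDFS
// every time credentials are obtained, so pointing that file at another
// cluster and then triggering a manual update switches the tokens.
class DynamicHBaseCredentialProvider extends ServiceCredentialProvider {

  override def serviceName: String = "dynamic-hbase"

  override def credentialsRequired(hadoopConf: Configuration): Boolean = true

  override def obtainCredentials(
      hadoopConf: Configuration,
      sparkConf: SparkConf,
      creds: Credentials): Option[Long] = {
    // Hypothetical location of the externally managed HBase configuration.
    val confPath = new Path("hdfs:///apps/spark/hbase-site.xml")
    val hbaseConf = new Configuration(hadoopConf)
    hbaseConf.addResource(FileSystem.get(hadoopConf).open(confPath))

    // Stand-in for the usual TokenUtil-based logic that obtains an HBase
    // delegation token against hbaseConf and adds it to creds.
    obtainHBaseToken(hbaseConf, creds)

    // No renewal time is advertised; updates are triggered manually.
    None
  }

  private def obtainHBaseToken(conf: Configuration, creds: Credentials): Unit = {
    // Omitted for brevity.
  }
}
```

The provider would be registered through the usual service-loader mechanism; the point is only that each manual trigger re-reads whatever configuration the user has placed on HDFS.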
Test build #64390 has finished for PR 14789 at commit
Hi @tgravescs, I have other comments on the PR which I will add, but wanted to ensure there are no high-level issues with the approach. Thanks
@tgravescs @vanzin Any thoughts on keeping this private and executing a closure on the values of the map for use cases like this (instead of exposing it or creating a copy via toMap)? This can also ensure that outside users will maintain the expected MT invariants.
Yeah, this should stay private since there are synchronization implications. We also want to be careful about having it locked while trying to send these updates, though, because it could interfere with normal message processing.
Perhaps it needs a new interface and then a DriverEndpoint to handle this. I'd have to look closer.
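A minimal sketch of the closure-over-values idea being discussed; ExecutorData and the method names below are stand-ins, not the real CoarseGrainedSchedulerBackend members.

```scala
import scala.collection.mutable

// Sketch only: the map stays private and callers hand in a closure instead of
// receiving the map (or a copy via toMap), so the synchronization invariants
// remain an internal concern of this class.
class DriverEndpointSketch {
  private case class ExecutorData(executorAddress: String)

  private val executorDataMap = new mutable.HashMap[String, ExecutorData]

  def foreachExecutor(f: ExecutorData => Unit): Unit = synchronized {
    executorDataMap.values.foreach(f)
  }
}
```

Per the caution above, if the closure itself sends RPCs, a safer variant would snapshot the values inside the lock and invoke the closure outside it.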
You might want to return a Future for this, which can indicate when the credentials have been updated: as of now, developers invoking this API have no way to know when (if at all) the credentials have been propagated to the driver/executors.
Agreed, a Future might be better; I will change it.
Use ask instead of send. This allows consumers of the API to have reasonable confidence (based on the Future's timeout) about whether the update was propagated. send could result in a task being scheduled before the update is processed.
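A hedged sketch of the ask-based shape being suggested; the message name, reply type, and helper below are assumptions, not what the PR actually defines.

```scala
import scala.concurrent.Future
import org.apache.spark.rpc.RpcEndpointRef

// Illustrative message carrying serialized tokens.
case class UpdateCredentials(tokens: Array[Byte])

object CredentialUpdateSketch {
  // ask returns a Future that completes once the endpoint has handled the
  // message, so the caller can wait (with a timeout) for an acknowledgement.
  // send is fire-and-forget, so a task could be scheduled before the update
  // is processed.
  def updateCredentials(driver: RpcEndpointRef, tokens: Array[Byte]): Future[Boolean] = {
    driver.ask[Boolean](UpdateCredentials(tokens))
  }
}
```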
I guess I forgot about this PR. I don't think we have any YARN-specific public APIs. Also, theoretically this isn't YARN specific; if others decided to support it as well, Mesos/standalone could also use credentials and updates, it's just that no one has implemented it. We do have APIs that don't support all deploy modes though, like the SparkLauncher.
A bit of a side note here: personally I would like to move away from keytabs because it's not as secure and at least my company doesn't allow it. To get around keytabs, I was going to add an interface to push new credentials from the gateway box: have a spark-submit argument that talks to the driver to push new credentials securely over RPC. The reason I bring that up is that if we are adding an API to updateCredentials, it would be nice to make it flexible enough to handle that use case as well. Perhaps we should add an interface to get a Credentials-like object that then can have the updateCredentials routine and others as needed; that way everything isn't directly in the SparkContext.
@tgravescs Interesting, so it means that there won't be a --principal/--keytab, and what is being done in AMCredentialRenewer will be done in the launcher itself? So the driver behaves just like the executors currently do? The reason I want to understand more is that the current Spark design does allow for what is described above, but as soon as this change comes in it will be difficult to support, since the driver knows about dynamically added clusters/sources while the launcher will not. If yes, you are right, we will need to be careful while designing this to ensure there is minimal change when we evolve to allow for that scenario as well.
Edit: Currently, it is possible for the launcher to 'go away' and the Spark job to still continue (IIRC), but this will mean it can't (when tokens have to be renewed), right?
Hey, I haven't had the chance to look at this closely, but this seems to just be an API to trigger the existing credential updater code in Spark, right? When I filed SPARK-14743 I had a different use case in mind. When I talked about Oozie and HoS, Spark's credential updater would not be enabled. Those systems generally do not use Spark's credential updater, since they do not have the user's keytab. They have their own keytab, which they use to log in to the KDC and generate the tokens, and they run the child applications using a proxy user. They need a way to update those tokens after Spark has been started, which is different from having a way to trigger Spark's token updater mechanism. I'm not sure I understand how something like Oozie or HoS would use the particular feature added with this change. They can't give their own keytab to the user's application, because the user code should not have access to that. Tom:
Yes, basically that is more in line with what I had in mind.
Note that when we modify the above to return a Future, this will also need to be done within that future (or it can always be done inline).
@vanzin You are right, this does look orthogonal to what you and Tom described, though both will end up modifying similar code paths. I am wondering if we can model the change such that both use cases can be supported with minimal impedance.
So I was intending to leave the keytab/principal stuff in there and just add the additional mechanism to push credentials. As long as you are ok with keytabs from a security point of view, it's easier for some things. I think both mechanisms could go through a similar type of interface though, which is why I brought it up here.
Interesting, thanks for clarifying. The way I was looking at the change was: given we now have the ability to plug in custom credential providers, we would need to support out-of-band updates to credentials (in addition to the current expiry-time-based refresh/renew).
I think there are two different issues, one that is being addressed by this change and one that is the one Tom and I are talking about:
1. Letting the running application itself trigger a refresh of its credentials through Spark's existing updater mechanism (what this change adds).
2. Letting something outside the application (e.g. Oozie, HoS, or a spark-submit on the gateway) push new credentials into a running application.
The latter is a more complicated change since credential update and fetch are kind of intertwined right now; for example, for the latter to work, the AM would also need to fetch the new credentials as if it were an executor, and right now all the AM does is run the update thread. So I think this is fine for covering case 1 (caveat: I haven't looked at the code yet), but it would be nice to have a solution for case 2 eventually.
Test build #68519 has finished for PR 14789 at commit
So I agree these are separate cases, but I think the APIs make sense to be very similar, or at least live in the same sort of class. I don't think we want a public end-user API in SparkHadoopUtil.updateCredentials. I would rather we create something in core, like a Credentials class, and put it in there.
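One way to read that suggestion is a small entry point in core along the lines of the sketch below; the trait name and method signatures are hypothetical, not an agreed-upon API.

```scala
import scala.concurrent.Future

// Hypothetical user-facing handle, obtained from SparkContext rather than
// SparkHadoopUtil, so it is not tied to YARN and can grow more operations.
trait SparkCredentials {

  // Re-obtain tokens through the configured credential providers and
  // distribute them to the driver and executors (this PR's use case).
  def updateCredentials(): Future[Unit]

  // Distribute a set of credentials generated outside of Spark, e.g. pushed
  // from the gateway box or by a system such as Oozie.
  def pushCredentials(serializedTokens: Array[Byte]): Future[Unit]
}
```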
I think it will be hard to have the same API serve both use cases. You could have one API to generate new credentials and a second one to distribute a set of credentials, and then you could use the second one for the "update from outside of Spark" case. But they'd still be two different new user-facing methods. I agree that |
Test build #73461 has finished for PR 14789 at commit
What changes were proposed in this pull request?
This PR proposes to add a new API in SparkHadoopUtil to trigger manual credential updating at run time. This is mainly used in long-running Spark applications which need to access different secured systems at run time. For example, when Zeppelin / Spark Shell needs to access a different secured HBase cluster or Hive metastore service at run time, it requires tokens for the new services to be obtained and pushed to executors immediately. Previously we either needed to relaunch the application to get new tokens, or wait for the old tokens to expire to get new ones.
With this new API, the user can manually trigger credential updating at run time when required. Credentials will be renewed in the AM and updated on the executor and driver side.
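A hedged example of how a long-running session might use this; the exact method name and signature are assumptions based on the description above.

```scala
// In a long-running spark-shell / Zeppelin session, after the externally
// managed HBase/Hive configuration has been switched to another cluster:
import org.apache.spark.deploy.SparkHadoopUtil

// Assumed shape of the new API: trigger a credential refresh; the AM renews
// the tokens and they are propagated to the driver and executors.
SparkHadoopUtil.get.updateCredentials()
```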
How was this patch tested?
Manually verified in a secured cluster.