[SPARK-17209][YARN] Add the ability to manually update credentials for Spark running on YARN #14789
Conversation
What or who exactly is calling updateCredentials here? The users themselves are calling a developer API in SparkHadoopUtil? I don't know if we ever finished the discussion on DeveloperApi, but that doesn't seem like a very stable interface for users.
The user themselves will call this API. Another option in SPARK-14743 is to add this API in
So for the use case, you are saying a user has a shell/process running, possibly accessing one Hive or HBase cluster (or maybe none), and then they want to access another one. How does the user add/change the confs for the other HBase or Hive cluster, which the trigger would then know how to go fetch? What are all the steps required?
Test build #64352 has finished for PR 14789 at commit
@tgravescs, with SPARK-14743, credentials/tokens can be managed outside of Spark with a user-supplied credential provider. In that case the user could update the credentials based on configuration changes or something else. For example, we could write a dynamic HBase credential provider which loads its configuration from HDFS; once the user wants to switch to another HBase cluster, they could update the configuration files in HDFS and trigger the credential update manually using the API provided in this PR. It mainly depends on how the user implements their own credential provider. What is provided here is just a manual credential update mechanism; when and how to trigger it is up to the user.
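To make the idea above concrete, here is a rough sketch of what such a dynamic provider could look like, assuming the ServiceCredentialProvider extension point added by SPARK-14743; the class name, the HDFS path, and the token-fetching helper are illustrative only and not part of this PR.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.Credentials
import org.apache.spark.SparkConf
import org.apache.spark.deploy.yarn.security.ServiceCredentialProvider

// Illustrative provider: re-reads the HBase configuration from a file on HDFS
// every time credentials are obtained, so pointing that file at another
// cluster and then triggering a manual update switches the tokens.
class DynamicHBaseCredentialProvider extends ServiceCredentialProvider {

  override def serviceName: String = "dynamic-hbase"

  override def credentialsRequired(hadoopConf: Configuration): Boolean = true

  override def obtainCredentials(
      hadoopConf: Configuration,
      sparkConf: SparkConf,
      creds: Credentials): Option[Long] = {
    // Hypothetical location of the externally managed HBase configuration.
    val confPath = new Path("hdfs:///apps/spark/hbase-site.xml")
    val hbaseConf = new Configuration(hadoopConf)
    hbaseConf.addResource(FileSystem.get(hadoopConf).open(confPath))

    // Stand-in for the usual TokenUtil-based logic that obtains an HBase
    // delegation token against hbaseConf and adds it to creds.
    obtainHBaseToken(hbaseConf, creds)

    // No renewal time is advertised; updates are triggered manually.
    None
  }

  private def obtainHBaseToken(conf: Configuration, creds: Credentials): Unit = {
    // Omitted for brevity.
  }
}
```

The provider would be registered through the usual service-loader mechanism; the point is only that each manual trigger re-reads whatever configuration the user has placed on HDFS.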
Test build #64390 has finished for PR 14789 at commit
Hi @tgravescs, I have other comments on the PR which I will add, but wanted to ensure there are no high-level issues with the approach. Thanks
@tgravescs @vanzin Any thoughts on keeping this private and executing a closure on the values of the map for use cases like this (instead of exposing it or creating a copy via toMap)? This can also ensure that outside users will maintain the expected MT invariants.
Yeah, this should stay private since there are synchronization implications. We also want to be careful about having it locked while trying to send these updates, though, because it could interfere with normal message processing.
Perhaps it needs a new interface and then a DriverEndpoint to handle this. I'd have to look closer.
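A minimal sketch of the closure-over-values idea being discussed; ExecutorData and the method names below are stand-ins, not the real CoarseGrainedSchedulerBackend members.

```scala
import scala.collection.mutable

// Sketch only: the map stays private and callers hand in a closure instead of
// receiving the map (or a copy via toMap), so the synchronization invariants
// remain an internal concern of this class.
class DriverEndpointSketch {
  private case class ExecutorData(executorAddress: String)

  private val executorDataMap = new mutable.HashMap[String, ExecutorData]

  def foreachExecutor(f: ExecutorData => Unit): Unit = synchronized {
    executorDataMap.values.foreach(f)
  }
}
```

Per the caution above, if the closure itself sends RPCs, a safer variant would snapshot the values inside the lock and invoke the closure outside it.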
You might want to return a Future for this, which can indicate when the credentials have been updated: as of now, developers invoking this API have no way to know when (if at all) the credentials have been propagated to the driver/executors.
Agreed, a Future might be better; I will change it.
Use ask instead of send. This allows consumers of the API to have reasonable confidence (based on the Future's timeout) about whether the update was propagated. send could result in a task being scheduled before the update is processed.
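A hedged sketch of the ask-based shape being suggested; the message name, reply type, and helper below are assumptions, not what the PR actually defines.

```scala
import scala.concurrent.Future
import org.apache.spark.rpc.RpcEndpointRef

// Illustrative message carrying serialized tokens.
case class UpdateCredentials(tokens: Array[Byte])

object CredentialUpdateSketch {
  // ask returns a Future that completes once the endpoint has handled the
  // message, so the caller can wait (with a timeout) for an acknowledgement.
  // send is fire-and-forget, so a task could be scheduled before the update
  // is processed.
  def updateCredentials(driver: RpcEndpointRef, tokens: Array[Byte]): Future[Boolean] = {
    driver.ask[Boolean](UpdateCredentials(tokens))
  }
}
```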
I guess I forgot about this PR. I don't think we have any YARN-specific public APIs. Also, theoretically this isn't YARN specific; if others decided to support it as well, Mesos/standalone could also use credentials and updates, it's just that no one has implemented it. We do have APIs that don't support all deploy modes though, like the SparkLauncher.
A bit of a side note here: personally I would like to move away from keytabs because it's not as secure and at least my company doesn't allow it. To get around keytabs, I was going to add an interface to push new credentials from the gateway box: have a spark-submit argument that talks to the driver to push new credentials securely over RPC. The reason I bring that up is that if we are adding an API to updateCredentials, it would be nice to make it flexible enough to handle that use case as well. Perhaps we should add an interface to get a Credentials-like object that then can have the updateCredentials routine and others as needed; that way everything isn't directly in the SparkContext.
@tgravescs Interesting, so it means that there won't be a --principal/--keytab, and what is being done in AMCredentialRenewer will be done in the launcher itself? So the driver behaves just like the executors currently do? The reason I want to understand more is that the current Spark design does allow for what is described above, but as soon as this change comes in it will be difficult to support, since the driver knows about dynamically added clusters/sources while the launcher will not. If yes, you are right, we will need to be careful while designing this to ensure there is minimal change when we evolve to allow for that scenario as well.
Edit: Currently, it is possible for the launcher to 'go away' and the Spark job to still continue (IIRC), but this will mean it can't (when tokens have to be renewed), right?
Hey, I haven't had the chance to look at this closely, but this seems to just be an API to trigger the existing credential updater code in Spark, right? When I filed SPARK-14743 I had a different use case in mind. When I talked about Oozie and HoS, Spark's credential updater would not be enabled. Those systems generally do not use Spark's credential updater, since they do not have the user's keytab. They have their own keytab, which they use to log in to the KDC and generate the tokens, and they run the child applications using a proxy user. They need a way to update those tokens after Spark has been started, which is different from having a way to trigger Spark's token updater mechanism. I'm not sure I understand how something like Oozie or HoS would use the particular feature added with this change. They can't give their own keytab to the user's application, because the user code should not have access to that. Tom:
Yes, basically that is more in line with what I had in mind.
Note that when we modify the above to return a Future, this will also need to be done within that future (or it can always be done inline).
@vanzin You are right, this does look orthogonal to what you and Tom described, though both will end up modifying similar code paths. I am wondering if we can model the change such that both use cases can be supported with minimal impedance.
So I was intending to leave the keytab/principal stuff in there and just add the additional mechanism to push credentials. As long as you are ok with keytabs from a security point of view, it's easier for some things. I think both mechanisms could go through a similar type of interface though, which is why I brought it up here.
Interesting, thanks for clarifying. The way I was looking at the change was: given we now have the ability to plug in custom credential providers, we would need to support out-of-band updates to credentials (in addition to the current expiry-time-based refresh/renew).
I think there are two different issues, one that is being addressed by this change and one that is the one Tom and I are talking about:
1. Letting the running application itself trigger a refresh of its credentials through Spark's existing updater mechanism (what this change adds).
2. Letting something outside the application (e.g. Oozie, HoS, or a spark-submit on the gateway) push new credentials into a running application.
The latter is a more complicated change since credential update and fetch are kind of intertwined right now; for example, for the latter to work, the AM would also need to fetch the new credentials as if it were an executor, and right now all the AM does is run the update thread. So I think this is fine for covering case 1 (caveat: I haven't looked at the code yet), but it would be nice to have a solution for case 2 eventually.
Test build #68519 has finished for PR 14789 at commit
So I agree these are separate cases, but I think the APIs make sense to be very similar, or at least live in the same sort of class. I don't think we want a public end-user API in SparkHadoopUtil.updateCredentials. I would rather we create something in core, like a Credentials class, and put it in there.
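One way to read that suggestion is a small entry point in core along the lines of the sketch below; the trait name and method signatures are hypothetical, not an agreed-upon API.

```scala
import scala.concurrent.Future

// Hypothetical user-facing handle, obtained from SparkContext rather than
// SparkHadoopUtil, so it is not tied to YARN and can grow more operations.
trait SparkCredentials {

  // Re-obtain tokens through the configured credential providers and
  // distribute them to the driver and executors (this PR's use case).
  def updateCredentials(): Future[Unit]

  // Distribute a set of credentials generated outside of Spark, e.g. pushed
  // from the gateway box or by a system such as Oozie.
  def pushCredentials(serializedTokens: Array[Byte]): Future[Unit]
}
```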
I think it will be hard to have the same API serve both use cases. You could have one API to generate new credentials and a second one to distribute a set of credentials, and then you could use the second one for the "update from outside of Spark" case. But they'd still be two different new user-facing methods. I agree that |
Test build #73461 has finished for PR 14789 at commit
What changes were proposed in this pull request?
This PR proposes to add a new API in SparkHadoopUtil to trigger manual credential updating at run time. This is mainly used in long-running Spark applications which need to access different secured systems at run time. For example, when Zeppelin / Spark Shell needs to access a different secured HBase cluster or Hive metastore service at run time, it requires tokens for the new services to be obtained and pushed to executors immediately. Previously we either needed to relaunch the application to get new tokens, or wait for the old tokens to expire to get new ones.
With this new API, the user can manually trigger credential updating at run time when required. Credentials will be renewed in the AM and updated on the executor and driver side.
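A hedged example of how a long-running session might use this; the exact method name and signature are assumptions based on the description above.

```scala
// In a long-running spark-shell / Zeppelin session, after the externally
// managed HBase/Hive configuration has been switched to another cluster:
import org.apache.spark.deploy.SparkHadoopUtil

// Assumed shape of the new API: trigger a credential refresh; the AM renews
// the tokens and they are propagated to the driver and executors.
SparkHadoopUtil.get.updateCredentials()
```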
How was this patch tested?
Manually verified in a secured cluster.