
Conversation

@harishreedharan
Contributor

…tempting to create new delegation tokens if a new SparkContext is created within the same application.

Since Hadoop gives precedence to existing delegation tokens, we must log in as a different user, obtain new tokens, and replace the old ones in the current user's credentials cache; otherwise we would be unable to get new ones.

/cc @tedyu @tgravescs
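
A minimal sketch of that mechanism using stock Hadoop security APIs (the helper and its parameters are hypothetical, not code from this PR): log in from the keytab to obtain a UGI that carries no delegation tokens, fetch fresh tokens as that user, then merge them into the current user's credential cache so they overwrite the stale entries.

import java.security.PrivilegedExceptionAction

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Hypothetical helper, not part of this PR.
def refreshDelegationTokens(
    principal: String,
    keytabPath: String,
    renewer: String,
    conf: Configuration): Unit = {
  // A fresh keytab login carries no delegation tokens, so Hadoop will
  // authenticate with the Kerberos TGT instead of the stale tokens.
  val keytabUgi =
    UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytabPath)
  val newCreds = keytabUgi.doAs(new PrivilegedExceptionAction[Credentials] {
    override def run(): Credentials = {
      val creds = new Credentials()
      // Request brand-new HDFS delegation tokens from the NameNode.
      FileSystem.get(conf).addDelegationTokens(renewer, creds)
      creds
    }
  })
  // Merging into the current user's credentials overwrites tokens with
  // the same alias, so subsequent RPCs pick up the fresh ones.
  UserGroupInformation.getCurrentUser.addCredentials(newCreds)
}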

@harishreedharan
Contributor Author

Jenkins, test this please

@SparkQA

SparkQA commented Nov 21, 2015

Test build #46458 has finished for PR 9875 at commit 70f610f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 21, 2015

Test build #46454 has finished for PR 9875 at commit 70f610f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@harishreedharan
Contributor Author

retest this please

@SparkQA

SparkQA commented Nov 21, 2015

Test build #46466 has finished for PR 9875 at commit 70f610f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

@harishreedharan I thought this issue existed even without reusing the JVM (going by other JIRAs and PRs that were filed)? For instance, if you just have a long-running process and had specified a keytab to use. I thought they had said it wasn't re-logging in because of the token it was acquiring.

@harishreedharan
Contributor Author

Yes, that is correct, but this happens even without the tokens expiring.

What do you think about doing the re-login in YarnClientSchedulerBackend? This is messy with client mode, but I don't think we have another option.

@harishreedharan
Contributor Author

Also, to be clear: it would work fine in cluster mode. In client mode, #7394 should have taken care of the long-running app issue (though there was one case where, I think, @SaintBacchus mentioned that the EventLoggingListener was no longer able to write to HDFS even though the tokens were getting updated).

So in either case, I am wondering whether in client mode we should simply re-login using the keytab and not bother with tokens on the driver app at all (so the AM would log in, update tokens, etc., while the client app always just logs in). Do you think that makes sense, @tgravescs?
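
A rough sketch of that "client app always just logs in" idea (the helper name and re-login cadence are illustrative assumptions, not this PR's code):

import java.util.concurrent.{Executors, TimeUnit}

import org.apache.hadoop.security.UserGroupInformation

// Hypothetical sketch: the client-mode driver relies purely on the
// keytab and never touches delegation tokens itself; the AM keeps
// handling token renewal as before.
def keepDriverLoggedIn(principal: String, keytabPath: String): Unit = {
  // Replaces the static login user with a fresh keytab-based login, so
  // driver-side HDFS access (e.g. the EventLoggingListener) uses the
  // Kerberos TGT rather than tokens that may have been invalidated.
  UserGroupInformation.loginUserFromKeytab(principal, keytabPath)

  // Periodically re-login so the TGT never lapses on a long-running app.
  val scheduler = Executors.newSingleThreadScheduledExecutor()
  scheduler.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = {
      // No-op while the TGT is still fresh, so calling this often is cheap.
      UserGroupInformation.getLoginUser.checkTGTAndReloginFromKeytab()
    }
  }, 1, 1, TimeUnit.HOURS)
}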

@steveloughran
Contributor

Does the situation where the client can only have tokens need to be covered? Users may not have keytabs, only kinit-granted tickets, and they still have the right to submit work with a lifespan <= the ticket life.

@tgravescs
Contributor

If we are going to include the fix for the issue @SaintBacchus mentioned, then I think the right thing is to log in only from the keytab or get the tokens (not both). If a keytab is supplied, always use that and don't bother with tokens on the driver; otherwise, get the tokens.
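
Sketched as a hedged example (the option plumbing and names are illustrative, not Spark's actual configuration handling), that rule might look like this; the tokens-only branch also covers the kinit-ticket case raised above:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Hypothetical decision logic, not Spark's actual code path.
def setupDriverCredentials(
    principalAndKeytab: Option[(String, String)],
    renewer: String,
    conf: Configuration): Unit = {
  principalAndKeytab match {
    case Some((principal, keytab)) =>
      // Keytab supplied: always log in from it and never fetch
      // delegation tokens on the driver.
      UserGroupInformation.loginUserFromKeytab(principal, keytab)
    case None =>
      // No keytab: fetch delegation tokens for the current (kinit'ed)
      // user, covering the tickets-only case mentioned above.
      val creds = new Credentials()
      FileSystem.get(conf).addDelegationTokens(renewer, creds)
      UserGroupInformation.getCurrentUser.addCredentials(creds)
  }
}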

@djdean

djdean commented Nov 30, 2015

After applying the provided patch, things still do not work. I've been doing some debugging and have found some additional information. When it works, two tokens are created, with the renewal interval being set for the first one using the getTokenRenewalInterval(stagingDirPath) function in Client.scala. The second time around (after stopping and restarting the context), however, it prints a message saying one token was created, but no renewal interval is set. Finally, it dies saying the token can't be found in the cache. The relevant output is below (IPs/hostnames removed):

---------------Successful run--------------
15/11/30 14:18:57 INFO yarn.Client: Credentials file set to: credentials-372be24e-9614-48d4-9f51-4cf275c51f46
15/11/30 14:18:57 INFO yarn.YarnSparkHadoopUtil: delegTokenRenewer: hadoop
15/11/30 14:18:57 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://HOSTNAME:9000/user/hadoop/.sparkStaging/application_1446695132208_0114
15/11/30 14:18:57 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 142 for hadoop on xxx.xxx.xxx.xxx
15/11/30 14:18:57 INFO yarn.Client: Renewal Interval set to 86400400
15/11/30 14:18:57 INFO yarn.Client: Preparing resources for our AM container
15/11/30 14:18:57 INFO yarn.YarnSparkHadoopUtil: delegTokenRenewer: rm/HOSTNAME
15/11/30 14:18:57 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://HOSTNAME:9000/user/hadoop/.sparkStaging/application_1446695132208_0114
15/11/30 14:18:57 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 143 for hadoop on xxx.xxx.xxx.xxx
15/11/30 14:18:58 INFO yarn.YarnSparkHadoopUtil: Hive class not found java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
15/11/30 14:18:58 INFO yarn.Client: To enable the AM to login from keytab, credentials are being copied over to the AM via the YARN Secure Distributed Cache.
15/11/30 14:18:58 INFO yarn.Client: Uploading resource file:/etc/security/keytabs/hadoop.keytab -> /user/hadoop/.sparkStaging/application_1446695132208_0114/hadoop.keytab
--------End successful run--------------
--------Failed run------------
15/11/30 14:19:46 INFO yarn.Client: Credentials file set to: credentials-b91660b6-a7c4-49f1-b869-ded70fec1641
15/11/30 14:19:46 INFO yarn.Client: Preparing resources for our AM container
15/11/30 14:19:46 INFO yarn.YarnSparkHadoopUtil: Called with conf: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml
15/11/30 14:19:46 INFO yarn.YarnSparkHadoopUtil: delegTokenRenewer: rm/HOSTNAME
15/11/30 14:19:46 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://HOSTNAME:9000/user/hadoop/.sparkStaging/application_1446695132208_0115
15/11/30 14:19:46 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 144 for hadoop on xxx.xxx.xxx.xxx
15/11/30 14:19:46 INFO yarn.YarnSparkHadoopUtil: Hive class not found java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
15/11/30 14:19:46 INFO yarn.Client: To enable the AM to login from keytab, credentials are being copied over to the AM via the YARN Secure Distributed Cache.
15/11/30 14:19:46 INFO yarn.Client: Uploading resource file:/etc/security/keytabs/hadoop.keytab -> hdfs://HOSTNAME:9000/user/hadoop/.sparkStaging/application_1446695132208_0115/hadoop.keytab
15/11/30 14:19:46 INFO yarn.Client: Uploading resource file:/var/tmp/spark-1.6.0-SNAPSHOT-bin-patch-8/lib/spark-assembly-1.6.0-SNAPSHOT-hadoop2.7.1.jar -> hdfs://HOSTNAME/user/hadoop/.sparkStaging/application_1446695132208_0115/spark-assembly-1.6.0-SNAPSHOT-hadoop2.7.1.jar
15/11/30 14:19:58 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 143 for hadoop) can't be found in cache
15/11/30 14:19:58 WARN hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-695293104_13] for 30 seconds. Will retry shortly ...
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 143 for hadoop) can't be found in cache

@SparkQA

SparkQA commented May 12, 2016

Test build #58467 has finished for PR 9875 at commit 70f610f.

  • This patch fails R style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

// If this JVM hosted a yarn-client mode driver before, the credentials of the current user
// now have delegation tokens, which means Hadoop security code will look at those and not the
// keytab login. So we must re-login and get new tokens.
if (reusedJVM && loginFromKeytab && !isClusterMode) {

Hey @harishreedharan, do you plan on updating this patch?

If yes, I'm wondering why not do this in all cases, not just when a new context is created. The same code should work in both scenarios, right?

If not, we should probably close the PR.

@vanzin
Contributor

vanzin commented Jul 12, 2016

ping @harishreedharan, please close the PR if you don't intend to work on it.
