[SPARK-11265] [YARN] YarnClient can't get tokens to talk to Hive 1.2.1 in a secure cluster #9232
Conversation
…luster. This is just the first "it compiles" stage of the patch: no tests or attempts to run things yet
… set doesn't follow the current rules precisely
@@ -1337,55 +1337,8 @@ object Client extends Logging {
      conf: Configuration,
      credentials: Credentials) {
    if (shouldGetTokens(sparkConf, "hive") && UserGroupInformation.isSecurityEnabled) {
Does it make sense to add a check for isClusterMode too?
I believe we don't need a delegation token if we are running in client mode.
The driver will have access to the submitting user's kerberos tokens.
yeah, probably does. In fact, getting a delegation token there is potentially worse than useless, as that DT will eventually expire. Using the keytab directly will, provided the user keeps logging on, stay valid.
As @tgravescs points out, the tokens are needed throughout the cluster, and yes, must be obtained irrespective of deployment.
Actually I don't really think the delegation tokens are needed in client mode (for Hive only). Only the driver talks to the metastore.
Test build #44176 has finished for PR 9232 at commit
      username: String): Option[Token[DelegationTokenIdentifier]] = {
    val mirror = universe.runtimeMirror(getClass.getClassLoader)

    // the hive configuration class is a subclass of Hadoop Configuration
the hive metastore URI should also be in the Hadoop Configuration; does it make sense to check the metastore URI directly from that conf instead of trying to get it from HiveConf, which could potentially raise an exception?
…metastore, but goes all the way to trying to set up a thrift RPC call, then handles the exception raised and verifies that the handling is as expected. Guarantees that the reflection code does (and will continue to) work.
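For readers following the thread: the token code never links against Hive at compile time; it looks the classes up by name at runtime. A minimal, self-contained sketch of that pattern, with `java.lang.Long` standing in for the Hive classes (which are only present when Spark is built with the hive profile); `invokeStatic` is an illustrative helper, not Spark's API:

```scala
// Sketch of the reflection pattern behind obtainTokenForHiveMetastore:
// load a class by name and invoke a method on it, with no compile-time
// dependency on the target library.
object ReflectionSketch {
  def invokeStatic(className: String, method: String, args: AnyRef*): AnyRef = {
    // loadClass via an explicit classloader rather than Class.forName:
    // Spark's style checks reject the latter (SPARK-8962).
    val cls = getClass.getClassLoader.loadClass(className)
    val m = cls.getMethod(method, args.map(_.getClass): _*)
    m.invoke(null, args: _*) // null receiver: static method
  }

  def main(args: Array[String]): Unit = {
    // Present on every JVM, so this call succeeds.
    println(invokeStatic("java.lang.Long", "parseLong", "42"))
    // A missing class surfaces as ClassNotFoundException, which the
    // patch catches and downgrades to a log message.
    try invokeStatic("org.apache.hadoop.hive.conf.HiveConf", "anything")
    catch { case _: ClassNotFoundException => println("no Hive on classpath") }
  }
}
```

The test described above goes one step further: it drives the reflected code all the way to the thrift RPC attempt, so a Hive-side signature change breaks the build's tests rather than production launches.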
Just pushed up a version with @dougb's suggestion; the check for cluster mode is made before trying to obtain either Hive or HBase tokens. Note that the HBase token acquisition is very similar to the Hive one, and could also be factored out, including sharing the same exception-handling function as the Hive one. And again, testable, though that would need HBase on the test classpath.
Test build #44222 has finished for PR 9232 at commit
Test build #44223 has finished for PR 9232 at commit
@@ -322,8 +322,10 @@ private[spark] class Client(
    // multiple times, YARN will fail to launch containers for the app with an internal
    // error.
    val distributedUris = new HashSet[String]
    obtainTokenForHiveMetastore(sparkConf, hadoopConf, credentials)
    obtainTokenForHBase(sparkConf, hadoopConf, credentials)
    if (isClusterMode) {
why is this cluster mode only? I can run spark-shell to access Hive or HBase, and this won't get tokens for those to ship to executors?
suggestion from @dougb, but yes, I didn't think that through. It's not for the driver, it's for the nodes. Will fix.
…luster mode" This reverts commit 9630a9d.
Patch with revert posted. For reviewers, here are some things to consider:
Also, I've just realised the final
Test build #44244 has finished for PR 9232 at commit
…ny; always call closeCurrent() but only log that call's exceptions, rather than rethrow,
Updated patch
With this load-methods-first strategy, the
Test build #44257 has finished for PR 9232 at commit
…put in a val, and, as we know that it is valid from then on, called directly in the following try/finally block
I've tightened this up and am happy with the code now, except I'm not sure that NoSuchMethodExceptions should be downgraded. It will hide the situation of an incompatible Hive version on the CP while a principal is required to talk to the hive metastore. Provided the Spark app doesn't want to talk to Hive, it's not going to matter. But if the app does want to talk to Hive, there'll be no cue except for a log message at launch time, and failures in executors whenever they try to set up an RPC connection. Leaving it as is retains backwards compatibility, though.
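The cleanup change mentioned in the commits above (always call closeCurrent(), but only log that call's exceptions rather than rethrow) can be sketched as follows; `Resource` and `withResource` are illustrative stand-ins for the Hive client session, not names from the patch:

```scala
// Sketch of the close-in-finally pattern from the commits above: the
// finally block always runs closeCurrent(), but a failure from the close
// itself is only logged, so it can never mask an exception (or swallow a
// result) from the main body.
object CloseQuietly {
  trait Resource {
    def use(): String          // stand-in for the token-fetching work
    def closeCurrent(): Unit   // stand-in for closing the Hive session
  }

  def withResource(r: Resource): String =
    try {
      r.use()
    } finally {
      try r.closeCurrent()
      catch { case e: Exception => println(s"ignoring failure in close: $e") }
    }
}
```

Because the resource is held in a val known to be valid before the try block starts, the finally clause can call closeCurrent() unconditionally.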
Test build #44260 has finished for PR 9232 at commit
Test build #44264 has finished for PR 9232 at commit
@steveloughran could you use PR titles following the convention described in the following document?
Test build #44296 has finished for PR 9232 at commit
case e: NoSuchMethodException =>
  logInfo("Hive Method not found", e)
  None
case e: ClassNotFoundException =>
I kinda feel that everything but this exception should be propagated. That's really the only valid error you might expect; someone running Spark without Hive classes, and even that might change (I think there's a bug tracking removal of the hive profile from the maven build).
Every other error means that either you have the wrong version of Hive in the classpath (user error) or that your configuration says you need delegation tokens, but you can't get them for some reason.
+1. I'd left it in there as it may have had a valid reason for being there, but I do think it's correct. Detecting config problems: that is something to throw up.
Note that Client.obtainTokenForHBase()
has similar behaviour; this patch doesn't address it. When someone sits down to do it, the policy about how to react to failures could be converted into a wrapper around a closure which executes the token retrieval (here obtainTokenForHiveMetastoreInner
), so there'd be no divergence.
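The "wrapper around a closure" idea could look something like the sketch below. This is a hypothetical helper, not code from the patch: it encodes the policy settled on in this thread (ClassNotFoundException, i.e. the service is simply not on the classpath, is logged and swallowed; everything else propagates, since it signals a wrong service version or broken configuration), so Hive and HBase token retrieval could share it and stay in sync:

```scala
// Hypothetical shared failure policy for token retrieval. Only
// ClassNotFoundException is downgraded to a log line; any other error
// (NoSuchMethodException from a wrong version, config problems, RPC
// failures) is rethrown so the submission fails loudly.
object TokenFetchPolicy {
  def obtainTokenQuietly[T](service: String)(fetch: => Option[T]): Option[T] =
    try {
      fetch
    } catch {
      case e: ClassNotFoundException =>
        println(s"$service classes not found, skipping token retrieval: $e")
        None
      // everything else propagates to the caller
    }
}
```

Usage would be `obtainTokenQuietly("hive") { obtainTokenForHiveMetastoreInner(...) }`, with an identical call wrapping the HBase retrieval.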
case e: ClassNotFoundException =>
  logInfo(s"$service class not found $e")
  logDebug("Hive Class not found", e)
case t: Throwable => {
really, you don't need this.
Either?
Why would you? All you're doing here is printing the stack trace to stderr, which will happen again when you re-throw the exception.
those exceptions aren't being rethrown though, are they? So it's logging the full stack @ debug, and a one-liner for most.
I don't follow. You're catching an exception, logging it, and re-throwing it, which causes the exception to show up twice in the process output.
Instead, you can just delete your code, and let the exception propagate naturally. It will show up in the output the same way.
oh, we're at cross purposes. I was looking at line 128, above; you were at 180. That's why I was confused. Yes, I'll cut the lower one.
Hi @steveloughran ,
I think you're still not really getting what I'm saying. You can just delete this whole case
. The exception will just propagate up the call stack.
(edit: actually I see why you need to re-throw. I'd just move the exception handling to the calling method. It's just simpler that way, you're not gaining anything from having this extra method.)
the reason I'd pulled it out was to have an isolated policy which could be both tested without mocking failures, and be re-used in the HBase token retrieval, which needs an identical set of clauses.
Given the policy has now been simplified so much, the method is now pretty much unused; I can pull it. But still the HBase token logic will need to be 100% in sync.
Test build #44469 has finished for PR 9232 at commit
Test build #44475 has finished for PR 9232 at commit
  logInfo(s"$service class not found $e")
  logDebug("Hive Class not found", e)
case t: Throwable => {
  throw t
Here the exception is thrown. I know swallowing the exception is bad, but what happens if the user does not want to access the Hive metastore but wants to use Spark even if the token cannot be acquired?
The user can (i) not give Spark a Hive configuration, in which case there will be no metastore URIs and this code will be skipped, or (ii) set spark.yarn.security.tokens.hive.enabled to false.
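The `spark.yarn.security.tokens.<service>.enabled` switch referred to above is the escape hatch. A guess at the shape of the `shouldGetTokens` check seen in the diff hunks, with a plain `Map` standing in for `SparkConf` (the real implementation in Client.scala may differ in detail):

```scala
// Sketch of the per-service token switch: tokens for a service are
// fetched unless spark.yarn.security.tokens.<service>.enabled is
// explicitly set to false. Defaulting to true preserves the existing
// behaviour for users who never touch the setting.
object TokenSwitch {
  def shouldGetTokens(conf: Map[String, String], service: String): Boolean =
    conf.getOrElse(s"spark.yarn.security.tokens.$service.enabled", "true")
      .toBoolean
}
```

So a user who cannot acquire a Hive token but still wants to submit can set `spark.yarn.security.tokens.hive.enabled=false` rather than rely on the exception being swallowed.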
Test build #44522 has finished for PR 9232 at commit
@@ -1337,55 +1337,9 @@ object Client extends Logging {
      conf: Configuration,
      credentials: Credentials) {
    if (shouldGetTokens(sparkConf, "hive") && UserGroupInformation.isSecurityEnabled) {
      val mirror = universe.runtimeMirror(getClass.getClassLoader)
this should go to utils.getContextOrSparkClassLoader(); notable that scalastyle doesn't pick up on this, even though it rejects Class.forName() since SPARK-8962.
Test build #44637 has finished for PR 9232 at commit
case e: Exception => { logError("Unexpected Exception " + e)
  throw new RuntimeException("Unexpected exception", e)
}
val util = new YarnSparkHadoopUtil()
This should be YarnSparkHadoopUtil.get; you could also avoid the val altogether.
done
Just some minor things left to clean up, otherwise looks ok.
Test build #44676 has finished for PR 9232 at commit
Latest patch LGTM.
Merging to master.
Hi @steveloughran, this patch doesn't merge cleanly to branch-1.5. If you want to apply it there, could you send a new pr for that branch? Thanks.
Will do. Same JIRA or a new backport one? |
} catch {
  case e: ClassNotFoundException =>
    logInfo(s"Hive class not found $e")
    logDebug("Hive class not found", e)
Why double-log the message?
They're not exactly the same.
You could use the same one; I marked it as resolved but it's ok to reopen it for the backport.
This is a fix for SPARK-11265: the introspection code to get Hive delegation tokens fails on Spark 1.5.1+, due to changes in the Hive codebase.