[SPARK-11265] [YARN] YarnClient can't get tokens to talk to Hive 1.2.1 in a secure cluster #9232

Conversation

steveloughran (Contributor):

This is a fix for SPARK-11265: the introspection code used to get Hive delegation tokens fails on Spark 1.5.1+, due to changes in the Hive codebase.

…luster. This is just the first "it compiles" stage of the patch: no tests or attempts to run things yet
… set doesn't follow the current rules precisely
@@ -1337,55 +1337,8 @@ object Client extends Logging {
conf: Configuration,
credentials: Credentials) {
if (shouldGetTokens(sparkConf, "hive") && UserGroupInformation.isSecurityEnabled) {

Does it make sense to add a check for isClusterMode too? I believe we don't need a delegation token if we are running in client mode; the driver will have access to the submitting user's kerberos tokens.

steveloughran (Contributor, Author):

yeah, it probably does. In fact, getting a delegation token there is potentially worse than useless, as that DT will eventually expire. Using the keytab directly will, provided the user keeps logging in, stay valid.

steveloughran (Contributor, Author):

As @tgravescs points out, the tokens are needed throughout the cluster, and yes, must be obtained irrespective of deployment.

Contributor:

Actually I don't really think the delegation tokens are needed in client mode (for Hive only). Only the driver talks to the metastore.

SparkQA commented Oct 22, 2015

Test build #44176 has finished for PR 9232 at commit 1fcb42e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

username: String): Option[Token[DelegationTokenIdentifier]] = {
val mirror = universe.runtimeMirror(getClass.getClassLoader)

// the hive configuration class is a subclass of Hadoop Configuration


The hive metastore URI should also be in the Hadoop Configuration. Does it make sense to check the metastore URI directly from the conf instead of trying to get it from HiveConf, which could potentially throw an exception?
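A minimal sketch of that suggestion, assuming the standard `hive.metastore.uris` key; the helper name is illustrative:

```scala
import org.apache.hadoop.conf.Configuration

// Sketch: read the metastore URI straight from the Hadoop configuration,
// avoiding HiveConf instantiation entirely. An empty value means no
// external metastore is configured, so no delegation token is needed.
def metastoreUri(conf: Configuration): Option[String] =
  Option(conf.getTrimmed("hive.metastore.uris", "")).filter(_.nonEmpty)
```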

…metastore, but goes all the way to trying to set up a thrift RPC call, then handles the exception raised and verifies that the handling is as expected. Guarantees that the reflection code does (and will continue to) work.
steveloughran (Contributor, Author):

Just pushed up a version with @dougb's suggestion; the check for cluster mode is made before trying to obtain either hive or hbase tokens.

Note that the HBase token acquisition is very similar to the hive one, and could also be factored out, including sharing the same exception-handling function as the hive one. And again, testable, though that would need HBase on the test classpath.

SparkQA commented Oct 23, 2015

Test build #44222 has finished for PR 9232 at commit 1248656.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Oct 23, 2015

Test build #44223 has finished for PR 9232 at commit 9630a9d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

steveloughran changed the title from "Stevel/patches/spark 11265 hive tokens" to "SPARK-11265 hive tokens" on Oct 23, 2015
@@ -322,8 +322,10 @@ private[spark] class Client(
// multiple times, YARN will fail to launch containers for the app with an internal
// error.
val distributedUris = new HashSet[String]
obtainTokenForHiveMetastore(sparkConf, hadoopConf, credentials)
obtainTokenForHBase(sparkConf, hadoopConf, credentials)
if (isClusterMode) {
Contributor:

why is this cluster mode only? I can run spark shell to access hive or hbase and this won't get tokens for those to ship to executors?

steveloughran (Contributor, Author):

Suggestion from @dougb, but yes, I didn't think that through. It's not for the driver, it's for the nodes. Will fix.

steveloughran (Contributor, Author):

Patch with revert posted

For reviewers, here are some things to consider:

  1. Which exceptions should trigger log-and-skip for the token, versus which are critical enough to fail the process. The patch ignores missing classes (NoClassDefFoundError and ClassNotFoundException, i.e. hive not on the classpath), but also NoSuchMethodException. Which means that even if the security settings require a token, a method incompatibility will be silently ignored.
  2. The Hadoop config now passed down is used to help define the hive configuration used in the binding. This is needed for testing. Previously, an empty HiveConf instance was created: enough to force in hive-default.xml and hive-site.xml, but not able to pick up any custom options. This is a change: does it constitute a regression, or a feature?

Also, I've just realised the final closeCurrent() call must be in a finally block; will add.

SparkQA commented Oct 23, 2015

Test build #44244 has finished for PR 9232 at commit 5158afe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…ny; always call closeCurrent() but only log that call's exceptions, rather than rethrow,
steveloughran (Contributor, Author):

Updated patch

  1. Moves all argument validation up before any hive class reflection.
  2. Gets all hive class methods before attempting to call any, so it fails fast if any are missing.
  3. Calls Hive.closeCurrent() in a finally block (if it could be loaded); catches and logs exceptions at warn level.

With this load-methods-first strategy, the closeCurrent variable will always contain a value before Hive.get() is invoked. As a result, Hive.closeCurrent() will always be invoked after the Hive.get() method has been invoked (irrespective of outcome), so cleanup is guaranteed to trigger once a Hive instance has been created. Previously, any failure of Hive.getDelegationToken() would skip the cleanup.
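A minimal sketch of the load-methods-first pattern, using plain Java reflection for brevity where the patch itself uses Scala runtime mirrors; the class and method names are the Hive APIs named above, the surrounding plumbing is illustrative:

```scala
// Resolve every reflected method up front, so a missing method fails
// fast, before any Hive state has been created.
val loader = getClass.getClassLoader
val hiveClass = loader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
val hiveConfClass = loader.loadClass("org.apache.hadoop.hive.conf.HiveConf")

val getHive = hiveClass.getMethod("get", hiveConfClass)
val getDelegationToken =
  hiveClass.getMethod("getDelegationToken", classOf[String], classOf[String])
val closeCurrent = hiveClass.getMethod("closeCurrent")

// Only now is a Hive instance created: closeCurrent is already resolved
// by the time get() runs, so the finally block can always fire.
val hiveConf = hiveConfClass.getConstructor().newInstance().asInstanceOf[Object]
val hive = getHive.invoke(null, hiveConf)
try {
  val tokenStr = getDelegationToken.invoke(hive, "owner", "renewer")
  // ... decode tokenStr and add it to the Credentials ...
} finally {
  try {
    closeCurrent.invoke(null)
  } catch {
    // Log rather than rethrow: a cleanup failure shouldn't mask the result.
    case e: Exception => ()
  }
}
```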

SparkQA commented Oct 23, 2015

Test build #44257 has finished for PR 9232 at commit 9713cf5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…put in a val, and, as we know that it is valid from then on, called directly in the following try/finally block
steveloughran (Contributor, Author):

I've tightened this up and am happy with the code now, except I'm not sure that NoSuchMethodException should be downgraded. It will hide the situation of an incompatible Hive version on the classpath while a principal is required to talk to the hive metastore. Provided the spark app doesn't want to talk to hive, it won't matter. But if the app does want to talk to hive, there'll be no cue except for a log message at launch time, and failures in executors whenever they try to set up an RPC connection. Leaving it as is retains backwards compatibility, though.

SparkQA commented Oct 23, 2015

Test build #44260 has finished for PR 9232 at commit bc6ea26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Oct 23, 2015

Test build #44264 has finished for PR 9232 at commit b417ca3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

vanzin (Contributor) commented Oct 24, 2015

@steveloughran could you use PR titles following the convention described in the following document?
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-PullRequest

steveloughran changed the title from "SPARK-11265 hive tokens" to "[SPARK-11265] [YARN] YarnClient can't get tokens to talk to Hive in a secure cluster" on Oct 24, 2015
SparkQA commented Oct 24, 2015

Test build #44296 has finished for PR 9232 at commit 89edfa6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case e: NoSuchMethodException =>
logInfo("Hive Method not found", e)
None
case e: ClassNotFoundException =>
Contributor:

I kinda feel that everything but this exception should be propagated. That's really the only valid error you might expect; someone running Spark without Hive classes, and even that might change (I think there's a bug tracking removal of the hive profile from the maven build).

Every other error means that either you have the wrong version of Hive in the classpath (user error) or that your configuration says you need delegation tokens, but you can't get them for some reason.

steveloughran (Contributor, Author):

+1. I'd left it in there as it may have had a valid reason for being there, but I do think this is correct. Detecting config problems: that is something to throw up.

Note that Client.obtainTokenForHBase() has similar behaviour; this patch doesn't address it. When someone sits down to do it, the policy about how to react to failures could be converted into a wrapper around a closure which executes the token retrieval (here obtainTokenForHiveMetastoreInner), so there'd be no divergence.
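An illustrative sketch of such a wrapper; the names here are hypothetical, not from the patch:

```scala
// Sketch: one failure policy shared between the Hive and HBase token
// retrieval paths. `retrieve` is a closure such as
// obtainTokenForHiveMetastoreInner; `service` names it for logging.
def withTokenRetrievalPolicy[T](service: String)(retrieve: => Option[T]): Option[T] =
  try {
    retrieve
  } catch {
    case e: ClassNotFoundException =>
      // Service classes not on the classpath: log and skip quietly.
      println(s"$service class not found: $e") // logInfo in Spark proper
      None
    // Anything else (NoSuchMethodException, config errors, ...) propagates,
    // failing the launch rather than hiding a broken security setup.
  }

// Hypothetical usage:
// val token = withTokenRetrievalPolicy("hive") {
//   obtainTokenForHiveMetastoreInner(conf, username)
// }
```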

case e: ClassNotFoundException =>
logInfo(s"$service class not found $e")
logDebug("Hive Class not found", e)
case t: Throwable => {
Contributor:

really, you don't need this.

steveloughran (Contributor, Author):

Either?

Contributor:

Why would you? All you're doing here is printing the stack trace to stderr, which will happen again when you re-throw the exception.

steveloughran (Contributor, Author):

Those exceptions aren't being rethrown though, are they? So it's logging the full stack at debug, and a one-liner for most.

Contributor:

I don't follow. You're catching an exception, logging it, and re-throwing it, which causes the exception to show up twice in the process output.

Instead, you can just delete your code, and let the exception propagate naturally. It will show up in the output the same way.

steveloughran (Contributor, Author):

Oh, we're at cross purposes. I was looking at line 128, above; you were at 180. That's why I was confused. Yes, I'll cut the lower one.

Contributor:

Hi @steveloughran ,

I think you're still not really getting what I'm saying. You can just delete this whole case. The exception will just propagate up the call stack.

(edit: actually I see why you need to re-throw. I'd just move the exception handling to the calling method. It's just simpler that way, you're not gaining anything from having this extra method.)

steveloughran (Contributor, Author):

The reason I'd pulled it out was to have an isolated policy which could both be tested without mocking failures and be re-used in the HBase token retrieval, which needs an identical set of clauses.

Given the policy has now been simplified so much, the method is now pretty much unused; I can pull it. But still the HBase token logic will need to be 100% in sync.

SparkQA commented Oct 27, 2015

Test build #44469 has finished for PR 9232 at commit fbf0ecb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • logInfo(s"$service class not found $e")

SparkQA commented Oct 27, 2015

Test build #44475 has finished for PR 9232 at commit 00cb5a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • logInfo(s"$service class not found $e")

logInfo(s"$service class not found $e")
logDebug("Hive Class not found", e)
case t: Throwable => {
throw t
Contributor:

Here the exception is thrown. I know swallowing the exception is bad, but what happens if the user does not want to access the hive metastore, but wants to use spark even if the token cannot be acquired?

Contributor:

The user can (i) not give Spark a hive configuration, in which case there will be no metastore URIs, and this code will be skipped, or (ii) set spark.yarn.security.tokens.hive.enabled to false.
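For example, option (ii) set programmatically; a sketch only, and the same key can equally be passed via --conf on spark-submit:

```scala
import org.apache.spark.SparkConf

// Sketch: explicitly disable Hive delegation-token acquisition.
val sparkConf = new SparkConf()
  .set("spark.yarn.security.tokens.hive.enabled", "false")
```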

SparkQA commented Oct 28, 2015

Test build #44522 has finished for PR 9232 at commit 217faba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • logInfo(s"$service class not found $e")
    • logDebug("$service class not found", e)

@@ -1337,55 +1337,9 @@ object Client extends Logging {
conf: Configuration,
credentials: Credentials) {
if (shouldGetTokens(sparkConf, "hive") && UserGroupInformation.isSecurityEnabled) {
val mirror = universe.runtimeMirror(getClass.getClassLoader)
steveloughran (Contributor, Author):

This should use Utils.getContextOrSparkClassLoader; notable that scalastyle doesn't pick up on this, even though it has rejected Class.forName() since SPARK-8962.
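A sketch of that change; Utils here is org.apache.spark.util.Utils:

```scala
import scala.reflect.runtime.universe
import org.apache.spark.util.Utils

// Prefer the thread context classloader, falling back to Spark's own,
// rather than the defining classloader of this class.
val mirror = universe.runtimeMirror(Utils.getContextOrSparkClassLoader)
```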

SparkQA commented Oct 29, 2015

Test build #44637 has finished for PR 9232 at commit dd8dea9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • logInfo(s"Hive class not found $e")
    • logDebug("Hive class not found", e)

case e: Exception => { logError("Unexpected Exception " + e)
throw new RuntimeException("Unexpected exception", e)
}
val util = new YarnSparkHadoopUtil()
Contributor:

This should be YarnSparkHadoopUtil.get; you could also avoid the val altogether.
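That is, something like the following sketch; the method name is assumed from the surrounding diff, not confirmed:

```scala
// Sketch: use the shared instance instead of constructing a new util.
YarnSparkHadoopUtil.get.obtainTokenForHiveMetastore(sparkConf, conf, credentials)
```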

steveloughran (Contributor, Author):

done

vanzin (Contributor) commented Oct 30, 2015

Just some minor things left to clean up, otherwise looks ok.

SparkQA commented Oct 30, 2015

Test build #44676 has finished for PR 9232 at commit ebb2b5a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • logInfo(s"Hive class not found $e")
    • logDebug("Hive class not found", e)

vanzin (Contributor) commented Oct 31, 2015

Latest patch LGTM.

vanzin (Contributor) commented Nov 1, 2015

Merging to master.

asfgit closed this in 40d3c67 on Nov 1, 2015
vanzin (Contributor) commented Nov 1, 2015

Hi @steveloughran, this patch doesn't merge cleanly to branch-1.5. If you want to apply it there, could you send a new pr for that branch? Thanks.

steveloughran (Contributor, Author):

Will do. Same JIRA or a new backport one?

} catch {
case e: ClassNotFoundException =>
logInfo(s"Hive class not found $e")
logDebug("Hive class not found", e)
Contributor:

Why double-log the message?

Contributor:

They're not exactly the same: the first is a one-line summary at info level, the second is the full stack trace at debug level.

vanzin (Contributor) commented Nov 1, 2015

> Same JIRA or a new backport one?

You could use the same one; I marked it as resolved but it's ok to reopen it for the backport.

steveloughran deleted the stevel/patches/SPARK-11265-hive-tokens branch on November 3, 2015