
[SPARK-20608] allow standby namenodes in spark.yarn.access.namenodes #17872

Closed

Conversation

charliechen211

Related Jira:
https://issues.apache.org/jira/browse/SPARK-20608

Description:
See PR (in branch-2.1): #17870

@AmplabJenkins

Can one of the admins verify this patch?

@charliechen211 changed the title from "[SPARK-20608] allow standby namenodes in spark.yarn.access.namenodes (PR in master branch)" to "[SPARK-20608] allow standby namenodes in spark.yarn.access.namenodes" on May 5, 2017
@charliechen211
Author

@srowen @jerryshao @steveloughran This is the latest PR. #17870 is deprecated.

} catch {
  case e: StandbyException =>
    logWarning(s"Namenode ${dst} is in state standby", e)
  case e: RemoteException =>
Contributor

I'd suggested adding a handler for UnknownHostException too, but now I think that could hide problems with client config. Best to leave as is.
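For reference, a sketch of the variant discussed and rejected above; this is illustrative only, not part of the patch, and it reuses dstFs, tokenRenewer, tmpCreds and dst from the snippets in this review:

import java.net.UnknownHostException

try {
  dstFs.addDelegationTokens(tokenRenewer, tmpCreds)
} catch {
  case e: StandbyException =>
    logWarning(s"Namenode ${dst} is in state standby", e)
  // the handler considered above; swallowing this could mask a bad
  // hostname in the client configuration, hence it was left out
  case e: UnknownHostException =>
    logWarning(s"Cannot resolve host for ${dst}", e)
}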

@steveloughran
Contributor

At a glance, the patch LGTM.

  dstFs.addDelegationTokens(tokenRenewer, tmpCreds)
} catch {
  case e: StandbyException =>
    logWarning(s"Namenode ${dst} is in state standby", e)
Contributor

It's not accurate to say "Namenode" here, because this may be configured against filesystems other than HDFS.

Author

Hmm... this is actually fetching tokens from a Hadoop FS, inside HadoopFSCredentialProvider, which means it's exactly HDFS, no?

Contributor

@jerryshao May 8, 2017

A Hadoop-compatible FS is not necessarily HDFS; we can configure wasb, adls, and others. wasb and adls also support fetching delegation tokens through the common FS API, so we should avoid mentioning "Namenode", which exists only in HDFS.
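To illustrate the point, a minimal sketch of fetching tokens through the generic FileSystem API; the wasb URI and the "yarn" renewer string here are hypothetical examples, not values from this PR:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.Credentials

// addDelegationTokens() is declared on the base FileSystem class, so the
// same call works for HDFS, wasb, adls and other Hadoop-compatible
// filesystems; nothing in it is HDFS- or Namenode-specific
val conf = new Configuration()
val creds = new Credentials()
val fs = FileSystem.get(new URI("wasb://container@account.blob.core.windows.net/"), conf)
fs.addDelegationTokens("yarn", creds)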

Contributor

Also, for the "RemoteException" case below: how do you know that a "RemoteException" is actually a standby exception?

Author

In our tests, there are two possible exceptions when spark.yarn.access.namenodes=hdfs://activeNamenode,hdfs://standbyNamenode:

  1. Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby
  2. Caused by: org.apache.hadoop.ipc.StandbyException: Operation category WRITE is not supported in state standby

Maybe RemoteException should be caught in a better way.

Contributor

What I mean is that if the "RemoteException" is caused by something else, it is not correct to log "Namenode ${dst} is in state standby".

Author

You are right. I will refactor the exception log.
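A minimal sketch of one way to narrow the handler, assuming Hadoop's RemoteException.unwrapRemoteException() API; this is an illustration of the idea, not necessarily the exact change that was pushed:

import org.apache.hadoop.ipc.{RemoteException, StandbyException}

try {
  dstFs.addDelegationTokens(tokenRenewer, tmpCreds)
} catch {
  // the direct form of the failure (case 2 above)
  case e: StandbyException =>
    logWarning(s"Filesystem ${dst} is in standby state", e)
  // the wrapped form (case 1 above): unwrapRemoteException() reconstructs
  // the original server-side exception, so we can check its real type
  case e: RemoteException if e.unwrapRemoteException().isInstanceOf[StandbyException] =>
    logWarning(s"Filesystem ${dst} is in standby state", e)
  // any other RemoteException still propagates, so unrelated failures
  // are not mislabeled as a standby condition
}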

Author

I refactored it. Please review and give some more advice :)

@@ -22,6 +22,8 @@ import scala.util.Try

 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileSystem, Path}
+import org.apache.hadoop.ipc.RemoteException
+import org.apache.hadoop.ipc.StandbyException
Contributor

These two import lines can be merged into one.
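That is, the merged form would be:

import org.apache.hadoop.ipc.{RemoteException, StandbyException}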

@jerryshao
Contributor

This change may conflict with #17723, but I think it is easy to resolve. CC @mgummelt.

@charliechen211
Author

@jerryshao done.

@srowen
Member

srowen commented May 12, 2017

I think @vanzin is saying this is not the right change.

@vanzin
Contributor

vanzin commented May 12, 2017

Yes, I already explained this in the discussion on the bug. The very fact that you're getting an exception from the standby namenode means you're not actually getting the delegation token, which makes this change pointless.
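For context, the standard way to avoid this situation altogether is HDFS HA with a logical nameservice, so the client locates the active namenode and fails over automatically instead of Spark listing both namenodes. A sketch, where "mycluster" and the host names are hypothetical values (the hdfs-site.xml entries are shown as flat properties for brevity):

# hdfs-site.xml
dfs.nameservices = mycluster
dfs.ha.namenodes.mycluster = nn1,nn2
dfs.namenode.rpc-address.mycluster.nn1 = activeNamenode:8020
dfs.namenode.rpc-address.mycluster.nn2 = standbyNamenode:8020
dfs.client.failover.proxy.provider.mycluster = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

# spark-defaults.conf: point Spark at the logical URI, not at
# the active and standby namenodes individually
spark.yarn.access.namenodes = hdfs://mycluster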

@srowen mentioned this pull request on May 17, 2017
@asfgit closed this in 5d2750a on May 18, 2017