[SPARK-20608] allow standby namenodes in spark.yarn.access.namenodes #17872
Conversation
Can one of the admins verify this patch?
@srowen @jerryshao @steveloughran This is the latest PR. #17870 is deprecated.
} catch {
  case e: StandbyException =>
    logWarning(s"Namenode ${dst} is in state standby", e)
  case e: RemoteException =>
I'd suggested adding a handler for UnknownHostException too, but now I think that could hide problems with client config. Best to leave as is.
At a glance, patch LGTM.
  dstFs.addDelegationTokens(tokenRenewer, tmpCreds)
} catch {
  case e: StandbyException =>
    logWarning(s"Namenode ${dst} is in state standby", e)
It's not accurate to say "Namenode" here, because we may be configured to use other, non-HDFS filesystems.
Hmm... this code is actually fetching tokens from a Hadoop FS (in hadoopFSCredentialProvider), so doesn't that mean it's exactly HDFS?
A Hadoop-compatible FS is not necessarily HDFS; we can configure wasb, adls, and others. wasb and adls also support fetching delegation tokens through the common FS API, so we should avoid mentioning "Namenode", which only exists in HDFS.
Also, for the "RemoteException" below: how do you know a "RemoteException" is actually a standby exception?
In our tests, there are two possible exceptions when spark.yarn.access.namenodes=hdfs://activeNamenode,hdfs://standbyNamenode:
- Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby
- Caused by: org.apache.hadoop.ipc.StandbyException: Operation category WRITE is not supported in state standby
Maybe RemoteException should be caught in a better way.
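The distinction being discussed (a bare StandbyException versus a RemoteException that wraps one) can be sketched as follows. This is a minimal, self-contained model, not code from the PR: StandbyException and RemoteException below are local stand-ins for org.apache.hadoop.ipc.StandbyException and org.apache.hadoop.ipc.RemoteException (the real RemoteException records the wrapped exception's class name and can reconstruct it via unwrapRemoteException()), and isStandbyError is a hypothetical helper.

```scala
import java.io.IOException

// Local stand-ins for org.apache.hadoop.ipc.{StandbyException, RemoteException},
// so this sketch runs without Hadoop on the classpath.
class StandbyException(msg: String) extends IOException(msg)

class RemoteException(wrappedClassName: String, msg: String) extends IOException(msg) {
  // Mirrors the real RemoteException.unwrapRemoteException(), which
  // reconstructs the wrapped server-side exception from its class name.
  def unwrapRemoteException(): IOException =
    if (wrappedClassName == "org.apache.hadoop.ipc.StandbyException") new StandbyException(msg)
    else this
}

// Hypothetical helper: treat an error as "standby" only when it is a
// StandbyException, or a RemoteException that actually wraps one.
def isStandbyError(e: Throwable): Boolean = e match {
  case _: StandbyException => true
  case re: RemoteException => re.unwrapRemoteException().isInstanceOf[StandbyException]
  case _ => false
}

println(isStandbyError(new RemoteException("org.apache.hadoop.ipc.StandbyException", "standby"))) // true
println(isStandbyError(new RemoteException("java.io.FileNotFoundException", "missing")))          // false
```

With a check like this, a RemoteException caused by something other than a standby namenode would no longer be misreported as "in state standby".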
What I mean is that if the "RemoteException" is caused by something else, it is not correct to log "Namenode ${dst} is in state standby".
You are right. I will refactor the exception log.
I refactored it. Please review and give some more advice :)
…rom standby namenodes
@@ -22,6 +22,8 @@ import scala.util.Try

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.ipc.RemoteException
import org.apache.hadoop.ipc.StandbyException
These two import lines can be merged into one.
@jerryshao done.
I think @vanzin is saying this is not the right change.
Yes, I already explained in the discussion on the bug. The very fact that you're getting an exception from the standby namenode means you're not actually getting the delegation token, which makes this change pointless.
Related Jira:
https://issues.apache.org/jira/browse/SPARK-20608
Description:
See PR (in branch-2.1): #17870