[SPARK-44976] Preserve full principal user name on executor side #42690
I found that this doesn't work because
I changed it to set the full principal user name on the executor side.
@@ -560,7 +560,7 @@ class SparkContext(config: SparkConf) extends Logging {
     // TODO: Set this only in the Mesos scheduler.
     executorEnvs("SPARK_EXECUTOR_MEMORY") = executorMemory + "m"
     executorEnvs ++= _conf.getExecutorEnv
-    executorEnvs("SPARK_USER") = sparkUser
+    executorEnvs("SPARK_USER") = Utils.getCurrentFullUserName()
pass the SPARK_USER env from driver to executor.
@@ -890,7 +890,7 @@ private[spark] class Client(
     val env = new HashMap[String, String]()
     populateClasspath(args, hadoopConf, sparkConf, env, sparkConf.get(DRIVER_CLASS_PATH))
     env("SPARK_YARN_STAGING_DIR") = stagingDirPath.toString
-    env("SPARK_USER") = UserGroupInformation.getCurrentUser().getShortUserName()
+    env("SPARK_USER") = UserGroupInformation.getCurrentUser().getUserName()
pass full principal user name to driver
@@ -64,7 +64,7 @@ private[spark] class SparkHadoopUtil extends Logging {
   }

   def createSparkUser(): UserGroupInformation = {
-    val user = Utils.getCurrentUserName()
+    val user = Utils.getCurrentFullUserName()
set full principal user name to ugi on executor side
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
It should be considered when using a kerberized cluster.
How can I open it again?
Continued on #44244
What changes were proposed in this pull request?
Use the full principal name as the Spark user name so that hadoop.security.auth_to_local is respected when accessing non-kerberized HDFS from a kerberized Hadoop cluster.
Why are the changes needed?
Since https://issues.apache.org/jira/browse/SPARK-6558, Spark has used the short user name. As a result, hadoop.security.auth_to_local on the NameNode of a non-kerberized Hadoop cluster is not respected.
Also, if a user provides the --principal and --keytab options when submitting a Spark job on a kerberized cluster and writes output to non-kerberized HDFS, file and directory ownership is not coherent.
Additional description is at https://issues.apache.org/jira/browse/SPARK-44976.
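The effect described above can be illustrated with a small, self-contained sketch. It does not use Hadoop: shortName is a simplified stand-in for default short-name extraction, and mapOnServer with its realm-to-suffix map is a hypothetical substitute for realm-specific auth_to_local rules on the NameNode. The principal, realm, and suffix are made-up values. The point is only that once the client sends the short name, the realm is gone and a server-side mapping has nothing left to match.

```scala
object PrincipalNames {
  // Simplified stand-in for default short-name extraction: keep only the
  // first component of "primary/instance@REALM". Not Hadoop's implementation.
  def shortName(principal: String): String =
    principal.takeWhile(c => c != '/' && c != '@')

  // Hypothetical server-side mapping keyed on the realm, standing in for
  // realm-specific auth_to_local rules configured on the NameNode.
  def mapOnServer(name: String, realmToSuffix: Map[String, String]): String =
    name.split("@", 2) match {
      case Array(_, realm) =>
        val local = shortName(name)
        realmToSuffix.get(realm).map(suffix => local + suffix).getOrElse(local)
      case _ =>
        name // no realm left in the name, so no rule can apply
    }

  def main(args: Array[String]): Unit = {
    val full = "alice@EXAMPLE.COM"           // made-up principal
    val rules = Map("EXAMPLE.COM" -> "_ext") // made-up mapping

    // Full principal sent to the server: the realm rule applies.
    println(mapOnServer(full, rules))            // prints "alice_ext"
    // Short name sent to the server: the rule is bypassed.
    println(mapOnServer(shortName(full), rules)) // prints "alice"
  }
}
```

With the full principal, the server decides the local name; with the short name, the client has already decided it, which is one way the ownership mismatch described above could arise.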
Does this PR introduce any user-facing change?
Ownership of output files and directories will be coherent even when a Spark job in a kerberized cluster writes to a non-kerberized HDFS cluster.
How was this patch tested?
Manually tested.
Was this patch authored or co-authored using generative AI tooling?
No.