

apeming commented Jan 15, 2020

What changes were proposed in this pull request?

Allow executors to run as a specified Hadoop user, configured through spark.executorEnv.HADOOP_USER_NAME.

Why are the changes needed?

HADOOP_USER_NAME can be used to specify the Hadoop user when submitting a job from the driver, but it has no effect on the executors' Hadoop user, even when it is set through spark.executorEnv.HADOOP_USER_NAME.

Does this PR introduce any user-facing change?

No

How was this patch tested?

@AmplabJenkins

Can one of the admins verify this patch?

dongjoon-hyun changed the title [SPARK-30519][CORE][2.4.3]use spark.executorEnv.HADOOP_USER_NAME to set executor's hadoop user to [SPARK-30519][CORE] Use spark.executorEnv.HADOOP_USER_NAME to set executor's hadoop user, Jan 15, 2020

 def createSparkUser(): UserGroupInformation = {
-  val user = Utils.getCurrentUserName()
+  val user = Option(System.getenv("HADOOP_USER_NAME")).getOrElse(Utils.getCurrentUserName())
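The precedence introduced by this line can be modeled without Spark or Hadoop on the classpath. The following is a minimal, hypothetical sketch: the environment is passed in as a Map, processUser stands in for the Hadoop user the process runs as, and currentUserName assumes that Utils.getCurrentUserName() returns SPARK_USER when it is set and otherwise falls back to the current Hadoop user's short name.

```scala
// Hypothetical, dependency-free model of the patched user resolution.
// The real code reads System.getenv and calls Utils.getCurrentUserName();
// here the environment is an explicit Map so the ordering can be tested.
object ProposedUserResolution {

  // Assumed stand-in for Utils.getCurrentUserName(): SPARK_USER if present,
  // otherwise the user the process runs as (the hypothetical processUser).
  def currentUserName(env: Map[String, String], processUser: String): String =
    env.get("SPARK_USER").getOrElse(processUser)

  // The patched line: HADOOP_USER_NAME, when present, is checked before anything else.
  def resolveUser(env: Map[String, String], processUser: String): String =
    env.get("HADOOP_USER_NAME").getOrElse(currentUserName(env, processUser))

  def main(args: Array[String]): Unit = {
    val env = Map("SPARK_USER" -> "spark", "HADOOP_USER_NAME" -> "alice")
    println(resolveUser(env, "yarn")) // alice
  }
}
```

With both variables set, the patched line returns the HADOOP_USER_NAME value even though SPARK_USER is present. This is the behavior change raised later in review.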


Is "HADOOP_USER_NAME" defined as a constant?

apeming (Author), Feb 12, 2020


Yes. It is set through spark.executorEnv.HADOOP_USER_NAME, and all executors will use it.
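This reply relies on each spark.executorEnv.NAME setting being exported as the environment variable NAME in executor processes. Spark exposes these entries through SparkConf.getExecutorEnv. The sketch below is a hypothetical, dependency-free stand-in that works on a plain Map. It illustrates the prefix mapping only and does not reproduce Spark's implementation.

```scala
// Hypothetical model of how spark.executorEnv.<NAME> settings map to
// executor environment variables. Works on a plain Map instead of SparkConf.
object ExecutorEnvSketch {
  val Prefix = "spark.executorEnv."

  // Keep only keys carrying the prefix, and strip it to get the variable name.
  def executorEnv(conf: Map[String, String]): Map[String, String] =
    conf.collect {
      case (key, value) if key.startsWith(Prefix) => key.stripPrefix(Prefix) -> value
    }

  def main(args: Array[String]): Unit = {
    val conf = Map(
      "spark.executorEnv.HADOOP_USER_NAME" -> "alice",
      "spark.executor.memory" -> "4g"
    )
    println(executorEnv(conf)) // Map(HADOOP_USER_NAME -> alice)
  }
}
```

Exporting the variable into the executor's environment is not enough by itself. As the PR description notes, the setting has no effect on the executor's Hadoop user unless createSparkUser reads it, and that is what the diff changes.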

jiangxb1987 (Contributor)

This is a behavior change, and I don't think we should introduce it with the approach proposed here.

If both SPARK_USER and HADOOP_USER_NAME are set in the environment, we should prioritize SPARK_USER over HADOOP_USER_NAME. Also, I'm hesitant about whether we should support HADOOP_USER_NAME at all.
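The ordering suggested in this comment (SPARK_USER first, then HADOOP_USER_NAME, then the process's own user) can be written as an Option chain. This is a hypothetical sketch and not code from the PR. The environment is modeled as a Map, and processUser is an assumed stand-in for the current Hadoop user's short name.

```scala
// Hypothetical sketch of the precedence suggested in review.
// Unlike the patch, SPARK_USER wins whenever it is set.
object ReviewerSuggestedResolution {
  def resolveUser(env: Map[String, String], processUser: String): String =
    env.get("SPARK_USER")                 // 1. explicit Spark user
      .orElse(env.get("HADOOP_USER_NAME")) // 2. Hadoop user name, only if SPARK_USER is absent
      .getOrElse(processUser)              // 3. fall back to the user the process runs as

  def main(args: Array[String]): Unit = {
    println(resolveUser(Map("SPARK_USER" -> "spark", "HADOOP_USER_NAME" -> "alice"), "yarn")) // spark
    println(resolveUser(Map("HADOOP_USER_NAME" -> "alice"), "yarn"))                          // alice
  }
}
```

Under this ordering, HADOOP_USER_NAME only takes effect when SPARK_USER is absent, so existing setups that already set SPARK_USER keep their current behavior.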

apeming (Author) commented Feb 15, 2020


But SPARK_USER is used by schedulers such as Mesos and Kubernetes, while HADOOP_USER_NAME is used by Hadoop clusters such as HDFS. On Kubernetes, SPARK_USER does not take precedence over HADOOP_USER_NAME.

github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions bot added the Stale label May 26, 2020
github-actions bot closed this May 27, 2020

5 participants