
[SPARK-21906][YARN][Spark Core] Don't runAsSparkUser to switch UserGroupInformation in YARN mode #19121

Closed

yaooqinn wants to merge 1 commit into apache:master from yaooqinn:SPARK-21906

Conversation

@yaooqinn
Member

@yaooqinn yaooqinn commented Sep 4, 2017

What changes were proposed in this pull request?

1. A YARN application's UGI is determined by the UGI that launched it.
2. runAsSparkUser switches to a UGI identical to the current one, because we have already set
env("SPARK_USER") = UserGroupInformation.getCurrentUser().getShortUserName()
in the AM container context:

  def runAsSparkUser(func: () => Unit) {
    val user = Utils.getCurrentUserName()  // get the current user itself
    logDebug("running as user: " + user)
    val ugi = UserGroupInformation.createRemoteUser(user) // create a new ugi for that same user
    transferCredentials(UserGroupInformation.getCurrentUser(), ugi) // transfer its own credentials to it
    ugi.doAs(new PrivilegedExceptionAction[Unit] { // doAs itself
      def run: Unit = func()
    })
  }
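
To make the redundancy concrete, here is a minimal sketch (my own illustration, not Spark source; the object and variable names are made up) showing that in YARN mode the remote UGI recreated from SPARK_USER carries the same principal the process already runs as:

  import org.apache.hadoop.security.UserGroupInformation

  object YarnUgiNoOpCheck {
    def main(args: Array[String]): Unit = {
      // In YARN mode the AM container context sets SPARK_USER to the
      // short name of the launching UGI, i.e. the current user:
      val current = UserGroupInformation.getCurrentUser()
      val sparkUser = current.getShortUserName()
      // runAsSparkUser then recreates a remote UGI from that same name,
      // so the subsequent doAs switches to an equivalent principal:
      val recreated = UserGroupInformation.createRemoteUser(sparkUser)
      println(s"current=${current.getShortUserName()}, recreated=${recreated.getShortUserName()}")
    }
  }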

How was this patch tested?

Manual tests.

cc @vanzin

@AmplabJenkins

Can one of the admins verify this patch?

@jerryshao
Contributor

Can you please elaborate on the problem you met? Did you encounter any unexpected behavior?

The changes here get rid of the env variable "SPARK_USER". This might be OK for YARN applications, but if a user runs in standalone mode and explicitly sets "SPARK_USER", your change seems to break those semantics.

@yaooqinn
Member Author

yaooqinn commented Sep 5, 2017

@jerryshao

  1. I didn't meet any problems; this code runs fine even though it is unnecessary.
  2. In standalone mode, collaborating with a secured HDFS might not be supported yet. Besides, this UGI doAs wraps the executors' initialization but not the tasks' execution; if we truly want to doAs as SPARK_USER, this UGI should be used in both phases.

@jerryshao
Contributor

jerryshao commented Sep 5, 2017

UGI is not only used for security; normally it is used so that a Spark application communicates with Hadoop as the correct user.

doAs already wraps the whole CoarseGrainedExecutorBackend process, so all the task threads forked in this process will honor this UGI; there is no need to wrap again around each task.
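
A minimal sketch of the inheritance behavior described above (plain Hadoop API usage, not Spark source; names are illustrative): a thread forked inside a doAs block inherits the access-control context captured when it is created, so it resolves to the same UGI without another doAs.

  import java.security.PrivilegedExceptionAction
  import org.apache.hadoop.security.UserGroupInformation

  object DoAsInheritance {
    def main(args: Array[String]): Unit = {
      val ugi = UserGroupInformation.createRemoteUser("foo")
      ugi.doAs(new PrivilegedExceptionAction[Unit] {
        override def run(): Unit = {
          // A "task" thread forked inside the doAs block inherits the
          // enclosing access-control context, so getCurrentUser
          // resolves to "foo" without another doAs:
          val task = new Thread(new Runnable {
            override def run(): Unit =
              println(UserGroupInformation.getCurrentUser().getShortUserName()) // prints "foo"
          })
          task.start()
          task.join()
        }
      })
    }
  }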

@yaooqinn
Member Author

yaooqinn commented Sep 5, 2017

@jerryshao thanks for replying.

If a user explicitly sets SPARK_USER but does not wrap SparkContext initialization in a doAs for the SPARK_USER UGI, there might be two different users communicating with Hadoop in the driver and the executors.
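
A hypothetical sketch of the wrapping meant here (illustrative only; Spark does not do this on the user's behalf): the application would have to build the SPARK_USER UGI itself and construct the SparkContext inside its doAs so that the driver and the executors agree on the user.

  import java.security.PrivilegedExceptionAction
  import org.apache.hadoop.security.UserGroupInformation
  import org.apache.spark.{SparkConf, SparkContext}

  object DriverDoAs {
    def main(args: Array[String]): Unit = {
      // Without this wrapping, the driver talks to Hadoop as the OS
      // user while the executors use the propagated SPARK_USER:
      val ugi = UserGroupInformation.createRemoteUser(sys.env("SPARK_USER"))
      val sc = ugi.doAs(new PrivilegedExceptionAction[SparkContext] {
        override def run(): SparkContext = new SparkContext(new SparkConf())
      })
      // ... use sc as usual ...
      sc.stop()
    }
  }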

@jerryshao
Contributor

No, I don't agree with you.

SPARK_USER is set in SparkContext from the driver's current UGI, and this env variable is propagated to the executors so that each executor creates its UGI with the same user as the driver.

For example, if your standalone cluster is started as user "spark" and you submit a Spark application as user "foo" from the gateway, all the executors should use "foo" to access Hadoop. But with your changes, the executors will use "spark" to communicate with Hadoop, since the executor process is forked as user "spark" by the worker. That is not correct.
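
A simplified sketch of that propagation (not the actual Spark source; both sides are condensed into one snippet for illustration):

  import org.apache.hadoop.security.UserGroupInformation

  object SparkUserPropagation {
    def main(args: Array[String]): Unit = {
      // Driver side (running as "foo" on the gateway): record the current user.
      val sparkUser = UserGroupInformation.getCurrentUser().getShortUserName() // "foo"
      // ...propagated to each executor via env("SPARK_USER") = sparkUser...

      // Executor side (process forked by a worker running as "spark"):
      // the UGI is rebuilt from SPARK_USER, not from the OS user.
      val user = sys.env.getOrElse("SPARK_USER", sparkUser)
      val executorUgi = UserGroupInformation.createRemoteUser(user) // "foo", not "spark"
      println(executorUgi.getShortUserName())
    }
  }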

@yaooqinn
Member Author

yaooqinn commented Sep 5, 2017

@jerryshao Yes, you are right. Unlike YARN, standalone doesn't switch the JVM process user. But as you said, if a user explicitly sets SPARK_USER, there is no mechanism in Spark to make this user the driver's current UGI, which may lead to user inconsistency between driver and executors.

I may change this to only affect YARN mode.

@jerryshao
Contributor

Sorry, I didn't state the problem clearly. But IMO the changes you made are really not necessary.

@yaooqinn
Member Author

yaooqinn commented Sep 5, 2017

Hmm, this may also not work for DefaultContainerExecutor, which launches containers via the admin user. I might close this PR.

@yaooqinn yaooqinn closed this Sep 5, 2017