Skip to content

Conversation

yqwang-ms
Copy link

@yqwang-ms yqwang-ms commented Apr 24, 2020

What changes were proposed in this pull request?

Using createProxyUser to avoid lost any subject creds.

Why are the changes needed?

See https://issues.apache.org/jira/browse/SPARK-31551

Does this PR introduce any user-facing change?

Yes, after this change, the UGI, including all its creds, provided by current user will be used to authn.

How was this patch tested?

Yes

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@yqwang-ms yqwang-ms changed the title [SPARK-31551][CORE] Fix createSparkUser lost user's non-Hadoop credentials and fully qualified user name [SPARK-31551][CORE] Fix createSparkUser lost user's non-Hadoop credentials Apr 24, 2020
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to also change this to be getUserName()?

@yqwang-ms
Copy link
Author

@tgravescs @sryza @vanzin Related Contributors, could you please take a look at this? Thanks!

@tgravescs
Copy link
Contributor

it's been a long time since I looked at this so I will have to go refresh my memory but I don't think using getUserName works in all cases.

What kind of testing have you done on this? Note the question is How was this patch tested, so can you please give more details on unit tests and manual testing done. In this case we need to have tested in various secure environments.

@yqwang-ms
Copy link
Author

yqwang-ms commented Apr 28, 2020

Thanks for the review :)

For testing, until now, I have did:

  1. End to end simple spark job test (like spark word count) on secure Hadoop cluster.
  2. All existing unit tests are passed

And we can also do a little reasoning, to prove that this change does not introduce more issue than current spark (given the code change is small).
For current spark,

def createSparkUser(): UserGroupInformation = {
val user = Utils.getCurrentUserName()
logDebug("creating UGI for user: " + user)
val ugi = UserGroupInformation.createRemoteUser(user)
transferCredentials(UserGroupInformation.getCurrentUser(), ugi)
ugi
}
def transferCredentials(source: UserGroupInformation, dest: UserGroupInformation): Unit = {
dest.addCredentials(source.getCredentials())
}

The transferCredentials func can only transfer Hadoop creds such as Delegation Tokens.
However, other creds stored in UGI.subject.getPrivateCredentials, will be lost here, such as:

  • Non-Hadoop creds:
    Such as, Kafka creds
  • Newly supported or 3rd party supported Hadoop creds:
    Such as to support OAuth/JWT token authn on Hadoop, we need to store the OAuth/JWT token into UGI.subject.getPrivateCredentials. However, these tokens are not supposed to be managed by Hadoop Credentials (currently it is only for Hadoop secret keys and delegation tokens).

BTW, if we use SPARK_USER to be the effective user, and UserGroupInformation.getCurrentUser as real user (to impersonate the effective user), we should use createProxyUser, instead of, directly transferCredentials from UserGroupInformation.getCurrentUser to SPARK_USER. This is because the transfered creds may not match with the SPARK_USER (such as user name is not matched, so server may choose to reject the creds). You can also check other places in Spark and Hadoop, they all use createProxyUser instead of hacking like transferCredentials.

What this change do is, just replace the transferCredentials with createProxyUser, so the creds are matched with the real user, beside, it will not lost any creds stored in UGI.subject.getPrivateCredentials (including the creds transfered by transferCredentials). So, it can only increase the possibility that a target RPC server accept our UGI, i.e. successful authentication. So, it will have no impacts for current working well spark jobs.

For more, see createProxyUser:
https://github.com/apache/hadoop/blob/5e0eda5d5f696aba7fc209874d232baf2a50d547/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1442-L1451

For the

def getCurrentUserName(): String = {
Option(System.getenv("SPARK_USER"))
.getOrElse(UserGroupInformation.getCurrentUser().getShortUserName())
}

, I agree it is not safe to change to use getUserName.
So I keep it to use getShortUserName, pls check the change.

For this,

private def setRMPrincipal(sparkConf: SparkConf): Unit = {
val shortUserName = UserGroupInformation.getCurrentUser.getShortUserName
val key = s"spark.hadoop.${YarnConfiguration.RM_PRINCIPAL}"
logInfo(s"Setting ${key} to ${shortUserName}")
sparkConf.set(key, shortUserName)
}

It has nothing related to current creds lost issue, but it should be a bug in Spark. Because RMPrincipal should be set to the fully qualified name, i.e. getUserName instead of getShortUserName . Pls check hadoop related codes and usage. Again, a fully qualified name, will only increase successful authentication. However, if you think it is not suitable or safe in current PR, I can remove it :)

The only change, that may impact existing spark jobs, is:
Someone previous set SPARK_USER as a fully qualified name (such as yqwang-ms/yy@zz.com), now will get the short UserName from getCurrentUserName. However, for previous spark, if no SPARK_USER is provided, we also return UserGroupInformation.getCurrentUser().getShortUserName(). So, let's align the getCurrentUserName to return short UserName, and I think this is an inevitable change and impact.

Any suggestions?

@yqwang-ms
Copy link
Author

@tgravescs PTAL :)

@yqwang-ms
Copy link
Author

@tgravescs For safety, I also kept the orignal cloned tokens, see 87433b4
PTAL

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Sep 21, 2020
@github-actions github-actions bot closed this Sep 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants