[SPARK-31551][CORE] Fix createSparkUser lost user's non-Hadoop credentials #28323
Conversation
Can one of the admins verify this patch?
Is it safe to also change this to be getUserName()?
@tgravescs @sryza @vanzin Related contributors, could you please take a look at this? Thanks!
It's been a long time since I looked at this, so I will have to go refresh my memory, but I don't think using getUserName works in all cases. What kind of testing have you done on this? Note the question is "How was this patch tested?", so can you please give more details on the unit tests and manual testing done? In this case we need to have tested in various secure environments.
Thanks for the review :) For testing, here is what I have done so far:
We can also reason it through, to show that this change does not introduce more issues than current Spark (given that the code change is small).
The transferCredentials function can only transfer Hadoop credentials such as delegation tokens. However, other credentials stored in UGI.subject.getPrivateCredentials will be lost here, as shown in the sketch below.
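For reference, a minimal sketch of how I understand the pre-patch path (simplified from SparkHadoopUtil; not the exact Spark source), which shows why only Hadoop credentials survive the copy:

```scala
import org.apache.hadoop.security.UserGroupInformation

// Simplified sketch of the pre-patch behaviour (not the exact Spark code).
object OldCreateSparkUserSketch {
  def createSparkUser(): UserGroupInformation = {
    val user = sys.env.getOrElse("SPARK_USER",
      UserGroupInformation.getCurrentUser().getShortUserName())
    // createRemoteUser builds a brand-new UGI backed by a fresh, empty Subject.
    val ugi = UserGroupInformation.createRemoteUser(user)
    transferCredentials(UserGroupInformation.getCurrentUser(), ugi)
    ugi
  }

  def transferCredentials(source: UserGroupInformation,
                          dest: UserGroupInformation): Unit = {
    // Copies only Hadoop Credentials (delegation tokens and secret keys);
    // anything else attached to the source Subject's private credential set
    // is left behind.
    dest.addCredentials(source.getCredentials())
  }
}
```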
By the way, if we use SPARK_USER as the effective user, with UserGroupInformation.getCurrentUser as the real user (which impersonates the effective user), we should use createProxyUser instead of directly transferring credentials from UserGroupInformation.getCurrentUser to SPARK_USER. The transferred credentials may not match SPARK_USER (for example, if the user name does not match, the server may choose to reject the credentials). You can also check other places in Spark and Hadoop: they all use createProxyUser instead of a workaround like transferCredentials. What this change does is simply replace transferCredentials with createProxyUser (see the sketch after this comment), so the credentials match the real user; besides, it will not lose any credentials stored in UGI.subject.getPrivateCredentials (including the credentials that transferCredentials used to copy). So it can only increase the chance that a target RPC server accepts our UGI, i.e. successful authentication, and it has no impact on Spark jobs that currently work. For more, see createProxyUser.

Regarding spark/core/src/main/scala/org/apache/spark/util/Utils.scala lines 2410 to 2413 in 410fa91: I agree it is not safe to change it to use getUserName, so I kept getShortUserName; please check the change.

Regarding spark/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala lines 844 to 849 in 410fa91: it is not related to the current credentials-loss issue, but it looks like a bug in Spark, because the RM principal should be set to the fully qualified name, i.e. getUserName instead of getShortUserName; please check the related Hadoop code and usage. Again, a fully qualified name can only increase the chance of successful authentication. However, if you think it is not suitable or safe for the current PR, I can remove it :)

The only change that may impact existing Spark jobs is:
Any suggestions?
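If I read the proposal correctly, the core of the change amounts to roughly the following sketch (simplified, not the exact patch): the effective user becomes a proxy user whose real user is the current UGI, so the whole Subject, including non-Hadoop private credentials, remains reachable. On the naming question: getUserName() returns the fully qualified name (for a Kerberos login, something like alice@EXAMPLE.COM), while getShortUserName() applies the auth_to_local rules and returns only the short name (alice).

```scala
import org.apache.hadoop.security.UserGroupInformation

// Simplified sketch of the proposed createProxyUser-based approach
// (not the exact patch).
object NewCreateSparkUserSketch {
  def createSparkUser(): UserGroupInformation = {
    val realUser = UserGroupInformation.getCurrentUser()
    val effectiveUser = sys.env.getOrElse("SPARK_USER", realUser.getShortUserName())
    // createProxyUser keeps a reference to the real user's UGI (and therefore
    // its full Subject) instead of copying only the Hadoop tokens into a new,
    // otherwise empty Subject.
    UserGroupInformation.createProxyUser(effectiveUser, realUser)
  }
}
```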
@tgravescs PTAL :)
@tgravescs For safety, I also kept the original cloned tokens, see 87433b4
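Presumably this safety follow-up keeps the old token copy on top of the proxy user, along these lines (a hypothetical sketch, not the actual commit):

```scala
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical sketch of the conservative variant: build the proxy user, but
// also clone the Hadoop tokens exactly as the old transferCredentials did.
val realUser = UserGroupInformation.getCurrentUser()
val proxyUser = UserGroupInformation.createProxyUser(
  sys.env.getOrElse("SPARK_USER", realUser.getShortUserName()), realUser)
proxyUser.addCredentials(realUser.getCredentials())
```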
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
Use createProxyUser to avoid losing any subject credentials.
Why are the changes needed?
See https://issues.apache.org/jira/browse/SPARK-31551
Does this PR introduce any user-facing change?
Yes, after this change, the UGI provided by the current user, including all its credentials, will be used for authentication.
How was this patch tested?
Yes