[SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication . #25201

Closed
wants to merge 22 commits into from

AngersZhuuuu
Contributor

@AngersZhuuuu AngersZhuuuu commented Jul 19, 2019

What changes were proposed in this pull request?

#25179
The original SparkThriftServer cannot proxy the client user's Hive and HDFS authentication.
This PR enables SparkThriftServer to act with the proxy user's privileges.

In this PR, we first obtain an HDFS token for the current UGI, so that it can truly proxy HDFS privileges. Then, for each proxy user, we obtain a Hive token for that user when we create the hiveClient. Different users use different SparkSession.sharedState instances.
Finally, we pass the DFS token to each Task, so that the proxy user's DFS behavior carries over to the Executors.
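
As a rough illustration of the impersonation pattern described above (this is not the code in this PR), the sketch below builds a proxy UGI with Hadoop's `UserGroupInformation` API, fetches HDFS delegation tokens while impersonating, and attaches them to that UGI. The object name `ProxyUserSketch`, its method names, and the `renewer` argument are assumptions made for the example; it also assumes the Hadoop client libraries are on the classpath and that the server's login user is allowed to impersonate (the `hadoop.proxyuser.*` settings).

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Hypothetical helper, for illustration only; not part of Spark or of this PR.
object ProxyUserSketch {

  /** Builds a UGI that impersonates `proxyUser` on top of the server's login user. */
  def proxyUgiFor(proxyUser: String): UserGroupInformation =
    UserGroupInformation.createProxyUser(proxyUser, UserGroupInformation.getLoginUser)

  /** Runs `body` with `ugi` as the current user. */
  def runAs[T](ugi: UserGroupInformation)(body: => T): T =
    ugi.doAs(new PrivilegedExceptionAction[T] {
      override def run(): T = body
    })

  /**
   * Fetches HDFS delegation tokens while impersonating and adds them to the
   * proxy UGI. `renewer` is an assumed parameter naming the principal that is
   * allowed to renew the tokens.
   */
  def attachHdfsTokens(
      ugi: UserGroupInformation,
      conf: Configuration,
      renewer: String): Credentials = {
    val creds = new Credentials()
    runAs(ugi) {
      // On a file system without token support this adds nothing to `creds`.
      FileSystem.get(conf).addDelegationTokens(renewer, creds)
    }
    ugi.addCredentials(creds)
    creds
  }
}
```

The resulting `Credentials` object is the kind of thing that would then have to be shipped to tasks for executors to act as the proxy user; how that is wired up is specific to the PR and is not shown here.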

How was this patch tested?

manual test

@AngersZhuuuu AngersZhuuuu changed the title [SPARK-28419][SQL] Enable STS support proxy user's authentication . [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication . Jul 24, 2019
@AngersZhuuuu
Contributor Author

@juliuszsompolski @wangyum Can you give some advice?

@gatorsmile
Member

ok to test

@SparkQA

SparkQA commented Aug 7, 2019

Test build #108773 has finished for PR 25201 at commit b414f4c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 8, 2019

Test build #108813 has finished for PR 25201 at commit f4bf959.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 8, 2019

Test build #108815 has finished for PR 25201 at commit 989b268.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 8, 2019

Test build #108816 has finished for PR 25201 at commit 1408e64.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 8, 2019

Test build #108819 has finished for PR 25201 at commit 4b4bf13.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 8, 2019

Test build #108821 has finished for PR 25201 at commit 07522c0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 8, 2019

Test build #108828 has finished for PR 25201 at commit 5135706.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 8, 2019

Test build #108830 has finished for PR 25201 at commit 76c563d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 8, 2019

Test build #108831 has finished for PR 25201 at commit dd5ce26.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 8, 2019

Test build #108839 has finished for PR 25201 at commit 997f141.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 9, 2019

Test build #108854 has finished for PR 25201 at commit 750e5be.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 9, 2019

Test build #108856 has finished for PR 25201 at commit bc7ee9e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 9, 2019

Test build #108859 has finished for PR 25201 at commit 76cd624.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 12, 2019

Test build #108948 has finished for PR 25201 at commit ac87ffc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

@gatorsmile
In my PR, I create a HiveClientImpl for each user so that each user has their own Hive privileges.
But all of the existing unit tests start a local-mode Hive metastore, which conflicts with my approach, so all of the SparkThriftServer unit tests fail.

@SparkQA

SparkQA commented Aug 12, 2019

Test build #108966 has finished for PR 25201 at commit 3f273a6.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 12, 2019

Test build #108967 has finished for PR 25201 at commit 20eb896.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 12, 2019

Test build #108975 has finished for PR 25201 at commit 424dca3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

@dongjoon-hyun @gatorsmile Could you help review this and give some advice on how to add unit tests?

@AmplabJenkins

Can one of the admins verify this patch?

val existing = ugi.getCredentials()
existing.mergeAll(originalCreds)
ugi.addCredentials(existing)
sparkSqlOperationManager.sessionToTokens.put(session.getSessionHandle, tokens)

Hi @AngersZhuuuu, while applying your patch I found a bug here: session.getSessionHandle throws an exception because session is null.

Moving session = HiveSessionProxy.getProxy(sessionWithUGI, sessionWithUGI.getSessionUgi) to before the if statement solves this issue.
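
To make the ordering point concrete, here is a small self-contained sketch. It does not use the real Hive or Spark classes: `SessionHandle`, `Session`, `openSession`, and the `registry` map are hypothetical stand-ins for the session object and for `sessionToTokens`. It only shows the shape of the fix, which is to assign the session before the branch that reads its handle.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for the real session types; for illustration only.
object SessionOrderingSketch {
  final case class SessionHandle(id: String)
  final class Session(val handle: SessionHandle)

  /** Opens a session and, when impersonating, records its tokens by handle. */
  def openSession(
      id: String,
      impersonate: Boolean,
      tokens: Seq[String],
      registry: mutable.Map[SessionHandle, Seq[String]]): Session = {
    // Assign the session first, so the branch below never dereferences null.
    val session = new Session(SessionHandle(id))
    if (impersonate) {
      registry.put(session.handle, tokens)
    }
    session
  }
}
```

Declaring the session as a `val` initialized up front, rather than a `var` assigned after the conditional, rules out this class of null dereference in the sketch.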

Contributor Author

Yes, that was a mistake; this is just one way to implement it. If you are interested in this approach, you can see
spark-thriftserver/spark-thriftserver#53

@github-actions

github-actions bot commented Jun 1, 2020

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jun 1, 2020
@github-actions github-actions bot closed this Jun 2, 2020
@Lucusone

This PR works fine in a non-Kerberos environment, but in a Kerberos environment it cannot get the HDFS token!

@AngersZhuuuu
Contributor Author

To be honest, I have never tested this in a non-Kerberos environment, since I don't have one.
I tested it in a Kerberos environment and it works well. Can you show how you use it and what the error behavior is?
Fetching a delegation token from HDFS shouldn't be a blocker. If you are Chinese, you can contact me at my Gmail address.

@Lucusone

ok

@Lucusone

This PR is good. I tested it in a Spark 2.4.3 environment, but the PR is based on Spark 3, and getting the HDFS token is different between the two.

@sujith71955
Contributor

sujith71955 commented Feb 21, 2022

cc @AngersZhuuuu @gatorsmile @dongjoon-hyun @vinodkc
Can we reopen this PR? The lack of impersonation is blocking use cases where users want to execute queries as the current user.

@sujith71955
Contributor

sujith71955 commented Mar 19, 2022

@AngersZhuuuu We tested this patch with Spark 3.x in a Kerberos environment and it looks good. It would be great to get a review from the experts. Thanks

@AngersZhuuuu
Contributor Author

Is your environment Kerberos-enabled?

@weand

weand commented Apr 5, 2022

👍 to getting the PR reopened, or to hearing the reasons why not.

@AnhQuanTran

@AngersZhuuuu I'm facing the same problem on Spark 3.2.2. Can you tell me how to fix it? Thank you

https://stackoverflow.com/questions/73984517/spark-thrift-3-2-2-impersonate-user-facing-error-with-metastore-authen-sasl-neg

@goutam-git

@AngersZhuuuu can you please provide the steps you used to test this patch?
