
[KSHC] Support Kerberized HMS in cluster mode w/o keytab #4560

Closed
pan3793 wants to merge 8 commits into apache:master from pan3793:shc-token

Conversation

pan3793
Member

@pan3793 pan3793 commented Mar 19, 2023

Why are the changes needed?

This PR aims to make the Kyuubi Spark Hive Connector (KSHC) support a Kerberized HMS in cluster mode w/o a keytab (the typical use case in Kyuubi) by implementing a HadoopDelegationTokenProvider.
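
For context, Spark discovers such providers via java.util.ServiceLoader and calls them on the driver to obtain delegation tokens. Below is a minimal sketch of what implementing the developer API looks like; the class name and catalog-detection logic are illustrative assumptions, not the actual KSHC code:

```
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.Credentials
import org.apache.spark.SparkConf
import org.apache.spark.security.HadoopDelegationTokenProvider

// Illustrative provider: decides whether HMS delegation tokens are needed and,
// if so, would obtain one per configured catalog. Names and logic are hypothetical.
class ExampleHiveConnectorTokenProvider extends HadoopDelegationTokenProvider {

  override def serviceName: String = "example-hive-connector"

  // Tokens are only required when some catalog is configured with an HMS URI.
  override def delegationTokensRequired(
      sparkConf: SparkConf,
      hadoopConf: Configuration): Boolean =
    sparkConf.getAllWithPrefix("spark.sql.catalog.")
      .exists { case (key, _) => key.endsWith(".hive.metastore.uris") }

  override def obtainDelegationTokens(
      hadoopConf: Configuration,
      sparkConf: SparkConf,
      creds: Credentials): Option[Long] = {
    // For each catalog: build a HiveConf from the catalog options, open a
    // HiveMetaStoreClient, call getDelegationToken, and add the token to `creds`.
    // Returning None tells Spark not to schedule a renewal time for this provider.
    None
  }
}
```

A provider like this also has to be listed in META-INF/services/org.apache.spark.security.HadoopDelegationTokenProvider so the ServiceLoader can find it.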

To enable access to a Kerberized HMS using KSHC, the minimal configuration is

spark.sql.catalog.warm=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.warm.hive.metastore.uris=<thrift-uris>

then it is possible to run federated queries across metastores, e.g.

SELECT * FROM spark_catalog.db1.tbl1 JOIN warm.db2.tbl2 ON ...

In addition, it allows disabling token renewal for each catalog explicitly

spark.sql.catalog.warm.delegation.token.renewal.enabled=false

The current implementation has some limitations:

The catalog configuration must be present at Spark application bootstrap, which means the catalog configurations should be set in spark-defaults.conf or appended as --conf, like:

spark-[shell|submit] \
  --conf spark.sql.catalog.xxx=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog \
  --conf spark.sql.catalog.xxx.hive.abc=xyz

Note: spark-sql may not work properly because of some tricky code inside Spark ...

It does not work for catalogs registered dynamically through a SET statement, e.g. SET spark.sql.catalog.xxx=

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

> (select count(*) from hive_2.mammut.test_7) union ( select count(*) from spark_catalog.test.test01 limit 1);
+-----------+
| count(1)  |
+-----------+
| 4         |
| 1         |
+-----------+
2 rows selected (8.378 seconds)
  • Run tests locally before making a pull request

@codecov-commenter

codecov-commenter commented Mar 20, 2023

Codecov Report

Merging #4560 (8511595) into master (acdfa6c) will decrease coverage by 0.03%.
The diff coverage is n/a.

❗ Current head 8511595 differs from pull request most recent head fe8cd0c. Consider uploading reports for the commit fe8cd0c to get more accurate results

@@             Coverage Diff              @@
##             master    #4560      +/-   ##
============================================
- Coverage     53.34%   53.31%   -0.03%     
  Complexity       13       13              
============================================
  Files           577      577              
  Lines         31546    31557      +11     
  Branches       4244     4244              
============================================
- Hits          16827    16825       -2     
- Misses        13135    13148      +13     
  Partials       1584     1584              

see 11 files with indirect coverage changes


@pan3793 pan3793 marked this pull request as ready for review March 22, 2023 15:51
@pan3793 pan3793 self-assigned this Mar 22, 2023
@pan3793 pan3793 changed the title from "[KSHC] Support Kerberos by implementing KyuubiHiveConnectorDelegationTokenProvider" to "[KSHC] Support Kerberized HMS" Mar 22, 2023
@pan3793
Member Author

pan3793 commented Mar 22, 2023

cc @zhouyifan279


private def hiveConf(
sparkConf: SparkConf,
hadoopConf: Configuration,
Member


The Hadoop configuration here seems to be shared by all hive clients, which means you share the same security key to open all the doors of different HMS-s?

Member Author

@pan3793 pan3793 Mar 23, 2023


The catalog hive2 can override dedicated Hadoop configurations via

spark.sql.catalog.hive2.<hadoop-conf-key>=<value>

What do you mean "security key"?

@pan3793
Member Author

pan3793 commented Mar 23, 2023

@yaooqinn here are the main changes since your last review, please take a look again:

  • switched from RetryingMetaStoreClient to HiveMetaStoreClient
  • a token request failure for one catalog no longer blocks obtaining tokens for the other catalogs (see the sketch below)
  • allowed disabling token renewal explicitly for each catalog
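
A rough sketch of that failure isolation, with hypothetical names (catalogNames, obtainTokenFor, and creds are assumed for illustration, not the actual identifiers in this PR):

```
// Hypothetical sketch: a failure for one catalog must not prevent the
// remaining catalogs from obtaining their delegation tokens.
catalogNames.foreach { catalog =>
  try {
    obtainTokenFor(catalog, creds) // assumed helper that talks to the catalog's HMS
  } catch {
    case e: Exception =>
      // logWarning comes from Spark's Logging trait
      logWarning(s"Failed to obtain delegation token for catalog $catalog", e)
  }
}
```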

@pan3793 pan3793 added this to the v1.8.0 milestone Mar 23, 2023
@pan3793 pan3793 changed the title from "[KSHC] Support Kerberized HMS" to "[KSHC] Support Kerberized HMS in cluster mode w/o keytab" Mar 23, 2023
@zhouyifan279
Contributor

Most users may not be familiar with how hive.metastore.token.signature is used.
Can we generate a hive.metastore.token.signature for each catalog by default?

@pan3793
Member Author

pan3793 commented Mar 23, 2023

@zhouyifan279 good suggestion. I updated the code to fall back to hive.metastore.uris when hive.metastore.token.signature is absent.

The change passes local tests both w/ and w/o hive.metastore.token.signature; the PR description is updated as well.
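
Roughly, the fallback amounts to the sketch below (illustrative only; hiveConf stands for the per-catalog HiveConf, and the exact code in the PR may differ):

```
// Hypothetical sketch: when hive.metastore.token.signature is absent for a
// catalog, fall back to that catalog's hive.metastore.uris, so tokens from
// different metastores are stored under distinct aliases.
val tokenSignature = Option(hiveConf.get("hive.metastore.token.signature"))
  .filter(_.nonEmpty)
  .getOrElse(hiveConf.get("hive.metastore.uris"))
```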

@pan3793
Member Author

pan3793 commented Mar 24, 2023

Thanks all, merged to master

@pan3793 pan3793 deleted the shc-token branch March 24, 2023 03:34
pan3793 added a commit to pan3793/kyuubi that referenced this pull request May 12, 2023
…o keytab

### _Why are the changes needed?_

This PR aims to make the Kyuubi Spark Hive Connector (KSHC) support a Kerberized HMS in cluster mode w/o a keytab (the typical use case in Kyuubi) by implementing a `HadoopDelegationTokenProvider`.

To enable access to a Kerberized HMS using KSHC, the minimal configuration is
```
spark.sql.catalog.warm=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.warm.hive.metastore.uris=<thrift-uris>
```
then it is possible to run federated queries across metastores, e.g.
```
SELECT * FROM spark_catalog.db1.tbl1 JOIN warm.db2.tbl2 ON ...
```

In addition, it allows disabling token renewal for each catalog explicitly
```
spark.sql.catalog.warm.delegation.token.renewal.enabled=false
```

The current implementation has some limitations:

The catalog configuration must be present at Spark application bootstrap, which means the catalog configurations should be set in `spark-defaults.conf` or appended as `--conf`, like:
```
spark-[sql|shell|submit] \
  --conf spark.sql.catalog.xxx=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog \
  --conf spark.sql.catalog.xxx.hive.abc=xyz
```

It does not work for catalogs registered dynamically through a SET statement, e.g. `SET spark.sql.catalog.xxx=`

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate

```
> (select count(*) from hive_2.mammut.test_7) union ( select count(*) from spark_catalog.test.test01 limit 1);
+-----------+
| count(1)  |
+-----------+
| 4         |
| 1         |
+-----------+
2 rows selected (8.378 seconds)
```

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request

Closes apache#4560 from pan3793/shc-token.

Closes apache#4560

fe8cd0c [Cheng Pan] Centralized metastore token signature fallback logic
8511595 [Cheng Pan] comments
fc3b4d5 [Cheng Pan] hive.metastore.token.signature fallback to hive.metastore.uris
fb7eb03 [Cheng Pan] unused import
858b390 [Cheng Pan] New catalog property delegation.token.renewal.enabled
28ec5a5 [Cheng Pan] disable hms client retry
52044d4 [Cheng Pan] update comments
33b2418 [Cheng Pan] [KSHC] Support Kerberos by implementing KyuubiHiveConnectorDelegationTokenProvider

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>