[KYUUBI #7433][SPARK] Preserve Hive delegation tokens with non-empty service #7518

Closed
Sunwoo-Shin wants to merge 1 commit into apache:master from Sunwoo-Shin:kyuubi-7433-multi-hms-hive-token

@Sunwoo-Shin

Contributor

Why are the changes needed?

Closes #7433.

SparkTBinaryFrontendService#addHiveToken keeps only the Hive delegation token whose service field is empty (the one HiveMetaStoreClient selects by default) and silently drops every Hive token whose service is non-empty.

A Hive delegation token gets a non-empty service when it is bound to a specific metastore via hive.metastore.token.signature (the signature is stored in the token service). When an engine talks to multiple Hive metastores that use different Kerberos principals — e.g. two Iceberg catalogs, each backed by its own HMS — each metastore produces its own signature-bound token. Because these tokens are dropped before reaching the engine UGI, the engine fails to authenticate against the non-default metastore with DIGEST-MD5: IO error acquiring password.
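To make the role of the service field concrete, here is a minimal, self-contained sketch of selecting a token by signature. It does not use the Hadoop or Hive classes: `SketchToken`, `selectByService`, the `hms-b` signature, and the kind string are stand-ins chosen for illustration, and the real selection logic lives in the Hive client. It only shows that once the signature-bound token is missing from the credentials, a lookup by that signature finds nothing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class TokenSelectionSketch {

    // Assumed kind name for Hive metastore delegation tokens.
    static final String HIVE_KIND = "HIVE_DELEGATION_TOKEN";

    /** Hypothetical stand-in for a delegation token: only kind and service. */
    static final class SketchToken {
        final String kind;
        final String service;

        SketchToken(String kind, String service) {
            this.kind = kind;
            this.service = service;
        }
    }

    /**
     * Returns the first Hive token whose service equals the given signature.
     * An empty signature selects the default (empty-service) token.
     */
    static Optional<SketchToken> selectByService(String signature, List<SketchToken> tokens) {
        for (SketchToken t : tokens) {
            if (HIVE_KIND.equals(t.kind) && signature.equals(t.service)) {
                return Optional.of(t);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Credentials as they look when signature-bound tokens are dropped.
        List<SketchToken> afterDrop = Arrays.asList(new SketchToken(HIVE_KIND, ""));
        // Credentials as they look when signature-bound tokens are preserved.
        List<SketchToken> preserved = Arrays.asList(
            new SketchToken(HIVE_KIND, ""),
            new SketchToken(HIVE_KIND, "hms-b"));

        System.out.println("dropped:   " + selectByService("hms-b", afterDrop).isPresent());
        System.out.println("preserved: " + selectByService("hms-b", preserved).isPresent());
    }
}
```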

This is the engine-side counterpart of #1091 (renewing delegation tokens for multiple Hive metastore clusters): even when the server pushes per-metastore tokens, the engine drops them.

Affected versions: 1.11.1.

This change partitions the incoming Hive tokens by their service field:

  • Tokens with a non-empty service are added to the engine credentials keyed by their alias, reusing the same issue-date downgrade guard the default path already applies.
  • The existing single-metastore URI matching now runs only over the default (empty-service) tokens, so behavior for the common single-HMS case is unchanged.
  • The No matching Hive token found ... warning is emitted only when a default-service token was actually expected, to avoid noise in metastore deployments that rely solely on signature-bound tokens.
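The partitioning described above can be sketched roughly as follows. This is not the code from the patch (which is Scala and operates on Hadoop credentials); `SketchToken`, `signatureBound`, `defaultService`, and `mergeSignatureBound` are hypothetical names, tokens are reduced to a service string and an issue date, and the URI matching and warning logic for default-service tokens are left out. Treating a token with an equal issue date as "not newer" is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HiveTokenMergeSketch {

    /** Hypothetical stand-in for a delegation token: service plus issue date. */
    static final class SketchToken {
        final String service;
        final long issueDate;

        SketchToken(String service, long issueDate) {
            this.service = service;
            this.issueDate = issueDate;
        }
    }

    /** Incoming tokens with a non-empty service (signature-bound), keyed by alias. */
    static Map<String, SketchToken> signatureBound(Map<String, SketchToken> incoming) {
        Map<String, SketchToken> out = new LinkedHashMap<>();
        for (Map.Entry<String, SketchToken> e : incoming.entrySet()) {
            if (!e.getValue().service.isEmpty()) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    /** Incoming tokens with an empty service; only these would go through URI matching. */
    static Map<String, SketchToken> defaultService(Map<String, SketchToken> incoming) {
        Map<String, SketchToken> out = new LinkedHashMap<>();
        for (Map.Entry<String, SketchToken> e : incoming.entrySet()) {
            if (e.getValue().service.isEmpty()) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    /**
     * Adds signature-bound tokens to the engine credentials by alias.
     * Downgrade guard: an existing token is replaced only by a strictly later one
     * (the handling of equal issue dates is an assumption of this sketch).
     */
    static void mergeSignatureBound(Map<String, SketchToken> engineCreds,
                                    Map<String, SketchToken> incoming) {
        for (Map.Entry<String, SketchToken> e : signatureBound(incoming).entrySet()) {
            SketchToken current = engineCreds.get(e.getKey());
            if (current == null || e.getValue().issueDate > current.issueDate) {
                engineCreds.put(e.getKey(), e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, SketchToken> creds = new LinkedHashMap<>();
        creds.put("hms-b", new SketchToken("hms-b", 200L));

        Map<String, SketchToken> incoming = new LinkedHashMap<>();
        incoming.put("default", new SketchToken("", 300L));   // default path, not merged here
        incoming.put("hms-a", new SketchToken("hms-a", 100L)); // new alias, added
        incoming.put("hms-b", new SketchToken("hms-b", 150L)); // older than existing, ignored

        mergeSignatureBound(creds, incoming);
        System.out.println(creds.keySet());
    }
}
```

Keeping the two partitions as separate maps mirrors the intent stated above: the default-service tokens stay on the existing matching path, so the single-HMS case is unaffected by the new merge step.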

How was this patch tested?

Added unit tests in SparkTBinaryFrontendServiceSuite (the token-merging logic was extracted into mergeHiveTokens so it can be exercised without a SparkContext):

  • signature-bound tokens for multiple metastores are preserved, keyed by alias;
  • a signature-bound token with an earlier issue date is ignored, and a later one replaces the existing token;
  • the existing single-metastore matching for default-service tokens is unchanged;
  • signature-bound tokens are added without disturbing the default-service path.

    build/mvn test -pl externals/kyuubi-spark-sql-engine -am \
      -Dtest=none \
      -DwildcardSuites=org.apache.kyuubi.engine.spark.SparkTBinaryFrontendServiceSuite

Was this patch authored or co-authored using generative AI tooling?

Assisted-by: Claude:claude-opus-4-8

…empty service

addHiveToken dropped Hive tokens whose service is non-empty, breaking auth
against non-default metastores in multi-HMS setups. Partition tokens on the
service field and add the signature-bound ones by alias.

pan3793 commented Jun 25, 2026

Member

Code LGTM. As you know, the Kerberos/UGI/DT part is always tricky; have you tested it on a real cluster in addition to the UTs?

pan3793 commented Jun 25, 2026

Member

Also cc @zhouyifan279 and @cxzl25, who know this part better.

@Sunwoo-Shin

Contributor Author

Thanks for the review @pan3793!

Yes — besides the UTs, this has been running on a production Kerberized cluster for a while now.

For context on our setup: the built-in HiveDelegationTokenProvider only supports a single Hive metastore, so we additionally implemented a custom provider through the DelegationTokenProvider SPI that fetches delegation tokens from multiple Hive metastores. Combined with this patch — which preserves the tokens carrying a non-empty service field — the per-metastore (signature-bound) tokens are now propagated correctly to the executors, while the single-metastore path keeps working as before.

@pan3793 pan3793 added this to the v1.12.0 milestone Jun 29, 2026
@pan3793 pan3793 closed this in 96f745e Jun 29, 2026

pan3793 commented Jun 29, 2026

Member

Thanks, merged to master.

@Sunwoo-Shin Sunwoo-Shin deleted the kyuubi-7433-multi-hms-hive-token branch June 29, 2026 05:41

Development

Successfully merging this pull request may close these issues.

[Bug] SparkTBinaryFrontendService#addHiveToken silently drops Hive delegation tokens with non-empty service field (multi-HMS scenario)