
Add DIGEST-MD5 SASL delegation token auth to HiveCatalog #3150

Draft
ShreyeshArangath wants to merge 3 commits into apache:main from ShreyeshArangath:feat/add-delegation-token

Conversation

@ShreyeshArangath

Rationale for this change

Enable PyIceberg's HiveCatalog to authenticate using DIGEST-MD5 SASL with delegation tokens read from $HADOOP_TOKEN_FILE_LOCATION, which is the standard mechanism in secure Hadoop environments. This unblocks PyIceberg adoption in production clusters that don't use Kerberos directly.
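
To illustrate the first step of that flow, here is a minimal sketch of locating the token file and checking its header. It is not the code in this PR: `read_token_file_header` is a hypothetical helper, `HiveAuthError` is a local stand-in for the exception the PR adds, and the layout assumed here (a 4-byte `HDTS` magic followed by one format-version byte) is my understanding of Hadoop's token storage format rather than something stated in this description.

```python
import os
from typing import Mapping, Optional, Tuple


class HiveAuthError(Exception):
    """Local stand-in for the Hive-specific auth exception added by the PR."""


HDTS_MAGIC = b"HDTS"  # assumed magic prefix of a Hadoop token storage file


def read_token_file_header(env: Optional[Mapping[str, str]] = None) -> Tuple[int, bytes]:
    """Return (format_version, remaining_bytes) for the file named by the env var.

    Hypothetical helper: it validates only the header and leaves token
    parsing to the caller.
    """
    env = os.environ if env is None else env
    path = env.get("HADOOP_TOKEN_FILE_LOCATION")
    if not path:
        raise HiveAuthError("HADOOP_TOKEN_FILE_LOCATION is not set")
    try:
        with open(path, "rb") as f:
            data = f.read()
    except OSError as exc:
        # OSError also covers permission errors, not just a missing file.
        raise HiveAuthError(f"Cannot read token file {path}: {exc}") from exc
    if len(data) < 5 or data[:4] != HDTS_MAGIC:
        raise HiveAuthError(f"{path} does not look like an HDTS token file")
    return data[4], data[5:]
```

Passing the environment as a parameter keeps the sketch easy to exercise in tests without mutating `os.environ`.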

Summary

  • Add HiveAuthError exception for Hive-specific auth failures
  • Add hadoop_credentials module to parse HDTS binary token files
  • Add _DigestMD5SaslTransport to work around THRIFT-5926 (None initial response)
  • Support hive.metastore.authentication property (NONE/KERBEROS/DIGEST-MD5)
  • Add pure-sasl to hive extras in pyproject.toml
  • Backward compatible: existing kerberos_auth boolean still works
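
The interaction between the new `hive.metastore.authentication` property and the legacy `kerberos_auth` boolean could look roughly like the sketch below. This is an illustration under assumptions, not the PR's implementation: `resolve_auth_mechanism` is a hypothetical name, `HiveAuthError` is a local stand-in, and both the precedence (explicit property wins over the legacy flag) and the case-insensitive matching are guesses.

```python
from typing import Mapping


class HiveAuthError(Exception):
    """Local stand-in for the Hive-specific auth exception added by the PR."""


SUPPORTED_MECHANISMS = ("NONE", "KERBEROS", "DIGEST-MD5")


def resolve_auth_mechanism(properties: Mapping[str, str], kerberos_auth: bool = False) -> str:
    """Pick the auth mechanism from catalog properties (hypothetical helper)."""
    raw = properties.get("hive.metastore.authentication")
    if raw is None:
        # Backward compatibility: fall back to the legacy boolean.
        return "KERBEROS" if kerberos_auth else "NONE"
    mechanism = raw.strip().upper()  # assumption: values are case-insensitive
    if mechanism not in SUPPORTED_MECHANISMS:
        # Fail loudly instead of silently using an unauthenticated transport.
        raise HiveAuthError(
            f"Unsupported hive.metastore.authentication value: {raw!r}; "
            f"expected one of {', '.join(SUPPORTED_MECHANISMS)}"
        )
    return mechanism
```

For example, `resolve_auth_mechanism({"hive.metastore.authentication": "DIGEST-MD5"})` selects the delegation token path, while `resolve_auth_mechanism({}, kerberos_auth=True)` keeps existing Kerberos configurations working.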

Closes #3145

Are these changes tested?

Unit tests

Are there any user-facing changes?

Yes. This introduces DIGEST-MD5 SASL delegation token support.

ShreyeshArangath and others added 3 commits March 16, 2026 13:44
Enable PyIceberg's HiveCatalog to authenticate using DIGEST-MD5 SASL
with delegation tokens from $HADOOP_TOKEN_FILE_LOCATION, which is the
standard mechanism in secure Hadoop environments. This unblocks PyIceberg
adoption in production clusters that don't use Kerberos directly.

- Add HiveAuthError exception for Hive-specific auth failures
- Add hadoop_credentials module to parse HDTS binary token files
- Add _DigestMD5SaslTransport to work around THRIFT-5926 (None initial response)
- Support hive.metastore.authentication property (NONE/KERBEROS/DIGEST-MD5)
- Add pure-sasl to hive extras in pyproject.toml
- Backward compatible: existing kerberos_auth boolean still works

Closes apache#3145

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Address all findings from code review:

Critical:
- Rewrite VInt decoder to match Java WritableUtils.readVLong exactly,
  using signed-byte interpretation and correct prefix/length semantics
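
For reference, the signed-byte semantics mentioned above can be sketched in Python as follows. This mirrors my reading of Java's `WritableUtils.readVLong` and is not the decoder from this PR; `read_vlong` is a hypothetical name.

```python
import io
import struct
from typing import BinaryIO


def read_vlong(stream: BinaryIO) -> int:
    """Decode one Hadoop variable-length integer from a binary stream."""
    raw = stream.read(1)
    if len(raw) != 1:
        raise EOFError("stream ended before the VInt prefix byte")
    first = struct.unpack("b", raw)[0]  # interpret as a signed byte, like Java
    if first >= -112:
        # Single-byte encoding: the byte is the value itself (-112..127).
        return first
    if first < -120:
        # Prefix -121..-128: negative value, 1..8 payload bytes follow.
        total_len = -119 - first
        negative = True
    else:
        # Prefix -113..-120: non-negative value, 1..8 payload bytes follow.
        total_len = -111 - first
        negative = False
    payload = stream.read(total_len - 1)
    if len(payload) != total_len - 1:
        raise EOFError("stream ended inside a multi-byte VInt")
    value = int.from_bytes(payload, "big")  # big-endian, unsigned
    # Java stores negatives as the one's complement (i ^ -1L).
    return ~value if negative else value
```

Reading the prefix as an unsigned byte would misclassify every multi-byte value, which is why the signed interpretation matters here.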

High:
- Catch OSError (not just FileNotFoundError) when reading token file
- Reject unknown auth mechanisms with HiveAuthError instead of silently
  falling back to unauthenticated TBufferedTransport
- Replace monkey-patching sasl.process in _DigestMD5SaslTransport with
  a clean send_sasl_msg override (thread-safe, no shared state mutation)

Medium:
- Fix kerberos_service_name default from config key to actual value
- Wrap UnicodeDecodeError in HiveAuthError for invalid UTF-8 in tokens
- Rewrite VInt test encoder to match real Hadoop encoding format
- Fix dead kerberos backward-compat tests to actually exercise __init__

Low:
- Add upper bound to pure-sasl dependency (<1.0.0)
- Fix tmp_path typing from object to pathlib.Path
- Fix docs to say pure-sasl (pip package name) not puresasl

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Support DIGEST-MD5 / delegation token authentication for HMS
