[python] [daft] Inject REST catalog OSS credentials into Daft IOConfig#7959
Merged
Conversation
Adds DaftRestReadTest in daft_catalog_rest_test.py with two failing
tests pinpointing the two daft + REST integration bugs:
1. test_read_table_forwards_full_catalog_options_to_datasource
_read_table strips catalog_options down to just {'warehouse': ...}
before constructing PaimonDataSource, dropping metastore/uri/token
and other REST-required fields.
2. test_rest_catalog_disables_native_parquet_reader
PaimonDataSource leaves _is_parquet=True for REST catalogs, sending
OSS reads through Daft's static IOConfig which cannot refresh DLF
dynamic STS tokens.
The two tests are expected to FAIL on this commit; a follow-up commit
will add the fixes and turn them green.
Daft's native parquet reader binds OSS credentials into a static
IOConfig at PaimonDataSource construction time. REST catalogs (DLF)
issue per-request OSS STS tokens that get refreshed by pypaimon's
file_io but cannot be plumbed back into Daft's IOConfig, so reads
fail with 401 once the initial token expires.
Until Daft exposes a token-refresh callback (IOConfigProvider or
similar), force REST catalogs through pypaimon's reader, which
shares the catalog's token-aware file_io.
Also stop _read_table from trimming catalog_options to just
{'warehouse': ...} — PaimonDataSource needs the metastore field to
make this fall-back decision.
Replace the earlier reader-fallback approach with credential injection. Previously REST catalogs were forced through pypaimon's reader because Daft's IOConfig had no DLF credentials; now _read_table extracts the OSS STS token from table.file_io and merges it into the options used to build IOConfig, so Daft's native parquet path works directly.
…ich_options_with_rest_token Address review feedback: collapse the 3-line WHY comment to a single line, and add three unit tests covering the no-op paths — non-REST metastore, file_io without try_to_refresh_token, and refreshed token still None.
… enrichment OSS reads through Daft's native parquet path need the IOConfig builder to (1) route to the OSS backend and (2) extract the bucket; both decisions key off the warehouse field. DLF's warehouse is a catalog name with no scheme, so the builder silently fell into the S3-compatible branch and OpenDAL complained "The bucket is misconfigured". Synthesize warehouse from table.table_path's scheme + netloc (e.g. "oss://<bucket>") so the builder picks up OSS and the bucket. Verified end-to-end against real DLF: a 4-row parquet table writes and reads back via Daft.
Contributor
|
+1 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Purpose
Daft reads against an Apache Paimon REST catalog (DLF) currently 401 on OSS:
_convert_paimon_catalog_options_to_io_configbuilds Daft'sIOConfigfrom staticfs.oss.*keys, but DLF delivers per-table STS tokens throughtable.file_iorather thancatalog_options._read_tablenow foldstable.file_io.token.tokeninto the options before buildingIOConfig, so Daft's native parquet reader gets the same credentials pypaimon uses. It also stops trimmingcatalog_optionsto{warehouse: ...}so the enrichment can read themetastorefield.Tests
daft_catalog_rest_test.py::DaftRestReadTest:test_read_table_forwards_full_catalog_options_to_datasourcetest_read_table_enriches_io_config_with_rest_token