Skip to content

[python] [daft] Inject REST catalog OSS credentials into Daft IOConfig#7959

Merged
JingsongLi merged 6 commits into
apache:masterfrom
XiaoHongbo-Hope:daft_rest_fallback
May 26, 2026
Merged

[python] [daft] Inject REST catalog OSS credentials into Daft IOConfig#7959
JingsongLi merged 6 commits into
apache:masterfrom
XiaoHongbo-Hope:daft_rest_fallback

Conversation

@XiaoHongbo-Hope
Copy link
Copy Markdown
Contributor

@XiaoHongbo-Hope XiaoHongbo-Hope commented May 25, 2026

Purpose

Daft reads against an Apache Paimon REST catalog (DLF) currently 401 on OSS: _convert_paimon_catalog_options_to_io_config builds Daft's IOConfig from static fs.oss.* keys, but DLF delivers per-table STS tokens through table.file_io rather than catalog_options.

_read_table now folds table.file_io.token.token into the options before building IOConfig, so Daft's native parquet reader gets the same credentials pypaimon uses. It also stops trimming catalog_options to {warehouse: ...} so the enrichment can read the
metastore field.

Tests

daft_catalog_rest_test.py::DaftRestReadTest:

  • test_read_table_forwards_full_catalog_options_to_datasource
  • test_read_table_enriches_io_config_with_rest_token

Adds DaftRestReadTest in daft_catalog_rest_test.py with two failing
tests pinpointing the two daft + REST integration bugs:

1. test_read_table_forwards_full_catalog_options_to_datasource
   _read_table strips catalog_options down to just {'warehouse': ...}
   before constructing PaimonDataSource, dropping metastore/uri/token
   and other REST-required fields.

2. test_rest_catalog_disables_native_parquet_reader
   PaimonDataSource leaves _is_parquet=True for REST catalogs, sending
   OSS reads through Daft's static IOConfig which cannot refresh DLF
   dynamic STS tokens.

The two tests are expected to FAIL on this commit; a follow-up commit
will add the fixes and turn them green.
Daft's native parquet reader binds OSS credentials into a static
IOConfig at PaimonDataSource construction time. REST catalogs (DLF)
issue per-request OSS STS tokens that get refreshed by pypaimon's
file_io but cannot be plumbed back into Daft's IOConfig, so reads
fail with 401 once the initial token expires.

Until Daft exposes a token-refresh callback (IOConfigProvider or
similar), force REST catalogs through pypaimon's reader, which
shares the catalog's token-aware file_io.

Also stop _read_table from trimming catalog_options to just
{'warehouse': ...} — PaimonDataSource needs the metastore field to
make this fall-back decision.
@XiaoHongbo-Hope XiaoHongbo-Hope changed the title [python] [daft] Fall back to pypaimon reader for REST catalog [python] [daft] Inject REST catalog OSS credentials into Daft IOConfig May 25, 2026
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review May 25, 2026 13:11
Replace the earlier reader-fallback approach with credential injection.
Previously REST catalogs were forced through pypaimon's reader because
Daft's IOConfig had no DLF credentials; now _read_table extracts the
OSS STS token from table.file_io and merges it into the options used
to build IOConfig, so Daft's native parquet path works directly.
…ich_options_with_rest_token

Address review feedback: collapse the 3-line WHY comment to a single
line, and add three unit tests covering the no-op paths — non-REST
metastore, file_io without try_to_refresh_token, and refreshed token
still None.
… enrichment

OSS reads through Daft's native parquet path need the IOConfig builder
to (1) route to the OSS backend and (2) extract the bucket; both decisions
key off the warehouse field. DLF's warehouse is a catalog name with no
scheme, so the builder silently fell into the S3-compatible branch and
OpenDAL complained "The bucket is misconfigured".

Synthesize warehouse from table.table_path's scheme + netloc (e.g.
"oss://<bucket>") so the builder picks up OSS and the bucket. Verified
end-to-end against real DLF: a 4-row parquet table writes and reads
back via Daft.
@JingsongLi
Copy link
Copy Markdown
Contributor

+1

@JingsongLi JingsongLi merged commit ff3b281 into apache:master May 26, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants