[datafusion] Allow callers to skip default-database init in register_catalog#345
Conversation
`SQLContext::register_catalog` always probes/creates a "default" database
and `set_current_database("default")` on the first catalog. This breaks
for non-admin principals who lack DESCRIBE / CREATEDATABASE on `default`:
they can never construct a `SQLContext` even when their actual queries
target databases they do have access to.
Add `register_catalog_with_default_db(catalog_name, catalog, default_db)`
where `default_db: Option<&str>`:
- `Some(name)` — ensure `name` exists (create if missing), set as current
on first registration. Same behavior as before, just configurable.
- `None` — skip the default-database probe/create AND the implicit
`set_current_database`. Callers are expected to use fully-qualified
names or call `set_current_database` later.
This mirrors Java `FlinkCatalog`'s `defaultDatabase` constructor
parameter and `DISABLE_CREATE_TABLE_IN_DEFAULT_DB=true` option.
Existing `register_catalog(name, catalog)` delegates with
`Some("default")` — fully backwards compatible.
Python binding exposes the same control via a new `default_database`
keyword. Convention chosen to keep `None`/omitted as the historical
behavior:
- omitted / `None` -> Some("default") (back-compat)
- `default_database=""` -> None (skip init)
- `default_database=X` -> Some(X) (custom default)
Squash the wall-of-text doc blocks down to the load-bearing lines. No code change.
ProbeTrackingCatalog records get/create_database calls and returns
Unsupported (not DatabaseNotExist) from get_database — so failing to
skip the probe surfaces as a hard error rather than the silent
create-on-missing path the existing MockCatalog allows.
Three cases:
- None -> neither get nor create_database is called; registration succeeds
- Some -> Unsupported error from get_database propagates
- bare register_catalog() -> delegates with Some("default") (probes)
jerry-024
left a comment
There was a problem hiding this comment.
LGTM — clean design, backward-compatible, and solves a real pain point for non-admin principals.
A few optional suggestions (none blocking):
-
Document table-function behavior when
default_db = None(sql_context.rs:163)
register_table_functions(&self.ctx, &catalog, default_db.unwrap_or("default"))still resolves bare table names against"default". Consider adding a note in the doc comment: "Whendefault_db = None, table functions (vector_search,full_text_search) require fully-qualified table names." -
Guard against
Some("")at the Rust API level
The Python binding handles empty-string → None conversion, but other Rust callers could passSome("")directly, which would callget_database("")/create_database(""). A one-liner validation would make the API more robust:if matches!(default_db, Some("")) { return Err(DataFusionError::Plan("default_db must not be empty".into())); }
-
Optional test: cover
default_db = None+ bare table function call (e.g.vector_search('unqualified', ...)) to document the expected error path.
All three are nit-level — happy to see them addressed here or in a follow-up.
…ack doc, integration test
Three optional review comments, all addressed:
1. Doc note on built-in TVF behavior with default_db=None: bare table
names inside vector_search / full_text_search still resolve against
the literal "default" namespace (the fallback in register_table_functions).
Callers using None must qualify table names in those TVF calls.
2. API-level guard against Some(""). Python binding already maps "" → None
to express "skip init", but raw Rust callers could pass Some("") directly
and silently probe get_database(""). Reject at the boundary with a Plan
error telling them to use None instead.
3. New integration-shaped test
`register_catalog_with_none_table_function_resolves_bare_name_to_literal_default`
documents the expected error path: bare TVF name + default_db=None →
surfaces an error naming `default.<bare_name>` (proving the fallback fired).
Plus a unit test for the empty-string guard
(`register_catalog_with_some_empty_string_is_rejected`).
5 tests total now, all green.
Thanks for the review! All three nits addressed in c3971e6: #1 Doc note added on register_catalog_with_default_db re: the TVF fallback: bare names in vector_search / full_text_search still resolve against literal "default" #2 Added the Some("") guard at the top of register_catalog_with_default_db — returns DataFusionError::Plan("default_db must not be empty; pass None to skip default-database init"). Doesn't affect the Python binding (already maps "" → None there); this is purely a footgun guard for raw Rust callers. New unit test #3 Added register_catalog_with_none_table_function_resolves_bare_name_to_literal_default — registers with None, runs SELECT * FROM vector_search('bare', ...), asserts the error names both default and bare (proving the fallback namespace fired against the MockCatalog's TableNotExist). Test count: 3 → 5, all green (cargo test -p paimon-datafusion --lib register_catalog). cargo check -p paimon-datafusion -p pypaimon_rust clean. cargo fmt applied. |
Purpose
SQLContext::register_catalogalways probes/creates a"default"database and callsset_current_database("default")on the first catalog. This breaks for non-admin principals who lackDESCRIBE/CREATEDATABASEondefault: they cannot construct aSQLContextat all, even when their queries target databases they do have access to (via fully-qualifiedcatalog.db.tablenames). The error surfaces at session init as e.g.Forbidden: ... DESCRIBE on DATABASE default.This change mirrors Java
FlinkCatalog'sdefaultDatabaseconstructor parameter andDISABLE_CREATE_TABLE_IN_DEFAULT_DBoption in Rust.Brief change log
SQLContext::register_catalog_with_default_db(name, catalog, default_db: Option<&str>):Some(name)ensuresnameexists (creates if missing) and sets it as current on the first registered catalog;Noneskips both theget_databaseprobe and the implicitset_current_database— callers are expected to use fully-qualified table namesor call
set_current_databaselater.register_catalog(name, catalog)now delegates withSome("default")— fully backwards-compatible.pypaimon_rust): newdefault_databasekeyword onSQLContext.register_catalog. Convention chosen so omitting the kwarg keeps the historical behavior: omitted /None→Some("default")(back-compat);""→None(skip init);"name"→Some("name")(custom default).Tests
Added
ProbeTrackingCatalogwhich countsget_database/create_databasecalls and returnsError::Unsupported(notDatabaseNotExist) fromget_database, so failingto skip the probe surfaces as a hard error — mimicking the Forbidden condition. Three unit tests:
register_catalog_with_none_skips_default_db_probe—Noneresults in 0 calls toget_database/create_database; registration succeeds; current catalog is set on thesession.
register_catalog_with_some_default_propagates_probe_error—Some("default")runs the probe once and surfaces the underlying error.register_catalog_default_wrapper_uses_default_db— bareregister_catalog()still delegates withSome("default")(back-compat).All existing tests (175 in
paimon-datafusion) continue to pass.cargo check -p paimon-datafusion -p pypaimon_rustis clean.API and Format
New public method
SQLContext::register_catalog_with_default_db. ExistingSQLContext::register_catalogsignature and behavior unchanged. PythonSQLContext.register_cataloggains an optionaldefault_databasekwarg; existing call sites are unaffected. No storage-format changes.Documentation
Rustdoc on both methods explains the
default_dbsemantics and the non-admin-principal motivation. Python kwarg's docstring documents the three-value convention. No external docs update needed.