feat(datafusion): auto-register built-in table functions on catalog registration #324
Merged
Add `SQLContext.register_table_function(name, default_database=None)` to the Python binding so Paimon table-valued functions can be registered from Python — the binding previously had no way to reach `register_udtf`. A single dispatch method keeps the API surface stable: it currently supports `vector_search` and `full_text_search`, and the same `match` will pick up `referenced_files_size` / `physical_files_size` once those land, without changing the Python signature. The function binds to the current catalog. So the binding can obtain that catalog without keeping a duplicate handle of its own, `SQLContext::current_catalog` is made public. The binding also enables the `fulltext` feature so `register_full_text_search` is available. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add tests for `SQLContext.register_table_function`:
- vector_search / full_text_search register without error
- the optional default_database keyword is accepted
- an unknown function name raises a clear error
- calling it before any catalog is registered raises

Registration alone touches neither the Lumina nor Tantivy runtime, so these tests are deterministic and need no index fixtures. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
JingsongLi
requested changes
May 18, 2026
Contributor
We should register it in the Catalog by default, in Rust. The current approach is legacy from earlier work. Can you modify it?
Per review: register the built-in table-valued functions in Rust by default when a catalog is registered, instead of exposing an explicit register_table_function method on the Python binding. SQLContext::register_catalog now registers vector_search, full_text_search, referenced_files_size and physical_files_size against the catalog being registered, so every SQLContext user gets them with no extra call. The Python register_table_function method and the SQLContext::current_catalog visibility change are reverted; the binding keeps the fulltext feature so full_text_search compiles in. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The register_full_text_search call fits within the line width on a single line; rustfmt rejected the wrapped form. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Upstream apache#325 converted referenced_files_size / physical_files_size from table functions to system tables, so they no longer have register_* functions. register_catalog now auto-registers only the remaining UDTFs — vector_search and full_text_search. The binding test is reworked accordingly: it verifies the two UDTFs are registered by triggering their own argument-count validation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
JingsongLi
reviewed
May 19, 2026
Per review: pull the inline built-in table-function registration in register_catalog into a dedicated function. It is the single place that knows the built-in table functions — new ones are added there. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
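The shape of that refactor can be sketched with a toy model. This is not the paimon-datafusion code: `ToySQLContext`, `register_builtin_table_functions`, and the `fulltext_enabled` flag (a runtime stand-in for the compile-time `fulltext` Cargo feature) are hypothetical names used only for illustration, and the registry simply records which catalog each function is bound to.

```python
from dataclasses import dataclass, field
from typing import Dict


def register_builtin_table_functions(
    registry: Dict[str, str], catalog_name: str, fulltext_enabled: bool
) -> None:
    """Single place that knows the built-in table functions.

    A new built-in would be added here and nowhere else.
    """
    # Bind each built-in to the catalog that is being registered.
    registry["vector_search"] = catalog_name
    # Assumption: a runtime flag models the #[cfg(feature = "fulltext")] gate.
    if fulltext_enabled:
        registry["full_text_search"] = catalog_name


@dataclass
class ToySQLContext:
    """Hypothetical stand-in for SQLContext; not the real type."""

    fulltext_enabled: bool = True
    catalogs: Dict[str, object] = field(default_factory=dict)
    table_functions: Dict[str, str] = field(default_factory=dict)

    def register_catalog(self, name: str, catalog: object) -> None:
        self.catalogs[name] = catalog
        # Registration is a side effect of adding a catalog,
        # so callers need no extra setup step.
        register_builtin_table_functions(
            self.table_functions, name, self.fulltext_enabled
        )
```

In this sketch, a caller only ever invokes `register_catalog`; the list of built-ins lives in one function, which is the property the review asked for.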
The Vector Search / Full-Text Search registration sections still told readers to call register_* manually. With a SQLContext, registration now happens automatically on register_catalog; the explicit call is only needed with a raw SessionContext. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Purpose
Linked issue: close #xxx
The `vector_search` and `full_text_search` table-valued functions could only be made available by manually calling their `register_*` function on a `SessionContext`. `SQLContext` users (including the Python binding / pypaimon) had no way to reach `register_udtf`, so these functions were effectively unusable through `SQLContext`.

This registers them automatically, in Rust, when a catalog is registered, so no caller-side setup is needed.
Brief change log
- `SQLContext::register_catalog` now registers `vector_search` and `full_text_search` against the catalog being registered. Any `SQLContext` with a catalog gets them with no extra call.
- `full_text_search` registration is `#[cfg(feature = "fulltext")]`-gated, so builds without the feature are unaffected.
- The Python binding (`bindings/python/Cargo.toml`) enables the `fulltext` feature on `paimon-datafusion` so `full_text_search` is compiled into the binding.

Note: `referenced_files_size` / `physical_files_size` are intentionally not covered. They are system tables (`table$referenced_files_size`), not UDTFs, so they need no registration.

Tests

- `bindings/python/tests/test_datafusion.py`, `test_table_functions_registered_with_catalog`: after `register_catalog`, calling `vector_search` / `full_text_search` with the wrong argument count surfaces each function's own validation error, proving it is registered (an unregistered name would instead fail with a "table function not found" error).
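The idea behind that test can be illustrated with a small stand-alone sketch. It does not use pypaimon or DataFusion: `ToyFunctionRegistry`, `is_registered`, the exception types, the error messages, and the argument counts are all hypothetical. It only shows why an argument-count error is evidence of registration while a lookup error is not.

```python
class ToyFunctionRegistry:
    """Hypothetical registry; maps a function name to its minimum arity."""

    def __init__(self):
        self._min_args = {}

    def register(self, name, min_args):
        self._min_args[name] = min_args

    def call(self, name, *args):
        # Lookup happens first, so an unknown name never reaches validation.
        if name not in self._min_args:
            raise LookupError(f"table function '{name}' not found")
        # Only a registered function can report its own argument-count error.
        required = self._min_args[name]
        if len(args) < required:
            raise ValueError(
                f"{name} requires at least {required} arguments, got {len(args)}"
            )
        return []  # placeholder result; no index is touched in this sketch


def is_registered(registry, name):
    """Probe a name by calling it with no arguments and classifying the error."""
    try:
        registry.call(name)
    except LookupError:
        return False  # the name was never registered
    except ValueError:
        return True  # the function's own validation ran
    return True  # the call succeeded, so the function exists
```

Because the probe fails during argument validation, it never executes a search, which keeps such a check deterministic and free of index fixtures.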
API and Format

- `SQLContext::register_catalog` now also registers the `vector_search` and `full_text_search` table functions. No new public API, no signature change.
- The Python binding now enables `paimon-datafusion/fulltext` (adds the pure-Rust `tantivy` dependency).

Documentation

- `docs/src/sql.md` is updated: the Vector Search / Full-Text Search registration sections now state that a `SQLContext` registers these functions automatically when a catalog is registered, and that the explicit `register_*` call is only needed with a raw `SessionContext`.