[SPARK-56883][SQL] DESCRIBE FUNCTION for SQL UDFs by srielau · Pull Request #55915 · apache/spark

srielau · 2026-05-15T22:51:16Z

What changes were proposed in this pull request?

Renders a structured DESCRIBE FUNCTION [EXTENDED] output for SQL user-defined functions (temporary and persistent) in place of the generic Function / Class / Usage:<json blob> dump that DescribeFunctionCommand produces today for any function whose ExpressionInfo.className != null.

For SQL UDFs the output becomes:

Function: qualified name
Type: SCALAR or TABLE
Input: parameter list (name + SQL type, column-aligned; DEFAULT <expr> and 'comment' annotations are added in EXTENDED mode)
Returns: scalar return type, or the table return columns (column comments and defaults are added in EXTENDED mode)
EXTENDED only: Comment, Collation, Deterministic, Data Access (CONTAINS SQL / READS SQL DATA), Configs, Owner, Create Time, Body, and SQL Path.
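The column alignment described above can be sketched in a few lines. This is an illustrative Python model only, not the Scala code in DescribeFunctionCommand: the names `format_parameters` and `tabulate` echo helpers mentioned in the review, but their signatures and padding rules here are assumptions inferred from the sample output.

```python
def format_parameters(params):
    """Align parameter names so the SQL types start in the same column.

    params: list of (name, sql_type) pairs; an empty list renders as "()".
    (Hypothetical helper; the padding rule is inferred from the sample output.)
    """
    if not params:
        return ["()"]
    name_width = max(len(name) for name, _ in params)
    return [f"{name.ljust(name_width)} {sql_type}" for name, sql_type in params]


def tabulate(rows):
    """Render (key, value_lines) pairs as column-aligned "Key: value" lines.

    Continuation lines of a multi-line value are indented to the value column.
    """
    # Value column starts after the longest "Key:" label plus one space.
    width = max(len(key) + 1 for key, _ in rows) + 1
    lines = []
    for key, values in rows:
        label = (key + ":").ljust(width)
        for i, value in enumerate(values):
            prefix = label if i == 0 else " " * width
            lines.append((prefix + value).rstrip())
    return lines


rows = [
    ("Function", ["default.area"]),
    ("Type", ["SCALAR"]),
    ("Input", format_parameters([("width", "DOUBLE"), ("height", "DOUBLE")])),
    ("Returns", ["DOUBLE"]),
    ("Deterministic", ["true"]),
]
for line in tabulate(rows):
    print(line)
```

Keeping the parameter formatting separate from the row layout means the same `tabulate` step can serve both the `Input:` rows and the table-valued `Returns:` rows.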

The SQL Path: row is emitted only when spark.sql.path.enabled = true and a frozen path was persisted on the function at CREATE FUNCTION time (SPARK-56639 / SPARK-56520). The path is read from the function's function.resolutionPath property and rendered through SqlPathFormat.formatForDisplay, producing the same `catalog`.`namespace` format used elsewhere in DESCRIBE output. It is the resolution path the function uses during analysis: the creator's PATH frozen at CREATE time, not the invoker's current PATH.
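A rough model of that rule, for illustration. It is written in Python rather than Scala and does not reproduce SqlPathFormat.formatForDisplay: the function names, the list-of-tuples representation of the frozen path, and the backtick-doubling rule are all assumptions made for the sketch.

```python
def quote_identifier(name):
    # Backtick-quote one name part, doubling embedded backticks (assumed rule).
    return "`" + name.replace("`", "``") + "`"


def format_path_for_display(entries):
    """Join (catalog, namespace) entries as `catalog`.`namespace`, comma-separated."""
    return ", ".join(
        ".".join(quote_identifier(part) for part in entry) for entry in entries
    )


def sql_path_row(path_enabled, frozen_path):
    """Return the display value for the SQL Path row, or None when it is omitted.

    path_enabled stands in for spark.sql.path.enabled; frozen_path stands in
    for the value persisted at CREATE FUNCTION time (None or empty if absent).
    """
    if not path_enabled or not frozen_path:
        return None
    return format_path_for_display(frozen_path)


frozen = [("spark_catalog", "path_func_db_a"), ("system", "builtin")]
print(sql_path_row(True, frozen))
```

Because the row is derived only from the persisted value, changing the session PATH after CREATE FUNCTION has no effect on what is displayed.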

Behavior for builtin functions and non-SQL UDFs is unchanged.

Class hierarchy / dispatch:

SQLFunction (catalyst): adds the SCALAR / TABLE constants and a new fromExpressionInfo(info, parser) constructor that reconstructs a SQLFunction from the JSON usage blob produced by toExpressionInfo. This is the same path used by both temp UDFs (which are not in the catalog) and persistent UDFs.
DescribeFunctionCommand (sql/core): when SQLFunction.isSQLFunction(info.getClassName) is true, dispatches to a new describeSQLFunction(info, parser) helper that emits the column-aligned key/value rows shown above. The frozen SQL PATH is rendered inline through SqlPathFormat; the temporary DescribeFunctionCommandUtils helper introduced for that purpose by SPARK-56639 is removed (its single responsibility is now absorbed by describeSQLFunction).
SessionCatalog.registerFunction: when a persistent SQL UDF is invoked for the first time, the function registry caches it. Previously the cached ExpressionInfo was always built via makeExprInfoForHiveFunction, which sets usage = null. That worked for the pre-existing DESCRIBE FUNCTION codepath (which doesn't read usage), but breaks the new describeSQLFunction path: after a SQL UDF has been invoked once, DESCRIBE FUNCTION reads back the cached info and SQLFunction.fromExpressionInfo cannot parse null. registerFunction now branches on funcDefinition.isUserDefinedFunction and builds the structured ExpressionInfo via UserDefinedFunction.fromCatalogFunction(funcDefinition, parser).toExpressionInfo for SQL UDFs (matching the lookup-side build in lookupPersistentFunction), so the cached info has the right usage blob for DESCRIBE.
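The cache problem in the last item can be modelled in miniature. The following Python sketch is not Spark code: `build_cached_info`, `from_expression_info`, and the dict-based "ExpressionInfo" are hypothetical stand-ins, and only the `sqlFunction.*` keys and the CORRUPTED_CATALOG_FUNCTION name are taken from this PR description.

```python
import json

SQL_FUNCTION_CLASS = "sqlFunction."  # marker shown in the old "Class:" row


class CorruptedCatalogFunction(Exception):
    """Stand-in for the CORRUPTED_CATALOG_FUNCTION error condition."""


def build_cached_info(usage_props, structured):
    """Model of the registry cache entry.

    structured=False mimics the old behaviour (usage left as null);
    structured=True mimics the fixed behaviour (usage carries the JSON blob).
    """
    usage = json.dumps(usage_props) if structured else None
    return {"className": SQL_FUNCTION_CLASS, "usage": usage}


def from_expression_info(info):
    """Rebuild a minimal function description from the cached usage blob."""
    if info["className"] != SQL_FUNCTION_CLASS:
        raise ValueError("not a SQL function")
    if info["usage"] is None:
        raise CorruptedCatalogFunction("cached info has no usage blob")
    props = json.loads(info["usage"])
    is_table = props["sqlFunction.isTableFunc"] == "true"
    return {
        "type": "TABLE" if is_table else "SCALAR",
        "input": props["sqlFunction.inputParam"],
        "returns": props["sqlFunction.returnType"],
        "body": props["sqlFunction.expression"],
    }


AREA_PROPS = {
    "sqlFunction.inputParam": "width DOUBLE,height DOUBLE",
    "sqlFunction.returnType": "DOUBLE",
    "sqlFunction.expression": "width * height",
    "sqlFunction.isTableFunc": "false",
}

# Fixed path: the blob survives the cache, so DESCRIBE can be rendered.
print(from_expression_info(build_cached_info(AREA_PROPS, structured=True)))
```

With `structured=False` the same call raises, which mirrors the failure that appeared only after a persistent UDF had been invoked once and its cached entry was read back.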

Why are the changes needed?

DESCRIBE FUNCTION is intended to give users a human-readable description of a routine, analogous to DESCRIBE TABLE for tables. For SQL UDFs the current output instead exposes the internal serialization format:

> DESCRIBE FUNCTION EXTENDED area;
 Function: default.area
 Class: sqlFunction.
 Usage: {"sqlFunction.inputParam":"width DOUBLE,height DOUBLE","sqlFunction.returnType":"DOUBLE","sqlFunction.expression":"width * height","sqlFunction.isTableFunc":"false",...}
 Extended Usage:

That JSON blob is not part of any public surface, and the literal string sqlFunction. for Class: is meaningless to users. All of the structured metadata we need — signature, return type, body, characteristics, frozen SQL PATH — is already serialized in ExpressionInfo; this PR just formats it.

Does this PR introduce any user-facing change?

Yes — the rows returned by DESCRIBE FUNCTION [EXTENDED] <sql_udf> change.

Before:

> DESCRIBE FUNCTION EXTENDED area;
 Function: default.area
 Class: sqlFunction.
 Usage: {"sqlFunction.inputParam":"width DOUBLE,height DOUBLE", ...}
 Extended Usage:

After (simple case):

> DESCRIBE FUNCTION EXTENDED area;
 Function:      default.area
 Type:          SCALAR
 Input:         width  DOUBLE 'width'
                height DOUBLE 'height'
 Returns:       DOUBLE
 Comment:       compute area
 Deterministic: true
 Data Access:   CONTAINS SQL
 Owner:         <owner>
 Create Time:   <timestamp>
 Body:          width * height

After (function created under spark.sql.path.enabled = true with a non-default PATH at CREATE time):

> SET spark.sql.path.enabled = true;
> SET PATH = spark_catalog.path_func_db_a, system.builtin;
> CREATE FUNCTION frozen_fn() RETURNS INT RETURN (SELECT MAX(id) FROM frozen_t);
> SET PATH = spark_catalog.path_func_db_b, system.builtin;
> DESCRIBE FUNCTION EXTENDED default.frozen_fn;
 Function:      default.frozen_fn
 Type:          SCALAR
 Input:         ()
 Returns:       INT
 ...
 Body:          (SELECT MAX(id) FROM frozen_t)
 SQL Path:      `spark_catalog`.`path_func_db_a`, `system`.`builtin`

SQL Path reflects the creator's frozen PATH, not the session's current PATH at describe time. Output for builtin functions, Hive UDFs, and other non-SQL UDFs is unchanged.

How was this patch tested?

Added three unit tests to SQLFunctionSuite (sql/core) and extended one existing test:

describe SQL scalar functions — temporary and persistent scalar UDFs with comments, defaults, and EXTENDED mode. Asserts Function, Type, Input (column-aligned, with DEFAULT and 'comment' in extended mode), Returns, Deterministic, Data Access, Comment, Create Time, Body.
describe SQL table functions — table UDFs with explicit return columns; asserts Type: TABLE, Returns columns, and the EXTENDED-only fields.
describe SQL functions with derived routine characteristics — checks that Deterministic and Data Access reflect derived values for functions that read tables / call non-deterministic builtins, and that user-supplied characteristics are preserved.
The existing SPARK-56639: SQL function uses frozen SQL path test is extended: after switching PATH to a different namespace it invokes default.frozen_fn (populating the function-registry cache) and then runs DESCRIBE FUNCTION EXTENDED default.frozen_fn, asserting the SQL Path: row shows the creator's frozen path (`spark_catalog`.`path_func_db_a`, `system`.`builtin`) and does not mention the invoker's current path namespace. This extension also exercises the SessionCatalog.registerFunction fix above: prior to the fix, the DESCRIBE after the invocation hit CORRUPTED_CATALOG_FUNCTION because the cached ExpressionInfo had usage = null.

Each describe test uses checkKeywordsExist against DESCRIBE FUNCTION [EXTENDED] <name> output.
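The shape of such a keyword check, including the negative assertion used by the frozen-path test, might look like the Python sketch below. It only imitates the idea behind checkKeywordsExist; the helper names are hypothetical, and the sample rows are copied from the example output earlier in this description with the elided rows left out.

```python
def check_keywords_exist(output_lines, *keywords):
    """Fail unless every keyword occurs somewhere in the rendered output."""
    text = "\n".join(output_lines)
    missing = [k for k in keywords if k not in text]
    assert not missing, f"missing keywords: {missing}"


def check_keywords_not_exist(output_lines, *keywords):
    """Fail if any keyword occurs in the rendered output."""
    text = "\n".join(output_lines)
    present = [k for k in keywords if k in text]
    assert not present, f"unexpected keywords: {present}"


# Rows taken from the frozen-path example above (elided rows omitted).
described = [
    "Function:      default.frozen_fn",
    "Type:          SCALAR",
    "Input:         ()",
    "Returns:       INT",
    "Body:          (SELECT MAX(id) FROM frozen_t)",
    "SQL Path:      `spark_catalog`.`path_func_db_a`, `system`.`builtin`",
]

check_keywords_exist(described, "Type:          SCALAR", "`path_func_db_a`")
# The invoker's current namespace must not leak into the output.
check_keywords_not_exist(described, "path_func_db_b")
```

Substring checks like these are tolerant of extra rows but sensitive to alignment, so a change in column padding shows up as a test failure.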

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (claude-opus-4-7)

Renders structured DESCRIBE output for SQL user-defined functions instead of the generic Class/Usage dump: Function/Type/Input/Returns, and in EXTENDED mode Comment/Collation/Deterministic/Data Access/Configs/Owner/Create Time/Body/SQL Path. Ports the formatter from the Databricks runtime.

- SQLFunction: add SCALAR/TABLE constants and fromExpressionInfo for reconstructing the function from its ExpressionInfo usage blob (covers both temp and persistent SQL UDFs).
- DescribeFunctionCommand: dispatch to describeSQLFunction when the className matches SQLFunction.isSQLFunction; inline the SQL PATH display via SqlPathFormat (replaces DescribeFunctionCommandUtils).
- SQLFunctionSuite: port describe tests for scalar/table SQL UDFs and derived routine characteristics.

cloud-fan

Thanks for the cleanup — the new structured output is a real improvement over the JSON blob, and folding the SQL Path helper back into describeSQLFunction is a nice simplification.

Summary

Prior state. DESCRIBE FUNCTION [EXTENDED] <sql_udf> dumps Class: sqlFunction. and the internal Usage: JSON blob, which is unusable for end users.
Design approach. Dispatch on SQLFunction.isSQLFunction(info.getClassName) to a new describeSQLFunction helper that reconstructs the SQLFunction from ExpressionInfo.usage via a new SQLFunction.fromExpressionInfo (mirror of fromCatalogFunction) and renders column-aligned key/value rows. Temp and persistent UDFs both flow through the same ExpressionInfo round trip, so a single rendering path handles both.
Key design decisions.
- Reconstruct from ExpressionInfo rather than the catalog, so the same path works for temp UDFs.
- Patch SessionCatalog.registerFunction for UDFs so the cached ExpressionInfo carries the structured usage blob; without this the new fromExpressionInfo would fail after first invocation of a persistent UDF.
- Delete DescribeFunctionCommandUtils — its responsibility is absorbed inline now that f.properties already carries function.resolutionPath.
Implementation sketch. Catalyst: SQLFunction.fromExpressionInfo (new) + SCALAR/TABLE constants; SessionCatalog.registerFunction branches on funcDefinition.isUserDefinedFunction. sql/core: DescribeFunctionCommand.describeSQLFunction + formatParameters + tabulate + append. Tests: new scalar/table/derived-characteristics tests, plus extending the SPARK-56639 test to invoke-then-describe (which exercises the SessionCatalog fix).

Main findings are concentrated in two areas: (a) Create Time: / Owner: are not actually persisted to the catalog, so they display incorrect values for persistent UDFs; (b) the catalog name is now silently dropped from the Function: row for SQL UDFs only. Plus a few smaller items inline.

align Function: qualification, tighten Create Time test

- SQLFunction.toCatalogFunction now writes functionMetadataToProps so OWNER and CREATE_TIME survive a session restart; fromProps reads them back with backward-compatible defaults for older catalog payloads.
- Extract SQLFunction.fromProps helper so fromCatalogFunction and fromExpressionInfo share a single SQLFunction-construction site.
- SessionCatalog.registerFunction now accepts an optional ExpressionInfo; registerUserDefinedFunction's persistent path passes function.toExpressionInfo directly, skipping the toCatalogFunction -> fromCatalogFunction -> toExpressionInfo round trip and preserving the CREATE-time owner/createTime values.
- DescribeFunctionCommand qualifies the Function: identifier through qualifyIdentifier for SQL UDFs too, so both paths render the catalog-qualified 3-part name consistently.
- Fix two comment typos (functions.scala "into" -> "to"; SQLFunctionSuite "user specified" -> "user-specified").
- Tighten the describe-scalar-functions test: parse the rendered Create Time and assert it falls within a small window around the CREATE FUNCTION call (catches the regression where the cache build time leaked into the displayed timestamp).

cloud-fan · 2026-05-19T01:16:09Z

thanks, merging to master/4.x/4.2!

Closes #55915 from srielau/SPARK-56883-describe-sql-udf.

Authored-by: Serge Rielau <serge@rielau.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit cfed631)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

srielau force-pushed the SPARK-56883-describe-sql-udf branch 2 times, most recently from 4de3ec1 to d8fb332 Compare May 16, 2026 02:56

srielau force-pushed the SPARK-56883-describe-sql-udf branch from d8fb332 to 07e70c9 Compare May 18, 2026 07:45

cloud-fan reviewed May 18, 2026

srielau added 2 commits May 18, 2026 14:48

Empty commit to retrigger CI (workaround GitHub Actions hashFiles flake)

3ddba66

allisonwang-db approved these changes May 18, 2026

cloud-fan closed this in cfed631 May 19, 2026

srielau wants to merge 3 commits into apache:master from srielau:SPARK-56883-describe-sql-udf