
[SPARK-56883][SQL] DESCRIBE FUNCTION for SQL UDFs #55915

Closed
srielau wants to merge 3 commits into apache:master from srielau:SPARK-56883-describe-sql-udf


@srielau srielau commented May 15, 2026

What changes were proposed in this pull request?

Renders a structured DESCRIBE FUNCTION [EXTENDED] output for SQL user-defined functions (temporary and persistent) in place of the generic Function / Class / Usage:<json blob> dump that DescribeFunctionCommand produces today for any function whose ExpressionInfo.className != null.

For SQL UDFs the output becomes:

  • Function: qualified name
  • Type: SCALAR or TABLE
  • Input: parameter list (name + SQL type, column-aligned; DEFAULT <expr> and 'comment' annotations are added in EXTENDED mode)
  • Returns: scalar return type, or the table return columns (column comments and defaults are added in EXTENDED mode)
  • EXTENDED only: Comment, Collation, Deterministic, Data Access (CONTAINS SQL / READS SQL DATA), Configs, Owner, Create Time, Body, and SQL Path.

SQL Path: is emitted only when both spark.sql.path.enabled = true and a frozen path was persisted on the function at CREATE FUNCTION time (SPARK-56639 / SPARK-56520). The path is read from the function's function.resolutionPath property and rendered through SqlPathFormat.formatForDisplay, producing the same `catalog`.`namespace` format used elsewhere in DESCRIBE output. This shows the resolution path that the function will use during analysis — the creator's PATH frozen at CREATE time, not the invoker's current PATH.
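
The display format described above can be illustrated with a minimal sketch. The helper names `quote_identifier` and `format_path_for_display` are hypothetical stand-ins, not the actual `SqlPathFormat` API, and the doubling of embedded backticks is an assumption about how identifiers are quoted.

```python
from typing import Sequence, Tuple


def quote_identifier(name: str) -> str:
    """Wrap a name in backticks, doubling any embedded backtick (assumed rule)."""
    return "`" + name.replace("`", "``") + "`"


def format_path_for_display(entries: Sequence[Tuple[str, str]]) -> str:
    """Render (catalog, namespace) pairs as a comma-separated path string."""
    return ", ".join(
        f"{quote_identifier(catalog)}.{quote_identifier(namespace)}"
        for catalog, namespace in entries
    )


# The path frozen at CREATE FUNCTION time in the example from this PR.
frozen_path = [("spark_catalog", "path_func_db_a"), ("system", "builtin")]
print(format_path_for_display(frozen_path))
```

The point of the sketch is that the rendered value depends only on the stored path entries, so changing the session PATH later has no effect on it.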

Behavior for builtin functions and non-SQL UDFs is unchanged.

Class hierarchy / dispatch:

  • SQLFunction (catalyst): adds the SCALAR / TABLE constants and a new fromExpressionInfo(info, parser) constructor that reconstructs a SQLFunction from the JSON usage blob produced by toExpressionInfo. This is the same path used by both temp UDFs (which are not in the catalog) and persistent UDFs.
  • DescribeFunctionCommand (sql/core): when SQLFunction.isSQLFunction(info.getClassName) is true, dispatches to a new describeSQLFunction(info, parser) helper that emits the column-aligned key/value rows shown above. The frozen SQL PATH is rendered inline through SqlPathFormat; the temporary DescribeFunctionCommandUtils helper introduced for that purpose by SPARK-56639 is removed (its single responsibility is now absorbed by describeSQLFunction).
  • SessionCatalog.registerFunction: when a persistent SQL UDF is invoked for the first time, the function registry caches it. Previously the cached ExpressionInfo was always built via makeExprInfoForHiveFunction, which sets usage = null. That worked for the pre-existing DESCRIBE FUNCTION codepath (which doesn't read usage), but breaks the new describeSQLFunction path: after a SQL UDF has been invoked once, DESCRIBE FUNCTION reads back the cached info and SQLFunction.fromExpressionInfo cannot parse null. registerFunction now branches on funcDefinition.isUserDefinedFunction and builds the structured ExpressionInfo via UserDefinedFunction.fromCatalogFunction(funcDefinition, parser).toExpressionInfo for SQL UDFs (matching the lookup-side build in lookupPersistentFunction), so the cached info has the right usage blob for DESCRIBE.
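
The cache problem in the last bullet can be sketched as a round trip through a JSON usage blob. This is only an illustration under stated assumptions: `to_usage_blob` and `from_usage_blob` are hypothetical names, the key names are taken from the sample output in this description, and `ValueError` stands in for the `CORRUPTED_CATALOG_FUNCTION` error.

```python
import json
from typing import Dict, Optional

SQL_FUNCTION_PREFIX = "sqlFunction."


def to_usage_blob(props: Dict[str, str]) -> str:
    """Serialize function metadata into a prefixed JSON blob."""
    return json.dumps({SQL_FUNCTION_PREFIX + k: v for k, v in props.items()})


def from_usage_blob(usage: Optional[str]) -> Dict[str, str]:
    """Rebuild the metadata; a missing blob cannot be parsed."""
    if usage is None:
        raise ValueError("usage blob is missing; cannot reconstruct the SQL function")
    blob = json.loads(usage)
    return {
        key[len(SQL_FUNCTION_PREFIX):]: value
        for key, value in blob.items()
        if key.startswith(SQL_FUNCTION_PREFIX)
    }


area = {
    "inputParam": "width DOUBLE,height DOUBLE",
    "returnType": "DOUBLE",
    "expression": "width * height",
    "isTableFunc": "false",
}

# Structured cache entry: the round trip recovers the metadata.
print(from_usage_blob(to_usage_blob(area)) == area)

# Cache entry built with usage = None: reconstruction fails.
try:
    from_usage_blob(None)
except ValueError as err:
    print(f"describe would fail: {err}")
```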

Why are the changes needed?

DESCRIBE FUNCTION is intended to give users a human-readable description of a routine, analogous to DESCRIBE TABLE for tables. For SQL UDFs the current output instead exposes the internal serialization format:

> DESCRIBE FUNCTION EXTENDED area;
 Function: default.area
 Class: sqlFunction.
 Usage: {"sqlFunction.inputParam":"width DOUBLE,height DOUBLE","sqlFunction.returnType":"DOUBLE","sqlFunction.expression":"width * height","sqlFunction.isTableFunc":"false",...}
 Extended Usage:

That JSON blob is not part of any public surface, and the literal string sqlFunction. for Class: is meaningless to users. All of the structured metadata we need — signature, return type, body, characteristics, frozen SQL PATH — is already serialized in ExpressionInfo; this PR just formats it.

Does this PR introduce any user-facing change?

Yes — the rows returned by DESCRIBE FUNCTION [EXTENDED] <sql_udf> change.

Before:

> DESCRIBE FUNCTION EXTENDED area;
 Function: default.area
 Class: sqlFunction.
 Usage: {"sqlFunction.inputParam":"width DOUBLE,height DOUBLE", ...}
 Extended Usage:

After (simple case):

> DESCRIBE FUNCTION EXTENDED area;
 Function:      default.area
 Type:          SCALAR
 Input:         width  DOUBLE 'width'
                height DOUBLE 'height'
 Returns:       DOUBLE
 Comment:       compute area
 Deterministic: true
 Data Access:   CONTAINS SQL
 Owner:         <owner>
 Create Time:   <timestamp>
 Body:          width * height
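
The aligned layout in the sample above can be approximated with the sketch below. It is not the code from this PR: `tabulate` and `format_parameters` here are hypothetical re-implementations, and padding only the parameter names (not the types) is an assumption.

```python
from typing import List, Optional, Sequence, Tuple

Param = Tuple[str, str, Optional[str]]  # (name, SQL type, optional comment)


def format_parameters(params: Sequence[Param]) -> List[str]:
    """One line per parameter, with names padded to a common width."""
    if not params:
        return ["()"]
    name_width = max(len(name) for name, _, _ in params)
    lines = []
    for name, sql_type, comment in params:
        line = f"{name.ljust(name_width)} {sql_type}"
        if comment is not None:
            line += f" '{comment}'"
        lines.append(line)
    return lines


def tabulate(rows: Sequence[Tuple[str, Sequence[str]]]) -> List[str]:
    """Render 'Key: value' rows so that every value starts in the same column."""
    if not rows:
        return []
    width = max(len(key) for key, _ in rows) + 1  # +1 for the colon
    lines = []
    for key, values in rows:
        label = (key + ":").ljust(width)
        for i, value in enumerate(values):
            # Continuation lines of a multi-line value get a blank label.
            prefix = label if i == 0 else " " * width
            lines.append(f"{prefix} {value}")
    return lines


lines = tabulate([
    ("Function", ["default.area"]),
    ("Type", ["SCALAR"]),
    ("Input", format_parameters([("width", "DOUBLE", "width"),
                                 ("height", "DOUBLE", "height")])),
    ("Returns", ["DOUBLE"]),
    ("Deterministic", ["true"]),
])
print("\n".join(lines))
```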

After (function created under spark.sql.path.enabled = true with a non-default PATH at CREATE time):

> SET spark.sql.path.enabled = true;
> SET PATH = spark_catalog.path_func_db_a, system.builtin;
> CREATE FUNCTION frozen_fn() RETURNS INT RETURN (SELECT MAX(id) FROM frozen_t);
> SET PATH = spark_catalog.path_func_db_b, system.builtin;
> DESCRIBE FUNCTION EXTENDED default.frozen_fn;
 Function:      default.frozen_fn
 Type:          SCALAR
 Input:         ()
 Returns:       INT
 ...
 Body:          (SELECT MAX(id) FROM frozen_t)
 SQL Path:      `spark_catalog`.`path_func_db_a`, `system`.`builtin`

SQL Path reflects the creator's frozen PATH, not the session's current PATH at describe time. Output for builtin functions, Hive UDFs, and other non-SQL UDFs is unchanged.

How was this patch tested?

Added three unit tests to SQLFunctionSuite (sql/core) and extended an existing one:

  • describe SQL scalar functions — temporary and persistent scalar UDFs with comments, defaults, and EXTENDED mode. Asserts Function, Type, Input (column-aligned, with DEFAULT and 'comment' in extended mode), Returns, Deterministic, Data Access, Comment, Create Time, Body.
  • describe SQL table functions — table UDFs with explicit return columns; asserts Type: TABLE, Returns columns, and the EXTENDED-only fields.
  • describe SQL functions with derived routine characteristics — checks that Deterministic and Data Access reflect derived values for functions that read tables / call non-deterministic builtins, and that user-supplied characteristics are preserved.
  • The existing SPARK-56639: SQL function uses frozen SQL path test is extended: after switching PATH to a different namespace it invokes default.frozen_fn (populating the function-registry cache) and then runs DESCRIBE FUNCTION EXTENDED default.frozen_fn, asserting the SQL Path: row shows the creator's frozen path (`spark_catalog`.`path_func_db_a`, `system`.`builtin`) and does not mention the invoker's current path namespace. This extension also exercises the SessionCatalog.registerFunction fix above: prior to the fix, the DESCRIBE after the invocation hit CORRUPTED_CATALOG_FUNCTION because the cached ExpressionInfo had usage = null.

Each describe test uses checkKeywordsExist against DESCRIBE FUNCTION [EXTENDED] <name> output.
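
The keyword-style assertions can be sketched as follows. Spark's `checkKeywordsExist` works on a DataFrame; the string-based helpers `check_keywords_exist` and `check_keywords_absent` below are hypothetical stand-ins, and the sample rows are copied from the frozen-path example in this description.

```python
from typing import Iterable, Sequence


def check_keywords_exist(rows: Sequence[str], keywords: Iterable[str]) -> None:
    """Fail if any keyword is missing from the rendered output."""
    output = "\n".join(rows)
    missing = [kw for kw in keywords if kw not in output]
    if missing:
        raise AssertionError(f"missing keywords: {missing}")


def check_keywords_absent(rows: Sequence[str], keywords: Iterable[str]) -> None:
    """Fail if any keyword unexpectedly appears in the rendered output."""
    output = "\n".join(rows)
    present = [kw for kw in keywords if kw in output]
    if present:
        raise AssertionError(f"unexpected keywords: {present}")


described = [
    "Function:      default.frozen_fn",
    "Type:          SCALAR",
    "Input:         ()",
    "Returns:       INT",
    "Body:          (SELECT MAX(id) FROM frozen_t)",
    "SQL Path:      `spark_catalog`.`path_func_db_a`, `system`.`builtin`",
]

# The creator's frozen path must be shown ...
check_keywords_exist(described, ["SQL Path:", "`path_func_db_a`"])
# ... and the invoker's current namespace must not be.
check_keywords_absent(described, ["path_func_db_b"])
print("frozen-path assertions passed")
```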

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (claude-opus-4-7)

@srielau srielau force-pushed the SPARK-56883-describe-sql-udf branch 2 times, most recently from 4de3ec1 to d8fb332 Compare May 16, 2026 02:56
Renders structured DESCRIBE output for SQL user-defined functions instead
of the generic Class/Usage dump: Function/Type/Input/Returns, and in
EXTENDED mode Comment/Collation/Deterministic/Data Access/Configs/Owner/
Create Time/Body/SQL Path. Ports the formatter from the Databricks
runtime.

- SQLFunction: add SCALAR/TABLE constants and fromExpressionInfo for
  reconstructing the function from its ExpressionInfo usage blob (covers
  both temp and persistent SQL UDFs).
- DescribeFunctionCommand: dispatch to describeSQLFunction when the
  className matches SQLFunction.isSQLFunction; inline the SQL PATH
  display via SqlPathFormat (replaces DescribeFunctionCommandUtils).
- SQLFunctionSuite: port describe tests for scalar/table SQL UDFs and
  derived routine characteristics.
@srielau srielau force-pushed the SPARK-56883-describe-sql-udf branch from d8fb332 to 07e70c9 Compare May 18, 2026 07:45

@cloud-fan cloud-fan left a comment


Thanks for the cleanup — the new structured output is a real improvement over the JSON blob, and folding the SQL Path helper back into describeSQLFunction is a nice simplification.

Summary

  • Prior state. DESCRIBE FUNCTION [EXTENDED] <sql_udf> dumps Class: sqlFunction. and the internal Usage: JSON blob, which is unusable for end users.
  • Design approach. Dispatch on SQLFunction.isSQLFunction(info.getClassName) to a new describeSQLFunction helper that reconstructs the SQLFunction from ExpressionInfo.usage via a new SQLFunction.fromExpressionInfo (mirror of fromCatalogFunction) and renders column-aligned key/value rows. Temp and persistent UDFs both flow through the same ExpressionInfo round trip, so a single rendering path handles both.
  • Key design decisions.
    • Reconstruct from ExpressionInfo rather than the catalog, so the same path works for temp UDFs.
    • Patch SessionCatalog.registerFunction for UDFs so the cached ExpressionInfo carries the structured usage blob; without this the new fromExpressionInfo would fail after first invocation of a persistent UDF.
    • Delete DescribeFunctionCommandUtils — its responsibility is absorbed inline now that f.properties already carries function.resolutionPath.
  • Implementation sketch. Catalyst: SQLFunction.fromExpressionInfo (new) + SCALAR/TABLE constants; SessionCatalog.registerFunction branches on funcDefinition.isUserDefinedFunction. sql/core: DescribeFunctionCommand.describeSQLFunction + formatParameters + tabulate + append. Tests: new scalar/table/derived-characteristics tests, plus extending the SPARK-56639 test to invoke-then-describe (which exercises the SessionCatalog fix).

Main findings are concentrated in two areas: (a) Create Time: / Owner: are not actually persisted to the catalog, so they display incorrect values for persistent UDFs; (b) the catalog name is now silently dropped from the Function: row for SQL UDFs only. Plus a few smaller items inline.

Comment thread sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala Outdated
Comment thread sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala Outdated
Comment thread sql/core/src/test/scala/org/apache/spark/sql/execution/SQLFunctionSuite.scala Outdated
srielau added 2 commits May 18, 2026 14:48
align Function: qualification, tighten Create Time test

- SQLFunction.toCatalogFunction now writes functionMetadataToProps so
  OWNER and CREATE_TIME survive a session restart; fromProps reads them
  back with backward-compatible defaults for older catalog payloads.
- Extract SQLFunction.fromProps helper so fromCatalogFunction and
  fromExpressionInfo share a single SQLFunction-construction site.
- SessionCatalog.registerFunction now accepts an optional ExpressionInfo;
  registerUserDefinedFunction's persistent path passes
  function.toExpressionInfo directly, skipping the
  toCatalogFunction -> fromCatalogFunction -> toExpressionInfo round trip
  and preserving the CREATE-time owner/createTime values.
- DescribeFunctionCommand qualifies the Function: identifier through
  qualifyIdentifier for SQL UDFs too, so both paths render the
  catalog-qualified 3-part name consistently.
- Fix two comment typos (functions.scala "into" -> "to";
  SQLFunctionSuite "user specified" -> "user-specified").
- Tighten the describe-scalar-functions test: parse the rendered
  Create Time and assert it falls within a small window around the
  CREATE FUNCTION call (catches the regression where the cache build
  time leaked into the displayed timestamp).
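
The tightened Create Time check in the last item can be sketched as below. The timestamp format is an assumption (the description only shows `<timestamp>`), and `extract_create_time` and `within_window` are hypothetical helper names, not the test's actual code.

```python
from datetime import datetime, timedelta
from typing import Sequence

ASSUMED_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumption: the real rendering may differ


def extract_create_time(rows: Sequence[str], fmt: str = ASSUMED_FORMAT) -> datetime:
    """Find the 'Create Time:' row and parse its value."""
    for row in rows:
        key, _, value = row.partition(":")
        if key.strip() == "Create Time":
            return datetime.strptime(value.strip(), fmt)
    raise AssertionError("no Create Time row in DESCRIBE output")


def within_window(ts: datetime, start: datetime, end: datetime,
                  slack: timedelta = timedelta(seconds=2)) -> bool:
    """True if ts lies between start and end, allowing a small slack."""
    return start - slack <= ts <= end + slack


# Bracket the (simulated) CREATE FUNCTION call with two clock readings.
start = datetime.now()
rendered = [f"Create Time:   {datetime.now().strftime(ASSUMED_FORMAT)}"]
end = datetime.now()
created = extract_create_time(rendered)
print(within_window(created, start, end))

# A timestamp taken much later, such as a cache build time, falls outside the window.
stale = [f"Create Time:   {(end + timedelta(hours=1)).strftime(ASSUMED_FORMAT)}"]
print(within_window(extract_create_time(stale), start, end))
```

The slack absorbs the sub-second truncation of the rendered timestamp while still rejecting a value recorded long after creation.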
@cloud-fan

thanks, merging to master/4.x/4.2!

@cloud-fan cloud-fan closed this in cfed631 May 19, 2026
cloud-fan pushed a commit that referenced this pull request May 19, 2026
Closes #55915 from srielau/SPARK-56883-describe-sql-udf.

Authored-by: Serge Rielau <serge@rielau.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit cfed631)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>