[SPARK-56678][SQL] Use structured Catalog/Namespace/Table rows in DESCRIBE TABLE EXTENDED for v2 tables and views by cloud-fan · Pull Request #55625 · apache/spark

cloud-fan · 2026-04-30T14:07:13Z

What changes were proposed in this pull request?

Standardize the `# Detailed Table Information` / `# Detailed View Information` block in `DESCRIBE TABLE EXTENDED` output for v2 tables and views to emit structured rows derived from the resolved identifier:

- For tables (`DescribeTableExec`): the single `Name` row that came from `Table.name()` is replaced by `Catalog`, `Namespace`, `Database`, and `Table`.
- For views (`DescribeV2ViewExec`): the `Catalog` + `Identifier` pair (where `Identifier` was a single string joining namespace and name with `.`) is replaced by `Catalog`, `Namespace`, `Database`, and `View`.

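The catalog name and resolved `Identifier` are threaded from `ResolvedTable` / `ResolvedPersistentView` through the v2 execs. Schema/partitioning/clustering row formatting is pulled out of `DescribeTableExec` into a shared `DescribeTableBaseRows` trait; `DescribeTablePartitionExec` mixes in that trait and calls `addBaseDescription` directly, so it no longer constructs an inner `DescribeTableExec` for the schema/partition header.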

The `Namespace` row uses `Identifier.namespace().quoted`: segments are joined with dots, and only segments that need it are back-tick quoted, matching the existing Spark convention for multi-segment namespaces. This keeps the row round-trip-safe for namespaces whose segments contain dots, while staying readable for the common single-level case.
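A minimal standalone sketch of how the `Namespace` value is rendered. It does not call Spark: `quoteIfNeeded` and `quotedNamespace` are local stand-ins, and the quoting rule (a segment of only letters, digits, and underscores that is not purely digits stays bare; anything else is back-ticked with embedded back-ticks doubled) is an assumption modeled on Spark's quoting helper, not a copy of it.

```scala
// Hypothetical stand-in for Spark's quoteIfNeeded (assumed rule: a segment made only of
// letters, digits and underscores, and not purely digits, stays bare; anything else is
// wrapped in back-ticks, with embedded back-ticks doubled).
def quoteIfNeeded(part: String): String =
  if (part.matches("[a-zA-Z0-9_]+") && !part.matches("\\d+")) part
  else s"`${part.replace("`", "``")}`"

// Hypothetical stand-in for Identifier.namespace().quoted: quote each segment, join with ".".
def quotedNamespace(namespace: Seq[String]): String =
  namespace.map(quoteIfNeeded).mkString(".")

println(quotedNamespace(Seq("ns")))         // ns
println(quotedNamespace(Seq("ns1", "ns2"))) // ns1.ns2  (no segment needs quoting)
println(quotedNamespace(Seq("a.b", "c")))   // `a.b`.c  (the dotted segment is quoted)
println(quotedNamespace(Seq.empty))         // prints an empty line (root namespace)
```

The `a.b` case is what makes the row round-trip-safe: without per-segment quoting, `Seq("a.b", "c")` and `Seq("a", "b", "c")` would both render as `a.b.c`.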

`Database` row for v1 compatibility

v1 `DescribeTableCommand` (via `CatalogTable.toJsonLinkedHashMap`) emits `Catalog` / `Database` / `Table` rows, where `Database` is the single-string `database` field of `TableIdentifier`. To keep DESCRIBE consumers that read the `Database` row working uniformly across v1 (HMS) and v2, this PR also emits a `Database` row in the v2 output. The row is always present:

- For a single-segment namespace, `Database` is that segment (matches v1 exactly).
- For a multi-segment namespace, `Database` is the trailing segment. Multi-segment namespaces still surface their leaf segment under the v1-compat row, while consumers that need the full namespace read `Namespace`.
- For a root-level entity (empty namespace), `Database` is the empty string. The row is still emitted so the layout is uniform across all v2 namespaces.

`Database` alone is not round-trip-safe for multi-segment namespaces; `Namespace` is the canonical v2 representation.
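The three `Database` cases above reduce to "last segment, or empty". A minimal sketch, assuming the raw (unquoted) segment is emitted; `databaseRowValue` is a hypothetical name, not the helper used in the PR:

```scala
// Hypothetical helper for the v1-compat Database row: the trailing namespace segment,
// or "" for a root-level entity. Assumes the raw (unquoted) segment is emitted.
def databaseRowValue(namespace: Seq[String]): String =
  namespace.lastOption.getOrElse("")

println(databaseRowValue(Seq("ns")))         // ns
println(databaseRowValue(Seq("ns1", "ns2"))) // ns2
println(databaseRowValue(Seq.empty))         // prints an empty line

// Database alone is lossy: two different namespaces yield the same value.
println(databaseRowValue(Seq("a", "ns2")) == databaseRowValue(Seq("b", "ns2"))) // true
```

The last line shows why `Namespace`, not `Database`, has to be the canonical row: `a.ns2` and `b.ns2` both collapse to `ns2`.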

Why are the changes needed?

In a multi-catalog deployment, the catalog name is a first-class part of a v2 table or view identifier. The previous output buried it inside connector-controlled strings:

- `Table.name()` for tables is connector-defined: some connectors return `catalog.namespace.name`, others just `namespace.name`, and others use a custom format. As a result, `DESCRIBE TABLE` output looks different across catalogs even for the same logical table shape.
- `Identifier` for v2 views collapsed namespace and name into a single dotted string, so consumers had to split on the dot and could not unambiguously round-trip multi-level namespaces with dots in segments.

Splitting the components into `Catalog`, `Namespace`, `Database`, and `Table` / `View` rows:

- gives `DESCRIBE TABLE EXTENDED` a uniform shape across v2 connectors,
- makes the catalog name explicit and surfaceable when multiple v2 catalogs are configured,
- handles multi-level namespaces naturally via `Identifier.namespace().quoted`,
- aligns the table and view sections so consumers can read the same rows from either, switching only on the section header (`# Detailed Table Information` vs `# Detailed View Information`),
- through the always-emitted `Database` compatibility row, lets consumers built for v1 (HMS) keep working without changes (the value matches v1 exactly for single-segment namespaces),
- is parseable programmatically without splitting strings.

Does this PR introduce any user-facing change?

Yes, a slight output change in `DESCRIBE TABLE EXTENDED` for v2 tables and v2 views.

For v2 tables, single-segment namespace (most common):

- Before: `Name | testcat.ns.t |`
- After: `Catalog | testcat |`, `Namespace | ns |`, `Database | ns |`, `Table | t |`.

For v2 tables, multi-segment namespace:

- Before: `Name | testcat.ns1.ns2.t |`
- After: `Catalog | testcat |`, `Namespace | ns1.ns2 |`, `Database | ns2 |`, `Table | t |`.

For v2 views, single-segment namespace:

- Before: `Catalog | testcat |`, `Identifier | ns.v |`
- After: `Catalog | testcat |`, `Namespace | ns |`, `Database | ns |`, `View | v |`.

For v2 views, multi-segment namespace:

- Before: `Catalog | testcat |`, `Identifier | ns1.ns2.v |`
- After: `Catalog | testcat |`, `Namespace | ns1.ns2 |`, `Database | ns2 |`, `View | v |`.
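The "After" rows above can be reproduced from just the catalog name and the resolved identifier parts. This is a standalone sketch, not the code in `DescribeTableExec` / `DescribeV2ViewExec`: `identifierRows` is hypothetical, the quoting rule is an assumption modeled on Spark's helper, and the leaf name is assumed to be emitted unquoted.

```scala
// Hypothetical stand-in for Spark's quoteIfNeeded (assumed rule, see lead-in).
def quoteIfNeeded(part: String): String =
  if (part.matches("[a-zA-Z0-9_]+") && !part.matches("\\d+")) part
  else s"`${part.replace("`", "``")}`"

// Builds the identifier rows of the detail section in the order listed above.
// `entity` is "Table" for tables and "View" for views; the name is emitted as-is.
def identifierRows(
    catalog: String,
    namespace: Seq[String],
    name: String,
    entity: String): Seq[(String, String)] = Seq(
  "Catalog" -> catalog,
  "Namespace" -> namespace.map(quoteIfNeeded).mkString("."),
  "Database" -> namespace.lastOption.getOrElse(""),
  entity -> name)

// Render like the examples above: `col_name | data_type |`.
identifierRows("testcat", Seq("ns1", "ns2"), "v", "View").foreach { case (k, v) =>
  println(s"$k | $v |")
}
// Catalog | testcat |
// Namespace | ns1.ns2 |
// Database | ns2 |
// View | v |
```

Because tables and views go through the same row builder and differ only in the last key, a reader can use one code path for both sections.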

v1 paths (session-catalog tables and views via HMS) are unchanged. Tools that read DESCRIBE output should switch from concatenating or splitting the `Name` / `Identifier` strings to reading the structured rows.
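As a sketch of the consumer side, the function below reads the structured rows instead of splitting strings. It assumes the DESCRIBE result has already been collected into ordered `(col_name, data_type)` pairs (for example from `spark.sql(...).collect()`, dropping the comment column) and that the detail section runs from its header to the end of the output; `QualifiedName` and `readQualifiedName` are hypothetical names.

```scala
// Hypothetical result type for the structured identifier rows.
case class QualifiedName(catalog: String, namespace: String, name: String)

// Assumes `rows` holds (col_name, data_type) pairs from DESCRIBE TABLE EXTENDED, in order,
// and that the detail section runs from its header to the end of the output.
def readQualifiedName(rows: Seq[(String, String)]): Option[QualifiedName] = {
  val headerIdx = rows.indexWhere { case (k, _) =>
    k == "# Detailed Table Information" || k == "# Detailed View Information"
  }
  if (headerIdx < 0) None
  else {
    // The section header decides whether the leaf name is under "Table" or "View".
    val entityKey = if (rows(headerIdx)._1.endsWith("View Information")) "View" else "Table"
    val section = rows.drop(headerIdx + 1).toMap
    for {
      catalog <- section.get("Catalog")
      namespace <- section.get("Namespace")
      name <- section.get(entityKey)
    } yield QualifiedName(catalog, namespace, name)
  }
}

val describeRows = Seq(
  "id" -> "bigint",
  "" -> "",
  "# Detailed View Information" -> "",
  "Catalog" -> "testcat",
  "Namespace" -> "ns1.ns2",
  "Database" -> "ns2",
  "View" -> "v")
println(readQualifiedName(describeRows)) // Some(QualifiedName(testcat,ns1.ns2,v))
```

The only branch on entity type is the section header, which matches the alignment between the table and view sections described above.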

How was this patch tested?

- Updated the affected golden assertion in `DescribeTableSuite` (`DESCRIBE TABLE EXTENDED of a partitioned table`) to match the new row layout, including the `Database` compatibility row.
- Added focused tests in v2 `DescribeTableSuite` pinning the structured rows on a freshly created v2 table for both single-segment (`ns`) and multi-segment (`ns1.ns2`) namespaces. The multi-segment test pins that `Database` carries the trailing segment while `Namespace` carries the full dot-joined form.
- Added parallel tests in v2 `DescribeViewSuite` pinning the same layout for v2 views (single-segment and multi-segment).
- Removed the now-redundant `DESCRIBE TABLE EXTENDED on a non-view MetadataTable shows the real identifier` test in `DataSourceV2MetadataTableSuite` (the structured-row layout is pinned by the new tests in `v2.DescribeTableSuite`, and the identifier-passthrough behavior is no longer tied to `MetadataTable.name()`).

Ran:

build/sbt 'sql/testOnly
org.apache.spark.sql.execution.command.v2.DescribeTableSuite
org.apache.spark.sql.execution.command.v2.DescribeViewSuite
org.apache.spark.sql.connector.DataSourceV2MetadataTableSuite
org.apache.spark.sql.connector.DataSourceV2MetadataViewSuite'

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

…CRIBE TABLE EXTENDED for v2 tables and views

### What changes were proposed in this pull request?

Standardize the `# Detailed Table Information` / `# Detailed View Information` block in `DESCRIBE TABLE EXTENDED` output for v2 tables and views to emit structured rows derived from the resolved identifier:

- For tables (`DescribeTableExec`): the single `Name` row that came from `Table.name()` is replaced by `Catalog`, `Namespace`, and `Table` (plus a `Database` row when the namespace is a single segment, for v1 compatibility; see below).
- For views (`DescribeV2ViewExec`): the `Catalog` + `Identifier` pair (where `Identifier` was a single string concatenating namespace and name with `.`) is replaced by `Catalog`, `Namespace`, and `View` (plus the same `Database` row when the namespace is single-segment).

The catalog name and resolved `Identifier` are threaded from `ResolvedTable` / `ResolvedPersistentView` through the v2 execs. `DescribeTablePartitionExec` is updated to pass the catalog name to the inner `DescribeTableExec` it constructs for the schema/partition header.

The `Namespace` row uses `Identifier.namespace().quoted` (dot-separated, with back-tick quoting only on segments that need it), matching the existing Spark convention for multi-segment namespaces. This keeps the row round-trip-safe for namespaces with dots in segments while staying readable for the common single-level case.

#### `Database` row for v1 compatibility

v1 `DescribeTableCommand` (via `CatalogTable.toJsonLinkedHashMap`) already emits `Catalog` / `Database` / `Table` rows, where `Database` is the single-string `database` field of `TableIdentifier`. To keep DESCRIBE consumers that read the `Database` row working uniformly across v1 (HMS) and v2 single-level catalogs, this PR additionally emits a `Database` row in the v2 output **when the namespace is a single segment**; its value is that single segment. Multi-segment v2 namespaces can't be rendered as a single-string `database` losslessly, so the `Database` row is omitted in that case and consumers must read `Namespace` instead. The `Namespace` row is *always* present and is the canonical v2 representation.

### Why are the changes needed?

In a multi-catalog deployment, the catalog name is a first-class part of a v2 table or view identifier. The previous output buried it inside connector-controlled strings:

- `Table.name()` for tables is connector-defined; some connectors return `catalog.namespace.name`, others just `namespace.name`, others use a custom format. The result is that `DESCRIBE TABLE` output looks different across catalogs even for the same logical table shape.
- `Identifier` for v2 views collapsed namespace and name into a single dotted string, so consumers had to parse the dot back out and could not unambiguously round-trip multi-level namespaces with dots in segments.

Splitting the components into `Catalog`, `Namespace`, and `Table` / `View` rows:

- gives `DESCRIBE TABLE EXTENDED` a uniform shape across v2 connectors,
- makes the catalog name explicit and surfaceable when multiple v2 catalogs are configured,
- handles multi-level namespaces naturally via `Identifier.namespace().quoted`,
- aligns the table and view sections so consumers can read the same rows from either, switching only on the section header (`# Detailed Table Information` vs `# Detailed View Information`),
- with the `Database` compatibility row, lets consumers built for v1 (HMS) keep working without changes when the v2 namespace happens to be single-segment,
- is parseable programmatically without splitting strings.

### Does this PR introduce any user-facing change?

Yes, slight output change in `DESCRIBE TABLE EXTENDED` for v2 tables and v2 views.

For v2 tables, single-segment namespace (most common):

- Before: `Name | testcat.ns.t | `
- After: `Catalog | testcat | `, `Namespace | ns | `, `Database | ns | `, `Table | t | `.

For v2 tables, multi-segment namespace:

- Before: `Name | testcat.ns1.ns2.t | `
- After: `Catalog | testcat | `, `Namespace | ns1.ns2 | `, `Table | t | `.

For v2 views, single-segment namespace:

- Before: `Catalog | testcat | `, `Identifier | ns.v | `
- After: `Catalog | testcat | `, `Namespace | ns | `, `Database | ns | `, `View | v | `.

For v2 views, multi-segment namespace:

- Before: `Catalog | testcat | `, `Identifier | ns1.ns2.v | `
- After: `Catalog | testcat | `, `Namespace | ns1.ns2 | `, `View | v | `.

v1 paths (session-catalog tables and views via HMS) are unchanged. Tools that read DESCRIBE output should switch from concatenating `Name` / `Identifier` to reading the structured rows.

### How was this patch tested?

- Updated the affected golden assertion in `DescribeTableSuite` (`DESCRIBE TABLE EXTENDED of a partitioned table`) to match the new row layout including the `Database` compatibility row.
- Added a focused test `DESCRIBE TABLE EXTENDED emits structured Catalog/Namespace/Table rows` in v2 `DescribeTableSuite` that pins the structured rows on a freshly created v2 table, including the `Database` row for the single-segment-namespace case.
- Removed the now-redundant `DESCRIBE TABLE EXTENDED on a non-view MetadataTable shows the real identifier` test in `DataSourceV2MetadataTableSuite` (the structured-row layout is what's pinned by the new test in `v2.DescribeTableSuite`; the identifier-passthrough behavior is no longer tied to `MetadataTable.name()`).

Ran:

build/sbt 'sql/testOnly \
  org.apache.spark.sql.execution.command.v2.DescribeTableSuite \
  org.apache.spark.sql.connector.DataSourceV2MetadataTableSuite \
  org.apache.spark.sql.connector.DataSourceV2MetadataViewSuite'

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

…e as root

- Pin multi-segment behavior (Namespace dot-quoted, Database omitted) for v2 tables in DescribeTableSuite.
- Pin structured Catalog/Namespace/View rows + multi-segment behavior for v2 views in DescribeViewSuite.
- Document in DescribeTableExec that an empty namespace renders as an empty Namespace row (root namespace is intentional and canonical).

Co-authored-by: Isaac

Two new tests were titled "...and dot-quotes Namespace" but only verified the dot-joined render of a 2-segment namespace; neither segment in the test (`ns1`, `ns2`) actually triggers `quoteIfNeeded`, so the per-segment back-tick quoting wasn't exercised. Rename to "joins Namespace with dots" and add a parenthetical noting that quoteIfNeeded isn't exercised here. Also annotate the inner DescribeTableExec construction in DescribeTablePartitionExec to make explicit that catalogName / tableIdent are passed for ctor completeness only on this path, since addBaseDescription (the only method invoked there) reads neither field.

Co-authored-by: Isaac


…tion exec

Pulls schema/partitioning/clustering row formatting out of DescribeTableExec into a shared `DescribeTableBaseRows` trait. DescribeTablePartitionExec now mixes in the trait and calls addBaseDescription directly, so it no longer constructs an inner DescribeTableExec just to invoke the helper -- and no longer needs the unused `catalogName` constructor arg. Also tightens a test comment and a Javadoc to read as current-state rather than referring to the prior shape.



Replace the conditional "Database row only when namespace.length == 1" with an unconditional emission: value is the trailing namespace segment (or the empty string for a root-level entity). Multi-segment namespaces now surface their leaf segment as `Database` instead of suppressing the row, keeping the v1-compat row uniform across all v2 layouts. Update the helper Scaladoc and the multi-segment table/view tests accordingly.

Co-authored-by: Isaac

cloud-fan · 2026-05-01T17:31:07Z

the docker test failure is unrelated, thanks for the review, merging to master!

cloud-fan changed the title ~~[SPARK-XXXXX][SQL] Use structured Catalog/Namespace/Table rows in DESCRIBE TABLE EXTENDED for v2 tables and views~~ [SPARK-56678][SQL] Use structured Catalog/Namespace/Table rows in DESCRIBE TABLE EXTENDED for v2 tables and views Apr 30, 2026

cloud-fan force-pushed the describe-table-view-structured-rows branch from a8548fe to 2d746f5 April 30, 2026 14:17

cloud-fan added 10 commits April 30, 2026 15:19

Sharpen DescribeTablePartitionExec comment on unused inner-exec fields (d98edb7)

Clarify that `catalogName` / `tableIdent` are unused because the call site invokes `addBaseDescription` directly and never reaches `run()` / `addTableDetails`, not because of `isExtended = false` per se.

Clarify DESCRIBE EXTENDED v2 row docs and DescribeV2ViewExec scaladoc (186cc51)

Update MetadataTable.name javadoc after DESCRIBE row restructure (437abdd)

Centralize Catalog/Namespace/<entity> rows and tighten Describe trait comments (9c6426f)

Add Namespace to V2 describe filter and decouple identifier trait (02a06ca)

Lift emptyRow to DescribeTableBaseRows and clarify empty-namespace contract (426ae4c)

gengliangwang approved these changes May 1, 2026


cloud-fan closed this in 3a685b1 May 1, 2026

cloud-fan wants to merge 11 commits into apache:master from cloud-fan:describe-table-view-structured-rows
