[SPARK-56678][SQL] Use structured Catalog/Namespace/Table rows in DESCRIBE TABLE EXTENDED for v2 tables and views#55625
Closed
cloud-fan wants to merge 11 commits into apache:master
…CRIBE TABLE EXTENDED for v2 tables and views
### What changes were proposed in this pull request?
Standardize the `# Detailed Table Information` / `# Detailed View
Information` block in `DESCRIBE TABLE EXTENDED` output for v2 tables
and views to emit structured rows derived from the resolved
identifier:
- For tables (`DescribeTableExec`): the single `Name` row that came
from `Table.name()` is replaced by `Catalog`, `Namespace`, and
`Table` (plus a `Database` row when the namespace is a single
segment, for v1 compatibility — see below).
- For views (`DescribeV2ViewExec`): the `Catalog` + `Identifier`
pair (where `Identifier` was a single string concatenating
namespace and name with `.`) is replaced by `Catalog`,
`Namespace`, and `View` (plus the same `Database` row when the
namespace is single-segment).
The catalog name and resolved `Identifier` are threaded from
`ResolvedTable` / `ResolvedPersistentView` through the v2 execs.
`DescribeTablePartitionExec` is updated to pass the catalog name to
the inner `DescribeTableExec` it constructs for the schema/partition
header.
The `Namespace` row uses `Identifier.namespace().quoted` —
dot-separated, with back-tick quoting only on segments that need it
— matching the existing Spark convention for multi-segment
namespaces. This keeps the row round-trip-safe for namespaces with
dots in segments while staying readable for the common single-level
case.
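As an illustration of the rendering rule described above, here is a minimal Python sketch of dot-joining with per-segment back-tick quoting. It approximates the behavior, not Spark's actual code: the function names are hypothetical, and the assumption is that a segment needs quoting unless it consists only of letters, digits, and underscores and is not all digits.

```python
import re
from typing import Sequence

# Assumption: a "plain" segment is letters, digits, and underscores only.
_PLAIN = re.compile(r"[A-Za-z0-9_]+")


def quote_if_needed(part: str) -> str:
    """Return the segment unchanged if plain, else wrap it in back-ticks."""
    if _PLAIN.fullmatch(part) and not part.isdigit():
        return part
    # Embedded back-ticks are doubled so the quoted form stays unambiguous.
    return "`" + part.replace("`", "``") + "`"


def quoted_namespace(namespace: Sequence[str]) -> str:
    """Dot-join the segments, quoting only the ones that need it."""
    return ".".join(quote_if_needed(p) for p in namespace)
```

With this sketch, `["ns1", "ns2"]` renders as `ns1.ns2`, while `["a.b", "c"]` renders with back-ticks around `a.b`, so it cannot be confused with the three-segment namespace `["a", "b", "c"]`.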
#### `Database` row for v1 compatibility
v1 `DescribeTableCommand` (via `CatalogTable.toJsonLinkedHashMap`)
already emits `Catalog` / `Database` / `Table` rows, where
`Database` is the single-string `database` field of
`TableIdentifier`. To keep DESCRIBE consumers that read the
`Database` row working uniformly across v1 (HMS) and v2 single-level
catalogs, this PR additionally emits a `Database` row in the v2
output **when the namespace is a single segment** — its value is
that single segment.
Multi-segment v2 namespaces can't be rendered as a single-string
`database` losslessly, so the `Database` row is omitted in that
case and consumers must read `Namespace` instead. The `Namespace`
row is *always* present and is the canonical v2 representation.
### Why are the changes needed?
In a multi-catalog deployment, the catalog name is a first-class
part of a v2 table or view identifier. The previous output buried
it inside connector-controlled strings:
- `Table.name()` for tables is connector-defined; some connectors
return `catalog.namespace.name`, others just `namespace.name`,
others use a custom format. The result is that `DESCRIBE TABLE`
output looks different across catalogs even for the same logical
table shape.
- `Identifier` for v2 views collapsed namespace and name into a
single dotted string, so consumers had to parse the dot back out
and could not unambiguously round-trip multi-level namespaces
with dots in segments.
Splitting the components into `Catalog`, `Namespace`, and
`Table` / `View` rows:
- gives `DESCRIBE TABLE EXTENDED` a uniform shape across v2
connectors,
- makes the catalog name explicit and surfaceable when multiple v2
catalogs are configured,
- handles multi-level namespaces naturally via
`Identifier.namespace().quoted`,
- aligns the table and view sections so consumers can read the same
rows from either, switching only on the section header
(`# Detailed Table Information` vs `# Detailed View Information`),
- with the `Database` compatibility row, lets consumers built for
v1 (HMS) keep working without changes when the v2 namespace
happens to be single-segment,
- is parseable programmatically without splitting strings.
### Does this PR introduce any user-facing change?
Yes, slight output change in `DESCRIBE TABLE EXTENDED` for v2 tables
and v2 views.
For v2 tables, single-segment namespace (most common):
- Before: `Name | testcat.ns.t | `
- After: `Catalog | testcat | `, `Namespace | ns | `,
`Database | ns | `, `Table | t | `.
For v2 tables, multi-segment namespace:
- Before: `Name | testcat.ns1.ns2.t | `
- After: `Catalog | testcat | `, `Namespace | ns1.ns2 | `,
`Table | t | `.
For v2 views, single-segment namespace:
- Before: `Catalog | testcat | `, `Identifier | ns.v | `
- After: `Catalog | testcat | `, `Namespace | ns | `,
`Database | ns | `, `View | v | `.
For v2 views, multi-segment namespace:
- Before: `Catalog | testcat | `, `Identifier | ns1.ns2.v | `
- After: `Catalog | testcat | `, `Namespace | ns1.ns2 | `,
`View | v | `.
v1 paths (session-catalog tables and views via HMS) are unchanged.
Tools that read DESCRIBE output should switch from concatenating
`Name` / `Identifier` to reading the structured rows.
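A consumer-side sketch of reading the structured rows might look like the following. It assumes the DESCRIBE output has already been collected into `(col_name, data_type, comment)` tuples and that the detail section ends at the next row whose first column starts with `#`; the function name, the `kind` key, and the sample data are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Tuple[str, str, Optional[str]]

SECTION_HEADERS = {
    "# Detailed Table Information": "table",
    "# Detailed View Information": "view",
}
IDENTIFIER_KEYS = {"Catalog", "Namespace", "Database", "Table", "View"}


def read_identifier_rows(rows: Iterable[Row]) -> Dict[str, str]:
    """Collect identifier rows that follow the detail section header."""
    found: Dict[str, str] = {}
    in_section = False
    for col_name, data_type, _comment in rows:
        if col_name in SECTION_HEADERS:
            in_section = True
            found["kind"] = SECTION_HEADERS[col_name]
        elif in_section and col_name.startswith("#"):
            break  # assumption: another "#" header closes the section
        elif in_section and col_name in IDENTIFIER_KEYS:
            found[col_name] = data_type
    return found


# Hypothetical sample mirroring the single-segment table case above.
sample = [
    ("id", "bigint", None),
    ("", "", ""),
    ("# Detailed Table Information", "", ""),
    ("Catalog", "testcat", ""),
    ("Namespace", "ns", ""),
    ("Database", "ns", ""),
    ("Table", "t", ""),
]
info = read_identifier_rows(sample)
```

Rows before the section header are ignored, so a user column that happens to be called `Catalog` or `Table` is not mistaken for an identifier row.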
### How was this patch tested?
- Updated the affected golden assertion in `DescribeTableSuite`
(`DESCRIBE TABLE EXTENDED of a partitioned table`) to match the
new row layout including the `Database` compatibility row.
- Added a focused test `DESCRIBE TABLE EXTENDED emits structured
Catalog/Namespace/Table rows` in v2 `DescribeTableSuite` that
pins the structured rows on a freshly created v2 table, including
the `Database` row for the single-segment-namespace case.
- Removed the now-redundant `DESCRIBE TABLE EXTENDED on a non-view
MetadataTable shows the real identifier` test in
`DataSourceV2MetadataTableSuite` (the structured-row layout is
what's pinned by the new test in `v2.DescribeTableSuite`; the
identifier-passthrough behavior is no longer tied to
`MetadataTable.name()`).
Ran:
build/sbt 'sql/testOnly \
org.apache.spark.sql.execution.command.v2.DescribeTableSuite \
org.apache.spark.sql.connector.DataSourceV2MetadataTableSuite \
org.apache.spark.sql.connector.DataSourceV2MetadataViewSuite'
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7
a8548fe to 2d746f5
…e as root
- Pin multi-segment behavior (Namespace dot-quoted, Database omitted) for v2 tables in DescribeTableSuite.
- Pin structured Catalog/Namespace/View rows + multi-segment behavior for v2 views in DescribeViewSuite.
- Document in DescribeTableExec that an empty namespace renders as an empty Namespace row (root namespace is intentional and canonical).
Co-authored-by: Isaac
Two new tests were titled "...and dot-quotes Namespace" but only verified the dot-joined render of a 2-segment namespace; neither segment in the test (`ns1`, `ns2`) actually triggers `quoteIfNeeded`, so the per-segment back-tick quoting wasn't exercised. Rename to "joins Namespace with dots" and add a parenthetical noting that `quoteIfNeeded` isn't exercised here.
Also annotate the inner `DescribeTableExec` construction in `DescribeTablePartitionExec` to make explicit that `catalogName` / `tableIdent` are passed for ctor completeness only on this path, since `addBaseDescription` (the only method invoked there) reads neither field.
Co-authored-by: Isaac
Clarify that `catalogName` / `tableIdent` are unused because the call site invokes `addBaseDescription` directly and never reaches `run()` / `addTableDetails`, not because of `isExtended = false` per se.
…tion exec
Pulls schema/partitioning/clustering row formatting out of `DescribeTableExec` into a shared `DescribeTableBaseRows` trait. `DescribeTablePartitionExec` now mixes in the trait and calls `addBaseDescription` directly, so it no longer constructs an inner `DescribeTableExec` just to invoke the helper, and no longer needs the unused `catalogName` constructor arg.
Also tightens a test comment and a Javadoc to read as current-state rather than referring to the prior shape.
Replace the conditional "Database row only when namespace.length == 1" with an unconditional emission: value is the trailing namespace segment (or the empty string for a root-level entity). Multi-segment namespaces now surface their leaf segment as `Database` instead of suppressing the row, keeping the v1-compat row uniform across all v2 layouts. Update the helper Scaladoc and the multi-segment table/view tests accordingly.
Co-authored-by: Isaac
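The unconditional rule from this commit can be sketched in a few lines of Python. This is only an illustration of the stated rule (trailing segment, or the empty string for a root-level entity); the function name is hypothetical and the real change lives in the Scala execs.

```python
from typing import Sequence


def database_value(namespace: Sequence[str]) -> str:
    """Value of the v1-compat Database row for a v2 namespace."""
    # Trailing segment when there is one, empty string for the root namespace.
    return namespace[-1] if namespace else ""
```

Because only the leaf segment survives, `["ns1", "ns2"]` and `["other", "ns2"]` produce the same `Database` value, which is why consumers that need the full namespace have to read the `Namespace` row.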
gengliangwang approved these changes on May 1, 2026.
PR author (Contributor): The docker test failure is unrelated. Thanks for the review, merging to master!
### What changes were proposed in this pull request?
Standardize the `# Detailed Table Information` / `# Detailed View
Information` block in `DESCRIBE TABLE EXTENDED` output for v2 tables
and views to emit structured rows derived from the resolved
identifier:
- For tables (`DescribeTableExec`): the single `Name` row that came
from `Table.name()` is replaced by `Catalog`, `Namespace`,
`Database`, and `Table`.
- For views (`DescribeV2ViewExec`): the `Catalog` + `Identifier`
pair (where `Identifier` was a single string concatenating
namespace and name with `.`) is replaced by `Catalog`,
`Namespace`, `Database`, and `View`.
The catalog name and resolved `Identifier` are threaded from
`ResolvedTable` / `ResolvedPersistentView` through the v2 execs.
`DescribeTablePartitionExec` is updated to pass the catalog name to
the inner `DescribeTableExec` it constructs for the schema/partition
header.
The `Namespace` row uses `Identifier.namespace().quoted`:
dot-separated, with back-tick quoting only on segments that need it,
matching the existing Spark convention for multi-segment
namespaces. This keeps the row round-trip-safe for namespaces with
dots in segments while staying readable for the common single-level
case.
#### `Database` row for v1 compatibility
v1 `DescribeTableCommand` (via `CatalogTable.toJsonLinkedHashMap`)
emits `Catalog` / `Database` / `Table` rows, where `Database` is
the single-string `database` field of `TableIdentifier`. To keep
DESCRIBE consumers that read the `Database` row working uniformly
across v1 (HMS) and v2, this PR also emits a `Database` row in the
v2 output. The row is always present:
- Single-segment namespace: `Database` is that single
segment (matches v1 exactly).
- Multi-segment namespace: `Database` is the trailing
segment; multi-segment namespaces still surface their leaf
segment under the v1-compat row, while consumers that need the
full namespace read `Namespace`.
- Empty (root) namespace: `Database` is the
empty string. The row is still emitted so the layout is uniform
across all v2 namespaces.
`Database` alone is not round-trip-safe for multi-segment cases;
`Namespace` is the canonical v2 representation.
### Why are the changes needed?
In a multi-catalog deployment, the catalog name is a first-class
part of a v2 table or view identifier. The previous output buried
it inside connector-controlled strings:
- `Table.name()` for tables is connector-defined; some connectors
return `catalog.namespace.name`, others just `namespace.name`,
others use a custom format. The result is that `DESCRIBE TABLE`
output looks different across catalogs even for the same logical
table shape.
- `Identifier` for v2 views collapsed namespace and name into a
single dotted string, so consumers had to parse the dot back out
and could not unambiguously round-trip multi-level namespaces
with dots in segments.
Splitting the components into `Catalog`, `Namespace`, `Database`,
and `Table` / `View` rows:
- gives `DESCRIBE TABLE EXTENDED` a uniform shape across v2
connectors,
- makes the catalog name explicit and surfaceable when multiple v2
catalogs are configured,
- handles multi-level namespaces naturally via
`Identifier.namespace().quoted`,
- aligns the table and view sections so consumers can read the same
rows from either, switching only on the section header
(`# Detailed Table Information` vs `# Detailed View Information`),
- with the `Database` compatibility row, lets
consumers built for v1 (HMS) keep working without changes,
- is parseable programmatically without splitting strings.
### Does this PR introduce any user-facing change?
Yes, slight output change in `DESCRIBE TABLE EXTENDED` for v2 tables
and v2 views.
For v2 tables, single-segment namespace (most common):
- Before: `Name | testcat.ns.t | `
- After: `Catalog | testcat | `, `Namespace | ns | `,
`Database | ns | `, `Table | t | `.
For v2 tables, multi-segment namespace:
- Before: `Name | testcat.ns1.ns2.t | `
- After: `Catalog | testcat | `, `Namespace | ns1.ns2 | `,
`Database | ns2 | `, `Table | t | `.
For v2 views, single-segment namespace:
- Before: `Catalog | testcat | `, `Identifier | ns.v | `
- After: `Catalog | testcat | `, `Namespace | ns | `,
`Database | ns | `, `View | v | `.
For v2 views, multi-segment namespace:
- Before: `Catalog | testcat | `, `Identifier | ns1.ns2.v | `
- After: `Catalog | testcat | `, `Namespace | ns1.ns2 | `,
`Database | ns2 | `, `View | v | `.
v1 paths (session-catalog tables and views via HMS) are unchanged.
Tools that read DESCRIBE output should switch from concatenating
`Name` / `Identifier` to reading the structured rows.
### How was this patch tested?
- Updated the affected golden assertion in `DescribeTableSuite`
(`DESCRIBE TABLE EXTENDED of a partitioned table`) to match the
new row layout including the `Database` compatibility row.
- Added tests in v2 `DescribeTableSuite` pinning the
structured rows on a freshly created v2 table for both
single-segment (`ns`) and multi-segment (`ns1.ns2`) namespaces;
the multi-segment test pins that `Database` carries the trailing
segment while `Namespace` carries the full dot-joined form.
- Added tests in `DescribeViewSuite` pinning the same
layout for v2 views (single-segment and multi-segment).
- Removed the now-redundant `DESCRIBE TABLE EXTENDED on a non-view
MetadataTable shows the real identifier` test in
`DataSourceV2MetadataTableSuite` (the structured-row layout is
what's pinned by the new tests in `v2.DescribeTableSuite`; the
identifier-passthrough behavior is no longer tied to
`MetadataTable.name()`).
Ran:
build/sbt 'sql/testOnly
org.apache.spark.sql.execution.command.v2.DescribeTableSuite
org.apache.spark.sql.execution.command.v2.DescribeViewSuite
org.apache.spark.sql.connector.DataSourceV2MetadataTableSuite
org.apache.spark.sql.connector.DataSourceV2MetadataViewSuite'
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7