[SPARK-56678][SQL] Use structured Catalog/Namespace/Table rows in DESCRIBE TABLE EXTENDED for v2 tables and views#55625

Closed
cloud-fan wants to merge 11 commits into apache:master from cloud-fan:describe-table-view-structured-rows

@cloud-fan cloud-fan commented Apr 30, 2026

What changes were proposed in this pull request?

Standardize the # Detailed Table Information / # Detailed View Information block in DESCRIBE TABLE EXTENDED output for v2 tables
and views to emit structured rows derived from the resolved
identifier:

  • For tables (DescribeTableExec): the single Name row that came
    from Table.name() is replaced by Catalog, Namespace,
    Database, and Table.
  • For views (DescribeV2ViewExec): the Catalog + Identifier
    pair (where Identifier was a single string concatenating
    namespace and name with .) is replaced by Catalog,
    Namespace, Database, and View.

The catalog name and resolved Identifier are threaded from
ResolvedTable / ResolvedPersistentView through the v2 execs.
DescribeTablePartitionExec is updated to pass the catalog name to
the inner DescribeTableExec it constructs for the schema/partition
header.

The Namespace row uses Identifier.namespace().quoted: dot-separated,
with back-tick quoting only on segments that need it, matching the
existing Spark convention for multi-segment namespaces. This keeps the
row round-trip-safe for namespaces with dots in segments while staying
readable for the common single-level case.
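
As an illustration of the rendering described above, here is a minimal Python sketch of per-segment quoting followed by a dot join. It is not the Spark implementation: the helper names `quote_if_needed` and `render_namespace` are hypothetical, and the rule for when a segment "needs" quoting (anything other than letters, digits, and underscores, or an all-digit segment) is an assumption about Spark's convention.

```python
import re
from typing import Sequence


def quote_if_needed(part: str) -> str:
    """Back-tick quote a segment only when it needs it (assumed rule)."""
    # Assumption: plain identifiers made of letters, digits, and underscores
    # are left alone, unless they consist of digits only.
    if re.fullmatch(r"[a-zA-Z0-9_]+", part) and not re.fullmatch(r"\d+", part):
        return part
    # Embedded back-ticks are doubled so the quoted form stays unambiguous.
    return "`" + part.replace("`", "``") + "`"


def render_namespace(namespace: Sequence[str]) -> str:
    """Join the (possibly quoted) segments with dots."""
    return ".".join(quote_if_needed(p) for p in namespace)


print(render_namespace(["ns1", "ns2"]))  # ns1.ns2
print(render_namespace(["a.b", "c"]))    # `a.b`.c
```

Because a segment containing a dot is quoted, a reader can split the rendered string back into its segments without ambiguity, which is what the round-trip claim relies on.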

Database row for v1 compatibility

v1 DescribeTableCommand (via CatalogTable.toJsonLinkedHashMap)
emits Catalog / Database / Table rows, where Database is
the single-string database field of TableIdentifier. To keep
DESCRIBE consumers that read the Database row working uniformly
across v1 (HMS) and v2, this PR also emits a Database row in the
v2 output. The row is always present:

  • For a single-segment namespace, Database is that single
    segment (matches v1 exactly).
  • For a multi-segment namespace, Database is the trailing
    segment — multi-segment namespaces still surface their leaf
    segment under the v1-compat row, while consumers that need the
    full namespace read Namespace.
  • For a root-level entity (empty namespace), Database is the
    empty string. The row is still emitted so the layout is uniform
    across all v2 namespaces.

Database alone is not round-trip-safe for multi-segment cases;
Namespace is the canonical v2 representation.
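
The three cases above reduce to "take the trailing segment, or the empty string if there is none". A small Python sketch of that rule follows; `database_value` is a hypothetical name, and whether the real row applies any quoting to the segment is not stated here, so the sketch returns it unchanged.

```python
from typing import Sequence


def database_value(namespace: Sequence[str]) -> str:
    """Value of the v1-compat Database row for a v2 namespace.

    Trailing segment for non-empty namespaces, empty string for a
    root-level entity. Returning the raw segment is an assumption.
    """
    return namespace[-1] if namespace else ""


print(database_value(["ns"]))          # ns
print(database_value(["ns1", "ns2"]))  # ns2
print(repr(database_value([])))        # ''
```

Two different namespaces such as `["a", "x"]` and `["b", "x"]` map to the same Database value, which is why the row is not round-trip-safe on its own.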

Why are the changes needed?

In a multi-catalog deployment, the catalog name is a first-class
part of a v2 table or view identifier. The previous output buried
it inside connector-controlled strings:

  • Table.name() for tables is connector-defined; some connectors
    return catalog.namespace.name, others just namespace.name,
    others use a custom format. The result is that DESCRIBE TABLE
    output looks different across catalogs even for the same logical
    table shape.
  • Identifier for v2 views collapsed namespace and name into a
    single dotted string, so consumers had to parse the dot back out
    and could not unambiguously round-trip multi-level namespaces
    with dots in segments.

Splitting the components into Catalog, Namespace,
Database, and Table / View rows:

  • gives DESCRIBE TABLE EXTENDED a uniform shape across v2
    connectors,
  • makes the catalog name explicit and surfaceable when multiple v2
    catalogs are configured,
  • handles multi-level namespaces naturally via
    Identifier.namespace().quoted,
  • aligns the table and view sections so consumers can read the same
    rows from either, switching only on the section header
    (# Detailed Table Information vs # Detailed View Information),
  • with the always-emitted Database compatibility row, lets
    consumers built for v1 (HMS) keep working without changes,
  • is parseable programmatically without splitting strings.

Does this PR introduce any user-facing change?

Yes, slight output change in DESCRIBE TABLE EXTENDED for v2 tables
and v2 views.

For v2 tables, single-segment namespace (most common):

  • Before: Name | testcat.ns.t |
  • After: Catalog | testcat | , Namespace | ns | ,
    Database | ns | , Table | t | .

For v2 tables, multi-segment namespace:

  • Before: Name | testcat.ns1.ns2.t |
  • After: Catalog | testcat | , Namespace | ns1.ns2 | ,
    Database | ns2 | , Table | t | .

For v2 views, single-segment namespace:

  • Before: Catalog | testcat | , Identifier | ns.v |
  • After: Catalog | testcat | , Namespace | ns | ,
    Database | ns | , View | v | .

For v2 views, multi-segment namespace:

  • Before: Catalog | testcat | , Identifier | ns1.ns2.v |
  • After: Catalog | testcat | , Namespace | ns1.ns2 | ,
    Database | ns2 | , View | v | .

v1 paths (session-catalog tables and views via HMS) are unchanged.
Tools that read DESCRIBE output should switch from parsing the
Name / Identifier strings to reading the structured rows.
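
For tools making that switch, a consumer could look roughly like the Python sketch below. It assumes the DESCRIBE output has already been collected as `(col_name, data_type, comment)` tuples; `read_identifier_rows` and the extra `Kind` key are hypothetical names introduced for illustration, and the sample rows mirror the multi-segment table example above.

```python
from typing import Dict, Iterable, Tuple

# One DESCRIBE output row: (col_name, data_type, comment).
Row = Tuple[str, str, str]

SECTION_HEADERS = ("# Detailed Table Information", "# Detailed View Information")
WANTED = ("Catalog", "Namespace", "Database", "Table", "View")


def read_identifier_rows(rows: Iterable[Row]) -> Dict[str, str]:
    """Collect the structured identifier rows from the detailed section."""
    found: Dict[str, str] = {}
    in_section = False
    for col_name, data_type, _comment in rows:
        if col_name in SECTION_HEADERS:
            # Only rows after the section header count; a data column that
            # happens to be called "Table" or "Catalog" is ignored.
            in_section = True
            found["Kind"] = "view" if "View" in col_name else "table"
            continue
        if in_section and col_name in WANTED:
            found[col_name] = data_type
    return found


sample = [
    ("id", "bigint", ""),
    ("", "", ""),
    ("# Detailed Table Information", "", ""),
    ("Catalog", "testcat", ""),
    ("Namespace", "ns1.ns2", ""),
    ("Database", "ns2", ""),
    ("Table", "t", ""),
]
info = read_identifier_rows(sample)
print(info["Catalog"], info["Namespace"], info["Table"])  # testcat ns1.ns2 t
```

The same function handles views by switching on the section header, so a caller only needs to check `Kind` and then read either `Table` or `View`.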

How was this patch tested?

  • Updated the affected golden assertion in DescribeTableSuite
    (DESCRIBE TABLE EXTENDED of a partitioned table) to match the
    new row layout including the Database compatibility row.
  • Added focused tests in v2 DescribeTableSuite pinning the
    structured rows on a freshly created v2 table for both
    single-segment (ns) and multi-segment (ns1.ns2) namespaces —
    the multi-segment test pins that Database carries the trailing
    segment while Namespace carries the full dot-joined form.
  • Added parallel tests in v2 DescribeViewSuite pinning the same
    layout for v2 views (single-segment and multi-segment).
  • Removed the now-redundant "DESCRIBE TABLE EXTENDED on a non-view
    MetadataTable shows the real identifier" test in
    DataSourceV2MetadataTableSuite (the structured-row layout is
    what's pinned by the new tests in v2.DescribeTableSuite; the
    identifier-passthrough behavior is no longer tied to
    MetadataTable.name()).

Ran:

  build/sbt 'sql/testOnly
    org.apache.spark.sql.execution.command.v2.DescribeTableSuite
    org.apache.spark.sql.execution.command.v2.DescribeViewSuite
    org.apache.spark.sql.connector.DataSourceV2MetadataTableSuite
    org.apache.spark.sql.connector.DataSourceV2MetadataViewSuite'

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

@cloud-fan cloud-fan changed the title from "[SPARK-XXXXX][SQL] Use structured Catalog/Namespace/Table rows in DESCRIBE TABLE EXTENDED for v2 tables and views" to "[SPARK-56678][SQL] Use structured Catalog/Namespace/Table rows in DESCRIBE TABLE EXTENDED for v2 tables and views" on Apr 30, 2026

…CRIBE TABLE EXTENDED for v2 tables and views

### What changes were proposed in this pull request?

Standardize the `# Detailed Table Information` / `# Detailed View
Information` block in `DESCRIBE TABLE EXTENDED` output for v2 tables
and views to emit structured rows derived from the resolved
identifier:

- For tables (`DescribeTableExec`): the single `Name` row that came
  from `Table.name()` is replaced by `Catalog`, `Namespace`, and
  `Table` (plus a `Database` row when the namespace is a single
  segment, for v1 compatibility — see below).
- For views (`DescribeV2ViewExec`): the `Catalog` + `Identifier`
  pair (where `Identifier` was a single string concatenating
  namespace and name with `.`) is replaced by `Catalog`,
  `Namespace`, and `View` (plus the same `Database` row when the
  namespace is single-segment).

The catalog name and resolved `Identifier` are threaded from
`ResolvedTable` / `ResolvedPersistentView` through the v2 execs.
`DescribeTablePartitionExec` is updated to pass the catalog name to
the inner `DescribeTableExec` it constructs for the schema/partition
header.

The `Namespace` row uses `Identifier.namespace().quoted` —
dot-separated, with back-tick quoting only on segments that need it
— matching the existing Spark convention for multi-segment
namespaces. This keeps the row round-trip-safe for namespaces with
dots in segments while staying readable for the common single-level
case.

#### `Database` row for v1 compatibility

v1 `DescribeTableCommand` (via `CatalogTable.toJsonLinkedHashMap`)
already emits `Catalog` / `Database` / `Table` rows, where
`Database` is the single-string `database` field of
`TableIdentifier`. To keep DESCRIBE consumers that read the
`Database` row working uniformly across v1 (HMS) and v2 single-level
catalogs, this PR additionally emits a `Database` row in the v2
output **when the namespace is a single segment** — its value is
that single segment.

Multi-segment v2 namespaces can't be rendered as a single-string
`database` losslessly, so the `Database` row is omitted in that
case and consumers must read `Namespace` instead. The `Namespace`
row is *always* present and is the canonical v2 representation.

### Why are the changes needed?

In a multi-catalog deployment, the catalog name is a first-class
part of a v2 table or view identifier. The previous output buried
it inside connector-controlled strings:

- `Table.name()` for tables is connector-defined; some connectors
  return `catalog.namespace.name`, others just `namespace.name`,
  others use a custom format. The result is that `DESCRIBE TABLE`
  output looks different across catalogs even for the same logical
  table shape.
- `Identifier` for v2 views collapsed namespace and name into a
  single dotted string, so consumers had to parse the dot back out
  and could not unambiguously round-trip multi-level namespaces
  with dots in segments.

Splitting the components into `Catalog`, `Namespace`, and
`Table` / `View` rows:
- gives `DESCRIBE TABLE EXTENDED` a uniform shape across v2
  connectors,
- makes the catalog name explicit and surfaceable when multiple v2
  catalogs are configured,
- handles multi-level namespaces naturally via
  `Identifier.namespace().quoted`,
- aligns the table and view sections so consumers can read the same
  rows from either, switching only on the section header
  (`# Detailed Table Information` vs `# Detailed View Information`),
- with the `Database` compatibility row, lets consumers built for
  v1 (HMS) keep working without changes when the v2 namespace
  happens to be single-segment,
- is parseable programmatically without splitting strings.

### Does this PR introduce any user-facing change?

Yes, slight output change in `DESCRIBE TABLE EXTENDED` for v2 tables
and v2 views.

For v2 tables, single-segment namespace (most common):
- Before: `Name | testcat.ns.t | `
- After: `Catalog | testcat | `, `Namespace | ns | `,
  `Database | ns | `, `Table | t | `.

For v2 tables, multi-segment namespace:
- Before: `Name | testcat.ns1.ns2.t | `
- After: `Catalog | testcat | `, `Namespace | ns1.ns2 | `,
  `Table | t | `.

For v2 views, single-segment namespace:
- Before: `Catalog | testcat | `, `Identifier | ns.v | `
- After: `Catalog | testcat | `, `Namespace | ns | `,
  `Database | ns | `, `View | v | `.

For v2 views, multi-segment namespace:
- Before: `Catalog | testcat | `, `Identifier | ns1.ns2.v | `
- After: `Catalog | testcat | `, `Namespace | ns1.ns2 | `,
  `View | v | `.

v1 paths (session-catalog tables and views via HMS) are unchanged.
Tools that read DESCRIBE output should switch from concatenating
`Name` / `Identifier` to reading the structured rows.

### How was this patch tested?

- Updated the affected golden assertion in `DescribeTableSuite`
  (`DESCRIBE TABLE EXTENDED of a partitioned table`) to match the
  new row layout including the `Database` compatibility row.
- Added a focused test `DESCRIBE TABLE EXTENDED emits structured
  Catalog/Namespace/Table rows` in v2 `DescribeTableSuite` that
  pins the structured rows on a freshly created v2 table, including
  the `Database` row for the single-segment-namespace case.
- Removed the now-redundant `DESCRIBE TABLE EXTENDED on a non-view
  MetadataTable shows the real identifier` test in
  `DataSourceV2MetadataTableSuite` (the structured-row layout is
  what's pinned by the new test in `v2.DescribeTableSuite`; the
  identifier-passthrough behavior is no longer tied to
  `MetadataTable.name()`).

Ran:

  build/sbt 'sql/testOnly \
    org.apache.spark.sql.execution.command.v2.DescribeTableSuite \
    org.apache.spark.sql.connector.DataSourceV2MetadataTableSuite \
    org.apache.spark.sql.connector.DataSourceV2MetadataViewSuite'

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

@cloud-fan cloud-fan force-pushed the describe-table-view-structured-rows branch from a8548fe to 2d746f5 on April 30, 2026 14:17
…e as root

- Pin multi-segment behavior (Namespace dot-quoted, Database omitted) for v2
  tables in DescribeTableSuite.
- Pin structured Catalog/Namespace/View rows + multi-segment behavior for v2
  views in DescribeViewSuite.
- Document in DescribeTableExec that an empty namespace renders as an empty
  Namespace row (root namespace is intentional and canonical).

Co-authored-by: Isaac

Two new tests were titled "...and dot-quotes Namespace" but only verified the
dot-joined render of a 2-segment namespace; neither segment in the test
(`ns1`, `ns2`) actually triggers `quoteIfNeeded`, so the per-segment
back-tick quoting wasn't exercised. Rename to "joins Namespace with dots"
and add a parenthetical noting that quoteIfNeeded isn't exercised here.

Also annotate the inner DescribeTableExec construction in
DescribeTablePartitionExec to make explicit that catalogName / tableIdent
are passed for ctor completeness only on this path, since
addBaseDescription (the only method invoked there) reads neither field.

Co-authored-by: Isaac

Clarify that `catalogName` / `tableIdent` are unused because the call site
invokes `addBaseDescription` directly and never reaches `run()` /
`addTableDetails`, not because of `isExtended = false` per se.

…tion exec

Pulls schema/partitioning/clustering row formatting out of DescribeTableExec
into a shared `DescribeTableBaseRows` trait. DescribeTablePartitionExec now
mixes in the trait and calls addBaseDescription directly, so it no longer
constructs an inner DescribeTableExec just to invoke the helper -- and no
longer needs the unused `catalogName` constructor arg. Also tightens a test
comment and a Javadoc to read as current-state rather than referring to the
prior shape.

Replace the conditional "Database row only when namespace.length == 1"
with an unconditional emission: value is the trailing namespace segment
(or the empty string for a root-level entity). Multi-segment namespaces
now surface their leaf segment as `Database` instead of suppressing the
row, keeping the v1-compat row uniform across all v2 layouts. Update
the helper Scaladoc and the multi-segment table/view tests accordingly.

Co-authored-by: Isaac
@cloud-fan

The Docker test failure is unrelated. Thanks for the review, merging to master!

@cloud-fan cloud-fan closed this in 3a685b1 May 1, 2026