@asl3 commented Dec 11, 2024

What changes were proposed in this pull request?

Support `DESCRIBE TABLE ... [AS JSON]` to optionally display table metadata in JSON format.

SQL Ref Spec:

{ DESC | DESCRIBE } [ TABLE ] [ EXTENDED | FORMATTED ] table_name { [ PARTITION clause ] | [ column_name ] } [ AS JSON ]

Output:
json_metadata: String
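
For illustration, a minimal usage sketch (assuming a local SparkSession; the table `t` and its schema are made up for the example):

```scala
import org.apache.spark.sql.SparkSession

// Assumptions: a Spark build that includes this feature; `t` is illustrative.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.sql("CREATE TABLE IF NOT EXISTS t (id INT, name STRING) USING parquet")

// The command returns a single row with one string column, json_metadata.
val json: String = spark.sql("DESCRIBE TABLE EXTENDED t AS JSON").head().getString(0)
println(json)
```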

Why are the changes needed?

The Spark SQL command DESCRIBE TABLE displays table metadata in a DataFrame format geared toward human consumption. That format is hard to parse reliably, e.g. when field values contain special characters, and its layout changes as new features are added.
The new AS JSON option returns the table metadata as a JSON string that machines can parse reliably, and that can be extended with minimal risk of breaking changes. It is not meant to be human-readable.

Does this PR introduce any user-facing change?

Yes, this provides a new option to display DESCRIBE TABLE metadata in JSON format. See below (and updated golden files) for the JSON output schema:

{
  "table_name": "<table_name>",
  "catalog_name": "<catalog_name>",
  "schema_name": "<innermost_schema_name>",
  "namespace": ["<innermost_schema_name>"],
  "type": "<table_type>",
  "provider": "<provider>",
  "columns": [
    {
      "name": "<name>",
      "type": <type_json>,
      "comment": "<comment>",
      "nullable": <boolean>,
      "default": "<default_val>"
    }
  ],
  "partition_values": {
    "<col_name>": "<val>"
  },
  "location": "<path>",
  "view_text": "<view_text>",
  "view_original_text": "<view_original_text>",
  "view_schema_mode": "<view_schema_mode>",
  "view_catalog_and_namespace": "<view_catalog_and_namespace>",
  "view_query_output_columns": ["col1", "col2"],
  "owner": "<owner>",
  "comment": "<comment>",
  "table_properties": {
    "property1": "<property1>",
    "property2": "<property2>"
  },
  "storage_properties": {
    "property1": "<property1>",
    "property2": "<property2>"
  },
  "serde_library": "<serde_library>",
  "input_format": "<input_format>",
  "output_format": "<output_format>",
  "num_buckets": <num_buckets>,
  "bucket_columns": ["<col_name>"],
  "sort_columns": ["<col_name>"],
  "created_time": "<timestamp_ISO-8601>",
  "last_access": "<timestamp_ISO-8601>",
  "partition_provider": "<partition_provider>"
}
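
As a sketch of the machine-parsing use case, the returned string can be fed to any JSON library; here json4s (which Spark already bundles), with field names taken from the schema above and `json` being the json_metadata string from the earlier example:

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

implicit val formats: Formats = DefaultFormats

// Parse the json_metadata string and pull out a few fields.
val meta = parse(json)
val tableName = (meta \ "table_name").extract[String]
val columnNames = (meta \ "columns").children.map(c => (c \ "name").extract[String])
```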

How was this patch tested?

- Updated golden files for `describe.sql`
- Added tests in `DescribeTableParserSuite.scala`, `DescribeTableSuite.scala`, and `PlanResolutionSuite.scala`

Was this patch authored or co-authored using generative AI tooling?

@cloud-fan commented:

thanks, merging to master!

@cloud-fan closed this in 36d23ef on Jan 7, 2025
@dongjoon-hyun commented:

I made a follow-up.

@dongjoon-hyun changed the title from [SPARK-50541] Describe Table As JSON to [SPARK-50541][SQL] Describe Table As JSON on Jan 7, 2025
dongjoon-hyun added a commit that referenced this pull request on Jan 7, 2025

Use `SPARK_VERSION` instead of hard-coded version strings

### What changes were proposed in this pull request?

This is a follow-up to use `SPARK_VERSION` instead of hard-coded version strings.
- #49139

### Why are the changes needed?

Hard-coded version strings will start causing unit test failures next week, when the Apache Spark 4.0.0 RC process begins, and again in maintenance releases like 4.0.1-SNAPSHOT.

**BEFORE**
```
$ git grep 'created_by = Some("Spark '
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/DescribeTableSuite.scala:        created_by = Some("Spark 4.0.0-SNAPSHOT"),
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/DescribeTableSuite.scala:        created_by = Some("Spark 4.0.0-SNAPSHOT"),
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/DescribeTableSuite.scala:        created_by = Some("Spark 4.0.0-SNAPSHOT"),
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/DescribeTableSuite.scala:          created_by = Some("Spark 4.0.0-SNAPSHOT"),
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/DescribeTableSuite.scala:        created_by = Some("Spark 4.0.0-SNAPSHOT"),
```

**AFTER**
```
$ git grep 'created_by = Some("Spark '
$
```
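
For context, a minimal sketch of the pattern this follow-up applies. `SPARK_VERSION` is the real constant from the `org.apache.spark` package; the surrounding test value is paraphrased from the diff above:

```scala
import org.apache.spark.SPARK_VERSION

// BEFORE (brittle): breaks whenever the build version changes.
// created_by = Some("Spark 4.0.0-SNAPSHOT")

// AFTER (version-agnostic): derive the expected value from the build itself.
val created_by = Some(s"Spark $SPARK_VERSION")
```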

### Does this PR introduce _any_ user-facing change?

No, this is a test-case fix.

### How was this patch tested?

Pass the CIs and check manually.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49401 from dongjoon-hyun/SPARK-50541.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request on Jan 15, 2025
### What changes were proposed in this pull request?

This is a follow-up of #49139 that uses a v2 command to simplify the code. Now only one logical plan is needed, and all of the implementation is centralized in that plan; there is no need to touch other analyzer/planner rules. A sketch of the idea follows.
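
A simplified, hypothetical illustration of that design choice (the names below are illustrative, not Spark's internal API): the command is modeled as one self-contained node that knows how to produce its own output, so no separate rules need to know about it.

```scala
// Hypothetical sketch, not the actual Spark internals: a single
// self-contained command node instead of logic spread across rules.
trait RunnableNode { def run(): Seq[String] }

final case class DescribeTableAsJson(table: String) extends RunnableNode {
  // The real implementation would consult the catalog; here we only show
  // that resolution, execution, and output formatting live in one place.
  override def run(): Seq[String] = Seq(s"""{"table_name": "$table"}""")
}
```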

### Why are the changes needed?

Code simplification.

### Does this PR introduce _any_ user-facing change?

No, this feature has not been released yet.

### How was this patch tested?

Updated tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49466 from cloud-fan/as-json.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>