
[SPARK-55645][SQL] Add serdeName to CatalogStorageFormat #54467

Closed
tagatac wants to merge 1 commit into apache:master from tagatac:serde-name


@tagatac
Contributor

@tagatac tagatac commented Feb 25, 2026

What changes were proposed in this pull request?

  • Add serdeName to org.apache.spark.sql.catalyst.catalog.CatalogStorageFormat.
  • Include this field when responding to DESCRIBE EXTENDED queries.
  • Handle this field when parsing table details from the Hive Metastore API and when writing back to it.
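
The three changes above can be pictured with a minimal sketch. It does not use Spark's real classes: `StorageFormatSketch` and `SerDeInfoSketch` are hypothetical, simplified stand-ins for `CatalogStorageFormat` and the metastore's `SerDeInfo`, the position of `serdeName` among the fields is an assumption, and the row labels simply follow the DESCRIBE EXTENDED output shown later in this description.

```scala
import java.net.URI

// Hypothetical, simplified stand-in for Spark's CatalogStorageFormat.
final case class StorageFormatSketch(
    locationUri: Option[URI] = None,
    serdeName: Option[String] = None, // the new field; None when the table has no SerDe name
    serde: Option[String] = None)

// Hypothetical stand-in for the metastore's SerDeInfo; unset fields are null, as in Java APIs.
final class SerDeInfoSketch(val name: String, val serializationLib: String)

object SerdeNameSketch {
  // Reading from the metastore: a null name becomes None.
  def fromHive(info: SerDeInfoSketch): StorageFormatSketch =
    StorageFormatSketch(
      serdeName = Option(info.name),
      serde = Option(info.serializationLib))

  // Writing back: None becomes null again, so the round trip preserves the name.
  def toHive(s: StorageFormatSketch): SerDeInfoSketch =
    new SerDeInfoSketch(s.serdeName.orNull, s.serde.orNull)

  // Storage rows for DESCRIBE EXTENDED; optional fields are listed only when set.
  def describeRows(s: StorageFormatSketch): Seq[(String, String)] =
    Seq(
      s.locationUri.map(u => "Location" -> u.toString),
      s.serdeName.map(n => "Serde Name" -> n),
      s.serde.map(lib => "Serde Library" -> lib)
    ).flatten
}
```

With these stand-ins, a `SerDeInfoSketch` whose name is null yields only a "Serde Library" row, and copying the storage format with `serdeName = Some("testSerdeName")` adds a "Serde Name" row before it.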

Why are the changes needed?

  • This field is included in SerDeInfo returned by the Hive Metastore API.
  • Its omission in the internal representation of Hive tables makes it cumbersome to consume this field.

Before this change:

  private def hasExampleSerdeName(h: HiveTableRelation): Boolean = {
    val key = (h.tableMeta.database, h.tableMeta.identifier.table)
    serdeNameCache.computeIfAbsent(key, _ => {
      val catalog = session.sharedState.externalCatalog.unwrapped
        .asInstanceOf[HiveExternalCatalog]
      catalog.client.getRawHiveTableOption(key._1, key._2).exists { rawHiveTable =>
        // Use reflection to access SerDeInfo.name across classloader boundaries,
        // so that this works even when spark.sql.hive.metastore.jars is configured.
        val rawTable = rawHiveTable.rawTable
        val tTable = rawTable.getClass.getMethod("getTTable").invoke(rawTable)
        val sd = tTable.getClass.getMethod("getSd").invoke(tTable)
        val serdeInfo = sd.getClass.getMethod("getSerdeInfo").invoke(sd)
        val name = serdeInfo.getClass.getMethod("getName").invoke(serdeInfo)
        name == ExampleSerdeInfoName
      }
    })
  }

After this change:

  private def hasExampleSerdeName(h: HiveTableRelation): Boolean = {
    h.tableMeta.storage.serdeName.contains(ExampleSerdeInfoName)
  }

Does this PR introduce any user-facing change?

Yes, developers can now access CatalogStorageFormat.serdeName, representing the Hive Metastore API field SerDeInfo.name, when interacting with Spark representations of Hive tables.

How was this patch tested?

  • Unit test added.
  • DESCRIBE EXTENDED run via spark-shell returns "Serde Name" properly for a Hive table with a Serde name:
scala> spark.sql("CREATE TABLE t (d1 DECIMAL(10,3), d2 STRING) STORED AS TEXTFILE;").show()
++
||
++
++
scala> spark.sql("DESCRIBE EXTENDED t;").show()
+--------------------+--------------------+-------+
|            col_name|           data_type|comment|
+--------------------+--------------------+-------+
|                  d1|       decimal(10,3)|   NULL|
|                  d2|              string|   NULL|
|                    |                    |       |
|# Detailed Table ...|                    |       |
...
|            Location|file:/local/home/...|       |
|       Serde Library|org.apache.hadoop...|       |
...
+--------------------+--------------------+-------+
scala> import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.TableIdentifier
scala> val hiveTable = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t", Some("default")))
val hiveTable: org.apache.spark.sql.catalyst.catalog.CatalogTable =
...
scala> val updated = hiveTable.copy(storage = hiveTable.storage.copy(serdeName = Some("testSerdeName")))
val updated: org.apache.spark.sql.catalyst.catalog.CatalogTable =
...
scala> spark.sessionState.catalog.alterTable(updated)
scala> spark.sql("DESCRIBE EXTENDED t;").show()
+--------------------+--------------------+-------+
|            col_name|           data_type|comment|
+--------------------+--------------------+-------+
|                  d1|       decimal(10,3)|   NULL|
|                  d2|              string|   NULL|
|                    |                    |       |
|# Detailed Table ...|                    |       |
...
|            Location|file:/local/home/...|       |
|          Serde Name|       testSerdeName|       |
|       Serde Library|org.apache.hadoop...|       |
...
+--------------------+--------------------+-------+

Was this patch authored or co-authored using generative AI tooling?

No.

This contribution is my original work, and I license the work to the Spark project under the project’s open source license.

@tagatac
Contributor Author

tagatac commented Feb 25, 2026

Cc: @asl3 @pan3793 @yaooqinn @cloud-fan

@sarutak
Member

sarutak commented Feb 25, 2026

This can be helpful for extension developers. What do you think, @pan3793 @yaooqinn?

@pan3793
Member

pan3793 commented Feb 25, 2026

I'm not experienced with serdeName. What's its typical use case?

@sarutak
Member

sarutak commented Feb 25, 2026

@pan3793
Usually, serdeName is not set, but some systems set this field as a marker to tell something to readers.
One possible use case is converting a HiveTableRelation to another data source using custom extensions, as RelationConversions does. serdeName can be used as a hint.
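
A minimal sketch of that hint-based conversion, using hypothetical stand-in types rather than Spark's own (`HiveRelationSketch` stands in for `HiveTableRelation`, and `MarkerSerdeName` is an invented marker value); a real extension would presumably do this inside a custom analyzer rule:

```scala
// Hypothetical stand-ins; Spark's real types are CatalogStorageFormat and HiveTableRelation.
final case class StorageSketch(serdeName: Option[String], serde: Option[String])
final case class HiveRelationSketch(table: String, storage: StorageSketch)

// Possible outcomes of the conversion step.
sealed trait PlanSketch
final case class KeepHive(table: String) extends PlanSketch
final case class UseCustomSource(table: String) extends PlanSketch

object SerdeNameHint {
  // Invented marker that some external system is assumed to write into SerDeInfo.name.
  val MarkerSerdeName = "example-marker"

  // Convert only relations whose serdeName carries the marker; leave all others untouched.
  def convert(r: HiveRelationSketch): PlanSketch =
    if (r.storage.serdeName.contains(MarkerSerdeName)) UseCustomSource(r.table)
    else KeepHive(r.table)
}
```

Because `serdeName` is an `Option`, tables without a SerDe name fall through to `KeepHive` without any null checks.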

@pan3793
Member

pan3793 commented Feb 25, 2026

> some systems set this field as a marker to tell something to readers

@sarutak I'm not sure it's a good idea; we generally use table/serde properties to store custom info. Is it already used in any well-known project, either an ASF project or a commercial one?

But meanwhile, I also don't see a reason to disallow that. I'm neutral on this change.

@sarutak
Member

sarutak commented Feb 25, 2026

@pan3793 I don't know of any existing OSS projects that use serdeName, but end users can set this field arbitrarily through Hive's metastore API. So, if an existing Hive table has a serdeName set by users and it's not feasible to modify the table properties, serdeName can be helpful.

@tagatac tagatac force-pushed the serde-name branch 2 times, most recently from a1f49db to 70508e5 on February 25, 2026 18:37
Contributor Author

@tagatac tagatac Feb 26, 2026

Please note these changes to convertStorageFormat that I missed in the original PR, matching the intent of clearing the SerDe info when converting the storage format.
@yaooqinn @pan3793 @sarutak

@pan3793
Member

pan3793 commented Feb 26, 2026

+0. I have no more comments.

@sarutak sarutak closed this in 555cd38 Feb 27, 2026
@sarutak
Member

sarutak commented Feb 27, 2026

Merged to master. Thanks all!
