Support fetching all schemas using a single SQL command#953
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR adds support for fetching all schemas using a single SQL command SHOW SCHEMAS IN ALL CATALOGS in the SQL Execution API mode of the JDBC driver. The change addresses a performance regression compared to the Thrift mode when the catalog is specified as null in getSchemas(catalog, schema).
Key changes:
- Implements single SQL command approach for fetching schemas across all catalogs using
SHOW SCHEMAS IN ALL CATALOGS - Adds fallback mechanism to the previous parallel approach for older DBR versions that don't support the new syntax
- Introduces
SchemasDatabricksResultSetAdapterto handle column mapping for different result set formats
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
DatabricksMetadataSdkClient.java |
Implements the new single SQL command approach with fallback logic for older DBR versions |
CommandBuilder.java |
Adds support for building SHOW SCHEMAS IN ALL CATALOGS SQL command |
SchemasDatabricksResultSetAdapter.java |
New adapter class for handling column mapping between different result set formats |
MetadataResultSetBuilder.java |
Updates schema result building to use the new adapter pattern |
DatabricksDatabaseMetaData.java |
Removes the parallel catalog fetching logic and makes thread pool utility public |
CommandConstants.java |
Adds new SQL command constants for schema operations |
WildcardUtil.java |
Adds utility method for checking null or wildcard patterns |
MetadataResultConstants.java |
Makes CATALOG_FULL_COLUMN public for adapter usage |
| Test files | Updates and adds tests for the new functionality |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
samikshya-db
left a comment
There was a problem hiding this comment.
minor comments, overall LGTM. Thanks!
| String SQL = commandBuilder.getSQLString(CommandName.LIST_SCHEMAS); | ||
| LOGGER.debug("SQL command to fetch schemas: {}", SQL); | ||
| return metadataResultSetBuilder.getSchemasResult(getResultSet(SQL, session), catalog); | ||
| try { |
There was a problem hiding this comment.
This will make it fail for all customers until the new DBSQL release is rolled out. We will unnecessarily add an extra hop until then to first go through the exception flow and then fallback. Can we do something intelligently?
There was a problem hiding this comment.
Discussed offline with using some signals, please check if that looks good to you.
There was a problem hiding this comment.
We are further evaluating this. I will make changes separately as per the conclusion.
Signed-off-by: Jayant Singh <jayant.singh@databricks.com>
…emas-all-catalogs
Description
This is concerning the SQL Execution API mode of the OSS JDBC driver (
useThriftClient=0). Earlier when the catalog was specified as null in getSchemas(catalog, schema), JDBC used to fetch all catalogs followed by fetching schemas across each catalog in parallel. This imposed a performance regression as compared to the Thrift mode of the OSS JDBC (useThriftClient=1/default-mode).This PR executes the getSchemas(null, schema) using the SQL command
SHOW SCHEMAS IN ALL CATALOGSintroduced in DBR 17.x (https://docs.databricks.com/aws/en/release-notes/runtime/17.0#support-all-catalogs-in-show-schemas). Runtime PR: https://github.com/databricks-eng/runtime/pull/161811. The changes are scheduled to flow to OSS Spark https://github.com/apache/spark/blob/5660dbadf90ed08faef6dc883fd98f55b098e96a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ShowNamespacesCommand.scala.For earlier DBR versions where the SQL syntax is not supported, JDBC fallbacks to earlier approach of fetching catalogs and schemas across each catalog in parallel.
Testing
Additional Notes to the Reviewer