Support fetching all schemas using a single SQL command #953

Merged
jayantsing-db merged 5 commits into databricks:main from jayantsing-db:jayantsing-db/schemas-all-catalogs
Sep 2, 2025

Conversation

@jayantsing-db
Collaborator

Description

This concerns the SQL Execution API mode of the OSS JDBC driver (useThriftClient=0). Previously, when the catalog was specified as null in getSchemas(catalog, schema), the driver fetched all catalogs and then fetched the schemas of each catalog in parallel. This was a performance regression compared to the Thrift mode of the OSS JDBC driver (useThriftClient=1, the default mode).

This PR executes the getSchemas(null, schema) using the SQL command SHOW SCHEMAS IN ALL CATALOGS introduced in DBR 17.x (https://docs.databricks.com/aws/en/release-notes/runtime/17.0#support-all-catalogs-in-show-schemas). Runtime PR: https://github.com/databricks-eng/runtime/pull/161811. The changes are scheduled to flow to OSS Spark https://github.com/apache/spark/blob/5660dbadf90ed08faef6dc883fd98f55b098e96a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ShowNamespacesCommand.scala.

For earlier DBR versions where the SQL syntax is not supported, the driver falls back to the earlier approach of fetching all catalogs and then fetching the schemas of each catalog in parallel.
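
The try-then-fall-back flow described above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `SchemaFetchSketch`, `SqlRunner`, the pool size of 8, and the assumed row shapes (`{catalog, schema}` from the single command, one name per row from the per-catalog commands) are all hypothetical, and the schema pattern argument is left out for brevity. It also assumes that an older runtime rejects the new statement by throwing an exception; a real implementation would likely check for the specific error rather than catching everything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SchemaFetchSketch {

  /** Hypothetical stand-in for statement execution; returns one String[] per row. */
  interface SqlRunner {
    List<String[]> run(String sql) throws Exception;
  }

  /** Returns {catalog, schema} pairs across all catalogs. */
  static List<String[]> fetchAllSchemas(SqlRunner runner) throws Exception {
    try {
      // Fast path: a single round trip on runtimes that support the syntax.
      // Assumption: each row comes back as {catalog, schema}.
      return runner.run("SHOW SCHEMAS IN ALL CATALOGS");
    } catch (Exception unsupported) {
      // Assumption: older runtimes fail the statement, so use the old approach.
      return fetchPerCatalog(runner);
    }
  }

  /** Earlier approach: list catalogs, then list schemas of each catalog in parallel. */
  static List<String[]> fetchPerCatalog(SqlRunner runner) throws Exception {
    List<String[]> catalogs = runner.run("SHOW CATALOGS");
    int threads = Math.max(1, Math.min(catalogs.size(), 8)); // arbitrary cap
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<List<String[]>>> pending = new ArrayList<>();
      for (String[] catalogRow : catalogs) {
        String catalog = catalogRow[0];
        pending.add(pool.submit(() -> {
          List<String[]> rows = new ArrayList<>();
          for (String[] schemaRow : runner.run("SHOW SCHEMAS IN `" + catalog + "`")) {
            rows.add(new String[] {catalog, schemaRow[0]});
          }
          return rows;
        }));
      }
      // Collect in submission order so the output is deterministic.
      List<String[]> all = new ArrayList<>();
      for (Future<List<String[]>> future : pending) {
        all.addAll(future.get());
      }
      return all;
    } finally {
      pool.shutdown();
    }
  }
}
```

Note that in this sketch every call on an older runtime pays for one failed statement before the fallback starts.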

Testing

  • e2e testing
  • Repo fake service tests
  • Unit tests

Additional Notes to the Reviewer

@jayantsing-db jayantsing-db requested a review from Copilot August 20, 2025 19:45
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds support for fetching all schemas using a single SQL command SHOW SCHEMAS IN ALL CATALOGS in the SQL Execution API mode of the JDBC driver. The change addresses a performance regression compared to the Thrift mode when the catalog is specified as null in getSchemas(catalog, schema).

Key changes:

  • Implements single SQL command approach for fetching schemas across all catalogs using SHOW SCHEMAS IN ALL CATALOGS
  • Adds fallback mechanism to the previous parallel approach for older DBR versions that don't support the new syntax
  • Introduces SchemasDatabricksResultSetAdapter to handle column mapping for different result set formats
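
To illustrate the kind of column mapping such an adapter might perform, here is a small sketch. It is not the PR's `SchemasDatabricksResultSetAdapter`: the class name `SchemaRowMappingSketch` and the source column names `databaseName` and `catalog` are assumptions made for illustration. The only part taken from the JDBC specification is that `getSchemas` results expose `TABLE_SCHEM` followed by `TABLE_CATALOG`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SchemaRowMappingSketch {
  // Assumed source column names; the real result sets may label them differently.
  static final String SOURCE_SCHEMA = "databaseName";
  static final String SOURCE_CATALOG = "catalog";

  /**
   * Maps source rows to JDBC getSchemas rows (TABLE_SCHEM, TABLE_CATALOG).
   * If a row carries no catalog column (per-catalog result), the catalog that
   * was requested is used instead.
   */
  static List<Map<String, String>> toJdbcRows(
      List<Map<String, String>> sourceRows, String requestedCatalog) {
    List<Map<String, String>> out = new ArrayList<>();
    for (Map<String, String> row : sourceRows) {
      Map<String, String> jdbcRow = new LinkedHashMap<>(); // keeps column order
      jdbcRow.put("TABLE_SCHEM", row.get(SOURCE_SCHEMA));
      jdbcRow.put(
          "TABLE_CATALOG",
          row.containsKey(SOURCE_CATALOG) ? row.get(SOURCE_CATALOG) : requestedCatalog);
      out.add(jdbcRow);
    }
    return out;
  }
}
```

With this shape, one code path can serve both result formats: rows from the all-catalogs command supply their own catalog, while rows from a per-catalog command inherit the catalog passed by the caller.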

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| DatabricksMetadataSdkClient.java | Implements the new single SQL command approach with fallback logic for older DBR versions |
| CommandBuilder.java | Adds support for building SHOW SCHEMAS IN ALL CATALOGS SQL command |
| SchemasDatabricksResultSetAdapter.java | New adapter class for handling column mapping between different result set formats |
| MetadataResultSetBuilder.java | Updates schema result building to use the new adapter pattern |
| DatabricksDatabaseMetaData.java | Removes the parallel catalog fetching logic and makes thread pool utility public |
| CommandConstants.java | Adds new SQL command constants for schema operations |
| WildcardUtil.java | Adds utility method for checking null or wildcard patterns |
| MetadataResultConstants.java | Makes CATALOG_FULL_COLUMN public for adapter usage |
| Test files | Updates and adds tests for the new functionality |

Collaborator

@samikshya-db samikshya-db left a comment

Minor comments; overall LGTM. Thanks!

Comment thread src/main/java/com/databricks/jdbc/common/util/WildcardUtil.java Outdated
String SQL = commandBuilder.getSQLString(CommandName.LIST_SCHEMAS);
LOGGER.debug("SQL command to fetch schemas: {}", SQL);
return metadataResultSetBuilder.getSchemasResult(getResultSet(SQL, session), catalog);
try {
Collaborator

This will fail for all customers until the new DBSQL release is rolled out. Until then, we unnecessarily add an extra hop: each call first goes through the exception flow and only then falls back. Can we do something more intelligent?

Collaborator

We discussed offline using some signals for this; please check if that looks good to you.

Collaborator Author

We are further evaluating this. I will make changes separately as per the conclusion.

@jayantsing-db jayantsing-db enabled auto-merge (squash) September 2, 2025 04:10
@jayantsing-db jayantsing-db merged commit 9cecd23 into databricks:main Sep 2, 2025
12 of 13 checks passed
@jayantsing-db jayantsing-db deleted the jayantsing-db/schemas-all-catalogs branch September 2, 2025 04:14