feat: don't force db connect if using serverless #3781
Merged
Conversation
treysp
approved these changes
Feb 4, 2025
- SQLMesh's Databricks Connect implementation supports Databricks Runtime 13.0 or higher. If SQLMesh detects that you have Databricks Connect installed, then it will use it for all Python models (both Pandas and PySpark DataFrames).
+ If SQLMesh detects that you have Databricks Connect installed, then it will automatically configure the connection and use it for all Python models that return a Pandas or PySpark DataFrame.
+ To have databricks-connect installed but ignored by SQLMesh, set `disable_databricks_connect` to `true` in the connection configuration.
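For context, the option described in the added doc line might look like this in a SQLMesh `config.yaml` gateway definition. This is an illustrative sketch: `disable_databricks_connect` comes from the diff above, while the hostname, HTTP path, and token fields follow the usual Databricks connection settings and should be verified against the current SQLMesh docs.

```yaml
gateways:
  databricks:
    connection:
      type: databricks
      server_hostname: dbc-XXXXXXXX.cloud.databricks.com
      http_path: /sql/1.0/warehouses/XXXXXXXXXXXX
      access_token: ${DATABRICKS_TOKEN}
      # Keep databricks-connect installed in the environment,
      # but have SQLMesh ignore it:
      disable_databricks_connect: true
```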
Contributor
When/why would someone want to have it installed but ignored?
Just needing it installed in the env for a non-SQLMesh reason, or would people switch back and forth between using it in SQLMesh and not?
Contributor
Author
Just needing it installed in the env for a non-SQLMesh reason
Yeah, this is what I'm thinking of. They use a single Python environment for their work, so they don't want to uninstall it just to make SQLMesh behave as they expect.
izeigerman added a commit that referenced this pull request on Feb 4, 2025: This reverts commit 591645c.
Prior to this PR, if a user said they wanted to use Serverless for databricks-connect, SQLMesh forced the use of databricks-connect, and therefore the Python SQL connector could not be used. In addition, the documentation stated that the SQL connector did not support Databricks Serverless Compute, which was misleading: although it doesn't support workspace-side Serverless (typically used by Notebooks and Jobs), it does support SQL Warehouse Serverless compute.
A user could therefore want to use serverless across their stack: Serverless compute for jobs that require a PySpark DataFrame, and SQL Warehouse Serverless for their SQL queries. This PR enables that pattern.
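As a sketch of that mixed-serverless pattern, a gateway definition might combine both connections. Treat this as an assumption-laden illustration: the option names are based on SQLMesh's Databricks connection settings and should be checked against the current docs.

```yaml
gateways:
  databricks:
    connection:
      type: databricks
      # SQL queries run on a serverless SQL Warehouse via the SQL connector:
      server_hostname: dbc-XXXXXXXX.cloud.databricks.com
      http_path: /sql/1.0/warehouses/XXXXXXXXXXXX
      access_token: ${DATABRICKS_TOKEN}
      # PySpark DataFrame models run on workspace-side Serverless compute
      # through databricks-connect:
      databricks_connect_use_serverless: true
```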
One key limitation it works around is temporary objects. Serverless doesn't support global temporary objects and instead requires session temporary objects, so mixing databricks-connect and the Python SQL connector across the serverless products was a problem: the two sessions couldn't share this state. This PR resolves that by recording in the session's connection metadata whether a temporary object was created in a databricks-connect session; if so, databricks-connect is forced for the remainder of the session.
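The session-affinity rule described above can be sketched roughly as follows. This is a minimal illustration with hypothetical class and method names, not SQLMesh's actual internals.

```python
# Minimal sketch of the session-affinity rule described above.
# Names are hypothetical, not SQLMesh's real API.
class ServerlessSessionRouter:
    """Routes statements to databricks-connect or the Python SQL connector."""

    def __init__(self) -> None:
        # Session metadata flag: set once a session temp object is
        # created through databricks-connect.
        self.temp_object_in_connect = False

    def record_temp_object(self, engine: str) -> None:
        """Record that a session temporary object was created on `engine`."""
        if engine == "databricks-connect":
            self.temp_object_in_connect = True

    def choose_engine(self, preferred: str) -> str:
        # Session temp objects created in databricks-connect are not
        # visible to the SQL connector, so once one exists we must stay
        # on databricks-connect for the rest of the session.
        if self.temp_object_in_connect:
            return "databricks-connect"
        return preferred
```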
This PR also improves the documentation, removes excess log output in the console, and improves the error message shown when the user has different default catalogs across their SQL and databricks-connect sessions.
Initial PR that added serverless support for context: #3001