I was able to tweak the Databricks connection to use M2M (with some LLM help to modify it); however, U2M seems unfeasible at this point, as we cannot dynamically generate the token.
Feature request
The Databricks SQL Connector for Python officially supports authenticating with an OAuth service principal for integrations ("machine-to-machine", M2M); see the docs: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/python-sql-connector#auth-m2m (the same applies to GCP and AWS).
For my use case, I want to set up a service principal that is used by a single database connection, independent of individual users. This would allow me to set up multiple sources using different OAuth2 credentials, and to avoid relying on Databricks personal access tokens (PATs) or other workarounds that require a "raw" token embedded in the SQLAlchemy URI.
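For reference, the M2M flow in the linked Databricks documentation looks roughly like the sketch below. It assumes `databricks-sql-connector` and `databricks-sdk` are installed; the helper names `make_credential_provider` and `connect_m2m` are hypothetical, and all hostnames and credentials are placeholders. The imports are deferred into the functions so the module can be loaded without either package.

```python
from typing import Any, Callable


def make_credential_provider(
    server_hostname: str, client_id: str, client_secret: str
) -> Callable[[], Any]:
    """Build a zero-argument callable for the connector's `credentials_provider`.

    Mirrors the `credential_provider` function in the Databricks docs example.
    """

    def credential_provider() -> Any:
        # Deferred import: databricks-sdk is only needed when authenticating.
        from databricks.sdk.core import Config, oauth_service_principal

        config = Config(
            host=f"https://{server_hostname}",
            client_id=client_id,
            client_secret=client_secret,
        )
        return oauth_service_principal(config)

    return credential_provider


def connect_m2m(server_hostname: str, http_path: str, client_id: str, client_secret: str):
    """Open a Databricks SQL connection authenticated as a service principal."""
    from databricks import sql

    return sql.connect(
        server_hostname=server_hostname,
        http_path=http_path,
        credentials_provider=make_credential_provider(
            server_hostname, client_id, client_secret
        ),
    )
```

As in the docs, the returned connection can be used as a context manager, e.g. `with connect_m2m(host, http_path, cid, secret) as connection:`. Note that the provider is a callable rather than a string, which is why it cannot simply be stored in a static configuration field.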
Note
Supporting this use case would require installing the `databricks-sdk` library, but given that each engine type already requires installing additional libraries, I don't see this as a blocker. The Databricks engine can still be used without it if the end user does not wish to use M2M authentication.

Implementation suggestions
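Because `databricks-sdk` would stay optional, one way to keep the engine usable without it is to check for the SDK only when M2M credentials are actually configured. This is a hypothetical sketch: `sdk_available`, `validate_m2m_support`, and the `client_id`/`client_secret` keys are illustrative names, not existing Superset code.

```python
import importlib
from typing import Any, Mapping


def sdk_available(module_name: str = "databricks.sdk") -> bool:
    """Return True if the given module can be imported."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def validate_m2m_support(
    extra: Mapping[str, Any], module_name: str = "databricks.sdk"
) -> bool:
    """Return True if M2M auth is requested; raise if requested but unsupported.

    Connections without client credentials (e.g. PAT-based) never touch the SDK.
    """
    wants_m2m = bool(extra.get("client_id")) and bool(extra.get("client_secret"))
    if wants_m2m and not sdk_available(module_name):
        raise RuntimeError(
            "OAuth M2M authentication requires the optional 'databricks-sdk' package"
        )
    return wants_m2m
```

Failing at validation time, rather than on the first query, would give the user a clear message when saving the connection.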
In the current implementation, it would make sense to use the database "extra" arguments, adding two fields (`client_id` and `client_secret`, or something similar), which should be encrypted/masked (similar to how the token is currently handled). The host is already included in the SQLAlchemy URI.

However, the current workflow for creating the SQLAlchemy engine only allows raw values from the configuration; see the flow in https://github.com/apache/superset/blob/master/superset/models/core.py#L479. While there is a hook via `DBEngineSpec.adjust_engine_params`, it only receives the current "connect_args".
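As a rough illustration of the masking requirement for the proposed `client_id`/`client_secret` fields, the sketch below replaces sensitive keys with a placeholder before the extra JSON is sent back to the UI, and restores the stored value when the user submits the placeholder unchanged. The `MASK` constant, `SENSITIVE_KEYS`, and both helpers are hypothetical; Superset's actual handling of encrypted extras may differ.

```python
import json
from typing import Optional

MASK = "X" * 10  # hypothetical placeholder shown in place of secrets
SENSITIVE_KEYS = ("client_secret",)  # client_id could be added if desired


def mask_extra(extra_json: str) -> str:
    """Return a copy of the extra JSON with sensitive values replaced by MASK."""
    extra = json.loads(extra_json)
    for key in SENSITIVE_KEYS:
        if extra.get(key):
            extra[key] = MASK
    return json.dumps(extra)


def unmask_extra(new_json: str, stored_json: Optional[str]) -> str:
    """Restore stored secrets wherever the submitted value is still MASK."""
    new = json.loads(new_json)
    stored = json.loads(stored_json) if stored_json else {}
    for key in SENSITIVE_KEYS:
        if new.get(key) == MASK and key in stored:
            new[key] = stored[key]
    return json.dumps(new)
```

Only `client_secret` is masked here, since a client ID is usually not secret; masking both is an equally valid choice.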
My suggestion would be to add a new hook to the engine specs that allows them to "compute" additional connect args. The default implementation would do nothing (`return {}`), and the Databricks implementation would check whether the required fields are set and, if so, use them to create the `credential_provider` (see the docs example).

As I am not familiar with the full database/engine spec integration, please point out any other parts that would be required to make this happen. I would be happy to open a PR with a first implementation.
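A minimal sketch of the suggested hook, assuming it receives the host and the decoded extra dict and returns connect args to merge over the static ones. `BaseSpec`, `DatabricksSpec`, `get_extra_connect_args`, and `build_connect_args` are hypothetical stand-ins rather than Superset's actual classes, and the `engine_params.connect_args` layout is assumed. The Databricks branch mirrors the docs' `credential_provider` and passes it through the connector's `credentials_provider` argument, assuming the SQLAlchemy dialect forwards `connect_args` to `sql.connect`.

```python
from typing import Any, Dict, Type


class BaseSpec:
    """Stand-in engine spec: contributes no computed connect args by default."""

    @classmethod
    def get_extra_connect_args(cls, host: str, extra: Dict[str, Any]) -> Dict[str, Any]:
        return {}


class DatabricksSpec(BaseSpec):
    @classmethod
    def get_extra_connect_args(cls, host: str, extra: Dict[str, Any]) -> Dict[str, Any]:
        client_id = extra.get("client_id")
        client_secret = extra.get("client_secret")
        if not (client_id and client_secret):
            return {}  # no M2M credentials: keep the existing token-based flow

        def credential_provider() -> Any:
            # Imported lazily so databricks-sdk is only needed for M2M connections.
            from databricks.sdk.core import Config, oauth_service_principal

            config = Config(
                host=f"https://{host}", client_id=client_id, client_secret=client_secret
            )
            return oauth_service_principal(config)

        return {"credentials_provider": credential_provider}


def build_connect_args(spec: Type[BaseSpec], host: str, extra: Dict[str, Any]) -> Dict[str, Any]:
    """Merge static connect_args from the extra with those computed by the spec."""
    connect_args = dict(extra.get("engine_params", {}).get("connect_args", {}))
    connect_args.update(spec.get_extra_connect_args(host, extra))
    return connect_args
```

Computing the provider at engine-creation time, rather than storing it, sidesteps the fact that a callable cannot live in the JSON-serialized extra.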