Databricks workspace setup docs #949
Conversation
✅ Deploy Preview for dlt-hub-docs ready!
Looks good! I just requested some minor changes.
To use the Databricks destination, you need:

* A Databricks workspace with a Unity Catalog metastore connected
* A Gen 2 Azure storage account and container
It needs to be an ADLS Gen2 storage account.
2. Create a storage account

Search for "Storage accounts" in the Azure Portal and create a new storage account. Make sure to select "StorageV2 (general purpose v2)" as the account kind.
In this step the user needs to make sure they are creating an ADLS Gen2 storage account (a specific type of Standard general-purpose v2 storage account), which is done by enabling the hierarchical namespace. Probably best to just refer to Azure's docs for this: https://learn.microsoft.com/en-us/azure/storage/blobs/create-data-lake-storage-account
4. Create an Access Connector for Azure Databricks

This will allow Databricks to access your storage account.
In the Azure Portal search for "Access Connector for Azure Databricks" and create a new connector.
You could add a note here that users can also use the Access Connector for Azure Databricks that gets created by default in the Databricks managed resource group when creating a new Databricks workspace.
Where do I find this? Guess I'm lacking permissions 🤔
4. Go back to your workspace and click on "Compute" in the left-hand menu
5. Create a new cluster
Is this necessary? It seems the implementation uses a SQL warehouse, not a Spark cluster.
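For readers following this thread, the two compute types can be told apart by the HTTP path each exposes in its connection details. A small hypothetical helper illustrates this; the path prefixes are assumptions based on the common Databricks JDBC/ODBC path formats, not taken from the docs under review:

```python
def endpoint_kind(http_path: str) -> str:
    """Classify a Databricks HTTP path (hypothetical helper).

    Assumes the common path formats: SQL warehouses expose
    /sql/1.0/warehouses/<id> (older workspaces: /sql/1.0/endpoints/<id>),
    while all-purpose Spark clusters expose
    /sql/protocolv1/o/<workspace-id>/<cluster-id>.
    """
    if http_path.startswith(("/sql/1.0/warehouses/", "/sql/1.0/endpoints/")):
        return "sql_warehouse"
    if http_path.startswith("/sql/protocolv1/o/"):
        return "cluster"
    return "unknown"


print(endpoint_kind("/sql/1.0/warehouses/abc123"))        # sql_warehouse
print(endpoint_kind("/sql/protocolv1/o/123/0101-abcdef"))  # cluster
```

Either kind of path works with the Databricks SQL connector, which is why the cluster-based instructions ran fine but were overkill.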
@@ -32,7 +110,9 @@ This will install dlt with **databricks** extra which contains Databricks Python

This should have your connection parameters and your personal access token.

It should now look like:
You will find your server hostname and HTTP path in your cluster settings -> Advanced Options -> JDBC/ODBC.
If it's indeed using a SQL warehouse and not a Spark cluster, then this should be something like: "go to your SQL warehouse -> Connection details".
Yes! The cluster was working, but overkill in this case.
Is the default warehouse created automatically? It was already there in the account when I took over.
It is indeed created automatically.
https://learn.microsoft.com/en-us/azure/databricks/compute/sql-warehouse/create-sql-warehouse#what-is-a-sql-warehouse
LGTM! Thanks everyone for improving our docs. Also, all of Jorit's remarks seem to be addressed.