
Databricks workspace setup docs #949

Merged: 4 commits into devel on Feb 11, 2024

Conversation

steinitzu (Collaborator)

Description

Related Issues

  • Fixes #...
  • Closes #...
  • Resolves #...

Additional Context


netlify bot commented Feb 8, 2024

Deploy Preview for dlt-hub-docs ready!

🔨 Latest commit: 7343fff
🔍 Latest deploy log: https://app.netlify.com/sites/dlt-hub-docs/deploys/65c50ba6a94a7500089f19ae
😎 Deploy Preview: https://deploy-preview-949--dlt-hub-docs.netlify.app


@jorritsandbrink left a comment


Looks good! I just requested some minor changes.

To use the Databricks destination, you need:

* A Databricks workspace with a Unity Catalog metastore connected
* A Gen 2 Azure storage account and container
jorritsandbrink (Collaborator)


It needs to be an ADLS Gen2 storage account.
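For context on what these prerequisites enable: once the workspace, metastore, and storage account are wired together, the destination is used like any other dlt destination. A minimal sketch, assuming credentials are already configured in secrets.toml (the pipeline, dataset, and table names are illustrative):

```python
import dlt

# Minimal pipeline targeting the Databricks destination.
# Assumes [destination.databricks.credentials] is filled in secrets.toml
# (see the connection-parameters discussion further down).
pipeline = dlt.pipeline(
    pipeline_name="databricks_demo",   # illustrative name
    destination="databricks",
    dataset_name="demo_data",          # becomes a schema in your Unity Catalog
)

load_info = pipeline.run(
    [{"id": 1, "name": "example"}],    # any iterable of dicts works
    table_name="players",
)
print(load_info)
```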


2. Create a storage account

Search for "Storage accounts" in the Azure Portal and create a new storage account. Make sure to select "StorageV2 (general purpose v2)" as the account kind.
jorritsandbrink (Collaborator)


In this step the user needs to make sure they are creating an ADLS Gen2 storage account (a specific type of Standard general-purpose v2 storage account), which can be done by enabling the hierarchical namespace. Probably best to just refer to Azure's docs for this: https://learn.microsoft.com/en-us/azure/storage/blobs/create-data-lake-storage-account
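If scripting this step is preferred over clicking through the Portal, here is a hedged sketch using the azure-mgmt-storage SDK. `is_hns_enabled=True` is the hierarchical-namespace switch that makes a general-purpose v2 account ADLS Gen2; the subscription, resource group, account name, and region are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

# Placeholders: substitute your own subscription and resource names.
client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.storage_accounts.begin_create(
    resource_group_name="<resource-group>",
    account_name="<storageaccountname>",
    parameters={
        "location": "westeurope",
        "kind": "StorageV2",              # general-purpose v2
        "sku": {"name": "Standard_LRS"},
        "is_hns_enabled": True,           # hierarchical namespace => ADLS Gen2
    },
)
account = poller.result()
print(account.primary_endpoints.dfs)      # the ADLS Gen2 (abfss) endpoint
```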

4. Create an Access Connector for Azure Databricks

This will allow Databricks to access your storage account.
In the Azure Portal search for "Access Connector for Azure Databricks" and create a new connector.
jorritsandbrink (Collaborator)


You could add a note here that users can also use the Access Connector for Azure Databricks that gets created by default in the Databricks managed resource group when creating a new Databricks workspace.

steinitzu (Author)


Where do I find this? Guess I'm lacking permissions 🤔

jorritsandbrink (Collaborator)


It's in the managed resource group that gets created automatically when you create a Databricks workspace, but we don't have permissions to see it. We had a little chat about it yesterday:

[screenshot]
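For completeness: when the default connector is not visible (as in the permissions issue above), a dedicated one can also be created programmatically. This is only a sketch, under the assumption that the azure-mgmt-databricks package exposes access-connector operations as shown; all resource names are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.databricks import AzureDatabricksManagementClient

client = AzureDatabricksManagementClient(
    DefaultAzureCredential(), "<subscription-id>"
)

# A system-assigned managed identity is what Unity Catalog later uses
# to reach the storage account.
poller = client.access_connectors.begin_create_or_update(
    resource_group_name="<resource-group>",
    connector_name="<connector-name>",       # assumed parameter name
    parameters={
        "location": "westeurope",
        "identity": {"type": "SystemAssigned"},
    },
)
connector = poller.result()
# Grant this identity a role (e.g. Storage Blob Data Contributor) on the account.
print(connector.identity.principal_id)
```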


4. Go back to your workspace and click on "Compute" in the left-hand menu

5. Create a new cluster
jorritsandbrink (Collaborator)


Is this necessary? It seems the implementation uses a SQL warehouse, not a Spark cluster.

@@ -32,7 +110,9 @@ This will install dlt with **databricks** extra which contains Databricks Python

This should have your connection parameters and your personal access token.

It should now look like:
You will find your server hostname and HTTP path in your cluster settings -> Advanced Options -> JDBC/ODBC.
jorritsandbrink (Collaborator)


If it's indeed using a SQL warehouse and not a Spark cluster, then this should be something like: "go to your SQL warehouse -> Connection details".
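A quick way to sanity-check the values copied from "Connection details" is to open a connection with the Databricks SQL Connector for Python (databricks-sql-connector); the hostname, HTTP path, and token below are placeholders:

```python
from databricks import sql  # pip install databricks-sql-connector

# Placeholders: copy these from your SQL warehouse -> Connection details.
with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT current_catalog(), current_schema()")
        print(cursor.fetchone())
```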

steinitzu (Author)


Yes! The cluster was working, but overkill in this case.

Is the default warehouse created automatically? It was already there in the account when I took over.
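For reference, the same connection details can also be handed to dlt explicitly instead of via secrets.toml. A sketch assuming the credential field names mirror the SQL connector's parameters plus the Unity Catalog name; all values are placeholders:

```python
import dlt

# Hedged sketch: explicit credentials via the destination factory.
bricks = dlt.destinations.databricks(
    credentials={
        "server_hostname": "adb-1234567890123456.7.azuredatabricks.net",
        "http_path": "/sql/1.0/warehouses/abcdef1234567890",
        "access_token": "<personal-access-token>",
        "catalog": "<unity-catalog-name>",
    }
)

pipeline = dlt.pipeline(
    pipeline_name="databricks_smoke_test",  # illustrative
    destination=bricks,
    dataset_name="smoke_test",
)
```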



@rudolfix left a comment


LGTM! Thanks everyone for improving our docs. Also, all of Jorrit's remarks seem to be addressed.

@rudolfix merged commit 9e35656 into devel on Feb 11, 2024
44 checks passed
@rudolfix deleted the sthor/databricks-workspace-docs branch on February 11, 2024 at 20:10