2 changes: 2 additions & 0 deletions mint.json
@@ -449,6 +449,7 @@
"pages": [
"platform/sources/overview",
"platform/sources/azure-blob-storage",
"platform/sources/databricks-volumes",
"platform/sources/google-cloud",
"platform/sources/s3",
"platform/sources/sharepoint"
@@ -460,6 +461,7 @@
"platform/destinations/overview",
"platform/destinations/astradb",
"platform/destinations/azure-cognitive-search",
"platform/destinations/databricks-volumes",
"platform/destinations/delta-table",
"platform/destinations/google-cloud",
"platform/destinations/milvus",
platform/destinations/databricks-volumes.mdx
@@ -6,21 +6,21 @@ Send processed data from Unstructured to Databricks Volumes.

You'll need:

import DatabricksPrerequisites from '/snippets/general-shared-text/databricks-volumes.mdx';
import DatabricksVolumesPrerequisites from '/snippets/general-shared-text/databricks-volumes.mdx';

<DatabricksPrerequisites />
<DatabricksVolumesPrerequisites />

To create the destination connector:

1. On the sidebar, click **Connectors**.
2. Click **Destinations**.
3. Click **Add new**.
4. Enter a unique **Name** for this connector.
5. In the **Provider** area, click **Databricks**.
5. In the **Provider** area, click **Databricks Volumes**.
6. Click **Continue**.
7. Follow the on-screen instructions to fill in the fields as described later on this page.
8. Click **Save and Test**.

import DatabricksFields from '/snippets/general-shared-text/databricks-volumes-platform.mdx';
import DatabricksVolumesFields from '/snippets/general-shared-text/databricks-volumes-platform.mdx';

<DatabricksFields />
<DatabricksVolumesFields />
26 changes: 26 additions & 0 deletions platform/sources/databricks-volumes.mdx
@@ -0,0 +1,26 @@
---
title: Databricks Volumes
---

Ingest your files into Unstructured from Databricks Volumes.

You'll need:

import DatabricksVolumesPrerequisites from '/snippets/general-shared-text/databricks-volumes.mdx';

<DatabricksVolumesPrerequisites />

To create the source connector:

1. On the sidebar, click **Connectors**.
2. Click **Sources**.
3. Click **Add new**.
4. Enter a unique **Name** for this connector.
5. In the **Provider** area, click **Databricks Volumes**.
6. Click **Continue**.
7. Follow the on-screen instructions to fill in the fields as described later on this page.
8. Click **Save and Test**.

import DatabricksVolumesFields from '/snippets/general-shared-text/databricks-volumes-platform.mdx';

<DatabricksVolumesFields />
34 changes: 12 additions & 22 deletions snippets/general-shared-text/databricks-volumes-platform.mdx
@@ -2,35 +2,25 @@ Fill in the following fields:

- **Name** (_required_): A unique name for this connector.
- **Host** (_required_): The Databricks workspace host URL.
- **Cluster ID**: The Databricks cluster ID.
- **Catalog** (_required_): The name of the catalog to use.
- **Schema**: The name of the associated schema. If not specified, **default** is used.
- **Volume** (_required_): The name of the associated volume.
- **Volume Path**: Any optional path to access within the volume.
- **Overwrite**: Check this box if existing data should be overwritten.
- **Encoding**: Any encoding to be applied to the data in the volume. If not specified, **utf-8** is used.
- **Client ID** (_required_): The application ID value for the Databricks-managed service principal that has access to the volume.
- **Client Secret** (_required_): The associated OAuth secret value for the Databricks-managed service principal that has access to the volume.

Also fill in the following fields based on your authentication type, depending on your cloud provider:
To learn how to create a Databricks-managed service principal, get its application ID, and generate an associated OAuth secret,
see the documentation for
[AWS](https://docs.databricks.com/dev-tools/auth/oauth-m2m.html),
[Azure](https://learn.microsoft.com/databricks/dev-tools/auth/oauth-m2m),
or [GCP](https://docs.gcp.databricks.com/dev-tools/auth/oauth-m2m.html).

- For Databricks personal access token authentication (AWS, Azure, and GCP):
For Azure, only Databricks-managed service principals are supported. Microsoft Entra ID-managed service principals are not supported.
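If you want to sanity-check the service principal's credentials before saving the connector, the following is a minimal sketch using the Databricks SDK for Python, assuming the `databricks-sdk` package is installed. The host, client ID, and secret values are placeholders; the SDK itself is not required by the Unstructured Platform.

```python
# A minimal sketch, assuming databricks-sdk is installed
# (pip install databricks-sdk). All credential values are placeholders.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host="https://<workspace-host>",          # the Host field
    client_id="<service-principal-app-id>",   # the Client ID field
    client_secret="<oauth-secret>",           # the Client Secret field
)

# If OAuth machine-to-machine (M2M) authentication succeeds, this prints
# the service principal's identity as the workspace sees it.
print(w.current_user.me().user_name)
```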

- **Token**: The Databricks personal access token value.

- For username and password (basic) authentication (AWS only):

- **Username**: The Databricks username value.
- **Password**: The associated Databricks password value.

The following authentication types are currently not supported:

- OAuth machine-to-machine (M2M) authentication (AWS, Azure, and GCP).
- OAuth user-to-machine (U2M) authentication (AWS, Azure, and GCP).
- Azure managed identities (MSI) authentication (Azure only).
- Microsoft Entra ID service principal authentication (Azure only).
- Azure CLI authentication (Azure only).
- Microsoft Entra ID user authentication (Azure only).
- Google Cloud Platform credentials authentication (GCP only).
- Google Cloud Platform ID authentication (GCP only).
To learn how to grant a Databricks-managed service principal access to a volume, see the documentation for
[AWS](https://docs.databricks.com/volumes/utility-commands.html#change-permissions-on-a-volume),
[Azure](https://learn.microsoft.com/azure/databricks/volumes/utility-commands#change-permissions-on-a-volume),
or [GCP](https://docs.gcp.databricks.com/volumes/utility-commands.html#change-permissions-on-a-volume).
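Volume permissions can also be granted programmatically. The following is a sketch using the Unity Catalog grants API in the Databricks SDK for Python, assuming the caller has owner or manage rights on the volume; the catalog, schema, volume, and principal names are placeholders. Note that the principal also needs `USE CATALOG` and `USE SCHEMA` on the parent catalog and schema.

```python
# A minimal sketch, assuming databricks-sdk is installed and the caller
# has sufficient rights on the volume. All names are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import catalog

w = WorkspaceClient()  # picks up credentials from the environment

# Grant the service principal read and write access to the volume.
w.grants.update(
    securable_type=catalog.SecurableType.VOLUME,
    full_name="my_catalog.my_schema.my_volume",
    changes=[
        catalog.PermissionsChange(
            principal="<service-principal-app-id>",
            add=[
                catalog.Privilege.READ_VOLUME,
                catalog.Privilege.WRITE_VOLUME,
            ],
        )
    ],
)
```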

49 changes: 40 additions & 9 deletions snippets/general-shared-text/databricks-volumes.mdx
@@ -10,6 +10,11 @@ allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; pic
allowfullscreen
></iframe>

The preceding video shows how to use Databricks personal access tokens (PATs), which are supported only for [Unstructured Ingest](/ingestion/overview).

To learn how to use Databricks-managed service principals, which are supported by both the [Unstructured Platform](/platform/overview) and Unstructured Ingest,
see the additional videos later on this page.

- The Databricks workspace URL. Get the workspace URL for
[AWS](https://docs.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids),
[Azure](https://learn.microsoft.com/azure/databricks/workspace/workspace-details#workspace-instance-names-urls-and-ids),
@@ -21,17 +26,39 @@
- Azure: `https://adb-<workspace-id>.<random-number>.azuredatabricks.net`
- GCP: `https://<workspace-id>.<random-number>.gcp.databricks.com`

- The Databricks compute resource's ID. Get the compute resource ID for
[AWS](https://docs.databricks.com/integrations/compute-details.html),
[Azure](https://learn.microsoft.com/azure/databricks/integrations/compute-details),
or [GCP](https://docs.gcp.databricks.com/integrations/compute-details.html).

- The Databricks authentication details. For more information, see the documentation for
[AWS](https://docs.databricks.com/dev-tools/auth/index.html),
[Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/),
or [GCP](https://docs.gcp.databricks.com/dev-tools/auth/index.html).

More specifically, you will need:
The following videos show how to create a Databricks-managed service principal and then grant it access to a Databricks volume:

<iframe
width="560"
height="315"
src="https://www.youtube.com/embed/wBmqv5DaA1E"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
></iframe>

<iframe
width="560"
height="315"
src="https://www.youtube.com/embed/DykQRxgh2aQ"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
></iframe>

For the [Unstructured Platform](/platform/overview), only the following Databricks authentication type is supported:

- For OAuth machine-to-machine (M2M) authentication (AWS, Azure, and GCP): The client ID and OAuth secret values for the corresponding service principal.
Note that for Azure, only Databricks-managed service principals are supported. Microsoft Entra ID-managed service principals are not supported.

For [Unstructured Ingest](/ingestion/overview), the following Databricks authentication types are supported:

- For Databricks personal access token authentication (AWS, Azure, and GCP): The personal access token's value.
- For username and password (basic) authentication (AWS only): The user's name and password values.
@@ -44,6 +71,10 @@
- For Google Cloud Platform credentials authentication (GCP only): The local path to the corresponding Google Cloud service account's credentials file.
- For Google Cloud Platform ID authentication (GCP only): The Google Cloud service account's email address.
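For comparison with the OAuth M2M example above, here is what personal access token authentication (the type shown in the first video on this page) looks like with the Databricks SDK for Python. This is only a sketch, and the host and token values are placeholders.

```python
# A minimal sketch of PAT-based authentication, assuming databricks-sdk
# is installed. Host and token values are placeholders.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host="https://<workspace-host>",
    token="<personal-access-token>",
)

# A quick smoke test: list the catalogs that the token's user can see.
for c in w.catalogs.list():
    print(c.name)
```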

- The Databricks catalog name for the Volume. Get the catalog name for [AWS](https://docs.databricks.com/catalogs/manage-catalog.html), [Azure](https://learn.microsoft.com/azure/databricks/catalogs/manage-catalog), or [GCP](https://docs.gcp.databricks.com/catalogs/manage-catalog.html).
- The Databricks schema name for the Volume. Get the schema name for [AWS](https://docs.databricks.com/schemas/manage-schema.html), [Azure](https://learn.microsoft.com/azure/databricks/schemas/manage-schema), or [GCP](https://docs.gcp.databricks.com/schemas/manage-schema.html).
- The Databricks Volume name, and optionally any path in that Volume that you want to access directly. Get the Volume information for [AWS](https://docs.databricks.com/files/volumes.html), [Azure](https://learn.microsoft.com/azure/databricks/files/volumes), or [GCP](https://docs.gcp.databricks.com/files/volumes.html).
- The Databricks catalog name for the volume. Get the catalog name for [AWS](https://docs.databricks.com/catalogs/manage-catalog.html), [Azure](https://learn.microsoft.com/azure/databricks/catalogs/manage-catalog), or [GCP](https://docs.gcp.databricks.com/catalogs/manage-catalog.html).
- The Databricks schema name for the volume. Get the schema name for [AWS](https://docs.databricks.com/schemas/manage-schema.html), [Azure](https://learn.microsoft.com/azure/databricks/schemas/manage-schema), or [GCP](https://docs.gcp.databricks.com/schemas/manage-schema.html).
- The Databricks volume name, and optionally any path in that volume that you want to access directly. Get the volume information for [AWS](https://docs.databricks.com/files/volumes.html), [Azure](https://learn.microsoft.com/azure/databricks/files/volumes), or [GCP](https://docs.gcp.databricks.com/files/volumes.html).
- Make sure that the target user or service principal has access to the target volume. To learn more, see the documentation for
[AWS](https://docs.databricks.com/volumes/utility-commands.html#change-permissions-on-a-volume),
[Azure](https://learn.microsoft.com/azure/databricks/volumes/utility-commands#change-permissions-on-a-volume),
or [GCP](https://docs.gcp.databricks.com/volumes/utility-commands.html#change-permissions-on-a-volume).
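Volume contents are addressed with Unity Catalog's three-level namespace under the `/Volumes` root, that is, `/Volumes/<catalog>/<schema>/<volume>/<path>`. As a sketch, assuming a recent `databricks-sdk` with authentication configured via the environment and placeholder names, you can confirm that the target user or service principal has access by listing and writing to that path:

```python
# A minimal access check, assuming databricks-sdk is installed and
# authentication is configured via the environment. Names are placeholders.
import io

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
volume_path = "/Volumes/my_catalog/my_schema/my_volume"

# READ VOLUME access is enough for this call to succeed.
for entry in w.files.list_directory_contents(volume_path):
    print(entry.path)

# WRITE VOLUME access is required for this one.
w.files.upload(
    f"{volume_path}/unstructured-connectivity-check.txt",
    io.BytesIO(b"connectivity check"),
    overwrite=True,
)
```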