2 changes: 2 additions & 0 deletions platform/connectors.mdx
@@ -12,6 +12,7 @@ The Unstructured Platform supports connecting to the following source and destination
## Sources

- [Azure](/platform/sources/azure-blob-storage)
- [Databricks Volumes](/platform/sources/databricks)
- [S3](/platform/sources/s3)

If your source is not listed here, you might still be able to connect Unstructured to it through scripts or code by using the
@@ -22,6 +23,7 @@ If your source is not listed here, you might still be able to connect Unstructured
## Destinations

- [Azure Cognitive Search](/platform/destinations/azure-cognitive-search)
- [Databricks Volumes](/platform/destinations/databricks)
- [Pinecone](/platform/destinations/pinecone)
- [S3](/platform/destinations/s3)

14 changes: 8 additions & 6 deletions platform/destinations/databricks.mdx
@@ -12,12 +12,14 @@ import DatabricksPrerequisites from '/snippets/general-shared-text/databricks-vo

To create the destination connector:

-1. On the sidebar, click **Destinations**.
-2. Click **New Destination**.
-3. In the **Type** drop-down list, select **Databricks**.
-4. Fill in the fields as described later on this page.
-5. Click **Save and Test**.
-6. Click **Close**.
+1. On the sidebar, click **Connectors**.
+2. Click **Destinations**.
+3. Click **Add new**.
+4. Give the connector a unique **Name**.
+5. In the **Provider** area, click **Databricks**.
+6. Click **Continue**.
+7. Follow the on-screen instructions to fill in the fields as described later on this page.
+8. Click **Save and Test**.

import DatabricksFields from '/snippets/general-shared-text/databricks-volumes-platform.mdx';

1 change: 1 addition & 0 deletions platform/destinations/overview.mdx
@@ -15,6 +15,7 @@ To create a destination connector:
4. Fill in the fields according to your connector type. To learn how, click your connector type in the following list:

- [Azure Cognitive Search](/platform/destinations/azure-cognitive-search)
- [Databricks Volumes](/platform/destinations/databricks)
- [Pinecone](/platform/destinations/pinecone)
- [S3](/platform/destinations/s3)

26 changes: 26 additions & 0 deletions platform/sources/databricks.mdx
@@ -0,0 +1,26 @@
---
title: Databricks Volumes
---

Ingest your files into Unstructured from Databricks Volumes.

You'll need:

import DatabricksVolumesPrerequisites from '/snippets/general-shared-text/databricks-volumes.mdx';

<DatabricksVolumesPrerequisites />

To create the source connector:

1. On the sidebar, click **Connectors**.
2. Click **Sources**.
3. Click **Add new**.
4. Give the connector a unique **Name**.
5. In the **Provider** area, click **Databricks**.
6. Click **Continue**.
7. Follow the on-screen instructions to fill in the fields as described later on this page.
8. Click **Save and Test**.

import DatabricksVolumesFields from '/snippets/general-shared-text/databricks-volumes-platform.mdx';

<DatabricksVolumesFields />
1 change: 1 addition & 0 deletions platform/sources/overview.mdx
@@ -16,6 +16,7 @@ To create a source connector:
4. Fill in the fields according to your connector type. To learn how, click your connector type in the following list:

- [Azure](/platform/sources/azure-blob-storage)
- [Databricks Volumes](/platform/sources/databricks)
- [S3](/platform/sources/s3)

5. Click **Save and Test**.
@@ -11,7 +11,6 @@ import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-d
The following environment variables:

- `DATABRICKS_HOST` - The Databricks host URL, represented by `--host` (CLI) or `host` (Python).
-- `DATABRICKS_CLUSTER_ID` - The Databricks compute resource ID, represented by `--cluster-id` (CLI) or `cluster_id` (Python).
- `DATABRICKS_CATALOG` - The Databricks catalog name for the Volume, represented by `--catalog` (CLI) or `catalog` (Python).
- `DATABRICKS_SCHEMA` - The Databricks schema name for the Volume, represented by `--schema` (CLI) or `schema` (Python). If not specified, `default` is used.
- `DATABRICKS_VOLUME` - The Databricks Volume name, represented by `--volume` (CLI) or `volume` (Python).
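
Taken together, these variables identify a Unity Catalog volume. Below is a minimal sketch of how they might be set and combined in Python before an ingest run; the host and volume values are hypothetical placeholders, and `DATABRICKS_CLUSTER_ID` is intentionally absent, matching its removal above:

```python
import os

# Hypothetical example values; substitute your own workspace details.
os.environ["DATABRICKS_HOST"] = "https://dbc-1234567890abcdef.cloud.databricks.com"
os.environ["DATABRICKS_CATALOG"] = "main"
os.environ["DATABRICKS_SCHEMA"] = "default"  # optional; `default` is used if unset
os.environ["DATABRICKS_VOLUME"] = "ingest-docs"

# Unity Catalog volumes resolve to paths of the form /Volumes/<catalog>/<schema>/<volume>.
volume_path = "/Volumes/{catalog}/{schema}/{volume}".format(
    catalog=os.environ["DATABRICKS_CATALOG"],
    schema=os.environ["DATABRICKS_SCHEMA"],
    volume=os.environ["DATABRICKS_VOLUME"],
)
print(volume_path)  # -> /Volumes/main/default/ingest-docs
```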
29 changes: 8 additions & 21 deletions snippets/general-shared-text/databricks-volumes-platform.mdx
@@ -2,35 +2,22 @@ Fill in the following fields:

- **Name** (_required_): A unique name for this connector.
- **Host** (_required_): The Databricks workspace host URL.
-- **Cluster ID** : The Databricks cluster ID.
- **Catalog** (_required_): The name of the catalog to use.
**Comment on lines 4 to 5** (Collaborator, Author): Things seemed to work for me without needing to specify a cluster ID.
- **Schema** : The name of the associated schema. If not specified, **default** is used.
- **Volume** (_required_): The name of the associated volume.
- **Volume Path** : Any optional path to access within the volume.
- **Overwrite** : Check this box if existing data should be overwritten.
-- **Encoding** : Any encoding to be applied to the data in the volume. If not specified, **utf-8**, is used.
+- **Encoding** : Any encoding to be applied to the data in the volume. If not specified, **utf-8** is used.

Also fill in the following fields based on your authentication type, depending on your cloud provider:

-- For Databricks personal access token authentication (AWS, Azure, and GCP):

-- **Token** : The Databricks personal access token value.

-- For username and password (basic) authentication (AWS only):

-- **Username** : The Databricks username value.
-- **Password** : The associated Databricks password value.

-The following authentication types are currently not supported:

-- OAuth machine-to-machine (M2M) authentication (AWS, Azure, and GCP).
-- OAuth user-to-machine (U2M) authentication (AWS, Azure, and GCP).
-- Azure managed identities (MSI) authentication (Azure only).
-- Microsoft Entra ID service principal authentication (Azure only).
-- Azure CLI authentication (Azure only).
-- Microsoft Entra ID user authentication (Azure only).
-- Google Cloud Platform credentials authentication (GCP only).
-- Google Cloud Platform ID authentication (GCP only).
+- For Databricks personal access token authentication (AWS, Azure, and GCP) or for
+Microsoft Entra ID user authentication (Azure only):

+- **Token** : The Databricks personal access token value (for AWS, Azure, and GCP) or the
+Microsoft Entra ID token value (Azure only).

+- For OAuth machine-to-machine (M2M) authentication (AWS, Azure, and GCP):

+- **Client ID** : The service principal's client (application) ID value.
+- **Client Secret** : The associated Databricks OAuth client secret value.
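
For readers who want to sanity-check OAuth M2M credentials before entering them in the UI, here is a minimal sketch using the Databricks SDK for Python; the host, client ID, and secret are hypothetical placeholders, and this SDK call is offered as an illustration under those assumptions, not as part of the connector setup itself:

```python
# Requires: pip install databricks-sdk
from databricks.sdk import WorkspaceClient

# Hypothetical service principal credentials; replace with your own.
w = WorkspaceClient(
    host="https://adb-1234567890123456.7.azuredatabricks.net",
    client_id="00000000-0000-0000-0000-000000000000",
    client_secret="dose...",  # Databricks OAuth client secret
    auth_type="oauth-m2m",    # force OAuth machine-to-machine authentication
)

# Any successful authenticated call confirms the credentials work.
print(w.current_user.me().user_name)
```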
27 changes: 19 additions & 8 deletions snippets/general-shared-text/databricks-volumes.mdx
@@ -1,5 +1,15 @@
The Databricks Volumes prerequisites:

+<iframe
+width="560"
+height="315"
+src="https://www.youtube.com/embed/rNZpwa1-g7M"
+title="YouTube video player"
+frameborder="0"
+allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
+allowfullscreen
+></iframe>

- The Databricks workspace URL. Get the workspace URL for
[AWS](https://docs.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids),
[Azure](https://learn.microsoft.com/azure/databricks/workspace/workspace-details#workspace-instance-names-urls-and-ids),
@@ -11,26 +21,27 @@ The Databricks Volumes prerequisites:
- Azure: `https://adb-<workspace-id>.<random-number>.azuredatabricks.net`
- GCP: `https://<workspace-id>.<random-number>.gcp.databricks.com`

-- The Databricks compute resource's ID. Get the compute resource ID for
-[AWS](https://docs.databricks.com/integrations/compute-details.html),
-[Azure](https://learn.microsoft.com/azure/databricks/integrations/compute-details),
-or [GCP](https://docs.gcp.databricks.com/integrations/compute-details.html).

- The Databricks authentication details. For more information, see the documentation for
[AWS](https://docs.databricks.com/dev-tools/auth/index.html),
[Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/),
or [GCP](https://docs.gcp.databricks.com/dev-tools/auth/index.html).

-More specifically, you will need:
+More specifically, you will need the following authentication details.

+The following authentication types are supported by both [Unstructured API services](/api-reference/api-services/overview)
+and the [Unstructured Platform](/platform/overview):

- For Databricks personal access token authentication (AWS, Azure, and GCP): The personal access token's value.
-- For username and password (basic) authentication (AWS only): The user's name and password values.
- For OAuth machine-to-machine (M2M) authentication (AWS, Azure, and GCP): The client ID and OAuth secret values for the corresponding service principal.
+- For Microsoft Entra ID user authentication (Azure only): The Entra ID token for the corresponding Entra ID user.

+The following authentication types are supported only by Unstructured API services:

+- For username and password (basic) authentication (AWS only): The user's name and password values.
- For OAuth user-to-machine (U2M) authentication (AWS, Azure, and GCP): No additional values.
- For Azure managed identities (MSI) authentication (Azure only): The client ID value for the corresponding managed identity.
- For Microsoft Entra ID service principal authentication (Azure only): The tenant ID, client ID, and client secret values for the corresponding service principal.
- For Azure CLI authentication (Azure only): No additional values.
-- For Microsoft Entra ID user authentication (Azure only): The Entra ID token for the corresponding Entra ID user.
- For Google Cloud Platform credentials authentication (GCP only): The local path to the corresponding Google Cloud service account's credentials file.
- For Google Cloud Platform ID authentication (GCP only): The Google Cloud service account's email address.
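
To check the most common of these credential sets outside of Unstructured, here is a small example with the Databricks SDK for Python using personal access token authentication; the host, token, catalog, and schema values are hypothetical placeholders, and the SDK usage is a sketch under those assumptions:

```python
# Requires: pip install databricks-sdk
from databricks.sdk import WorkspaceClient

# Hypothetical workspace URL and personal access token.
w = WorkspaceClient(
    host="https://dbc-1234567890abcdef.cloud.databricks.com",
    token="dapi...",  # personal access token (AWS, Azure, or GCP)
)

# Confirm the token authenticates, then confirm the volume is reachable.
print(w.current_user.me().user_name)
for volume in w.volumes.list(catalog_name="main", schema_name="default"):
    print(volume.full_name)  # e.g. main.default.ingest-docs
```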
