diff --git a/platform/connectors.mdx b/platform/connectors.mdx
index 41bef144..caeede32 100644
--- a/platform/connectors.mdx
+++ b/platform/connectors.mdx
@@ -12,6 +12,7 @@ The Unstructured Platform supports connecting to the following source and destin
## Sources
- [Azure](/platform/sources/azure-blob-storage)
+- [Databricks Volumes](/platform/sources/databricks)
- [S3](/platform/sources/s3)
If your source is not listed here, you might still be able to connect Unstructured to it through scripts or code by using the
@@ -22,6 +23,7 @@ If your source is not listed here, you might still be able to connect Unstructur
## Destinations
- [Azure Cognitive Search](/platform/destinations/azure-cognitive-search)
+- [Databricks Volumes](/platform/destinations/databricks)
- [Pinecone](/platform/destinations/pinecone)
- [S3](/platform/destinations/s3)
diff --git a/platform/destinations/databricks.mdx b/platform/destinations/databricks.mdx
index 90dea652..7a9402aa 100644
--- a/platform/destinations/databricks.mdx
+++ b/platform/destinations/databricks.mdx
@@ -12,12 +12,14 @@ import DatabricksPrerequisites from '/snippets/general-shared-text/databricks-vo
To create the destination connector:
-1. On the sidebar, click **Destinations**.
-2. Click **New Destination**.
-3. In the **Type** drop-down list, select **Databricks**.
-4. Fill in the fields as described later on this page.
-5. Click **Save and Test**.
-6. Click **Close**.
+1. On the sidebar, click **Connectors**.
+2. Click **Destinations**.
+3. Click **Add new**.
+4. Enter a unique **Name** for the connector.
+5. In the **Provider** area, click **Databricks**.
+6. Click **Continue**.
+7. Follow the on-screen instructions to fill in the fields as described later on this page.
+8. Click **Save and Test**.
import DatabricksFields from '/snippets/general-shared-text/databricks-volumes-platform.mdx';
diff --git a/platform/destinations/overview.mdx b/platform/destinations/overview.mdx
index c9fbf686..3a0e676a 100644
--- a/platform/destinations/overview.mdx
+++ b/platform/destinations/overview.mdx
@@ -15,6 +15,7 @@ To create a destination connector:
4. Fill in the fields according to your connector type. To learn how, click your connector type in the following list:
- [Azure Cognitive Search](/platform/destinations/azure-cognitive-search)
+ - [Databricks Volumes](/platform/destinations/databricks)
- [Pinecone](/platform/destinations/pinecone)
- [S3](/platform/destinations/s3)
diff --git a/platform/sources/databricks.mdx b/platform/sources/databricks.mdx
new file mode 100644
index 00000000..52cff067
--- /dev/null
+++ b/platform/sources/databricks.mdx
@@ -0,0 +1,26 @@
+---
+title: Databricks Volumes
+---
+
+Ingest your files into Unstructured from Databricks Volumes.
+
+You'll need:
+
+import DatabricksVolumesPrerequisites from '/snippets/general-shared-text/databricks-volumes.mdx';
+
+<DatabricksVolumesPrerequisites />
+
+To create the source connector:
+
+1. On the sidebar, click **Connectors**.
+2. Click **Sources**.
+3. Click **Add new**.
+4. Enter a unique **Name** for the connector.
+5. In the **Provider** area, click **Databricks**.
+6. Click **Continue**.
+7. Follow the on-screen instructions to fill in the fields as described later on this page.
+8. Click **Save and Test**.
+
+import DatabricksVolumesFields from '/snippets/general-shared-text/databricks-volumes-platform.mdx';
+
+<DatabricksVolumesFields />
\ No newline at end of file
diff --git a/platform/sources/overview.mdx b/platform/sources/overview.mdx
index 37ac10b0..d2b55568 100644
--- a/platform/sources/overview.mdx
+++ b/platform/sources/overview.mdx
@@ -16,6 +16,7 @@ To create a source connector:
4. Fill in the fields according to your connector type. To learn how, click your connector type in the following list:
- [Azure](/platform/sources/azure-blob-storage)
+ - [Databricks Volumes](/platform/sources/databricks)
- [S3](/platform/sources/s3)
5. Click **Save and Test**.
diff --git a/snippets/general-shared-text/databricks-volumes-cli-api.mdx b/snippets/general-shared-text/databricks-volumes-cli-api.mdx
index a7c1d169..dc7f9845 100644
--- a/snippets/general-shared-text/databricks-volumes-cli-api.mdx
+++ b/snippets/general-shared-text/databricks-volumes-cli-api.mdx
@@ -11,7 +11,6 @@ import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-d
The following environment variables:
- `DATABRICKS_HOST` - The Databricks host URL, represented by `--host` (CLI) or `host` (Python).
-- `DATABRICKS_CLUSTER_ID` - The Databricks compute resource ID, represented by `--cluster-id` (CLI) or `cluster_id` (Python).
- `DATABRICKS_CATALOG` - The Databricks catalog name for the Volume, represented by `--catalog` (CLI) or `catalog` (Python).
- `DATABRICKS_SCHEMA` - The Databricks schema name for the Volume, represented by `--schema` (CLI) or `schema` (Python). If not specified, `default` is used.
- `DATABRICKS_VOLUME` - The Databricks Volume name, represented by `--volume` (CLI) or `volume` (Python).
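+
+For example, a minimal sketch of setting these variables in a shell before running an ingest job (the host, catalog, and volume values shown are placeholders, not real workspace details):
+
+```bash
+export DATABRICKS_HOST="https://adb-1234567890123456.7.azuredatabricks.net"
+export DATABRICKS_CATALOG="main"
+export DATABRICKS_SCHEMA="default"   # optional; "default" is used if unset
+export DATABRICKS_VOLUME="my-volume"
+```
+
+The same values can instead be passed as `--host`, `--catalog`, `--schema`, and `--volume` on the CLI, or as the `host`, `catalog`, `schema`, and `volume` parameters in Python.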
diff --git a/snippets/general-shared-text/databricks-volumes-platform.mdx b/snippets/general-shared-text/databricks-volumes-platform.mdx
index a1f45588..f4ee23b2 100644
--- a/snippets/general-shared-text/databricks-volumes-platform.mdx
+++ b/snippets/general-shared-text/databricks-volumes-platform.mdx
@@ -2,35 +2,22 @@ Fill in the following fields:
- **Name** (_required_): A unique name for this connector.
- **Host** (_required_): The Databricks workspace host URL.
-- **Cluster ID** : The Databricks cluster ID.
- **Catalog** (_required_): The name of the catalog to use.
- **Schema** : The name of the associated schema. If not specified, **default** is used.
- **Volume** (_required_): The name of the associated volume.
- **Volume Path** : Any optional path to access within the volume.
- **Overwrite** : Check this box if existing data should be overwritten.
-- **Encoding** : Any encoding to be applied to the data in the volume. If not specified, **utf-8**, is used.
+- **Encoding** : Any encoding to be applied to the data in the volume. If not specified, **utf-8** is used.
Also fill in the following fields, based on your authentication type and cloud provider:
-- For Databricks personal access token authentication (AWS, Azure, and GCP):
-
- - **Token** : The Databricks personal access token value.
-
-- For username and password (basic) authentication (AWS only):
-
- - **Username** : The Databricks username value.
- - **Password** : The associated Databricks password value.
-
-The following authentication types are currently not supported:
-
-- OAuth machine-to-machine (M2M) authentication (AWS, Azure, and GCP).
-- OAuth user-to-machine (U2M) authentication (AWS, Azure, and GCP).
-- Azure managed identities (MSI) authentication (Azure only).
-- Microsoft Entra ID service principal authentication (Azure only).
-- Azure CLI authentication (Azure only).
-- Microsoft Entra ID user authentication (Azure only).
-- Google Cloud Platform credentials authentication (GCP only).
-- Google Cloud Platform ID authentication (GCP only).
+- For Databricks personal access token authentication (AWS, Azure, and GCP) or for
+ Microsoft Entra ID user authentication (Azure only):
+ - **Token** : The Databricks personal access token value (for AWS, Azure, and GCP) or the
+ Microsoft Entra ID token value (Azure only).
+- For OAuth machine-to-machine (M2M) authentication (AWS, Azure, and GCP):
+ - **Client ID** : The service principal's client (application) ID value.
+ - **Client Secret** : The associated Databricks OAuth client secret value.
\ No newline at end of file
diff --git a/snippets/general-shared-text/databricks-volumes.mdx b/snippets/general-shared-text/databricks-volumes.mdx
index 8bed44d3..680e3580 100644
--- a/snippets/general-shared-text/databricks-volumes.mdx
+++ b/snippets/general-shared-text/databricks-volumes.mdx
@@ -1,5 +1,15 @@
The Databricks Volumes prerequisites:
+
+
- The Databricks workspace URL. Get the workspace URL for
[AWS](https://docs.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids),
[Azure](https://learn.microsoft.com/azure/databricks/workspace/workspace-details#workspace-instance-names-urls-and-ids),
@@ -11,26 +21,27 @@ The Databricks Volumes prerequisites:
- Azure: `https://adb-<workspace-id>.<random-number>.azuredatabricks.net`
- GCP: `https://<workspace-id>.<random-number>.gcp.databricks.com`
-- The Databricks compute resource's ID. Get the compute resource ID for
- [AWS](https://docs.databricks.com/integrations/compute-details.html),
- [Azure](https://learn.microsoft.com/azure/databricks/integrations/compute-details),
- or [GCP](https://docs.gcp.databricks.com/integrations/compute-details.html).
-
- The Databricks authentication details. For more information, see the documentation for
[AWS](https://docs.databricks.com/dev-tools/auth/index.html),
[Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/),
or [GCP](https://docs.gcp.databricks.com/dev-tools/auth/index.html).
- More specifically, you will need:
+ More specifically, you will need the following authentication details.
+
+ The following authentication types are supported by both [Unstructured API services](/api-reference/api-services/overview)
+ and the [Unstructured Platform](/platform/overview):
- For Databricks personal access token authentication (AWS, Azure, and GCP): The personal access token's value.
- - For username and password (basic) authentication (AWS only): The user's name and password values.
- For OAuth machine-to-machine (M2M) authentication (AWS, Azure, and GCP): The client ID and OAuth secret values for the corresponding service principal.
+ - For Microsoft Entra ID user authentication (Azure only): The Entra ID token for the corresponding Entra ID user.
+
+ The following authentication types are supported only by Unstructured API services:
+
+ - For username and password (basic) authentication (AWS only): The user's name and password values.
- For OAuth user-to-machine (U2M) authentication (AWS, Azure, and GCP): No additional values.
- For Azure managed identities (MSI) authentication (Azure only): The client ID value for the corresponding managed identity.
- For Microsoft Entra ID service principal authentication (Azure only): The tenant ID, client ID, and client secret values for the corresponding service principal.
- For Azure CLI authentication (Azure only): No additional values.
- - For Microsoft Entra ID user authentication (Azure only): The Entra ID token for the corresponding Entra ID user.
- For Google Cloud Platform credentials authentication (GCP only): The local path to the corresponding Google Cloud service account's credentials file.
- For Google Cloud Platform ID authentication (GCP only): The Google Cloud service account's email address.
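+
+  As a minimal sketch, the credentials for two of the authentication types supported by both products (personal access token and OAuth M2M) can be supplied through Databricks' standard client unified authentication environment variables (the values shown are placeholders; confirm the exact names your ingest tooling expects):
+
+  ```bash
+  # Personal access token authentication (AWS, Azure, and GCP):
+  export DATABRICKS_HOST="https://adb-1234567890123456.7.azuredatabricks.net"
+  export DATABRICKS_TOKEN="<personal-access-token>"
+
+  # Or, for OAuth machine-to-machine (M2M) authentication (AWS, Azure, and GCP):
+  export DATABRICKS_CLIENT_ID="<service-principal-client-id>"
+  export DATABRICKS_CLIENT_SECRET="<oauth-client-secret>"
+  ```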