diff --git a/snippets/general-shared-text/databricks-delta-table-api-placeholders.mdx b/snippets/general-shared-text/databricks-delta-table-api-placeholders.mdx index 06c1bfec..2c7c95da 100644 --- a/snippets/general-shared-text/databricks-delta-table-api-placeholders.mdx +++ b/snippets/general-shared-text/databricks-delta-table-api-placeholders.mdx @@ -4,9 +4,15 @@ - `` (_required_ for PAT authentication): For Databricks personal access token (PAT) authentication, the target Databricks user's PAT value. - `` and `` (_required_ for OAuth authentication): For Databricks OAuth machine-to-machine (M2M) authentication, the Databricks managed service principal's **UUID** (or **Client ID** or **Application ID**) and OAuth **Secret** (client secret) values. - `` (_required_): The name of the catalog in Unity Catalog for the target volume and table in the Databricks workspace. -- ``: The name of the database in Unity Catalog for the target volume and table. The default is `default` if not otherwise specified. +- ``: The name of the schema (formerly known as a database) in Unity Catalog for the target table. The default is `default` if not otherwise specified. + + If the target table and volume are in the same schema (formerly known as a database), then `` and `` will have the same values. + - `` (_required_): The name of the target table in Unity Catalog. -- ``: The name of the schema in Unity Catalog for the target volume and table. The default is `default` if not otherwise specified. +- ``: The name of the schema (formerly known as a database) in Unity Catalog for the target volume. The default is `default` if not otherwise specified. + + If the target volume and table are in the same schema (formerly known as a database), then `` and `` will have the same values. + - `` (_required_): The name of the target volume in Unity Catalog. - ``: Any target folder path inside of the volume to use instead of the volume's root. 
If not otherwise specified, processing occurs at the volume's root. diff --git a/snippets/general-shared-text/databricks-delta-table-cli-api.mdx b/snippets/general-shared-text/databricks-delta-table-cli-api.mdx index cd54ac71..ff71233e 100644 --- a/snippets/general-shared-text/databricks-delta-table-cli-api.mdx +++ b/snippets/general-shared-text/databricks-delta-table-cli-api.mdx @@ -16,8 +16,11 @@ The following environment variables: - `DATABRICKS_CLIENT_ID` - For Databricks managed service principal authenticaton, the service principal's **UUID** (or **Client ID** or **Application ID**) value, represented by `--client-id` (CLI) or `client_id` (Python). - `DATABRICKS_CLIENT_SECRET` - For Databricks managed service principal authenticaton, the service principal's OAuth **Secret** value, represented by `--client-secret` (CLI) or `client_secret` (Python). - `DATABRICKS_CATALOG` - The name of the catalog in Unity Catalog, represented by `--catalog` (CLI) or `catalog` (Python). -- `DATABRICKS_DATABASE` - The name of the schema (database) inside of the catalog, represented by `--database` (CLI) or `database` (Python). The default is `default` if not otherwise specified. -- `DATABRICKS_TABLE` - The name of the table inside of the schema (database), represented by `--table-name` (CLI) or `table_name` (Python). The default is `elements` if not otherwise specified. +- `DATABRICKS_DATABASE` - The name of the schema (formerly known as a database) inside of the catalog for the target table, represented by `--database` (CLI) or `database` (Python). The default is `default` if not otherwise specified. + + If you are also using a volume, and the target table and volume are in the same schema (formerly known as a database), then `DATABRICKS_DATABASE` and `DATABRICKS_SCHEMA` will have the same values. + +- `DATABRICKS_TABLE` - The name of the table inside of the schema (formerly known as a database), represented by `--table-name` (CLI) or `table_name` (Python). 
The default is `elements` if not otherwise specified. For the SQL-based implementation, add these environment variables: @@ -26,7 +29,9 @@ For the SQL-based implementation, add these environment variables: For the volume-based implementation, add these environment variables: -- `DATABRICKS_SCHEMA` - The name of the schema (database) inside of the catalog, represented by `--schema` (CLI) or `schema` (Python). This name of this database (schema) must be the same as - the value of the `DATABRICKS_DATABASE` environment variable and is required for compatiblity. The default is `default` if not otherwise specified. -- `DATABRICKS_VOLUME` - The name of the volume inside of the schema (database), represented by `--volume` (CLI) or `volume` (Python). +- `DATABRICKS_SCHEMA` - The name of the schema (formerly known as a database) inside of the catalog for the target volume, represented by `--schema` (CLI) or `schema` (Python). The default is `default` if not otherwise specified. + + If the target volume and table are in the same schema (formerly known as a database), then `DATABRICKS_SCHEMA` and `DATABRICKS_DATABASE` will have the same values. + +- `DATABRICKS_VOLUME` - The name of the volume inside of the schema (formerly known as a database), represented by `--volume` (CLI) or `volume` (Python). - `DATABRICKS_VOLUME_PATH` - Optionally, a specific path inside of the volume that you want to start accessing from, starting from the volume's root, represented by `--volume-path` (CLI) or `volume_path` (Python). The default is to start accessing from the volume's root if not otherwise specified. 
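
The environment variables above map directly onto Unity Catalog objects. As a quick sanity check before running the connector, the following Databricks SQL sketch lists the objects those variables refer to; the object names used here (`main`, `default`, `my_volume`) are hypothetical placeholders, not values from this patch:

```sql
-- Substitute your own DATABRICKS_CATALOG, DATABRICKS_DATABASE /
-- DATABRICKS_SCHEMA, and DATABRICKS_VOLUME values for the placeholder names.
SHOW SCHEMAS IN main;          -- should list the target schema(s)
SHOW TABLES  IN main.default;  -- should list the target table
SHOW VOLUMES IN main.default;  -- should list the target volume
```

If any of these statements comes back empty, create the missing object (or correct the environment variable) before running the connector.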
diff --git a/snippets/general-shared-text/databricks-delta-table-platform.mdx b/snippets/general-shared-text/databricks-delta-table-platform.mdx index 3a99fa09..4805157a 100644 --- a/snippets/general-shared-text/databricks-delta-table-platform.mdx +++ b/snippets/general-shared-text/databricks-delta-table-platform.mdx @@ -6,8 +6,14 @@ Fill in the following fields: - **Token** (_required_ for PAT authentication): For Databricks personal access token (PAT) authentication, the target Databricks user's PAT value. - **UUID** and **OAuth Secret** (_required_ for OAuth authentication): For Databricks OAuth machine-to-machine (M2M) authentication, the Databricks managed service principal's **UUID** (or **Client ID** or **Application ID**) and OAuth **Secret** (client secret) values. - **Catalog** (_required_): The name of the catalog in Unity Catalog for the target volume and table in the Databricks workspace. -- **Database**: The name of the database in Unity Catalog for the target volume and table. The default is `default` if not otherwise specified. +- **Database**: The name of the schema (formerly known as a database) in Unity Catalog for the target table. The default is `default` if not otherwise specified. + + If the target table and volume are in the same schema (formerly known as a database), then **Database** and **Schema** will have the same names. + - **Table Name** (_required_): The name of the target table in Unity Catalog. -- **Schema**: The name of the schema in Unity Catalog for the target volume and table. The default is `default` if not otherwise specified. +- **Schema**: The name of the schema (formerly known as a database) in Unity Catalog for the target volume. The default is `default` if not otherwise specified. + + If the target volume and table are in the same schema (formerly known as a database), then **Schema** and **Database** will have the same names. + - **Volume** (_required_): The name of the target volume in Unity Catalog. 
- **Volume Path**: Any target folder path inside of the volume to use instead of the volume's root. If not otherwise specified, processing occurs at the volume's root. diff --git a/snippets/general-shared-text/databricks-delta-table.mdx b/snippets/general-shared-text/databricks-delta-table.mdx index 1063dc94..2a166f84 100644 --- a/snippets/general-shared-text/databricks-delta-table.mdx +++ b/snippets/general-shared-text/databricks-delta-table.mdx @@ -9,10 +9,35 @@ - A SQL warehouse for [AWS](https://docs.databricks.com/compute/sql-warehouse/create.html), [Azure](https://learn.microsoft.com/azure/databricks/compute/sql-warehouse/create), or [GCP](https://docs.gcp.databricks.com/compute/sql-warehouse/create.html). + + The following video shows how to create a SQL warehouse if you do not already have one available, get its **Server Hostname** and **HTTP Path** values, and set permissions for someone other than the warehouse's owner to use it: + + + - An all-purpose cluster for [AWS](https://docs.databricks.com/compute/use-compute.html), [Azure](https://learn.microsoft.com/azure/databricks/compute/use-compute), or [GCP](https://docs.gcp.databricks.com/compute/use-compute.html). + The following video shows how to create an all-purpose cluster if you do not already have one available, get its **Server Hostname** and **HTTP Path** values, and set permissions for someone other than the cluster's owner to use it: + + + - The SQL warehouse's or cluster's **Server Hostname** and **HTTP Path** values for [AWS](https://docs.databricks.com/integrations/compute-details.html), [Azure](https://learn.microsoft.com/azure/databricks/integrations/compute-details), or [GCP](https://docs.gcp.databricks.com/integrations/compute-details.html). @@ -25,7 +50,7 @@ for [AWS](https://docs.databricks.com/catalogs/create-catalog.html), [Azure](https://learn.microsoft.com/azure/databricks/catalogs/create-catalog), or [GCP](https://docs.gcp.databricks.com/catalogs/create-catalog.html). 
- - A schema + - A schema (formerly known as a database) for [AWS](https://docs.databricks.com/schemas/create-schema.html), [Azure](https://learn.microsoft.com/azure/databricks/schemas/create-schema), or [GCP](https://docs.gcp.databricks.com/schemas/create-schema.html) @@ -34,7 +59,19 @@ for [AWS](https://docs.databricks.com/tables/managed.html), [Azure](https://learn.microsoft.com/azure/databricks/tables/managed), or [GCP](https://docs.gcp.databricks.com/tables/managed.html) - within that schema. + within that schema (formerly known as a database). + + The following video shows how to create a catalog, schema (formerly known as a database), and a table in Unity Catalog if you do not already have them available, and set privileges for someone other than their owner to use them: + + This table must contain the following column names and their data types: @@ -86,22 +123,33 @@ ); ``` + + In Databricks, a table's _schema_ is different than a _schema_ (formerly known as a database) in a catalog-schema object relationship in Unity Catalog. + + - Within Unity Catalog, a volume for [AWS](https://docs.databricks.com/volumes/utility-commands.html), [Azure](https://learn.microsoft.com/azure/databricks/volumes/utility-commands), - or [GCP](https://docs.gcp.databricks.com/volumes/utility-commands.html) - within the same schema as the table. -- For Databricks personal access token authentication to the workspace, the - Databricks personal access token value for - [AWS](https://docs.databricks.com/dev-tools/auth/pat.html#databricks-personal-access-tokens-for-workspace-users), - [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/pat#azure-databricks-personal-access-tokens-for-workspace-users), or - [GCP](https://docs.gcp.databricks.com/dev-tools/auth/pat.html#databricks-personal-access-tokens-for-workspace-users). 
- This token must be for the workspace user who - has the appropriate access permissions to the catalog, schema, table, volume, and cluster or SQL warehouse, + or [GCP](https://docs.gcp.databricks.com/volumes/utility-commands.html). The volume can be in the same + schema (formerly known as a database) as the table, or the volume and table can be in separate schemas. In either case, both of these + schemas must share the same parent catalog. + + The following video shows how to create a catalog, schema (formerly known as a database), and a volume in Unity Catalog if you do not already have them available, and set privileges for someone other than their owner to use them: + + + - For Databricks managed service principal authentication (using Databricks OAuth M2M) to the workspace: - A Databricks managed service principal. - This service principal must have the appropriate access permissions to the catalog, schema, table, volume, and cluster or SQL warehouse. + This service principal must have the appropriate access permissions to the catalog, schema (formerly known as a database), table, volume, and cluster or SQL warehouse. - The service principal's **UUID** (or **Client ID** or **Application ID**) value. - The OAuth **Secret** value for the service principal. @@ -110,11 +158,11 @@ [GCP](https://docs.gcp.databricks.com/dev-tools/auth/oauth-m2m.html). - For Azure Databricks, this connector only supports Databricks managed service principals. + For Azure Databricks, this connector only supports Databricks managed service principals for authentication. Microsoft Entra ID managed service principals are not supported. 
- The following video shows how to create a Databricks managed service principal: + The following video shows how to create a Databricks managed service principal if you do not already have one available: +- For Databricks personal access token authentication to the workspace, the + Databricks personal access token value for + [AWS](https://docs.databricks.com/dev-tools/auth/pat.html#databricks-personal-access-tokens-for-workspace-users), + [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/pat#azure-databricks-personal-access-tokens-for-workspace-users), or + [GCP](https://docs.gcp.databricks.com/dev-tools/auth/pat.html#databricks-personal-access-tokens-for-workspace-users). + This token must be for the workspace user who + has the appropriate access permissions to the catalog, schema (formerly known as a database), table, volume, and cluster or SQL warehouse. + + The following video shows how to create a Databricks personal access token if you do not already have one available: + + + - The Databricks workspace user or Databricks managed service principal must have the following _minimum_ set of permissions and privileges to write to an existing volume or table in Unity Catalog: @@ -140,7 +208,7 @@ - To access a Unity Catalog volume, the following privileges: - `USE CATALOG` on the volume's parent catalog in Unity Catalog. - - `USE SCHEMA` on the volume's parent schema in Unity Catalog. + - `USE SCHEMA` on the volume's parent schema (formerly known as a database) in Unity Catalog. - `READ VOLUME` and `WRITE VOLUME` on the volume. Learn how to check and set Unity Catalog privileges for @@ -148,22 +216,10 @@ [Azure](https://learn.microsoft.com/azure/databricks/data-governance/unity-catalog/manage-privileges/#grant), or [GCP](https://docs.gcp.databricks.com/data-governance/unity-catalog/manage-privileges/index.html#show-grant-and-revoke-privileges). 
- The following videos shows how to grant a Databricks managed service principal privileges to a Unity Catalog volume: - - - To access a Unity Catalog table, the following privileges: - `USE CATALOG` on the table's parent catalog in Unity Catalog. - - `USE SCHEMA` on the tables's parent schema in Unity Catalog. + - `USE SCHEMA` on the table's parent schema (formerly known as a database) in Unity Catalog. - `MODIFY` and `SELECT` on the table. Learn how to check and set Unity Catalog privileges for diff --git a/snippets/general-shared-text/databricks-volumes.mdx b/snippets/general-shared-text/databricks-volumes.mdx index 4c3c6f5a..8ccabdde 100644 --- a/snippets/general-shared-text/databricks-volumes.mdx +++ b/snippets/general-shared-text/databricks-volumes.mdx @@ -1,19 +1,10 @@ - - -The preceding video shows how to use Databricks personal access tokens (PATs), which are supported only for [Unstructured Ingest](/ingestion/overview). - -To learn how to use Databricks managed service principals, which are supported by both the [Unstructured Platform](/platform/overview) and Unstructured Ingest, -see the additional video later on this page. - -- The Databricks workspace URL. Get the workspace URL for +- A Databricks account on [AWS](https://docs.databricks.com/getting-started/free-trial.html), + [Azure](https://learn.microsoft.com/azure/databricks/getting-started/), or + [GCP](https://docs.gcp.databricks.com/getting-started/index.html). +- A workspace within the Databricks account for [AWS](https://docs.databricks.com/admin/workspace/index.html), + [Azure](https://learn.microsoft.com/azure/databricks/admin/workspace/), or + [GCP](https://docs.gcp.databricks.com/admin/workspace/index.html). +- The workspace's URL. 
Get the workspace URL for [AWS](https://docs.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids), [Azure](https://learn.microsoft.com/azure/databricks/workspace/workspace-details#workspace-instance-names-urls-and-ids), or [GCP](https://docs.gcp.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids). @@ -29,6 +20,13 @@ see the additional video later on this page. [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/), or [GCP](https://docs.gcp.databricks.com/dev-tools/auth/index.html). + For the [Unstructured Platform](/platform/overview), only Databricks OAuth machine-to-machine (M2M) authentication is supported for + [AWS](https://docs.databricks.com/dev-tools/auth/oauth-m2m.html), + [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/oauth-m2m), and + [GCP](https://docs.gcp.databricks.com/dev-tools/auth/oauth-m2m.html). + You will need the **Client ID** (or **UUID** or **Application ID**) and OAuth **Secret** (client secret) values for the corresponding service principal. + Note that for Azure, only Databricks managed service principals are supported. Microsoft Entra ID managed service principals are not supported. + The following video shows how to create a Databricks managed service principal: - - For the [Unstructured Platform](/platform/overview), only Databricks OAuth machine-to-machine (M2M) authentication is supported for AWS, Azure, and GCP. - You will need the the **Client ID** (or **UUID** or **Application** ID) and OAuth **Secret** (client secret) values for the corresponding service principal. - Note that for Azure, only Databricks managed service principals are supported. Microsoft Entra ID managed service principals are not supported. 
For [Unstructured Ingest](/ingestion/overview), the following Databricks authentication types are supported: - - For Databricks personal access token authentication (AWS, Azure, and GCP): The personal access token's value. - - For username and password (basic) authentication (AWS only): The user's name and password values. - - For OAuth machine-to-machine (M2M) authentication (AWS, Azure, and GCP): The client ID and OAuth secret values for the corresponding service principal. - - For OAuth user-to-machine (U2M) authentication (AWS, Azure, and GCP): No additional values. - - For Azure managed identities (MSI) authentication (Azure only): The client ID value for the corresponding managed identity. - - For Microsoft Entra ID service principal authentication (Azure only): The tenant ID, client ID, and client secret values for the corresponding service principal. - - For Azure CLI authentication (Azure only): No additional values. - - For Microsoft Entra ID user authentication (Azure only): The Entra ID token for the corresponding Entra ID user. - - For Google Cloud Platform credentials authentication (GCP only): The local path to the corresponding Google Cloud service account's credentials file. - - For Google Cloud Platform ID authentication (GCP only): The Google Cloud service account's email address. + - For Databricks personal access token authentication for + [AWS](https://docs.databricks.com/dev-tools/auth/pat.html), + [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/pat), or + [GCP](https://docs.gcp.databricks.com/dev-tools/auth/pat.html): The personal access token's value. + + The following video shows how to create a Databricks personal access token: + + + + - For username and password (basic) authentication ([AWS](https://docs.databricks.com/archive/dev-tools/basic.html) only): The user's name and password values. 
+ - For OAuth machine-to-machine (M2M) authentication ([AWS](https://docs.databricks.com/dev-tools/auth/oauth-m2m.html), + [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/oauth-m2m), and + [GCP](https://docs.gcp.databricks.com/dev-tools/auth/oauth-m2m.html)): The client ID and OAuth secret values for the corresponding service principal. + - For OAuth user-to-machine (U2M) authentication ([AWS](https://docs.databricks.com/dev-tools/auth/oauth-u2m.html), + [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/oauth-u2m), and + [GCP](https://docs.gcp.databricks.com/dev-tools/auth/oauth-u2m.html)): No additional values. + - For Azure managed identities (formerly known as Managed Service Identities (MSI)) authentication ([Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/azure-mi) only): The client ID value for the corresponding managed identity. + - For Microsoft Entra ID service principal authentication ([Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/azure-sp) only): The tenant ID, client ID, and client secret values for the corresponding service principal. + - For Azure CLI authentication ([Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/azure-cli) only): No additional values. + - For Microsoft Entra ID user authentication ([Azure](https://learn.microsoft.com/azure/databricks/dev-tools/user-aad-token) only): The Entra ID token for the corresponding Entra ID user. + - For Google Cloud Platform credentials authentication ([GCP](https://docs.gcp.databricks.com/dev-tools/auth/gcp-creds.html) only): The local path to the corresponding Google Cloud service account's credentials file. + - For Google Cloud Platform ID authentication ([GCP](https://docs.gcp.databricks.com/dev-tools/auth/gcp-id.html) only): The Google Cloud service account's email address. 
- The name of the parent catalog in Unity Catalog for [AWS](https://docs.databricks.com/catalogs/create-catalog.html), [Azure](https://learn.microsoft.com/azure/databricks/catalogs/create-catalog), or [GCP](https://docs.gcp.databricks.com/catalogs/create-catalog.html) for the volume. -- The name of the parent schema in Unity Catalog for +- The name of the parent schema (formerly known as a database) in Unity Catalog for [AWS](https://docs.databricks.com/schemas/create-schema.html), [Azure](https://learn.microsoft.com/azure/databricks/schemas/create-schema), or [GCP](https://docs.gcp.databricks.com/schemas/create-schema.html) for the volume. @@ -73,22 +87,22 @@ see the additional video later on this page. existing volume in Unity Catalog: - `USE CATALOG` on the volume's parent catalog in Unity Catalog. - - `USE SCHEMA` on the volume's parent schema in Unity Catalog. + - `USE SCHEMA` on the volume's parent schema (formerly known as a database) in Unity Catalog. - `READ VOLUME` and `WRITE VOLUME` on the volume. - Learn how to check and set Unity Catalog privileges for - [AWS](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/index.html#show-grant-and-revoke-privileges), - [Azure](https://learn.microsoft.com/azure/databricks/data-governance/unity-catalog/manage-privileges/#grant), or - [GCP](https://docs.gcp.databricks.com/data-governance/unity-catalog/manage-privileges/index.html#show-grant-and-revoke-privileges). + The following video shows how to create and set privileges for a catalog, schema (formerly known as a database), and volume in Unity Catalog. 
- The following videos shows how to grant a Databricks managed service principal privileges to a Unity Catalog volume: - \ No newline at end of file + > + + Learn more about how to check and set Unity Catalog privileges for + [AWS](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/index.html#show-grant-and-revoke-privileges), + [Azure](https://learn.microsoft.com/azure/databricks/data-governance/unity-catalog/manage-privileges/#grant), or + [GCP](https://docs.gcp.databricks.com/data-governance/unity-catalog/manage-privileges/index.html#show-grant-and-revoke-privileges).
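
The minimum volume privileges listed in the patch above (`USE CATALOG`, `USE SCHEMA`, `READ VOLUME`, and `WRITE VOLUME`) can also be granted in a Databricks SQL editor. A minimal sketch, assuming a hypothetical catalog `main`, schema `default`, volume `my_volume`, and principal `someone@example.com`:

```sql
-- Grant the minimum privileges the connector needs to write to a volume.
-- All object and principal names here are hypothetical placeholders.
GRANT USE CATALOG  ON CATALOG main                   TO `someone@example.com`;
GRANT USE SCHEMA   ON SCHEMA  main.default           TO `someone@example.com`;
GRANT READ VOLUME  ON VOLUME  main.default.my_volume TO `someone@example.com`;
GRANT WRITE VOLUME ON VOLUME  main.default.my_volume TO `someone@example.com`;
```

For a Databricks managed service principal, substitute the service principal's **UUID** (or **Client ID** or **Application ID**) for the email address as the grantee.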