Merged
@@ -16,10 +16,10 @@ curl --request 'POST' --location \
 "client_secret": "<client-secret>",
 "volume": "<volume>",
 "catalog": "<catalog>",
-"volume_path": "<volume-path>",
+"volume_path": "<volume_path>",
 "schema": "<schema>",
 "database": "<database>",
-"table_name": "<table-name>"
+"table_name": "<table_name>"
 }
 }'
 ```
@@ -23,10 +23,10 @@ with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as clien
 client_secret="<client-secret>",
 volume="<volume>",
 catalog="<catalog>",
-volume_path="<volume-path>",
+volume_path="<volume_path>",
 schema="<schema>",
 database="<database>",
-table_name="<table-name>"
+table_name="<table_name>"
 )
 )
 )
@@ -13,7 +13,7 @@ curl --request 'POST' --location \
 "catalog": "<catalog>",
 "schema": "<schema>",
 "volume": "<volume>",
-"volume_path": "<volume-path>",
+"volume_path": "<volume_path>",

 # For Databricks OAuth machine-to-machine (M2M) authentication:
 "client_secret": "<client-secret>",
2 changes: 1 addition & 1 deletion snippets/destination_connectors/databricks_volumes_sdk.mdx
@@ -20,7 +20,7 @@ with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as clien
 catalog="<catalog>",
 schema="<schema>",
 volume="<volume>",
-volume_path="<volume-path>",
+volume_path="<volume_path>",

 # For Databricks OAuth machine-to-machine (M2M) authentication:
 client_secret="<client-secret>",
2 changes: 1 addition & 1 deletion snippets/destination_connectors/postgresql_rest_create.mdx
@@ -14,7 +14,7 @@ curl --request 'POST' --location \
 "port": "<port>",
 "username": "<username>",
 "password": "<password>",
-"table_name": "<table-name>",
+"table_name": "<table_name>",
 "batch_size": <batch-size>
 }
 }'
2 changes: 1 addition & 1 deletion snippets/destination_connectors/postgresql_sdk.mdx
@@ -21,7 +21,7 @@ with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as clien
 port="<port>",
 username="<username>",
 password="<password>",
-table_name="<table-name>",
+table_name="<table_name>",
 batch_size=<batch-size>
 )
 )
2 changes: 1 addition & 1 deletion snippets/destination_connectors/snowflake_rest_create.mdx
@@ -18,7 +18,7 @@ curl --request 'POST' --location \
 "role": "<role>",
 "password": "<password>",
 "record_id_key": "<record-id-key>",
-"table_name": "<table-name>",
+"table_name": "<table_name>",
 "batch_size": <batch-size>
 }
 }'
2 changes: 1 addition & 1 deletion snippets/destination_connectors/snowflake_sdk.mdx
@@ -25,7 +25,7 @@ with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as clien
 role="<role>",
 password="<password>",
 record_id_key="<record-id-key>",
-table_name="<table-name>",
+table_name="<table_name>",
 batch_size=<batch-size>
 )
 )
@@ -8,13 +8,16 @@

 If the target table and volume are in the same schema (formerly known as a database), then `<database>` and `<schema>` will have the same values.

-- `<table-name>` (_required_): The name of the target table in Unity Catalog.
+- `<table_name>` (_required_): The name of the target table in Unity Catalog.
 - `<schema>`: The name of the schema (formerly known as a database) in Unity Catalog for the target volume. The default is `default` if not otherwise specified.

 If the target volume and table are in the same schema (formerly known as a database), then `<schema>` and `<database>` will have the same values.

 - `<volume>` (_required_): The name of the target volume in Unity Catalog.
-- `<volume-path>`: Any target folder path inside of the volume to use instead of the volume's root. If not otherwise specified, processing occurs at the volume's root.
+
+- `<volume_path>`: Any target folder path inside of the volume to use instead of the volume's root. If not otherwise specified, processing occurs at the volume's root.
+
+<Note>
+Using dashes (`-`) in the names of catalogs, schemas (formerly known as databases), tables, and volumes might cause isolated issues with the connector. It is
+recommended to use underscores (`_`) instead of dashes in the names of catalogs, schemas, tables, and volumes.
+</Note>

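The dashes-versus-underscores Note recurs throughout this PR. As an illustration only — assuming Python, with a helper name `check_identifier` of our own invention, not part of the Unstructured SDK or Databricks APIs — a client-side guard along these lines could catch bad names before a pipeline run:

```python
import re

def check_identifier(name: str) -> str:
    """Reject dashes in Unity Catalog object names (catalogs, schemas,
    tables, volumes), per the Note above. Hypothetical helper."""
    if "-" in name:
        raise ValueError(
            f"{name!r} contains a dash; use underscores instead "
            "(e.g. 'my_table' rather than 'my-table')"
        )
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"{name!r} is not a plain identifier")
    return name

print(check_identifier("raw_elements"))  # raw_elements
```

Running the same check on `my-table` raises a `ValueError`, which fails fast instead of surfacing as an intermittent connector issue later.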
@@ -22,6 +22,11 @@ The following environment variables:

 - `DATABRICKS_TABLE` - The name of the table inside of the schema (formerly known as a database), represented by `--table-name` (CLI) or `table_name` (Python). The default is `elements` if not otherwise specified.

+<Note>
+Using dashes (`-`) in the names of catalogs, schemas (formerly known as databases), tables, and volumes might cause isolated issues with the connector. It is
+recommended to use underscores (`_`) instead of dashes in the names of catalogs, schemas, tables, and volumes.
+</Note>
+
 For the SQL-based implementation, add these environment variables:

 - `DATABRICKS_RECORD_ID_KEY` - The name of the column that uniquely identifies each record in the table, represented by `--record-id-key` (CLI) or `record_id_key` (Python).
@@ -17,3 +17,8 @@ Fill in the following fields:

 - **Volume** (_required_): The name of the target volume in Unity Catalog.
 - **Volume Path**: Any target folder path inside of the volume to use instead of the volume's root. If not otherwise specified, processing occurs at the volume's root.
+
+<Note>
+Using dashes (`-`) in the names of catalogs, schemas (formerly known as databases), tables, and volumes might cause isolated issues with the connector. It is
+recommended to use underscores (`_`) instead of dashes in the names of catalogs, schemas, tables, and volumes.
+</Note>
12 changes: 11 additions & 1 deletion snippets/general-shared-text/databricks-delta-table.mdx
@@ -61,6 +61,11 @@
 [GCP](https://docs.gcp.databricks.com/tables/managed.html)
 within that schema (formerly known as a database).

+<Note>
+Using dashes (`-`) in the names of catalogs, schemas (formerly known as databases), and tables might cause isolated issues with the connector. It is
+recommended to use underscores (`_`) instead of dashes in the names of catalogs, schemas, and tables.
+</Note>
+
 The following video shows how to create a catalog, schema (formerly known as a database), and a table in Unity Catalog if you do not already have them available, and set privileges for someone other than their owner to use them:

 <iframe
@@ -76,7 +81,7 @@
 This table must contain the following column names and their data types:

 ```text
-CREATE TABLE IF NOT EXISTS `<catalog-name>`.`<schema-name>`.elements (
+CREATE TABLE IF NOT EXISTS `<catalog_name>`.`<schema_name>`.elements (
 id STRING NOT NULL PRIMARY KEY,
 record_id STRING,
 element_id STRING,
@@ -135,6 +140,11 @@
 schema (formerly known as a database) as the table, or the volume and table can be in separate schemas. In either case, both of these
 schemas must share the same parent catalog.

+<Note>
+Using dashes (`-`) in the names of volumes might cause isolated issues with the connector. It is
+recommended to use underscores (`_`) instead of dashes in the names of volumes.
+</Note>
+
 The following video shows how to create a catalog, schema (formerly known as a database), and a volume in Unity Catalog if you do not already have them available, and set privileges for someone other than their owner to use them:

 <iframe
@@ -1,5 +1,10 @@
 - `<name>` (_required_) - A unique name for this connector.
 - `<host>` (_required_) - The Databricks workspace host URL.
+
+<Note>
+Do not add a trailing slash (`/`) to the workspace host URL.
+</Note>
+
 - `<client-id>` (_required_) - For Databricks OAuth machine-to-machine (M2M) authentication,
 the **Client ID** (or **UUID** or **Application ID**) value for the Databricks managed service principal that has the appropriate privileges to the volume.
 - `<client-secret>` (_required_) - For Databricks OAuth M2M authentication,
@@ -8,4 +13,4 @@
 - `<catalog>` (_required_) - The name of the catalog to use.
 - `<schema>` - The name of the associated schema. If not specified, `default` is used.
 - `<volume>` (_required_) - The name of the associated volume.
-- `<volume-path>` - Any optional path to access within the volume.
+- `<volume_path>` - Any optional path to access within the volume.
5 changes: 5 additions & 0 deletions snippets/general-shared-text/databricks-volumes-cli-api.mdx
@@ -11,6 +11,11 @@ import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-d
 The following environment variables:

 - `DATABRICKS_HOST` - The Databricks host URL, represented by `--host` (CLI) or `host` (Python).
+
+<Note>
+Do not add a trailing slash (`/`) to the host URL.
+</Note>
+
 - `DATABRICKS_CATALOG` - The Databricks catalog name for the Volume, represented by `--catalog` (CLI) or `catalog` (Python).
 - `DATABRICKS_SCHEMA` - The Databricks schema name for the Volume, represented by `--schema` (CLI) or `schema` (Python). If not specified, `default` is used.
 - `DATABRICKS_VOLUME` - The Databricks Volume name, represented by `--volume` (CLI) or `volume` (Python).
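The trailing-slash Notes this PR adds suggest normalizing the host value before use. A minimal sketch — assuming Python; `normalize_host` is a hypothetical helper, not an Unstructured or Databricks SDK function:

```python
def normalize_host(url: str) -> str:
    # Strip any trailing slash(es) from the Databricks host URL,
    # per the Note above. Illustrative helper only.
    return url.rstrip("/")

print(normalize_host("https://example.cloud.databricks.com/"))
# https://example.cloud.databricks.com
```

The call is idempotent, so it is safe to apply whether or not the configured value already omits the slash.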
5 changes: 5 additions & 0 deletions snippets/general-shared-text/databricks-volumes-platform.mdx
@@ -2,6 +2,11 @@ Fill in the following fields:

 - **Name** (_required_): A unique name for this connector.
 - **Host** (_required_): The Databricks workspace host URL.
+
+<Note>
+Do not add a trailing slash (`/`) to the host URL.
+</Note>
+
 - **Catalog** (_required_): The name of the catalog to use.
 - **Schema**: The name of the associated schema. If not specified, **default** is used.
 - **Volume** (_required_): The name of the associated volume.
4 changes: 4 additions & 0 deletions snippets/general-shared-text/databricks-volumes.mdx
@@ -15,6 +15,10 @@
 - Azure: `https://adb-<workspace-id>.<random-number>.azuredatabricks.net`
 - GCP: `https://<workspace-id>.<random-number>.gcp.databricks.com`

+<Note>
+Do not add a trailing slash (`/`) to the workspace URL.
+</Note>
+
 - The Databricks authentication details. For more information, see the documentation for
 [AWS](https://docs.databricks.com/dev-tools/auth/index.html),
 [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/),
6 changes: 3 additions & 3 deletions snippets/general-shared-text/duckdb.mdx
@@ -20,7 +20,7 @@
 - You can list available tables in a schema by running the following DuckDB CLI commands, replacing the target catalog and schema names:

 ```sql
-USE <catalog-name>.<schema-name>;
+USE <catalog_name>.<schema_name>;
 SHOW TABLES;
 ```

@@ -76,6 +76,6 @@
 You can list the schema of a table by running the following DuckDB CLI commands, replacing the target catalog, schema, and table names:

 ```sql
-USE <catalog-name>.<schema-name>;
-DESCRIBE TABLE <table-name>;
+USE <catalog_name>.<schema_name>;
+DESCRIBE TABLE <table_name>;
 ```
6 changes: 3 additions & 3 deletions snippets/general-shared-text/motherduck.mdx
@@ -36,7 +36,7 @@ allowfullscreen
 - You can list available tables in a schema by running the following commands in the MotherDuck UI or the DuckDB CLI, replacing the target catalog and schema names:

 ```sql
-USE <catalog-name>.<schema-name>;
+USE <catalog_name>.<schema_name>;
 SHOW TABLES;
 ```

@@ -92,6 +92,6 @@ allowfullscreen
 You can list the schema of a table by running the following commands in the MotherDuck UI or the DuckDB CLI, replacing the target catalog, schema, and table names:

 ```sql
-USE <catalog-name>.<schema-name>;
-DESCRIBE TABLE <table-name>;
+USE <catalog_name>.<schema_name>;
+DESCRIBE TABLE <table_name>;
 ```
@@ -4,7 +4,7 @@
 - `<port>` (required) - The port number.
 - `<username>` (required) - The username.
 - `<password>` (required) - The user's password.
-- `<table-name>` (required) - The name of the table in the database.
+- `<table_name>` (required) - The name of the table in the database.
 - `<batch-size>` - The maximum number of rows to transmit at a time. The default is `100` if not otherwise specified.
 - `<id-column>` (required, source connector only) - The name of the ID column in the table.
 - For `fields` (source connector only), set one or more `<field>` values, with each value representing the name of a column to process (including the specified `<id-column>` column). The default is all columns if not otherwise specified.
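The PostgreSQL parameters above map one-to-one onto the request body shown in the REST snippets elsewhere in this PR. As a hedged sketch — Python, with `build_postgres_config` being our own illustrative helper and the `100` default for `batch_size` taken from the list above:

```python
def build_postgres_config(
    database: str,
    host: str,
    port: int,
    username: str,
    password: str,
    table_name: str,
    batch_size: int = 100,  # documented default
) -> dict:
    """Assemble the connector settings described above into a dict.
    Field names follow the documented keys; the helper is hypothetical."""
    return {
        "database": database,
        "host": host,
        "port": str(port),
        "username": username,
        "password": password,
        "table_name": table_name,
        "batch_size": batch_size,
    }

cfg = build_postgres_config("mydb", "db.example.com", 5432, "user", "secret", "my_table")
print(cfg["batch_size"])  # 100
```

Note that `table_name` here uses an underscore, matching the renamed placeholder this PR introduces.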
@@ -7,7 +7,7 @@
 - `<port>` (_required_): The warehouse's port number. The default is `443` if not otherwise specified.
 - `<database>` (_required_): The name of the target Snowflake database.
 - `<schema>` (_required_): The name of the target Snowflake schema within the database.
-- `<table-name>`: The name of the target Snowflake table within the database's schema. For the destination connector, the default is `elements` if not otherwise specified.
+- `<table_name>`: The name of the target Snowflake table within the database's schema. For the destination connector, the default is `elements` if not otherwise specified.
 - `<columns>` (source connector only): A comma-separated list of columns to fetch from the table. By default, all columns are fetched unless otherwise specified.
 - `<id-column>` (_required_, source connector only): The name of the column that uniquely identifies each record in the table.
 - `<record-id-key>` (destination connector only): The name of the column that uniquely identifies each record in the table. The default is `record_id` if not otherwise specified.
4 changes: 2 additions & 2 deletions snippets/general-shared-text/snowflake.mdx
@@ -158,11 +158,11 @@
 4. Expand the name of the target schema.
 5. Expand **Tables**.

-Alternatively, the following Snowflake query returns a list of available tables for the schema named `<schema-name>` in the database named
+Alternatively, the following Snowflake query returns a list of available tables for the schema named `<schema_name>` in the database named
 `<database-name>` in the current account:

 ```text
-SHOW TABLES IN SCHEMA <database-name>.<schema-name>;
+SHOW TABLES IN SCHEMA <database-name>.<schema_name>;
 ```

 Snowflake requires the target table to have a defined schema before Unstructured can write to the table. The recommended table
@@ -13,7 +13,7 @@ curl --request 'POST' --location \
 "catalog": "<catalog>",
 "schema": "<schema>",
 "volume": "<volume>",
-"volume_path": "<volume-path>",
+"volume_path": "<volume_path>",

 # For Databricks OAuth machine-to-machine (M2M) authentication:
 "client_id": "<client-id>"
2 changes: 1 addition & 1 deletion snippets/source_connectors/postgresql_rest_create.mdx
@@ -14,7 +14,7 @@ curl --request 'POST' --location \
 "port": "<port>",
 "username": "<username>",
 "password": "<password>",
-"table_name": "<table-name>",
+"table_name": "<table_name>",
 "batch_size": <batch-size>,
 "id_column": "<id-column>",
 "fields": [
2 changes: 1 addition & 1 deletion snippets/source_connectors/postgresql_sdk.mdx
@@ -21,7 +21,7 @@ with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as clien
 port="<port>",
 username="<username>",
 password="<password>",
-table_name="<table-name>",
+table_name="<table_name>",
 batch_size=<batch-size>,
 id_column="<id-column>",
 fields=[
2 changes: 1 addition & 1 deletion snippets/source_connectors/snowflake_rest_create.mdx
@@ -18,7 +18,7 @@ curl --request 'POST' --location \
 "role": "<role>",
 "password": "<password>",
 "id_column": "<id-column>",
-"table_name": "<table-name>",
+"table_name": "<table_name>",
 "batch_size": <batch-size>,
 "fields": [
 "<field>",
2 changes: 1 addition & 1 deletion snippets/source_connectors/snowflake_sdk.mdx
@@ -24,7 +24,7 @@ with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as clien
 port=<port>,
 database="<database>",
 schema_="<schema>",
-table_name="<table-name>",
+table_name="<table_name>",
 id_column="<id-column>",
 fields=[
 "<field>",