From 4de79ba11fcbdb9f27dd7f63c690dc4f3f745dde Mon Sep 17 00:00:00 2001
From: Paul Cornell
Date: Thu, 6 Feb 2025 16:20:19 -0800
Subject: [PATCH 1/5] Weaviate destination connector: collection management behavior updates

---
 .../weaviate-api-placeholders.mdx | 3 ++-
 .../general-shared-text/weaviate-cli-api.mdx | 6 +++--
 .../general-shared-text/weaviate-platform.mdx | 3 ++-
 snippets/general-shared-text/weaviate.mdx | 22 +++++++++++++++++--
 4 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/snippets/general-shared-text/weaviate-api-placeholders.mdx b/snippets/general-shared-text/weaviate-api-placeholders.mdx
index 6646c5d6..63ecaefc 100644
--- a/snippets/general-shared-text/weaviate-api-placeholders.mdx
+++ b/snippets/general-shared-text/weaviate-api-placeholders.mdx
@@ -1,4 +1,5 @@
 - `` (_required_) - A unique name for this connector.
 - `` (_required_) - The URL of the Weaviate database cluster.
-- `` (_required_) - The name of the target collection within the cluster.
+- `` - The name of the target collection within the cluster. If no value is provided, see the beginning of this article
+ for the behavior at run time.
 - `` (_required_) - The API key provided by Weaviate to access the cluster.
\ No newline at end of file
diff --git a/snippets/general-shared-text/weaviate-cli-api.mdx b/snippets/general-shared-text/weaviate-cli-api.mdx
index 19121429..46e413e0 100644
--- a/snippets/general-shared-text/weaviate-cli-api.mdx
+++ b/snippets/general-shared-text/weaviate-cli-api.mdx
@@ -14,7 +14,8 @@ The following environment variables:
 - For Embedded Weaviate:
 - `WEAVIATE_HOST` - The connection URL to the instance, represented by `--hostname` (CLI) or `hostname` (Python).
- - `WEAVIATE_COLLECTION` - The name of the target collection in the instance, represented by `--collection` (CLI) or `collection` (Python).
+ - `WEAVIATE_COLLECTION` - The name of the target collection in the instance, represented by `--collection` (CLI) or `collection` (Python).
+ If no value is provided, see the beginning of this article for the behavior at run time.
 - For Weaviate Cloud:
@@ -23,4 +24,5 @@ The following environment variables:
 For the CLI, the `--api-key` option here is part of the `weaviate-cloud` command. For Python, the `api_key` parameter here is part of the `CloudWeaviateAccessConfig` object.
- - `WEAVIATE_COLLECTION` - The name of the target collection in the database, represented by `--collection` (CLI) or `collection` (Python).
\ No newline at end of file
+ - `WEAVIATE_COLLECTION` - The name of the target collection in the database, represented by `--collection` (CLI) or `collection` (Python).
+ If no value is provided, see the beginning of this article for the behavior at run time.
\ No newline at end of file
diff --git a/snippets/general-shared-text/weaviate-platform.mdx b/snippets/general-shared-text/weaviate-platform.mdx
index d0bf5fb2..4137365b 100644
--- a/snippets/general-shared-text/weaviate-platform.mdx
+++ b/snippets/general-shared-text/weaviate-platform.mdx
@@ -2,5 +2,6 @@ Fill in the following fields:
 - **Name** (_required_): A unique name for the connector.
 - **Cluster URL** (_required_): The URL of the Weaviate database cluster.
-- **Collection Name** (_required_): The name of the target collection within the cluster.
+- **Collection Name**: The name of the target collection within the cluster. If no value is provided, see the beginning of this article
+ for the behavior at run time.
 - **API Key** (_required_): The API key provided by Weaviate to access the cluster.
\ No newline at end of file
diff --git a/snippets/general-shared-text/weaviate.mdx b/snippets/general-shared-text/weaviate.mdx
index ab6e4e67..5bd22949 100644
--- a/snippets/general-shared-text/weaviate.mdx
+++ b/snippets/general-shared-text/weaviate.mdx
@@ -19,9 +19,27 @@
 - A Weaviate database instance. The following information assumes that you have a Weaviate Cloud (WCD) account with a Weaviate database cluster in that account.
 [Create a WCD account](https://weaviate.io/developers/wcs/quickstart#create-a-wcd-account). [Create a database cluster](https://weaviate.io/developers/wcs/quickstart#create-a-weaviate-cluster). For other database options, [learn more](https://weaviate.io/developers/weaviate/installation).
 - The URL and API key for the database cluster. [Get the URL and API key](https://weaviate.io/developers/wcs/quickstart#explore-the-details-panel).
- - The name of the target collection in the database. [Create a collection](https://weaviate.io/developers/wcs/tools/collections-tool).
+ - The name of the target collection in the database. [Create a collection](https://weaviate.io/developers/wcs/tools/collections-tool).
+
+ An existing collection is not required. At runtime, the collection behavior is as follows:
-Weaviate requires the collection to have a data schema before you add data. At minimum, this schema must contain the `record_id` property, as follows:
+ For the [Unstructured Platform](/platform/overview):
+
+ - If an existing collection name is specified, and Unstructured generates embeddings,
+ but the number of dimensions that are generated do not match the existing collection's embedding settings, the run will fail.
+ You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
+ - If a collection name is not specified, Unstructured creates a new collection in your Weaviate cluster. If Unstructured generates embeddings,
+ the new collection's name will be `Unstructured___`.
+ If Unstructured does not generate embeddings, the new collection's name will be `Unstructured_`.
+
+ For [Unstructured Ingest](/ingestion/overview):
+
+ - If an existing collection name is specified, and Unstructured generates embeddings,
+ but the number of dimensions that are generated do not match the existing collection's embedding settings, the run will fail.
+ You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
+ - If a collection name is not specified, Unstructured creates a new collection in your Weaviate cluster. The new collection's name will be `Elements`.
+
+Weaviate requires an existing collection to have a data schema before you add data. At minimum, this schema must contain the `record_id` property, as follows:
 ```json
 {

From 2a691caa212b876105af5e45c13d770b7281b502 Mon Sep 17 00:00:00 2001
From: Paul Cornell
Date: Tue, 11 Feb 2025 16:31:36 -0800
Subject: [PATCH 2/5] Weaviate auto-collections: how to view embeddings/vectors

---
 snippets/general-shared-text/weaviate.mdx | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/snippets/general-shared-text/weaviate.mdx b/snippets/general-shared-text/weaviate.mdx
index 5bd22949..c079932c 100644
--- a/snippets/general-shared-text/weaviate.mdx
+++ b/snippets/general-shared-text/weaviate.mdx
@@ -39,6 +39,26 @@
 You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
 - If a collection name is not specified, Unstructured creates a new collection in your Weaviate cluster. The new collection's name will be `Elements`.
+ If Unstructured creates a new collection and generates embeddings, you will not see an embeddings property in tools such as the Weaviate Cloud
+ **Collections** user interface. To view the generated embeddings, you can run a Weaviate GraphQL query such as the following. In this query, replace `` with
+ the name of the new collection, and replace `` with the name of each additional available property that
+ you want to return results for, such as `text`, `type`, `element_id`, `record_id`, and so on. The embeddings will be
+ returned in the `vector` property.
+
+ ```text
+ {
+ Get {
+ {
+ _additional {
+ vector
+ }
+
+
+ }
+ }
+ }
+ ```
+
 Weaviate requires an existing collection to have a data schema before you add data. At minimum, this schema must contain the `record_id` property, as follows:
 ```json

From d7fc05e15ce6c9b240ff55b13237ca65feadcc40 Mon Sep 17 00:00:00 2001
From: Paul-Cornell
Date: Wed, 12 Feb 2025 08:53:46 -0800
Subject: [PATCH 3/5] Apply suggestions from code review

Co-authored-by: Maria Khalusova
---
 snippets/general-shared-text/weaviate.mdx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/snippets/general-shared-text/weaviate.mdx b/snippets/general-shared-text/weaviate.mdx
index c079932c..32a51547 100644
--- a/snippets/general-shared-text/weaviate.mdx
+++ b/snippets/general-shared-text/weaviate.mdx
@@ -26,7 +26,7 @@
 For the [Unstructured Platform](/platform/overview):
 - If an existing collection name is specified, and Unstructured generates embeddings,
- but the number of dimensions that are generated do not match the existing collection's embedding settings, the run will fail.
+ but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
 You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
 - If a collection name is not specified, Unstructured creates a new collection in your Weaviate cluster. If Unstructured generates embeddings,
 the new collection's name will be `Unstructured___`.
@@ -35,7 +35,7 @@
 For [Unstructured Ingest](/ingestion/overview):
 - If an existing collection name is specified, and Unstructured generates embeddings,
- but the number of dimensions that are generated do not match the existing collection's embedding settings, the run will fail.
+ but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
 You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
 - If a collection name is not specified, Unstructured creates a new collection in your Weaviate cluster. The new collection's name will be `Elements`.

From 6e07315f8294159dfb2851687fc0bafadaf0692e Mon Sep 17 00:00:00 2001
From: Paul-Cornell
Date: Fri, 14 Feb 2025 08:19:32 -0800
Subject: [PATCH 4/5] Apply suggestions from code review

---
 snippets/general-shared-text/weaviate.mdx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/snippets/general-shared-text/weaviate.mdx b/snippets/general-shared-text/weaviate.mdx
index 32a51547..c69c5a04 100644
--- a/snippets/general-shared-text/weaviate.mdx
+++ b/snippets/general-shared-text/weaviate.mdx
@@ -29,8 +29,8 @@
 but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
 You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
 - If a collection name is not specified, Unstructured creates a new collection in your Weaviate cluster. If Unstructured generates embeddings,
- the new collection's name will be `Unstructured___`.
- If Unstructured does not generate embeddings, the new collection's name will be `Unstructured_`.
+ the new collection's name will be `u__`.
+ If Unstructured does not generate embeddings, the new collection's name will be `u
Date: Fri, 14 Feb 2025 15:51:20 -0800
Subject: [PATCH 5/5] Apply suggestions from code review

---
 snippets/general-shared-text/weaviate.mdx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/snippets/general-shared-text/weaviate.mdx b/snippets/general-shared-text/weaviate.mdx
index c69c5a04..6783fe68 100644
--- a/snippets/general-shared-text/weaviate.mdx
+++ b/snippets/general-shared-text/weaviate.mdx
@@ -29,8 +29,8 @@
 but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
 You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
 - If a collection name is not specified, Unstructured creates a new collection in your Weaviate cluster. If Unstructured generates embeddings,
- the new collection's name will be `u__`.
- If Unstructured does not generate embeddings, the new collection's name will be `u__`.
+ If Unstructured does not generate embeddings, the new collection's name will be `U