2 changes: 1 addition & 1 deletion snippets/general-shared-text/pinecone-api-placeholders.mdx
@@ -1,4 +1,4 @@
- `<name>` (required) - A unique name for this connector.
- `<index-name>` (required) - The name of the index in the Pinecone database.
- `<index-name>` - The name of the index in the Pinecone database. If no value is provided, see the beginning of this article for the behavior at run time.
- `<api-key>` (required) - The Pinecone API key.
- `<batch-size>` - The maximum number of records to transmit in a single batch. The default is `50` unless otherwise specified.
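
To make the new optionality concrete, here is a minimal sketch that gathers the placeholder values into a plain Python dict. The function name and the dict shape are hypothetical and are not the actual request schema. The sketch treats `<name>` and `<api-key>` as required, leaves `<index-name>` out when it is empty so that the run-time index behavior applies, and falls back to the documented default batch size of `50`.

```python
from typing import Any, Optional

DEFAULT_BATCH_SIZE = 50  # documented default when <batch-size> is omitted


def build_pinecone_connector_config(
    name: str,
    api_key: str,
    index_name: Optional[str] = None,
    batch_size: Optional[int] = None,
) -> dict[str, Any]:
    """Collect placeholder values into a dict (hypothetical shape, not the real API schema)."""
    # After this change, only <name> and <api-key> are required.
    if not name:
        raise ValueError("<name> is required")
    if not api_key:
        raise ValueError("<api-key> is required")
    config: dict[str, Any] = {
        "name": name,
        "api_key": api_key,
        "batch_size": batch_size if batch_size is not None else DEFAULT_BATCH_SIZE,
    }
    # <index-name> is optional; leaving it out triggers the run-time index behavior.
    if index_name:
        config["index_name"] = index_name
    return config
```

For example, `build_pinecone_connector_config("my-connector", "pc-key")` returns a dict with no `index_name` key and a `batch_size` of `50`.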
2 changes: 1 addition & 1 deletion snippets/general-shared-text/pinecone-cli-api.mdx
@@ -11,4 +11,4 @@ import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-d
The following environment variables:

- `PINECONE_API_KEY` - The Pinecone API key, represented by `--api-key` (CLI) or `api_key` (Python, in the `PineconeAccessConfig` object).
- `PINECONE_INDEX_NAME` - The Pinecone serverless index name, represented by `--index-name` (CLI) or `index_name` (Python).
- `PINECONE_INDEX_NAME` - The Pinecone serverless index name, represented by `--index-name` (CLI) or `index_name` (Python). If no value is provided, see the beginning of this article for the behavior at run time.
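
The following minimal sketch reads the two environment variables before you pass them on as `--api-key`/`--index-name` (CLI) or `api_key`/`index_name` (Python). The helper name is hypothetical. It assumes that an unset or empty `PINECONE_INDEX_NAME` should be treated as "no value provided", which lets the run-time index behavior apply.

```python
import os
from typing import Mapping, Optional


def read_pinecone_env(env: Mapping[str, str] = os.environ) -> tuple[str, Optional[str]]:
    """Return (api_key, index_name) from the documented environment variables."""
    api_key = env.get("PINECONE_API_KEY")
    if not api_key:
        raise KeyError("PINECONE_API_KEY must be set")
    # An unset or empty PINECONE_INDEX_NAME becomes None, meaning
    # "no index name provided" (the run-time behavior then applies).
    index_name = env.get("PINECONE_INDEX_NAME") or None
    return api_key, index_name
```

Passing a mapping explicitly, such as `read_pinecone_env({"PINECONE_API_KEY": "k"})`, makes the helper easy to test without touching the real process environment.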
2 changes: 1 addition & 1 deletion snippets/general-shared-text/pinecone-platform.mdx
@@ -1,6 +1,6 @@
Fill in the following fields:

- **Name** (_required_): A unique name for this connector.
- **Index Name** (_required_): The name of the index in the Pinecone database.
- **Index Name**: The name of the index in the Pinecone database. If no value is provided, see the beginning of this article for the behavior at run time.
- **Batch Size**: The number of records to use in a single batch. The default is `50` if not otherwise specified.
- **API Key** (_required_): The Pinecone API key.
20 changes: 19 additions & 1 deletion snippets/general-shared-text/pinecone.mdx
@@ -13,8 +13,26 @@
- A Pinecone API key. [Get an API key](https://docs.pinecone.io/guides/get-started/authentication#find-your-pinecone-api-key).
- A Pinecone serverless index. [Create a serverless index](https://docs.pinecone.io/guides/indexes/create-an-index).

An existing index is not required. At runtime, the index behavior is as follows:

For the [Unstructured Platform](/platform/overview):

- If an existing index name is specified and Unstructured generates embeddings,
but the number of generated dimensions does not match the existing index's embedding settings, the run fails.
Change your Unstructured embedding settings or your existing index's embedding settings so that they match, and then run again.
- If an index name is not specified, Unstructured creates a new index in your Pinecone account. If Unstructured generates embeddings,
the new index's name is `u<short-workflow-id>-<short-embedding-model-name>-<number-of-dimensions>`.
If Unstructured does not generate embeddings, the new index's name is `u<short-workflow-id>`.
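
The naming pattern for a Platform-created index can be sketched as follows. The function is hypothetical. The "short" workflow ID and "short" embedding model name are passed in as given values, because how Unstructured shortens them is not described here, and the example values `abc123` and `minilm` are purely illustrative.

```python
from typing import Optional


def platform_auto_index_name(
    short_workflow_id: str,
    short_embedding_model_name: Optional[str] = None,
    number_of_dimensions: Optional[int] = None,
) -> str:
    """Build the documented name pattern for an index auto-created by the Platform.

    The shortened workflow ID and model name are assumed to be supplied already.
    """
    base = f"u{short_workflow_id}"
    if short_embedding_model_name is None or number_of_dimensions is None:
        # No embeddings generated: u<short-workflow-id>
        return base
    # Embeddings generated: u<short-workflow-id>-<short-embedding-model-name>-<number-of-dimensions>
    return f"{base}-{short_embedding_model_name}-{number_of_dimensions}"
```

With these illustrative inputs, `platform_auto_index_name("abc123")` yields `uabc123`, and `platform_auto_index_name("abc123", "minilm", 384)` yields `uabc123-minilm-384`.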

For [Unstructured Ingest](/ingestion/overview):

- If an existing index name is specified and Unstructured generates embeddings,
but the number of generated dimensions does not match the existing index's embedding settings, the run fails.
Change your Unstructured embedding settings or your existing index's embedding settings so that they match, and then run again.
- If an index name is not specified, Unstructured creates a new index in your Pinecone account. The new index's name will be `unstructuredautocreated`.
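
Before starting an Ingest run, you can apply the rules above yourself as a pre-flight check. The following minimal sketch is hypothetical and is not Unstructured's implementation. It assumes that you already know the existing index's dimension (for example, from the Pinecone console) and the dimension that your embedding model produces. It raises an error in exactly the case where the documented run would fail.

```python
from typing import Optional

INGEST_AUTO_INDEX_NAME = "unstructuredautocreated"


def resolve_ingest_index(
    index_name: Optional[str],
    generated_dimensions: Optional[int],
    existing_index_dimensions: Optional[int],
) -> str:
    """Mirror the documented Ingest run-time index behavior for Pinecone.

    Returns the index name that would be written to. Raises ValueError when
    the generated and existing dimensions differ (the run would fail).
    """
    if not index_name:
        # No index name provided: a new index with a fixed name is created.
        return INGEST_AUTO_INDEX_NAME
    if (
        generated_dimensions is not None
        and existing_index_dimensions is not None
        and generated_dimensions != existing_index_dimensions
    ):
        raise ValueError(
            f"embedding dimensions {generated_dimensions} do not match "
            f"index '{index_name}' dimensions {existing_index_dimensions}; "
            "change one of the settings so that they match, then rerun"
        )
    return index_name
```

Passing `None` for `generated_dimensions` models a run in which Unstructured does not generate embeddings, so no dimension comparison takes place.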

<Note>
Unstructured recommends that all records in the target index have a field
If you create a new index or use an existing one, Unstructured recommends that all records in the target index have a field
named `record_id` with a string data type.
Unstructured can use this field to do intelligent document overwrites. Without this field, duplicate documents
might be written to the index or, in some cases, the operation could fail altogether.
2 changes: 1 addition & 1 deletion snippets/general-shared-text/weaviate.mdx
@@ -37,7 +37,7 @@
- If an existing collection name is specified and Unstructured generates embeddings,
but the number of generated dimensions does not match the existing collection's embedding settings, the run fails.
Change your Unstructured embedding settings or your existing collection's embedding settings so that they match, and then run again.
- If a collection name is not specified, Unstructured creates a new collection in your Weaviate cluster. The new collection's name will be `Elements`.
- If a collection name is not specified, Unstructured creates a new collection in your Weaviate cluster. The new collection's name will be `Unstructuredautocreated`.

If Unstructured creates a new collection and generates embeddings, you will not see an embeddings property in tools such as the Weaviate Cloud
**Collections** user interface. To view the generated embeddings, you can run a Weaviate GraphQL query such as the following. In this query, replace `<collection-name>` with