diff --git a/api-reference/how-to/embedding.mdx b/api-reference/how-to/embedding.mdx index 87558986..33a45f61 100644 --- a/api-reference/how-to/embedding.mdx +++ b/api-reference/how-to/embedding.mdx @@ -43,41 +43,41 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo 1. Choose an embedding provider that you want to use from among the following allowed providers, and note the provider's ID: - - The provider ID `langchain-aws-bedrock` for [Amazon Bedrock](https://aws.amazon.com/bedrock/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/bedrock/). - - `langchain-huggingface` for [Hugging Face](https://huggingface.co/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/huggingfacehub/). - - `langchain-openai` for [OpenAI](https://openai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/openai/). - - `langchain-vertexai` for [Google Vertex AI PaLM](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/google_vertex_ai_palm/). - - `langchain-voyageai` for [Voyage AI](https://www.voyageai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/voyageai/). + - The provider ID `aws-bedrock` for [Amazon Bedrock](https://aws.amazon.com/bedrock/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/bedrock/). + - `huggingface` for [Hugging Face](https://huggingface.co/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/huggingfacehub/). + - `openai` for [OpenAI](https://openai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/openai/). + - `vertexai` for [Google Vertex AI PaLM](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview). 
[Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/google_vertex_ai_palm/). + - `voyageai` for [Voyage AI](https://www.voyageai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/voyageai/). - `mixedbread-ai` for [Mixedbread](https://www.mixedbread.ai/). [Learn more](https://www.mixedbread.ai/docs/embeddings/overview). - `octoai` for [Octo AI](https://octo.ai/). [Learn more](https://octo.ai/docs/text-gen-solution/using-unstructured-io-for-embedding-documents). 2. Run the following command to install the required Python package for the embedding provider: - - For `langchain-aws-bedrock`, run `pip install "unstructured-ingest[bedrock]"`. - - For `langchain-huggingface`, run `pip install "unstructured-ingest[embed-huggingface]"`. - - For `langchain-openai`, run `pip install "unstructured-ingest[openai]"`. - - For `langchain-vertexai`, run `pip install "unstructured-ingest[embed-vertexai]"`. - - For `langchain-voyageai`, run `pip install "unstructured-ingest[embed-voyageai]"`. + - For `aws-bedrock`, run `pip install "unstructured-ingest[bedrock]"`. + - For `huggingface`, run `pip install "unstructured-ingest[embed-huggingface]"`. + - For `openai`, run `pip install "unstructured-ingest[openai]"`. + - For `vertexai`, run `pip install "unstructured-ingest[embed-vertexai]"`. + - For `voyageai`, run `pip install "unstructured-ingest[embed-voyageai]"`. - For `mixedbread-ai`, run `pip install "unstructured-ingest[embed-mixedbreadai]"`. - For `octoai`, run `pip install "unstructured-ingest[embed-octoai]"`. 3. For the following embedding providers, you can choose the model that you want to use. If you do choose a model, note the model's name: - - `langchain-aws-bedrock`. [Choose a model](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html). No default model is provided. [Learn more about the supported models](https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html). 
- - `langchain-huggingface`. [Choose a model](https://huggingface.co/models?other=embeddings), or use the default model [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). - - `langchain-openai`. [Choose a model](https://platform.openai.com/docs/guides/embeddings/embedding-models), or use the default model `text-embedding-ada-002`. - - `langchain-vertexai`. [Choose a model](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api), or use the default model `textembedding-gecko@001`. - - `langchain-voyageai`. [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided. + - `aws-bedrock`. [Choose a model](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html). No default model is provided. [Learn more about the supported models](https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html). + - `huggingface`. [Choose a model](https://huggingface.co/models?other=embeddings), or use the default model [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). + - `openai`. [Choose a model](https://platform.openai.com/docs/guides/embeddings/embedding-models), or use the default model `text-embedding-ada-002`. + - `vertexai`. [Choose a model](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api), or use the default model `textembedding-gecko@001`. + - `voyageai`. [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided. - `mixedbread-ai`. [Choose a model](https://www.mixedbread.ai/docs/embeddings/models), or use the default model [mixedbread-ai/mxbai-embed-large-v1](https://www.mixedbread.ai/docs/embeddings/mxbai-embed-large-v1). - `octoai`. [Choose a model](https://octo.ai/blog/supercharge-rag-performance-using-octoai-and-unstructured-embeddings/), or use the default model `thenlper/gte-large`. 4. 
Note the special settings to connect to the provider: - - For `langchain-aws-bedrock`, you'll need an AWS access key value, the corresponding AWS secret access key value, and the corresponding AWS Region identifier. [Get an AWS access key and secret access key](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html). - - For `langchain-huggingface`, if you use a gated model (a model with special conditions that you must accept before you can use it, or a privately published model), you'll need an HF inference API key value, beginning with `hf_`. [Get an HF inference API key](https://huggingface.co/docs/api-inference/en/quicktour#get-your-api-token). To learn whether your model requires an HF inference API key, see your model provider's documentation. - - For `langchain-openai`, you'll need an OpenAI API key value. [Get an OpenAI API key](https://platform.openai.com/docs/quickstart/create-and-export-an-api-key). - - For `langchain-vertexai`, you'll need the path to a Google Cloud credentials JSON file. Learn more [here](https://cloud.google.com/docs/authentication/application-default-credentials#GAC) and [here](https://googleapis.dev/python/google-auth/latest/reference/google.auth.html#module-google.auth). - - For `langchain-voyageai`, you'll need a Voyage AI API key value. [Get a Voyage AI API key](https://docs.voyageai.com/docs/api-key-and-installation#authentication-with-api-keys). + - For `aws-bedrock`, you'll need an AWS access key value, the corresponding AWS secret access key value, and the corresponding AWS Region identifier. [Get an AWS access key and secret access key](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html). + - For `huggingface`, if you use a gated model (a model with special conditions that you must accept before you can use it, or a privately published model), you'll need an HF inference API key value, beginning with `hf_`. 
[Get an HF inference API key](https://huggingface.co/docs/api-inference/en/quicktour#get-your-api-token). To learn whether your model requires an HF inference API key, see your model provider's documentation. + - For `openai`, you'll need an OpenAI API key value. [Get an OpenAI API key](https://platform.openai.com/docs/quickstart/create-and-export-an-api-key). + - For `vertexai`, you'll need the path to a Google Cloud credentials JSON file. Learn more [here](https://cloud.google.com/docs/authentication/application-default-credentials#GAC) and [here](https://googleapis.dev/python/google-auth/latest/reference/google.auth.html#module-google.auth). + - For `voyageai`, you'll need a Voyage AI API key value. [Get a Voyage AI API key](https://docs.voyageai.com/docs/api-key-and-installation#authentication-with-api-keys). - For `mixedbread-ai`, you'll need a Mixedbread API key value. [Get a Mixedbread API key](https://www.mixedbread.ai/dashboard?next=api-keys). - For `octoai`, you'll need an Octo AI API token value. [Get an Octo AI API token](https://octo.ai/docs/getting-started/how-to-create-octoai-access-token). @@ -87,10 +87,10 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo For the [source connector](/api-reference/ingest/source-connectors/overview) command: - - Set the command's `--embedding-provider` to the provider's ID, for example `langchain-huggingface`. + - Set the command's `--embedding-provider` to the provider's ID, for example `huggingface`. - Set `--embedding-model-name` to the model name, as applicable, for example `sentence-transformers/sentence-t5-xl`. Or omit this to use the default model, as applicable. - Set `--embedding-api-key` to the provider's required API key value or credentials JSON file path, as appropriate. - - For `langchain-aws-bedrock`: + - For `aws-bedrock`: - Set `--embedding-aws-access-key-id` to the AWS access key value. 
- Set `--embedding-aws-secret-access-key` to the corresponding AWS secret access key value. @@ -99,10 +99,10 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo For the [source connector's](/api-reference/ingest/source-connectors/overview) `EmbedderConfig` object: - - Set the `embedding_provider` parameter to the provider's ID, for example `langchain-huggingface`. + - Set the `embedding_provider` parameter to the provider's ID, for example `huggingface`. - Set `embedding_model_name` to the model name, as applicable, for example `sentence-transformers/sentence-t5-xl`. Or omit this to use the default model, as applicable. - Set `embedding_api_key` to the provider's required API key value or credentials JSON file path, as appropriate. - - For `langchain-aws-bedrock`: + - For `aws-bedrock`: - Set `embedding_aws_access_key_id` to the AWS access key value. - Set `embedding_aws_secret_access_key` to the corresponding AWS secret access key value. diff --git a/snippets/destination_connectors/astradb.sh.mdx b/snippets/destination_connectors/astradb.sh.mdx index 12329475..470d4c4f 100644 --- a/snippets/destination_connectors/astradb.sh.mdx +++ b/snippets/destination_connectors/astradb.sh.mdx @@ -9,7 +9,7 @@ unstructured-ingest \ --partition-by-api \ --strategy hi_res \ --chunking-strategy by_title \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --partition-by-api \ --api-key $UNSTRUCTURED_API_KEY \ --partition-endpoint $UNSTRUCTURED_API_URL \ diff --git a/snippets/destination_connectors/astradb.v1.py.mdx b/snippets/destination_connectors/astradb.v1.py.mdx index f8681553..3f18b3f0 100644 --- a/snippets/destination_connectors/astradb.v1.py.mdx +++ b/snippets/destination_connectors/astradb.v1.py.mdx @@ -57,7 +57,7 @@ if __name__ == "__main__": ), chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", api_key=None, ), 
writer=writer, diff --git a/snippets/destination_connectors/astradb.v2.py.mdx b/snippets/destination_connectors/astradb.v2.py.mdx index 1942f332..d5290f57 100644 --- a/snippets/destination_connectors/astradb.v2.py.mdx +++ b/snippets/destination_connectors/astradb.v2.py.mdx @@ -39,7 +39,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=AstraDBConnectionConfig( access_config=AstraDBAccessConfig( api_endpoint=os.getenv("ASTRA_DB_API_ENDPOINT"), diff --git a/snippets/destination_connectors/azure.sh.mdx b/snippets/destination_connectors/azure.sh.mdx index 6a7ee4e5..52dea9db 100644 --- a/snippets/destination_connectors/azure.sh.mdx +++ b/snippets/destination_connectors/azure.sh.mdx @@ -11,7 +11,7 @@ unstructured-ingest \ --partition-endpoint $UNSTRUCTURED_API_URL \ --strategy hi_res \ --chunking-strategy by_title \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \ azure \ --remote-url $AZURE_STORAGE_REMOTE_URL \ diff --git a/snippets/destination_connectors/azure.v1.py.mdx b/snippets/destination_connectors/azure.v1.py.mdx index 33cb79e3..54f36978 100644 --- a/snippets/destination_connectors/azure.v1.py.mdx +++ b/snippets/destination_connectors/azure.v1.py.mdx @@ -51,7 +51,7 @@ if __name__ == "__main__": ), chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", api_key=None ), writer=writer, diff --git a/snippets/destination_connectors/azure.v2.py.mdx b/snippets/destination_connectors/azure.v2.py.mdx index e9f987d1..182fb072 100644 --- a/snippets/destination_connectors/azure.v2.py.mdx +++ 
b/snippets/destination_connectors/azure.v2.py.mdx @@ -38,7 +38,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=AzureConnectionConfig( access_config=AzureAccessConfig( account_name=os.getenv("AZURE_STORAGE_ACCOUNT_NAME"), diff --git a/snippets/destination_connectors/azure_cognitive_search.sh.mdx b/snippets/destination_connectors/azure_cognitive_search.sh.mdx index 793e8eb8..72c39e37 100644 --- a/snippets/destination_connectors/azure_cognitive_search.sh.mdx +++ b/snippets/destination_connectors/azure_cognitive_search.sh.mdx @@ -8,7 +8,7 @@ unstructured-ingest \ --input-path $LOCAL_FILE_INPUT_DIR \ --output-dir $LOCAL_FILE_OUTPUT_DIR \ --chunk-elements \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --num-processes 2 \ --verbose \ --partition-by-api \ diff --git a/snippets/destination_connectors/azure_cognitive_search.v1.py.mdx b/snippets/destination_connectors/azure_cognitive_search.v1.py.mdx index a8624cb7..dec5cd6f 100644 --- a/snippets/destination_connectors/azure_cognitive_search.v1.py.mdx +++ b/snippets/destination_connectors/azure_cognitive_search.v1.py.mdx @@ -52,7 +52,7 @@ if __name__ == "__main__": ), chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", api_key=None ), writer=writer, diff --git a/snippets/destination_connectors/azure_cognitive_search.v2.py.mdx b/snippets/destination_connectors/azure_cognitive_search.v2.py.mdx index 9e2cdc4c..9041f2c7 100644 --- a/snippets/destination_connectors/azure_cognitive_search.v2.py.mdx +++ b/snippets/destination_connectors/azure_cognitive_search.v2.py.mdx @@ -38,7 +38,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - 
embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=AzureCognitiveSearchConnectionConfig( access_config=AzureCognitiveSearchAccessConfig( key=os.getenv("AZURE_SEARCH_API_KEY") diff --git a/snippets/destination_connectors/box.sh.mdx b/snippets/destination_connectors/box.sh.mdx index 09c32c0c..306b7f7c 100644 --- a/snippets/destination_connectors/box.sh.mdx +++ b/snippets/destination_connectors/box.sh.mdx @@ -9,7 +9,7 @@ unstructured-ingest \ --output-dir $LOCAL_FILE_OUTPUT_DIR \ --strategy hi_res \ --chunk-elements \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --num-processes 2 \ --verbose \ --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \ diff --git a/snippets/destination_connectors/box.v1.py.mdx b/snippets/destination_connectors/box.v1.py.mdx index 83bd36fe..2d287e52 100644 --- a/snippets/destination_connectors/box.v1.py.mdx +++ b/snippets/destination_connectors/box.v1.py.mdx @@ -52,7 +52,7 @@ if __name__ == "__main__": ), chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", api_key=None, ), writer=writer, diff --git a/snippets/destination_connectors/box.v2.py.mdx b/snippets/destination_connectors/box.v2.py.mdx index 1334759e..f1a55a59 100644 --- a/snippets/destination_connectors/box.v2.py.mdx +++ b/snippets/destination_connectors/box.v2.py.mdx @@ -39,7 +39,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=BoxConnectionConfig( access_config=BoxAccessConfig( box_app_config=os.getenv("BOX_APP_CONFIG_PATH") 
diff --git a/snippets/destination_connectors/chroma.sh.mdx b/snippets/destination_connectors/chroma.sh.mdx index 46632d10..c737e3b1 100644 --- a/snippets/destination_connectors/chroma.sh.mdx +++ b/snippets/destination_connectors/chroma.sh.mdx @@ -8,7 +8,7 @@ unstructured-ingest \ --input-path $LOCAL_FILE_INPUT_DIR \ --output-dir $LOCAL_FILE_OUTPUT_DIR \ --chunk-elements \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --num-processes 2 \ --verbose \ --work-dir $WORK_DIR \ diff --git a/snippets/destination_connectors/chroma.v1.py.mdx b/snippets/destination_connectors/chroma.v1.py.mdx index df556cfb..9c1d8bd3 100644 --- a/snippets/destination_connectors/chroma.v1.py.mdx +++ b/snippets/destination_connectors/chroma.v1.py.mdx @@ -56,7 +56,7 @@ if __name__ == "__main__": ), chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", api_key=None, ), writer=writer, diff --git a/snippets/destination_connectors/chroma.v2.py.mdx b/snippets/destination_connectors/chroma.v2.py.mdx index dabad9c6..8312c4f0 100644 --- a/snippets/destination_connectors/chroma.v2.py.mdx +++ b/snippets/destination_connectors/chroma.v2.py.mdx @@ -39,7 +39,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=ChromaConnectionConfig( access_config=ChromaAccessConfig( settings={"persist_directory":"./chroma-persist"}, diff --git a/snippets/destination_connectors/couchbase.sh.mdx b/snippets/destination_connectors/couchbase.sh.mdx index 4ae922bb..63f3eebd 100644 --- a/snippets/destination_connectors/couchbase.sh.mdx +++ b/snippets/destination_connectors/couchbase.sh.mdx @@ -9,7 +9,7 @@ unstructured-ingest \ --output-dir $LOCAL_FILE_OUTPUT_DIR \ --strategy hi_res \ 
--chunk-elements \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --num-processes 2 \ --verbose \ --partition-by-api \ diff --git a/snippets/destination_connectors/couchbase.v2.py.mdx b/snippets/destination_connectors/couchbase.v2.py.mdx index c2b97dda..77ed22aa 100644 --- a/snippets/destination_connectors/couchbase.v2.py.mdx +++ b/snippets/destination_connectors/couchbase.v2.py.mdx @@ -39,7 +39,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=CouchbaseConnectionConfig( access_config=CouchbaseAccessConfig( password=os.getenv("CB_PASSWORD"), diff --git a/snippets/destination_connectors/databricks_volumes.sh.mdx b/snippets/destination_connectors/databricks_volumes.sh.mdx index d411fa66..15202c6a 100644 --- a/snippets/destination_connectors/databricks_volumes.sh.mdx +++ b/snippets/destination_connectors/databricks_volumes.sh.mdx @@ -15,7 +15,7 @@ unstructured-ingest \ --chunking-strategy by_title \ --chunk-api-key $UNSTRUCTURED_API_KEY \ --chunking-endpoint $UNSTRUCTURED_API_URL \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --embedding-model-name sentence-transformers/all-mpnet-base-v2 \ databricks-volumes \ --host $DATABRICKS_HOST \ diff --git a/snippets/destination_connectors/databricks_volumes.v1.py.mdx b/snippets/destination_connectors/databricks_volumes.v1.py.mdx index e169ec48..e4390d56 100644 --- a/snippets/destination_connectors/databricks_volumes.v1.py.mdx +++ b/snippets/destination_connectors/databricks_volumes.v1.py.mdx @@ -63,7 +63,7 @@ if __name__ == "__main__": chunking_strategy="by_title", ), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", model_name="sentence-transformers/all-mpnet-base-v2", ), writer=writer, 
diff --git a/snippets/destination_connectors/databricks_volumes.v2.py.mdx b/snippets/destination_connectors/databricks_volumes.v2.py.mdx index 216a4504..cd57272f 100644 --- a/snippets/destination_connectors/databricks_volumes.v2.py.mdx +++ b/snippets/destination_connectors/databricks_volumes.v2.py.mdx @@ -44,7 +44,7 @@ if __name__ == "__main__": chunking_strategy="by_title" ), embedder_config=EmbedderConfig( - embedding_provider="langchain-huggingface", + embedding_provider="huggingface", embedding_model_name="sentence-transformers/all-mpnet-base-v2" ), destination_connection_config=DatabricksVolumesConnectionConfig( diff --git a/snippets/destination_connectors/delta_table.py.mdx b/snippets/destination_connectors/delta_table.py.mdx index c5f654c6..25295f01 100644 --- a/snippets/destination_connectors/delta_table.py.mdx +++ b/snippets/destination_connectors/delta_table.py.mdx @@ -49,7 +49,7 @@ if __name__ == "__main__": ), chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", api_key=None, ), writer=writer, diff --git a/snippets/destination_connectors/delta_table.sh.mdx b/snippets/destination_connectors/delta_table.sh.mdx index ec0f0dfb..e1219f78 100644 --- a/snippets/destination_connectors/delta_table.sh.mdx +++ b/snippets/destination_connectors/delta_table.sh.mdx @@ -9,7 +9,7 @@ unstructured-ingest \ --output-dir $LOCAL_FILE_OUTPUT_DIR \ --strategy hi_res \ --chunk-elements \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --num-processes 2 \ --verbose \ delta-table \ diff --git a/snippets/destination_connectors/dropbox.sh.mdx b/snippets/destination_connectors/dropbox.sh.mdx index 51b23eda..7f0c3c6e 100644 --- a/snippets/destination_connectors/dropbox.sh.mdx +++ b/snippets/destination_connectors/dropbox.sh.mdx @@ -9,7 +9,7 @@ unstructured-ingest \ --output-dir $LOCAL_FILE_OUTPUT_DIR \ --strategy hi_res \ --chunk-elements \ - 
--embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --num-processes 2 \ --verbose \ --partition-by-api \ diff --git a/snippets/destination_connectors/dropbox.v1.py.mdx b/snippets/destination_connectors/dropbox.v1.py.mdx index 922e6148..6a3ab65e 100644 --- a/snippets/destination_connectors/dropbox.v1.py.mdx +++ b/snippets/destination_connectors/dropbox.v1.py.mdx @@ -49,7 +49,7 @@ if __name__ == "__main__": ), chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", api_key=os.getenv("UNSTRUCTURED_API_KEY"), partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"), ), diff --git a/snippets/destination_connectors/dropbox.v2.py.mdx b/snippets/destination_connectors/dropbox.v2.py.mdx index e42a4d3b..180d27b0 100644 --- a/snippets/destination_connectors/dropbox.v2.py.mdx +++ b/snippets/destination_connectors/dropbox.v2.py.mdx @@ -38,7 +38,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=DropboxConnectionConfig( access_config=DropboxAccessConfig( token=os.getenv("DROPBOX_ACCESS_TOKEN") diff --git a/snippets/destination_connectors/elasticsearch.sh.mdx b/snippets/destination_connectors/elasticsearch.sh.mdx index 4bbe5eba..492d06e6 100644 --- a/snippets/destination_connectors/elasticsearch.sh.mdx +++ b/snippets/destination_connectors/elasticsearch.sh.mdx @@ -9,7 +9,7 @@ unstructured-ingest \ --output-dir $LOCAL_FILE_OUTPUT_DIR \ --strategy hi_res \ --chunk-elements \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --num-processes 4 \ --verbose \ --partition-by-api \ diff --git a/snippets/destination_connectors/elasticsearch.v1.py.mdx b/snippets/destination_connectors/elasticsearch.v1.py.mdx index 
c23ab0f0..0577da5e 100644 --- a/snippets/destination_connectors/elasticsearch.v1.py.mdx +++ b/snippets/destination_connectors/elasticsearch.v1.py.mdx @@ -60,7 +60,7 @@ if __name__ == "__main__": ), chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", api_key=None ), writer=writer, diff --git a/snippets/destination_connectors/elasticsearch.v2.py.mdx b/snippets/destination_connectors/elasticsearch.v2.py.mdx index a387db99..8b253c70 100644 --- a/snippets/destination_connectors/elasticsearch.v2.py.mdx +++ b/snippets/destination_connectors/elasticsearch.v2.py.mdx @@ -40,7 +40,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=ElasticsearchConnectionConfig( access_config=ElasticsearchAccessConfig( password=os.getenv("ELASTICSEARCH_PASSWORD"), diff --git a/snippets/destination_connectors/gcs.sh.mdx b/snippets/destination_connectors/gcs.sh.mdx index 86c972dd..be0e007b 100644 --- a/snippets/destination_connectors/gcs.sh.mdx +++ b/snippets/destination_connectors/gcs.sh.mdx @@ -9,7 +9,7 @@ unstructured-ingest \ --output-dir $LOCAL_FILE_OUTPUT_DIR \ --strategy hi_res \ --chunk-elements \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --num-processes 2 \ --verbose \ --partition-by-api \ diff --git a/snippets/destination_connectors/gcs.v1.py.mdx b/snippets/destination_connectors/gcs.v1.py.mdx index 8622d9be..74cfd297 100644 --- a/snippets/destination_connectors/gcs.v1.py.mdx +++ b/snippets/destination_connectors/gcs.v1.py.mdx @@ -49,7 +49,7 @@ if __name__ == "__main__": ), chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", 
api_key=None, ), writer=writer, diff --git a/snippets/destination_connectors/gcs.v2.py.mdx b/snippets/destination_connectors/gcs.v2.py.mdx index 43fac750..ffa81b33 100644 --- a/snippets/destination_connectors/gcs.v2.py.mdx +++ b/snippets/destination_connectors/gcs.v2.py.mdx @@ -40,7 +40,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=GcsConnectionConfig( access_config=GcsAccessConfig( service_account_key=os.getenv("GCS_SERVICE_ACCOUNT_KEY") diff --git a/snippets/destination_connectors/kafka.sh.mdx b/snippets/destination_connectors/kafka.sh.mdx index 3a879cd1..eace311a 100644 --- a/snippets/destination_connectors/kafka.sh.mdx +++ b/snippets/destination_connectors/kafka.sh.mdx @@ -8,7 +8,7 @@ unstructured-ingest \ --input-path $LOCAL_FILE_INPUT_DIR \ --output-dir $LOCAL_FILE_OUTPUT_DIR \ --chunk-elements \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --num-processes 2 \ --verbose \ --partition-by-api \ diff --git a/snippets/destination_connectors/kafka.v1.py.mdx b/snippets/destination_connectors/kafka.v1.py.mdx index 5354241e..074717fd 100644 --- a/snippets/destination_connectors/kafka.v1.py.mdx +++ b/snippets/destination_connectors/kafka.v1.py.mdx @@ -58,7 +58,7 @@ if __name__ == "__main__": ), chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", api_key=None ), writer=writer, diff --git a/snippets/destination_connectors/kdbai.sh.mdx b/snippets/destination_connectors/kdbai.sh.mdx index 4702cbc7..083736f1 100644 --- a/snippets/destination_connectors/kdbai.sh.mdx +++ b/snippets/destination_connectors/kdbai.sh.mdx @@ -7,7 +7,7 @@ unstructured-ingest \ local \ --input-path $LOCAL_FILE_INPUT_DIR \ --chunking-strategy 
by_title \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --partition-by-api \ --api-key $UNSTRUCTURED_API_KEY \ --partition-endpoint $UNSTRUCTURED_API_URL \ diff --git a/snippets/destination_connectors/kdbai.v2.py.mdx b/snippets/destination_connectors/kdbai.v2.py.mdx index 6e526890..65361e16 100644 --- a/snippets/destination_connectors/kdbai.v2.py.mdx +++ b/snippets/destination_connectors/kdbai.v2.py.mdx @@ -40,7 +40,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=KdbaiConnectionConfig( access_config=KdbaiAccessConfig( api_key=os.getenv("KDBAI_API_KEY") diff --git a/snippets/destination_connectors/milvus.sh.mdx b/snippets/destination_connectors/milvus.sh.mdx index 53563aef..5c017238 100644 --- a/snippets/destination_connectors/milvus.sh.mdx +++ b/snippets/destination_connectors/milvus.sh.mdx @@ -7,7 +7,7 @@ unstructured-ingest \ local \ --input-path $LOCAL_FILE_INPUT_DIR \ --chunking-strategy by_title \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --partition-by-api \ --api-key $UNSTRUCTURED_API_KEY \ --partition-endpoint $UNSTRUCTURED_API_URL \ diff --git a/snippets/destination_connectors/milvus.v2.py.mdx b/snippets/destination_connectors/milvus.v2.py.mdx index 3f0e430f..824162c3 100644 --- a/snippets/destination_connectors/milvus.v2.py.mdx +++ b/snippets/destination_connectors/milvus.v2.py.mdx @@ -40,7 +40,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=MilvusConnectionConfig( access_config=MilvusAccessConfig( 
password=os.getenv("MILVUS_PASSWORD") diff --git a/snippets/destination_connectors/mongodb.sh.mdx b/snippets/destination_connectors/mongodb.sh.mdx index 00940bf1..ac7d6219 100644 --- a/snippets/destination_connectors/mongodb.sh.mdx +++ b/snippets/destination_connectors/mongodb.sh.mdx @@ -9,7 +9,7 @@ unstructured-ingest \ --output-dir $LOCAL_FILE_OUTPUT_DIR \ --strategy hi_res \ --chunk-elements \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --num-processes 2 \ --verbose \ --partition-by-api \ diff --git a/snippets/destination_connectors/mongodb.v1.py.mdx b/snippets/destination_connectors/mongodb.v1.py.mdx index a66f1da6..55ff1bcc 100644 --- a/snippets/destination_connectors/mongodb.v1.py.mdx +++ b/snippets/destination_connectors/mongodb.v1.py.mdx @@ -50,7 +50,7 @@ if __name__ == "__main__": ), chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", api_key=None, ), writer=writer, diff --git a/snippets/destination_connectors/mongodb.v2.py.mdx b/snippets/destination_connectors/mongodb.v2.py.mdx index 06465c49..83aa577a 100644 --- a/snippets/destination_connectors/mongodb.v2.py.mdx +++ b/snippets/destination_connectors/mongodb.v2.py.mdx @@ -39,7 +39,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=MongoDBConnectionConfig( access_config=MongoDBAccessConfig( uri=os.getenv("MONGODB_URI") diff --git a/snippets/destination_connectors/opensearch.sh.mdx b/snippets/destination_connectors/opensearch.sh.mdx index 6afd2af0..b2e0a565 100644 --- a/snippets/destination_connectors/opensearch.sh.mdx +++ b/snippets/destination_connectors/opensearch.sh.mdx @@ -8,7 +8,7 @@ unstructured-ingest \ --input-path $LOCAL_FILE_INPUT_DIR \ 
--strategy hi_res \ --chunking-strategy by_title \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --partition-by-api \ --api-key $UNSTRUCTURED_API_KEY \ --partition-endpoint $UNSTRUCTURED_API_URL \ diff --git a/snippets/destination_connectors/opensearch.v1.py.mdx b/snippets/destination_connectors/opensearch.v1.py.mdx index 985c88b5..da7e7dd9 100644 --- a/snippets/destination_connectors/opensearch.v1.py.mdx +++ b/snippets/destination_connectors/opensearch.v1.py.mdx @@ -61,7 +61,7 @@ if __name__ == "__main__": ), chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", api_key=None, ), writer=writer, diff --git a/snippets/destination_connectors/opensearch.v2.py.mdx b/snippets/destination_connectors/opensearch.v2.py.mdx index e558d290..8e52b7ff 100644 --- a/snippets/destination_connectors/opensearch.v2.py.mdx +++ b/snippets/destination_connectors/opensearch.v2.py.mdx @@ -40,7 +40,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=OpenSearchConnectionConfig( access_config=OpenSearchAccessConfig( password=os.getenv("OPENSEARCH_PASSWORD"), diff --git a/snippets/destination_connectors/pinecone.sh.mdx b/snippets/destination_connectors/pinecone.sh.mdx index 3269cb0f..cb2a1acc 100644 --- a/snippets/destination_connectors/pinecone.sh.mdx +++ b/snippets/destination_connectors/pinecone.sh.mdx @@ -9,7 +9,7 @@ unstructured-ingest \ --output-dir $LOCAL_FILE_OUTPUT_DIR \ --strategy hi_res \ --chunk-elements \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --num-processes 2 \ --verbose \ --api-key $UNSTRUCTURED_API_KEY \ diff --git a/snippets/destination_connectors/pinecone.v1.py.mdx 
b/snippets/destination_connectors/pinecone.v1.py.mdx index 6f4ab42b..76c640d1 100644 --- a/snippets/destination_connectors/pinecone.v1.py.mdx +++ b/snippets/destination_connectors/pinecone.v1.py.mdx @@ -57,7 +57,7 @@ if __name__ == "__main__": ), chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", api_key=None ), writer=writer, diff --git a/snippets/destination_connectors/pinecone.v2.py.mdx b/snippets/destination_connectors/pinecone.v2.py.mdx index 9def2634..824339bd 100644 --- a/snippets/destination_connectors/pinecone.v2.py.mdx +++ b/snippets/destination_connectors/pinecone.v2.py.mdx @@ -39,7 +39,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=PineconeConnectionConfig( access_config=PineconeAccessConfig( api_key=os.getenv("PINECONE_API_KEY") diff --git a/snippets/destination_connectors/qdrant.py.mdx b/snippets/destination_connectors/qdrant.py.mdx index 154a957e..56a1e5ec 100644 --- a/snippets/destination_connectors/qdrant.py.mdx +++ b/snippets/destination_connectors/qdrant.py.mdx @@ -44,7 +44,7 @@ if __name__ == "__main__": ), chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", api_key=None, ), writer=writer, diff --git a/snippets/destination_connectors/qdrant.sh.mdx b/snippets/destination_connectors/qdrant.sh.mdx index b9358f5d..2e8c24d5 100644 --- a/snippets/destination_connectors/qdrant.sh.mdx +++ b/snippets/destination_connectors/qdrant.sh.mdx @@ -9,7 +9,7 @@ unstructured-ingest \ --output-dir $LOCAL_FILE_OUTPUT_DIR \ --strategy hi_res \ --chunk-elements \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ 
--num-processes 2 \ --verbose \ qdrant \ diff --git a/snippets/destination_connectors/s3.sh.mdx b/snippets/destination_connectors/s3.sh.mdx index 0029cf6e..ae603599 100644 --- a/snippets/destination_connectors/s3.sh.mdx +++ b/snippets/destination_connectors/s3.sh.mdx @@ -11,7 +11,7 @@ unstructured-ingest \ --partition-endpoint $UNSTRUCTURED_API_URL \ --strategy hi_res \ --chunking-strategy by_title \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \ s3 \ --remote-url $AWS_S3_URL \ diff --git a/snippets/destination_connectors/s3.v1.py.mdx b/snippets/destination_connectors/s3.v1.py.mdx index 497411f6..121c7918 100644 --- a/snippets/destination_connectors/s3.v1.py.mdx +++ b/snippets/destination_connectors/s3.v1.py.mdx @@ -47,7 +47,7 @@ if __name__ == "__main__": ), chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", api_key=None, ), writer=writer, diff --git a/snippets/destination_connectors/s3.v2.py.mdx b/snippets/destination_connectors/s3.v2.py.mdx index 03c3fa15..efd00024 100644 --- a/snippets/destination_connectors/s3.v2.py.mdx +++ b/snippets/destination_connectors/s3.v2.py.mdx @@ -37,7 +37,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=S3ConnectionConfig( access_config=S3AccessConfig( key=os.getenv("AWS_ACCESS_KEY_ID"), diff --git a/snippets/destination_connectors/sftp.v2.py.mdx b/snippets/destination_connectors/sftp.v2.py.mdx index 0a2c5a28..05968c0e 100644 --- a/snippets/destination_connectors/sftp.v2.py.mdx +++ b/snippets/destination_connectors/sftp.v2.py.mdx @@ -40,7 +40,7 @@ 
if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=SftpConnectionConfig( access_config=SftpAccessConfig(password=os.getenv("SFTP_PASSWORD")), host=os.getenv("SFTP_HOST"), diff --git a/snippets/destination_connectors/singlestore.sh.mdx b/snippets/destination_connectors/singlestore.sh.mdx index 16d4e579..4f60dad7 100644 --- a/snippets/destination_connectors/singlestore.sh.mdx +++ b/snippets/destination_connectors/singlestore.sh.mdx @@ -9,7 +9,7 @@ unstructured-ingest \ --partition-by-api \ --strategy hi_res \ --chunking-strategy by_title \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --partition-by-api \ --api-key $UNSTRUCTURED_API_KEY \ --partition-endpoint $UNSTRUCTURED_API_URL \ diff --git a/snippets/destination_connectors/singlestore.v2.py.mdx b/snippets/destination_connectors/singlestore.v2.py.mdx index 60b14e48..7713b68f 100644 --- a/snippets/destination_connectors/singlestore.v2.py.mdx +++ b/snippets/destination_connectors/singlestore.v2.py.mdx @@ -38,7 +38,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=SingleStoreConnectionConfig( host=os.getenv("SINGLESTORE_HOST"), port=os.getenv("SINGLESTORE_PORT"), diff --git a/snippets/destination_connectors/sql.v2.py.mdx b/snippets/destination_connectors/sql.v2.py.mdx index 066f0744..df219dab 100644 --- a/snippets/destination_connectors/sql.v2.py.mdx +++ b/snippets/destination_connectors/sql.v2.py.mdx @@ -54,7 +54,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - 
embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=SQLConnectionConfig( access_config=SQLAccessConfig( username=os.getenv("PGUSER"), diff --git a/snippets/destination_connectors/weaviate.sh.mdx b/snippets/destination_connectors/weaviate.sh.mdx index 8eafe927..8f5d3e8f 100644 --- a/snippets/destination_connectors/weaviate.sh.mdx +++ b/snippets/destination_connectors/weaviate.sh.mdx @@ -9,7 +9,7 @@ unstructured-ingest \ --output-dir $LOCAL_FILE_OUTPUT_DIR \ --strategy hi_res \ --chunk-elements \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --num-processes 2 \ --verbose \ --strategy fast \ diff --git a/snippets/destination_connectors/weaviate.v1.py.mdx b/snippets/destination_connectors/weaviate.v1.py.mdx index 3c31ac52..276a14e4 100644 --- a/snippets/destination_connectors/weaviate.v1.py.mdx +++ b/snippets/destination_connectors/weaviate.v1.py.mdx @@ -53,7 +53,7 @@ if __name__ == "__main__": strategy="hi_res", chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface", + provider="huggingface", api_key=None ), writer=writer, diff --git a/snippets/destination_connectors/weaviate.v2.py.mdx b/snippets/destination_connectors/weaviate.v2.py.mdx index 906c4827..7c4cb5d7 100644 --- a/snippets/destination_connectors/weaviate.v2.py.mdx +++ b/snippets/destination_connectors/weaviate.v2.py.mdx @@ -39,7 +39,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=WeaviateConnectionConfig( access_config=WeaviateAccessConfig( api_key=os.getenv("WEAVIATE_API_KEY") diff --git a/snippets/ingest-configuration-shared/embedding-configuration.mdx 
b/snippets/ingest-configuration-shared/embedding-configuration.mdx index b7df690e..c3514dba 100644 --- a/snippets/ingest-configuration-shared/embedding-configuration.mdx +++ b/snippets/ingest-configuration-shared/embedding-configuration.mdx @@ -10,7 +10,7 @@ A common embedding configuration is a critical component that allows for dynamic *   `aws_secret_access_key`: The AWS secret access key to be used for AWS-based embedders, such as Amazon Bedrock. -*   `embedding_provider`: The embedding provider to use while doing embedding. Available values include `langchain-openai`, `langchain-huggingface`, `langchain-aws-bedrock`, `langchain-vertexai`, `langchain-voyageai`, and `octoai`. +*   `embedding_provider`: The embedding provider to use while doing embedding. Available values include `openai`, `huggingface`, `aws-bedrock`, `vertexai`, `voyageai`, and `octoai`. *   `embedding_api_key`: The API key to use, if one is required to generate the embeddings through an API service, such as OpenAI. @@ -24,20 +24,20 @@ A common embedding configuration is a critical component that allows for dynamic *   `model_name`: The specific model to use for the embedding provider, if necessary. -*   `provider`: The embedding provider to use while doing embedding. Available values include `langchain-openai`, `langchain-huggingface`, `langchain-aws-bedrock`, `langchain-vertexai`, `langchain-voyageai`, and `octoai`. +*   `provider`: The embedding provider to use while doing embedding. Available values include `openai`, `huggingface`, `aws-bedrock`, `vertexai`, `voyageai`, and `octoai`.   
The default `model_name` values unless otherwise specified are: -* `langchain-openai`: `text-embedding-ada-002` +* `openai`: `text-embedding-ada-002` -* `langchain-huggingface`: `sentence-transformers/all-MiniLM-L6-v2` +* `huggingface`: `sentence-transformers/all-MiniLM-L6-v2` -* `langchain-aws-bedrock`: None +* `aws-bedrock`: None -* `langchain-vertexai`: `textembedding-gecko@001` +* `vertexai`: `textembedding-gecko@001` -* `langchain-voyageai`: None +* `voyageai`: None * `mixedbread-ai`: `mixedbread-ai/mxbai-embed-large-v1` diff --git a/snippets/source_connectors/azure.sh.mdx b/snippets/source_connectors/azure.sh.mdx index 6f6e054c..57085f77 100644 --- a/snippets/source_connectors/azure.sh.mdx +++ b/snippets/source_connectors/azure.sh.mdx @@ -12,7 +12,7 @@ unstructured-ingest \ --partition-endpoint $UNSTRUCTURED_API_URL \ --strategy hi_res \ --chunking-strategy by_title \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \ local \ --output-dir $LOCAL_FILE_OUTPUT_DIR diff --git a/snippets/source_connectors/azure.v2.py.mdx b/snippets/source_connectors/azure.v2.py.mdx index 61e52171..18d4afa4 100644 --- a/snippets/source_connectors/azure.v2.py.mdx +++ b/snippets/source_connectors/azure.v2.py.mdx @@ -40,7 +40,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR")) ).run() diff --git a/snippets/source_connectors/box.v2.py.mdx b/snippets/source_connectors/box.v2.py.mdx index 9ea57d0f..e67348b8 100644 --- a/snippets/source_connectors/box.v2.py.mdx +++ b/snippets/source_connectors/box.v2.py.mdx @@ -40,7 +40,7 @@ if __name__ == "__main__": } 
), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR")) ).run() ``` \ No newline at end of file diff --git a/snippets/source_connectors/couchbase.v2.py.mdx b/snippets/source_connectors/couchbase.v2.py.mdx index ad98c9e0..78f55117 100644 --- a/snippets/source_connectors/couchbase.v2.py.mdx +++ b/snippets/source_connectors/couchbase.v2.py.mdx @@ -47,7 +47,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=LocalConnectionConfig() ).run() ``` \ No newline at end of file diff --git a/snippets/source_connectors/dropbox.v2.py.mdx b/snippets/source_connectors/dropbox.v2.py.mdx index 60441b3b..cd4302a9 100644 --- a/snippets/source_connectors/dropbox.v2.py.mdx +++ b/snippets/source_connectors/dropbox.v2.py.mdx @@ -42,7 +42,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR")) ).run() ``` \ No newline at end of file diff --git a/snippets/source_connectors/elasticsearch.v2.py.mdx b/snippets/source_connectors/elasticsearch.v2.py.mdx index 19887607..1b5cb413 100644 --- a/snippets/source_connectors/elasticsearch.v2.py.mdx +++ b/snippets/source_connectors/elasticsearch.v2.py.mdx @@ -50,7 +50,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + 
embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=LocalConnectionConfig() ).run() ``` \ No newline at end of file diff --git a/snippets/source_connectors/gcs.v2.py.mdx b/snippets/source_connectors/gcs.v2.py.mdx index 31129277..19d93574 100644 --- a/snippets/source_connectors/gcs.v2.py.mdx +++ b/snippets/source_connectors/gcs.v2.py.mdx @@ -41,7 +41,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR")) ).run() ``` \ No newline at end of file diff --git a/snippets/source_connectors/google_drive.sh.mdx b/snippets/source_connectors/google_drive.sh.mdx index e1155557..6d1da5a4 100644 --- a/snippets/source_connectors/google_drive.sh.mdx +++ b/snippets/source_connectors/google_drive.sh.mdx @@ -14,7 +14,7 @@ unstructured-ingest \ --partition-endpoint $UNSTRUCTURED_API_URL \ --strategy hi_res \ --chunking-strategy by_title \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \ local \ --output-dir $LOCAL_FILE_OUTPUT_DIR diff --git a/snippets/source_connectors/google_drive.v2.py.mdx b/snippets/source_connectors/google_drive.v2.py.mdx index 06cb7b68..e055ba9a 100644 --- a/snippets/source_connectors/google_drive.v2.py.mdx +++ b/snippets/source_connectors/google_drive.v2.py.mdx @@ -40,7 +40,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), 
uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR")) ).run() ``` \ No newline at end of file diff --git a/snippets/source_connectors/kafka.sh.mdx b/snippets/source_connectors/kafka.sh.mdx index 8839c987..5bf37545 100644 --- a/snippets/source_connectors/kafka.sh.mdx +++ b/snippets/source_connectors/kafka.sh.mdx @@ -15,7 +15,7 @@ unstructured-ingest \ --timeout 1.0 --output-dir $LOCAL_FILE_OUTPUT_DIR \ --chunk-elements \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --num-processes 2 \ --verbose \ --partition-by-api \ diff --git a/snippets/source_connectors/kafka.v1.py.mdx b/snippets/source_connectors/kafka.v1.py.mdx index 0d14c84e..59fa063e 100644 --- a/snippets/source_connectors/kafka.v1.py.mdx +++ b/snippets/source_connectors/kafka.v1.py.mdx @@ -40,7 +40,7 @@ if __name__ == "__main__": ), chunking_config=ChunkingConfig(chunk_elements=True), embedding_config=EmbeddingConfig( - provider="langchain-huggingface" + provider="huggingface" ) ).run() ``` \ No newline at end of file diff --git a/snippets/source_connectors/local.sh.mdx b/snippets/source_connectors/local.sh.mdx index 772d1945..b21063cc 100644 --- a/snippets/source_connectors/local.sh.mdx +++ b/snippets/source_connectors/local.sh.mdx @@ -11,7 +11,7 @@ unstructured-ingest \ --partition-endpoint $UNSTRUCTURED_API_URL \ --strategy hi_res \ --chunking-strategy by_title \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \ local \ --output-dir $LOCAL_FILE_OUTPUT_DIR diff --git a/snippets/source_connectors/local.v2.py.mdx b/snippets/source_connectors/local.v2.py.mdx index abf2954b..258b4467 100644 --- a/snippets/source_connectors/local.v2.py.mdx +++ b/snippets/source_connectors/local.v2.py.mdx @@ -33,7 +33,7 @@ if __name__ == "__main__": } ), 
chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR")) ).run() ``` \ No newline at end of file diff --git a/snippets/source_connectors/onedrive.v2.py.mdx b/snippets/source_connectors/onedrive.v2.py.mdx index b08ea8eb..3ff19b91 100644 --- a/snippets/source_connectors/onedrive.v2.py.mdx +++ b/snippets/source_connectors/onedrive.v2.py.mdx @@ -46,7 +46,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=LocalConnectionConfig() ).run() ``` \ No newline at end of file diff --git a/snippets/source_connectors/opensearch.v2.py.mdx b/snippets/source_connectors/opensearch.v2.py.mdx index 1744eba8..8ed06566 100644 --- a/snippets/source_connectors/opensearch.v2.py.mdx +++ b/snippets/source_connectors/opensearch.v2.py.mdx @@ -46,7 +46,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), destination_connection_config=LocalConnectionConfig() ).run() ``` \ No newline at end of file diff --git a/snippets/source_connectors/s3.sh.mdx b/snippets/source_connectors/s3.sh.mdx index 850cc4d0..89a5f9da 100644 --- a/snippets/source_connectors/s3.sh.mdx +++ b/snippets/source_connectors/s3.sh.mdx @@ -14,7 +14,7 @@ unstructured-ingest \ --partition-endpoint $UNSTRUCTURED_API_URL \ --strategy fast \ --chunking-strategy by_title \ - --embedding-provider langchain-huggingface \ + --embedding-provider huggingface \ --strategy hi_res \ 
--additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \ local \ diff --git a/snippets/source_connectors/s3.v2.py.mdx b/snippets/source_connectors/s3.v2.py.mdx index bdfdea33..0b4a7979 100644 --- a/snippets/source_connectors/s3.v2.py.mdx +++ b/snippets/source_connectors/s3.v2.py.mdx @@ -39,7 +39,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR")) ).run() ``` \ No newline at end of file diff --git a/snippets/source_connectors/salesforce.v2.py.mdx b/snippets/source_connectors/salesforce.v2.py.mdx index cabc99cd..f1442732 100644 --- a/snippets/source_connectors/salesforce.v2.py.mdx +++ b/snippets/source_connectors/salesforce.v2.py.mdx @@ -42,7 +42,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR")) ).run() ``` \ No newline at end of file diff --git a/snippets/source_connectors/sftp.v2.py.mdx b/snippets/source_connectors/sftp.v2.py.mdx index 8b9b837d..a76fc833 100644 --- a/snippets/source_connectors/sftp.v2.py.mdx +++ b/snippets/source_connectors/sftp.v2.py.mdx @@ -45,7 +45,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR")) ).run() diff --git 
a/snippets/source_connectors/sharepoint.v2.py.mdx b/snippets/source_connectors/sharepoint.v2.py.mdx index 7fceb075..e75d8346 100644 --- a/snippets/source_connectors/sharepoint.v2.py.mdx +++ b/snippets/source_connectors/sharepoint.v2.py.mdx @@ -52,7 +52,7 @@ if __name__ == "__main__": } ), chunker_config=ChunkerConfig(chunking_strategy="by_title"), - embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"), + embedder_config=EmbedderConfig(embedding_provider="huggingface"), uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR")) ).run() ``` \ No newline at end of file
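The rename applied throughout this diff drops the `langchain-` prefix from five provider IDs and leaves `mixedbread-ai` and `octoai` unchanged. For existing scripts or configs that still pass the old IDs, a minimal migration sketch could look like the following. The names `PROVIDER_RENAMES` and `migrate_provider_id` are hypothetical helpers for illustration and are not part of `unstructured-ingest`; only the ID pairs themselves come from the diff.

```python
# Hypothetical helper (not part of unstructured-ingest): maps the old
# provider IDs removed in this diff to the new IDs that replace them.
PROVIDER_RENAMES = {
    "langchain-aws-bedrock": "aws-bedrock",
    "langchain-huggingface": "huggingface",
    "langchain-openai": "openai",
    "langchain-vertexai": "vertexai",
    "langchain-voyageai": "voyageai",
}


def migrate_provider_id(provider: str) -> str:
    """Return the new provider ID for an old one.

    IDs that were not renamed in this diff (for example `mixedbread-ai`
    and `octoai`) are returned unchanged.
    """
    return PROVIDER_RENAMES.get(provider, provider)


if __name__ == "__main__":
    for old_id in ("langchain-huggingface", "octoai"):
        print(f"{old_id} -> {migrate_provider_id(old_id)}")
```

The returned value could then be passed wherever the snippets above set `embedding_provider=` or `--embedding-provider`. Whether the old IDs are still accepted by a given installed version is not stated in this diff, so the mapping is a precaution rather than a documented requirement.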