48 changes: 24 additions & 24 deletions api-reference/how-to/embedding.mdx
@@ -43,41 +43,41 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the following:

1. Choose an embedding provider that you want to use from among the following allowed providers, and note the provider's ID:

- - The provider ID `langchain-aws-bedrock` for [Amazon Bedrock](https://aws.amazon.com/bedrock/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/bedrock/).
- - `langchain-huggingface` for [Hugging Face](https://huggingface.co/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/huggingfacehub/).
- - `langchain-openai` for [OpenAI](https://openai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/openai/).
- - `langchain-vertexai` for [Google Vertex AI PaLM](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/google_vertex_ai_palm/).
- - `langchain-voyageai` for [Voyage AI](https://www.voyageai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/voyageai/).
+ - The provider ID `aws-bedrock` for [Amazon Bedrock](https://aws.amazon.com/bedrock/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/bedrock/).
+ - `huggingface` for [Hugging Face](https://huggingface.co/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/huggingfacehub/).
+ - `openai` for [OpenAI](https://openai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/openai/).
+ - `vertexai` for [Google Vertex AI PaLM](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/google_vertex_ai_palm/).
+ - `voyageai` for [Voyage AI](https://www.voyageai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/voyageai/).
- `mixedbread-ai` for [Mixedbread](https://www.mixedbread.ai/). [Learn more](https://www.mixedbread.ai/docs/embeddings/overview).
- `octoai` for [Octo AI](https://octo.ai/). [Learn more](https://octo.ai/docs/text-gen-solution/using-unstructured-io-for-embedding-documents).

2. Run the following command to install the required Python package for the embedding provider:

- - For `langchain-aws-bedrock`, run `pip install "unstructured-ingest[bedrock]"`.
- - For `langchain-huggingface`, run `pip install "unstructured-ingest[embed-huggingface]"`.
- - For `langchain-openai`, run `pip install "unstructured-ingest[openai]"`.
- - For `langchain-vertexai`, run `pip install "unstructured-ingest[embed-vertexai]"`.
- - For `langchain-voyageai`, run `pip install "unstructured-ingest[embed-voyageai]"`.
+ - For `aws-bedrock`, run `pip install "unstructured-ingest[bedrock]"`.
+ - For `huggingface`, run `pip install "unstructured-ingest[embed-huggingface]"`.
+ - For `openai`, run `pip install "unstructured-ingest[openai]"`.
+ - For `vertexai`, run `pip install "unstructured-ingest[embed-vertexai]"`.
+ - For `voyageai`, run `pip install "unstructured-ingest[embed-voyageai]"`.
- For `mixedbread-ai`, run `pip install "unstructured-ingest[embed-mixedbreadai]"`.
- For `octoai`, run `pip install "unstructured-ingest[embed-octoai]"`.

3. For the following embedding providers, you can choose the model that you want to use. If you do choose a model, note the model's name:

- - `langchain-aws-bedrock`. [Choose a model](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html). No default model is provided. [Learn more about the supported models](https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html).
- - `langchain-huggingface`. [Choose a model](https://huggingface.co/models?other=embeddings), or use the default model [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).
- - `langchain-openai`. [Choose a model](https://platform.openai.com/docs/guides/embeddings/embedding-models), or use the default model `text-embedding-ada-002`.
- - `langchain-vertexai`. [Choose a model](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api), or use the default model `textembedding-gecko@001`.
- - `langchain-voyageai`. [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided.
+ - `aws-bedrock`. [Choose a model](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html). No default model is provided. [Learn more about the supported models](https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html).
+ - `huggingface`. [Choose a model](https://huggingface.co/models?other=embeddings), or use the default model [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).
+ - `openai`. [Choose a model](https://platform.openai.com/docs/guides/embeddings/embedding-models), or use the default model `text-embedding-ada-002`.
+ - `vertexai`. [Choose a model](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api), or use the default model `textembedding-gecko@001`.
+ - `voyageai`. [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided.
- `mixedbread-ai`. [Choose a model](https://www.mixedbread.ai/docs/embeddings/models), or use the default model [mixedbread-ai/mxbai-embed-large-v1](https://www.mixedbread.ai/docs/embeddings/mxbai-embed-large-v1).
- `octoai`. [Choose a model](https://octo.ai/blog/supercharge-rag-performance-using-octoai-and-unstructured-embeddings/), or use the default model `thenlper/gte-large`.

4. Note the special settings to connect to the provider:

- - For `langchain-aws-bedrock`, you'll need an AWS access key value, the corresponding AWS secret access key value, and the corresponding AWS Region identifier. [Get an AWS access key and secret access key](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html).
- - For `langchain-huggingface`, if you use a gated model (a model with special conditions that you must accept before you can use it, or a privately published model), you'll need an HF inference API key value, beginning with `hf_`. [Get an HF inference API key](https://huggingface.co/docs/api-inference/en/quicktour#get-your-api-token). To learn whether your model requires an HF inference API key, see your model provider's documentation.
- - For `langchain-openai`, you'll need an OpenAI API key value. [Get an OpenAI API key](https://platform.openai.com/docs/quickstart/create-and-export-an-api-key).
- - For `langchain-vertexai`, you'll need the path to a Google Cloud credentials JSON file. Learn more [here](https://cloud.google.com/docs/authentication/application-default-credentials#GAC) and [here](https://googleapis.dev/python/google-auth/latest/reference/google.auth.html#module-google.auth).
- - For `langchain-voyageai`, you'll need a Voyage AI API key value. [Get a Voyage AI API key](https://docs.voyageai.com/docs/api-key-and-installation#authentication-with-api-keys).
+ - For `aws-bedrock`, you'll need an AWS access key value, the corresponding AWS secret access key value, and the corresponding AWS Region identifier. [Get an AWS access key and secret access key](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html).
+ - For `huggingface`, if you use a gated model (a model with special conditions that you must accept before you can use it, or a privately published model), you'll need an HF inference API key value, beginning with `hf_`. [Get an HF inference API key](https://huggingface.co/docs/api-inference/en/quicktour#get-your-api-token). To learn whether your model requires an HF inference API key, see your model provider's documentation.
+ - For `openai`, you'll need an OpenAI API key value. [Get an OpenAI API key](https://platform.openai.com/docs/quickstart/create-and-export-an-api-key).
+ - For `vertexai`, you'll need the path to a Google Cloud credentials JSON file. Learn more [here](https://cloud.google.com/docs/authentication/application-default-credentials#GAC) and [here](https://googleapis.dev/python/google-auth/latest/reference/google.auth.html#module-google.auth).
+ - For `voyageai`, you'll need a Voyage AI API key value. [Get a Voyage AI API key](https://docs.voyageai.com/docs/api-key-and-installation#authentication-with-api-keys).
- For `mixedbread-ai`, you'll need a Mixedbread API key value. [Get a Mixedbread API key](https://www.mixedbread.ai/dashboard?next=api-keys).
- For `octoai`, you'll need an Octo AI API token value. [Get an Octo AI API token](https://octo.ai/docs/getting-started/how-to-create-octoai-access-token).

@@ -87,10 +87,10 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the following:
<Accordion title="Ingest CLI">
For the [source connector](/api-reference/ingest/source-connectors/overview) command:

- - Set the command's `--embedding-provider` to the provider's ID, for example `langchain-huggingface`.
+ - Set the command's `--embedding-provider` to the provider's ID, for example `huggingface`.
- Set `--embedding-model-name` to the model name, as applicable, for example `sentence-transformers/sentence-t5-xl`. Or omit this to use the default model, as applicable.
- Set `--embedding-api-key` to the provider's required API key value or credentials JSON file path, as appropriate.
- - For `langchain-aws-bedrock`:
+ - For `aws-bedrock`:

- Set `--embedding-aws-access-key-id` to the AWS access key value.
- Set `--embedding-aws-secret-access-key` to the corresponding AWS secret access key value.
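Putting these flags together, a minimal sketch of a source connector command is shown below. The `local` source connector, the environment variable names, and the model choice are illustrative assumptions, not values prescribed by this page:

```bash
# Minimal sketch only: the local source connector, env var names, and model choice are assumptions.
unstructured-ingest \
  local \
  --input-path $LOCAL_FILE_INPUT_DIR \
  --output-dir $LOCAL_FILE_OUTPUT_DIR \
  --partition-by-api \
  --api-key $UNSTRUCTURED_API_KEY \
  --partition-endpoint $UNSTRUCTURED_API_URL \
  --chunking-strategy by_title \
  --embedding-provider huggingface \
  --embedding-model-name sentence-transformers/all-mpnet-base-v2 \
  --num-processes 2 \
  --verbose
```

In the connector snippets further down, a destination connector subcommand (for example `azure` or `delta-table`) and its flags follow the embedding options.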
@@ -99,10 +99,10 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the following:
<Accordion title="Ingest Python library">
For the [source connector's](/api-reference/ingest/source-connectors/overview) `EmbedderConfig` object:

- - Set the `embedding_provider` parameter to the provider's ID, for example `langchain-huggingface`.
+ - Set the `embedding_provider` parameter to the provider's ID, for example `huggingface`.
- Set `embedding_model_name` to the model name, as applicable, for example `sentence-transformers/sentence-t5-xl`. Or omit this to use the default model, as applicable.
- Set `embedding_api_key` to the provider's required API key value or credentials JSON file path, as appropriate.
- - For `langchain-aws-bedrock`:
+ - For `aws-bedrock`:

- Set `embedding_aws_access_key_id` to the AWS access key value.
- Set `embedding_aws_secret_access_key` to the corresponding AWS secret access key value.
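For the Python library, a minimal sketch of the equivalent `EmbedderConfig` is shown below. The import path is an assumption based on the v2 layout of the Ingest Python library and may differ between releases:

```python
# Minimal sketch: the import path is an assumption; adjust it to your installed
# version of the unstructured-ingest package.
from unstructured_ingest.v2.processes.embedder import EmbedderConfig

embedder_config = EmbedderConfig(
    embedding_provider="huggingface",
    # Optional: omit embedding_model_name to use the provider's default model.
    embedding_model_name="sentence-transformers/all-mpnet-base-v2",
    # embedding_api_key is only needed for gated or privately published Hugging Face models.
)
```

The resulting `embedder_config` is then passed to the pipeline alongside the other configs, as in the connector snippets further down.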
2 changes: 1 addition & 1 deletion snippets/destination_connectors/astradb.sh.mdx
@@ -9,7 +9,7 @@ unstructured-ingest \
--partition-by-api \
--strategy hi_res \
--chunking-strategy by_title \
- --embedding-provider langchain-huggingface \
+ --embedding-provider huggingface \
--partition-by-api \
--api-key $UNSTRUCTURED_API_KEY \
--partition-endpoint $UNSTRUCTURED_API_URL \
2 changes: 1 addition & 1 deletion snippets/destination_connectors/astradb.v1.py.mdx
@@ -57,7 +57,7 @@ if __name__ == "__main__":
),
chunking_config=ChunkingConfig(chunk_elements=True),
embedding_config=EmbeddingConfig(
- provider="langchain-huggingface",
+ provider="huggingface",
api_key=None,
),
writer=writer,
2 changes: 1 addition & 1 deletion snippets/destination_connectors/astradb.v2.py.mdx
@@ -39,7 +39,7 @@ if __name__ == "__main__":
}
),
chunker_config=ChunkerConfig(chunking_strategy="by_title"),
- embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"),
+ embedder_config=EmbedderConfig(embedding_provider="huggingface"),
destination_connection_config=AstraDBConnectionConfig(
access_config=AstraDBAccessConfig(
api_endpoint=os.getenv("ASTRA_DB_API_ENDPOINT"),
2 changes: 1 addition & 1 deletion snippets/destination_connectors/azure.sh.mdx
@@ -11,7 +11,7 @@ unstructured-ingest \
--partition-endpoint $UNSTRUCTURED_API_URL \
--strategy hi_res \
--chunking-strategy by_title \
- --embedding-provider langchain-huggingface \
+ --embedding-provider huggingface \
--additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
azure \
--remote-url $AZURE_STORAGE_REMOTE_URL \
2 changes: 1 addition & 1 deletion snippets/destination_connectors/azure.v1.py.mdx
@@ -51,7 +51,7 @@ if __name__ == "__main__":
),
chunking_config=ChunkingConfig(chunk_elements=True),
embedding_config=EmbeddingConfig(
- provider="langchain-huggingface",
+ provider="huggingface",
api_key=None
),
writer=writer,
2 changes: 1 addition & 1 deletion snippets/destination_connectors/azure.v2.py.mdx
@@ -38,7 +38,7 @@ if __name__ == "__main__":
}
),
chunker_config=ChunkerConfig(chunking_strategy="by_title"),
- embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"),
+ embedder_config=EmbedderConfig(embedding_provider="huggingface"),
destination_connection_config=AzureConnectionConfig(
access_config=AzureAccessConfig(
account_name=os.getenv("AZURE_STORAGE_ACCOUNT_NAME"),
@@ -8,7 +8,7 @@ unstructured-ingest \
--input-path $LOCAL_FILE_INPUT_DIR \
--output-dir $LOCAL_FILE_OUTPUT_DIR \
--chunk-elements \
- --embedding-provider langchain-huggingface \
+ --embedding-provider huggingface \
--num-processes 2 \
--verbose \
--partition-by-api \
@@ -52,7 +52,7 @@ if __name__ == "__main__":
),
chunking_config=ChunkingConfig(chunk_elements=True),
embedding_config=EmbeddingConfig(
- provider="langchain-huggingface",
+ provider="huggingface",
api_key=None
),
writer=writer,
@@ -38,7 +38,7 @@ if __name__ == "__main__":
}
),
chunker_config=ChunkerConfig(chunking_strategy="by_title"),
- embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"),
+ embedder_config=EmbedderConfig(embedding_provider="huggingface"),
destination_connection_config=AzureCognitiveSearchConnectionConfig(
access_config=AzureCognitiveSearchAccessConfig(
key=os.getenv("AZURE_SEARCH_API_KEY")
2 changes: 1 addition & 1 deletion snippets/destination_connectors/box.sh.mdx
@@ -9,7 +9,7 @@ unstructured-ingest \
--output-dir $LOCAL_FILE_OUTPUT_DIR \
--strategy hi_res \
--chunk-elements \
- --embedding-provider langchain-huggingface \
+ --embedding-provider huggingface \
--num-processes 2 \
--verbose \
--additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
2 changes: 1 addition & 1 deletion snippets/destination_connectors/box.v1.py.mdx
@@ -52,7 +52,7 @@ if __name__ == "__main__":
),
chunking_config=ChunkingConfig(chunk_elements=True),
embedding_config=EmbeddingConfig(
- provider="langchain-huggingface",
+ provider="huggingface",
api_key=None,
),
writer=writer,
2 changes: 1 addition & 1 deletion snippets/destination_connectors/box.v2.py.mdx
@@ -39,7 +39,7 @@ if __name__ == "__main__":
}
),
chunker_config=ChunkerConfig(chunking_strategy="by_title"),
- embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"),
+ embedder_config=EmbedderConfig(embedding_provider="huggingface"),
destination_connection_config=BoxConnectionConfig(
access_config=BoxAccessConfig(
box_app_config=os.getenv("BOX_APP_CONFIG_PATH")
2 changes: 1 addition & 1 deletion snippets/destination_connectors/chroma.sh.mdx
@@ -8,7 +8,7 @@ unstructured-ingest \
--input-path $LOCAL_FILE_INPUT_DIR \
--output-dir $LOCAL_FILE_OUTPUT_DIR \
--chunk-elements \
- --embedding-provider langchain-huggingface \
+ --embedding-provider huggingface \
--num-processes 2 \
--verbose \
--work-dir $WORK_DIR \
2 changes: 1 addition & 1 deletion snippets/destination_connectors/chroma.v1.py.mdx
@@ -56,7 +56,7 @@ if __name__ == "__main__":
),
chunking_config=ChunkingConfig(chunk_elements=True),
embedding_config=EmbeddingConfig(
- provider="langchain-huggingface",
+ provider="huggingface",
api_key=None,
),
writer=writer,
2 changes: 1 addition & 1 deletion snippets/destination_connectors/chroma.v2.py.mdx
@@ -39,7 +39,7 @@ if __name__ == "__main__":
}
),
chunker_config=ChunkerConfig(chunking_strategy="by_title"),
- embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"),
+ embedder_config=EmbedderConfig(embedding_provider="huggingface"),
destination_connection_config=ChromaConnectionConfig(
access_config=ChromaAccessConfig(
settings={"persist_directory":"./chroma-persist"},
2 changes: 1 addition & 1 deletion snippets/destination_connectors/couchbase.sh.mdx
@@ -9,7 +9,7 @@ unstructured-ingest \
--output-dir $LOCAL_FILE_OUTPUT_DIR \
--strategy hi_res \
--chunk-elements \
- --embedding-provider langchain-huggingface \
+ --embedding-provider huggingface \
--num-processes 2 \
--verbose \
--partition-by-api \
2 changes: 1 addition & 1 deletion snippets/destination_connectors/couchbase.v2.py.mdx
@@ -39,7 +39,7 @@ if __name__ == "__main__":
}
),
chunker_config=ChunkerConfig(chunking_strategy="by_title"),
- embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"),
+ embedder_config=EmbedderConfig(embedding_provider="huggingface"),
destination_connection_config=CouchbaseConnectionConfig(
access_config=CouchbaseAccessConfig(
password=os.getenv("CB_PASSWORD"),
2 changes: 1 addition & 1 deletion snippets/destination_connectors/databricks_volumes.sh.mdx
@@ -15,7 +15,7 @@ unstructured-ingest \
--chunking-strategy by_title \
--chunk-api-key $UNSTRUCTURED_API_KEY \
--chunking-endpoint $UNSTRUCTURED_API_URL \
- --embedding-provider langchain-huggingface \
+ --embedding-provider huggingface \
--embedding-model-name sentence-transformers/all-mpnet-base-v2 \
databricks-volumes \
--host $DATABRICKS_HOST \
@@ -63,7 +63,7 @@ if __name__ == "__main__":
chunking_strategy="by_title",
),
embedding_config=EmbeddingConfig(
- provider="langchain-huggingface",
+ provider="huggingface",
model_name="sentence-transformers/all-mpnet-base-v2",
),
writer=writer,
@@ -44,7 +44,7 @@ if __name__ == "__main__":
chunking_strategy="by_title"
),
embedder_config=EmbedderConfig(
- embedding_provider="langchain-huggingface",
+ embedding_provider="huggingface",
embedding_model_name="sentence-transformers/all-mpnet-base-v2"
),
destination_connection_config=DatabricksVolumesConnectionConfig(
2 changes: 1 addition & 1 deletion snippets/destination_connectors/delta_table.py.mdx
@@ -49,7 +49,7 @@ if __name__ == "__main__":
),
chunking_config=ChunkingConfig(chunk_elements=True),
embedding_config=EmbeddingConfig(
- provider="langchain-huggingface",
+ provider="huggingface",
api_key=None,
),
writer=writer,
2 changes: 1 addition & 1 deletion snippets/destination_connectors/delta_table.sh.mdx
@@ -9,7 +9,7 @@ unstructured-ingest \
--output-dir $LOCAL_FILE_OUTPUT_DIR \
--strategy hi_res \
--chunk-elements \
- --embedding-provider langchain-huggingface \
+ --embedding-provider huggingface \
--num-processes 2 \
--verbose \
delta-table \
2 changes: 1 addition & 1 deletion snippets/destination_connectors/dropbox.sh.mdx
@@ -9,7 +9,7 @@ unstructured-ingest \
--output-dir $LOCAL_FILE_OUTPUT_DIR \
--strategy hi_res \
--chunk-elements \
- --embedding-provider langchain-huggingface \
+ --embedding-provider huggingface \
--num-processes 2 \
--verbose \
--partition-by-api \
2 changes: 1 addition & 1 deletion snippets/destination_connectors/dropbox.v1.py.mdx
@@ -49,7 +49,7 @@ if __name__ == "__main__":
),
chunking_config=ChunkingConfig(chunk_elements=True),
embedding_config=EmbeddingConfig(
- provider="langchain-huggingface",
+ provider="huggingface",
api_key=os.getenv("UNSTRUCTURED_API_KEY"),
partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
),
2 changes: 1 addition & 1 deletion snippets/destination_connectors/dropbox.v2.py.mdx
@@ -38,7 +38,7 @@ if __name__ == "__main__":
}
),
chunker_config=ChunkerConfig(chunking_strategy="by_title"),
- embedder_config=EmbedderConfig(embedding_provider="langchain-huggingface"),
+ embedder_config=EmbedderConfig(embedding_provider="huggingface"),
destination_connection_config=DropboxConnectionConfig(
access_config=DropboxAccessConfig(
token=os.getenv("DROPBOX_ACCESS_TOKEN")