
update the default embd. #310

Merged: 1 commit, Apr 19, 2024
docs/pages/getting-started/configure-your-pipeline.mdx (1 addition, 1 deletion)
@@ -22,7 +22,7 @@ For more information, refer to [vector database providers](/providers/vector-dat
R2R supports OpenAI and local inference as embedding providers. To configure the embedding settings, update the `embedding` section in the `config.json` file. Specify the desired embedding model, dimension, and batch size according to your requirements. This can easily be extended by request.

- `openai`: Integration with OpenAI, supporting models like `text-embedding-3-small` and `text-embedding-3-large`.
- `sentence-transformers`: Integration with the sentence transformers library, providing support for models available on HuggingFace, like `all-MiniLM-L6-v2`.
- `sentence-transformers`: Integration with the sentence transformers library, providing support for models available on HuggingFace, like `mixedbread-ai/mxbai-embed-large-v1`.

For more information, refer to [embedding providers](/providers/embeddings).

docs/pages/providers/embeddings.mdx (1 addition, 1 deletion)
@@ -35,4 +35,4 @@ Anything supported by OpenAI, such as:
- **Pricing**: Approximately 12,500 pages per dollar. Balances cost and performance effectively.
- **More**: [Embeddings Guide](https://platform.openai.com/docs/guides/embeddings)

Lastly, the `sentence_transformer` package from HuggingFace is also supported as a provider. For example, one such popular model is [`all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).
Lastly, the `sentence_transformer` package from HuggingFace is also supported as a provider. For example, one such popular model is [`mixedbread-ai/mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1).
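
For readers who want to try the new default directly, here is a minimal sketch of local inference with the `sentence-transformers` library; the model ID comes from the docs above, while the sample sentence and printed shape are illustrative additions.

```python
# Minimal sketch: embed text locally with sentence-transformers.
# Assumes `pip install sentence-transformers`; the input sentence is illustrative.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
embeddings = model.encode(["R2R supports local embedding models."])
print(embeddings.shape)  # (1, 1024): this model produces 1024-dimensional vectors
```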
docs/pages/tutorials/configuring_your_rag_pipeline.mdx (1 addition, 1 deletion)
@@ -226,7 +226,7 @@ Set the `provider` field under `vector_database` in `config.json` to specify your
#### Embedding Provider
R2R supports OpenAI and local inference embedding providers:
- `openai`: OpenAI models like `text-embedding-3-small`
- `sentence-transformers`: HuggingFace models like `all-MiniLM-L6-v2`
- `sentence-transformers`: HuggingFace models like `mixedbread-ai/mxbai-embed-large-v1`

Configure the `embedding` section to set your desired embedding model, dimension, and batch size.
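
The configured `dimension` must match what the model actually emits (1024 for `mxbai-embed-large-v1`, versus 384 for `all-MiniLM-L6-v2`). A quick sanity check like the sketch below can catch a mismatch before ingestion; this snippet is our own and not part of R2R, and it assumes the `config.json` layout shown in these docs.

```python
# Sketch: verify that config.json's embedding dimension matches the model.
# Assumes the config layout shown in these docs; not part of R2R itself.
import json

from sentence_transformers import SentenceTransformer

with open("config.json") as f:
    config = json.load(f)

model = SentenceTransformer(config["embedding"]["model"])
actual = model.get_sentence_embedding_dimension()
assert actual == config["embedding"]["dimension"], (
    f"config says {config['embedding']['dimension']}, model emits {actual}"
)
```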

docs/pages/tutorials/local_rag.mdx (5 additions, 5 deletions)
@@ -59,7 +59,7 @@ To streamline this process, we've provided pre-configured local settings in the
{
"embedding": {
"provider": "sentence-transformers",
"model": "all-MiniLM-L6-v2",
"model": "mixedbread-ai/mxbai-embed-large-v1",
"dimension": 384,
"batch_size": 32
},
@@ -78,7 +78,7 @@ To streamline this process, we've provided pre-configured local settings in the

You may also modify the configuration defaults for ingestion, logging, and your vector database provider in a similar manner. More information on this follows below.

The config modification above instructs R2R to use the `sentence-transformers` library for embeddings with the `all-MiniLM-L6-v2` model, turns off evals, and sets the LLM provider to `ollama`. During ingestion, the default is to split documents into chunks of 512 characters with 20 characters of overlap between chunks.
The config modification above instructs R2R to use the `sentence-transformers` library for embeddings with the `mixedbread-ai/mxbai-embed-large-v1` model, turns off evals, and sets the LLM provider to `ollama`. During ingestion, the default is to split documents into chunks of 512 characters with 20 characters of overlap between chunks.

A local vector database will be used to store the embeddings. The current default is a minimal SQLite implementation, with plans to migrate the tutorial to LanceDB shortly.

@@ -117,7 +117,7 @@ The output should look something like this:
Here's what's happening under the hood:
1. R2R loads the included PDF and converts it to text using PyPDF2.
2. It splits the text into chunks of 512 characters each, with 20 characters overlapping between chunks.
3. Each chunk is embedded using the `all-MiniLM-L6-v2` model from `sentence-transformers`.
3. Each chunk is embedded using the `mixedbread-ai/mxbai-embed-large-v1` model from `sentence-transformers`.
4. The chunks and embeddings are stored in the specified vector database, which defaults to a local SQLite database.

With just one command, we've gone from a raw document to an embedded knowledge base we can query. In addition to the raw chunks, metadata such as user ID or document ID can be attached to enable easy filtering later.
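
As a rough sketch of steps 1-3 above, done by hand outside of R2R with the same libraries the docs name, the core loop looks something like this ("example.pdf" is a placeholder path, and R2R's real pipeline additionally attaches metadata and writes to the vector store):

```python
# Sketch of ingestion: PDF -> text -> overlapping chunks -> embeddings.
# "example.pdf" is a placeholder; R2R's pipeline also handles metadata and storage.
from PyPDF2 import PdfReader
from sentence_transformers import SentenceTransformer

text = "".join(page.extract_text() or "" for page in PdfReader("example.pdf").pages)

# 512-character chunks with 20 characters of overlap between consecutive chunks.
chunk_size, overlap = 512, 20
step = chunk_size - overlap
chunks = [text[i : i + chunk_size] for i in range(0, len(text), step)]

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
embeddings = model.encode(chunks, batch_size=32)  # one 1024-dim vector per chunk
```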
@@ -151,7 +151,7 @@ python -m r2r.examples.clients.run_qna_client rag_completion \
```

This command tells R2R to use the specified model to generate a completion for the given query. R2R will:
1. Embed the query using `all-MiniLM-L6-v2`.
1. Embed the query using `mixedbread-ai/mxbai-embed-large-v1`.
2. Find the chunks most similar to the query embedding.
3. Pass the query and relevant chunks to the LLM to generate a response.
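
Steps 1 and 2 amount to embedding the query and ranking chunks by cosine similarity. Here is a sketch that reuses `model`, `chunks`, and `embeddings` from the ingestion sketch earlier; the query string and the top-k value of 3 are illustrative choices, not R2R defaults.

```python
# Sketch of retrieval: embed the query, rank chunks by cosine similarity.
# Reuses model/chunks/embeddings from the ingestion sketch; top-k of 3 is illustrative.
from sentence_transformers import util

query_embedding = model.encode("What is the main topic of the document?")
scores = util.cos_sim(query_embedding, embeddings)[0]       # one score per chunk
top_indices = scores.argsort(descending=True)[:3].tolist()  # keep the 3 best chunks
context = "\n\n".join(chunks[i] for i in top_indices)
# Step 3: `context` plus the query is what gets passed to the LLM for completion.
```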

@@ -176,7 +176,7 @@ Set the `provider` field under `vector_database` in `config.json` to specify you
#### Embedding Provider
R2R supports OpenAI and local inference embedding providers:
- `openai`: OpenAI models like `text-embedding-3-small`
- `sentence-transformers`: HuggingFace models like `all-MiniLM-L6-v2`
- `sentence-transformers`: HuggingFace models like `mixedbread-ai/mxbai-embed-large-v1`

Configure the `embedding` section to set your desired embedding model, dimension, and batch size.
