# Building RAG pipelines with Optimized Embedding Models

This notebook shows how to use two fastRAG components built on an optimized, quantized bi-encoder.

We will showcase `QuantizedBiEncoderRetriever` for embedding documents in a vector store, and `QuantizedBiEncoderRanker` for re-ranking documents in a retrieval pipeline.

**NOTE**: Please read carefully the [guide](../scripts/optimizations/embedders/README.md) we provided on how to get the best speed and latency on Intel Xeon backends.

First, let's build an index. We set `embedding_dim` to match the output dimension of the embedding model (384 here), and set `return_embedding=True` so we can inspect the embeddings later.

In [None]:
from haystack.document_stores import InMemoryDocumentStore

document_store = InMemoryDocumentStore(use_gpu=False, use_bm25=False, embedding_dim=384, return_embedding=True)

In [None]:
from haystack.schema import Document

# 3 example documents to index
examples = [
    "There is a blue house on Oxford street",
    "Paris is the capital of France",
    "fastRAG had its first commit in 2022"
]

documents = []
for i, d in enumerate(examples):
    documents.append(Document(content=d, id=i))

document_store.write_documents(documents)

Initialize and load an optimized embedding model into a Bi-encoder retriever.

In [None]:
from fastrag.retrievers import QuantizedBiEncoderRetriever

retriever = QuantizedBiEncoderRetriever(document_store=document_store, embedding_model="<path to model>")

Update the embedding vectors of all documents in the index using the retriever's encoder.

In [None]:
document_store.update_embeddings(retriever=retriever)

We can look at the embedding vectors stored in the index. For example, let's look at the shape of the first document's embedding vector.

In [None]:
docs = document_store.get_all_documents()
docs[0].embedding.shape

---

## Adding an optimized ranker

We can add an optimized ranker to re-order the documents coming from the retriever.
Note that this component has no dependencies on the previous retrieval steps. It takes the query and the content of each document, encodes them all into vectors, and re-orders the documents by their similarity to the query, in descending order.

In [None]:
from fastrag.rankers import QuantizedBiEncoderRanker

ranker = QuantizedBiEncoderRanker("<path to model>")
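
To make the re-ranking step concrete, here is a minimal sketch of ordering documents by similarity to a query, in descending order. It is not the fastRAG implementation: the helper names `cosine` and `rerank` are hypothetical, cosine similarity is an assumed choice of similarity measure, and the toy 3-dimensional vectors stand in for real model embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(query_vec, doc_vecs, top_k=None):
    """Return (doc_index, score) pairs sorted by similarity, highest first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]  # top_k=None keeps every document


# Toy vectors standing in for the encoded query and documents (illustrative only)
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # closest to the query
    [0.5, 0.5, 0.0],  # in between
]

ranking = rerank(query_vec, doc_vecs)
order = [i for i, _ in ranking]
print(order)
```

In this sketch the second document is ranked first because its vector points in nearly the same direction as the query vector. Note that the ranking depends only on the query and the document vectors, not on how the documents were retrieved.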

Combine everything into a pipeline.

In [None]:
from haystack import Pipeline

p = Pipeline()
p.add_node(component=retriever, name="retriever", inputs=["Query"])
p.add_node(component=ranker, name="ranker", inputs=["retriever"])

In [None]:
p.run(query="What is Paris?")
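
The result of `p.run` can be hard to read in raw form. The sketch below shows one way to pull out the ranked contents and scores. It assumes the result is a dictionary with a `"documents"` list whose items expose `content` and `score` attributes, as in Haystack 1.x. The helper `summarize` is hypothetical, and `fake_result` is a made-up stand-in (including its scores) so the block runs without a model.

```python
from types import SimpleNamespace


def summarize(result):
    """Return (content, score) pairs from a pipeline result dict."""
    return [(doc.content, doc.score) for doc in result.get("documents", [])]


# Hypothetical stand-in for the dict returned by p.run(...); scores are invented
fake_result = {
    "query": "What is Paris?",
    "documents": [
        SimpleNamespace(content="Paris is the capital of France", score=0.9),
        SimpleNamespace(content="There is a blue house on Oxford street", score=0.2),
    ],
}

summary = summarize(fake_result)
for content, score in summary:
    print(f"{score:.2f}  {content}")
```

With the pipeline built above, the same helper would be called as `summarize(p.run(query="What is Paris?"))`.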