# Lesson 4: Retrieval Methods and Vector Databases

**Objective**: Build a retrieval system that efficiently searches for relevant document chunks.

**Topics**:
- Sparse vs. dense retrieval methods
- Hybrid search methods (e.g., combining BM25 with dense retrieval)
- Overview of vector databases: Milvus, Faiss, Qdrant

**Practical Task**: Set up a vector database and implement a retrieval method.

**Resources**:
- What is a vector database
- Choosing a vector database


#### Load the dataset

In [1]:
from langchain_community.document_loaders import PyPDFLoader
from dotenv import load_dotenv

load_dotenv()

file_path = (
    "../practicos-rag/data/benchmark_data/VFOOD6120 - Excepted items_ Confectionery_ The bounds of confectionery, sweets, chocolates, chocolate biscuits, cakes and biscuits_ Definition of confectionery - HMRC internal manual - GOV.UK.pdf"
)
loader = PyPDFLoader(file_path)
docs = loader.load_and_split()

In [20]:
docs[0].page_content

'From:\nPublished\nUpdated:\nHMRC internal manual\nVAT Food\nHM Revenue & Customs\n(/government/organisations/hm-revenue-customs)\n13 March 2016\n16 February 2023 - See all updates\nVFOOD6120 - Excepted items:Confectionery: The bounds of confectionery,sweets, chocolates, chocolate biscuits,cakes and biscuits: Deﬁnition ofconfectionery\nIn considering the meaning of the term confectionery,\nmost tribunals have considered a number of judicial\nauthorities which have attempted general deﬁnitions.\nCommonly quoted is the purchase tax case of\nPopcorn House Ltd, (1968 4 All ER page 782), where\nthe judges considered the meaning of similar\nconfectionery in the phrase chocolates, sweets and\nBETA This part of GOV.UK is being rebuilt – ﬁnd out what beta means\n( / h e l p / b e t a ) \n GOV.UK\nHome Environment Food and farming\nContents VFOOD4000 VFOOD6000 VFOOD6100\n30/4/24, 14:38 VFOOD6120 - Excepted items: Confectionery: The bounds of confectionery, sweets, chocolates, chocolate biscuits,

### Embeddings function

In [2]:
from langchain_huggingface import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
embedded_document = embedding_model.embed_query(docs[0].page_content)
embedded_document[:3]

[0.024098867550492287, -0.022627031430602074, -0.0012089662486687303]
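
Dense retrieval ranks chunks by how close their embedding vectors are to the query's embedding, commonly using cosine similarity. The following is a minimal sketch of that ranking step in plain Python. The three-dimensional vectors and the chunk names are hypothetical toy values, not real model output, and a vector database would normally do this with an approximate index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for real embeddings.
query_vec = [1.0, 0.0, 1.0]
chunk_vecs = {
    "chunk_a": [1.0, 0.0, 1.0],  # same direction as the query
    "chunk_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "chunk_c": [1.0, 1.0, 0.0],  # partially aligned with the query
}

# Rank chunk names from most to least similar to the query.
ranked = sorted(
    chunk_vecs,
    key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
    reverse=True,
)
print(ranked)
```

The same function works unchanged on full-size embeddings, since it only assumes that both vectors have the same length.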

# A first approach

In [4]:
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

In [6]:
client = QdrantClient(path="/tmp/langchain_qdrant")

In [7]:
client.delete_collection(collection_name="demo_collection")

True

In [8]:
client.create_collection(
    collection_name="demo_collection",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

vector_store = QdrantVectorStore(
    client=client,
    collection_name="demo_collection",
    embedding=embedding_model,
)

In [9]:
vector_store.add_documents(docs)


['4aa81ef7cd6c4b35be68e94ec07c5f66',
 '0789cc5a689d40338cd93ec3eb6da146',
 '41b6d1510b6844649703fc0fcdcb0846',
 '5585dfec03d7405b97acd61320410566']

In [10]:
client.scroll(collection_name="demo_collection", limit=3)

([Record(id='0789cc5a689d40338cd93ec3eb6da146', payload={'page_content': 'similar confectionery (including drained, glacé or\ncrystallised fruits).\nThe following deﬁnition of confectionery arose from\nthat case: any form of food normally eaten with the\nﬁngers and made by a cooking process, other than\nbaking, which contains a substantial amount of\nsweetening matter. Both chocolates and sweets fall\nwithin this deﬁnition: they are normally eaten with the\nﬁngers, they are not made by baking, and they have\nsubstantial amounts of sweetening matter in them.\nHowever, the High Court, in the case of Premier\nFoods ([2007] EWHC 3134 (Ch)) has subsequently\ncommented that the criteria of baking and sweetening\nare not to be relied upon. Accordingly, it appears the\nerror of the Tribunal in applying the dictum of Mr\nJustice Lawton in Popcorn must be recognised as an\nerror of law. Its application also gave rise to two more\nerrors… the tribunal clearly directed themselves that\nfor an item

# Dense search

In [11]:
from langchain_qdrant import RetrievalMode

qdrant = QdrantVectorStore.from_documents(
    docs,
    embedding=embedding_model,
    location=":memory:",
    collection_name="my_documents",
    retrieval_mode=RetrievalMode.DENSE,
)

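# NOTE: this query appears to be carried over from a tutorial example and is unrelated
# to the confectionery PDF. With only four chunks indexed and the default k=4, every
# chunk is returned regardless of the query.
query = "What did the president say about Ketanji Brown Jackson"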
found_docs = qdrant.similarity_search(query)

In [12]:
found_docs

[Document(metadata={'source': '../practicos-rag/data/benchmark_data/VFOOD6120 - Excepted items_ Confectionery_ The bounds of confectionery, sweets, chocolates, chocolate biscuits, cakes and biscuits_ Definition of confectionery - HMRC internal manual - GOV.UK.pdf', 'page': 1, '_id': '956b038cd6ea4aee8b4fa9e359930533', '_collection_name': 'my_documents'}, page_content='similar confectionery (including drained, glacé or\ncrystallised fruits).\nThe following deﬁnition of confectionery arose from\nthat case: any form of food normally eaten with the\nﬁngers and made by a cooking process, other than\nbaking, which contains a substantial amount of\nsweetening matter. Both chocolates and sweets fall\nwithin this deﬁnition: they are normally eaten with the\nﬁngers, they are not made by baking, and they have\nsubstantial amounts of sweetening matter in them.\nHowever, the High Court, in the case of Premier\nFoods ([2007] EWHC 3134 (Ch)) has subsequently\ncommented that the criteria of baking and

In [14]:
retriever = qdrant.as_retriever()
retriever.invoke(query)


[Document(metadata={'source': '../practicos-rag/data/benchmark_data/VFOOD6120 - Excepted items_ Confectionery_ The bounds of confectionery, sweets, chocolates, chocolate biscuits, cakes and biscuits_ Definition of confectionery - HMRC internal manual - GOV.UK.pdf', 'page': 1, '_id': '956b038cd6ea4aee8b4fa9e359930533', '_collection_name': 'my_documents'}, page_content='similar confectionery (including drained, glacé or\ncrystallised fruits).\nThe following deﬁnition of confectionery arose from\nthat case: any form of food normally eaten with the\nﬁngers and made by a cooking process, other than\nbaking, which contains a substantial amount of\nsweetening matter. Both chocolates and sweets fall\nwithin this deﬁnition: they are normally eaten with the\nﬁngers, they are not made by baking, and they have\nsubstantial amounts of sweetening matter in them.\nHowever, the High Court, in the case of Premier\nFoods ([2007] EWHC 3134 (Ch)) has subsequently\ncommented that the criteria of baking and

In [25]:
# ChatOpenAI lives in the langchain-openai package; the core classes live in langchain_core.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Define LLM
llm = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

# Define prompt template
QA_generation_prompt = """
Your task is to write a factoid question and an answer given a context.
Your factoid question should be answerable with a specific, concise piece of factual information from the context.
Your factoid question should be formulated in the same style as questions users could ask in a search engine.
This means that your factoid question MUST NOT mention something like "according to the passage" or "context".

Provide your answer as follows:

Output:::
Factoid question: (your factoid question)
Answer: (your answer to the factoid question)

Now here is the context.

Context: {context}\n
Output:::"""

prompt = ChatPromptTemplate.from_template(QA_generation_prompt)

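# Set up the RAG pipeline. The chain input (a query string) is sent to the retriever to
# fill {context}; the "question" key is passed through, but the prompt above does not use it.
qa_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)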


In [27]:
# The first step of the chain feeds its input to the retriever, which expects a plain
# query string. Passing a dict here raises:
#   AttributeError: 'dict' object has no attribute 'replace'
qa_chain.invoke("What is chocolate?")

In [24]:
from datasets import Dataset

questions = ["What is chocolate?", 
             "What is confectionery?",
             "What is the definition of confectionery?",
            ]
ground_truths = [["Chocolate is a food made from roasted and ground cacao beans."],
                ["Confectionery is a general term for sweet food made from sugar or chocolate."],
                ["Confectionery is a general term for sweet food made from sugar or chocolate."]]
answers = []
contexts = []

# Inference: run the chain and collect the retrieved chunks for each question
for query in questions:
    answers.append(qa_chain.invoke(query))
    contexts.append([doc.page_content for doc in retriever.invoke(query)])

# To dict
data = {
    "question": questions,
    "answer": answers,
    "contexts": contexts,
    "ground_truths": ground_truths
}

# Convert dict to dataset
dataset = Dataset.from_dict(data)

RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

In [16]:
# ChatOpenAI lives in the langchain-openai package; the core classes live in langchain_core.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template(QA_generation_prompt)

# Setup RAG pipeline
rag_chain = (
    {"context": retriever,  "question": RunnablePassthrough()} 
    | prompt 
    | llm
    | StrOutputParser() 
)

In [None]:
from datasets import Dataset

questions = [
    "What is chocolate?",
    "What is confectionery?",
    "What is the definition of confectionery?",
]
ground_truths = [
    ["Chocolate is a food made from roasted and ground cacao beans."],
    ["Confectionery is a general term for sweet food made from sugar or chocolate."],
    ["Confectionery is a general term for sweet food made from sugar or chocolate."],
]
answers = []
contexts = []

# Inference: run the chain and collect the retrieved chunks for each question
for query in questions:
    answers.append(rag_chain.invoke(query))
    contexts.append([doc.page_content for doc in retriever.invoke(query)])

# To dict
data = {
    "question": questions,
    "answer": answers,
    "contexts": contexts,
    "ground_truths": ground_truths
}

# Convert dict to dataset
dataset = Dataset.from_dict(data)

# Sparse Vector Search

To search with only sparse vectors:

- Set the retrieval_mode parameter to RetrievalMode.SPARSE.
- Pass an implementation of the SparseEmbeddings interface, from any sparse embeddings provider, as the sparse_embedding parameter.

The langchain-qdrant package provides a FastEmbed-based implementation out of the box.
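
To make the idea of sparse, term-based scoring concrete, here is a rough sketch of textbook Okapi BM25 in plain Python. It is an illustration only: the whitespace tokenizer, the `k1` and `b` defaults, and the three-sentence corpus are assumptions chosen for the example, and the scores will not match what a production sparse embedding model produces, since such models typically apply their own tokenization and preprocessing.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.2, b=0.75):
    """Score every document in `corpus` against `query` with Okapi BM25."""
    if not corpus:
        return []
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            tf = term_freq[term]
            denom = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores


# Hypothetical mini-corpus for illustration.
corpus = [
    "chocolate is a sweet food",
    "biscuits are baked",
    "confectionery includes sweets and chocolate",
]
scores = bm25_scores("chocolate biscuits", corpus)
best = max(range(len(corpus)), key=lambda i: scores[i])
print(corpus[best], scores)
```

Here the query term that occurs in fewer documents carries more weight, which is why the short document containing only "biscuits" outranks the two that mention "chocolate".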

In [None]:
from langchain_qdrant import FastEmbedSparse, RetrievalMode

sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25", cache_dir="cache")

qdrant = QdrantVectorStore.from_documents(
    docs,
    embedding=embedding_model,
    sparse_embedding=sparse_embeddings,
    location=":memory:",
    collection_name="my_documents",
    retrieval_mode=RetrievalMode.SPARSE,
)

query = "What is chocolate?"
found_docs = qdrant.similarity_search(query)

In [None]:
found_docs

# Hybrid Search

To perform a hybrid search that combines dense and sparse vectors with score fusion:

- Set the retrieval_mode parameter to RetrievalMode.HYBRID.
- Pass a dense embeddings model as the embedding parameter.
- Pass an implementation of the SparseEmbeddings interface, from any sparse embeddings provider, as the sparse_embedding parameter.

Note that if you added documents in HYBRID mode, you can switch to any retrieval mode when searching, since both the dense and sparse vectors are available in the collection.
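
One widely used way to fuse a dense ranking with a sparse ranking is Reciprocal Rank Fusion (RRF), which only needs each document's rank in each list. The sketch below is a plain-Python illustration under stated assumptions: the constant `k=60` is a conventional choice, the chunk ids are hypothetical, and whether this matches the fusion configured in a given vector store should be checked against that store's documentation.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list of (id, score) pairs.

    Each id receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1 for the top result.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from a dense and a sparse retriever.
dense_ranking = ["chunk_2", "chunk_1", "chunk_3"]
sparse_ranking = ["chunk_1", "chunk_4", "chunk_2"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

Because RRF uses ranks rather than raw scores, it avoids having to put cosine similarities and BM25 scores on a common scale.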

In [None]:
from langchain_qdrant import FastEmbedSparse, RetrievalMode

sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

qdrant = QdrantVectorStore.from_documents(
    docs,
    embedding=embedding_model,
    sparse_embedding=sparse_embeddings,
    location=":memory:",
    collection_name="my_documents",
    retrieval_mode=RetrievalMode.HYBRID,
)

query = "What is the definition of confectionery?"
found_docs = qdrant.similarity_search(query)

In [None]:
found_docs

In [None]:
# To run a similarity search and also receive the corresponding scores:
results = vector_store.similarity_search_with_score(
    query="What is chocolate?", k=1
)
for doc, score in results:
    # .3f prints the score with three decimal places
    print(f"* [SIM={score:.3f}] {doc.page_content} [{doc.metadata}]")

# Metadata filtering

In [20]:
from qdrant_client.http import models

results = vector_store.similarity_search(
    query="What is chocolate?",
    k=1,
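    filter=models.Filter(
        should=[
            models.FieldCondition(
                # LangChain stores document metadata under the "metadata" payload key,
                # and PyPDFLoader records the page number as an integer.
                key="metadata.page",
                match=models.MatchValue(value=1),
            ),
        ]
    ),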
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

In [None]:
results

## Query by turning into a retriever

In [None]:
retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 5})
retriever.invoke("What is chocolate?")
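
The "mmr" search type stands for maximal marginal relevance, which trades off relevance to the query against redundancy with chunks already selected. The following plain-Python sketch illustrates the greedy selection loop under stated assumptions: the two-dimensional vectors, the candidate names, and the `lambda_mult` values are hypothetical, and a library implementation may differ in details such as how many candidates it fetches before re-ranking.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, candidates, k=2, lambda_mult=0.5):
    """Greedily pick up to k ids, balancing relevance against redundancy.

    lambda_mult=1.0 ranks purely by relevance; lower values favor diversity.
    """
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        best_id, best_score = None, -math.inf
        for doc_id, vec in remaining.items():
            relevance = cosine(query_vec, vec)
            # Similarity to the closest already-selected item (0 if none yet).
            redundancy = max((cosine(vec, candidates[s]) for s in selected), default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_id, best_score = doc_id, score
        selected.append(best_id)
        del remaining[best_id]
    return selected


# Hypothetical candidates: "a" and "a_dup" are near-duplicates, "b" is different.
query_vec = [1.0, 0.0]
candidates = {"a": [1.0, 0.1], "a_dup": [1.0, 0.12], "b": [0.6, 0.8]}

relevance_only = mmr(query_vec, candidates, k=2, lambda_mult=1.0)
diverse = mmr(query_vec, candidates, k=2, lambda_mult=0.3)
print(relevance_only, diverse)
```

With relevance only, the near-duplicate is picked second; with a stronger diversity weight, the less similar chunk replaces it.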
