# Chapter 3: Building Your First Question-Answering Engine


>This notebook is based on the open-source project [wow-rag](https://github.com/datawhalechina/wow-rag) by Datawhale China.  
>I’ve adapted and annotated parts of it for personal learning and experimentation.


## 1. Introduction 


In this chapter, we take a hands-on approach to building a functional **Retrieval-Augmented Generation (RAG)** pipeline by implementing four distinct methods to construct a question-answering engine.

Each method demonstrates a different level of control, flexibility, and scalability, helping you understand the design space of modern QA systems.

---

### Why Explore Multiple Methods?

RAG systems typically consist of three main components:
1. **Document Ingestion & Indexing**
2. **Semantic Retrieval**
3. **Answer Generation via LLMs**

While the basic pipeline might seem straightforward, the implementation details—such as how documents are parsed, where vectors are stored, and how queries are synthesized—can significantly impact performance and usability. Exploring multiple methods allows you to:

-  Compare ease of use vs. customizability
-  Evaluate performance across different storage backends (in-memory vs. vector DB)
-  Understand how preprocessing affects search quality
-  Learn how to modularize and scale your RAG system

---

### Overview of the Four Methods

| Method | Description | Use Case |
|--------|-------------|----------|
| **Method 1** | Use `VectorStoreIndex` to build an index directly from documents | Best for quick prototyping and minimal setup |
| **Method 2** | Split documents into `nodes` using `SentenceSplitter`, then build a custom FAISS index | Offers more control over chunking and indexing, improving semantic retrieval |
| **Method 3** | Construct custom `Retriever` + `ResponseSynthesizer` components, then bind into a `QueryEngine` | Suitable for fine-tuning retrieval and generation separately |
| **Method 4** | Use an external **vector DB (Qdrant)** with file-based ingestion, retriever, synthesizer, and metadata filters | Best for production-scale systems with large datasets and advanced filtering needs |


### What We Will Learn

By the end of this chapter, we will be able to:

- Construct RAG pipelines with increasing complexity
- Choose between local or external vector storage backends
- Preprocess documents for better retrieval performance
- Combine custom components to optimize the user’s question-answering experience



## 2. Preparation

At the end of the previous chapter, we found that OpenAI's embedding model generally outperformed the local model, so this chapter calls the OpenAI models directly.

In [1]:
import os
from dotenv import load_dotenv


load_dotenv()
api_key = os.getenv('API_KEY')

base_url = "https://api.openai.com/v1"  
chat_model = "gpt-4.1-nano-2025-04-14"   
emb_model = "text-embedding-3-small"


from llama_index.llms.openai import OpenAI
llm = OpenAI(
    api_key = api_key,
    model = chat_model,
)


from llama_index.embeddings.openai import OpenAIEmbedding
embedding = OpenAIEmbedding(
    api_key = api_key,
    model = emb_model,
)

emb = embedding.get_text_embedding("Hello")
len(emb) # Output 1536 if working

1536

Before we start, we need a document to query. Here we use `./docs/example.txt`.

It is the same excerpt used in **Chapter 1**, taken from [arXiv:2401.03568](https://ar5iv.labs.arxiv.org/html/2401.03568).

To get a feel for the search capabilities, I recommend replacing `example.txt` with an article you know well, or at least reading the example once.

In [2]:
# Load the specified files; `input_files` takes a list of paths
from llama_index.core import SimpleDirectoryReader,Document
documents = SimpleDirectoryReader(input_files=['./docs/example.txt']).load_data()

## 3. Method 1: Build index directly from documents

### 3.1 Build semantic vector index

With `LlamaIndex`, we can build a *semantic vector index* from a list of documents.

The `VectorStoreIndex.from_documents` call below performs:
``` Documents → Text chunks → Embedding vectors → Indexed into a vector store ```

This is equivalent to what we manually did in **Lesson 1 (Hack-a-RAG)**:

-  Manually chunking the text

- Calling embedding API

-  Normalizing + storing in FAISS

But now it's all automated in one line.

In [3]:
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents,embed_model=embedding,show_progress=True) 

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]
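
For intuition, the stages that the one-line call automates can be sketched in plain Python. This is a toy illustration, not LlamaIndex's implementation: `chunk`, `embed`, `build_index`, and `search` are hypothetical helpers, the "embedding" is a normalized bag-of-words vector rather than a model call, and chunks are fixed word windows.

```python
import math
from collections import Counter

def chunk(text, size=8):
    """Split text into chunks of at most `size` words (toy stand-in for a node parser)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: unit-length bag-of-words counts (a real system calls an embedding model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()}

def build_index(docs):
    """Documents -> chunks -> vectors -> list of (vector, chunk) pairs."""
    return [(embed(c), c) for doc in docs for c in chunk(doc)]

def search(index, query, top_k=1):
    """Rank chunks by dot product with the query vector (cosine, since vectors are unit length)."""
    q = embed(query)
    scored = [(sum(q.get(w, 0.0) * v for w, v in vec.items()), text) for vec, text in index]
    return sorted(scored, reverse=True)[:top_k]

toy_docs = ["Agents perceive their environment and act on it. "
            "Retrieval supplies an LLM with relevant context before it answers."]
toy_index = build_index(toy_docs)
best_score, best_chunk = search(toy_index, "what context does retrieval supply")[0]
print(best_chunk)
```

Note that the fixed word window cuts the second sentence in the middle, which is the kind of problem sentence-aware splitting is meant to avoid.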

### 3.2 Build Query Engine

In [4]:
query_engine = index.as_query_engine(llm=llm)
# Ask a question
response = query_engine.query("what is AI？")

raw_answer = response.response

# Insert a newline after each period that is followed by a space
formatted_answer = raw_answer.replace('. ', '.\n')

# Print the result with line breaks
print(formatted_answer)

AI refers to systems that can perform tasks typically requiring human intelligence, such as content generation, action prediction, and scenario synthesis.
These systems can be embodied and empathetic, operating in simulated or real environments, and are used across various applications like interactive agents, healthcare, gaming, and manufacturing.
Responsible development and deployment are essential to address ethical concerns, privacy, bias, and safety issues associated with AI technologies.


#### Note: Key Elements of `response`

| Attribute / Method                 | Type                  | Description                                                                 |
|------------------------------------|-----------------------|-----------------------------------------------------------------------------|
| `response.response`                | `str`                 | The **main answer text** generated by the LLM.                              |
| `response.source_nodes`            | `List[NodeWithScore]` | A list of **retrieved document chunks** (nodes) used to form the answer. Each includes metadata, text, and similarity score. |
| `response.metadata`                | `Dict` or `None`      | Metadata of the source nodes, such as file paths and creation dates.        |
| `response.get_formatted_sources()` | `str`                 | A method that returns a preformatted, truncated string of the sources.      |

Attribute names can differ between `llama-index` versions (older releases used `extra_info` where recent ones use `metadata`), so check `dir(response)` against your installed version.
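
To see which chunks an answer was based on, it helps to print each source node's score next to a short text preview. The helper below is a small sketch: `summarize_sources` is a hypothetical name, and it only assumes that each item exposes `.score` and `.text`, as the `NodeWithScore` objects printed later in this chapter do. Stand-in objects are used so the snippet runs without an index.

```python
from types import SimpleNamespace

def summarize_sources(source_nodes, preview_chars=60):
    """Return one 'score | preview' line per retrieved node, best score first."""
    ranked = sorted(source_nodes, key=lambda n: n.score or 0.0, reverse=True)
    return [f"{(n.score or 0.0):.3f} | {n.text[:preview_chars]}" for n in ranked]

# Stand-in objects so the sketch runs on its own; with the notebook's objects,
# call summarize_sources(response.source_nodes) instead.
fake_nodes = [
    SimpleNamespace(score=0.41, text="AI agents operate in simulated or real environments."),
    SimpleNamespace(score=0.78, text="Responsible deployment addresses privacy and bias."),
]
for line in summarize_sources(fake_nodes):
    print(line)
```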


#### ✅ When to Use This Method

- You want a **quick and simple** way to build a RAG pipeline.
- Your dataset is **small to medium-sized** and doesn't require fine-tuned control.
- You're in a **prototype or experimentation** phase and want fast iteration.
- You prefer to let `LlamaIndex` handle chunking, embedding, and indexing for you.

---

#### 🚫 When *Not* to Use This Method

- You need **custom chunking logic** (e.g., sentence-aware splits, metadata injection).
- You want to use **external vector stores** like Qdrant, Pinecone, or Weaviate.
- You’re preparing for **production deployment** and need full control over indexing and retrieval.
- Your documents contain **structured content** (e.g., tables, metadata) that require preprocessing.

---

This method is perfect for fast onboarding, but you’ll eventually need to explore more modular approaches (like the ones in Method 2 and beyond) to scale and optimize performance.


## 4. Method 2: Custom index with FAISS

Here we split documents into `nodes` using `SentenceSplitter`, then build a custom FAISS index. This gives us more control over chunking and indexing, which can improve semantic retrieval.

#### Step-by-Step Breakdown

 1. `SentenceSplitter(chunk_size=256)`

- A **node parser** that breaks long text into smaller **chunks (nodes)**.
- It tries to preserve **sentence boundaries** when splitting.
- Each resulting node holds at most about **256 tokens**; `chunk_size` is measured in tokens, not characters.
- This improves embedding quality by keeping **semantic units together**.

---

 2. `transformations = [...]`

- A list of **transformation steps** applied to the documents.
- You can chain multiple transformations (e.g., **splitting**, **cleaning**, **metadata injection**).
- In this example, we only use one: `SentenceSplitter`.

---

 3. `run_transformations(documents, transformations=...)`

- This function **applies all transformations** in order to your list of documents.
- The output is a list of **`Node` objects**, each representing a semantically meaningful chunk of text.
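
The idea behind sentence-aware splitting can be sketched as a greedy packer that never cuts a sentence in half. This is a simplified illustration, not `SentenceSplitter`'s actual algorithm: `split_sentences` and `sentence_chunks` are hypothetical helpers, the sentence detection is a naive regular expression, and the size limit counts words instead of tokens.

```python
import re

def split_sentences(text):
    """Naive sentence split on '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def sentence_chunks(text, max_words=12):
    """Greedily pack whole sentences into chunks of at most `max_words` words.
    A single sentence longer than the limit becomes its own (oversized) chunk."""
    chunks, current, count = [], [], 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

sample = ("RAG retrieves context first. The LLM then writes an answer. "
          "Chunk size controls how much text each node holds. Smaller chunks are more precise.")
demo_chunks = sentence_chunks(sample, max_words=12)
for c in demo_chunks:
    print(repr(c))
```

A smaller limit yields more, shorter chunks; a larger one keeps more context in each chunk.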

### 4.1 Nodes construction (Chunking) 

First, we **manually process the documents into nodes (chunks)** using LlamaIndex’s transformation pipeline.

In [5]:
# 1. Split documents into chunks

from llama_index.core.node_parser import SentenceSplitter
transformations = [SentenceSplitter(chunk_size = 256)]

from llama_index.core.ingestion.pipeline import run_transformations
nodes = run_transformations(documents, transformations=transformations)

In [6]:
# 2. Assign stable, manual IDs to each node. By default, each node ID is a random UUID.
for i, node in enumerate(nodes):
    node.id_ = str(i)


In [7]:
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.core.schema import TextNode  # ✅ updated path
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss

# 3. Build FAISS vector store
dims = len(embedding.get_text_embedding("hello"))
faiss_index = faiss.IndexFlatL2(dims)  # Why L2 instead of cosine? See the note below.
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

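####  Note: why use `IndexFlatL2` (L2) instead of cosine similarity

- FAISS has **no dedicated cosine-similarity index**; cosine search is usually done with an inner-product index (`IndexFlatIP`) over normalized vectors.
- For **unit-length vectors**, squared L2 distance and cosine similarity are linked by `||a - b||² = 2 - 2·cos(a, b)`, so both produce the **same ranking**.
- OpenAI embeddings are normalized to length 1, so here **ranking by L2 matches ranking by cosine**.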
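
The relationship between L2 distance and cosine similarity on normalized vectors can be checked numerically. The sketch below uses made-up three-dimensional vectors (not real embeddings) and verifies that, once vectors have unit length, squared L2 distance equals `2 - 2·cos`, so sorting by ascending distance gives the same order as sorting by descending cosine.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    # For unit vectors the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

query = normalize([1.0, 2.0, 0.5])
candidates = {name: normalize(v) for name, v in {
    "a": [1.0, 2.1, 0.4],
    "b": [-1.0, 0.5, 2.0],
    "c": [0.2, 1.0, 1.0],
}.items()}

# Identity check: ||q - v||^2 == 2 - 2 * cos(q, v) for unit vectors.
for v in candidates.values():
    assert math.isclose(squared_l2(query, v), 2 - 2 * cosine(query, v), abs_tol=1e-12)

by_l2 = sorted(candidates, key=lambda n: squared_l2(query, candidates[n]))
by_cos = sorted(candidates, key=lambda n: cosine(query, candidates[n]), reverse=True)
print(by_l2, by_cos)
```

If the vectors were not normalized, the two orderings could differ, which is why the normalization assumption matters.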

### 4.2 Construct vector index 

In [8]:
# 4. Create index and insert nodes
index = VectorStoreIndex(nodes=[], embed_model=embedding, storage_context=storage_context)
index.insert_nodes(nodes)

### 4.3 Build Query Engine

In [9]:
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("what is AI？")

raw_answer = response.response
formatted_answer = raw_answer.replace('. ', '.\n')
print(formatted_answer)

Artificial Intelligence (AI) refers to systems and agents that can perform tasks typically requiring human intelligence, such as learning, decision-making, and problem-solving.
These systems can be used across various domains, including healthcare, gaming, and manufacturing, to enhance efficiency and innovation.
However, their development and deployment require careful consideration of ethical, privacy, and safety concerns to prevent misuse and address potential societal impacts.


### 4.4 Save the Index

We can also persist the index to disk so that the embeddings do not have to be recomputed on every run.

In [10]:
# save index to disk
persist_dir = "./storage"
index.storage_context.persist(persist_dir)

In [11]:
# load index from disk
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss
from llama_index.core import StorageContext, load_index_from_storage

vector_store = FaissVectorStore.from_persist_dir(persist_dir)
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir=persist_dir
)
index = load_index_from_storage(storage_context=storage_context,embed_model = embedding)

Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage\docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage\index_store.json.


Reload the index and run the same query. The response should be essentially the same, although the wording may differ slightly because LLM output is not deterministic.

In [12]:
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("what is AI？")

raw_answer = response.response
formatted_answer = raw_answer.replace('. ', '.\n')
print(formatted_answer)

Artificial Intelligence (AI) refers to systems and agents that can perform tasks typically requiring human intelligence, such as learning, decision-making, and problem-solving.
These systems can be used in various domains, including healthcare, gaming, and manufacturing, to enhance efficiency and innovation.
However, their development and deployment require careful consideration of ethical, privacy, and safety concerns to prevent misuse and ensure beneficial outcomes.


## 5. Method 3: Custom Retriever + ResponseSynthesizer 

In this method, we manually build the core components of a RAG pipeline using `LlamaIndex`:

- A **Retriever**: Responsible for fetching top-k semantically similar chunks from the index.
- A **ResponseSynthesizer**: Responsible for generating a final answer from the retrieved chunks using the LLM.
- A **RetrieverQueryEngine**: Combines both components to form a flexible and extensible query engine.

---

###  Why use this method?

- Gives you **more granular control** over the retrieval and generation pipeline.
- Allows you to **fine-tune parameters** (e.g., number of retrieved documents, LLM behavior).
- Enables **experimentation and customization**, such as:
  - Using different LLMs for retrieval and response.
  - Adding reranking, metadata filtering, or summarization.

---

###  When is it useful?

- When you're moving beyond prototypes and need **production-ready pipelines**.
- When you want to **optimize retrieval quality and generation separately**.
- When integrating into **larger systems** with multiple components or data sources.

---

This method offers a balanced mix of automation and flexibility, and it's a great stepping stone toward building more advanced and modular RAG systems.
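
The division of labour between the three components can be sketched with a few small classes. Everything here is a toy under stated assumptions: `ToyRetriever`, `ToySynthesizer`, `ToyQueryEngine`, and `echo_llm` are hypothetical names, retrieval is plain word overlap rather than vector similarity, and the "LLM" is a placeholder function. The point is only that each part can be swapped independently.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class ToyRetriever:
    corpus: List[str]
    top_k: int = 2

    def retrieve(self, query: str) -> List[Tuple[int, str]]:
        """Score each chunk by the number of words shared with the query; keep the top_k."""
        q = set(query.lower().split())
        scored = [(len(q & set(c.lower().split())), c) for c in self.corpus]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[: self.top_k]

@dataclass
class ToySynthesizer:
    llm: Callable[[str], str]  # any function that maps a prompt to text

    def synthesize(self, query: str, hits: List[Tuple[int, str]]) -> str:
        context = "\n".join(text for _, text in hits)
        return self.llm(f"Context:\n{context}\n\nQuestion: {query}")

@dataclass
class ToyQueryEngine:
    retriever: ToyRetriever
    synthesizer: ToySynthesizer

    def query(self, question: str) -> str:
        return self.synthesizer.synthesize(question, self.retriever.retrieve(question))

def echo_llm(prompt: str) -> str:
    """Placeholder 'LLM' that simply returns the first context line."""
    return prompt.splitlines()[1]

toy_engine = ToyQueryEngine(
    ToyRetriever(["faiss stores vectors in memory", "qdrant is a vector database", "cats sleep a lot"], top_k=1),
    ToySynthesizer(echo_llm),
)
toy_answer = toy_engine.query("what is qdrant")
print(toy_answer)
```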

### 5.1 Construct Retriever

In [14]:
from llama_index.core.retrievers import VectorIndexRetriever


# Retrieve the 5 most similar nodes from the index
retriever = VectorIndexRetriever(index=index, similarity_top_k=5)

### 5.2 Construct Response Synthesizer

In [15]:
from llama_index.core.response_synthesizers  import get_response_synthesizer
response_synthesizer = get_response_synthesizer(llm=llm)

### 5.3 Construct Query Engine

In [16]:
from llama_index.core.query_engine import RetrieverQueryEngine
engine = RetrieverQueryEngine(
      retriever=retriever,
      response_synthesizer=response_synthesizer
        )

### 5.4 Go!

In [21]:
question = "what is AI？"
answer = engine.query(question)

formatted_answer = answer.response.replace('. ', '.\n')
print(formatted_answer)

Artificial Intelligence (AI) refers to systems and technologies that enable machines to perform tasks that typically require human intelligence.
These include understanding language, recognizing images, making decisions, and learning from data.
AI can be applied across various fields such as healthcare, gaming, manufacturing, and content generation, with a focus on developing responsible and ethical deployment practices to address potential risks like bias, privacy concerns, and manipulation.


### 5.5 Summary
✅ When to Use

- You want **full control** over the retrieval and response synthesis process.
- You need to **fine-tune retrieval parameters**, such as `similarity_top_k`, distance metrics, or filtering strategies.
- You're working in a **modular or production setting**, where components (retriever, generator) may change independently.
- You're experimenting with **multi-stage pipelines**, such as reranking or hybrid retrieval.
- You need to **integrate metadata filtering**, custom prompt templates, or advanced LLM behaviors.

---

ℹ️ **Tip**: You can always start with Method 1 (simple) and migrate to Method 3 when your use case demands more flexibility or optimization.


## 6. Method 4: Using a Vector Database (Qdrant) for Scalable Retrieval and Filtering



In this section, we explore how to use **Qdrant**, a powerful open-source vector database, as the backend for vector indexing and retrieval.

Unlike the FAISS index above, which lives in memory and has to be persisted to disk explicitly, Qdrant is a database that provides **persistent storage**, **advanced metadata filtering**, and **production-level performance**.

We’ll walk through how to:

- Load documents and create vector indexes backed by Qdrant
- Use the index for semantic search
- Enhance retrieval with structured metadata filters
- Perform complex logical filtering with Qdrant's native API


---

### 6.1 Package installation 

In this section we need some additional libraries:

1. **qdrant-client**  
   The official Python client for Qdrant, a high-performance vector database.  
   - Enables connection to Qdrant (local or cloud).
   - Allows uploading vectors, managing collections.
   - Supports running similarity searches.

2. **llama-index-vector-stores-qdrant**  
   Adds Qdrant support to LlamaIndex.  
   - Lets you use Qdrant as the backend vector store.
   - Useful for semantic search in a RAG pipeline.

3. **llama-index-readers-file**  
   Provides file reading capabilities for local document ingestion.  
   - Allows loading of `.txt`, `.md`, `.pdf`, etc.
   - Supplies the format-specific readers that `SimpleDirectoryReader` uses.


In [30]:
%pip install qdrant-client
%pip install llama-index-vector-stores-qdrant
%pip install llama-index-readers-file

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.



### 6.2 Load document

In [22]:
import qdrant_client
from llama_index.core import SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader(
    input_files=['./docs/example.txt']
).load_data()

print("Document ID:", documents[0].doc_id)

Document ID: 7717dc8a-7445-4cf2-a630-1923a9000cfa


### 6.3 Construct index

In [23]:
# Create an index over the documents
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Initializes a Qdrant client using a local directory named "qdrant" for storage.
qclient = qdrant_client.QdrantClient(path="qdrant") 
# Configure the vector store; a collection is comparable to a table or namespace in a database
vector_store = QdrantVectorStore(client=qclient, collection_name="QA")
# Creates a storage configuration that binds the index to your Qdrant vector store
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Build the Vector Index
index = VectorStoreIndex.from_documents(
    documents, 
    storage_context=storage_context,
    embed_model = embedding
)



### 6.4 Construct Retriever

In [25]:
from llama_index.core.retrievers import VectorIndexRetriever
# Retrieve the 5 most similar nodes from the Qdrant-backed index
retriever = VectorIndexRetriever(index=index, similarity_top_k=5)

### 6.5 Construct Response Synthesizer

In [26]:

from llama_index.core.response_synthesizers  import get_response_synthesizer
response_synthesizer = get_response_synthesizer(llm=llm)

### 6.6 Construct Query Engine

In [None]:
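from llama_index.core.query_engine import RetrieverQueryEngine

# Rebuild the engine so that it uses the Qdrant-backed retriever
engine = RetrieverQueryEngine(retriever=retriever, response_synthesizer=response_synthesizer)

question = "What is AI?"
answer = engine.query(question)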

formatted_answer = answer.response.replace('. ', '.\n')
print(formatted_answer)

AI, or Artificial Intelligence, refers to systems and agents designed to perform tasks that typically require human intelligence.
These include content generation, decision-making, learning collaboration policies, and adaptive behaviors across various domains such as healthcare, gaming, manufacturing, and content creation.
AI systems can be multimodal, integrating different types of data like visual and textual information, and are developed to model embodied and empathetic interactions in both simulated and real-world environments.
Responsible development and deployment of AI emphasize transparency, privacy, and minimizing biases to ensure positive societal impacts.


### 6.7 Defining Custom Text Nodes with Metadata for Filtering

In this subsection, we demonstrate how to use **metadata filters** to perform semantic retrieval **only on a subset of nodes** that meet certain criteria.

####  What we are doing:

1. **Manually construct `TextNode` objects**, each with:
   - A text string (e.g. a movie title or summary)
   - Associated metadata (e.g., author, theme, year)

2. **Create a `QdrantVectorStore`** and store the `TextNode`s with embeddings.

3. Use `MetadataFilters` to **limit retrieval** to only those nodes where:
   - The `"theme"` field is equal to `"Mafia"`.

4. Call `retriever.retrieve("What is inception about?")` to:
   - Embed the query
   - Search **only among documents with `"theme": "Mafia"`** for relevant results
   - Return a list of semantically similar `TextNode`s

#### 🎯 Why we do this:

- In real-world applications (e.g., RAG systems in production), you often want to **narrow down the search scope** before doing vector similarity search.
- For example, only retrieve:
  - Documents from a specific department or time period
  - Articles written by a specific author
  - Logs tagged with a specific error type

- **Metadata filtering allows you to combine symbolic rules with semantic retrieval**, improving both accuracy and efficiency.
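
The "filter first, then rank" idea can be sketched in a few lines of plain Python. This is a conceptual toy, not how Qdrant or LlamaIndex implement it: `word_overlap` and `filtered_search` are hypothetical helpers, the records are made up, and relevance is word overlap rather than embedding similarity.

```python
def word_overlap(query, text):
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def filtered_search(records, query, required, top_k=2):
    """Keep records whose metadata contains every key/value in `required`, then rank by score."""
    allowed = [r for r in records
               if all(r["metadata"].get(k) == v for k, v in required.items())]
    return sorted(allowed, key=lambda r: word_overlap(query, r["text"]), reverse=True)[:top_k]

logs = [
    {"text": "disk full on node 3", "metadata": {"team": "infra", "level": "error"}},
    {"text": "login page loads slowly", "metadata": {"team": "web", "level": "warning"}},
    {"text": "disk latency is high", "metadata": {"team": "infra", "level": "warning"}},
]
hits = filtered_search(logs, "disk full", required={"level": "warning"})
print([h["text"] for h in hits])
```

The best textual match ("disk full on node 3") is never considered because it fails the filter, which mirrors the behaviour of the metadata-filtered retriever in this section.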


In [31]:
from llama_index.core.schema import TextNode

nodes = [
    TextNode(
        text="The Shawshank Redemption",
        metadata={
            "author": "Stephen King",
            "theme": "Friendship",
            "year": 1994,
        },
    ),
    TextNode(
        text="The Godfather",
        metadata={
            "director": "Francis Ford Coppola",
            "theme": "Mafia",
            "year": 1972,
        },
    ),
    TextNode(
        text="Inception",
        metadata={
            "director": "Christopher Nolan",
            "theme": "Fiction",
            "year": 2010,
        },
    ),
    TextNode(
        text="To Kill a Mockingbird",
        metadata={
            "author": "Harper Lee",
            "theme": "Racial injustice",
            "year": 1960,
        },
    ),
    TextNode(
        text="1984",
        metadata={
            "author": "George Orwell",
            "theme": "Totalitarianism",
            "year": 1949,
        },
    ),
    TextNode(
        text="The Great Gatsby",
        metadata={
            "author": "F. Scott Fitzgerald",
            "theme": "The American Dream",
            "year": 1925,
        },
    ),
    TextNode(
        text="Harry Potter and the Sorcerer's Stone",
        metadata={
            "author": "J.K. Rowling",
            "theme": "Fiction",
            "year": 1997,
        },
    ),
]

### 6.8 Construct index based on the custom text nodes

In [32]:
vector_store = QdrantVectorStore(client=qclient, collection_name="filter")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(
    nodes, 
    storage_context=storage_context,
    embed_model = embedding
)

### 6.9 Construct metadata filter

In [33]:
from llama_index.core.vector_stores import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)

filters = MetadataFilters(
    filters=[
        MetadataFilter(key="theme", operator=FilterOperator.EQ, value="Mafia"),
    ]
)

### 6.10 Construct retriever

In [34]:
retriever = index.as_retriever(filters=filters, llm=llm)
retriever.retrieve("What is inception about?")

[NodeWithScore(node=TextNode(id_='6e6ffe25-ef1a-40ac-81fb-436e8eeabf7d', embedding=None, metadata={'director': 'Francis Ford Coppola', 'theme': 'Mafia', 'year': 1972}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text='The Godfather', mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=0.1802265105404427)]

**Result explanation**

Only nodes that match the metadata condition are included in the search space.

Among those, the semantic similarity score determines which one is returned.

Since only "The Godfather" matched the theme: Mafia tag, it was returned—even though it’s semantically unrelated to the question.

### 6.11 Advanced metadata filter

We can also combine multiple filters using AND or OR.

In [None]:
from llama_index.core.vector_stores import FilterOperator, FilterCondition

filters = MetadataFilters(
    filters=[
        MetadataFilter(key="theme", value="Fiction"),
        MetadataFilter(key="year", value=1997, operator=FilterOperator.GT),  # GT means ">", so 1997 itself is excluded
    ],
    condition=FilterCondition.AND,
)

retriever = index.as_retriever(filters=filters, llm=llm)
retriever.retrieve("Harry Potter?")

[NodeWithScore(node=TextNode(id_='d83c4f7e-951b-486a-b19e-407ccfbbfbf4', embedding=None, metadata={'director': 'Christopher Nolan', 'theme': 'Fiction', 'year': 2010}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text='Inception', mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=0.19519303978658628)]
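
The result above may look surprising: the query mentions Harry Potter, yet "Inception" is returned. The reason is the strict `>` comparison on `year`. The sketch below reproduces that logic in plain Python; `matches` and `OPS` are hypothetical helpers for illustration, not part of `llama-index`.

```python
import operator

OPS = {"==": operator.eq, ">": operator.gt, ">=": operator.ge, "<": operator.lt}

def matches(metadata, conditions, mode="and"):
    """Evaluate (key, op, value) conditions against a metadata dict.
    A missing key never matches. `mode` is 'and' or 'or'."""
    results = [key in metadata and OPS[op](metadata[key], value)
               for key, op, value in conditions]
    return all(results) if mode == "and" else any(results)

conditions = [("theme", "==", "Fiction"), ("year", ">", 1997)]
inception = {"director": "Christopher Nolan", "theme": "Fiction", "year": 2010}
harry_potter = {"author": "J.K. Rowling", "theme": "Fiction", "year": 1997}

print(matches(inception, conditions))                 # passes both conditions
print(matches(harry_potter, conditions))              # fails: 1997 is not > 1997
print(matches(harry_potter, conditions, mode="or"))   # passes: the theme matches
```

Switching the comparison to `>=`, or the condition to OR, would let the Harry Potter node through.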

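Some vector-store integrations accept a raw filter dictionary through `vector_store_kwargs`. With `QdrantVectorStore`, however, the `filter` key is not recognized, so the call below runs without any filtering: the results include nodes whose theme is not "Mafia". To pass a native filter to Qdrant, use the `qdrant_filters` key shown in Section 6.12.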

In [41]:
retriever = index.as_retriever(
    vector_store_kwargs={"filter": {"theme": "Mafia"}},
    llm=llm
)
retriever.retrieve("What is inception about?")

[NodeWithScore(node=TextNode(id_='d83c4f7e-951b-486a-b19e-407ccfbbfbf4', embedding=None, metadata={'director': 'Christopher Nolan', 'theme': 'Fiction', 'year': 2010}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text='Inception', mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=0.4061743952578851),
 NodeWithScore(node=TextNode(id_='602f9373-ace2-45df-9508-ebd1062284bd', embedding=None, metadata={'author': 'F. Scott Fitzgerald', 'theme': 'The American Dream', 'year': 1925}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text='The Great Gatsby', mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=0.19638923898952426)]

### 6.12 Using Qdrant's Native Filtering Capabilities with `llama-index`

In addition to `llama-index`'s built-in filtering and retrieval pipeline, you can also directly leverage Qdrant's native vector store filtering capabilities. This can be useful when:

- You want to use complex filtering logic, including nested conditions.

- You want to offload filtering computation to Qdrant, improving performance on large datasets.



The example below builds a native Qdrant `Filter` and passes it to the retriever through `vector_store_kwargs`.

In [42]:
nodes = [
    TextNode(
        text="りんごとは",
        metadata={"author": "Tanaka", "fruit": "apple", "city": "Tokyo"},
    ),
    TextNode(
        text="Was ist Apfel?",
        metadata={"author": "David", "fruit": "apple", "city": "Berlin"},
    ),
    TextNode(
        text="Orange like the sun",
        metadata={"author": "Jane", "fruit": "orange", "city": "Hong Kong"},
    ),
    TextNode(
        text="Grape is...",
        metadata={"author": "Jane", "fruit": "grape", "city": "Hong Kong"},
    ),
    TextNode(
        text="T-dot > G-dot",
        metadata={"author": "George", "fruit": "grape", "city": "Toronto"},
    ),
    TextNode(
        text="6ix Watermelons",
        metadata={
            "author": "George",
            "fruit": "watermelon",
            "city": "Toronto",
        },
    ),
]

In [69]:
vector_store = QdrantVectorStore(client=qclient, collection_name="default")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(
    nodes, 
    storage_context=storage_context,
    embed_model = embedding
)

In [70]:
from qdrant_client.http.models import Filter, FieldCondition, MatchValue
filters = Filter(
    should=[
        Filter(
            must=[
                FieldCondition(
                    key="fruit",
                    match=MatchValue(value="apple"),
                ),
                FieldCondition(
                    key="city",
                    match=MatchValue(value="Tokyo"),
                ),
            ]
        ),
        Filter(
            must=[
                FieldCondition(
                    key="fruit",
                    match=MatchValue(value="grape"),
                ),
                FieldCondition(
                    key="city",
                    match=MatchValue(value="Toronto"),
                ),
            ]
        ),
    ]
)
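
The nested filter above reads as (fruit = apple AND city = Tokyo) OR (fruit = grape AND city = Toronto). A simplified model of that logic, written with plain dictionaries, is sketched below. The dictionary layout and the `evaluate` function are assumptions made for illustration; they are not Qdrant's API, and Qdrant evaluates filters inside the database.

```python
def evaluate(node_filter, payload):
    """Recursively evaluate a nested filter against a metadata payload.
    {"must": [...]}        -> every clause must hold (AND)
    {"should": [...]}      -> at least one clause must hold (OR)
    {"key": k, "value": v} -> leaf equality test"""
    if "key" in node_filter:
        return payload.get(node_filter["key"]) == node_filter["value"]
    ok = True
    if "must" in node_filter:
        ok = ok and all(evaluate(c, payload) for c in node_filter["must"])
    if "should" in node_filter:
        ok = ok and any(evaluate(c, payload) for c in node_filter["should"])
    return ok

nested = {"should": [
    {"must": [{"key": "fruit", "value": "apple"}, {"key": "city", "value": "Tokyo"}]},
    {"must": [{"key": "fruit", "value": "grape"}, {"key": "city", "value": "Toronto"}]},
]}
payloads = [
    {"author": "Tanaka", "fruit": "apple", "city": "Tokyo"},
    {"author": "David", "fruit": "apple", "city": "Berlin"},
    {"author": "Jane", "fruit": "orange", "city": "Hong Kong"},
    {"author": "Jane", "fruit": "grape", "city": "Hong Kong"},
    {"author": "George", "fruit": "grape", "city": "Toronto"},
    {"author": "George", "fruit": "watermelon", "city": "Toronto"},
]
matched_authors = [p["author"] for p in payloads if evaluate(nested, p)]
print(matched_authors)
```

Only the Tokyo apple node and the Toronto grape node pass, so the similarity search that follows ranks just those two.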

In [71]:
retriever = index.as_retriever(
    vector_store_kwargs={"qdrant_filters": filters},
    llm=llm
)

In [72]:
response = retriever.retrieve("Who makes grapes?")
for node in response:
    print("node", node.score)
    print("node", node.text)
    print("node", node.metadata)

node 0.4104185504365517
node T-dot > G-dot
node {'author': 'George', 'fruit': 'grape', 'city': 'Toronto'}
node 0.2518488836316279
node りんごとは
node {'author': 'Tanaka', 'fruit': 'apple', 'city': 'Tokyo'}


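Two nodes matched the filter, which reads as (fruit = apple AND city = Tokyo) OR (fruit = grape AND city = Toronto): Qdrant combines `must` clauses with AND and `should` clauses with OR.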

Among them, "T-dot > G-dot" scored highest, since it's semantically closer to the query "Who makes grapes?"

"りんごとは" also matched the filter (apple + Tokyo), but was less relevant semantically.

---

This method is ideal when:
- You need scalable vector search with **persistent storage**
- You want to **filter results based on metadata** (e.g., category, author, language, year)
- You're building applications that require **flexible and fast querying at scale**