### Import components

In [1]:
from src.document_loader import load_pdf_document
from src.chunking import chunk_text
from src.embedding import Embedder
from src.vector_store import FaissVectorStore
from src.prompt_constructor import construct_prompt
from src.lm import LM

## Offline Preparation

In [2]:
PDF_DOCUMENT_PATH = r"./data/document.pdf"

In [3]:
LOADED_DOCUMENT_PATH = r"./data/document.md"

### Load document(s)

In [4]:
pages = load_pdf_document(PDF_DOCUMENT_PATH, LOADED_DOCUMENT_PATH)

### Chunk document(s)

In [5]:
chunks = chunk_text(LOADED_DOCUMENT_PATH)
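The implementation of `chunk_text` lives in `src/chunking.py` and is not shown in this notebook. As a rough illustration of what a chunking step can look like, here is a minimal sketch of fixed-size word windows with overlap. The helper name `chunk_words`, its parameters, and the `id`/`text` dictionary keys are assumptions made for this example, not the project's actual API.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[dict]:
    """Split text into overlapping word windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append({"id": len(chunks), "text": " ".join(window)})
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Tiny demo: 10 words, windows of 4 words that overlap by 2.
sample = " ".join(f"w{i}" for i in range(10))
demo_chunks = chunk_words(sample, chunk_size=4, overlap=2)
for chunk in demo_chunks:
    print(chunk["id"], chunk["text"])
```

The overlap keeps a sentence that straddles a window boundary from being split across two chunks without any shared context.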

### Embed document(s) chunks

Please install the following package:
```bash
pip install -U sentence-transformers
```

In [6]:
embedder = Embedder()

Model: BAAI/bge-base-en-v1.5
Device: mps


In [7]:
embedded_chunks = embedder.embed_chunks(chunks)


In [8]:
embeddings = [chunk["embedding"] for chunk in embedded_chunks]

### Store the embedded chunk vectors in a vector index

In [9]:
faissVectorStore = FaissVectorStore(len(embeddings[0]))

In [10]:
faissVectorStore.add(embeddings)
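`FaissVectorStore` is a project wrapper whose internals are not shown here. To make the idea of a flat vector index concrete, the following pure-Python sketch stores normalized vectors and ranks them by cosine similarity with a brute-force scan. `TinyFlatIndex` is a hypothetical stand-in written for this example. It is not the FAISS API and may not match how the wrapper scores or returns results.

```python
import math


class TinyFlatIndex:
    """Hypothetical stand-in for a flat vector index; not the FAISS API."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors: list[list[float]] = []

    @staticmethod
    def _normalize(vec):
        # Scale to unit length so the dot product equals cosine similarity.
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else list(vec)

    def add(self, vectors):
        for vec in vectors:
            if len(vec) != self.dim:
                raise ValueError(f"expected dimension {self.dim}, got {len(vec)}")
            self.vectors.append(self._normalize(vec))

    def search(self, query, k: int = 3):
        # Brute-force scan: score every stored vector against the query.
        q = self._normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), idx)
            for idx, vec in enumerate(self.vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(idx, score) for score, idx in scored[:k]]


index = TinyFlatIndex(dim=2)
index.add([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
top = index.search([0.9, 0.1], k=2)
print(top)
```

The index is created with the embedding dimension, as in the cell above, so that vectors of the wrong size are rejected when they are added.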

## Online Phase

### Embed user query

In [11]:
query = "What is Advanced RAG?"

In [12]:
query_embedding = embedder.embed_query(query)

### Retrieve relevant chunks from the stored vector index

In [13]:
retrieved_chunks = faissVectorStore.search(query_embedding)

### Construct prompt

In [14]:
prompt = construct_prompt(query, retrieved_chunks, embedded_chunks)
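`construct_prompt` is defined in `src/prompt_constructor.py` and is not shown here. The sketch below illustrates one way such a function could assemble a prompt. The three instruction lines and the `[Chunk N]` labels are taken from the prompt logged later in this notebook. Everything else is an assumption for illustration: the helper name `build_prompt`, the `text` key, the use of integer chunk ids, and the trailing `Question:`/`Answer:` lines.

```python
def build_prompt(query: str, retrieved_ids: list[int], chunks: list[dict]) -> str:
    """Assemble a context-grounded prompt (hypothetical helper)."""
    # Label each retrieved chunk with its id so the context stays traceable.
    context = "\n\n".join(
        f"[Chunk {i}]\n{chunks[i]['text']}" for i in retrieved_ids
    )
    return (
        "You are an assistant.\n"
        "Answer the question using ONLY the provided context.\n"
        "The answer should be direct and compact.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"  # assumed layout, not confirmed by the source
        "Answer:"
    )


demo_store = [
    {"text": "Naive RAG follows an index, retrieve, generate flow."},
    {"text": "Advanced RAG adds pre-retrieval and post-retrieval strategies."},
]
demo_prompt = build_prompt("What is Advanced RAG?", [1, 0], demo_store)
print(demo_prompt)
```

Keeping the retrieved chunks in ranked order puts the most relevant context first in the prompt.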

### Generate answer with SLM

Please run the following commands in your terminal to access Hugging Face models.

```bash
pip install huggingface_hub
```

After installing the package, log in with an access token.

You can create a token from [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)

```bash
huggingface-cli login
```

In [15]:
lm = LM()

Device: mps
Model: google/gemma-3-1b-it


Without RAG:

In [16]:
output = lm.generate_text(query)
print(output)

Prompt Length: 7
Context Window: 32768
Max New Tokens: 256
<bos>What is Advanced RAG?

Advanced RAG (Retrieval-Augmented Generation) goes beyond basic RAG by incorporating more sophisticated techniques to improve the accuracy, consistency, and usability of generated responses. It's essentially a layered approach combining retrieval and generation, improving the model's ability to understand context and deliver relevant answers.

Here's a breakdown of key advancements in Advanced RAG:

1.  **Memory Networks/Key-Value Stores:** These networks store and retrieve relevant information from both the input prompt and the retrieved context. This allows the model to explicitly "remember" key details and provide more contextually appropriate answers.

2.  **Structured Context:** Going beyond just token-level context, Advanced RAG utilizes structured context like knowledge graphs or tables to represent information in a more organized way. This helps the model understand relationships between conc

With RAG:

In [17]:
output = lm.generate_text(prompt)
print(output)

Prompt Length: 354
Context Window: 32768
Max New Tokens: 256
<bos>
You are an assistant.
Answer the question using ONLY the provided context.
The answer should be direct and compact.

Context:
[Chunk 30]
_B. Advanced RAG_


Advanced RAG introduces specific improvements to overcome the limitations of Naive RAG. Focusing on enhancing retrieval quality, it employs pre-retrieval and post-retrieval strategies. To tackle the indexing issues, Advanced RAG refines
its indexing techniques through the use of a sliding window
approach, fine-grained segmentation, and the incorporation of
metadata. Additionally, it incorporates several optimization
methods to streamline the retrieval process [8].

[Chunk 20]
The RAG research paradigm is continuously evolving, and
we categorize it into three stages: Naive RAG, Advanced
RAG, and Modular RAG, as showed in Figure 3. Despite
RAG method are cost-effective and surpass the performance
of the native LLM, they also exhibit several limitations.
The developmen