
NestedRAG: Hierarchical Retrieval-Augmented Generation

A RAG architecture that uses hierarchical semantic chunking and graph-based context exclusion to maximize the amount of relevant information in the retrieved context while minimizing the total volume.

Introduction

Traditional RAG systems face a critical trade-off: retrieving larger chunks provides more context but includes irrelevant information, while smaller chunks are more focused but may lack necessary context. NestedRAG addresses this by optimizing the balance between relevance and chunk size.

Main Idea

NestedRAG dynamically selects optimally-sized chunks by:

  1. Hierarchical Tree Structure: Documents are recursively split into semantic units, creating a tree where each branch represents nested text segments at different granularities

  2. Branch-Level Selection: During retrieval, only one data point per branch is selected, ensuring we get the most relevant segment from each hierarchical chain

Why This Matters

  • Each query gets optimally-sized chunks based on where relevance lies in the hierarchy
  • Higher relevant-to-total information ratio
  • No nested or overlapping chunks in results
  • Retrieves from diverse document branches automatically

Drawbacks

  • Increased number of data points: the hierarchy produces on the order of s^l distinct data points to embed and index, where s is the number of semantic chunks per level and l is the number of levels.
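To make the growth concrete, here is a quick back-of-the-envelope calculation (a sketch only, not part of the library) using the default settings of 2 chunks per level and 6 levels:

```python
def node_counts(s: int, l: int) -> tuple[int, int]:
    """Leaf count and total node count for a full s-ary tree of depth l.

    Assumes s > 1 (the geometric-series formula divides by s - 1).
    """
    leaves = s ** l
    total = (s ** (l + 1) - 1) // (s - 1)  # 1 + s + s^2 + ... + s^l
    return leaves, total

print(node_counts(2, 6))  # (64, 127)
```

With the defaults, a single document yields 64 leaf chunks and 127 nodes in total, roughly double the number of embeddings a flat chunker would store for the same leaf granularity.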

Application Area

This architecture was developed for use cases where the source data is long, unstructured text, such as conversation transcripts and long-form articles.

Architecture

Hierarchical Chunking

Root (Full Document)
├── Level 1 Chunk A
│   ├── Level 2 Chunk A1
│   │   └── Level 3 Chunk A1a
│   └── Level 2 Chunk A2
└── Level 1 Chunk B
    ├── Level 2 Chunk B1
    └── Level 2 Chunk B2
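A tree like the one above can be represented as a networkx DiGraph. The sketch below is illustrative, not the library's internal code: `halve` is a naive stand-in for NestedRAG's semantic chunker, and the node-naming scheme is hypothetical.

```python
import networkx as nx

def build_tree(text, split, max_depth, graph=None, parent="root"):
    """Recursively split `text` and record the hierarchy in a DiGraph.

    `split` is any callable returning a list of sub-chunks; here it
    stands in for the semantic chunker.
    """
    if graph is None:
        graph = nx.DiGraph()
        graph.add_node(parent, text=text, level=0)
    if max_depth == 0:
        return graph
    level = graph.nodes[parent]["level"] + 1
    for i, chunk in enumerate(split(text)):
        child = f"{parent}/{i}"
        graph.add_node(child, text=chunk, level=level)
        graph.add_edge(parent, child)
        build_tree(chunk, split, max_depth - 1, graph, child)
    return graph

def halve(text):
    """Naive splitter: cut the sentence list in half (stand-in only)."""
    sents = text.split(". ")
    if len(sents) < 2:
        return []
    mid = len(sents) // 2
    return [". ".join(sents[:mid]), ". ".join(sents[mid:])]

g = build_tree("A. B. C. D.", halve, max_depth=2)
print(len(g.nodes))  # 7: root + 2 level-1 chunks + 4 level-2 chunks
```

Every root-to-leaf path is a chain of nested text segments, which is exactly the structure the retrieval step exploits.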

Retrieval Algorithm

  1. Search: Find the most semantically similar chunk via vector search
  2. Identify: Get all ancestors and descendants of this chunk in the graph
  3. Exclude: Mark these nodes as excluded for future searches
  4. Repeat: Continue until desired number of chunks retrieved
  5. Result: Return diverse chunks from different document branches
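The exclusion loop above can be sketched with networkx's ancestor/descendant queries. This is a simplified illustration, not the library's implementation: `scores` is a precomputed node-to-similarity mapping standing in for the vector search.

```python
import networkx as nx

def retrieve_with_exclusion(graph, scores, limit):
    """Greedy branch-level selection: repeatedly take the highest-scoring
    non-excluded node, then exclude its entire ancestor/descendant chain
    so no nested or overlapping chunk can be picked again."""
    excluded, picked = set(), []
    for node in sorted(scores, key=scores.get, reverse=True):
        if node in excluded:
            continue
        picked.append(node)
        excluded.add(node)
        excluded |= nx.ancestors(graph, node)    # containing segments
        excluded |= nx.descendants(graph, node)  # contained segments
        if len(picked) == limit:
            break
    return picked

# Tiny hierarchy: root -> A -> A1, root -> B
g = nx.DiGraph([("root", "A"), ("A", "A1"), ("root", "B")])
picks = retrieve_with_exclusion(
    g, {"A1": 0.9, "A": 0.8, "B": 0.5, "root": 0.1}, limit=2
)
print(picks)  # ['A1', 'B'] -- A is skipped because its descendant A1 won
```

Note how selecting A1 excludes both A (its ancestor) and root, so the second pick comes from a different branch entirely.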

Installation

Install from GitHub

Option 1: Clone and install (for development)

git clone https://github.com/darwinapps/nested-rag.git
cd nested-rag
pip install -e .

Option 2: Install directly with pip

pip install git+https://github.com/darwinapps/nested-rag.git

Dependencies

NestedRAG requires:

  • Python 3.9+
  • langchain-core
  • langchain-qdrant
  • qdrant-client
  • networkx

Quick Start

Basic Usage

from nested_rag import NestedRAG
from langchain_qdrant import QdrantVectorStore
from langchain_openai import OpenAIEmbeddings
from qdrant_client import QdrantClient

# Initialize embeddings and vector store
embeddings = OpenAIEmbeddings()
client = QdrantClient(":memory:")  # Or use persistent storage

vector_store = QdrantVectorStore(
    client=client,
    collection_name="my_documents",
    embedding=embeddings,
)

# Create NestedRAG instance
rag = NestedRAG(
    vector_store=vector_store,
    embedding=embeddings,
    max_depth=6,              # Maximum hierarchy depth
    num_semantic_chunks=2,    # Chunks per level
)

# Ingest a document
document_text = """
Your long document text here...
This could be a research paper, article, documentation, etc.
"""

rag.ingest_document(
    document_text=document_text,
    document_label="my_doc_1",
    save_graph=True,
)

# Retrieve relevant chunks
query = "What is the main contribution?"
results = rag.retrieve(
    query=query,
    limit=5,  # Number of chunks to retrieve
)

# Use results with your LLM
for i, doc in enumerate(results):
    print(f"Chunk {i+1}:")
    print(doc.page_content)
    print(f"Metadata: {doc.metadata}")
    print("-" * 80)

Advanced Usage

Custom Filtering

# Retrieve only from specific documents
results = rag.retrieve(
    query="Your query",
    limit=5,
    custom_filters={"label": "my_doc_1"},
)

Load Saved Graphs

# Load a previously saved document graph
graph = rag.load_graph("my_doc_1")
print(f"Loaded graph with {len(graph.nodes)} nodes")

Get Statistics

# View graph statistics
stats = rag.get_statistics()
print(f"Total nodes: {stats['total_nodes']}")
print(f"Actual max depth: {stats['actual_max_depth']}")
print(f"Nodes per level: {stats['nodes_per_level']}")

Configuration Options

Parameter Default Description
max_depth 6 Maximum depth of hierarchical chunking
num_semantic_chunks 2 Number of chunks to create at each level
sentence_split_regex r"(?<=[.!])\s+" Regex pattern for sentence splitting
graphs_path "./graphs" Directory for persisting graph structures
internal_node_id_field "node_id" Metadata field for internal node IDs
node_id_for_llm_field "node_id_for_llm" Metadata field for LLM-facing IDs
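The default sentence_split_regex uses a lookbehind so the terminating punctuation stays attached to each sentence. A quick standalone check with Python's re module (note the default pattern splits on `.` and `!` but not `?`; pass your own pattern if you need question marks handled):

```python
import re

SENTENCE_SPLIT_REGEX = r"(?<=[.!])\s+"  # default from the table above

text = "First sentence. Second one! Third."
parts = re.split(SENTENCE_SPLIT_REGEX, text)
print(parts)  # ['First sentence.', 'Second one!', 'Third.']
```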

Examples

See the examples/ directory for:

  • basic_usage.py: Simple end-to-end example
  • advanced_usage.ipynb: Jupyter notebook with visualizations
  • comparison.py: Comparison with traditional RAG approaches
  • multi_document.py: Handling multiple documents

API Reference

NestedRAG

__init__(...)

Initialize the NestedRAG system.

Parameters:

  • vector_store (QdrantVectorStore): Vector store for document storage
  • embedding (Embeddings): Embedding model for semantic chunking
  • graph (Optional[DiGraph]): Existing document hierarchy graph
  • max_depth (int): Maximum depth for hierarchical chunking
  • num_semantic_chunks (int): Number of chunks per level
  • Additional configuration parameters...

ingest_document(document_text, document_label, save_graph=True)

Ingest and process a document.

Parameters:

  • document_text (str): Full document text
  • document_label (str): Unique identifier for the document
  • save_graph (bool): Whether to persist graph to disk

Returns: DiGraph representing document hierarchy

retrieve(query, limit=7, offset=0, custom_filters=None)

Retrieve relevant chunks using hierarchical exclusion.

Parameters:

  • query (str): Search query
  • limit (int): Maximum number of chunks to return
  • offset (int): Number of top results to skip
  • custom_filters (dict): Optional filters for search

Returns: List of Document objects

load_graph(document_label)

Load a saved document graph from disk.

get_statistics()

Get statistics about the current graph structure.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

git clone https://github.com/darwinapps/nested-rag.git
cd nested-rag
pip install -e ".[dev]"

Running Tests

# Basic test run
pytest tests/ -v

# With coverage (requires pytest-cov)
pytest tests/ --cov=nested_rag --cov-report=term-missing

Code Style

We use black and ruff for code formatting and linting:

black nested_rag/
ruff check nested_rag/

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Developed by Gregory Potemkin at DarwinApps LLC.

The implementation builds on the open-source projects listed under Dependencies: LangChain, Qdrant, and NetworkX.

Star ⭐ this repository if you find it useful!
