
NestedRAG: Hierarchical Retrieval-Augmented Generation

A RAG architecture that uses hierarchical semantic chunking and graph-based context exclusion to maximize the amount of relevant information in the retrieved context while minimizing the total volume.

Introduction

Traditional RAG systems face a critical trade-off: retrieving larger chunks provides more context but includes irrelevant information, while smaller chunks are more focused but may lack necessary context. NestedRAG addresses this by optimizing the balance between relevance and chunk size.

Main Idea

NestedRAG dynamically selects optimally-sized chunks by:

  1. Hierarchical Tree Structure: Documents are recursively split into semantic units, creating a tree where each branch represents nested text segments at different granularities

  2. Branch-Level Selection: During retrieval, only one data point per branch is selected, ensuring we get the most relevant segment from each hierarchical chain

Why This Matters

  • Each query gets optimally-sized chunks based on where relevance lies in the hierarchy
  • Higher relevant-to-total information ratio
  • No nested or overlapping chunks in results
  • Retrieves from diverse document branches automatically

Drawbacks

  • Increased number of data points: the hierarchy produces on the order of s^l distinct data points to embed and index, where s is the number of semantic chunks per level and l is the number of levels.
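To make the growth concrete, here is a quick back-of-the-envelope calculation (a sketch only, not part of the library) using the default settings of 2 chunks per level and 6 levels:

```python
def node_counts(s: int, l: int) -> tuple[int, int]:
    """Leaf count and total node count for a full s-ary tree of depth l.

    Assumes s > 1 (the geometric-series formula divides by s - 1).
    """
    leaves = s ** l
    total = (s ** (l + 1) - 1) // (s - 1)  # 1 + s + s^2 + ... + s^l
    return leaves, total

print(node_counts(2, 6))  # (64, 127)
```

With the defaults, a single document yields 64 leaf chunks and 127 nodes in total, roughly double the number of embeddings a flat chunker would store for the same leaf granularity.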

Application Area

This architecture was developed for use cases where the source data is long, unstructured text, such as conversation transcripts and long-form articles.

Architecture

Hierarchical Chunking

Root (Full Document)
├── Level 1 Chunk A
│   ├── Level 2 Chunk A1
│   │   └── Level 3 Chunk A1a
│   └── Level 2 Chunk A2
└── Level 1 Chunk B
    ├── Level 2 Chunk B1
    └── Level 2 Chunk B2
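A tree like the one above can be represented as a networkx DiGraph. The sketch below is illustrative, not the library's internal code: `halve` is a naive stand-in for NestedRAG's semantic chunker, and the node-naming scheme is hypothetical.

```python
import networkx as nx

def build_tree(text, split, max_depth, graph=None, parent="root"):
    """Recursively split `text` and record the hierarchy in a DiGraph.

    `split` is any callable returning a list of sub-chunks; here it
    stands in for the semantic chunker.
    """
    if graph is None:
        graph = nx.DiGraph()
        graph.add_node(parent, text=text, level=0)
    if max_depth == 0:
        return graph
    level = graph.nodes[parent]["level"] + 1
    for i, chunk in enumerate(split(text)):
        child = f"{parent}/{i}"
        graph.add_node(child, text=chunk, level=level)
        graph.add_edge(parent, child)
        build_tree(chunk, split, max_depth - 1, graph, child)
    return graph

def halve(text):
    """Naive splitter: cut the sentence list in half (stand-in only)."""
    sents = text.split(". ")
    if len(sents) < 2:
        return []
    mid = len(sents) // 2
    return [". ".join(sents[:mid]), ". ".join(sents[mid:])]

g = build_tree("A. B. C. D.", halve, max_depth=2)
print(len(g.nodes))  # 7: root + 2 level-1 chunks + 4 level-2 chunks
```

Every root-to-leaf path is a chain of nested text segments, which is exactly the structure the retrieval step exploits.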

Retrieval Algorithm

  1. Search: Find the most semantically similar chunk via vector search
  2. Identify: Get all ancestors and descendants of this chunk in the graph
  3. Exclude: Mark these nodes as excluded for future searches
  4. Repeat: Continue until desired number of chunks retrieved
  5. Result: Return diverse chunks from different document branches
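The exclusion loop above can be sketched with networkx's ancestor/descendant queries. This is a simplified illustration, not the library's implementation: `scores` is a precomputed node-to-similarity mapping standing in for the vector search.

```python
import networkx as nx

def retrieve_with_exclusion(graph, scores, limit):
    """Greedy branch-level selection: repeatedly take the highest-scoring
    non-excluded node, then exclude its entire ancestor/descendant chain
    so no nested or overlapping chunk can be picked again."""
    excluded, picked = set(), []
    for node in sorted(scores, key=scores.get, reverse=True):
        if node in excluded:
            continue
        picked.append(node)
        excluded.add(node)
        excluded |= nx.ancestors(graph, node)    # containing segments
        excluded |= nx.descendants(graph, node)  # contained segments
        if len(picked) == limit:
            break
    return picked

# Tiny hierarchy: root -> A -> A1, root -> B
g = nx.DiGraph([("root", "A"), ("A", "A1"), ("root", "B")])
picks = retrieve_with_exclusion(
    g, {"A1": 0.9, "A": 0.8, "B": 0.5, "root": 0.1}, limit=2
)
print(picks)  # ['A1', 'B'] -- A is skipped because its descendant A1 won
```

Note how selecting A1 excludes both A (its ancestor) and root, so the second pick comes from a different branch entirely.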

Installation

Install from GitHub

Option 1: Clone and install (for development)

git clone https://github.com/darwinapps/nested-rag.git
cd nested-rag
pip install -e .

Option 2: Install directly with pip

pip install git+https://github.com/darwinapps/nested-rag.git

Dependencies

NestedRAG requires:

  • Python 3.9+
  • langchain-core
  • langchain-qdrant
  • qdrant-client
  • networkx

Quick Start

Basic Usage

from nested_rag import NestedRAG
from langchain_qdrant import QdrantVectorStore
from langchain_openai import OpenAIEmbeddings
from qdrant_client import QdrantClient

# Initialize embeddings and vector store
embeddings = OpenAIEmbeddings()
client = QdrantClient(":memory:")  # Or use persistent storage

vector_store = QdrantVectorStore(
    client=client,
    collection_name="my_documents",
    embedding=embeddings,
)

# Create NestedRAG instance
rag = NestedRAG(
    vector_store=vector_store,
    embedding=embeddings,
    max_depth=6,              # Maximum hierarchy depth
    num_semantic_chunks=2,    # Chunks per level
)

# Ingest a document
document_text = """
Your long document text here...
This could be a research paper, article, documentation, etc.
"""

rag.ingest_document(
    document_text=document_text,
    document_label="my_doc_1",
    save_graph=True,
)

# Retrieve relevant chunks
query = "What is the main contribution?"
results = rag.retrieve(
    query=query,
    limit=5,  # Number of chunks to retrieve
)

# Use results with your LLM
for i, doc in enumerate(results):
    print(f"Chunk {i+1}:")
    print(doc.page_content)
    print(f"Metadata: {doc.metadata}")
    print("-" * 80)

Advanced Usage

Custom Filtering

# Retrieve only from specific documents
results = rag.retrieve(
    query="Your query",
    limit=5,
    custom_filters={"label": "my_doc_1"},
)

Load Saved Graphs

# Load a previously saved document graph
graph = rag.load_graph("my_doc_1")
print(f"Loaded graph with {len(graph.nodes)} nodes")

Get Statistics

# View graph statistics
stats = rag.get_statistics()
print(f"Total nodes: {stats['total_nodes']}")
print(f"Actual max depth: {stats['actual_max_depth']}")
print(f"Nodes per level: {stats['nodes_per_level']}")

Configuration Options

Parameter Default Description
max_depth 6 Maximum depth of hierarchical chunking
num_semantic_chunks 2 Number of chunks to create at each level
sentence_split_regex r"(?<=[.!])\s+" Regex pattern for sentence splitting
graphs_path "./graphs" Directory for persisting graph structures
internal_node_id_field "node_id" Metadata field for internal node IDs
node_id_for_llm_field "node_id_for_llm" Metadata field for LLM-facing IDs
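The default sentence_split_regex uses a lookbehind so the terminating punctuation stays attached to each sentence. A quick standalone check with Python's re module (note the default pattern splits on `.` and `!` but not `?`; pass your own pattern if you need question marks handled):

```python
import re

SENTENCE_SPLIT_REGEX = r"(?<=[.!])\s+"  # default from the table above

text = "First sentence. Second one! Third."
parts = re.split(SENTENCE_SPLIT_REGEX, text)
print(parts)  # ['First sentence.', 'Second one!', 'Third.']
```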

Examples

See the examples/ directory for:

  • basic_usage.py: Simple end-to-end example
  • advanced_usage.ipynb: Jupyter notebook with visualizations
  • comparison.py: Comparison with traditional RAG approaches
  • multi_document.py: Handling multiple documents

API Reference

NestedRAG

__init__(...)

Initialize the NestedRAG system.

Parameters:

  • vector_store (QdrantVectorStore): Vector store for document storage
  • embedding (Embeddings): Embedding model for semantic chunking
  • graph (Optional[DiGraph]): Existing document hierarchy graph
  • max_depth (int): Maximum depth for hierarchical chunking
  • num_semantic_chunks (int): Number of chunks per level
  • Additional configuration parameters...

ingest_document(document_text, document_label, save_graph=True)

Ingest and process a document.

Parameters:

  • document_text (str): Full document text
  • document_label (str): Unique identifier for the document
  • save_graph (bool): Whether to persist graph to disk

Returns: DiGraph representing document hierarchy

retrieve(query, limit=7, offset=0, custom_filters=None)

Retrieve relevant chunks using hierarchical exclusion.

Parameters:

  • query (str): Search query
  • limit (int): Maximum number of chunks to return
  • offset (int): Number of top results to skip
  • custom_filters (dict): Optional filters for search

Returns: List of Document objects

load_graph(document_label)

Load a saved document graph from disk.

get_statistics()

Get statistics about the current graph structure.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

git clone https://github.com/darwinapps/nested-rag.git
cd nested-rag
pip install -e ".[dev]"

Running Tests

# Basic test run
pytest tests/ -v

# With coverage (requires pytest-cov)
pytest tests/ --cov=nested_rag --cov-report=term-missing

Code Style

We use black and ruff for code formatting and linting:

black nested_rag/
ruff check nested_rag/

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Developed by Gregory Potemkin at DarwinApps LLC.

The implementation builds on the open-source projects listed under Dependencies: LangChain, Qdrant, and NetworkX.

Star ⭐ this repository if you find it useful!
