# RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

In [1]:
# NOTE: An OpenAI API key must be set here for application initialization, even if not in use.
# If you're not utilizing OpenAI models, assign a placeholder string (e.g., "not_used").
import os

# os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["OPENAI_API_KEY"] = "not_used"
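If a real key may already be exported in your shell, a small variation of the cell above keeps it and only falls back to the placeholder. This is a minimal sketch; `os.environ.setdefault` is standard-library behaviour, and the placeholder value is the same one used above.

```python
import os

# Keep an existing key if one is already exported; otherwise use the placeholder.
os.environ.setdefault("OPENAI_API_KEY", "not_used")
print("OPENAI_API_KEY is set:", "OPENAI_API_KEY" in os.environ)
```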

In [2]:
# Cinderella story defined in sample.txt
with open("demo/sample.txt", "r") as file:
    text = file.read()

print(text[:100])

The wife of a rich man fell sick, and as she felt that her end
was drawing near, she called her only


1) **Building**: RAPTOR recursively embeds, clusters, and summarizes chunks of text to construct a tree with varying levels of summarization from the bottom up. You can create a tree from the text in 'sample.txt' using `RA.add_documents(text)`.

2) **Querying**: At inference time, the RAPTOR model retrieves information from this tree, integrating data across lengthy documents at different abstraction levels. You can perform queries on the tree with `RA.answer_question`.
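The build loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not RAPTOR's implementation: `toy_embed`, `toy_cluster`, and `toy_summarize` are hypothetical stand-ins (letter frequencies, grouping of consecutive nodes, and truncation) for the embedding model, the clustering step, and the LLM summarizer.

```python
from collections import Counter


def toy_embed(text):
    # Hypothetical stand-in for an embedding model: letter frequencies.
    counts = Counter(c for c in text.lower() if c.isalpha())
    total = sum(counts.values()) or 1
    return [counts.get(chr(ord("a") + i), 0) / total for i in range(26)]


def toy_cluster(nodes, size=3):
    # Hypothetical stand-in for clustering: group consecutive nodes.
    return [nodes[i:i + size] for i in range(0, len(nodes), size)]


def toy_summarize(texts, max_words=12):
    # Hypothetical stand-in for an LLM summary: keep the first few words.
    words = " ".join(texts).split()
    return " ".join(words[:max_words])


def build_tree(chunks, max_layers=5):
    # Layer 0 holds the leaf chunks; each later layer summarizes
    # clusters of the layer directly below it.
    layers = [[{"text": c, "embedding": toy_embed(c), "children": []} for c in chunks]]
    while len(layers) <= max_layers and len(layers[-1]) > 1:
        parents = []
        for cluster in toy_cluster(layers[-1]):
            summary = toy_summarize([node["text"] for node in cluster])
            parents.append(
                {"text": summary, "embedding": toy_embed(summary), "children": cluster}
            )
        layers.append(parents)
    return layers


chunks = [f"chunk number {i} of the story" for i in range(9)]
layers = build_tree(chunks)
print([len(layer) for layer in layers])  # node count per layer, leaves first
```

Each parent node keeps references to the nodes it summarizes, so the result can be read either as a tree or as a flat pool of nodes at mixed abstraction levels.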

### Building the tree

In [3]:
from raptor import RetrievalAugmentation

In [4]:
RA = RetrievalAugmentation()

2025-12-23 16:18:19,477 - Use pytorch device_name: mps
2025-12-23 16:18:19,477 - Load pretrained SentenceTransformer: nomic-ai/modernbert-embed-base


Start initializing RetrievalAugmentation...
Validating QA model...
QA model not provided in config
Validating embedding model...
Embedding model not provided in config
Validating summarization model...
Summarization model not provided in config
Setting TreeBuilderConfig...


2025-12-23 16:18:20,860 - Use pytorch device_name: mps
2025-12-23 16:18:20,862 - Load pretrained SentenceTransformer: nomic-ai/modernbert-embed-base


Setting TreeRetrieverConfig...
Embedding model not provided, defaulting to SBertEmbeddingModel
2 config done


You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2025-12-23 16:18:39,151 - Successfully initialized TreeBuilder with Config 
        TreeBuilderConfig:
            Tokenizer: <Encoding 'cl100k_base'>
            Max Tokens: 100
            Num Layers: 5
            Threshold: 0.5
            Top K: 5
            Selection Mode: top_k
            Summarization Length: 100
            Summarization Model: <raptor.SummarizationModels.DeepSeekSummarizationModel object at 0x3115106e0>
            Embedding Models: {'SBERT': <raptor.EmbeddingModels.SBertEmbeddingModel object at 0x311510

In [5]:
# construct the tree and corresponding retriever
RA.add_documents(text)

2025-12-23 16:29:23,732 - Creating Leaf Nodes


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

2025-12-23 16:29:28,651 - Created 35 Leaf Embeddings
2025-12-23 16:29:28,651 - Building All Nodes
2025-12-23 16:29:28,652 - Using Cluster TreeBuilder
2025-12-23 16:29:28,652 - Constructing Layer 0
2025-12-23 16:29:34,609 - Summarization Length: 100
2025-12-23 16:29:34,976 - HTTP Request: POST https://api.deepseek.com/chat/completions "HTTP/1.1 200 OK"
2025-12-23 16:29:39,768 - Node Texts Length: 1047, Summarized Text Length: 99


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

2025-12-23 16:29:40,117 - HTTP Request: POST https://api.deepseek.com/chat/completions "HTTP/1.1 200 OK"
2025-12-23 16:29:44,785 - Node Texts Length: 960, Summarized Text Length: 100


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

2025-12-23 16:29:45,086 - HTTP Request: POST https://api.deepseek.com/chat/completions "HTTP/1.1 200 OK"
2025-12-23 16:29:50,033 - Node Texts Length: 407, Summarized Text Length: 102


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

2025-12-23 16:29:50,417 - HTTP Request: POST https://api.deepseek.com/chat/completions "HTTP/1.1 200 OK"
2025-12-23 16:29:55,060 - Node Texts Length: 867, Summarized Text Length: 101


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

2025-12-23 16:29:55,276 - Constructing Layer 1
2025-12-23 16:29:55,277 - Stopping Layer construction: Cannot Create More Layers. Total Layers in tree: 1
2025-12-23 16:29:55,277 - Successfully initialized TreeRetriever with Config 
        TreeRetrieverConfig:
            Tokenizer: <Encoding 'cl100k_base'>
            Threshold: 0.5
            Top K: 5
            Selection Mode: top_k
            Context Embedding Model: SBERT
            Embedding Model: <raptor.EmbeddingModels.SBertEmbeddingModel object at 0x317791a90>
            Num Layers: None
            Start Layer: None
        


### Querying from the tree

```python
question = "your question here"  # any question about the indexed text
RA.answer_question(question)
```
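The query log in this notebook reports `Using collapsed_tree` with `Selection Mode: top_k`. A rough sketch of what such a lookup might look like, assuming "collapsed" means flattening the nodes of every layer into one pool and ranking them by cosine similarity to the question. The `embed` function, the vocabulary, and the node texts are made up for illustration and are not RAPTOR's own.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embed(text, vocab):
    # Hypothetical bag-of-words embedding over a fixed vocabulary.
    words = text.lower().split()
    return [words.count(w) for w in vocab]


def retrieve_collapsed(question, layers, vocab, top_k=2):
    # Flatten every layer into one pool, then keep the top_k most similar nodes.
    pool = [text for layer in layers for text in layer]
    q = embed(question, vocab)
    ranked = sorted(pool, key=lambda t: cosine(q, embed(t, vocab)), reverse=True)
    return ranked[:top_k]


vocab = ["cinderella", "prince", "slipper", "stepmother", "ball"]
layers = [
    [  # leaf chunks
        "cinderella lost her slipper at the ball",
        "the stepmother was cruel",
        "the prince searched with the slipper",
    ],
    [  # summary layer
        "cinderella married the prince after the slipper fit",
    ],
]
context = retrieve_collapsed("how did the prince find cinderella", layers, vocab)
print(context)
```

In this toy run the summary node ranks first, which is the point of searching leaves and summaries together: a broad question can match a high-level summary better than any single chunk.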

In [11]:
question = "How did Cinderella reach her happy ending?"

answer = RA.answer_question(question=question)

print("Answer: ", answer)

2025-12-23 16:36:45,718 - Using collapsed_tree


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Answer:  remarried the woman with two beautiful but cruel daughters.


In [7]:
# Save the tree by calling RA.save("path/to/save")
SAVE_PATH = "demo/cinderella_myself"
RA.save(SAVE_PATH)

2025-12-23 16:31:51,494 - Tree successfully saved to demo/cinderella_myself


In [None]:
# Load the saved tree by passing its path to RetrievalAugmentation

RA = RetrievalAugmentation(tree=SAVE_PATH)

2025-12-23 16:38:28,302 - Use pytorch device_name: mps
2025-12-23 16:38:28,303 - Load pretrained SentenceTransformer: nomic-ai/modernbert-embed-base


Start initializing RetrievalAugmentation...
Validating QA model...
QA model not provided in config
Validating embedding model...
Embedding model not provided in config
Validating summarization model...
Summarization model not provided in config
Setting TreeBuilderConfig...


2025-12-23 16:38:29,698 - Use pytorch device_name: mps
2025-12-23 16:38:29,698 - Load pretrained SentenceTransformer: nomic-ai/modernbert-embed-base


Setting TreeRetrieverConfig...
Embedding model not provided, defaulting to SBertEmbeddingModel
2 config done


2025-12-23 16:38:35,144 - Successfully initialized TreeBuilder with Config 
        TreeBuilderConfig:
            Tokenizer: <Encoding 'cl100k_base'>
            Max Tokens: 100
            Num Layers: 5
            Threshold: 0.5
            Top K: 5
            Selection Mode: top_k
            Summarization Length: 100
            Summarization Model: <raptor.SummarizationModels.DeepSeekSummarizationModel object at 0x333c06b10>
            Embedding Models: {'SBERT': <raptor.EmbeddingModels.SBertEmbeddingModel object at 0x1065878a0>}
            Cluster Embedding Model: SBERT
        
        Reduction Dimension: 10
        Clustering Algorithm: RAPTOR_Clustering
        Clustering Parameters: {}
        
2025-12-23 16:38:35,145 - Successfully initialized ClusterTreeBuilder with Config 
        TreeBuilderConfig:
            Tokenizer: <Encoding 'cl100k_base'>
            Max Tokens: 100
            Num Layers: 5
            Threshold: 0.5
            Top K: 5
            Selection

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (1078 > 512). Running this sequence through the model will result in indexing errors


Answer:  remarried the woman with two beautiful but cruel daughters.


In [None]:
answer = RA.answer_question(question=question)
print("Answer: ", answer)

## Using other Open Source Models for Summarization/QA/Embeddings

To use other models, such as Llama or Mistral, define your own model classes by extending RAPTOR's base classes and pass them to RAPTOR through `RetrievalAugmentationConfig`.

In [None]:
import torch
from raptor import (
    BaseSummarizationModel,
    BaseQAModel,
    BaseEmbeddingModel,
    RetrievalAugmentationConfig,
)
from transformers import AutoTokenizer, pipeline

In [None]:
# To use Gemma, you need to authenticate with Hugging Face. Skip this step if the model is already downloaded.
from huggingface_hub import login

login()

In [None]:
from transformers import AutoTokenizer, pipeline
import torch


# You can define your own Summarization model by extending the base Summarization Class.
class GEMMASummarizationModel(BaseSummarizationModel):
    def __init__(self, model_name="google/gemma-2b-it"):
        # Initialize the tokenizer and the pipeline for the GEMMA model
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.summarization_pipeline = pipeline(
            "text-generation",
            model=model_name,
            model_kwargs={"torch_dtype": torch.bfloat16},
            device=torch.device(
                "cuda" if torch.cuda.is_available() else "cpu"
            ),  # Use "cpu" if CUDA is not available
        )

    def summarize(self, context, max_tokens=150):
        # Format the prompt for summarization
        messages = [
            {
                "role": "user",
                "content": f"Write a summary of the following, including as many key details as possible: {context}:",
            }
        ]

        prompt = self.tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )

        # Generate the summary using the pipeline
        outputs = self.summarization_pipeline(
            prompt,
            max_new_tokens=max_tokens,
            do_sample=True,
            temperature=0.7,
            top_k=50,
            top_p=0.95,
        )

        # Extract the generated summary, dropping the prompt that the
        # text-generation pipeline echoes back by default
        summary = outputs[0]["generated_text"][len(prompt) :].strip()
        return summary

In [None]:
class GEMMAQAModel(BaseQAModel):
    def __init__(self, model_name="google/gemma-2b-it"):
        # Initialize the tokenizer and the pipeline for the model
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.qa_pipeline = pipeline(
            "text-generation",
            model=model_name,
            model_kwargs={"torch_dtype": torch.bfloat16},
            device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
        )

    def answer_question(self, context, question):
        # Apply the chat template for the context and question
        messages = [
            {
                "role": "user",
                "content": f"Given Context: {context} Give the best full answer amongst the option to question {question}",
            }
        ]
        prompt = self.tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )

        # Generate the answer using the pipeline
        outputs = self.qa_pipeline(
            prompt,
            max_new_tokens=256,
            do_sample=True,
            temperature=0.7,
            top_k=50,
            top_p=0.95,
        )

        # Extracting and returning the generated answer
        answer = outputs[0]["generated_text"][len(prompt) :]
        return answer
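The QA cell slices the output with `[len(prompt) :]` because Hugging Face text-generation pipelines return the prompt followed by the completion by default (`return_full_text=True`). A small sketch of that post-processing step, using a hand-written `fake_outputs` list in place of a real pipeline call; `strip_prompt` is a hypothetical helper, not part of RAPTOR or `transformers`.

```python
def strip_prompt(prompt, outputs):
    # `outputs` mimics the list-of-dicts shape returned by a text-generation pipeline.
    generated = outputs[0]["generated_text"]
    # Drop the echoed prompt only if it is actually present.
    if generated.startswith(prompt):
        generated = generated[len(prompt):]
    return generated.strip()


prompt = "<start_of_turn>user\nSummarize: ...<end_of_turn>\n<start_of_turn>model\n"
fake_outputs = [{"generated_text": prompt + " Cinderella marries the prince. "}]
print(strip_prompt(prompt, fake_outputs))
```

Checking `startswith` before slicing keeps the helper safe if the pipeline is later called with `return_full_text=False`, in which case the prompt is not echoed.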

In [None]:
from sentence_transformers import SentenceTransformer


class SBertEmbeddingModel(BaseEmbeddingModel):
    def __init__(self, model_name="sentence-transformers/multi-qa-mpnet-base-cos-v1"):
        self.model = SentenceTransformer(model_name)

    def create_embedding(self, text):
        return self.model.encode(text)

In [None]:
RAC = RetrievalAugmentationConfig(
    summarization_model=GEMMASummarizationModel(),
    qa_model=GEMMAQAModel(),
    embedding_model=SBertEmbeddingModel(),
)
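For a quick check of the wiring without downloading Gemma, lightweight stand-ins with the same method names (`summarize`, `answer_question`, `create_embedding`) can be handy. The classes below are hypothetical toys and are written without the RAPTOR base classes so that they run on their own; to pass them to `RetrievalAugmentationConfig` they would presumably need to extend `BaseSummarizationModel`, `BaseQAModel`, and `BaseEmbeddingModel` as the classes above do.

```python
import hashlib


class StubSummarizationModel:
    """Toy summarizer: keeps the first `max_tokens` whitespace-separated words."""

    def summarize(self, context, max_tokens=150):
        return " ".join(context.split()[:max_tokens])


class StubQAModel:
    """Toy QA model: returns the sentence sharing the most words with the question."""

    def answer_question(self, context, question):
        q_words = set(question.lower().split())
        sentences = [s.strip() for s in context.split(".") if s.strip()]
        return max(sentences, key=lambda s: len(q_words & set(s.lower().split())))


class StubEmbeddingModel:
    """Toy embedding: hashes each word into one of `dim` buckets."""

    def __init__(self, dim=8):
        self.dim = dim

    def create_embedding(self, text):
        vector = [0.0] * self.dim
        for word in text.lower().split():
            bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % self.dim
            vector[bucket] += 1.0
        return vector


context = "Cinderella lost her slipper. The prince found the slipper. They married."
print(StubSummarizationModel().summarize(context, max_tokens=4))
print(StubQAModel().answer_question(context, "Who found the slipper"))
print(StubEmbeddingModel().create_embedding("glass slipper"))
```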

In [None]:
RA = RetrievalAugmentation(config=RAC)

In [None]:
with open("demo/sample.txt", "r") as file:
    text = file.read()

RA.add_documents(text)

In [None]:
question = "How did Cinderella reach her happy ending?"

answer = RA.answer_question(question=question)

print("Answer: ", answer)