# Hybrid RAG Model Workflow

### **Models:**
1. **Summarization Model:** LLM (phi-2) for generating concise summaries.
2. **Embedding Model:** SentenceTransformer (BAAI/bge-small-en-v1.5) for dense vector embeddings.
3. Model Quantization(fp16) and GPU optimization to make it scalable for large documents.

### **Document Processing:**
- Chunk documents into smaller text segments using separators=["\n\n", "\n", ".", " ", ""]


### **Tree Construction (Bottom-Up):**
- Group text chunks consecutively.
- Continue grouping until each document has a single root node.  
- **Result:** `d` root nodes for `d` documents.


### **Raptor Call (Inter-Document Grouping):**
- Use the `d` root nodes as input.
- Group until convergence, i.e., until there is a single root node or no more merging can be done.


### **Retrieval Call:**
- Compare query `q` embedding with `d` root node embeddings.
- Select top `k` most similar root nodes.
- Perform **bottom-up BFS** from leaf nodes of the selected `k` nodes until a common parent node is reached or no parent remains.
- **Retrieved Context:** Concatenate the retrieved text to preserve original ordering.


# Part 1: Contiguous Grouping(Intra Part)

###  ChromaDB Initialization and Collection Setup

In [1]:
!pip install -q chromadb > /dev/null 2>&1

In [2]:
import chromadb

# Correct way to initialize client with persistence
client = chromadb.PersistentClient(path="/kaggle/working/chroma_storage")

# Create or get collection
collection = client.get_or_create_collection(name="hierarchy_nodes")

### Model and Pipeline Initialization

In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from sentence_transformers import SentenceTransformer
import torch

model_id = "microsoft/phi-2"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load model with `accelerate`-based device mapping
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    torch_dtype=torch.float16,  
    device_map="auto"  # Auto-distribute model across available devices
)

# Create text generation pipeline without specifying `device`
pipe = pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer
)

# Load embedding model on GPU
embedding_model = SentenceTransformer("BAAI/bge-small-en-v1.5", device="cuda")

2025-05-13 04:41:22.096832: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1747111282.295077      35 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1747111282.352060      35 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


tokenizer_config.json:   0%|          | 0.00/7.34k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.11M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/1.08k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/99.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/735 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/35.7k [00:00<?, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/564M [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

Device set to use cuda:0


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/94.8k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/52.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/743 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/133M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/366 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [4]:
!pip install -U langchain-community

Collecting langchain-community
  Downloading langchain_community-0.3.24-py3-none-any.whl.metadata (2.5 kB)
Collecting langchain-core<1.0.0,>=0.3.59 (from langchain-community)
  Downloading langchain_core-0.3.59-py3-none-any.whl.metadata (5.9 kB)
Collecting langchain<1.0.0,>=0.3.25 (from langchain-community)
  Downloading langchain-0.3.25-py3-none-any.whl.metadata (7.8 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.9.1-py3-none-any.whl.metadata (3.8 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting langchain-text-splitters<1.0.0,>=0.3.8 (from langchain<1.0.0,>=0.3.25->langchain-community)
  Downloading langchain_text_splitters-0.3.8-py3-none-any.whl.metadata (1.9 kB)
Collecting packaging<25,>=23.2 (from langchain-core<1.0.0,>=0.3.59->langchain-community)
  Downloading packaging-24.2-py3-none-any.whl.metadata (3.2 kB)
Downloading langcha

### Document Initialization

In [5]:
from langchain.document_loaders import WebBaseLoader

def load_multiple_docs(url_list):
    loader = WebBaseLoader(url_list)
    return loader.load()

urls = [
    "https://python.langchain.com/docs/introduction/", 
    "https://python.langchain.com/docs/tutorials/",
     "https://python.langchain.com/docs/concepts/" ,
    "https://python.langchain.com/docs/how_to/"
]

docs = load_multiple_docs(urls)
docs_texts = [doc.page_content for doc in docs]

print(f"Number of docs loaded: {len(docs)}")

Number of docs loaded: 4


### Document Chunking

In [6]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_docs_into_chunks(docs, chunk_size=500, chunk_overlap=100):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        separators=["\n\n", "\n", ".", " ", ""]
    )

    all_chunks = []
    for doc in docs:
        chunks = splitter.split_text(doc.page_content)
        all_chunks.append(chunks)

    return all_chunks

chunked_docs_2d = split_docs_into_chunks(docs)

In [7]:
(len(chunked_docs_2d))

4

- The chunking is based on recursive splitting using specified separators (["\n\n", "\n", ".", " ", ""]), aiming to keep chunks within a maximum length of chunk_size (e.g., 500 characters).  
- Overlap of chunk_overlap (e.g., 100 characters) ensures contextual continuity between consecutive chunks.

### Core Part 1 Algo Logic

Features:  
1. **Efficient Summarization**: Uses summarize_with_llm() for fast, low-latency summary generation with no gradient tracking.
2. **Flexible Node Representation**: TreeNode class captures hierarchy with child, parent, and document index management.
3. **Persistent Storage**: Nodes are stored in ChromaDB with full metadata, enabling long-term vector persistence.
4. **Adaptive Grouping**: Dynamically adjusts grouping size using get_adaptive_group_size() for efficient tree construction.
5. **Optimized Embedding Management**: Efficient GPU embedding with memory cleanup to prevent CUDA out-of-memory errors.
6. **Memory Optimization**: Frequent cache clearing to reduce GPU memory pressure during intensive processing.

In [8]:
from typing import List, Optional
import numpy as np
import math
import torch
import json  

def summarize_with_llm(texts):
    prompt = "Summarize all the important points in shortest way possible for the following text:\n\n" + "\n\n".join(texts) + "\n\nSummary:"
    with torch.no_grad():
        # Disable gradient tracking
        response = pipe(prompt, max_new_tokens=150, do_sample=False)  
        
    torch.cuda.empty_cache()  
    return response[0]["generated_text"].strip()

class TreeNode:
    def __init__(self, index: int, children: List[int], summary_text: str, embedding, parent_index: Optional[int] = None, doc_index: Optional[int] = None):
        self.index = index
        self.children = children  # list of child indices
        self.summary_text = summary_text
        self.embedding = embedding
        self.parent_index = parent_index  # None if root node
        self.doc_index = doc_index  # Only assigned for leaf nodes

def store_node_in_chroma(node: TreeNode, doc_id: int):
    collection.add(
        ids=[f"doc{doc_id}_node{node.index}"],
        embeddings=[node.embedding],
        documents=[node.summary_text],
        metadatas=[{
            "node_index": node.index,
            "doc_id": doc_id,
            "children": json.dumps(node.children),  # Convert list to JSON string
            "parent_index": node.parent_index,
            "doc_index": node.doc_index  
        }]
    )

def get_adaptive_group_size(num_chunks: int, min_group: int = 2, max_group: int = 8):
    return max(min_group, min(max_group, int(math.log2(num_chunks + 1))))

# Embedding function
def embed_text(text):
    embedding = embedding_model.encode(text)
    embedding = torch.tensor(embedding, dtype=torch.float16, device="cuda")  # Move to GPU
    embedding = embedding.cpu().numpy()  # Move back to CPU
    torch.cuda.empty_cache()  # Free unused memory
    return embedding
    
def build_consecutive_hierarchy_tree(chunks: List[str], group_size: int = 2, start_index: int = 0, doc_id: int = 0):
    """
    Build a hierarchy tree from consecutive text chunks (no clustering),
    and store each node in a Chroma vector store.
    Returns: root node index and list of all nodes
    """
    all_nodes = []
    current_level_nodes = []

    print(f"\n Building hierarchy for Document {doc_id}")
    print(f"Level 0 (leaf level): {len(chunks)} chunks")

    # Step 1: Create leaf nodes
    for text in chunks:
        embedding = embed_text(text)
        node = TreeNode(index=start_index, children=[], summary_text=text, embedding=embedding, parent_index=-1, 
            doc_index=doc_id)  # Assign document index for leaf nodes)
        all_nodes.append(node)
        current_level_nodes.append(node)
        
        store_node_in_chroma(node, doc_id)  # ⬅ Store in Chroma
        start_index += 1

    # Step 2: Build higher levels by grouping consecutive nodes
    level = 1
    while len(current_level_nodes) > 1:
        new_level_nodes = []
        print(f"Level {level}: Processing {len(current_level_nodes)} nodes → grouping into {((len(current_level_nodes)-1)//group_size)+1} nodes")
        for i in range(0, len(current_level_nodes), group_size):
            group_nodes = current_level_nodes[i:i + group_size]
            group_texts = [n.summary_text for n in group_nodes]
            summary = summarize_with_llm(group_texts)
            summary_embedding = embed_text(summary)
            child_indices = [n.index for n in group_nodes]
            parent_index = start_index  # Index of the new node being created

            # Update parent index in child nodes
            for child in group_nodes:
                child.parent_index = parent_index
                
            node = TreeNode(index=parent_index, children=child_indices, summary_text=summary, embedding=summary_embedding,parent_index=-1, 
                doc_index=-1 ) # Higher-level nodes have `doc_index=None`)
            all_nodes.append(node)
            new_level_nodes.append(node)
            
            store_node_in_chroma(node, doc_id)  # ⬅ Store in Chroma
            start_index += 1
            
            # Free memory after storing embeddings
            del summary_embedding  
            torch.cuda.empty_cache()
            
        current_level_nodes = new_level_nodes
        level += 1

    torch.cuda.empty_cache()
    root_node = current_level_nodes[0]
    print(f" Hierarchy built for Document {doc_id} → Final root node index: {root_node.index}\n")
    return root_node, all_nodes

def build_hierarchy_trees_for_documents(chunked_docs_2d: List[List[str]]):
    """
    For each document (a list of chunks), build a bottom-up hierarchy tree using consecutive grouping.
    Returns a list of root nodes and list of all nodes across all trees.
    """
    all_nodes = []
    root_nodes = []
    node_index_counter = 0

    for doc_id, doc_chunks in enumerate(chunked_docs_2d):
        adaptive_group_size = get_adaptive_group_size(len(doc_chunks))
        print(f"Adaptive group size for Document {doc_id}: {adaptive_group_size}")
        
        root_node, doc_nodes = build_consecutive_hierarchy_tree(doc_chunks, group_size=adaptive_group_size, start_index=node_index_counter, doc_id=doc_id)
        root_nodes.append(root_node)
        all_nodes.extend(doc_nodes)
        node_index_counter = all_nodes[-1].index + 1  # update for global index

        #  Clear GPU cache after each document
        # torch.cuda.empty_cache()
    
    return root_nodes, all_nodes

In [9]:
import torch

def print_gpu_memory():
    if torch.cuda.is_available():
        print(f"GPU Memory Usage:")
        print(f"Allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
        print(f"Cached: {torch.cuda.memory_reserved() / 1024**3:.2f} GB")
    else:
        print("CUDA is not available.")

print_gpu_memory()

GPU Memory Usage:
Allocated: 5.30 GB
Cached: 5.32 GB


In [10]:
root_nodes, all_nodes = build_hierarchy_trees_for_documents(chunked_docs_2d)

Adaptive group size for Document 0: 5

 Building hierarchy for Document 0
Level 0 (leaf level): 35 chunks


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Level 1: Processing 35 nodes → grouping into 7 nodes


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (3236 > 2048). Running this sequence through the model will result in indexing errors
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (2048). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.


Level 2: Processing 7 nodes → grouping into 2 nodes


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Level 3: Processing 2 nodes → grouping into 1 nodes


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

 Hierarchy built for Document 0 → Final root node index: 44

Adaptive group size for Document 1: 4

 Building hierarchy for Document 1
Level 0 (leaf level): 28 chunks


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Level 1: Processing 28 nodes → grouping into 7 nodes


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Level 2: Processing 7 nodes → grouping into 2 nodes


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Level 3: Processing 2 nodes → grouping into 1 nodes


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

 Hierarchy built for Document 1 → Final root node index: 82

Adaptive group size for Document 2: 5

 Building hierarchy for Document 2
Level 0 (leaf level): 46 chunks


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Level 1: Processing 46 nodes → grouping into 10 nodes


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Level 2: Processing 10 nodes → grouping into 2 nodes


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Level 3: Processing 2 nodes → grouping into 1 nodes


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

 Hierarchy built for Document 2 → Final root node index: 141

Adaptive group size for Document 3: 5

 Building hierarchy for Document 3
Level 0 (leaf level): 58 chunks


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Level 1: Processing 58 nodes → grouping into 12 nodes


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Level 2: Processing 12 nodes → grouping into 3 nodes


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Level 3: Processing 3 nodes → grouping into 1 nodes


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

 Hierarchy built for Document 3 → Final root node index: 215



In [11]:
print_gpu_memory()

GPU Memory Usage:
Allocated: 5.31 GB
Cached: 5.34 GB


### Data Structure Display

In [12]:
# Print all root node indices
print("All root node indices:", [node.index for node in root_nodes])

All root node indices: [44, 82, 141, 215]


- Number of chunks to group
- Chunk splititng strategy
- Number of levels

In [13]:
def display_tree_from_roots(root_nodes, all_nodes):
    """
    Display the tree structure from each root node.
    Only prints node indices and their children.
    """
    index_to_node = {node.index: node for node in all_nodes}

    def traverse(node, depth=0):
        indent = "  " * depth
        print(f"{indent}- Node {node.index}")
        for child_idx in node.children:
            child_node = index_to_node.get(child_idx)
            if child_node:
                traverse(child_node, depth + 1)

    for root in root_nodes:
        print(f"\n Tree starting from root Node {root.index}")
        traverse(root)
        
display_tree_from_roots(root_nodes, all_nodes)


 Tree starting from root Node 44
- Node 44
  - Node 42
    - Node 35
      - Node 0
      - Node 1
      - Node 2
      - Node 3
      - Node 4
    - Node 36
      - Node 5
      - Node 6
      - Node 7
      - Node 8
      - Node 9
    - Node 37
      - Node 10
      - Node 11
      - Node 12
      - Node 13
      - Node 14
    - Node 38
      - Node 15
      - Node 16
      - Node 17
      - Node 18
      - Node 19
    - Node 39
      - Node 20
      - Node 21
      - Node 22
      - Node 23
      - Node 24
  - Node 43
    - Node 40
      - Node 25
      - Node 26
      - Node 27
      - Node 28
      - Node 29
    - Node 41
      - Node 30
      - Node 31
      - Node 32
      - Node 33
      - Node 34

 Tree starting from root Node 82
- Node 82
  - Node 80
    - Node 73
      - Node 45
      - Node 46
      - Node 47
      - Node 48
    - Node 74
      - Node 49
      - Node 50
      - Node 51
      - Node 52
    - Node 75
      - Node 53
      - Node 54
      - Node 55
      - No

In [14]:
# Create a mapping dictionary
root_node_mapping = {node.index: i for i, node in enumerate(root_nodes)}

# Print the mapping
print("Root node mapping:", root_node_mapping)

Root node mapping: {44: 0, 82: 1, 141: 2, 215: 3}


# Part 2: Raptor Part(Inter Part)

In [1]:
!pip install -q umap-learn scikit-learn langchain-google-genai faiss-cpu  
!pip install -U langchain-community

[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf 24.8.3 requires cubinlinker, which is not installed.
cudf 24.8.3 requires cupy-cuda11x>=12.0.0, which is not installed.
cudf 24.8.3 requires ptxcompiler, which is not installed.
cuml 24.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
dask-cudf 24.8.3 requires cupy-cuda11x>=12.0.0, which is not installed.
cudf 24.8.3 requires cuda-python<12.0a0,>=11.7.1, but you have cuda-python 12.6.0 which is incompatible.
distributed 2024.7.1 requires dask==2024.7.1, but you have dask 2024.9.0 which is incompatible.
google-cloud-bigquery 2.34.4 requires packaging<22.0dev,>=14.3, but you have packaging 24.2 which is incompatible.
google-generativeai 0.8.1 requires google-ai-generativelanguage==0.6.9, but you have google-ai-generativelanguage 0.6.17 which is incompatible.
jupyterlab 4.2.5 requires jupyter-lsp>

In [4]:
import os
from huggingface_hub import login

# Try retrieving API key from environment variables
hf_token = os.getenv("HUGGINGFACE_API_KEY")

if not hf_token:
    try:
        # If running on Kaggle, try retrieving from Kaggle Secrets
        from kaggle_secrets import UserSecretsClient
        user_secrets = UserSecretsClient()
        hf_token = user_secrets.get_secret("HUGGINGFACE_API_KEY")
    except Exception:
        raise ValueError("Hugging Face API Key Missing! Add it to environment variables or Kaggle Secrets.")

# Perform Hugging Face Login
login(token=hf_token)
print("Successfully logged into Hugging Face!")

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /root/.cache/huggingface/token
Login successful
✅ Successfully logged into Hugging Face!


In [6]:
pip install -q sentence-transformers

  pid, fd = os.forkpty()


Note: you may need to restart the kernel to use updated packages.


### Core Part 2 Algo Logic

In [17]:
from sklearn.mixture import GaussianMixture
import umap
import numpy as np
import faiss 
import torch  
from sentence_transformers import SentenceTransformer
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from transformers import pipeline

hf_embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")

# Main Inter-Document Tree Builder
def build_bottom_up_tree_inter_from_intra_roots(root_nodes_intra, max_levels=5):
    """
    Builds an inter-document hierarchy (RAPTOR) using intra-document root summaries.
    Each leaf in the inter-doc tree represents one document (its intra-root summary).
    """
    all_nodes_inter = []
    current_level_nodes = []
    node_index_counter = 0

    print(f"\n Using {len(root_nodes_intra)} intra roots as inter-doc leaves.")

    # Step 1: Leaf Nodes from Intra Roots
    for doc_idx, root in enumerate(root_nodes_intra):
        embedding = embed_text(root.summary_text)
        node = TreeNode(
            index=node_index_counter,
            children=[],
            summary_text=root.summary_text,
            embedding=embedding,
            parent_index=-1,
            doc_index=doc_idx  # Set for linkage
        )
        all_nodes_inter.append(node)
        current_level_nodes.append(node)
        node_index_counter += 1

    # Step 2: Hierarchical Clustering
    for level in range(max_levels):
        print(f"\n Inter-Raptor - Building level {level + 1}...")

        if len(current_level_nodes) <= 2:
            print(" Too few nodes left. Stopping.")
            break

        embeddings = np.array([n.embedding for n in current_level_nodes])
        reduced_embeddings = (
            umap.UMAP(n_components=2, n_neighbors=5, metric="cosine").fit_transform(embeddings)
            if len(current_level_nodes) >= 10 else embeddings
        )

        n_clusters = min(len(current_level_nodes) // 2, 10)
        if len(current_level_nodes) <= n_clusters:
            break

        gmm = GaussianMixture(n_components=n_clusters, random_state=42)
        labels = gmm.fit_predict(reduced_embeddings)

        new_level_nodes = []
        for cluster_id in range(n_clusters):
            cluster_idxs = np.where(labels == cluster_id)[0]
            if len(cluster_idxs) == 0:
                continue

            cluster_nodes = [current_level_nodes[i] for i in cluster_idxs]
            cluster_texts = [n.summary_text for n in cluster_nodes]
            summary = summarize_with_llm(cluster_texts)
            summary_embedding = embed_text(summary)
            child_indices = [n.index for n in cluster_nodes]

            # Update parent index in children
            for child in cluster_nodes:
                child.parent_index = node_index_counter

            parent_node = TreeNode(
                index=node_index_counter,
                children=child_indices,
                summary_text=summary,
                embedding=summary_embedding,
                parent_index=-1,
                doc_index=-1
            )
            all_nodes_inter.append(parent_node)
            new_level_nodes.append(parent_node)
            node_index_counter += 1

            #  Memory cleanup
            del summary_embedding
            torch.cuda.empty_cache()

        current_level_nodes = new_level_nodes
        if len(current_level_nodes) == 1:
            print(" Inter Root Found.")
            break

    # FAISS Index
    print("\n Storing inter-document nodes in FAISS...")
    faiss_store_inter = FAISS.from_texts([n.summary_text for n in all_nodes_inter], hf_embeddings)
    print(" FAISS Inter-Document Index Built!")

    return current_level_nodes, all_nodes_inter, faiss_store_inter


# Function: Display Tree Hierarchy
def display_tree_full(root_nodes_inter, all_nodes_inter, root_nodes_intra=None, all_nodes_intra=None):
    """
    Display the full tree: inter-document (Raptor) → intra-document (hierarchical) nodes.
    Works if intra roots are indexed with doc_index for matching.
    """
    index_to_node_inter = {node.index: node for node in all_nodes_inter}
    index_to_node_intra = {node.index: node for node in all_nodes_intra} if all_nodes_intra else {}

    def traverse_inter(node, depth=0):
        indent = "  " * depth
        print(f"{indent}- [INTER] Node {node.index} → Children: {node.children}")

        # Traverse children (inter-doc Raptor tree)
        for child_idx in node.children:
            child_node = index_to_node_inter.get(child_idx)
            if child_node:
                traverse_inter(child_node, depth + 1)

        # If this is a leaf inter-node and linked to intra-root
        if not node.children and root_nodes_intra:
            for intra_root in root_nodes_intra:
                if getattr(intra_root, "doc_index", None) == node.doc_index:
                    print(f"{indent}  ↳ [INTRA Tree for doc {intra_root.doc_index}]")
                    traverse_intra(intra_root, depth + 2)

    def traverse_intra(node, depth=0):
        indent = "  " * depth
        print(f"{indent}- [INTRA] Node {node.index} → Children: {node.children}")
        for child_idx in node.children:
            child_node = index_to_node_intra.get(child_idx)
            if child_node:
                traverse_intra(child_node, depth + 1)

    for root in root_nodes_inter:
        print(f"\n Inter-Doc Tree from Root Node {root.index}")
        traverse_inter(root)


  hf_embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")


In [20]:
root_nodes_inter, all_nodes_inter, faiss_store_inter = build_bottom_up_tree_inter_from_intra_roots(root_nodes)


📦 Using 4 intra roots as inter-doc leaves.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]


🔄 Inter-Raptor - Building level 1...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Batches:   0%|          | 0/1 [00:00<?, ?it/s]


🔄 Inter-Raptor - Building level 2...
✅ Too few nodes left. Stopping.

🔄 Storing inter-document nodes in FAISS...
✅ FAISS Inter-Document Index Built!


In [21]:
display_tree_full(
    root_nodes_inter=root_nodes_inter,
    all_nodes_inter=all_nodes_inter,
    root_nodes_intra=root_nodes,
    all_nodes_intra=all_nodes
)


🌐 Inter-Doc Tree from Root Node 4
- [INTER] Node 4 → Children: [0, 2, 3]
  - [INTER] Node 0 → Children: []
  - [INTER] Node 2 → Children: []
  - [INTER] Node 3 → Children: []

🌐 Inter-Doc Tree from Root Node 5
- [INTER] Node 5 → Children: [1]
  - [INTER] Node 1 → Children: []


# Part 3: Hierarchical Retrieval

In [32]:
from typing import List
import torch
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from collections import defaultdict, deque

def cosine_sim(query_embedding, node_embedding):
    return float(cosine_similarity([query_embedding], [node_embedding])[0][0])

def hierarchical_retrieve_combined(# uses intra and inter parts both
    query: str,
    root_nodes_inter: List[TreeNode],
    all_nodes_inter: List[TreeNode],
    root_nodes_intra: List[TreeNode],
    all_nodes_intra: List[TreeNode],
    similarity_threshold: float = 0.5,
    top_k: int = 3
):
    # Step 1: Embed query
    query_embedding = embed_text(query)

    # Step 2: Score inter-document (Raptor) root nodes
    inter_scores = [(cosine_sim(query_embedding, node.embedding), node) for node in root_nodes_inter]

    print("\n Inter-Document Similarity Scores:")
    for score, node in inter_scores:
        print(f"  Inter Root Node {node.index}: {score:.4f}")

    # Step 3: Select top-k inter roots
    inter_scores.sort(reverse=True, key=lambda x: x[0])
    selected_inter_roots = [node for sim, node in inter_scores[:top_k]]
    selected_inter_indices = set(node.index for node in selected_inter_roots)
    print(f"\n Top-{top_k} Inter Root Nodes: {list(selected_inter_indices)}")

    # Step 4: Map doc_index from inter-leaf to intra-roots
    inter_leaf_doc_indices = set()
    inter_node_map = {n.index: n for n in all_nodes_inter}
    for root in selected_inter_roots:
        stack = [root]
        while stack:
            node = stack.pop()
            if not node.children:
                if node.doc_index is not None:
                    inter_leaf_doc_indices.add(node.doc_index)
            else:
                stack.extend(inter_node_map[c] for c in node.children)

    # Step 5: Identify matching intra-doc roots
    matched_intra_roots = [n for n in root_nodes_intra if n.doc_index in inter_leaf_doc_indices]
    print(f"\n Matching Intra Roots from Inter Leafs: {[n.index for n in matched_intra_roots]}")

    # Step 6: Retrieve leaf nodes from intra trees
    node_map_intra = {n.index: n for n in all_nodes_intra}
    child_to_parent = defaultdict(list)
    for node in all_nodes_intra:
        for child in node.children:
            child_to_parent[child].append(node.index)

    leaf_nodes = [node for node in all_nodes_intra if not node.children and node.doc_index in inter_leaf_doc_indices]
    selected_leaf_nodes = [n for n in leaf_nodes if cosine_sim(query_embedding, n.embedding) >= similarity_threshold]

    print(f"\n Matching Intra Leaf Nodes: {[n.index for n in selected_leaf_nodes]}")
    selected_leaf_nodes.sort(key=lambda x: x.index)
    retrieved_text = "\n".join([n.summary_text for n in selected_leaf_nodes])

    # Step 7: Propagate upward
    current = set(n.index for n in selected_leaf_nodes)
    visited = set(current)

    while True:
        parent_candidates = set()
        for child in current:
            parent_candidates.update(child_to_parent.get(child, []))

        if not parent_candidates:
            break

        new_parents = []
        for p in parent_candidates:
            if p not in visited:
                parent_node = node_map_intra[p]
                sim = cosine_sim(query_embedding, parent_node.embedding)
                if sim >= similarity_threshold:
                    new_parents.append(parent_node)

        if not new_parents:
            break

        print(f"\n Promoting Parent Nodes: {[n.index for n in new_parents]}")
        current = set(n.index for n in new_parents)
        visited.update(current)

    return retrieved_text

## uses just intra part(initial code part, used for debugging)
def hierarchical_retrieve(query: str, root_nodes: List[TreeNode], all_nodes: List[TreeNode], root_node_mapping: dict, top_k: int = 3, similarity_threshold: float = 0.5):
    # Step 1: Embed the query
    query_embedding = embed_text(query)

    # Step 2: Select top-k root nodes by cosine similarity
    root_scores = [(cosine_sim(query_embedding, node.embedding), node) for node in root_nodes]

    # Print all root node scores before sorting
    print("\n Similarity Scores for All Root Nodes:")
    for score, node in root_scores:
        print(f"Root Node {node.index}: {score:.4f}")
    
    root_scores.sort(reverse=True, key=lambda x: x[0])
    selected_roots = [node for sim, node in root_scores[:top_k]]
    selected_root_indices = set(node.index for node in selected_roots)
    print(f"\n Selected Top-{top_k} Root Nodes (similarity ≥ {similarity_threshold}): {[node.index for node in selected_roots]}")

    # Step 3: Map node index to node
    node_map = {node.index: node for node in all_nodes}

    # Step 4: Map selected root node indices to document indices
    selected_doc_indices = set(root_node_mapping[root_index] for root_index in selected_root_indices if root_index in root_node_mapping)
    print(f" Mapped Root Nodes to Document Indices: {selected_doc_indices}")

    # Step 5: Build a reverse child-to-parent map
    child_to_parent = defaultdict(list)
    for node in all_nodes:
        for child in node.children:
            child_to_parent[child].append(node.index)

    # Step 6: Filter eligible leaf nodes under selected root documents
    leaf_nodes = [node for node in all_nodes if not node.children and node.doc_index in selected_doc_indices]

    # Step 7: Select leaf nodes by similarity threshold
    selected_leaf_nodes = [node for node in leaf_nodes if cosine_sim(query_embedding, node.embedding) >= similarity_threshold]
    selected_leaf_indices = set(node.index for node in selected_leaf_nodes)

    # Sort by node index for ordered output
    selected_leaf_nodes.sort(key=lambda n: n.index)
    retrieved_text = "\n".join([n.summary_text for n in selected_leaf_nodes])
    print(f" Selected Leaf Nodes (under selected roots): {[n.index for n in selected_leaf_nodes]}")

    # Step 8: Upward traversal - propagate through parent levels
    current_selected = selected_leaf_indices
    visited = set(current_selected)

    while True:
        parent_candidates = set()
        for child_idx in current_selected:
            parent_idxs = child_to_parent.get(child_idx, [])
            parent_candidates.update(parent_idxs)

        if not parent_candidates:
            print(" No more parents found — stopping traversal.")
            break

        parent_nodes = [node_map[i] for i in parent_candidates if i not in visited]
        parent_scores = [(cosine_sim(query_embedding, node.embedding), node) for node in parent_nodes]
        parent_nodes_filtered = [node for sim, node in parent_scores if sim >= similarity_threshold]

        if not parent_nodes_filtered:
            print(" No more relevant parents (filtered by threshold) — stopping traversal.")
            break

        print(f" Next Level Parent Nodes: {[n.index for n in parent_nodes_filtered]}")

        current_selected = set(n.index for n in parent_nodes_filtered)
        visited.update(current_selected)

    return retrieved_text

In [24]:
root_node_mapping

{44: 0, 82: 1, 141: 2, 215: 3}

In [29]:
retrieved_text = hierarchical_retrieve(# this code is hierarchical_retrieve; uses just intra part while hierarhical retrieve combined uses intra and inter part)
    "How to reduce security risks in cloud platforms?",
    root_nodes,
    all_nodes,
    root_node_mapping,
    top_k=3,
    similarity_threshold=0.55
)

print("\n Final Retrieved Text:\n", retrieved_text)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]


📊 Similarity Scores for All Root Nodes:
Root Node 44: 0.4756
Root Node 82: 0.4898
Root Node 141: 0.4783
Root Node 215: 0.4902

📌 Selected Top-3 Root Nodes (similarity ≥ 0.55): [215, 82, 141]
🔄 Mapped Root Nodes to Document Indices: {1, 2, 3}
📄 Selected Leaf Nodes (under selected roots): [52, 90, 106, 149, 176, 177, 180]
🛑 No more relevant parents (filtered by threshold) — stopping traversal.

 Final Retrieved Text:
 the MultiQueryRetrieverHow to add scores to retriever resultsCachingHow to use callbacks in async environmentsHow to attach callbacks to a runnableHow to propagate callbacks  constructorHow to dispatch custom callback eventsHow to pass callbacks in at runtimeHow to split by characterHow to cache chat model responsesHow to handle rate limitsHow to init any model in one lineHow to track token usage in ChatModelsHow to add tools to chatbotsHow to split codeHow to do retrieval with contextual
the MultiQueryRetrieverHow to add scores to retriever resultsCachingHow to use callba

In [33]:
retrieved_text = hierarchical_retrieve_combined(
    query="How to reduce security risks in cloud platforms?",
    root_nodes_inter=root_nodes_inter,
    all_nodes_inter=all_nodes_inter,
    root_nodes_intra=root_nodes,
    all_nodes_intra=all_nodes,
    top_k=3,
    similarity_threshold=0.55
)

print("\n Final Retrieved Text:\n", retrieved_text)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]


🌐 Inter-Document Similarity Scores:
  Inter Root Node 4: 0.4820
  Inter Root Node 5: 0.4932

✅ Top-3 Inter Root Nodes: [4, 5]

🔗 Matching Intra Roots from Inter Leafs: []

📄 Matching Intra Leaf Nodes: [7, 33, 52, 90, 106, 149, 176, 177, 180]

📢 Final Retrieved Text:
 the MultiQueryRetrieverHow to add scores to retriever resultsCachingHow to use callbacks in async environmentsHow to attach callbacks to a runnableHow to propagate callbacks  constructorHow to dispatch custom callback eventsHow to pass callbacks in at runtimeHow to split by characterHow to cache chat model responsesHow to handle rate limitsHow to init any model in one lineHow to track token usage in ChatModelsHow to add tools to chatbotsHow to split codeHow to do retrieval with contextual
Additional resources​
Versions​
See what changed in v0.3, learn how to migrate legacy code, read up on our versioning policies, and more.
Security​
Read up on security best practices to make sure you're developing safely with LangChain.


In [34]:
retrieved_text = hierarchical_retrieve_combined(
    query="What is LangChain and what problems does it solve?",
    root_nodes_inter=root_nodes_inter,
    all_nodes_inter=all_nodes_inter,
    root_nodes_intra=root_nodes,
    all_nodes_intra=all_nodes,
    top_k=3,
    similarity_threshold=0.70
)

print("\n Final Retrieved Text:\n", retrieved_text)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]


🌐 Inter-Document Similarity Scores:
  Inter Root Node 4: 0.7614
  Inter Root Node 5: 0.7042

✅ Top-3 Inter Root Nodes: [4, 5]

🔗 Matching Intra Roots from Inter Leafs: []

📄 Matching Intra Leaf Nodes: [0, 19, 21, 22, 23, 25, 26, 27, 29, 30, 31, 32, 33, 45, 64, 66, 68, 71, 72, 83, 102, 106, 109, 114, 121, 122, 142, 161, 164, 183, 190, 193, 194, 198]

🔼 Promoting Parent Nodes: [129, 133, 134, 135, 136, 35, 39, 40, 41, 200, 73, 204, 77, 78, 79, 208, 210, 211]

🔼 Promoting Parent Nodes: [42, 43, 139, 140, 81, 212, 214]

🔼 Promoting Parent Nodes: [44, 141, 215]

📢 Final Retrieved Text:
 Introduction | 🦜️🔗 LangChain
.0 chainsMigrating from ConstitutionalChainMigrating from ConversationalChainMigrating from ConversationalRetrievalChainMigrating from LLMChainMigrating from LLMMathChainMigrating from LLMRouterChainMigrating from MapReduceDocumentsChainMigrating from MapRerankDocumentsChainMigrating from MultiPromptChainMigrating from RefineDocumentsChainMigrating from RetrievalQAMigrating from