## Multi-Representation Indexing

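The core idea of Multi-Representation Indexing is simple: instead of embedding the full document chunks, we create a smaller, more focused representation of each chunk (such as a summary) and embed that instead. At query time, the search runs over these compact representations, and the retriever returns the full original document that the best match points to.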

In [1]:
from langchain_community.document_loaders import WebBaseLoader

# Load two different blog posts to create a more diverse knowledge base
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()

loader = WebBaseLoader("https://lilianweng.github.io/posts/2024-02-05-human-data-quality/")
docs.extend(loader.load())

print(f"Loaded {len(docs)} documents.")

Loaded 2 documents.


Next, we’ll create a chain to generate a summary for each of these documents.

In [2]:
# initialize llm to use
import os
from dotenv import load_dotenv
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace

load_dotenv()

model = HuggingFaceEndpoint(
    model="openai/gpt-oss-20b",
    max_new_tokens=1024,
    huggingfacehub_api_token=os.getenv("HUGGINGFACE_API_KEY")
)

llm = ChatHuggingFace(
    llm=model
)


In [3]:
import uuid
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

summary_chain = (
    {"doc": lambda x: x.page_content} # extract page content from doc
    | ChatPromptTemplate.from_template("Summarize the following document: \n\n{doc}")
    | llm
    | StrOutputParser()
)

# Summarize all documents, running at most 5 requests in parallel
summaries = summary_chain.batch(docs, {"max_concurrency": 5})

print(summaries[0])


**LLM‑Powered Autonomous Agents – Key Concepts & Highlights**

| Section | Main Ideas |
|---------|------------|
| **Agent System Overview** | Large Language Models (LLMs) act as the “brain” of an autonomous agent, complemented by three core modules: (1) **Planning**, (2) **Memory**, and (3) **Tool Use**. |
| **Planning** | *Task Decomposition* – chain‑of‑thought (CoT), Tree‑of‑Thoughts, or LLM + P (classical planner).<br>*Self‑Reflection* – ReAct, Reflexion, Chain‑of‑Hindsight, Algorithm Distillation – mechanisms for the agent to critique and improve past actions. |
| **Memory** | *Short‑term (STM)* – in‑context learning (limited by model’s context window).<br>*Long‑term (LTM)* – external vector stores (FAISS, HNSW, ScaNN, ANNOY, LSH) that support fast Maximum Inner Product Search (MIPS) for retrieval‑augmented reasoning. |
| **Tool Use** | *Neuro‑symbolic routing* – MRKL system (LLM routes queries to expert modules).<br>*Tool‑augmented LLMs* – TALM, Toolformer, ChatGPT plugins, OpenA

In [4]:
len(summaries)

2

Now comes the crucial part. We need a MultiVectorRetriever, which requires two main components:

1. A vectorstore to store the embeddings of our summaries.
2. A docstore (a simple key-value store) to hold the original, full documents.
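Before wiring up the real components, the division of labor between the two stores can be sketched in plain Python. This is only a toy illustration under stated assumptions: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, `vector_index` plays the role of the vectorstore, and `parent_store` plays the role of the docstore. None of these names come from LangChain.

```python
import math
import uuid
from collections import Counter


def embed(text):
    """Toy 'embedding' (assumption): lowercase token counts instead of a real model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical full documents and their short summaries
full_docs = [
    "A long article about agents, covering planning, memory and tool use in depth.",
    "A long article about human data quality, annotation and labeling practices.",
]
toy_summaries = ["agents planning memory tools", "data quality annotation labels"]

vector_index = []   # stands in for the vectorstore: (summary embedding, parent id)
parent_store = {}   # stands in for the docstore: parent id -> full document

for doc, summary in zip(full_docs, toy_summaries):
    parent_id = str(uuid.uuid4())
    vector_index.append((embed(summary), parent_id))
    parent_store[parent_id] = doc


def retrieve(query):
    """Search over summary embeddings, then return the linked full document."""
    q = embed(query)
    _, best_id = max(vector_index, key=lambda item: cosine(q, item[0]))
    return parent_store[best_id]


result = retrieve("memory in agents")
print(result)
```

The search only ever touches the small summary embeddings; the shared ID is what lets us hand the full document back to the caller.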

In [43]:
from typing import Dict, List, Optional
from enum import Enum
from pydantic import BaseModel, Field, model_validator
from langchain_core.vectorstores import VectorStore
from langchain_core.stores import BaseStore, ByteStore
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever
from langchain_core.callbacks import (
    CallbackManagerForRetrieverRun,
    AsyncCallbackManagerForRetrieverRun,
)
from langchain_core.stores import InMemoryStore


class SearchType(str, Enum):
    """Enumerator of the types of search to perform."""

    similarity = "similarity"
    """Similarity search."""
    similarity_score_threshold = "similarity_score_threshold"
    """Similarity search with a score threshold."""
    mmr = "mmr"
    """Maximal Marginal Relevance reranking of similarity search."""


class MultiVectorRetriever(BaseRetriever):
    """Retrieve from a set of multiple embeddings for the same document."""

    vectorstore: VectorStore
    """The underlying vectorstore to use to store small chunks
    and their embedding vectors"""
    byte_store: Optional[ByteStore] = None
    """The lower-level backing storage layer for the parent documents"""
    docstore: BaseStore[str, Document]
    """The storage interface for the parent documents"""
    id_key: str = "doc_id"
    search_kwargs: dict = Field(default_factory=dict)
    """Keyword arguments to pass to the search function."""
    search_type: SearchType = SearchType.similarity
    """Type of search to perform (similarity / mmr)"""

    @model_validator(mode='before')
    @classmethod
    def shim_docstore(cls, values: Dict) -> Dict:
        """Validate and set up the docstore from byte_store if needed."""
        byte_store = values.get("byte_store")
        docstore = values.get("docstore")
        if docstore is None:
            if byte_store is not None:
                # Use byte_store directly as docstore
                docstore = byte_store
            else:
                # Create a default InMemoryStore
                docstore = InMemoryStore()
        values["docstore"] = docstore
        return values

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        """Get documents relevant to a query.
        Args:
            query: String to find relevant documents for
            run_manager: The callbacks handler to use
        Returns:
            List of relevant documents
        """
        if self.search_type == SearchType.mmr:
            sub_docs = self.vectorstore.max_marginal_relevance_search(
                query, **self.search_kwargs
            )
        elif self.search_type == SearchType.similarity_score_threshold:
            sub_docs_and_similarities = (
                self.vectorstore.similarity_search_with_relevance_scores(
                    query, **self.search_kwargs
                )
            )
            sub_docs = [sub_doc for sub_doc, _ in sub_docs_and_similarities]
        else:
            sub_docs = self.vectorstore.similarity_search(query, **self.search_kwargs)

        # We do this to maintain the order of the ids that are returned
        ids = []
        for d in sub_docs:
            if self.id_key in d.metadata and d.metadata[self.id_key] not in ids:
                ids.append(d.metadata[self.id_key])
        docs = self.docstore.mget(ids)
        return [d for d in docs if d is not None]

    async def _aget_relevant_documents(
        self, query: str, *, run_manager: AsyncCallbackManagerForRetrieverRun
    ) -> List[Document]:
        """Asynchronously get documents relevant to a query.
        Args:
            query: String to find relevant documents for
            run_manager: The callbacks handler to use
        Returns:
            List of relevant documents
        """
        if self.search_type == SearchType.mmr:
            sub_docs = await self.vectorstore.amax_marginal_relevance_search(
                query, **self.search_kwargs
            )
        elif self.search_type == SearchType.similarity_score_threshold:
            sub_docs_and_similarities = (
                await self.vectorstore.asimilarity_search_with_relevance_scores(
                    query, **self.search_kwargs
                )
            )
            sub_docs = [sub_doc for sub_doc, _ in sub_docs_and_similarities]
        else:
            sub_docs = await self.vectorstore.asimilarity_search(
                query, **self.search_kwargs
            )

        # We do this to maintain the order of the ids that are returned
        ids = []
        for d in sub_docs:
            if self.id_key in d.metadata and d.metadata[self.id_key] not in ids:
                ids.append(d.metadata[self.id_key])
        docs = await self.docstore.amget(ids)
        return [d for d in docs if d is not None]
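
Both retrieval methods above collect parent IDs in first-seen order and skip duplicates, so that several matching sub-documents pointing at the same parent yield that parent only once. A minimal standalone sketch of that step, assuming plain metadata dicts in place of Document objects (the helper name `ordered_unique_ids` is hypothetical), could use `dict.fromkeys`, which preserves insertion order:

```python
def ordered_unique_ids(metadatas, id_key="doc_id"):
    """Return parent IDs in first-seen order, skipping entries that lack the key."""
    return list(dict.fromkeys(m[id_key] for m in metadatas if id_key in m))


# Hypothetical search hits: two point at parent "b", one has no parent link
hits = [{"doc_id": "b"}, {"doc_id": "a"}, {"doc_id": "b"}, {"source": "no-id"}]
unique_ids = ordered_unique_ids(hits)
print(unique_ids)
```

Keeping the order matters because the vector store returns hits ranked by relevance, and the parents should come back in that same ranking.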

In [44]:
from langchain_core.stores import InMemoryByteStore
from langchain_core.documents import Document
from langchain_community.vectorstores import Chroma
from langchain_huggingface import HuggingFaceEndpointEmbeddings

hf_embeddings = HuggingFaceEndpointEmbeddings(
    model="Qwen/Qwen3-Embedding-8B",
    task="feature-extraction",
    huggingfacehub_api_token=os.getenv("HUGGINGFACE_API_KEY")
)

vector_store = Chroma(
    collection_name="summaries",
    embedding_function=hf_embeddings,
)

store = InMemoryByteStore()
id_key = "doc_id"  # will link summaries to parent doc ids

retriever = MultiVectorRetriever(
    vectorstore=vector_store,
    byte_store=store,
    id_key=id_key,
)

# Generate unique IDs for each of our original documents
doc_ids = [str(uuid.uuid4()) for _ in docs]

# Create new Document objects for the summaries, adding the 'doc_id' to their metadata
summary_docs = [
    Document(page_content=s, metadata={id_key: doc_ids[i]})
    for i, s in enumerate(summaries)
]

# Add the summaries to the vectorstore
retriever.vectorstore.add_documents(summary_docs)

# Add the original documents to the docstore, linking them by the same IDs
retriever.docstore.mset(list(zip(doc_ids, docs)))


In [45]:
query = "memory in agents"

sub_docs = vector_store.similarity_search(query, k=1)
print("Searching with summaries")
print(sub_docs[0].page_content)

Searching with summaries
**LLM‑Powered Autonomous Agents – Key Concepts & Highlights**

| Section | Main Ideas |
|---------|------------|
| **Agent System Overview** | Large Language Models (LLMs) act as the “brain” of an autonomous agent, complemented by three core modules: (1) **Planning**, (2) **Memory**, and (3) **Tool Use**. |
| **Planning** | *Task Decomposition* – chain‑of‑thought (CoT), Tree‑of‑Thoughts, or LLM + P (classical planner).<br>*Self‑Reflection* – ReAct, Reflexion, Chain‑of‑Hindsight, Algorithm Distillation – mechanisms for the agent to critique and improve past actions. |
| **Memory** | *Short‑term (STM)* – in‑context learning (limited by model’s context window).<br>*Long‑term (LTM)* – external vector stores (FAISS, HNSW, ScaNN, ANNOY, LSH) that support fast Maximum Inner Product Search (MIPS) for retrieval‑augmented reasoning. |
| **Tool Use** | *Neuro‑symbolic routing* – MRKL system (LLM routes queries to expert modules).<br>*Tool‑augmented LLMs* – TALM, Toolforme

In [46]:
print("\n--- Metadata showing the link to the parent document ---")
print(sub_docs[0].metadata)


--- Metadata showing the link to the parent document ---
{'doc_id': '4ca91542-36ad-4941-ae19-79afbf7b47aa'}


As you can see, the search found the summary that mentions “memory.” Now, the MultiVectorRetriever will use the doc_id from this summary's metadata to automatically fetch the full parent document from the docstore.

In [50]:
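# Let the full retriever do its job. The number of hits is controlled through
# search_kwargs, which the retriever forwards to the vector store search;
# an n_results argument passed to invoke() is not used by this retriever.
retriever.search_kwargs = {"k": 1}
retrieved_docs = retriever.invoke(query)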

# Print the beginning of the retrieved full document
print("\n--- The full document retrieved by the MultiVectorRetriever ---")
print(retrieved_docs[0].page_content[0:500])


--- The full document retrieved by the MultiVectorRetriever ---

LLM Powered Autonomous Agents | Lil'Log

Lil'Log
|
Posts
Archive
Search
Tags
FAQ

LLM Powered Autonomous Agents
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng

Table of Contents

Agent System Overview
Component One: Planning
Task Decomposition
Self-Reflection
Component Two: Memory
Types of Memory
Maximum Inner Product Search (MIPS)
Component Three:

### Hierarchical Indexing (RAPTOR) Knowledge Tree

RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) takes the multi-representation idea a step further. Instead of just one layer of summaries, RAPTOR builds a multi-level tree of summaries. It starts by clustering small document chunks and then summarizes each cluster.

Then, it takes these summaries, clusters them, and summarizes the new clusters. This process repeats, creating a hierarchy of knowledge from fine-grained details to high-level concepts. When you query, you can search at different levels of this tree, allowing for retrieval that can be as specific or as general as needed.
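
The recursive build loop described above can be sketched in a few lines. This is a toy illustration, not the RAPTOR implementation: `cluster` simply groups consecutive nodes as a stand-in for embedding-based clustering, and `summarize` truncates text as a stand-in for an LLM summarizer. Both helpers and the sample chunks are assumptions made for the example.

```python
def summarize(texts, max_words=8):
    """Stub summarizer (assumption): keep the first few words of the joined texts."""
    words = " ".join(texts).split()
    return " ".join(words[:max_words])


def cluster(nodes, size=2):
    """Stub clustering (assumption): group consecutive nodes into fixed-size groups."""
    return [nodes[i:i + size] for i in range(0, len(nodes), size)]


def build_tree(chunks, size=2):
    """Return a list of levels: level 0 holds the raw chunks, the last level one root summary."""
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        groups = cluster(levels[-1], size)
        levels.append([summarize(group) for group in groups])
    return levels


chunks = [
    "planning splits tasks",
    "reflection critiques actions",
    "short-term memory is context",
    "long-term memory uses vector stores",
    "tools extend the model",
]
tree = build_tree(chunks)
for depth, level in enumerate(tree):
    print(depth, len(level))

# Flattening every level gives one pool of nodes that can be searched together,
# mixing fine-grained chunks with higher-level summaries
all_nodes = [node for level in tree for node in level]
```

Each pass shrinks the number of nodes, so the loop ends with a single root summary, and a query can then be matched against nodes from any level.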

### Token-level Precision (ColBERT)

ColBERT (Contextualized Late Interaction over BERT) offers a more granular approach. It generates a separate, context-aware embedding for every single token in the document.
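
Because every token has its own embedding, relevance is scored by "late interaction": each query token is matched against its best document token, and those maxima are summed (often called MaxSim). The sketch below shows only that scoring step, with tiny hand-written 2-dimensional vectors standing in for real ColBERT token embeddings; the vectors and function names are assumptions for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best dot product against any document token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Hypothetical token embeddings: one vector per token
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # contains an exact match for each query token
doc_b = [[0.5, 0.5], [0.6, 0.0]]               # only partial matches

score_a = maxsim_score(query_vecs, doc_a)
score_b = maxsim_score(query_vecs, doc_b)
print(score_a, score_b)
```

The document whose tokens individually match the query tokens best gets the higher score, which is what makes this approach more granular than comparing a single vector per document.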

Now, let’s index a Wikipedia page using ColBERT’s unique token-level approach.