In [1]:
import os
from dotenv import load_dotenv
load_dotenv() 
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
# load_dotenv() has already exported the keys from .env; fail early if one is missing
for key in ('LANGCHAIN_API_KEY', 'GEMINI_API_KEY'):
    if not os.environ.get(key):
        raise RuntimeError(f'{key} is not set; add it to your .env file')

In [2]:
api_key = os.environ.get('GEMINI_API_KEY')

## 4. Indexing
### 4.1 Multi-Representation Indexing
Multi-representation indexing works by:
- generating propositions (summaries that capture the main ideas) from the documents
- storing these propositions as embeddings in a vectorstore, each linked by ID to its full document in a docstore
- at query time, locating the most relevant summary and retrieving its full document as context for the LLM

In [3]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()

loader = WebBaseLoader("https://lilianweng.github.io/posts/2024-02-05-human-data-quality/")
docs.extend(loader.load())

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [4]:
import uuid
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI

chain = (
    {"doc":lambda x:x.page_content}
    | ChatPromptTemplate.from_template("Summarize the following document: \n\n{doc}")
    | ChatGoogleGenerativeAI(model="gemini-2.5-flash",api_key=api_key,temperature=0)
    | StrOutputParser()
)

summaries = chain.batch(docs,{"max_concurrency":5})

In [7]:
from langchain.storage import InMemoryByteStore
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.retrievers.multi_vector import MultiVectorRetriever

# The vectorstore that indexes the summaries
vectorstore = Chroma(collection_name="summaries",
                     embedding_function=GoogleGenerativeAIEmbeddings(model="text-embedding-004", google_api_key=api_key))

# The storage layer for the parent documents
store = InMemoryByteStore()
id_key = "doc_id"

# The retriever
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    byte_store=store,
    id_key=id_key,
)
doc_ids = [str(uuid.uuid4()) for _ in docs]

# Docs linked to summaries
summary_docs = [
    Document(page_content=s, metadata={id_key: doc_ids[i]})
    for i, s in enumerate(summaries)
]

# Index the summaries in the vectorstore and store the full documents under the same IDs
retriever.vectorstore.add_documents(summary_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))

In [8]:
query = "Memory in agents"
sub_docs = vectorstore.similarity_search(query,k=1)
sub_docs[0]

Document(metadata={'doc_id': '44c0cde1-30cd-46d9-8963-95db51b7917d'}, page_content="This document provides a comprehensive overview of LLM-powered autonomous agents, which leverage large language models as their core controllers for general problem-solving.\n\nThe architecture of these agents is built around three key components:\n1.  **Planning:** Agents break down complex tasks into manageable subgoals (e.g., Chain of Thought, Tree of Thoughts) and employ self-reflection mechanisms (e.g., ReAct, Reflexion, Chain of Hindsight, Algorithm Distillation) to learn from past actions, correct mistakes, and refine future plans.\n2.  **Memory:** This includes short-term memory (in-context learning within the LLM's finite context window) and long-term memory (external vector stores for infinite information retention and fast retrieval, often using Maximum Inner Product Search (MIPS) algorithms like HNSW or FAISS).\n3.  **Tool Use:** Agents are equipped to call external APIs and modules (e.g., M

In [None]:
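# MultiVectorRetriever has no n_results argument; limit the vectorstore hits via search_kwargs
retriever.search_kwargs = {"k": 1}
retrieved_docs = retriever.invoke(query)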
retrieved_docs[0].page_content[0:500]

### 4.2 RAPTOR

- Full form → Recursive Abstractive Processing for Tree-Organized Retrieval.

- Purpose → Makes retrieval more effective on long or complex documents by creating a hierarchical index.

- How it works:

    - Chunking: break large documents into small passages.
    - Recursive summarization: embed the passages, cluster similar ones, and summarize each cluster; then repeat the process on the summaries to build progressively higher levels.
    - Tree structure: the chunks are the leaves and the summaries form the higher nodes, which represent broader concepts.
    - Retrieval: when a query comes in, search can happen at different levels of the tree:
        - Directly retrieve relevant chunks
        - Or retrieve summaries that lead to deeper chunks
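
The steps above can be sketched in plain Python. This is a minimal illustration under loud assumptions, not the RAPTOR reference implementation: `embed` is a toy letter-count embedding, `summarize` is a placeholder where a real pipeline would call an LLM, and `cluster` is a simple greedy grouping standing in for a proper clustering algorithm. All function names here are hypothetical.

```python
def embed(text: str) -> list[float]:
    """Toy embedding (assumption): letter counts, not a real embedding model."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(texts: list[str]) -> str:
    """Placeholder summarizer (assumption): a real pipeline would call an LLM here."""
    return " / ".join(t[:30] for t in texts)


def cluster(texts: list[str], size: int = 2) -> list[list[str]]:
    """Greedy grouping: take a seed text and attach its most similar neighbours."""
    if size < 2:
        raise ValueError("size must be at least 2 so that each level shrinks")
    remaining = list(texts)
    groups = []
    while remaining:
        seed = remaining.pop(0)
        seed_vec = embed(seed)
        remaining.sort(key=lambda t: cosine(seed_vec, embed(t)), reverse=True)
        groups.append([seed] + remaining[: size - 1])
        remaining = remaining[size - 1:]
    return groups


def build_tree(chunks: list[str], size: int = 2) -> list[list[str]]:
    """Return the tree as levels: level 0 holds the chunks, the last level the root summary."""
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        levels.append([summarize(group) for group in cluster(levels[-1], size)])
    return levels


def collapsed_tree_search(levels: list[list[str]], query: str, k: int = 2):
    """Rank every node (chunks and summaries alike) against the query."""
    nodes = [(depth, text) for depth, level in enumerate(levels) for text in level]
    query_vec = embed(query)
    return sorted(nodes, key=lambda n: cosine(query_vec, embed(n[1])), reverse=True)[:k]


chunks = [
    "agents plan tasks",
    "agents use memory",
    "memory uses vector stores",
    "tools call external APIs",
    "human data quality matters",
]
levels = build_tree(chunks)
hits = collapsed_tree_search(levels, "memory in agents", k=2)
```

Searching all levels at once, as `collapsed_tree_search` does, lets a broad query match a high-level summary while a specific query can still match a leaf chunk.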

### 4.3 ColBERT
- How it works:

    - Token-level embeddings: instead of compressing a whole document into a single vector, ColBERT keeps an embedding for each token (or subword).

    - Late interaction: the query and the document are encoded independently; at query time, each query token is matched to its most similar document token (MaxSim), and these maxima are summed to give the document's score.

    - Efficiency trick: document token embeddings are computed offline and stored in compressed form, so retrieval is still fast and scalable.
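
The late-interaction score can be illustrated with a few lines of plain Python. This is only a sketch of the MaxSim scoring rule: the 2-d token vectors below are made up for illustration (a real ColBERT model produces them from its encoder), and `maxsim_score` is a hypothetical helper name, not a library function.

```python
def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: list[list[float]], doc_vecs: list[list[float]]) -> float:
    """For each query token take its best-matching document token, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Hypothetical token embeddings, one vector per token (assumption, not model output).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
doc_b = [[0.6, 0.8], [0.8, 0.6]]

scores = {
    "doc_a": maxsim_score(query, doc_a),
    "doc_b": maxsim_score(query, doc_b),
}
best = max(scores, key=scores.get)
```

Here `doc_a` contains an exact match for each query token, so it outscores `doc_b`, whose tokens are only partially aligned with the query.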