# Retrieval Augmented Generation
In this notebook, we build an AI agent that implements retrieval-augmented generation (RAG) backed by a vector database. This lets you feed external data sources (particularly large documents that would exceed an LLM's context window) into the agent and get responses grounded in the ingested data, which serves as the source of truth.

In [4]:
import os
import bs4
import requests
import langsmith
from dotenv import load_dotenv

from langchain.tools import Tool
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader

In [5]:
load_dotenv()

# LLM API configuration
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")

# LangSmith configuration
LANGSMITH_TRACING = os.getenv("LANGSMITH_TRACING", "true")
LANGSMITH_API_KEY = os.getenv("LANGSMITH_API_KEY")
LANGSMITH_PROJECT = os.getenv("LANGSMITH_PROJECT", "default")
LANGSMITH_WORKSPACE_ID = os.getenv("LANGSMITH_WORKSPACE_ID")

In [6]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001") # Embeddings model
vector_store = InMemoryVectorStore(embeddings) # In-memory vector store to hold document embeddings
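
To make the role of the vector store more concrete, here is a minimal sketch of similarity search over embeddings, assuming the store ranks documents by cosine similarity to the query vector. The names `cosine_similarity`, `top_k`, and `toy_store` are hypothetical, and the three-dimensional vectors are hand-made stand-ins for real embeddings. This illustrates the idea only and is not the `InMemoryVectorStore` implementation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical toy "embeddings": one axis per topic.
toy_store = [
    ("planning", [1.0, 0.0, 0.0]),
    ("memory", [0.0, 1.0, 0.0]),
    ("tool use", [0.0, 0.0, 1.0]),
]

# A query vector that points mostly along the "planning" axis.
nearest = top_k([0.9, 0.1, 0.0], toy_store, k=2)
print(nearest)
```

In the notebook itself, the embeddings model produces the vectors and `similarity_search` plays the role of `top_k`.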

In [7]:
# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

assert len(docs) == 1
print(f"Total characters: {len(docs[0].page_content)}")

Total characters: 43047


In [8]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=200,  # chunk overlap (characters)
    add_start_index=True,  # track index in original document
)
all_splits = text_splitter.split_documents(docs)

print(f"Split blog post into {len(all_splits)} sub-documents.")

Split blog post into 63 sub-documents.
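
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a plain sliding window. This is a simplified sketch under stated assumptions: the helper `sliding_window_chunks` is hypothetical and cuts at fixed character offsets, whereas `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line boundaries, so its chunks are usually shorter than the maximum and their count will differ.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split text into fixed-size windows that share chunk_overlap characters.

    Returns (start_index, chunk) pairs, mirroring the idea behind
    add_start_index=True.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between consecutive window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
for start, chunk in chunks:
    print(start, chunk)
```

Each window repeats the last four characters of the previous one. The overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one chunk.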


In [9]:
document_ids = vector_store.add_documents(documents=all_splits)
print(document_ids)

['7d27a70d-3cd0-445f-9a0c-52190884ce5d', '8da83da9-8e64-4595-aa20-e327c4e99513', '7e5a6f5d-09a0-4af7-aaf9-2eb1b06c0c2d', '3dd07bb6-092c-468f-9280-789ff2f5a2df', '93bc26c1-c7b7-49c2-aac3-31e76c29dbc9', 'c337bdf1-106e-4aff-aa2d-dfdfdae9e7ec', '9e24761b-dbf7-404e-a61b-5c1b556c8b74', '2b18349d-8ea8-4e89-998b-b51f02bf424e', '8ebb2d26-51b4-40f9-860b-1177f54b97be', 'ef1d8372-1684-4c3e-8f66-e9aca4e715ca', '8ac8fb32-e9ee-43f9-8f40-d91126f24d40', '1731d207-b76d-4eff-a47c-806c9666b55f', '95e443f5-9d77-4162-ae4e-d963e8639296', 'fe671e4b-32b9-4a59-a767-6a01800363f8', '8a365027-8d54-42f7-a972-86b888d992ec', '4b0474ec-c93e-4832-add3-5af881263eea', '7d487f82-8691-4bf6-8d4f-296ac0ccf668', 'd9454301-6a29-45bf-915d-09b57e456ebc', '3b35603c-c44c-4726-939d-2833a90f106f', 'a621040f-bacb-4829-8520-e730d9098ebc', 'c3d7b8d4-c449-4b69-a12d-81bad272b8ab', 'b71200e7-8dce-4250-9be3-972645fd9928', '1e8f5a5e-4af4-4e48-84ee-a31b3bb406f8', '0d5fc988-998b-47f2-8ea3-e9500ea44636', 'b02be6b5-112f-4d66-880c-4fd1f6368dc3',

In [11]:
model = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=5,
)

In [12]:
def retrieve_and_generate(query: str, k: int = 5):
    """Answer `query` using the k most similar chunks as context."""
    # Retrieve the k chunks most similar to the query
    results = vector_store.similarity_search(query, k=k)

    # Join the retrieved chunks into a single context string
    context = "\n\n".join(doc.page_content for doc in results)

    # Build the prompt: instructions and context go in the system message,
    # the user's question goes in the human message
    system_prompt = (
        "You are an expert assistant. Use the provided context to answer the "
        "user's question accurately and comprehensively. "
        "If the answer cannot be found in the context, say so clearly.\n\n"
        f"Context:\n{context}"
    )
    messages = [
        SystemMessage(content=system_prompt),
        HumanMessage(content=query),
    ]

    # Generate the response
    response = model.invoke(messages)
    return response.content

# Example usage
user_query = input("Enter your question about the documents: ")
answer = retrieve_and_generate(user_query)
print("\nAnswer based on the documents:")
print(answer)


Answer based on the documents:
Chain of Thought (CoT) is a standard prompting technique used to improve the performance of models on complex tasks. It involves instructing the model to "think step by step," which allows it to utilize more computation during test time. The primary purpose of CoT is to decompose difficult tasks into smaller, simpler, and more manageable steps. This process not only helps in solving complex problems but also provides insight into the model's reasoning process.
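
One possible refinement is to label each retrieved chunk with its position in the source document, so an answer can be traced back to the text that supports it. The sketch below assumes each chunk carries the `start_index` metadata written by `add_start_index=True`. `FakeDoc` and `build_context` are hypothetical names; `FakeDoc` only stands in for the retrieved document objects so the example runs without an API key.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in exposing the two attributes the sketch relies on."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_context(docs) -> str:
    """Join chunks into one context string, labelling each with its source offset."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        start = doc.metadata.get("start_index", "unknown")
        parts.append(f"[chunk {i}, start_index={start}]\n{doc.page_content}")
    return "\n\n".join(parts)


demo = build_context([
    FakeDoc("First chunk.", {"start_index": 0}),
    FakeDoc("Second chunk.", {"start_index": 800}),
])
print(demo)
```

A function like this could replace the plain `"\n\n".join(...)` step in `retrieve_and_generate`, at the cost of a slightly longer prompt.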
