## RAG: Knowledge Base Agent

### Overview

This notebook demonstrates how to create a **knowledge base (KB) agent** using **Retrieval-Augmented Generation (RAG)** techniques in LangGraph. 

- The agent uses vector embeddings to retrieve relevant context from a document and answers user questions by augmenting the prompt with that context.

### Scenario

Suppose you're building a support assistant for a researcher who uploads a scholarly paper. The assistant should be able to:

- Search through the uploaded paper
- Retrieve the most relevant sections
- Provide helpful answers grounded in the retrieved information

This approach, known as **Retrieval-Augmented Generation (RAG)**, is a powerful technique for building agents that are accurate, verifiable, and up-to-date.

### Import Libraries

In [11]:
from typing import List
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langgraph.graph import START, END, StateGraph
from langgraph.graph.message import MessagesState
from IPython.display import Image, display

In [12]:
from dotenv import load_dotenv
load_dotenv()

True

### Preparing Data for RAG Pipelines

#### 1. Documents Processing

In [13]:
file_path = "the-era-of-experience.pdf"

In [14]:
loader = PyPDFLoader(file_path)

In [15]:
pages = []
async for page in loader.alazy_load():
    pages.append(page)

In [16]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, 
    chunk_overlap=200
)

In [17]:
all_splits = text_splitter.split_documents(pages)
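
To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal, dependency-free sketch. It is a simplification: `RecursiveCharacterTextSplitter` tries to break on natural separators (paragraphs, lines, words) before falling back to raw characters, whereas the hypothetical `sliding_chunks` helper below cuts fixed character windows. It only illustrates how consecutive chunks share text.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters sharing chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
# abcdefghij
# ghijklmnop
# mnopqrstuv
# stuvwxyz
```

Each chunk starts with the last four characters of the previous one, so a sentence that straddles a boundary is still fully visible in at least one chunk.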

#### 2. Vector Representation

In [18]:
embeddings_fn = OpenAIEmbeddings(
    model="text-embedding-3-large"
)

#### 3. Vector Store

In [19]:
vector_store = Chroma(
    collection_name="demo",
    embedding_function=embeddings_fn
)

_ = vector_store.add_documents(documents=all_splits)
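
Conceptually, a vector store keeps one vector per chunk and ranks chunks by how close they are to the query vector. The sketch below shows that idea with a toy word-count "embedding" and cosine similarity. These are assumptions made for illustration only: the real pipeline uses dense vectors from the embedding model, and the distance metric and index used by Chroma may differ. The names `embed`, `cosine`, and `search` are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]

toy_chunks = [
    "reinforcement learning agents learn from reward",
    "vector stores index embeddings for search",
    "agents learn from experience in the environment",
]
top = search("how do agents learn", toy_chunks, k=2)
print(top)
```

The second chunk shares no words with the query, so it scores zero and is left out of the top two.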

### Define State Schema

We define a State Schema for managing:

- User query
- Retrieved documents
- Generated answer

In [20]:
class State(MessagesState):
    question: str
    documents: List[Document]
    answer: str

### RAG Nodes

The agent should:
- fetch relevant document chunks based on the user query
- combine the retrieved documents and use them as context
- invoke the LLM to generate a response
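
The three steps above can be sketched with plain functions before wiring them into a graph. This is a simplified model, not LangGraph's implementation: each hypothetical `toy_*` node returns a partial update that is merged into the state by overwriting keys, whereas LangGraph applies per-key reducers (for example, appending to `messages`). The corpus and the placeholder "model reply" are invented for the example.

```python
def toy_retrieve(state: dict) -> dict:
    # Hypothetical in-memory corpus standing in for the vector store.
    corpus = ["RL agents learn from reward.", "Chroma stores vectors."]
    terms = state["question"].lower().split()
    hits = [doc for doc in corpus if any(t in doc.lower() for t in terms)]
    return {"documents": hits}

def toy_augment(state: dict) -> dict:
    # Combine the retrieved text and the question into one prompt.
    context = "\n\n".join(state["documents"])
    return {"prompt": f"Question: {state['question']}\nContext: {context}"}

def toy_generate(state: dict) -> dict:
    # Placeholder for the LLM call: it only reports the prompt size.
    return {"answer": f"(model reply to a {len(state['prompt'])}-character prompt)"}

def run_pipeline(state: dict, nodes) -> dict:
    """Run nodes in order, merging each partial update into the state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state

result = run_pipeline({"question": "reward"}, [toy_retrieve, toy_augment, toy_generate])
print(result["prompt"])
```

Because every node reads from and writes to the same state dictionary, each step can be tested on its own by handing it a hand-built state.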

#### 1. Retrieve Node

- Performs a similarity search using the vector store.
- Retrieves documents related to the input question and stores them in state.

In [21]:
def retrieve(state: State):
    question = state["question"]
    retrieved_docs = vector_store.similarity_search(question)
    return {"documents": retrieved_docs}
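
The number of chunks returned is a useful knob: LangChain vector stores accept a `k` argument in `similarity_search`, and when it is omitted a default (four, at the time of writing) is used. The sketch below shows one possible way to make `k` configurable. `make_retrieve` and `FakeStore` are hypothetical names; `FakeStore` exists only so the example runs without embeddings or an API key.

```python
def make_retrieve(store, k: int = 4):
    """Build a retrieve node that asks `store` for the top-k chunks."""
    def retrieve_top_k(state: dict) -> dict:
        docs = store.similarity_search(state["question"], k=k)
        return {"documents": docs}
    return retrieve_top_k

class FakeStore:
    """Stand-in store that returns the first k items it holds."""
    def __init__(self, docs):
        self.docs = docs

    def similarity_search(self, query: str, k: int = 4):
        return self.docs[:k]

node = make_retrieve(FakeStore(["a", "b", "c"]), k=2)
update = node({"question": "anything"})
print(update)  # {'documents': ['a', 'b']}
```

With the real store, the node could be registered as `workflow.add_node("retrieve", make_retrieve(vector_store, k=6))`. A larger `k` gives the model more context at the cost of a longer prompt.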

#### 2. Augment Node

- Uses a `ChatPromptTemplate` with placeholders for `question` and `context`.
- Constructs a fixed system message that sets the assistant's role, plus a human message that carries the retrieved document excerpts and the user question.

In [22]:
def augment(state: State):
    question = state["question"]
    documents = state["documents"]
    docs_content = "\n\n".join(doc.page_content for doc in documents)

    template = ChatPromptTemplate([
        ("system", "You are an assistant for question-answering tasks."),
        ("human", "Use the following pieces of retrieved context to answer the question. "
                "If you don't know the answer, just say that you don't know. " 
                "Use three sentences maximum and keep the answer concise. "
                "\n# Question: \n-> {question} "
                "\n# Context: \n-> {context} "
                "\n# Answer: "),
    ])

    messages = template.invoke(
        {"context": docs_content, "question": question}
    ).to_messages()

    return {"messages": messages}
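
To make answers easier to verify, the context string can carry a label for each excerpt. The sketch below is one possible variation on the `"\n\n".join(...)` step. It assumes each document exposes `page_content` and a `metadata` dict, and that the PDF loader stores a page index under the `page` key. Treat that key as an assumption and inspect `documents[0].metadata` to confirm it. `format_docs` is a hypothetical helper, and `SimpleNamespace` objects stand in for real `Document` instances.

```python
from types import SimpleNamespace

def format_docs(documents) -> str:
    """Join excerpts into one context string, numbering each and noting its page."""
    parts = []
    for i, doc in enumerate(documents, start=1):
        page = doc.metadata.get("page", "?")  # "?" when no page is recorded
        parts.append(f"[{i}] (page {page})\n{doc.page_content}")
    return "\n\n".join(parts)

docs = [
    SimpleNamespace(page_content="First excerpt.", metadata={"page": 0}),
    SimpleNamespace(page_content="Second excerpt.", metadata={}),
]
context = format_docs(docs)
print(context)
# [1] (page 0)
# First excerpt.
#
# [2] (page ?)
# Second excerpt.
```

Inside `augment`, `docs_content = format_docs(documents)` would replace the plain join, and the prompt could then ask the model to cite excerpt numbers.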

#### 3. Generate Node

- Calls the LLM with the constructed prompt to produce an answer.

In [23]:
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.0,
)

def generate(state: State):
    ai_message = llm.invoke(state["messages"])
    return {"answer": ai_message.content, "messages": ai_message}

### RAG Workflow

In [24]:
workflow = StateGraph(State)

workflow.add_node("retrieve", retrieve)
workflow.add_node("augment", augment)
workflow.add_node("generate", generate)

workflow.add_edge(START, "retrieve")
workflow.add_edge("retrieve", "augment")
workflow.add_edge("augment", "generate")
workflow.add_edge("generate", END)

In [26]:
graph = workflow.compile()

### Test a Query

In [27]:
output = graph.invoke(
    {"question": "How many eras for reinforcement learning history?"}
)

In [None]:
print(output["answer"])

The context suggests that there are at least three distinct eras in the history of reinforcement learning: the era of simulation, the era of human data, and the emerging era of experience. Each era focuses on different aspects and challenges of reinforcement learning. However, the exact number of eras may vary depending on the classification criteria used.


In [29]:
for message in output["messages"]:
    message.pretty_print()


You are an assistant for question-answering tasks.

Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise. 
# Question: 
-> How many eras for reinforcement learning history? 
# Context: 
-> Why Now?
Learning from experience is not new. Reinforcement learning systems have previously mastered a large
number of complex tasks that were represented in a simulator with a clear reward signal (c.f., approximately,
the “era of simulation” in Figure 1). For example, RL methods equalled or exceeded human performance
5

Reinforcement Learning Methods
Reinforcement learning (RL) has a rich history that is deeply rooted in autonomous learning, where agents
learn for themselves through direct interaction with their environment. Early RL research yielded a suite of
powerful concepts and algorithms. For example, temporal difference learning [35] enabled agents to estimate

### Experiment

Now that you understand how it works, try some variations.

- Change the embedding model
- Change the parameters of `RecursiveCharacterTextSplitter` (`chunk_size` and `chunk_overlap`)
- Use your own document
- Add more file types
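
As a starting point for supporting more file types, one option is to choose a loader by file extension. The sketch below is an assumption-laden example rather than a recommended design: `LOADER_BY_SUFFIX`, `pick_loader_name`, and `load_any` are hypothetical names, and the mapping lists three loader classes from `langchain_community.document_loaders` that take a file path as their first argument. Some loaders need extra packages or arguments, so check each loader's documentation before relying on it.

```python
from pathlib import Path

# Assumed mapping from file suffix to a loader class name in
# langchain_community.document_loaders; extend it as needed.
LOADER_BY_SUFFIX = {
    ".pdf": "PyPDFLoader",
    ".txt": "TextLoader",
    ".csv": "CSVLoader",
}

def pick_loader_name(file_path: str) -> str:
    """Return the loader class name configured for this file's extension."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in LOADER_BY_SUFFIX:
        raise ValueError(f"No loader configured for {suffix!r}")
    return LOADER_BY_SUFFIX[suffix]

def load_any(file_path: str):
    """Load a file into a list of documents using the configured loader."""
    # Imported lazily so the mapping can be inspected without LangChain installed.
    import langchain_community.document_loaders as loaders
    loader_cls = getattr(loaders, pick_loader_name(file_path))
    return loader_cls(file_path).load()

print(pick_loader_name("paper.PDF"))  # PyPDFLoader
```

The documents returned by `load_any` could then be passed to `text_splitter.split_documents` in place of `pages`.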