# **Assignment2_Part2**

In [85]:
!pip install langgraph langchain langchain-community langchain-core faiss-cpu sentence-transformers



Before writing any agent code, all the required libraries need to be installed. Part II builds on Part I by adding FAISS for vector search and sentence-transformers for generating embeddings, which are the two core components that make the RAG pipeline work.

In [86]:
!sudo apt-get install -y zstd
!curl -fsSL https://ollama.com/install.sh | sh

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
zstd is already the newest version (1.4.8+dfsg-3build1).
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading ollama-linux-amd64.tar.zst
######################################################################## 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


Since this is a fresh Colab session, Ollama needs to be installed again from scratch. The zstd package is installed first because the newer version of Ollama requires it for extraction before the main install script can run successfully.

In [87]:
import subprocess, time

subprocess.Popen(["ollama", "serve"])
time.sleep(5)
result = subprocess.run(["ollama", "pull", "mistral"], capture_output=False)
print("Done!", result.returncode)

Done! 0


Once Ollama is installed, the server needs to be started as a background process before any model can be used. The mistral model is then pulled so it is ready for the agent to use throughout Part II, the same as in Part I.
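The fixed five-second sleep works in this session, but it can be too short on a slow machine. One alternative is to poll the server until it responds. The sketch below is not part of the original notebook: `wait_until_ready` and `ollama_is_up` are hypothetical helper names, and it assumes the root URL of the Ollama API (127.0.0.1:11434, as reported by the installer) answers with HTTP 200 once the server is up.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(probe, timeout=30.0, interval=0.5):
    """Call probe() repeatedly until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False


def ollama_is_up(url="http://127.0.0.1:11434"):
    """Return True if an HTTP server answers with status 200 at the given URL.

    Assumption: the Ollama root endpoint returns 200 when the server is ready.
    """
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

In the cell above, `time.sleep(5)` could then be replaced by `wait_until_ready(ollama_is_up)`, with the model pulled only if it returns `True`.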

In [88]:
!pip install -U langchain-ollama



In [89]:
from typing import TypedDict, Optional
from langgraph.graph import StateGraph, END
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage
# langchain-ollama (installed in the previous cell) provides the maintained ChatOllama integration
from langchain_ollama import ChatOllama
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

llm = ChatOllama(model="mistral")

All necessary libraries are imported including the new ones needed for Part II. HuggingFaceEmbeddings converts text chunks into vectors, FAISS stores and searches those vectors, and RecursiveCharacterTextSplitter breaks the document into manageable chunks before embedding.

In [90]:
sample_text = """
Artificial Intelligence is the simulation of human intelligence by machines.
Machine Learning is a subset of AI that allows systems to learn from data.
Deep Learning uses neural networks with many layers to learn complex patterns.
Natural Language Processing enables computers to understand human language.
LangGraph is a framework for building stateful multi-agent AI applications.
FAISS is a library for efficient similarity search on dense vectors.
RAG stands for Retrieval Augmented Generation which combines search with LLM generation.
Vector embeddings represent text as numerical arrays that capture semantic meaning.
A stateful agent maintains context and memory across multiple interaction steps.
Conditional edges in LangGraph allow dynamic routing based on agent decisions.
"""

with open("document.txt", "w") as f:
    f.write(sample_text)

print("Document created successfully")

Document created successfully


A sample text document is created and saved locally to serve as the knowledge base for the RAG pipeline. This document covers key AI and LangGraph concepts that the agent will retrieve from when answering document-related questions.

In [91]:
# Load and chunk the document
with open("document.txt", "r") as f:
    raw_text = f.read()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=20
)

chunks = text_splitter.create_documents([raw_text])
print(f"Total chunks created: {len(chunks)}")

Total chunks created: 5


The document is loaded and split into smaller chunks using RecursiveCharacterTextSplitter. Breaking the text into chunks is necessary because embedding the entire document at once would lose granularity, making it harder to retrieve the most relevant section for a given question.
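To make the roles of chunk_size and chunk_overlap concrete, here is a simplified fixed-width splitter. It is only an illustration: RecursiveCharacterTextSplitter first tries to split on separators such as paragraph and line breaks, so its chunk boundaries (and the count of 5 above) will differ from what this sketch produces. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=200, chunk_overlap=20):
    """Split text into fixed-width chunks; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks
```

The overlap means the last 20 characters of one chunk reappear at the start of the next, which reduces the chance that a sentence is cut in a way that hides it from retrieval.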

In [92]:
# Create embeddings and build the FAISS vector store
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embeddings)
print("Vector store created")

Loading weights:   0%|          | 0/103 [00:00<?, ?it/s]

BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


Vector store created


Each text chunk is converted into a numerical vector using the sentence-transformers model and stored in a FAISS index. This vector store is what allows the agent to find the chunks most semantically similar to any given question during retrieval.
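What "most similar" means can be sketched without FAISS. The example below ranks stored vectors by Euclidean (L2) distance to a query vector, which is assumed here to be the metric in use. The two-dimensional vectors in `toy_index` are made up for illustration, and real embeddings from all-MiniLM-L6-v2 have 384 dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, indexed, k=3):
    """Return the texts of the k stored vectors closest to query_vec.

    indexed is a list of (text, vector) pairs.
    """
    ranked = sorted(indexed, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Hypothetical 2-D "embeddings", chosen by hand for the example
toy_index = [
    ("faiss", [1.0, 0.0]),
    ("langgraph", [0.0, 1.0]),
    ("rag", [0.9, 0.1]),
]

nearest = top_k([1.0, 0.05], toy_index, k=2)
```

FAISS performs the same kind of nearest-neighbour lookup, but with index structures designed to stay fast on large collections.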

In [93]:
# TODO 1: Define Updated Agent State

class AgentState(TypedDict):
    question: str
    decision: Optional[str]
    tool_output: Optional[str]
    retrieved_docs: Optional[list]
    final_answer: Optional[str]

The AgentState is updated from Part I to include a new retrieved_docs field, which stores the documents fetched from the FAISS vector store during retrieval. The tool_input field is removed because tool_node extracts the expression directly from the question, so no intermediate input needs to be carried in the state.

In [94]:
# TODO 2: Create Calculator Tool

import re

@tool
def calculator(expression: str) -> str:
    """
    Evaluate a basic mathematical expression.
    """
    try:
        return str(eval(expression))
    except Exception:
        return "Error in calculation."

The calculator tool from Part I is carried over here unchanged. It takes a mathematical expression as a string and evaluates it using Python's eval() function, returning the result or an error message if the expression is invalid.
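Since eval() will run any Python expression it is given, a restricted evaluator is one possible alternative when the input is not trusted. The sketch below is not what the notebook uses: `safe_eval` is a hypothetical helper that walks the parsed expression and accepts only numbers, parentheses, and the four basic operators.

```python
import ast
import operator

# Operators the evaluator is willing to apply
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression):
    """Evaluate +, -, *, / arithmetic; raise ValueError for anything else."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("Unsupported expression")

    # strip() avoids an indentation error on leading whitespace
    return _eval(ast.parse(expression.strip(), mode="eval"))
```

Inside the calculator tool, `str(eval(expression))` could be swapped for `str(safe_eval(expression))`. The existing `except Exception` branch would still catch syntax errors, division by zero, and the ValueError raised for unsupported input.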

In [95]:
# TODO 3: Updated Decision Node

def decision_node(state: AgentState) -> AgentState:
    question = state["question"].lower()

    # Rule-based math detection
    if any(char.isdigit() for char in question) and any(op in question for op in ["+", "-", "*", "/"]):
        decision = "use_tool"

    # Rule-based document keyword detection
    elif any(keyword in question for keyword in [
        "artificial intelligence",
        "machine learning",
        "deep learning",
        "natural language processing",
        "langgraph",
        "faiss",
        "rag",
        "vector embeddings",
        "stateful agent",
        "conditional edges"
    ]):
        decision = "use_rag"

    else:
        decision = "no_tool"

    return {
        **state,
        "decision": decision
    }

The decision node is updated from Part I to handle three routing paths instead of two. Routing here is rule-based rather than LLM-driven: a question containing a digit and an arithmetic operator goes to the calculator, a question containing one of the document keywords goes to RAG retrieval, and anything else is answered directly without tools.
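The keyword check in the cell above uses substring matching, so a short keyword such as "rag" also matches inside unrelated words. The sketch below contrasts that with whole-word matching using regex word boundaries. `DOC_KEYWORDS` is a shortened, illustrative copy of the keyword list, and both function names are hypothetical.

```python
import re

# Shortened copy of the notebook's keyword list, for illustration only
DOC_KEYWORDS = ["langgraph", "faiss", "rag", "machine learning"]


def substring_match(question):
    """Mirror the notebook's check: keyword appears anywhere in the text."""
    q = question.lower()
    return any(keyword in q for keyword in DOC_KEYWORDS)


def whole_word_match(question):
    """Match a keyword only when it stands alone as a word or phrase."""
    q = question.lower()
    return any(
        re.search(rf"\b{re.escape(keyword)}\b", q) is not None
        for keyword in DOC_KEYWORDS
    )
```

With these definitions, "How much storage does my phone have?" passes the substring check because "storage" contains "rag", but it does not pass the whole-word check.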

In [96]:
# TODO 4: Tool Node
def tool_node(state: AgentState) -> AgentState:
    question = state["question"]
    match = re.search(r"\d[\d\s\+\-\*\/\.\(\)]*", question)
    expression = match.group().strip() if match else question
    result = calculator.invoke({"expression": expression})
    return {**state, "tool_output": result}

The tool node is carried over from Part I with the same regex extraction fix. It pulls only the mathematical expression from the question before passing it to the calculator, storing the result in tool_output within the state.
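The extraction step can be tried on its own to see what reaches the calculator. The snippet reuses the same pattern as tool_node. `extract_expression` is a hypothetical wrapper added here only for the demonstration.

```python
import re

# Same pattern as in tool_node: start at the first digit, then keep
# consuming digits, whitespace, operators, dots and parentheses
EXPRESSION_PATTERN = re.compile(r"\d[\d\s\+\-\*\/\.\(\)]*")


def extract_expression(question):
    """Return the arithmetic part of a question, or the question itself."""
    match = EXPRESSION_PATTERN.search(question)
    return match.group().strip() if match else question


plain = extract_expression("What is 39 * 4 + 10?")
# The match must begin with a digit, so a leading "(" is dropped
parenthesised = extract_expression("Compute (2 + 3) * 4")
```

Here `plain` is `"39 * 4 + 10"`, while `parenthesised` is `"2 + 3) * 4"`, which the calculator cannot evaluate. Questions whose expression begins with an opening parenthesis are therefore a known gap of this pattern.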

In [97]:
# TODO 5: Retrieval Node

def retrieval_node(state: AgentState) -> AgentState:
    """
    Retrieve relevant documents from vector store.
    """
    question = state["question"]

    # Perform similarity search and return top 3 chunks
    docs = vectorstore.similarity_search(question, k=3)

    return {
        **state,
        "retrieved_docs": docs
    }

The retrieval node performs a similarity search on the FAISS vector store using the user's question. It fetches the top 3 most relevant chunks from the document and stores them in the retrieved_docs field of the state so the answer node can use them to generate a context-aware response.

In [98]:
# TODO 6: Updated Answer Node
def answer_node(state: AgentState) -> AgentState:
    question = state["question"]
    tool_output = state.get("tool_output")
    retrieved_docs = state.get("retrieved_docs")

    if tool_output:
        final_answer = f"The result is: {tool_output}"
    elif retrieved_docs:
        context = "\n\n".join([doc.page_content for doc in retrieved_docs])
        prompt = f"""Answer ONLY using the context below. Do not use outside knowledge.

Context:
{context}

Question: {question}
"""
        response = llm.invoke([HumanMessage(content=prompt)])
        final_answer = response.content
    else:
        response = llm.invoke([HumanMessage(content=question)])
        final_answer = response.content

    return {**state, "final_answer": final_answer}

The answer node now handles three cases. If a calculator result exists it formats it directly, if retrieved documents exist it injects them as context into the LLM prompt to generate an informed answer, and if neither exists it falls back to answering directly with the LLM.

In [99]:
# TODO 7: Conditional Routing Function

def route_decision(state: AgentState):
    """
    Route based on decision.
    """
    if "use_tool" in state["decision"]:
        return "tool_node"
    elif "use_rag" in state["decision"]:
        return "retrieval_node"
    else:
        return "answer_node"

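The routing function is updated from Part I to handle three paths instead of two. Using in instead of == makes the routing more robust when the decision comes from an LLM, since Mistral sometimes returns extra words along with the decision keyword, which would break exact string matching. With the rule-based decision node in this notebook the value is always one of the three exact keywords, so the substring check acts as a safeguard rather than a requirement.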
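The difference between exact and substring matching shows up as soon as the decision string carries extra words. A small normalising step, sketched below, is one way to reduce a noisy string to a single keyword before routing. `normalize_decision` and `VALID_DECISIONS` are hypothetical names, and falling back to "no_tool" for unrecognised input is an assumption of this sketch.

```python
VALID_DECISIONS = ("use_tool", "use_rag", "no_tool")


def normalize_decision(raw):
    """Return the first known decision keyword found in raw, else 'no_tool'."""
    text = raw.strip().lower()
    for decision in VALID_DECISIONS:
        if decision in text:
            return decision
    return "no_tool"  # assumed fallback when nothing matches


noisy = " Decision: use_rag.\n"
exact_match = noisy == "use_rag"      # False: extra words break equality
substring_match = "use_rag" in noisy  # True: the keyword is still present
```

Note that the order of `VALID_DECISIONS` decides the outcome if a string happens to contain more than one keyword.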

In [100]:
# TODO 8: Build the Updated Graph

workflow = StateGraph(AgentState)

workflow.add_node("decision_node", decision_node)
workflow.add_node("tool_node", tool_node)
workflow.add_node("retrieval_node", retrieval_node)
workflow.add_node("answer_node", answer_node)

workflow.set_entry_point("decision_node")

workflow.add_conditional_edges(
    "decision_node",
    route_decision,
    {
        "tool_node": "tool_node",
        "retrieval_node": "retrieval_node",
        "answer_node": "answer_node"
    }
)

workflow.add_edge("tool_node", "answer_node")
workflow.add_edge("retrieval_node", "answer_node")
workflow.add_edge("answer_node", END)

app = workflow.compile()

The graph is updated from Part I to include the new retrieval_node. Both tool_node and retrieval_node now connect to answer_node, and the conditional edges handle all three routing paths. The graph is compiled into the final runnable app.

In [101]:
# TODO 9: Run the Agent

# Test 1: Math question uses calculator tool
result1 = app.invoke({
    "question": "What is 39 * 4 + 10?",
    "decision": None,
    "tool_output": None,
    "retrieved_docs": None,
    "final_answer": None
})
print("Test 1 - Math Question:")
print(result1["final_answer"])

# Test 2: Document question answered from the indexed document via RAG
result2 = app.invoke({
    "question": "What is LangGraph?",
    "decision": None,
    "tool_output": None,
    "retrieved_docs": None,
    "final_answer": None
})
print("\nTest 2 - Document Question:")
print(result2["final_answer"])

# Test 3: General question answers directly
result3 = app.invoke({
    "question": "What is the capital of India?",
    "decision": None,
    "tool_output": None,
    "retrieved_docs": None,
    "final_answer": None
})
print("\nTest 3 - General Question:")
print(result3["final_answer"])

Test 1 - Math Question:
The result is: 166

Test 2 - Document Question:
 LangGraph is a framework for building stateful multi-agent AI applications, where each agent maintains context and memory across multiple interaction steps. It incorporates conditional edges to allow dynamic routing based on agent decisions.

Test 3 - General Question:
 The capital of India is New Delhi. It serves as the political and administrative center of the country, while Mumbai is its financial hub and Bangalore is known for being a major information technology (IT) hub.
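All three tests build the same initial state by hand and change only the question. A small factory function, sketched below, would remove that repetition. `make_initial_state` is a hypothetical helper, and the AgentState definition is repeated here only so that the snippet runs on its own.

```python
from typing import Optional, TypedDict


class AgentState(TypedDict):
    question: str
    decision: Optional[str]
    tool_output: Optional[str]
    retrieved_docs: Optional[list]
    final_answer: Optional[str]


def make_initial_state(question: str) -> AgentState:
    """Build a fresh state with every field except the question unset."""
    return {
        "question": question,
        "decision": None,
        "tool_output": None,
        "retrieved_docs": None,
        "final_answer": None,
    }
```

Each test could then be written as `app.invoke(make_initial_state("What is LangGraph?"))`, where `app` is the compiled graph from the previous cell.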


**Changes from Part I:**

The decision node was updated to handle three routing paths instead of two, adding use_rag as a new option alongside use_tool and no_tool. A new retrieval_node was added that performs similarity search on the FAISS vector store and stores the top 3 relevant chunks in the state. The answer node was updated to handle three cases:
* calculator result
* retrieved document context
* direct LLM response.

The routing function was updated to use "in" instead of "==" to handle cases where Mistral returns extra words along with the decision keyword.

**Reflection:**

Adding RAG to the agent made it significantly more capable since it can now answer questions grounded in a specific document rather than relying purely on what the LLM already knows.

The biggest challenge was getting Mistral to strictly follow the context and not fall back to its own knowledge. The routing also needed to be made more flexible since the model does not always return a clean single keyword decision.

**Why use LangGraph instead of a simple agent?**

LangGraph provides a structured and stateful workflow instead of a single prompt-response interaction. It allows clear separation between decision-making, tool execution, retrieval, and answer generation. In Part I, it enabled conditional routing between calculator and direct responses. In Part II, it supported clean integration of RAG with multi-branch routing. Overall, it improves modularity, debugging, and scalability compared to a simple agent.

**What limitations did you observe?**

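Routing is sensitive to how the decision is made. LLM-based routing may misclassify general questions as document-based without careful logic, and the keyword rules used here rely on substring matching, so a general question that happens to contain a keyword (for example "rag" inside "storage") is sent to retrieval. RAG performance depends heavily on embedding similarity and chunk quality, which can affect retrieval accuracy. The model may combine multiple retrieved chunks and produce verbose answers, as in Test 2, where the LangGraph answer also drew on the sentences about stateful agents and conditional edges. Additionally, the system maintains state only within a session and does not provide persistent long-term memory by default.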