**Introduction**

Retrieval-Augmented Generation (RAG) is an approach that grounds a large language model's responses in external knowledge. Instead of relying only on what the model learned during training, a RAG system retrieves relevant passages from a knowledge base and passes them to the model as context, so that answers are based on the retrieved text.

In this project, we use LangChain and LangGraph to build a modular and transparent RAG pipeline. LangChain handles document loading, chunking, embeddings, and retrieval, while LangGraph structures the workflow as a graph with two nodes, retrieval and generation. The result is a small question-answering system over a custom knowledge source, which can serve as a starting point for practical applications like chatbots, enterprise search, support assistants, and domain-specific AI tools.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import os
os.chdir('/content/drive/MyDrive/Projects/Tools for Generative AI/RAG System with LangChain and LangGraph')

In [None]:
%pip install -U langchain langchain-community langchain-openai langchain-pinecone langchain-text-splitters langgraph



**Set your API key**

In [None]:
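import os
from getpass import getpass
# Prompt for the key instead of hardcoding it in the notebook.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")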

**Load your KB + Split into chunks**

In [None]:
import json
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

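# Load the knowledge base: a JSON list of objects, each with a "text" field.
with open("knowledge_base.json", "r", encoding="utf-8") as f:
    items = json.load(f)

# Wrap each entry in a Document; fall back to the list index when "id" is missing.
docs = [
    Document(page_content=item["text"], metadata={"id": item.get("id", i)})
    for i, item in enumerate(items)
]

# Split into chunks of up to 800 characters, with up to 200 characters of overlap.
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=200)
chunks = splitter.split_documents(docs)
print("✅ Chunks created:", len(chunks))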



✅ Chunks created: 10
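
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a plain sliding-window splitter. It is an illustration only: the helper `sliding_window_chunks` is hypothetical and is not how `RecursiveCharacterTextSplitter` works internally, since that splitter prefers to break on separators such as paragraphs, lines, and spaces, so its chunks will generally differ from these.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap (hypothetical helper)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
for c in demo_chunks:
    print(c)
```

Each window repeats the last 4 characters of the previous one, which is the purpose of the overlap: a sentence that straddles a chunk boundary still appears whole in at least one chunk when the overlap is large enough.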


**Build vector_store**

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector_store = InMemoryVectorStore(embeddings)

vector_store.add_documents(chunks)

print("✅ Vector store ready with", len(chunks), "chunks.")


✅ Vector store ready with 10 chunks.
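
Conceptually, the vector store keeps one embedding per chunk and answers a query by ranking chunks by vector similarity. The sketch below shows that idea with hand-written 3-dimensional vectors and cosine similarity. The vectors, the texts, and the helpers `cosine_similarity` and `top_k` are made up for illustration; real embeddings come from the embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k):
    """Return the texts of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "embeddings": (text, vector) pairs invented for this example.
toy_store = [
    ("chunk about vector databases", [0.9, 0.1, 0.0]),
    ("chunk about evaluation metrics", [0.1, 0.9, 0.0]),
    ("chunk about prompt design", [0.0, 0.2, 0.9]),
]
toy_query = [0.8, 0.2, 0.0]

nearest = top_k(toy_query, toy_store, k=2)
print(nearest)
```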


**Build LangGraph**

In [None]:
from typing_extensions import TypedDict, List
from langchain_core.documents import Document
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

# ---- State
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# ---- Nodes
def retrieve(state: State):
    # Fetch the 5 chunks most similar to the question.
    retrieved_docs = vector_store.similarity_search(state["question"], k=5)
    return {"context": retrieved_docs}

def format_context(docs):
    # Join chunk texts with blank lines so they read as separate passages.
    return "\n\n".join(d.page_content for d in docs)

def generate(state: State):
    # Build a prompt restricted to the retrieved context and ask the LLM.
    ctx = format_context(state["context"])
    q = state["question"]
    msg = f"Use ONLY the context to answer.\n\nContext:\n{ctx}\n\nQuestion: {q}\nAnswer:"
    out = llm.invoke(msg).content
    return {"answer": out}

# ---- Graph
graph_obj = StateGraph(State)
graph_obj.add_node("retrieve", retrieve)
graph_obj.add_node("generate", generate)

graph_obj.set_entry_point("retrieve")
graph_obj.add_edge("retrieve", "generate")
graph_obj.add_edge("generate", END)

graph = graph_obj.compile()
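
Each node above returns only the keys it changes, and the graph merges that partial update into the shared state before the next node runs. The following is a rough plain-Python emulation of that flow for a linear graph, assuming the default behavior where a returned key overwrites the existing value. `run_linear_graph`, `fake_retrieve`, and `fake_generate` are hypothetical stand-ins, not LangGraph APIs, and no LLM or vector store is called.

```python
def run_linear_graph(nodes, initial_state):
    """Run nodes in order, merging each node's partial update into the state."""
    state = dict(initial_state)
    for node in nodes:
        update = node(state)   # a node returns only the keys it changes
        state.update(update)   # returned keys overwrite existing values
    return state


def fake_retrieve(state):
    # Stand-in for the retrieve node: returns a stub passage.
    return {"context": [f"stub passage for: {state['question']}"]}


def fake_generate(state):
    # Stand-in for the generate node: reads the context written by the previous node.
    return {"answer": f"Answer built from {len(state['context'])} passage(s)."}


final_state = run_linear_graph(
    [fake_retrieve, fake_generate],
    {"question": "What is RAG?"},
)
print(final_state)
```

The final state holds `question`, `context`, and `answer` together, which is why the chat loop can read `response["answer"]` from the value returned by `graph.invoke`.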


**Chat Loop**

In [None]:
while True:
    question = input("Ask a question (or type 'exit'): ").strip()
    if question.lower() == "exit":
        break
    if not question:
        continue  # Skip empty input instead of sending a blank query.

    response = graph.invoke({"question": question})
    print("\n✅ ANSWER:\n", response["answer"])


Ask a question (or type 'exit'): Rag

✅ ANSWER:
 RAG, or Retrieval-Augmented Generation, combines document retrieval with large language model (LLM) generation to provide accurate and grounded answers. It involves evaluating metrics such as relevance, groundedness, hallucination rate, and latency, with common metrics including Recall@k and citation accuracy. A vector database is used to store embeddings and enable fast similarity search, which is essential for RAG systems.
Ask a question (or type 'exit'): exit
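
The state returned by `graph.invoke` also contains the retrieved `context`, and each chunk carries the `id` metadata set when the knowledge base was loaded. A small sketch of how those ids could be listed next to an answer follows. `StubDocument` and `source_ids` are hypothetical names used so the example runs without the notebook's objects; with the real pipeline, `response["context"]` would take the place of the stub list.

```python
from dataclasses import dataclass, field


@dataclass
class StubDocument:
    """Minimal stand-in exposing the two attributes used here."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def source_ids(context):
    """Collect the distinct 'id' values of the retrieved chunks, in retrieval order."""
    seen = []
    for doc in context:
        doc_id = doc.metadata.get("id")
        if doc_id is not None and doc_id not in seen:
            seen.append(doc_id)
    return seen


# Stub response shaped like the graph's final state (values invented for illustration).
stub_response = {
    "answer": "stub answer",
    "context": [
        StubDocument("first chunk", {"id": 3}),
        StubDocument("second chunk", {"id": 3}),
        StubDocument("third chunk", {"id": 7}),
    ],
}

cited = source_ids(stub_response["context"])
print("Sources:", cited)
```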


**Conclusion**

This project demonstrates how a Retrieval-Augmented Generation (RAG) question-answering system can be built using LangChain and LangGraph. By combining document chunking, embedding-based retrieval, and LLM generation, the system answers from retrieved context rather than from the model's training data alone, which is intended to reduce hallucinations. LangGraph’s node-based workflow keeps the pipeline modular, transparent, and easy to extend. Overall, this RAG system is a foundation for building AI assistants, knowledge bots, and enterprise search applications.