### **RAG Workflow – StateGraph Summary**

#### 🔹 Overview
This code defines a complete **Retrieval-Augmented Generation (RAG)** pipeline using **LangGraph StateGraph**.  
Each node represents a processing step, and edges define the workflow logic.

---

### Nodes

- **retrieve** → Fetch relevant documents  
- **grade_documents** → Evaluate document relevance  
- **generate** → Produce answer using retrieved context  
- **transform_query** → Rewrite query if documents are not relevant  

---

### Workflow Steps

1. **START → retrieve**  
2. **retrieve → grade_documents**  
3. Conditional Flow:
   - If documents are **relevant** → `generate`  
   - If **not relevant** → `transform_query → retrieve`  
4. After generation:
   - If answer is **useful** → `END`  
   - If **not supported** → Regenerate  
   - If **not useful** → `transform_query → retrieve`  
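
The routing rules above can be traced by hand before any LangGraph code is involved. The sketch below is a plain-Python rendering of the same four nodes, with toy callables standing in for the retriever, graders, generator, and rewriter. The function name `run_workflow`, its parameters, and the `max_steps` cap are all assumptions made for illustration; none of them appear in the notebook.

```python
from typing import Callable, List, Optional, Tuple


def run_workflow(
    question: str,
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    generate: Callable[[str, List[str]], str],
    is_grounded: Callable[[List[str], str], bool],
    is_useful: Callable[[str, str], bool],
    rewrite: Callable[[str], str],
    max_steps: int = 20,  # hypothetical safety cap, not part of the original graph
) -> Tuple[Optional[str], List[str]]:
    """Walk the four nodes by hand and return (answer, visited nodes)."""
    trace: List[str] = []
    docs: List[str] = []
    node = "retrieve"
    for _ in range(max_steps):
        trace.append(node)
        if node == "retrieve":
            docs = retrieve(question)
            node = "grade_documents"
        elif node == "grade_documents":
            # Keep only the documents the grader accepts
            docs = [d for d in docs if is_relevant(question, d)]
            node = "generate" if docs else "transform_query"
        elif node == "transform_query":
            question = rewrite(question)
            node = "retrieve"
        else:  # node == "generate"
            answer = generate(question, docs)
            if not is_grounded(docs, answer):
                node = "generate"          # not supported: regenerate
            elif is_useful(question, answer):
                return answer, trace       # useful: END
            else:
                node = "transform_query"   # not useful: rewrite and retry
    return None, trace


# Toy stand-ins for the retriever, graders, generator and rewriter
corpus = ["agent memory uses a vector store", "prompt engineering tips"]
answer, trace = run_workflow(
    "recall?",
    retrieve=lambda q: corpus,
    is_relevant=lambda q, d: q.rstrip("?") in d,
    generate=lambda q, docs: docs[0],
    is_grounded=lambda docs, a: a in docs,
    is_useful=lambda q, a: True,
    rewrite=lambda q: "memory?",
)
print(trace)
print(answer)
```

In this toy run the first query matches nothing, so the trace passes through `transform_query` once before retrieval succeeds and generation ends the loop.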

---

### Key Insight
This workflow creates a **self-correcting and adaptive RAG system** that:
- Improves retrieval quality  
- Validates answer grounding  
- Iteratively refines queries  
- Ensures accurate and context-aware responses  


### **Load environment variables from .env**

In [25]:
# Import the 'os' module to interact with the operating system environment variables
import os

# Import 'load_dotenv' from 'dotenv' to load environment variables from a .env file
from dotenv import load_dotenv

# Load environment variables from a .env file into the environment
load_dotenv()

# Retrieve API keys from environment variables and store them in Python variables
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")      # Google API key
TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")      # Tavily API key
GROQ_API_KEY = os.getenv("GROQ_API_KEY")          # Groq API key
LANGCHAIN_API_KEY = os.getenv("LANGCHAIN_API_KEY")# LangChain API key

# Re-export the keys for libraries that read them from the environment.
# Skip any key that is missing: assigning None to os.environ raises a TypeError.
for _name, _value in {
    "GOOGLE_API_KEY": GOOGLE_API_KEY,
    "TAVILY_API_KEY": TAVILY_API_KEY,
    "GROQ_API_KEY": GROQ_API_KEY,
    "LANGCHAIN_API_KEY": LANGCHAIN_API_KEY,
}.items():
    if _value is not None:
        os.environ[_name] = _value

# Additional LangChain configuration
os.environ["LANGCHAIN_TRACING_V2"] = "true"                     # Enable LangChain tracing (for debugging/monitoring)
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"  # LangSmith endpoint that receives the traces
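
A missing key in `.env` tends to show up much later as an authentication error from one of the client libraries. As a minimal sketch, assuming the four key names used in the cell above, a check like the following could run right after `load_dotenv()` to fail early with a readable message. The helper names `missing_keys` and `require_keys` are hypothetical and not part of any library.

```python
import os
from typing import Iterable, List, Mapping

# Key names taken from the cell above
REQUIRED_KEYS = ("GOOGLE_API_KEY", "TAVILY_API_KEY", "GROQ_API_KEY", "LANGCHAIN_API_KEY")


def missing_keys(env: Mapping[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required names that are absent or empty in env."""
    return [name for name in required if not env.get(name)]


def require_keys(env: Mapping[str, str] = os.environ, required: Iterable[str] = REQUIRED_KEYS) -> None:
    """Raise a RuntimeError listing every missing key at once."""
    missing = missing_keys(env, required)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))


# Demonstration on a plain dict, so the sketch runs without a real .env file
example_env = {"GOOGLE_API_KEY": "dummy-value"}
print(missing_keys(example_env))
```

In the notebook itself, calling `require_keys()` with no arguments would check the real process environment.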


### **Import Required Modules**

In [26]:
# Import the pprint module to "pretty-print" Python data structures for easier readability
import pprint  

# Import RecursiveCharacterTextSplitter from langchain_text_splitters.character
# This is used to split large text documents into smaller chunks recursively,
# based on characters, sentences, or paragraphs. Useful for processing text in LLM pipelines.
from langchain_text_splitters.character import RecursiveCharacterTextSplitter  

# Import WebBaseLoader from langchain_community.document_loaders
# This loader allows fetching and loading documents directly from web pages (URLs) into your program.
from langchain_community.document_loaders import WebBaseLoader  

# Import Chroma from langchain_community.vectorstores
# Chroma is a vector database for storing embeddings of documents.
# Useful for semantic search and retrieval in LLM applications.
from langchain_community.vectorstores import Chroma  
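# Import ChatGroq, the chat-model wrapper for Groq-hosted LLMs
from langchain_groq import ChatGroq

# Import HuggingFaceEmbeddings to embed text with a sentence-transformers model
from langchain_huggingface import HuggingFaceEmbeddings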


### **Load Embedding Model**

In [27]:
embeddings=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

### **Load LLM model**

In [28]:
llm=ChatGroq(model_name="openai/gpt-oss-120b")

### **Web Document Loader and Vector Store Setup with LangChain**

In [29]:
urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]

docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)

# Add to vectorDB
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma",
    embedding=embeddings,
)
retriever = vectorstore.as_retriever()
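
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a deliberately simplified sketch. It splits on whitespace and uses fixed-size windows, whereas `RecursiveCharacterTextSplitter.from_tiktoken_encoder` measures length in tokenizer tokens and prefers to break at natural separators, so this is not the library's algorithm. The function name `window_chunks` is hypothetical.

```python
from typing import List


def window_chunks(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Split text into windows of chunk_size words that share chunk_overlap words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = "a b c d e f g"
print(window_chunks(sample, chunk_size=3, chunk_overlap=0))
print(window_chunks(sample, chunk_size=3, chunk_overlap=1))
```

With `chunk_overlap=0`, as in the cell above, no word appears in two chunks; a nonzero overlap repeats the tail of one chunk at the head of the next, which can help when a relevant passage straddles a boundary.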

### **Retrieval Grader**

This script grades retrieved documents for relevance to a user's question using an LLM and outputs a binary score: **'yes'** or **'no'**.


In [30]:
### MAIN TITLE: Retrieval Grader
# This script uses an LLM to assess the relevance of retrieved documents 
# to a user's question. The goal is to filter out irrelevant retrievals 
# by assigning a binary score: "yes" (relevant) or "no" (not relevant).

# Import necessary modules
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

# -------------------------
# Define data model for grading
# -------------------------
class GradeDocuments(BaseModel):
    """
    Binary score for relevance check on retrieved documents.
    Fields:
    - binary_score: 'yes' if document is relevant, 'no' if not.
    """
    binary_score: str = Field(
        description="Documents are relevant to the question, 'yes' or 'no'"
    )

# -------------------------
# Connect the LLM to the structured output model
# -------------------------
# structured_llm_grader ensures the LLM output follows the GradeDocuments model
structured_llm_grader = llm.with_structured_output(GradeDocuments)

# -------------------------
# Create the grading prompt
# -------------------------
system = """You are a grader assessing relevance of a retrieved document to a user question. 
It does not need to be a stringent test. The goal is to filter out erroneous retrievals. 
If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. 
Give a binary score 'yes' or 'no' score to indicate whether the document is relevant to the question."""

# Human template specifies input placeholders for document and question
grade_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "Retrieved document: \n\n {document} \n\n User question: {question}"),
    ]
)

# -------------------------
# Combine prompt and LLM
# -------------------------
# This creates a "retrieval_grader" pipeline
retrieval_grader = grade_prompt | structured_llm_grader

# -------------------------
# Example usage
# -------------------------
question = "agent memory"            # User question
docs = retriever.invoke(question)    # Retrieve documents using some retriever
doc_txt = docs[1].page_content      # Select the second document for grading

# Invoke the grader and print the result
print(retrieval_grader.invoke({"question": question, "document": doc_txt}))


binary_score='yes'
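
Because `binary_score` is declared as a free-form `str`, the model is not strictly prevented from returning variants such as `Yes` or `yes` with surrounding whitespace, while later cells compare the value to the exact string `"yes"`. As a small sketch, a tolerant comparison could look like the helper below; the name `is_yes` is hypothetical. Another option would be to narrow the field type itself, for example with `typing.Literal["yes", "no"]`, which Pydantic supports.

```python
def is_yes(binary_score: str) -> bool:
    """Treat 'yes' as affirmative regardless of case, padding, or wrapping quotes."""
    return binary_score.strip().strip("'\"").lower() == "yes"


for raw in ["yes", " Yes ", "'yes'", "no", "yes, mostly"]:
    print(repr(raw), is_yes(raw))
```

Anything other than a bare "yes" is treated as a negative grade, which keeps the filter conservative.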


In [31]:
### MAIN TITLE: Generate

# -------------------------
# Import necessary modules
# -------------------------
from langchain_classic import hub
from langchain_core.output_parsers import StrOutputParser

# -------------------------
# Load a prebuilt RAG (Retrieval-Augmented Generation) prompt
# -------------------------
# 'rlm/rag-prompt' is a template for generating answers based on context documents
prompt = hub.pull("rlm/rag-prompt")

# -------------------------
# Define a post-processing function for documents
# -------------------------
# Joins the page_content of each document into a single string, separated by blank lines
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# -------------------------
# Build the RAG chain
# -------------------------
# Pipeline: prompt -> LLM -> output parser
# StrOutputParser ensures the final output is a plain string
rag_chain = prompt | llm | StrOutputParser()

# -------------------------
# Run the generation
# -------------------------
# Format the retrieved documents into one context string and pass it, with the user question, to the RAG chain
generation = rag_chain.invoke({"context": format_docs(docs), "question": question})

# Print the generated answer
print(generation)


In LLM‑powered agents, **short‑term memory** is the in‑context information the model can use during a single prompt, effectively the model’s immediate context window. **Long‑term memory** is an external store (often a vector database) that lets the agent retain and retrieve unlimited knowledge across sessions. Together they let the agent recall past actions and relevant facts while solving tasks.


### **Hallucination Grader**

In [32]:
### MAIN TITLE: Hallucination Grader
# This script grades an LLM-generated answer for hallucinations. 
# It outputs a binary score: 'yes' if the answer is grounded in the provided facts, 
# 'no' if it contains hallucinations or unsupported claims.

# -------------------------
# Define the data model for grading
# -------------------------
from pydantic import BaseModel, Field

class GradeHallucinations(BaseModel):
    """
    Binary score for hallucination presence in the generated answer.
    Fields:
    - binary_score: 'yes' if the answer is grounded in facts, 'no' if hallucinated.
    """
    binary_score: str = Field(
        description="Answer is grounded in the facts, 'yes' or 'no'"
    )

# -------------------------
# Connect the LLM to the structured output model
# -------------------------
# Ensures the LLM output follows the GradeHallucinations model
structured_llm_grader = llm.with_structured_output(GradeHallucinations)

# -------------------------
# Create the hallucination grading prompt
# -------------------------
system = """You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts.
Give a binary score 'yes' or 'no'. 'Yes' means that the answer is grounded in / supported by the set of facts."""

# Human template specifies input placeholders for facts (documents) and generation
hallucination_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "Set of facts: \n\n {documents} \n\n LLM generation: {generation}"),
    ]
)

# -------------------------
# Combine the prompt and LLM
# -------------------------
# This creates a "hallucination_grader" pipeline
hallucination_grader = hallucination_prompt | structured_llm_grader

# -------------------------
# Run the hallucination grading
# -------------------------
# Pass the set of documents (facts) and the generated answer
result = hallucination_grader.invoke({"documents": docs, "generation": generation})

# Print the grading result
print(result)


binary_score='yes'


### **Answer Grader**

In [33]:
### MAIN TITLE: Answer Grader
# This script grades an LLM-generated answer to check if it properly addresses the user's question.
# It outputs a binary score: 'yes' if the answer resolves the question, 'no' if it does not.

# -------------------------
# Define the data model for grading
# -------------------------
from pydantic import BaseModel, Field

class GradeAnswer(BaseModel):
    """
    Binary score to assess whether the generated answer addresses the user's question.
    Fields:
    - binary_score: 'yes' if the answer resolves the question, 'no' otherwise.
    """
    binary_score: str = Field(
        description="Answer addresses the question, 'yes' or 'no'"
    )

# -------------------------
# Connect the LLM to the structured output model
# -------------------------
# Ensures that the LLM output follows the GradeAnswer schema
structured_llm_grader = llm.with_structured_output(GradeAnswer)

# -------------------------
# Create the answer grading prompt
# -------------------------
system = """You are a grader assessing whether an answer addresses / resolves a question.
Give a binary score 'yes' or 'no'. 'Yes' means that the answer resolves the question."""

# Human template specifies input placeholders for the question and generated answer
answer_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "User question: \n\n {question} \n\n LLM generation: {generation}"),
    ]
)

# -------------------------
# Combine the prompt and LLM
# -------------------------
# This creates an "answer_grader" pipeline
answer_grader = answer_prompt | structured_llm_grader

# -------------------------
# Run the answer grading
# -------------------------
# Pass the user question and the generated answer to the grader
result = answer_grader.invoke({"question": question, "generation": generation})

# Print the grading result
print(result)


binary_score='yes'


### **Question Re-writer**

In [34]:
### MAIN TITLE: Question Re-writer
# This script rewrites a user question to an optimized version that improves 
# retrieval performance from a vectorstore. It focuses on understanding the semantic intent 
# of the original question.

# -------------------------
# Create the system prompt
# -------------------------
system = """You are a question re-writer that converts an input question to a better version 
that is optimized for vectorstore retrieval. Look at the input and try to reason 
about the underlying semantic intent / meaning."""

# -------------------------
# Create the ChatPromptTemplate
# -------------------------
# The human template specifies the placeholder for the original question
re_write_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        (
            "human",
            "Here is the initial question: \n\n {question} \nFormulate an improved question.",
        ),
    ]
)

# -------------------------
# Build the question rewriter pipeline
# -------------------------
# Pipeline: prompt -> LLM -> string output
# StrOutputParser ensures the output is returned as plain text
question_rewriter = re_write_prompt | llm | StrOutputParser()

# -------------------------
# Run the question rewriter
# -------------------------
# Pass the original question to the rewriter and get the improved question
improved_question = question_rewriter.invoke({"question": question})

# Print the improved question
print(improved_question)


**Improved question:**  
*What is agent memory, and how is it implemented and used in AI or autonomous agents for storing and retrieving information?*


### **Graph State**

In [35]:

# This code defines a structured data type to represent the state of a RAG pipeline.
# It keeps track of the current question, the LLM-generated answer, and the retrieved documents.

### -------------------------
### Import necessary modules
### -------------------------
from typing import List
from typing_extensions import TypedDict
from langchain_core.documents import Document

### -------------------------
### Define GraphState as a TypedDict
### -------------------------
class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: The user question (string)
        generation: The answer generated by the LLM (string)
        documents: List of retrieved documents (LangChain Document objects)
    """

    question: str
    generation: str
    documents: List[Document]
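
Each node in this notebook returns a dict containing only some of the state keys. As a minimal sketch, assuming the default behaviour for state keys that have no custom reducer (the returned value overwrites the old one and untouched keys carry over), the merge can be pictured as a plain dict update. The helper `apply_update` is hypothetical and only mimics that behaviour.

```python
from typing import Any, Dict


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new state in which the keys from update replace the old values."""
    merged = dict(state)   # copy, so the previous state is left untouched
    merged.update(update)  # keys present in the update overwrite; others carry over
    return merged


state = {"question": "agent memory", "generation": "", "documents": []}

# A retrieve-style node returns only the key it changed
state = apply_update(state, {"documents": ["doc A", "doc B"]})
# A transform_query-style node replaces the question and leaves the documents alone
state = apply_update(state, {"question": "How do agents store and recall memory?"})
print(state)
```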


### **Nodes and Edges**


In [36]:
### MAIN TITLE: Nodes and Edges
# This code defines the **nodes** (operations) and **edges** (decisions) of a RAG-style pipeline.
# Each node updates the graph state, and edges determine the next node based on conditions.

from pprint import pprint

# =========================
# NODES
# =========================

def retrieve(state):
    """
    Retrieve relevant documents for the current question.

    Args:
        state (dict): The current graph state, must include 'question'.

    Returns:
        dict: Updated state with 'documents' key containing retrieved documents.
    """
    print("---RETRIEVE---")
    question = state["question"]

    # Retrieve documents from vectorstore or retriever
    documents = retriever.invoke(question)
    return {"documents": documents, "question": question}


def generate(state):
    """
    Generate an answer using RAG (retrieval-augmented generation) pipeline.

    Args:
        state (dict): The current graph state, must include 'question' and 'documents'.

    Returns:
        dict: Updated state with 'generation' key containing LLM output.
    """
    print("---GENERATE---")
    question = state["question"]
    documents = state["documents"]

    # Generate answer using RAG chain, passing the documents as one formatted context string
    generation = rag_chain.invoke({"context": format_docs(documents), "question": question})
    return {"documents": documents, "question": question, "generation": generation}


def grade_documents(state):
    """
    Filter retrieved documents to keep only those relevant to the question.

    Args:
        state (dict): The current graph state, must include 'question' and 'documents'.

    Returns:
        dict: Updated state with 'documents' containing only relevant documents.
    """
    print("---CHECK DOCUMENT RELEVANCE TO QUESTION---")
    question = state["question"]
    documents = state["documents"]

    filtered_docs = []
    for d in documents:
        # Grade each document for relevance
        score = retrieval_grader.invoke({"question": question, "document": d.page_content})
        grade = score.binary_score
        if grade == "yes":
            print("---GRADE: DOCUMENT RELEVANT---")
            filtered_docs.append(d)
        else:
            print("---GRADE: DOCUMENT NOT RELEVANT---")
            continue
    return {"documents": filtered_docs, "question": question}


def transform_query(state):
    """
    Rewrites the question to improve retrieval results.

    Args:
        state (dict): The current graph state, must include 'question'.

    Returns:
        dict: Updated state with 'question' replaced by improved question.
    """
    print("---TRANSFORM QUERY---")
    question = state["question"]
    documents = state["documents"]

    # Re-write the question using question rewriter
    better_question = question_rewriter.invoke({"question": question})
    return {"documents": documents, "question": better_question}


# =========================
# EDGES
# =========================

def decide_to_generate(state):
    """
    Decide whether to generate an answer or re-write the question.

    Args:
        state (dict): The current graph state.

    Returns:
        str: Next node to execute ("generate" or "transform_query").
    """
    print("---ASSESS GRADED DOCUMENTS---")
    filtered_documents = state["documents"]

    if not filtered_documents:
        # If no relevant documents remain, re-write the question
        print("---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---")
        return "transform_query"
    else:
        # If there are relevant documents, proceed to generation
        print("---DECISION: GENERATE---")
        return "generate"


def grade_generation_v_documents_and_question(state):
    """
    Check whether the generated answer is grounded in documents and addresses the question.

    Args:
        state (dict): The current graph state, must include 'question', 'documents', and 'generation'.

    Returns:
        str: Decision for next step ("useful", "not useful", or "not supported").
    """
    print("---CHECK HALLUCINATIONS---")
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]

    # Check if generation is supported by documents
    score = hallucination_grader.invoke({"documents": documents, "generation": generation})
    grade = score.binary_score

    if grade == "yes":
        print("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---")
        # Check if generation answers the question
        print("---GRADE GENERATION vs QUESTION---")
        score = answer_grader.invoke({"question": question, "generation": generation})
        grade = score.binary_score
        if grade == "yes":
            print("---DECISION: GENERATION ADDRESSES QUESTION---")
            return "useful"
        else:
            print("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---")
            return "not useful"
    else:
        print("---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")
        return "not supported"


In [37]:
### MAIN TITLE: RAG Workflow StateGraph
# This code defines the complete RAG pipeline as a state graph using `langgraph`.
# Each node represents an operation (e.g., retrieval, generation, grading),
# and edges define the flow and conditional transitions between nodes.

from langgraph.graph import END, StateGraph, START

# -------------------------
# Initialize the workflow graph
# -------------------------
# 'GraphState' defines the structure of the state that flows through the nodes
workflow = StateGraph(GraphState)

# -------------------------
# Define the nodes
# -------------------------
# Each node is a function that performs a specific operation on the graph state
workflow.add_node("retrieve", retrieve)               # Node for document retrieval
workflow.add_node("grade_documents", grade_documents) # Node for grading document relevance
workflow.add_node("generate", generate)               # Node for RAG generation
workflow.add_node("transform_query", transform_query) # Node for question rewriting

# -------------------------
# Build the graph edges
# -------------------------
# Define the sequence and conditional flow between nodes

# Start the workflow with the 'retrieve' node
workflow.add_edge(START, "retrieve")

# After retrieval, grade the documents
workflow.add_edge("retrieve", "grade_documents")

# Conditional edges based on document grading
# decide_to_generate returns "transform_query" or "generate" depending on document relevance
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {
        "transform_query": "transform_query",
        "generate": "generate",
    },
)

# Re-run retrieval after transforming the query
workflow.add_edge("transform_query", "retrieve")

# Conditional edges based on generation evaluation
# grade_generation_v_documents_and_question returns "useful", "not useful", or "not supported"
workflow.add_conditional_edges(
    "generate",
    grade_generation_v_documents_and_question,
    {
        "not supported": "generate",       # Regenerate if answer not grounded
        "useful": END,                     # End workflow if answer is good
        "not useful": "transform_query",   # Rewrite query if answer doesn't address question
    },
)

# -------------------------
# Compile the workflow
# -------------------------
# Converts the defined nodes and edges into an executable app
app = workflow.compile()
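
The graph as wired can cycle: `"not supported"` routes back to `generate`, and `"not useful"` routes back through `transform_query`, with nothing in the state limiting how often that happens. LangGraph enforces a recursion limit on graph runs, but an explicit cap in the routing logic gives a cleaner exit. The sketch below shows one way this might be done in plain Python. The `attempts` state key, the `"give_up"` label, and both helper names are assumptions; using them in the notebook would mean adding `attempts` to `GraphState`, returning `count_attempt(state)` from the `generate` node, and mapping `"give_up"` to `END` in the conditional edges.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def count_attempt(state: State) -> State:
    """State update a node could return to record one more generation attempt."""
    return {"attempts": state.get("attempts", 0) + 1}


def with_retry_cap(router: Callable[[State], str], max_attempts: int = 3) -> Callable[[State], str]:
    """Wrap a routing function so it returns 'give_up' once the cap is reached."""
    def capped(state: State) -> str:
        decision = router(state)
        if decision != "useful" and state.get("attempts", 0) >= max_attempts:
            return "give_up"
        return decision
    return capped


# Toy router that never accepts the answer, to show the cap taking effect
always_unsupported = lambda state: "not supported"
capped = with_retry_cap(always_unsupported, max_attempts=3)

state: State = {"attempts": 0}
decisions = []
while True:
    state.update(count_attempt(state))  # what the generate node would record
    decision = capped(state)
    decisions.append(decision)
    if decision == "give_up":
        break
print(decisions)
```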


In [38]:
from pprint import pprint

# Run
inputs = {"question": "Explain how the different types of agent memory work?"}
for output in app.stream(inputs):
    for key, value in output.items():
        # Node
        pprint(f"Node '{key}':")
        # Optional: print the state update returned by each node
        # pprint(value, indent=2, width=80, depth=None)
    pprint("\n---\n")

# Final generation
pprint(value["generation"])

---RETRIEVE---
"Node 'retrieve':"
'\n---\n'
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: GENERATE---
"Node 'grade_documents':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
"Node 'generate':"
'\n---\n'
('Agent memory comes in two main forms.\u202fShort‑term memory is the model’s '
 'in‑context window: the current prompt and recent dialogue are fed directly '
 'to the LLM, letting it “remember” information for the immediate turn.\u202f'
 'Long‑term memory is an external store (often a vector database) that logs '
 'experiences or facts in natural‑language form and can be queried later, '
 'giving the agent essentially unlimited recall across sessions.')
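
The run cell above reads `value["generation"]` after the loop, which works only because the last streamed item happens to come from `generate`. A sketch of a slightly more explicit approach is to fold every streamed update into one dict. It assumes, as the loop above does, that each streamed item maps a node name to the dict that node returned; `final_state` and `fake_stream` are hypothetical names used only for this illustration.

```python
from typing import Any, Dict, Iterable


def final_state(stream: Iterable[Dict[str, Dict[str, Any]]], initial: Dict[str, Any]) -> Dict[str, Any]:
    """Fold a stream of {node_name: update} items into a single state dict."""
    state = dict(initial)
    for output in stream:
        for node_name, update in output.items():
            state.update(update)  # later nodes overwrite earlier values
    return state


# Hand-written stand-in for the streamed items, so the sketch runs without an LLM
fake_stream = [
    {"retrieve": {"documents": ["doc A", "doc B"], "question": "q"}},
    {"grade_documents": {"documents": ["doc A"], "question": "q"}},
    {"generate": {"documents": ["doc A"], "question": "q", "generation": "answer"}},
]
result = final_state(fake_stream, {"question": "q"})
print(result["generation"])
```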


In [39]:
inputs = {"question": "Explain how chain of thought prompting works?"}
for output in app.stream(inputs):
    for key, value in output.items():
        # Node
        pprint(f"Node '{key}':")
        # Optional: print the state update returned by each node
        # pprint(value, indent=2, width=80, depth=None)
    pprint("\n---\n")

# Final generation
pprint(value["generation"])

---RETRIEVE---
"Node 'retrieve':"
'\n---\n'
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: GENERATE---
"Node 'grade_documents':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
"Node 'generate':"
'\n---\n'
('Chain‑of‑thought (CoT) prompting asks the model to “think step‑by‑step,” '
 'turning a complex query into a sequence of simpler reasoning steps that are '
 'generated in the prompt itself. By explicitly enumerating these intermediate '
 'thoughts, the model can use more test‑time computation to break the problem '
 'into manageable sub‑tasks and produce a clearer, more accurate answer. This '
 'technique improves performance on difficult tasks by making the model’s '
 '