### 🧠 What is Self-Reflection in RAG?
Self-reflection means the LLM evaluates its own output by asking:
“Is this clear, complete, and accurate?”

#### Self-Reflection in RAG using LangGraph

We’ll design a workflow where the agent:

1. Generates an initial answer using retrieved context
2. Reflects on that answer with a dedicated self-critic LLM step
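3. Retrieves again and regenerates the answer if the critic is unsatisfied (this notebook retries with the same query, capped at two attempts; revising the query is a possible extension)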
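
Before wiring this into LangGraph, the control flow of the steps above can be sketched in plain Python. This is a minimal sketch, not the notebook's implementation: `reflective_rag`, its `retrieve`, `generate`, and `critique` callables, and the `max_attempts` parameter are hypothetical stand-ins for the retriever and LLM calls, and the critic is assumed to return a simple boolean verdict.

```python
from typing import Callable, List, Tuple


def reflective_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    critique: Callable[[str, str], bool],
    max_attempts: int = 2,
) -> Tuple[str, int]:
    """Generate an answer, critique it, and retry until accepted or out of attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    answer = ""
    attempts = 0
    while attempts < max_attempts:
        docs = retrieve(question)          # fetch context for the question
        answer = generate(question, docs)  # draft an answer from that context
        attempts += 1
        if critique(question, answer):     # self-critique step
            break                          # accepted, stop early
        # rejected: loop back and try again while attempts remain
    return answer, attempts


# Usage with stub callables: the first draft is rejected, the second is accepted.
drafts = iter(["too short", "a complete answer"])
answer, attempts = reflective_rag(
    "What is RAG?",
    retrieve=lambda q: ["RAG combines retrieval with generation."],
    generate=lambda q, docs: next(drafts),
    critique=lambda q, a: a.startswith("a complete"),
)
```

The attempt cap matters: without it, a critic that never says yes would loop forever. The graph built later in this notebook expresses the same loop as nodes and a conditional edge.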

In [1]:
import os
from typing import List
from pydantic import BaseModel

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from langchain_community.document_loaders import TextLoader
from langgraph.graph import StateGraph, END

In [3]:
## Load LLM models

from langchain.chat_models import init_chat_model
from dotenv import load_dotenv
load_dotenv()

llm = init_chat_model("openai:gpt-5-nano")
llm

ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001BB59FCA900>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001BB59FCB380>, root_client=<openai.OpenAI object at 0x000001BB59FC8050>, root_async_client=<openai.AsyncOpenAI object at 0x000001BB59FCB0E0>, model_name='gpt-5-nano', model_kwargs={}, openai_api_key=SecretStr('**********'))

In [17]:
docs = TextLoader("research_notes.txt", encoding="utf-8").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)
vectorstore = FAISS.from_documents(chunks, OpenAIEmbeddings())
retriever = vectorstore.as_retriever()
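
To build intuition for `chunk_size=500` and `chunk_overlap=50`, here is a deliberately simplified sketch of overlapping chunking. It is not how `RecursiveCharacterTextSplitter` works internally: that splitter prefers to break on separators such as paragraph breaks, newlines, and spaces, so its chunk boundaries will differ. The `sliding_chunks` helper below is hypothetical and uses fixed-width character windows only.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Cut text into fixed-width windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# A 1200-character sample yields windows starting at offsets 0, 450, and 900.
sample = "".join(str(i % 10) for i in range(1200))
pieces = sliding_chunks(sample)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of indexing some text twice.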

In [18]:
## 2. State Definition

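class RAGReflectionState(BaseModel):
    question: str
    retrieved_docs: List[Document] = []
    answer: str = ''
    reflection: str = ''   # raw text returned by the critic
    revised: bool = False  # True when the critic rejected the answer and a retry is needed
    attempts: int = 0      # number of answers generated so far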

In [19]:
## 3. Nodes

# a. Retrieve
def retrieve_docs(state: RAGReflectionState) -> RAGReflectionState:
    docs = retriever.invoke(state.question)
    return state.model_copy(update={"retrieved_docs": docs})


In [20]:
# b. Generate Answer

def generate_answer(state: RAGReflectionState) -> RAGReflectionState:
    context = "\n\n".join([doc.page_content for doc in state.retrieved_docs])
    prompt = f"""
    Use the following context to answer the question

    Context:
    {context}

    Question:
    {state.question}
    """

    answer = llm.invoke(prompt).content.strip()
    return state.model_copy(update={"answer": answer, "attempts": state.attempts + 1})

In [21]:
# c. Self-Reflect
def reflect_on_answer(state: RAGReflectionState) -> RAGReflectionState:
    
    prompt = f"""
        Reflect on the following answer to see if it fully addresses the question. 
        State YES if it is complete and correct, or NO with an explanation.

        Question: {state.question}

        Answer: {state.answer}

        Respond like:
        Reflection: YES or NO
        Explanation: ...
    """
    
    result = llm.invoke(prompt).content
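    # Plain substring check: a response that does not contain "reflection: yes" is treated as a rejection
    is_ok = "reflection: yes" in result.lower()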
    return state.model_copy(update={"reflection": result, "revised": not is_ok})
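
The check above accepts any response containing the substring `reflection: yes`, wherever it appears. A stricter alternative is to parse the verdict line explicitly. The following is a minimal sketch, assuming the critic follows the `Reflection:` / `Explanation:` layout requested in the prompt; `parse_reflection` is a hypothetical helper, not part of LangChain or LangGraph.

```python
import re
from typing import Tuple

# The verdict must start a line; optional markdown asterisks before YES/NO are tolerated.
_VERDICT = re.compile(r"^\s*reflection\s*:\s*\**\s*(yes|no)\b", re.IGNORECASE | re.MULTILINE)
# Everything after "Explanation:" (possibly spanning lines) is the explanation.
_EXPLANATION = re.compile(r"^\s*explanation\s*:\s*(.*)", re.IGNORECASE | re.MULTILINE | re.DOTALL)


def parse_reflection(text: str) -> Tuple[bool, str]:
    """Return (is_ok, explanation); a missing or malformed verdict counts as not OK."""
    verdict = _VERDICT.search(text)
    is_ok = verdict is not None and verdict.group(1).lower() == "yes"
    explanation = _EXPLANATION.search(text)
    return is_ok, (explanation.group(1).strip() if explanation else "")
```

Failing closed is a design choice: an unparseable critique triggers another attempt rather than silently passing an unchecked answer.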

In [22]:
# d. Finalizer
def finalize(state: RAGReflectionState) -> RAGReflectionState:
    return state

In [23]:
## 4. LangGraph workflow (the reflector can loop back to the retriever, so the graph is cyclic rather than a DAG)

builder = StateGraph(RAGReflectionState)

builder.add_node("retriever", retrieve_docs)
builder.add_node("responder", generate_answer)
builder.add_node("reflector", reflect_on_answer)
builder.add_node("done", finalize)

builder.set_entry_point("retriever")
builder.add_edge("retriever", "responder")
builder.add_edge("responder", "reflector")
builder.add_conditional_edges(
    "reflector", 
    lambda s: "done" if not s.revised or s.attempts >= 2 else "retriever"
)
builder.add_edge("done", END)
graph = builder.compile()
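
The routing lambda packs two stopping conditions into one expression. As a sketch of the same decision written out, the named function below returns the next node's name. `LoopState`, `MAX_ATTEMPTS`, and `route_after_reflection` are hypothetical names introduced here; `LoopState` is a stand-in that carries only the two fields the router reads, so the example runs without LangGraph installed.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 2  # same cap as the lambda above


@dataclass
class LoopState:
    """Stand-in for the graph state with only the fields routing depends on."""
    revised: bool
    attempts: int


def route_after_reflection(state: LoopState) -> str:
    """Pick the next node: stop when the answer is accepted or the budget is spent."""
    if not state.revised:
        return "done"       # critic accepted the answer
    if state.attempts >= MAX_ATTEMPTS:
        return "done"       # out of attempts, return the latest answer anyway
    return "retriever"      # rejected with attempts left, fetch context again
```

A function like this can be passed to `add_conditional_edges` in place of the lambda, since that method accepts any callable that returns a node name. Naming the router also makes each branch easy to unit test.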

In [None]:
graph

In [26]:
# -------------------------
# 5. Run the Agent
# -------------------------
if __name__ == "__main__":
    user_query = "What are the transformers variants in production deployments?"
    init_state = RAGReflectionState(question=user_query)
    result = graph.invoke(init_state)

    print("\n🧠 Final Answer:\n", result["answer"])
    print("\n🔁 Reflection Log:\n", result["reflection"])
    print("🔄 Total Attempts:", result["attempts"])


🧠 Final Answer:
 - EfficientFormer — deployed on Raspberry Pi 4 (quantized int8 mode works with minimal accuracy loss)
- TinyBERT — used for production classification (support ticket priority tagging)
- LLaMA2 with FlashAttention2 — integrated to reduce context latency (≈50% reduction)

These variants are indicated as active in deployment contexts within the log.

🔁 Reflection Log:
 Reflection: YES
Explanation: The answer lists three transformers variants used in production deployments (EfficientFormer on Raspberry Pi 4 with int8 quantization, TinyBERT for production classification, and LLaMA2 with FlashAttention2 to cut context latency). It also notes these are active in deployment logs. If there are additional variants or details, they may not be captured in the current snippet.
🔄 Total Attempts: 1
