### 🧠 What is Self-Reflection in RAG?
Self-reflection means the LLM evaluates its own output by asking:
“Is this clear, complete, and accurate?”

#### Self-Reflection in RAG using LangGraph

We’ll design a workflow where the agent:

1. Generates an initial answer using retrieved context
2. Reflects on that answer with a dedicated self-critic LLM step
3. If unsatisfied, revises the query, retrieves again, or regenerates the answer
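
The three steps above can be sketched as a plain Python loop before any LangGraph wiring is introduced. This is only an illustration of the control flow: `reflective_rag` and its `retrieve`, `generate`, and `critique` callables are hypothetical names, and the lambdas are toy stand-ins so the loop runs without an LLM or a vector store.

```python
from typing import Callable, List, Tuple


def reflective_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    critique: Callable[[str, str], bool],
    max_attempts: int = 2,
) -> Tuple[str, int]:
    """Retrieve, answer, and self-check until the critic accepts or attempts run out."""
    answer = ""
    for attempt in range(1, max_attempts + 1):
        docs = retrieve(question)          # step 1a: fetch context
        answer = generate(question, docs)  # step 1b: draft an answer
        if critique(question, answer):     # step 2: self-critique
            return answer, attempt
        # step 3: the critic rejected the draft, so loop and try again
    return answer, max_attempts


# Toy stand-ins (hypothetical): the second draft is the one the critic accepts.
drafts = iter(["a partial answer", "a complete answer"])
answer, attempts = reflective_rag(
    "What is self-reflection?",
    retrieve=lambda q: ["some context"],
    generate=lambda q, docs: next(drafts),
    critique=lambda q, a: "complete" in a,
)
print(answer, attempts)
```

The attempt cap matters: without it, a critic that never answers YES would loop forever.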

In [6]:
import os
from typing import List
from pydantic import BaseModel
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
from langgraph.graph import StateGraph, END

In [7]:
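from langchain_groq import ChatGroq
from dotenv import load_dotenv

# Load GROQ_API_KEY from a local .env file into the environment
load_dotenv()
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set; add it to .env or export it")

llm = ChatGroq(model="openai/gpt-oss-120b")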

In [9]:
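# -------------------------
# 1. Load Documents and Build the Retriever
# -------------------------
docs = TextLoader(r"research_notes.txt", encoding="utf-8").load()
# Split into chunks of up to 500 characters with a 50-character overlap
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever()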

In [10]:
# -------------------------
# 2. State Definition
# -------------------------
class RAGReflectionState(BaseModel):
    question: str
    retrieved_docs: List[Document] = []
    answer: str = ""
    reflection: str = ""
    revised: bool = False
    attempts: int = 0

In [11]:
# -------------------------
# 3. Nodes
# -------------------------

# a. Retrieve
def retrieve_docs(state: RAGReflectionState) -> RAGReflectionState:
    docs = retriever.invoke(state.question)
    return state.model_copy(update={"retrieved_docs": docs})

# b. Generate Answer
def generate_answer(state: RAGReflectionState) -> RAGReflectionState:
    
    context = "\n\n".join([doc.page_content for doc in state.retrieved_docs])
    prompt = f"""
                Use the following context to answer the question:

                Context:
                {context}

                Question:
                {state.question}
            """
    answer = llm.invoke(prompt).content.strip()
    return state.model_copy(update={"answer": answer, "attempts": state.attempts + 1})

In [12]:
# c. Self-Reflect
def reflect_on_answer(state: RAGReflectionState) -> RAGReflectionState:
    
    prompt = f"""
            Reflect on the following answer to see if it fully addresses the question. 
            State YES if it is complete and correct, or NO with an explanation.

            Question: {state.question}

            Answer: {state.answer}

            Respond like:
            Reflection: YES or NO
            Explanation: ...
            """
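    result = llm.invoke(prompt).content
    # The answer passes only if the critic's reply contains "Reflection: YES"
    is_ok = "reflection: yes" in result.lower()
    # `revised` is True when the answer was rejected and needs another attempt
    return state.model_copy(update={"reflection": result, "revised": not is_ok})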
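
The substring check `"reflection: yes" in result.lower()` depends on the critic reproducing the template exactly. If the model decorates its reply (for example `**Reflection:** YES`), the check fails and a good answer is retried. A slightly more tolerant parser might look like the sketch below; `parse_verdict` is a hypothetical helper, not part of the notebook, and it treats any reply without a recognizable verdict as a rejection.

```python
import re

# "reflection", then any punctuation or whitespace, then a whole-word yes/no.
_VERDICT = re.compile(r"reflection\W*(yes|no)\b", re.IGNORECASE)


def parse_verdict(reflection_text: str) -> bool:
    """Return True only when the critic's verdict says YES."""
    match = _VERDICT.search(reflection_text)
    # No recognizable verdict counts as a rejection, which triggers a retry.
    return bool(match) and match.group(1).lower() == "yes"


print(parse_verdict("Reflection: YES\nExplanation: covers every point."))
print(parse_verdict("**Reflection:** NO\nExplanation: misses the second part."))
print(parse_verdict("The answer looks fine to me."))
```

Inside `reflect_on_answer`, the result of `parse_verdict(result)` could stand in for `is_ok`.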

In [13]:
# d. Finalizer
def finalize(state: RAGReflectionState) -> RAGReflectionState:
    return state

In [14]:
# -------------------------
# 4. LangGraph Workflow (cyclic: the reflector can loop back to the retriever)
# -------------------------
builder = StateGraph(RAGReflectionState)

builder.add_node("retriever", retrieve_docs)
builder.add_node("responder", generate_answer)
builder.add_node("reflector", reflect_on_answer)
builder.add_node("done", finalize)

builder.set_entry_point("retriever")

builder.add_edge("retriever", "responder")
builder.add_edge("responder", "reflector")
builder.add_conditional_edges(
    "reflector",
    # Finish if the answer was accepted or 2 attempts have been made;
    # otherwise loop back to the retriever for another try
    lambda s: "done" if not s.revised or s.attempts >= 2 else "retriever"
)

builder.add_edge("done", END)
graph = builder.compile()
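
The routing lambda above is compact but hard to test in isolation. One option is to pull it out into a named function, as in this sketch. `route_after_reflection`, `MAX_ATTEMPTS`, and the `ReflectionFlags` stand-in are hypothetical names introduced here; the stand-in exposes only the two fields the router reads, so the example runs without LangGraph or Pydantic.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 2  # same cap as in the lambda


@dataclass
class ReflectionFlags:
    """Hypothetical stand-in with only the fields the router reads."""
    revised: bool
    attempts: int


def route_after_reflection(state) -> str:
    """Name the next node: finish if accepted or out of attempts, else retry."""
    if not state.revised or state.attempts >= MAX_ATTEMPTS:
        return "done"
    return "retriever"


print(route_after_reflection(ReflectionFlags(revised=False, attempts=1)))  # done
print(route_after_reflection(ReflectionFlags(revised=True, attempts=1)))   # retriever
print(route_after_reflection(ReflectionFlags(revised=True, attempts=2)))   # done
```

Because the function only reads `revised` and `attempts`, it should also accept a `RAGReflectionState` and could be passed in place of the lambda: `builder.add_conditional_edges("reflector", route_after_reflection)`.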

In [15]:
# -------------------------
# 5. Run the Agent
# -------------------------
if __name__ == "__main__":
    user_query = "What are the transformer variants in production deployments?"
    init_state = RAGReflectionState(question=user_query)
    result = graph.invoke(init_state)

    print("\n🧠 Final Answer:\n", result["answer"])
    print("\n🔁 Reflection Log:\n", result["reflection"])
    print("🔄 Total Attempts:", result["attempts"])


🧠 Final Answer:
 **Transformer variants that have already been moved into production‑grade use (as reported in the July 2024 log)**  

| Variant | How it is being used in production / deployment target | Key deployment‑relevant notes |
|---------|--------------------------------------------------------|--------------------------------|
| **EfficientFormer** | Deployed on edge devices (Raspberry Pi 4) for image‑classification tasks (TinyImageNet). | Fits the 290 MB memory budget at batch‑size 16; quantized int8 runs with negligible accuracy loss (Top‑1 ≈ 92.4 %). |
| **Longformer** | Serves a customer‑support‑log retrieval/analysis service where documents can be up to 8 k tokens. | Streaming latency is >1.2 s; the team is testing chunk‑based hybrid attention to bring that down, indicating it is already in a production‑oriented pipeline. |
| **TinyBERT** | Powers a support‑ticket priority‑tagging classifier. | Achieves 87 % F1; fine‑tuned with a 2‑layer FFN adapter for domain transfer, 