# Week 7: Implementing Self-RAG

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Digital-AI-Finance/agentic-artificial-intelligence/blob/main/L07_Advanced_RAG/L07_Self_RAG.ipynb)

This notebook approximates the core Self-RAG ideas with prompted LLM calls, rather than the trained reflection tokens of the original method. It covers:
- Adaptive retrieval decisions
- Relevance scoring and critique
- Corrective retrieval actions

In [None]:
# Colab setup
import sys
if 'google.colab' in sys.modules:
    !pip install -q langchain-openai langchain-community chromadb python-dotenv
    from google.colab import userdata
    import os
    os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

In [None]:
import os
from typing import List, Dict
from dataclasses import dataclass
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from dotenv import load_dotenv

load_dotenv()

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
print("Environment ready")

## 1. Create Knowledge Base

In [None]:
# Sample documents about AI agents
documents = [
    Document(page_content="ReAct is a paradigm that combines reasoning and acting in language models. It was proposed by Yao et al. in 2022 and published at ICLR 2023.", metadata={"source": "react_paper"}),
    Document(page_content="Self-RAG trains LLMs to adaptively retrieve and critique information. It uses special tokens like [Retrieve] and [IsRel].", metadata={"source": "selfrag_paper"}),
    Document(page_content="CRAG evaluates retrieval quality with confidence scores and takes corrective actions when needed.", metadata={"source": "crag_paper"}),
    Document(page_content="LangGraph is a framework for building stateful agents using graph-based architectures.", metadata={"source": "langgraph_docs"}),
    Document(page_content="AutoGen enables multi-agent conversations with code execution capabilities.", metadata={"source": "autogen_docs"}),
    Document(page_content="The capital of France is Paris. It is known for the Eiffel Tower.", metadata={"source": "geography"}),
]

# Create vector store
vectorstore = Chroma.from_documents(documents, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
print(f"Created vector store with {len(documents)} documents")
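
To make the `k=3` setting concrete, here is a minimal sketch of what a top-k similarity search does, using hand-written toy vectors in place of real embeddings. The vectors, the `cosine` and `top_k` helpers, and the choice of cosine similarity are illustrative assumptions; Chroma's actual index and distance metric may differ.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k=3):
    """Return the ids of the k vectors most similar to query_vec (hypothetical helper)."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Toy 2-d "embeddings", made up for illustration (not real model output)
toy_vecs = {
    "selfrag_paper": [0.9, 0.1],
    "crag_paper": [0.8, 0.3],
    "geography": [0.0, 1.0],
}
print(top_k([1.0, 0.0], toy_vecs, k=2))
```

A larger `k` returns more candidates for the relevance filter to judge, at the cost of one extra LLM call per document in the pipeline below.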

## 2. Self-RAG Components

In [None]:
@dataclass
class RetrievalDecision:
    """Represents the decision to retrieve or not."""
    should_retrieve: bool
    reasoning: str

@dataclass
class RelevanceScore:
    """Represents relevance evaluation of a document."""
    document: Document
    is_relevant: bool
    score: float
    reasoning: str

@dataclass
class GenerationCritique:
    """Represents critique of generated answer."""
    is_supported: bool
    is_useful: bool
    confidence: float
    reasoning: str

In [None]:
def decide_retrieval(query: str) -> RetrievalDecision:
    """Decide whether retrieval is needed for this query."""
    prompt = f"""Analyze this query and decide if external knowledge retrieval is needed.

Query: {query}

Consider:
- Is this a factual question requiring specific knowledge?
- Can this be answered from general knowledge?
- Does this require recent or domain-specific information?

Respond with:
DECISION: [YES/NO]
REASONING: [Your reasoning]"""
    
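    response = llm.invoke(prompt).content
    # Guard against replies that omit the DECISION label; defaulting to
    # retrieval is an assumed safe choice for a RAG pipeline
    upper = response.upper()
    should_retrieve = "YES" in upper.split("DECISION:")[1].split("\n")[0] if "DECISION:" in upper else True
    reasoning = response.split("REASONING:")[1].strip() if "REASONING:" in response else ""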
    
    return RetrievalDecision(should_retrieve=should_retrieve, reasoning=reasoning)
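
Parsing labelled replies with chained `split` calls is brittle when the model deviates from the requested format (brackets around the value, markdown emphasis, different casing). One alternative is a small regex helper. The sketch below is an assumption-laden illustration: `parse_field` and `parse_yes_no` are hypothetical names, not part of LangChain, and they only read single-line fields.

```python
import re
from typing import Optional

def parse_field(response: str, label: str) -> Optional[str]:
    """Return the rest of the line after 'LABEL:', or None if the label is absent."""
    match = re.search(rf"\b{re.escape(label)}\s*:\s*(.*)", response, flags=re.IGNORECASE)
    return match.group(1).strip() if match else None

def parse_yes_no(response: str, label: str, default: bool) -> bool:
    """Read a YES/NO field, falling back to `default` when the label is missing."""
    value = parse_field(response, label)
    if value is None:
        return default
    # Drop brackets, asterisks and spaces the model may wrap around the value
    return value.strip("[]* ").upper().startswith("YES")

reply = "DECISION: YES\nREASONING: Needs domain knowledge."
print(parse_yes_no(reply, "DECISION", default=True))
print(parse_field(reply, "REASONING"))
```

The explicit `default` argument makes the fallback behaviour a visible choice at each call site instead of an exception at run time.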

In [None]:
def evaluate_relevance(query: str, doc: Document) -> RelevanceScore:
    """Evaluate if a retrieved document is relevant to the query."""
    prompt = f"""Evaluate if this document is relevant to the query.

Query: {query}
Document: {doc.page_content}

Rate relevance from 0.0 to 1.0 and explain.

Respond with:
SCORE: [0.0-1.0]
RELEVANT: [YES/NO]
REASONING: [Your reasoning]"""
    
    response = llm.invoke(prompt).content
    
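    try:
        score = float(response.split("SCORE:")[1].split("\n")[0].strip())
    except (IndexError, ValueError):
        # Label missing or value not numeric: fall back to a neutral score
        score = 0.5
    
    # If the RELEVANT label is missing, fall back to thresholding the score
    upper = response.upper()
    if "RELEVANT:" in upper:
        is_relevant = "YES" in upper.split("RELEVANT:")[1].split("\n")[0]
    else:
        is_relevant = score >= 0.5
    reasoning = response.split("REASONING:")[1].strip() if "REASONING:" in response else ""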
    
    return RelevanceScore(document=doc, is_relevant=is_relevant, score=score, reasoning=reasoning)
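
A score read from free text is not guaranteed to be a bare number inside the requested range: the model may add brackets, trailing commentary, or a value such as `7`. The following sketch shows one way to extract and clamp it. `parse_score` is a hypothetical helper, and the neutral default of `0.5` simply mirrors the fallback used in this notebook.

```python
import re

def parse_score(response: str, label: str = "SCORE", default: float = 0.5) -> float:
    """Extract the first number after 'LABEL:' and clamp it to [0.0, 1.0]."""
    pattern = rf"\b{re.escape(label)}\s*:\s*\[?\s*(-?\d+(?:\.\d+)?)"
    match = re.search(pattern, response, flags=re.IGNORECASE)
    if match is None:
        # Label missing or value not numeric: use the neutral default
        return default
    return min(1.0, max(0.0, float(match.group(1))))

print(parse_score("SCORE: [0.9] (highly relevant)\nRELEVANT: YES"))
print(parse_score("SCORE: high"))
```

The same helper could read the `CONFIDENCE:` field by passing `label="CONFIDENCE"`.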

In [None]:
def critique_generation(query: str, answer: str, sources: List[Document]) -> GenerationCritique:
    """Critique the generated answer."""
    source_text = "\n".join([d.page_content for d in sources])
    
    prompt = f"""Critique this answer based on the sources.

Query: {query}
Answer: {answer}
Sources: {source_text}

Evaluate:
1. Is the answer supported by the sources?
2. Is the answer useful for the query?
3. Overall confidence (0.0-1.0)?

Respond with:
SUPPORTED: [YES/NO]
USEFUL: [YES/NO]
CONFIDENCE: [0.0-1.0]
REASONING: [Your reasoning]"""
    
    response = llm.invoke(prompt).content
    
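    # A missing label counts as "NO" so an unparseable critique never passes by accident
    upper = response.upper()
    is_supported = "SUPPORTED:" in upper and "YES" in upper.split("SUPPORTED:")[1].split("\n")[0]
    is_useful = "USEFUL:" in upper and "YES" in upper.split("USEFUL:")[1].split("\n")[0]
    
    try:
        confidence = float(response.split("CONFIDENCE:")[1].split("\n")[0].strip())
    except (IndexError, ValueError):
        # Label missing or value not numeric: fall back to a neutral confidence
        confidence = 0.5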
    
    reasoning = response.split("REASONING:")[1].strip() if "REASONING:" in response else ""
    
    return GenerationCritique(
        is_supported=is_supported,
        is_useful=is_useful,
        confidence=confidence,
        reasoning=reasoning
    )
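
A critique is only useful if it changes what the pipeline does next. As a rough sketch, the three critique fields can be mapped to a corrective action. The `Critique` stand-in, the `next_action` function, the action names and the `0.7` threshold are all illustrative assumptions, not part of the Self-RAG or CRAG papers.

```python
from dataclasses import dataclass

@dataclass
class Critique:
    """Stand-in with the same decision fields as GenerationCritique."""
    is_supported: bool
    is_useful: bool
    confidence: float

def next_action(critique: Critique, accept_threshold: float = 0.7) -> str:
    """Map a critique to 'accept', 'retrieve_again', or 'regenerate' (illustrative policy)."""
    if critique.is_supported and critique.is_useful and critique.confidence >= accept_threshold:
        return "accept"
    if not critique.is_supported:
        # The sources do not back the answer, so better sources are needed
        return "retrieve_again"
    # The sources back the answer, but it is unhelpful or low-confidence
    return "regenerate"

print(next_action(Critique(is_supported=True, is_useful=True, confidence=0.9)))
print(next_action(Critique(is_supported=False, is_useful=True, confidence=0.9)))
```

Separating the policy from the critique prompt keeps the thresholds easy to tune without touching the LLM calls.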

## 3. Self-RAG Pipeline

In [None]:
def self_rag(query: str, max_retries: int = 2) -> Dict:
    """Complete Self-RAG pipeline."""
    result = {
        "query": query,
        "retrieval_decision": None,
        "documents": [],
        "relevance_scores": [],
        "answer": None,
        "critique": None,
        "iterations": 0
    }
    
    # Step 1: Decide whether to retrieve
    decision = decide_retrieval(query)
    result["retrieval_decision"] = decision
    print(f"[Retrieve?] {decision.should_retrieve} - {decision.reasoning[:100]}...")
    
    if not decision.should_retrieve:
        # Generate without retrieval
        answer = llm.invoke(f"Answer this question: {query}").content
        result["answer"] = answer
        return result
    
    # Step 2: Retrieve and evaluate
    for iteration in range(max_retries):
        result["iterations"] = iteration + 1
        
        # Retrieve
        docs = retriever.invoke(query)
        result["documents"] = docs
        print(f"[Retrieved] {len(docs)} documents")
        
        # Evaluate relevance
        relevance_scores = [evaluate_relevance(query, doc) for doc in docs]
        result["relevance_scores"] = relevance_scores
        
        relevant_docs = [rs.document for rs in relevance_scores if rs.is_relevant]
        print(f"[Relevant] {len(relevant_docs)}/{len(docs)} documents")
        
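        if not relevant_docs:
            # Note: the next iteration re-runs the same query, so it only helps
            # if the retriever or the relevance judgments are non-deterministic
            print(f"[Retry] No relevant documents, iteration {iteration + 1}")
            continue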
        
        # Step 3: Generate with relevant documents
        context = "\n".join([d.page_content for d in relevant_docs])
        answer = llm.invoke(
            f"Based on this context:\n{context}\n\nAnswer: {query}"
        ).content
        result["answer"] = answer
        
        # Step 4: Critique generation
        critique = critique_generation(query, answer, relevant_docs)
        result["critique"] = critique
        print(f"[Critique] Supported: {critique.is_supported}, Useful: {critique.is_useful}, Confidence: {critique.confidence}")
        
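        # Accept only answers the critique judges both grounded and confident
        if critique.is_supported and critique.confidence >= 0.7:
            break
    
    # Fallback: if no iteration found relevant documents, answer without
    # context so callers never receive a None answer
    if result["answer"] is None:
        result["answer"] = llm.invoke(f"Answer this question: {query}").content
    
    return result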
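
Each iteration of the loop above sends the same `query` to the retriever, so a retry tends to return the same documents. One common corrective action is to rewrite the search query before retrying. The sketch below illustrates that control flow with injected callables so it runs without an API key. `retrieve_with_rewrites` and the toy functions are hypothetical; in this notebook, `retrieve` could wrap `retriever.invoke` and `rewrite` could wrap an `llm.invoke` prompt.

```python
from typing import Callable, List

def retrieve_with_rewrites(
    query: str,
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    rewrite: Callable[[str, int], str],
    max_retries: int = 2,
) -> List[str]:
    """Retry retrieval with a rewritten query until some document is judged relevant."""
    search_query = query
    for attempt in range(max_retries):
        docs = retrieve(search_query)
        # Relevance is always judged against the user's original query
        relevant = [doc for doc in docs if is_relevant(query, doc)]
        if relevant:
            return relevant
        search_query = rewrite(query, attempt + 1)
    return []

# Toy stand-ins (made up for illustration): keyword lookup instead of a vector store
corpus = {
    "self-rag": "Self-RAG critiques its own retrievals.",
    "paris": "Paris is the capital of France.",
}

def toy_retrieve(q: str) -> List[str]:
    return [text for key, text in corpus.items() if key in q.lower()]

def toy_relevant(q: str, doc: str) -> bool:
    return "retriev" in doc.lower()

def toy_rewrite(q: str, attempt: int) -> str:
    return q + " self-rag"

found = retrieve_with_rewrites(
    "How do models critique retrieved text?", toy_retrieve, toy_relevant, toy_rewrite
)
print(found)
```

The first attempt finds nothing because no corpus key appears in the query; the rewritten query adds a matching term, so the second attempt succeeds.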

## 4. Test the Pipeline

In [None]:
# Test 1: Query requiring retrieval
result1 = self_rag("What is Self-RAG and how does it work?")
print(f"\nAnswer: {result1['answer'][:200]}...")

In [None]:
# Test 2: Query not requiring retrieval
result2 = self_rag("What is 2 + 2?")
print(f"\nAnswer: {result2['answer']}")

In [None]:
# Test 3: Domain-specific query
result3 = self_rag("Compare ReAct and LangGraph approaches to building agents.")
print(f"\nAnswer: {result3['answer'][:300]}...")

## Summary

This notebook implemented Self-RAG concepts:
1. **Retrieval Decision**: LLM decides whether to retrieve
2. **Relevance Evaluation**: Score and filter retrieved documents
3. **Generation Critique**: Evaluate answer quality
4. **Iterative Refinement**: Retry on low confidence

Key insights:
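- Adaptive retrieval skips the vector store for queries the LLM can answer directly, so simple questions are not diluted with irrelevant context
- Relevance filtering passes only documents judged relevant to the generator, which improves context quality
- Self-critique checks the answer against its sources, which makes unsupported answers detectable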