# Simplified Agentic RAG System with Gemini API & LangGraph

This notebook implements an end-to-end Retrieval-Augmented Generation (RAG) system that critiques its own answers and refines them when the critique finds gaps.

**Workflow:**
1.  **Retrieve**: Fetches relevant documents from a Pinecone vector database.
2.  **Generate**: Creates an initial answer based on the documents.
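3.  **Critique**: The LLM evaluates its own answer for completeness and for claims not supported by the retrieved documents.
4.  **Refine (if needed)**: If the critique flags missing topics, the system retrieves one additional document for those topics and generates a new, improved answer. Refinement runs at most once.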
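The four steps above can be sketched in plain Python to show the control flow on its own. This is an illustrative model, not the LangGraph code used later: `run_pipeline`, the `toy_*` helpers, and the keyword-matching `KB` dictionary are hypothetical stand-ins for the Pinecone retriever and the Gemini calls.

```python
from typing import Callable, List

# Hypothetical signatures standing in for the real Pinecone retriever and Gemini calls
Retriever = Callable[[str, int], List[str]]
Generator = Callable[[str, List[str]], str]
Critic = Callable[[str, List[str], str], str]


def run_pipeline(question: str, retrieve: Retriever, generate: Generator, critique: Critic) -> str:
    docs = retrieve(question, 5)                # 1. Retrieve the top-5 documents
    answer = generate(question, docs)           # 2. Generate an initial answer
    verdict = critique(question, docs, answer)  # 3. Critique the answer
    if verdict.strip().upper().startswith("COMPLETE"):
        return answer
    # 4. Refine once: fetch one more document for the missing topics, then regenerate
    missing = verdict.partition(":")[2].strip() or question
    extra = [d for d in retrieve(missing, 1) if d not in docs]
    return generate(question, docs + extra)


# Toy components: keyword lookup instead of vector search, concatenation instead of an LLM
KB = {"caching": "Use TTLs [KB001]", "eviction": "Prefer LRU eviction [KB002]"}

def toy_retrieve(query: str, k: int) -> List[str]:
    return [text for key, text in KB.items() if key in query.lower()][:k]

def toy_generate(question: str, docs: List[str]) -> str:
    return " ".join(docs)

def toy_critique(question: str, docs: List[str], answer: str) -> str:
    return "COMPLETE" if "LRU" in answer else "REFINE: eviction"

result = run_pipeline("What are best practices for caching?", toy_retrieve, toy_generate, toy_critique)
print(result)  # Use TTLs [KB001] Prefer LRU eviction [KB002]
```

As in the graph built below, refinement happens at most once; turning it into a true loop would need a retry limit so that a critic that is never satisfied cannot cycle forever.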

## 1. Installation

First, we install all the necessary libraries.

In [None]:
!pip install --upgrade --quiet langgraph langchain-google-genai pinecone langchain-pinecone pydantic

## 2. Configuration & Setup

Import all required modules and configure API keys. If `GOOGLE_API_KEY` or `PINECONE_API_KEY` is not already set as an environment variable, the cell prompts you to enter it securely.
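The key-handling pattern can be factored into a small helper. This is a sketch under one assumption worth stating: some integrations (`PineconeVectorStore` from `langchain-pinecone` appears to be one) look the key up in the environment rather than accepting a Python variable, so the helper always writes the value back to `os.environ`. The name `ensure_env_key` is hypothetical, and the `prompt` parameter exists only so the demo can run without keyboard input.

```python
import getpass
import os
from typing import Callable


def ensure_env_key(name: str, prompt: Callable[[str], str] = getpass.getpass) -> str:
    """Return env var `name`, prompting for it once if it is missing or empty.

    The value is written back to os.environ so that libraries which read the
    key from the environment, not from a Python variable, can find it.
    """
    value = os.environ.get(name)
    if not value:
        value = prompt(f"Enter your {name}: ")
        os.environ[name] = value
    return value


# Demo with a stub prompt instead of getpass (no keyboard input needed)
os.environ.pop("DEMO_API_KEY", None)
first = ensure_env_key("DEMO_API_KEY", prompt=lambda _msg: "secret-123")
second = ensure_env_key("DEMO_API_KEY", prompt=lambda _msg: "never-used")
print(first, second)  # secret-123 secret-123
```

In the notebook itself you would call `ensure_env_key("GOOGLE_API_KEY")` and `ensure_env_key("PINECONE_API_KEY")` with the default `getpass` prompt.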

In [None]:
import os
import getpass
import json
import time
from typing import List, TypedDict

# Import LangChain libraries for Google Gemini
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

# Import Pinecone and LangChain integration
from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore

# Import LangGraph components
from langgraph.graph import StateGraph, END

# Import LangChain core components
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.documents import Document

print("--- ⚙️ Configuring Environment ---")

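# Read API keys from the environment, or prompt for them with getpass
# (works in local terminals and in Jupyter) so they are never echoed or hard-coded
if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google Gemini API Key: ")

# Store the Pinecone key in the environment too: PineconeVectorStore reads
# PINECONE_API_KEY from there when it connects to the index
if "PINECONE_API_KEY" not in os.environ:
    os.environ["PINECONE_API_KEY"] = getpass.getpass("Enter your Pinecone API Key: ")
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]
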
# Define a unique name for your Pinecone index
INDEX_NAME = "agentic-rag-kb"
print("✅ API keys are set.")

## 3. Preprocessing & Indexing

Here, we load the `self_critique_loop_dataset.json` file, generate embeddings for its content using the Gemini API, and store the vectors in a Pinecone index for efficient retrieval.
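The indexing code below expects each record to carry `doc_id`, `question`, and `answer_snippet`; the snippet becomes the embedded text, the other two become metadata, and `doc_id` doubles as the Pinecone vector ID. A minimal validation sketch, assuming that record shape (the helper `to_index_inputs` and the two sample records are made up for illustration):

```python
import json
from typing import Dict, List, Tuple

# Field names assumed from the keys the indexing cell reads
REQUIRED_KEYS = ("doc_id", "question", "answer_snippet")


def to_index_inputs(records: List[Dict]) -> Tuple[List[str], List[Dict], List[str]]:
    """Split knowledge-base records into parallel texts / metadatas / ids lists."""
    for i, rec in enumerate(records):
        missing = [k for k in REQUIRED_KEYS if k not in rec]
        if missing:
            raise KeyError(f"record {i} is missing {missing}")
    ids = [r["doc_id"] for r in records]
    if len(set(ids)) != len(ids):
        # Upserting a repeated ID overwrites the earlier vector, silently losing a document
        raise ValueError("doc_id values must be unique")
    texts = [r["answer_snippet"] for r in records]
    metadatas = [{"doc_id": r["doc_id"], "question": r["question"]} for r in records]
    return texts, metadatas, ids


# Two made-up records in the assumed shape
sample = json.loads("""[
  {"doc_id": "KB001", "question": "How should I cache?", "answer_snippet": "Use TTLs."},
  {"doc_id": "KB002", "question": "How should I evict?", "answer_snippet": "Prefer LRU."}
]""")
texts, metadatas, ids = to_index_inputs(sample)
print(ids)  # ['KB001', 'KB002']
```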

In [None]:
print("\n--- 🧠 Preprocessing & Indexing Knowledge Base ---")

# Load the knowledge base from the JSON file
try:
    with open('self_critique_loop_dataset.json', 'r', encoding='utf-8') as f:
        kb_data = json.load(f)
    print(f"Loaded {len(kb_data)} documents from the knowledge base.")
except FileNotFoundError as err:
    # Stop here: every later step needs kb_data, so continuing would only fail later with a NameError
    raise FileNotFoundError(
        "'self_critique_loop_dataset.json' not found. Make sure it's in the same directory as this notebook."
    ) from err

# Prepare documents for indexing
texts = [doc['answer_snippet'] for doc in kb_data]
metadatas = [{'doc_id': doc['doc_id'], 'question': doc['question']} for doc in kb_data]
ids = [doc['doc_id'] for doc in kb_data]

# Initialize the Gemini Embeddings model
print("Initializing embedding model...")
embeddings_model = GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004" # Standard Gemini API embedding model
)

# Initialize Pinecone client and create the index if it doesn't exist
print("Connecting to Pinecone and setting up index...")
pc = Pinecone(api_key=PINECONE_API_KEY)
if INDEX_NAME not in pc.list_indexes().names():
    print(f"Index '{INDEX_NAME}' not found. Creating a new one...")
    pc.create_index(
        name=INDEX_NAME, 
        dimension=768, # Dimension for text-embedding-004
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )
    while not pc.describe_index(INDEX_NAME).status['ready']:
        time.sleep(1)
    print("Index created successfully.")
else:
    print(f"Index '{INDEX_NAME}' already exists. Reusing it.")

# Upsert the data into the Pinecone index
vectorstore = PineconeVectorStore.from_texts(
    texts=texts, 
    embedding=embeddings_model, 
    metadatas=metadatas,
    ids=ids, 
    index_name=INDEX_NAME
)
print("✅ Knowledge base has been successfully indexed in Pinecone.")
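The index is created with `metric="cosine"`, and the retriever later asks for the top `k` matches. To make that ranking concrete, here is an exact cosine top-k over three made-up 3-dimensional vectors. It is only an illustration: the real vectors are 768-dimensional, and Pinecone typically answers such queries with approximate nearest-neighbour search rather than this brute-force scan. `cosine`, `top_k`, and `toy_index` are illustrative names.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], index: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


toy_index = {
    "KB001": [1.0, 0.0, 0.0],
    "KB002": [0.7, 0.7, 0.0],
    "KB003": [0.0, 0.0, 1.0],
}
hits = top_k([1.0, 0.1, 0.0], toy_index, k=2)
print([doc_id for doc_id, _ in hits])  # ['KB001', 'KB002']
```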

## 4. LangGraph Workflow Definition

This is the core of the agent. We define the state, the nodes (functions that perform actions), and the edges (connections and logic) that control the flow of the RAG process.
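One detail of how nodes and state interact matters for the refinement step: each node returns only the keys it wants to change, and for state keys declared without a reducer (as every key in `GraphState` is) the returned value replaces the old one while unreturned keys keep their previous values. The sketch below models that rule in plain Python; `apply_update` and the two `refine_*` functions are hypothetical and are not LangGraph's implementation.

```python
from typing import Any, Dict, List


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Plain-Python model of a node update: returned keys overwrite, others are kept."""
    return {**state, **update}


def refine_returning_generation_only(state: Dict[str, Any]) -> Dict[str, Any]:
    docs: List[str] = state["documents"] + ["KB009"]  # new list; state itself is untouched
    return {"generation": f"answer using {len(docs)} docs"}


def refine_returning_documents(state: Dict[str, Any]) -> Dict[str, Any]:
    docs: List[str] = state["documents"] + ["KB009"]
    return {"documents": docs, "generation": f"answer using {len(docs)} docs"}


base = {"question": "q", "documents": ["KB001", "KB002"], "generation": "draft", "critique": "REFINE: x"}
stale = apply_update(base, refine_returning_generation_only(base))
fresh = apply_update(base, refine_returning_documents(base))
print(stale["documents"])  # ['KB001', 'KB002']  (the extra document never reaches the state)
print(fresh["documents"])  # ['KB001', 'KB002', 'KB009']
```

This is why a node that gathers new documents should return them explicitly rather than rely on mutating the incoming state.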

In [None]:
print("\n--- 🏗️ Building Agentic RAG Workflow ---")

# Define the state that will be passed between nodes in the graph
class GraphState(TypedDict):
    question: str
    documents: List[Document]
    generation: str
    critique: str

# Initialize the LLM and Retriever
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash-latest", temperature=0)
retriever = vectorstore.as_retriever(search_kwargs={'k': 5})

### DEFINE GRAPH NODES ###

def retrieve_kb(state):
    """Retrieves documents from the vector database based on the question."""
    print("---NODE: RETRIEVE KB---")
    question = state["question"]
    documents = retriever.invoke(question)
    print(f"Retrieved {len(documents)} documents.")
    return {"documents": documents, "question": question}

def generate_answer(state):
    """Generates an answer using the retrieved documents as context."""
    print("---NODE: GENERATE ANSWER---")
    question = state["question"]
    documents = state["documents"]
    context = "\n\n".join([f"Source ID: {doc.metadata['doc_id']}\nContent: {doc.page_content}" for doc in documents])
    prompt = ChatPromptTemplate.from_messages([
        ("system", """You are a helpful assistant for question-answering tasks.
        - Use only the provided context to answer the user's question.
        - Your answer must be grounded in the facts from the snippets.
        - For every claim or piece of information in your answer, you MUST cite the source using its 'Source ID' in the format [KBxxx].
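        - If the context does not contain the answer, state that you cannot answer the question."""),
        # Use template variables rather than an f-string so that curly braces in the
        # retrieved snippets are not parsed as prompt placeholders
        ("human", "Question: {question}\n\nContext:\n{context}")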
    ])
    chain = prompt | llm | StrOutputParser()
    generation = chain.invoke({"question": question, "context": context})
    print("Generated Answer:\n", generation)
    return {"generation": generation}

def critique_answer(state):
    """Critiques the generated answer for completeness and factual consistency."""
    print("---NODE: CRITIQUE ANSWER---")
    context = "\n\n".join([f"Source ID: {doc.metadata['doc_id']}\nContent: {doc.page_content}" for doc in state["documents"]])
    critique_prompt = ChatPromptTemplate.from_messages([
        ("system", """You are an expert critic. Your task is to evaluate a generated answer against the provided context.
        - Check if the answer fully addresses the user's question based *only* on the information available in the context.
        - Check for any hallucinations or information in the answer that is not supported by the context.
        - If the answer is complete and factually grounded, respond with the single word: 'COMPLETE'.
        - If the answer is lacking key details from the context or makes unsupported claims, respond with 'REFINE:' followed by a concise, comma-separated list of missing topics or keywords that need to be included."""),
        # Template variables are filled in at invoke time, so braces in the
        # snippets or the answer cannot break the prompt
        ("human", "User Question: {question}\n\nContext:\n{context}\n\nGenerated Answer:\n{generation}")
    ])
    chain = critique_prompt | llm | StrOutputParser()
    critique = chain.invoke({
        "question": state["question"],
        "context": context,
        "generation": state["generation"],
    })
    print("Critique Result:", critique)
    return {"critique": critique}

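def refine_answer(state):
    """Refines the answer by retrieving one more document and regenerating."""
    print("---NODE: REFINE ANSWER---")
    # Text after "REFINE:" lists the missing topics; fall back to the question if it is empty
    missing_keywords = state["critique"].partition(":")[2].strip() or state["question"]
    print(f"Refining based on missing keywords: {missing_keywords}")
    refinement_retriever = vectorstore.as_retriever(search_kwargs={'k': 1})
    new_docs = refinement_retriever.invoke(missing_keywords)
    # Build a new list (no in-place mutation of the state) and skip documents already retrieved
    seen_ids = {doc.metadata['doc_id'] for doc in state["documents"]}
    documents = state["documents"] + [doc for doc in new_docs if doc.metadata['doc_id'] not in seen_ids]
    refined_state = generate_answer({**state, "documents": documents})
    # Return the documents too, so the graph state reflects the extra context
    return {"documents": documents, "generation": refined_state["generation"]}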

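def decide_to_refine(state):
    """Decision logic to route to refinement or end the process."""
    print("---DECISION: CHECK CRITIQUE---")
    # Tolerate surrounding whitespace, trailing punctuation, or lowercase (e.g. "Complete.\n")
    if state["critique"].strip().upper().startswith("COMPLETE"):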
        print("Decision: COMPLETE. Ending graph.")
        return "end"
    else:
        print("Decision: REFINE. Proceeding to refinement.")
        return "refine"

### ASSEMBLE THE GRAPH ###
workflow = StateGraph(GraphState)
workflow.add_node("retrieve_kb", retrieve_kb)
workflow.add_node("generate_answer", generate_answer)
workflow.add_node("critique_answer", critique_answer)
workflow.add_node("refine_answer", refine_answer)

workflow.set_entry_point("retrieve_kb")
workflow.add_edge("retrieve_kb", "generate_answer")
workflow.add_edge("generate_answer", "critique_answer")
workflow.add_conditional_edges(
    "critique_answer",
    decide_to_refine,
    {"refine": "refine_answer", "end": END},
)
workflow.add_edge("refine_answer", END)

agentic_rag_app = workflow.compile()
print("✅ LangGraph workflow compiled successfully!")
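The routing depends on the critic following its `COMPLETE` / `REFINE: topic, topic` protocol, but LLM replies often add punctuation, change case, or drift from the format. A slightly stricter parser is sketched below as one possible design; `parse_critique` is a hypothetical helper, and treating an unrecognised reply as "refine" (rather than "complete") is a deliberately conservative assumption.

```python
import re
from typing import List, Tuple


def parse_critique(text: str) -> Tuple[str, List[str]]:
    """Map a critic reply to ("complete", []) or ("refine", [topics])."""
    cleaned = text.strip()
    # \b stops words such as "Completeness" from counting as a COMPLETE verdict
    if re.match(r"COMPLETE\b", cleaned, flags=re.IGNORECASE):
        return "complete", []
    if re.match(r"REFINE\b", cleaned, flags=re.IGNORECASE):
        rest = cleaned.partition(":")[2]
        return "refine", [t.strip() for t in rest.split(",") if t.strip()]
    # Unrecognised format: fall back to refining, with no specific topics
    return "refine", []


print(parse_critique("COMPLETE.\n"))                             # ('complete', [])
print(parse_critique("REFINE: cache invalidation, TTL values"))  # ('refine', ['cache invalidation', 'TTL values'])
print(parse_critique("Completeness is lacking"))                 # ('refine', [])
```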

## 5. Testing the Pipeline

Finally, we run the test queries through our compiled agentic RAG application and print the final, citation-backed answers.
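The test loop relies on `stream()` yielding one dictionary per executed node, keyed by node name and holding that node's update; the last `generation` seen is the final answer. The sketch below applies the same extraction rule to a hand-written, simulated stream (the chunk contents are invented, not real output). When intermediate steps are not needed, calling `agentic_rag_app.invoke(inputs)` and reading `["generation"]` from the returned final state should give the same answer.

```python
from typing import Any, Dict, Iterable


def last_generation(chunks: Iterable[Dict[str, Any]]) -> str:
    """Return the most recent 'generation' value found in a stream of node updates."""
    final = ""
    for chunk in chunks:
        for _node, update in chunk.items():
            # Defensive check: only dict updates that include a generation count
            if isinstance(update, dict) and "generation" in update:
                final = update["generation"]
    return final


# Simulated stream for a query that needed one refinement pass
simulated = [
    {"retrieve_kb": {"documents": ["doc-a"], "question": "q"}},
    {"generate_answer": {"generation": "draft [KB001]"}},
    {"critique_answer": {"critique": "REFINE: eviction"}},
    {"refine_answer": {"documents": ["doc-a", "doc-b"], "generation": "refined [KB001][KB002]"}},
]
print(last_generation(simulated))  # refined [KB001][KB002]
```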

In [None]:
print("\n--- 🚀 Running Test Queries ---")

test_queries = [
    "What are best practices for caching?",
    "How should I set up CI/CD pipelines?",
    "What are performance tuning tips?",
    "How do I version my APIs?",
    "What should I consider for error handling?"
]

for i, query in enumerate(test_queries, 1):
    print(f"\n{'='*50}\nEXECUTING QUERY {i}: '{query}'\n{'='*50}")
    
    inputs = {"question": query}
    final_answer = ""
    
    # stream() allows us to see the outputs from each step as it runs
    for output in agentic_rag_app.stream(inputs):
        # The final answer is the last output from either the 'generate_answer' or 'refine_answer' node
        for key, value in output.items():
            if key == "generate_answer" or key == "refine_answer":
                final_answer = value['generation']
    
    print(f"\n✅ FINAL ANSWER FOR QUERY '{query}':\n\n{final_answer}\n")

print("--- All queries processed. ---")