# Retrieval-Augmented Generation (RAG) with LangChain

This notebook builds a small RAG pipeline with LangChain, LangGraph, an OpenAI-compatible API (OpenRouter here), and an in-memory vector store. RAG improves Large Language Model (LLM) responses by:
1. Retrieving relevant information from a knowledge base
2. Augmenting the prompt with this context
3. Generating a response using an LLM

We'll break down each component and show how they work together.
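
Before bringing in any libraries, the three steps can be sketched in plain Python. This is a toy illustration only: the word-overlap scoring, the `toy_*` function names, and the placeholder "model" are assumptions made for this example. The rest of the notebook uses embeddings and a real LLM instead.

```python
def toy_retrieve(question: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Step 1: rank passages by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda passage: len(q_words & set(passage.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def toy_augment(question: str, passages: list[str]) -> str:
    """Step 2: place the retrieved passages in the prompt."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


def toy_generate(prompt: str) -> str:
    """Step 3: stand-in for an LLM call; a real pipeline sends the prompt to a model."""
    return f"[model answer based on {len(prompt)} prompt characters]"


knowledge_base = [
    "RAG retrieves documents and adds them to the prompt.",
    "Bananas are rich in potassium.",
    "A vector store holds embeddings for similarity search.",
]
question = "What does RAG add to the prompt?"
passages = toy_retrieve(question, knowledge_base, k=1)
prompt = toy_augment(question, passages)
print(toy_generate(prompt))
```

Word overlap is a deliberately crude retriever; it is here only to make the retrieve, augment, generate flow visible in a few lines.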

## 1. Setup and Dependencies

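First, we import the required libraries and load environment variables. The import cell below relies on:
- `langchain`, `langchain-core`, `langchain-community`, and `langchain-text-splitters` for the prompt hub, document loader, text splitter, and vector store
- `langchain-openai` for the OpenAI-compatible chat and embeddings clients (pointed at OpenRouter here)
- `langgraph` for wiring the pipeline steps together
- `bs4` (BeautifulSoup4) for parsing the fetched HTML
- `python-dotenv` for loading the API key from a `.env` file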
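
If the import cell fails, it can help to see which packages are missing before installing anything. The check below is a minimal sketch, assuming the top-level module names match the imports used in this notebook; `missing_modules` is a hypothetical helper, not part of any library. Note that install names can differ from import names (`bs4` is installed as `beautifulsoup4`, `dotenv` as `python-dotenv`).

```python
from importlib.util import find_spec

# Top-level import names used by this notebook's import cell.
REQUIRED_MODULES = [
    "langchain",
    "langchain_openai",
    "langchain_community",
    "langchain_core",
    "langchain_text_splitters",
    "langgraph",
    "bs4",
    "dotenv",
]


def missing_modules(names: list[str]) -> list[str]:
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```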

In [None]:
from __future__ import annotations

import os
from typing import List, TypedDict

from dotenv import load_dotenv
import bs4

from langchain import hub
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.vectorstores import InMemoryVectorStore
from langgraph.graph import START, StateGraph

# Load environment variables from .env file
load_dotenv()

## 2. API Configuration

We'll set up the LLM and embeddings clients. This example uses OpenRouter, which exposes an OpenAI-compatible API and gives access to models from several providers, so LangChain's OpenAI clients work once they are pointed at the OpenRouter base URL. The setup includes:
- Loading the API key from the environment
- Configuring the LLM client with a model name, base URL, and optional attribution headers
- Setting up the embeddings client used for vector search
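
Since the same clients can talk to either OpenRouter or OpenAI, one option is to choose the key and base URL from whichever environment variable is set. The sketch below is an assumption-laden illustration: `resolve_provider` is a hypothetical helper, and `OPENAI_API_KEY` is assumed as the variable name for the OpenAI case. The notebook's own code reads `OPENROUTER_API_KEY` only.

```python
from typing import Mapping

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"  # as used in this notebook
OPENAI_BASE_URL = "https://api.openai.com/v1"         # OpenAI's standard endpoint


def resolve_provider(env: Mapping[str, str]) -> dict[str, str]:
    """Hypothetical helper: pick key and base URL from whichever variable is set."""
    if env.get("OPENROUTER_API_KEY"):
        return {"api_key": env["OPENROUTER_API_KEY"], "base_url": OPENROUTER_BASE_URL}
    if env.get("OPENAI_API_KEY"):
        return {"api_key": env["OPENAI_API_KEY"], "base_url": OPENAI_BASE_URL}
    raise RuntimeError("Set OPENROUTER_API_KEY or OPENAI_API_KEY in your .env file.")


# Demonstrated with a stand-in mapping; pass os.environ in real use.
settings = resolve_provider({"OPENROUTER_API_KEY": "placeholder-key"})
print(settings["base_url"])
```

When switching providers, the model names likely need to change as well: OpenRouter uses provider-prefixed names such as `openai/gpt-4o-mini`, while OpenAI's own API expects the name without the prefix.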

In [None]:
def get_api_key() -> str:
    """Read the OpenRouter API key from the environment, failing early if it is missing."""
    api_key = os.getenv("OPENROUTER_API_KEY")
    if not api_key:
        raise RuntimeError(
            "OPENROUTER_API_KEY not found. Put it in a .env file, e.g.\n"
            "OPENROUTER_API_KEY=sk-or-..."
        )
    return api_key

api_key = get_api_key()

# Configure LLM
llm = ChatOpenAI(
    model="openai/gpt-4o-mini",
    openai_api_key=api_key,
    openai_api_base="https://openrouter.ai/api/v1",
    # optional but recommended by OpenRouter:
    default_headers={
        "HTTP-Referer": "http://localhost",
        "X-Title": "LangChain RAG Demo",
    },
)

# Configure embeddings
embeddings = OpenAIEmbeddings(
    model="openai/text-embedding-3-large",
    openai_api_key=api_key,
    openai_api_base="https://openrouter.ai/api/v1",
)

## 3. Document Loading and Chunking

For RAG to work effectively, we need to:
1. Load documents from a source (in this case, a web page)
2. Split them into smaller chunks for better retrieval
3. Ensure the chunks have enough context but aren't too large

The `load_and_chunk` function handles this process using:
- `WebBaseLoader` for fetching and parsing web content
- `RecursiveCharacterTextSplitter` for smart document chunking
- BeautifulSoup for robust HTML parsing
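
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window splitter in plain Python. It is a sketch for intuition only: `sliding_chunks` is a hypothetical function, and it cuts at fixed character offsets, whereas `RecursiveCharacterTextSplitter` tries to split at natural separators (paragraph breaks, newlines, spaces) before falling back to raw characters.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


demo = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one. With the notebook's settings (1000 and 200), a sentence that straddles a chunk boundary is likely to appear whole in at least one chunk.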

In [None]:
def load_and_chunk(url: str) -> List[Document]:
    """Load content from a URL and split it into chunks for RAG."""
    loader = WebBaseLoader(
        web_paths=(url,),
        bs_kwargs=dict(
            parse_only=bs4.SoupStrainer(name=True)  # name=True matches every tag, so the whole page is kept
        ),
    )
    docs = loader.load()

    # Split into chunks with some overlap for context preservation
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    return splitter.split_documents(docs)

# Example: Load the LangChain RAG tutorial
url = "https://python.langchain.com/docs/tutorials/rag/"
chunks = load_and_chunk(url)
print(f"Loaded {len(chunks)} chunks from {url}")

# Show a sample chunk
if chunks:
    print("\nSample chunk content:")
    print("=" * 40)
    print(chunks[0].page_content[:300], "...")

## 4. Vector Store Creation

Now we'll create a vector store from our document chunks. This involves:
1. Converting text chunks into vector embeddings
2. Storing them in a vector database (in-memory for this demo)
3. Enabling similarity search for retrieval

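The `index_docs` function handles this process using:
- The `text-embedding-3-large` model (requested through the configured endpoint) for embeddings
- LangChain's `InMemoryVectorStore` for storage and search
- A local hash-based fallback if the remote embeddings call fails. The fallback keeps the notebook runnable offline, but its vectors carry no semantic meaning, so retrieval quality with it is essentially arbitrary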
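
Similarity search itself is simple to sketch: embed the query, score it against every stored vector, and keep the best matches. The example below assumes cosine similarity as the metric and uses hand-written 2-D vectors in place of real embeddings. `cosine_similarity`, `top_k`, and `toy_store` are names invented for this illustration, not LangChain APIs.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int) -> list[str]:
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda text: cosine_similarity(query_vec, store[text]),
        reverse=True,
    )
    return ranked[:k]


# Hand-written 2-D vectors standing in for real embeddings.
toy_store = {
    "RAG overview": [1.0, 0.1],
    "Vector stores": [0.7, 0.7],
    "Cooking tips": [0.0, 1.0],
}
print(top_k([1.0, 0.0], toy_store, k=2))  # ['RAG overview', 'Vector stores']
```

A brute-force scan like this is fine for a few hundred chunks; dedicated vector databases exist mainly to make the same lookup fast at much larger scale.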

In [None]:
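import hashlib

# Simple fallback embeddings for testing without API access
class SimpleFallbackEmbeddings:
    """Local fallback embeddings that work offline.

    Vectors are derived from a SHA-256 hash of the text, so they are
    deterministic but carry no semantic meaning. Use them only to check
    that the pipeline runs end to end, not to judge retrieval quality.
    """
    def __init__(self, dim: int = 16):
        self.dim = dim

    def _text_to_vector(self, text: str) -> list[float]:
        # Map each hash byte (0-255) to a float in [-1, 1]
        h = hashlib.sha256(text.encode("utf-8")).digest()
        return [(h[i % len(h)] / 255.0) * 2 - 1 for i in range(self.dim)]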

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self._text_to_vector(t or "") for t in texts]

    def embed_query(self, text: str) -> list[float]:
        return self._text_to_vector(text or "")

def index_docs(chunks: List[Document], embeddings: OpenAIEmbeddings) -> InMemoryVectorStore:
    """Create a searchable vector store from document chunks.

    Falls back to SimpleFallbackEmbeddings if the remote embeddings call fails.
    """
    try:
        # Probe the remote embeddings with a small sample before indexing everything
        sample_texts = [c.page_content for c in chunks[:2]]
        sample_vectors = embeddings.embed_documents(sample_texts) if sample_texts else []
        if sample_vectors and (not isinstance(sample_vectors, list) or not hasattr(sample_vectors[0], "__len__")):
            raise TypeError(f"Unexpected embeddings response type: {type(sample_vectors)}")
    except Exception as e:
        print(f"WARNING: remote embeddings failed ({e!r}). Falling back to local embeddings.")
        embeddings = SimpleFallbackEmbeddings()

    vs = InMemoryVectorStore(embeddings)
    vs.add_documents(chunks)
    return vs

# Create vector store from our chunks
vector_store = index_docs(chunks, embeddings)

# Test similarity search
query = "What is RAG?"
results = vector_store.similarity_search(query, k=2)
print(f"\nTest query: {query}")
print("\nTop 2 most relevant chunks:")
for i, doc in enumerate(results, 1):
    print(f"\n{i}. " + "=" * 40)
    print(doc.page_content[:200], "...")

## 5. RAG Pipeline Construction

Now we'll build the complete RAG pipeline using LangGraph. The pipeline has two main steps:
1. **Retrieve**: Find relevant chunks based on the question
2. **Generate**: Create an answer using the LLM and retrieved context

We use:
- LangGraph for pipeline orchestration
- The `rlm/rag-prompt` prompt pulled from the LangChain Hub
- A `TypedDict` to declare the shape of the state passed between steps
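
The core idea behind the graph is that each step reads the shared state and returns only the keys it produces, which are then merged back in. The plain-Python sketch below mimics that pattern without LangGraph. `ToyState`, the two `toy_*_node` functions, and `run_sequence` are hypothetical names for this illustration and say nothing about how LangGraph is implemented internally.

```python
from typing import Callable, TypedDict


class ToyState(TypedDict, total=False):
    question: str
    context: list[str]
    answer: str


def toy_retrieve_node(state: ToyState) -> dict:
    """Return only the key this step produces."""
    return {"context": [f"passage about {state['question']}"]}


def toy_generate_node(state: ToyState) -> dict:
    """Use the context added by the previous step."""
    return {"answer": f"answer built from {len(state['context'])} passage(s)"}


def run_sequence(state: ToyState, nodes: list[Callable[[ToyState], dict]]) -> ToyState:
    """Run nodes in order, merging each partial update into the state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state


final = run_sequence({"question": "RAG"}, [toy_retrieve_node, toy_generate_node])
print(final)
```

Because every step only adds or overwrites its own keys, the final state still contains the question and the retrieved context alongside the answer.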

In [None]:
class RAGState(TypedDict):
    """State passed between the pipeline steps."""
    question: str
    context: List[Document]
    answer: str

def build_graph(vector_store: InMemoryVectorStore, llm: ChatOpenAI):
    """Build the RAG pipeline as a graph of operations."""
    # Get the slim RAG prompt from the Hub
    prompt = hub.pull("rlm/rag-prompt")

    def retrieve(state: RAGState):
        """Find relevant documents for the question."""
        retrieved = vector_store.similarity_search(state["question"], k=4)
        return {"context": retrieved}

    def generate(state: RAGState):
        """Generate an answer using the LLM and context."""
        ctx_text = "\n\n".join(d.page_content for d in state["context"])
        msgs = prompt.invoke({"question": state["question"], "context": ctx_text})
        out = llm.invoke(msgs)
        return {"answer": out.content}

    # Build and compile the graph
    graph = StateGraph(RAGState)
    graph.add_node("retrieve", retrieve)
    graph.add_node("generate", generate)
    graph.add_edge(START, "retrieve")
    graph.add_edge("retrieve", "generate")
    return graph.compile()

# Create the RAG pipeline
graph = build_graph(vector_store, llm)

## 6. Using the RAG Pipeline

Finally, we'll use our RAG pipeline to answer questions. The pipeline will:
1. Take a question as input
2. Retrieve relevant context from our indexed documents
3. Generate an answer using the LLM and context
4. Show which documents were used as sources

Let's try it with a few example questions!
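
One way to report sources is to read them off the documents held in the pipeline's final state. The sketch below uses a stand-in `ToyDoc` class in place of LangChain's `Document` so it runs on its own. `ToyDoc`, `unique_sources`, and the `example.com` URLs are assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDoc:
    """Stand-in for a retrieved document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_sources(docs: list[ToyDoc]) -> list[str]:
    """Collect each document's source once, preserving retrieval order."""
    seen: list[str] = []
    for doc in docs:
        src = doc.metadata.get("source") or "unknown"
        if src not in seen:
            seen.append(src)
    return seen


retrieved = [
    ToyDoc("chunk 1", {"source": "https://example.com/a"}),
    ToyDoc("chunk 2", {"source": "https://example.com/a"}),
    ToyDoc("chunk 3", {}),
]
print(unique_sources(retrieved))  # ['https://example.com/a', 'unknown']
```

In this notebook every chunk comes from a single URL, so a deduplicated list would collapse to one entry; the helper becomes more useful once several pages are indexed.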

In [None]:
def ask_rag(question: str):
    """Ask a question using the RAG pipeline and show results."""
    print("=== QUESTION ===")
    print(question)
    
    result = graph.invoke({"question": question})
    
    print("\n=== ANSWER ===")
    print(result["answer"])
    
    print("\n=== CONTEXT SOURCES ===")
    # Report the documents the pipeline actually retrieved, rather than re-running the search
    for i, d in enumerate(result["context"], start=1):
        src = d.metadata.get("source") or d.metadata.get("loc") or "unknown"
        print(f"{i}. {src}")

# Try some example questions
questions = [
    "What are the basic steps in RAG?",
    "How does RAG improve LLM responses?",
    "What are some common challenges with RAG systems?",
]

for q in questions:
    ask_rag(q)
    print("\n" + "=" * 80 + "\n")