# Notebook - LangChain RAG Tutorial

**Escuela Colombiana de Ingeniería Julio Garavito**

**Student:** Santiago Botero García

## Step 0: Setup & Imports

Import all required standard and LangChain-specific dependencies used to build a Retrieval-Augmented Generation (RAG) pipeline.

* `os` and `getpass` are used to securely manage environment variables, particularly API keys.
* `bs4` (BeautifulSoup) enables selective HTML parsing when loading web content.
* `WebBaseLoader` loads documents directly from a URL into LangChain’s `Document` format.
* `RecursiveCharacterTextSplitter` breaks large documents into smaller, overlapping chunks suitable for embedding.
* `tool` and `create_agent` are used to expose retrieval logic as a callable tool that an LLM-powered agent can invoke.

This mirrors the LangChain docs’ emphasis on **modular primitives**: loaders, splitters, tools, and agents.


In [None]:
%pip install langchain langchain-text-splitters langchain-community langchain-openai langchain-chroma bs4 chromadb

import os
import getpass

import bs4
from langchain.agents import create_agent
from langchain.tools import tool
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

This block ensures the required LLM API key is available before running the pipeline.

* It first checks whether `OPENAI_API_KEY` is already set in the environment.
* If not, it securely prompts the user at runtime without echoing the key to the console.

The LangChain documentation recommends this approach to avoid hard-coding secrets and to keep notebooks and scripts safe for sharing.

In [None]:
if not os.getenv("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
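
If more than one secret is needed, the same check-then-prompt pattern can be wrapped in a small helper. The sketch below is illustrative only: `ensure_env_var` is a hypothetical name, not part of LangChain or the standard library.

```python
import getpass
import os


def ensure_env_var(name: str, prompt: str = "") -> str:
    """Return the value of `name`, prompting for it only if it is unset or empty.

    Hypothetical helper for illustration; not a LangChain API.
    """
    value = os.environ.get(name)
    if not value:
        # getpass hides the typed input, so the secret is not echoed.
        value = getpass.getpass(prompt or f"Enter {name}: ")
        os.environ[name] = value
    return value
```

Calling `ensure_env_var("OPENAI_API_KEY")` would behave like the cell above: it returns immediately when the variable is already set and prompts otherwise.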

## Step 1: Load and Chunk Documents

### Web Document Loading

This section loads external unstructured data (a blog post) into the RAG pipeline.

* `WebBaseLoader` fetches the webpage and converts it into LangChain `Document` objects.
* `SoupStrainer` limits parsing to relevant HTML sections, reducing noise such as navigation bars or footers.
* This aligns with the LangChain RAG tutorial’s guidance to **control input quality before embedding**.

In [None]:
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

### Loading Documents into Memory

The webpage content is loaded and normalized into LangChain’s `Document` format, which includes:

* `page_content` (the text)
* `metadata` (source URL and other attributes)

This abstraction allows downstream components (splitters, vector stores, retrievers) to work consistently across different data sources.

In [None]:
docs = loader.load()
print(f"[+] Loaded {len(docs)} docs — characters: {len(docs[0].page_content)}")

### Text Chunking

Large documents are split into smaller overlapping chunks to optimize retrieval and embedding quality.

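* `chunk_size=1000` caps each chunk at about 1,000 characters (the splitter measures length in characters by default), which keeps chunks well within embedding and chat model input limits.
* `chunk_overlap=200` lets adjacent chunks share up to 200 characters, so text that straddles a boundary still appears intact in at least one chunk.
* The recursive strategy tries the largest natural boundary first (paragraphs, then lines, then words) and falls back to smaller ones only when a piece is still too long.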

This step reflects the RAG principle that **retrieval happens at the chunk level, not the document level**.

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)
print(f"[+] Split into {len(all_splits)} chunks")
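
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal fixed-window splitter in plain Python. It is a simplified stand-in, not the algorithm used by `RecursiveCharacterTextSplitter` (which also looks for natural separators); `split_with_overlap` is a hypothetical name used only for this sketch.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Fixed-window splitter: each chunk starts (chunk_size - chunk_overlap)
    characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

With these toy values, the last four characters of each chunk reappear at the start of the next one, which is the same idea as the 200-character overlap above on a smaller scale.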

## Step 2: Vector Store & Retrieval Setup

This section introduces the components responsible for semantic search:

* `OpenAIEmbeddings` converts text chunks into high-dimensional vectors.
* `Chroma` is a lightweight, local vector database used to store and search those embeddings.

LangChain’s RAG tutorial highlights that vector stores are **pluggable**, and Chroma is used here for simplicity and fast prototyping.

In [None]:
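# The integrations live in dedicated packages
# (install with: pip install langchain-openai langchain-chroma).
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings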

### Creating Embeddings and Vector Store

* Each text chunk is embedded using an OpenAI embedding model.
* The embeddings are stored in Chroma, enabling similarity-based retrieval.
* This forms the knowledge base that the RAG system will query at runtime.

At this point, the pipeline has transformed raw web content into a searchable semantic index.

In [None]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector_store = Chroma.from_documents(all_splits, embedding=embeddings)

print("[+] Created vector store with embeddings")
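
Conceptually, similarity search ranks the stored vectors by how close they are to the query vector. The sketch below shows that idea with hand-written three-dimensional vectors and cosine similarity. The vectors, texts, and the `top_k` helper are assumptions made for illustration; Chroma's own index structure and distance metric may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hand-written vectors standing in for real embeddings.
toy_store = [
    ("planning and task decomposition", [0.9, 0.1, 0.0]),
    ("short-term and long-term memory", [0.1, 0.9, 0.1]),
    ("calling external tools", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of a planning-related question

results = top_k(query_vec, toy_store, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

Real embeddings have hundreds or thousands of dimensions, but the ranking step works the same way: the chunk whose vector points in the most similar direction to the query comes first.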

## Step 3: RAG Tool for the Agent

This function exposes retrieval as a **LangChain tool**, so the agent decides when to run a semantic search and what query to send.

* It runs a similarity search against the vector store and returns the `k=4` most relevant chunks, grounding the answer in retrieved text rather than in the model’s training data alone.
* The retrieved chunks are serialized into a plain-text block (source metadata plus content) that is passed to the LLM.
* With `response_format="content_and_artifact"`, the original `Document` objects are returned alongside that text as an artifact.

Keeping the raw documents available supports source inspection, debugging, and citation-style workflows, in line with the tutorial’s pattern of making retrieval a first-class, observable step in the agent loop.


In [None]:
@tool(response_format="content_and_artifact")
def retrieve_context(query: str):
    """Retrieve relevant document chunks from the vector store."""
    retrieved_docs = vector_store.similarity_search(query, k=4)
    # Serialize results for the LLM
    serialized = "\n\n".join(
        f"Source: {doc.metadata}\nContent: {doc.page_content}"
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs

# Create a list of tools available to the agent
tools = [retrieve_context]
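
To see what the two return values look like without calling the vector store, the serialization step can be run on stand-in documents. `SimpleDoc` is a hypothetical dataclass used only here (it is not LangChain's `Document`), and the URLs and chunk texts are made up.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for LangChain's Document (not the real class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def serialize(docs):
    # Same join as in retrieve_context: one "Source/Content" block per chunk.
    return "\n\n".join(
        f"Source: {doc.metadata}\nContent: {doc.page_content}" for doc in docs
    )


docs_demo = [
    SimpleDoc("Chunk about planning.", {"source": "https://example.com/post"}),
    SimpleDoc("Chunk about memory.", {"source": "https://example.com/post"}),
]

# content goes to the LLM; artifact keeps the original objects for inspection.
content, artifact = serialize(docs_demo), docs_demo
print(content)
```

The string is what the model reads, while the list of original objects stays available to application code, for example to display sources next to the answer.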

## Step 4: Create the Agent

The system prompt defines the agent’s role and behavior.

* It explicitly instructs the agent to rely on retrieval rather than hallucination.
* This aligns with LangChain’s recommendation to **steer agent behavior via prompts**, not just code.

In [None]:
system_prompt = (
    "You are a retrieval-augmented generation agent. "
    "Use the provided retrieval tool to answer user queries."
)

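This creates a LangChain agent that:

* Uses the chat model named in the `model` argument (the `provider:model` string below is an assumed choice; substitute any chat model you have access to)
* Has access to the retrieval tool
* Can reason about when to call the tool versus when to answer directly

This is the core orchestration layer of the RAG system.

In [None]:
agent = create_agent(
    # create_agent expects a concrete model (a "provider:model" string or a chat
    # model instance) rather than None. "openai:gpt-4o-mini" is an assumed example
    # and requires the langchain-openai package.
    model="openai:gpt-4o-mini",
    tools=tools,
    system_prompt=system_prompt,
)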
print("[+] Agent created")

## Step 5: Running the Agent

A sample user query is provided to demonstrate the full RAG loop:

1. The agent receives the question
2. It invokes the retrieval tool
3. The retrieved context is passed back to the model as a tool message
4. The final answer is generated

In [None]:
query = "What is task decomposition?"

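The agent is executed in streaming mode. With `stream_mode="values"`, each iteration yields the full agent state after a step, so printing the last message shows the user query, the tool call, the tool result, and the final answer in order.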

This reflects the LangChain tutorial’s focus on **inspectability and transparency**, making RAG systems easier to debug and understand.

In [None]:
print(f"\nQuery: {query}\n")
for step in agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    # Pretty print each message
    step["messages"][-1].pretty_print()
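
The shape of this loop can be mimicked without an LLM. The sketch below is a toy stand-in, not LangChain's implementation: `fake_retrieve` and `toy_agent_stream` are hypothetical names, the message dictionaries are simplified, and the "model" decisions are hard-coded. It only shows how a values-style stream yields a growing message list, one snapshot per step.

```python
def fake_retrieve(query):
    """Stand-in for the retrieval tool; returns a canned string."""
    return f"[context retrieved for: {query}]"


def toy_agent_stream(question):
    """Yield the full message list after each step, like a values-style stream."""
    # Step 1: the agent receives the question.
    messages = [{"role": "user", "content": question}]
    yield {"messages": list(messages)}

    # Step 2: the (pretend) model decides to call the retrieval tool.
    tool_call = {"name": "retrieve_context", "query": question}
    messages.append({"role": "ai", "content": "", "tool_call": tool_call})
    yield {"messages": list(messages)}

    # Step 3: the tool output is appended as a tool message.
    messages.append({"role": "tool", "content": fake_retrieve(question)})
    yield {"messages": list(messages)}

    # Step 4: the (pretend) model answers using the tool message.
    answer = f"Answer based on {messages[-1]['content']}"
    messages.append({"role": "ai", "content": answer})
    yield {"messages": list(messages)}


steps = list(toy_agent_stream("What is task decomposition?"))
for step in steps:
    last = step["messages"][-1]
    print(f"{last['role']:>4}: {last['content'] or last.get('tool_call')}")
```

Each snapshot contains every message so far, which is why the notebook cell above prints only `step["messages"][-1]`: that is the message added by the most recent step.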