# Notebook - LangChain RAG Tutorial

**Escuela Colombiana de Ingeniería Julio Garavito**

**Student:** Santiago Botero García

## Step 0: Setup & Imports

This notebook demonstrates a **Retrieval-Augmented Generation (RAG)** pipeline using LangChain,
with support for **OpenAI** and **Google Gemini** as interchangeable LLM providers.


In [None]:
%pip install langchain langchain-classic langchain-text-splitters langchain-community langchain-openai langchain-google-genai bs4 chromadb python-dotenv
import os

# python-dotenv provides load_dotenv (reads API keys from a local .env file)
from dotenv import load_dotenv
# SoupStrainer is exported from the top-level bs4 package
from bs4 import SoupStrainer
from langchain_classic.chains import RetrievalQA
from langchain.tools import tool
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

Note: you may need to restart the kernel to use updated packages.


## Step 0: API Key Detection

The pipeline automatically detects which LLM provider is available
based on environment variables:

- `OPENAI_API_KEY` → OpenAI
- `GOOGLE_API_KEY` → Gemini


In [46]:
load_dotenv()

LLM_PROVIDER = None

if os.getenv("OPENAI_API_KEY"):
    LLM_PROVIDER = "openai"
elif os.getenv("GOOGLE_API_KEY"):
    LLM_PROVIDER = "gemini"
else:
    raise RuntimeError("No API key found (OPENAI_API_KEY or GOOGLE_API_KEY required)")

print(f"[+] Using provider: {LLM_PROVIDER}")

[+] Using provider: gemini
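
The precedence rule used above (OpenAI first, Gemini as fallback) can be isolated in a small function that receives the environment as a mapping, so it can be checked without real credentials. This is a minimal sketch: the names `detect_provider` and `PROVIDER_KEYS` are illustrative and not part of the notebook.

```python
import os
from typing import Mapping, Optional

# Checked in order: the first variable that is set and non-empty wins.
# (Illustrative table; it mirrors the if/elif order of the notebook cell.)
PROVIDER_KEYS = (
    ("OPENAI_API_KEY", "openai"),
    ("GOOGLE_API_KEY", "gemini"),
)


def detect_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first provider whose API key is present in `env`."""
    env = os.environ if env is None else env
    for key, provider in PROVIDER_KEYS:
        if env.get(key):
            return provider
    raise RuntimeError(
        "No API key found (OPENAI_API_KEY or GOOGLE_API_KEY required)"
    )
```

Passing a plain dictionary, for example `detect_provider({"GOOGLE_API_KEY": "..."})`, exercises the fallback branch without modifying the process environment.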


## Step 1: Load and Chunk Documents

### Web Document Loading

The loader is configured to fetch a single blog post and to parse only the HTML elements
whose class is `post-content`, `post-title`, or `post-header`, so navigation and other page chrome are left out of the text.

In [47]:
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={
        "parse_only": SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    },
)

### Loading Documents into Memory

The webpage content is loaded and normalized into LangChain’s `Document` format, which includes:

* `page_content` (the text)
* `metadata` (source URL and other attributes)

This abstraction allows downstream components (splitters, vector stores, retrievers) to work consistently across different data sources.

In [48]:
docs = loader.load()
print(f"[+] Loaded {len(docs)} docs — characters: {len(docs[0].page_content)}")

[+] Loaded 1 docs — characters: 43047
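
To make the two-field shape described above concrete, here is a small stand-in written with a dataclass. It is only an illustration of the structure: `SimpleDocument` is a hypothetical name and not LangChain's actual `Document` class, and the text is a placeholder rather than the real page content.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SimpleDocument:
    """Illustrative stand-in for LangChain's Document (not the real class)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


doc = SimpleDocument(
    page_content="Placeholder text standing in for the loaded page.",
    metadata={"source": "https://lilianweng.github.io/posts/2023-06-23-agent/"},
)

# Downstream components only rely on these two attributes.
summary = f"{doc.metadata['source']}: {len(doc.page_content)} characters"
print(summary)
```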


### Text Chunking

Large documents are split into smaller overlapping chunks to optimize retrieval and embedding quality.

* `chunk_size=1000` caps each chunk at roughly 1000 characters (the splitter measures length in characters by default), which keeps every chunk well within model context limits.
* `chunk_overlap=200` preserves semantic continuity between chunks.
* The recursive strategy attempts to split on natural boundaries (paragraphs, sentences) when possible.

This step reflects the RAG principle that **retrieval happens at the chunk level, not the document level**.

In [49]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

splits = text_splitter.split_documents(docs)
print(f"[+] Split into {len(splits)} chunks")

[+] Split into 63 chunks
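
The interaction between `chunk_size` and `chunk_overlap` can be illustrated with a plain sliding window. This is a simplified sketch and not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers natural boundaries, so its chunks are often shorter than the limit. The function name `sliding_chunks` is hypothetical.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split `text` into fixed-size windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


demo = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each window repeats the last two characters of the previous one, which is the same idea as the 200-character overlap used in the notebook.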


## Step 2: Vector Store & Retrieval Setup

This section introduces the components responsible for semantic search:

* An embedding model (`OpenAIEmbeddings` or `GoogleGenerativeAIEmbeddings`, depending on the detected provider) converts text chunks into high-dimensional vectors.
* `Chroma` is a lightweight, local vector database used to store and search those embeddings.

LangChain’s RAG tutorial highlights that vector stores are **pluggable**, and Chroma is used here for simplicity and fast prototyping.

In [50]:
if LLM_PROVIDER == "openai":
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings

    embeddings = OpenAIEmbeddings(
        model="text-embedding-3-large"
    )

    llm = ChatOpenAI(
        model="gpt-4o-mini",
        temperature=0
    )

elif LLM_PROVIDER == "gemini":
    from langchain_google_genai import (
        ChatGoogleGenerativeAI,
        GoogleGenerativeAIEmbeddings
    )

    embeddings = GoogleGenerativeAIEmbeddings(
        model="gemini-embedding-001"
    )

    llm = ChatGoogleGenerativeAI(
        model="gemini-2.5-flash-lite",
        temperature=0
    )

### Creating Embeddings and Vector Store

* Each text chunk is embedded using the embedding model of the selected provider (OpenAI or Gemini).
* The embeddings are stored in Chroma, enabling similarity-based retrieval.
* This forms the knowledge base that the RAG system will query at runtime.

At this point, the pipeline has transformed raw web content into a searchable semantic index.

In [51]:
vector_store = Chroma.from_documents(
    splits,
    embedding=embeddings
)

print("[+] Vector store created")

[+] Vector store created
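
Conceptually, similarity-based retrieval ranks stored vectors by how close they are to the query vector. The sketch below does this by brute force with cosine similarity on toy two-dimensional vectors. It is an illustration only: Chroma uses its own index and distance configuration, and the names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 4,
) -> List[Tuple[str, float]]:
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real embedding vectors have hundreds or thousands of dimensions.
toy_index = {
    "chunk about planning": [1.0, 0.0],
    "chunk about memory": [0.0, 1.0],
    "chunk about both": [1.0, 1.0],
}
results = top_k([1.0, 0.1], toy_index, k=2)
print([text for text, _ in results])
```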


## Step 3: RAG Tool for the Retrieval Chain

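Retrieval is also wrapped as a LangChain tool that returns the four most similar chunks for a query.
Note that the `RetrievalQA` chain built in Step 4 reaches the vector store through its own retriever and does not call this tool; the tool is useful for inspecting retrieved chunks directly or for an agent-based setup.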


In [52]:
@tool
def retrieve_context(query: str) -> str:
    """Retrieve relevant document chunks from the vector store."""
    docs = vector_store.similarity_search(query, k=4)
    return "\n\n".join(d.page_content for d in docs)

## Step 4: Create the Retrieval Chain

The system prompt describes the intended role and behavior of the Retrieval Chain.

* It instructs the model to rely on retrieved context rather than hallucination.
* This aligns with LangChain’s recommendation to **steer Retrieval Chain behavior via prompts**, not just code.
* As written, `system_prompt` is not passed to `RetrievalQA.from_chain_type`, so the chain below runs with its default prompt.

In [None]:
system_prompt = (
    "You are a retrieval-augmented generation Retrieval Chain. "
    "Use the provided retrieval tool to answer user queries."
)

This creates a LangChain `RetrievalQA` chain that:

* Uses the chat model selected in Step 2 (`llm`)
* Queries the vector store through `vector_store.as_retriever()`
* Retrieves for every query and passes the retrieved chunks to the model in a single prompt (the default "stuff" chain type)

This is the core orchestration layer of the RAG system.

In [54]:
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vector_store.as_retriever(),
)

print("[+] Retrieval Chain created (multi-provider compatible)")

[+] Retrieval Chain created (multi-provider compatible)
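
The retrieve-then-generate flow of a "stuff"-style chain can be sketched in plain Python with stand-ins for the retriever and the model. Everything here is an assumption made for illustration: `PROMPT_TEMPLATE` is not the prompt that `RetrievalQA` actually uses, and `fake_retrieve` and `fake_generate` are hypothetical placeholders for the vector store retriever and the LLM.

```python
from typing import Callable, List

# Illustrative template; NOT the default RetrievalQA prompt.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def answer_with_stuffing(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve chunks, stuff them all into one prompt, and call the model once."""
    chunks = retrieve(question)
    context = "\n\n".join(chunks)
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    return generate(prompt)


seen_prompts: List[str] = []  # records prompts so the flow can be inspected


def fake_retrieve(question: str) -> List[str]:
    # Stand-in for a vector store retriever.
    return ["Chunk A (placeholder)", "Chunk B (placeholder)"]


def fake_generate(prompt: str) -> str:
    # Stand-in for an LLM call; it only reports the prompt size.
    seen_prompts.append(prompt)
    return f"(model output for a prompt of {len(prompt)} characters)"


answer = answer_with_stuffing("What is task decomposition?", fake_retrieve, fake_generate)
print(answer)
```

Because all retrieved chunks go into one prompt, this approach is simple but only works while the combined context fits in the model's context window.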


## Step 5: Running the Retrieval Chain

A sample user query is provided to demonstrate the full RAG loop:

1. The Retrieval Chain receives the question
2. Its retriever fetches the most similar chunks from the vector store
3. The retrieved context is injected into the prompt
4. The model generates the final answer

In [55]:
query = "What is task decomposition?"

The Retrieval Chain is executed with a single blocking call that returns the final answer once retrieval and generation have finished.

To follow the LangChain tutorial’s focus on **inspectability and transparency**, the chain can be built with `return_source_documents=True`, which returns the retrieved chunks alongside the answer and makes the RAG system easier to debug and understand.

In [56]:
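# invoke() replaces the deprecated run(); RetrievalQA reads "query" and returns "result"
response = qa_chain.invoke({"query": query})
print(response["result"])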

Task decomposition is the process of breaking down a larger task into smaller, more manageable sub-tasks or sub-goals. This can be achieved in several ways:

*   **By LLM with simple prompting:** This involves using straightforward prompts like "Steps for XYZ.\n1." or "What are the subgoals for achieving XYZ?".
*   **By using task-specific instructions:** For example, providing a prompt like "Write a story outline." for the task of writing a novel.
*   **With human inputs:** Direct human guidance can also be used to decompose tasks.

A distinct approach, known as **LLM+P**, involves using an external classical planner for long-horizon planning. This method uses the Planning Domain Definition Language (PDDL) as an intermediary. The process involves:
1.  The LLM translating the problem into "Problem PDDL".
2.  Requesting a classical planner to generate a PDDL plan based on an existing "Domain PDDL".
3.  The LLM translating the PDDL plan back into natural language.

Essentially, LLM+P out