# RAG (Retrieval-Augmented Generation) Overview

This notebook demonstrates a minimal Retrieval-Augmented Generation (RAG) pipeline using LangChain components and OpenAI models.
The goal is to ingest content from a web page, split it into chunks, embed those chunks and store them in a vector store, and then build a retriever + prompt + LLM chain that answers questions from the retrieved context.

## What this notebook does
- Loads and parses a target web article using `WebBaseLoader` and BeautifulSoup.
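- Splits long text into overlapping chunks with `RecursiveCharacterTextSplitter`, so each chunk covers a focused passage and the overlap reduces the chance that a passage cut at a chunk boundary loses its context.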
- Creates embeddings for each chunk using `OpenAIEmbeddings` and stores them in a `Chroma` vector store.
- Builds a retriever from the vector store to fetch context relevant to a query.
- Pulls a prompt template from the LangChain Hub (`rlm/rag-prompt`) and composes a small runnable chain: retriever -> prompt -> LLM -> output parser.
- Demonstrates running the chain with a sample question.

## Notes and next steps
- You can replace `Chroma` with another vector store backend as needed.
- Adjust `chunk_size` and `chunk_overlap` (measured in characters here, since the splitter uses `len` by default) to suit document length and the model's token budget.
- For production, consider caching, error handling, and an async flow for larger corpora.
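To see what `chunk_size` and `chunk_overlap` control, here is a minimal character-window sketch. It is a simpler technique than `RecursiveCharacterTextSplitter`, which first tries to break text on separators such as blank lines, newlines, and spaces; the helper `sliding_window_chunks` is hypothetical and only shows how consecutive chunks share `chunk_overlap` characters.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap.

    Hypothetical helper: unlike RecursiveCharacterTextSplitter, it ignores
    separators and cuts purely by character count.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With the notebook's `chunk_size=1000, chunk_overlap=200`, each window in this sketch would advance 800 characters, so any passage of at most 200 characters falls entirely inside at least one chunk.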


In [1]:
```python
import os

import bs4
import yaml
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
```

```
USER_AGENT environment variable not set, consider setting it to identify your requests.
```


In [2]:
```python
# Load credentials from the config file
with open('configs/config.yaml', 'r') as file:
    config = yaml.safe_load(file)

# Enable LangSmith tracing and expose API keys as environment variables
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = config['API']['LANGCHAIN']
os.environ['OPENAI_API_KEY'] = config['API']['OPENAI']
```
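The cell above assumes `configs/config.yaml` contains an `API` mapping with `LANGCHAIN` and `OPENAI` keys; if one is missing, the notebook fails with a bare `KeyError`. A small stdlib sketch of a friendlier check follows. `set_api_env` and `REQUIRED_KEYS` are hypothetical names, and the dict passed in stands in for the result of `yaml.safe_load`.

```python
import os

# Hypothetical mapping from the config keys used in the notebook to environment variables
REQUIRED_KEYS = {"LANGCHAIN": "LANGCHAIN_API_KEY", "OPENAI": "OPENAI_API_KEY"}


def set_api_env(config: dict) -> None:
    """Copy API keys from a parsed config dict into os.environ, failing with a clear message."""
    api = config.get("API") or {}
    missing = [key for key in REQUIRED_KEYS if not api.get(key)]
    if missing:
        # Check everything first so nothing is set when the config is incomplete
        raise ValueError(f"config['API'] is missing: {', '.join(missing)}")
    for key, env_var in REQUIRED_KEYS.items():
        os.environ[env_var] = api[key]


# Example with placeholder values (not real keys)
set_api_env({"API": {"LANGCHAIN": "ls-placeholder", "OPENAI": "sk-placeholder"}})
```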

In [3]:
```python
# Create a loader that fetches and parses the target web page
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),  # tuple of URLs to load
    bs_kwargs=dict(  # pass BeautifulSoup-specific kwargs to limit parsing
        parse_only=bs4.SoupStrainer(  # only parse these parts of the page to reduce noise
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

# Fetch and return a list of Document objects
docs = loader.load()
```
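`SoupStrainer` tells BeautifulSoup to build the tree only from elements whose `class` matches one of the listed values (plus their children), so navigation bars and footers never reach the documents. The sketch below mimics that filtering with the standard-library `html.parser` so it runs without `bs4`; `ClassFilterParser` is a hypothetical helper and assumes well-formed HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassFilterParser(HTMLParser):
    """Collect text only from elements carrying one of the wanted CSS classes (and their children)."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.depth = 0  # > 0 while inside a matching element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.depth > 0 or classes & self.wanted:
            self.depth += 1  # entering, or nesting inside, a matching element

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.texts.append(data.strip())


html = (
    '<nav class="menu">Home</nav>'
    '<h1 class="post-title">LLM Agents</h1>'
    '<div class="post-content"><p>Planning <b>matters</b></p></div>'
    '<footer>Copyright</footer>'
)
parser = ClassFilterParser(("post-content", "post-title", "post-header"))
parser.feed(html)
parser.close()
print(parser.texts)  # ['LLM Agents', 'Planning', 'matters']
```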

In [4]:
```python
# Split long documents into smaller overlapping chunks suitable for embeddings
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)  # list of smaller Document chunks

# Create embeddings and store them in a vector DB (Chroma)
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=OpenAIEmbeddings(),  # uses OpenAI embeddings under the hood
)

# Create a retriever to fetch relevant docs given a query
retriever = vectorstore.as_retriever()
```
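Conceptually, the retriever embeds the query and returns the stored chunks whose vectors are closest to it (Chroma's `similarity_search` returns `k=4` results by default). The self-contained sketch below shows the idea with bag-of-words vectors and cosine similarity in place of `OpenAIEmbeddings` and Chroma's index; `embed`, `cosine`, and `retrieve` are hypothetical helpers, and Chroma's actual distance metric and indexing differ.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 4) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, embed(chunk)), reverse=True)
    return ranked[:k]


chunks = [
    "Task decomposition breaks a complex task into smaller steps.",
    "Memory stores past observations for later use.",
    "Tool use lets the agent call external APIs.",
]
print(retrieve("What is Task Decomposition?", chunks, k=1))
# ['Task decomposition breaks a complex task into smaller steps.']
```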

In [5]:
```python
# Load a prompt template from the LangChain hub
prompt = hub.pull("rlm/rag-prompt")

# Print the prompt template
print(prompt.messages[0].prompt.template)
```

```
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
```
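The template has two placeholders, `{question}` and `{context}`, which the chain fills at run time. A plain-Python sketch of that substitution, using the printed template text with `str.format` instead of LangChain's prompt classes:

```python
# Template text copied from the printed rlm/rag-prompt output
RAG_TEMPLATE = (
    "You are an assistant for question-answering tasks. Use the following pieces of "
    "retrieved context to answer the question. If you don't know the answer, just say "
    "that you don't know. Use three sentences maximum and keep the answer concise.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)

filled = RAG_TEMPLATE.format(
    question="What is Task Decomposition?",
    context="Task decomposition breaks a complex task into smaller steps.",
)
print(filled)
```

In the notebook the hub prompt is a chat prompt template, so the prompt step produces chat messages for the model rather than a single string.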


In [6]:
```python
# Configure the chat LLM to use for generation
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Helper to format retrieved documents into a single context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)  # join each doc with blank line separators

# Wire up retriever -> prompt -> llm -> parser as a runnable chain
rag_chain = (
    # map inputs: context comes from retriever + formatter, question passes through unchanged
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt  # inject prompt template
    | llm  # call the language model
    | StrOutputParser()  # ensure final output is a plain string
)

# Run the chain with a test question (synchronous invocation)
rag_chain.invoke("What is Task Decomposition?")  # returns model output as a string
```

```
'Task Decomposition is a technique used to break down complex tasks into smaller and simpler steps, allowing for easier execution and understanding. It can be achieved through methods such as prompting with specific instructions, utilizing external classical planners, or incorporating human inputs. By decomposing tasks, models can effectively manage and interpret the thinking process involved in completing a task.'
```
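In the chain, `|` feeds each step's output into the next step, and the leading dict is turned into a parallel step that builds `{"context": ..., "question": ...}` from the same input. The sketch below reproduces that data flow with plain functions so it runs offline; `pipe`, `fake_retriever`, and `fake_llm` are hypothetical stand-ins for LangChain's composition, the Chroma retriever, and `ChatOpenAI`, and `format_docs` here joins plain strings rather than `Document.page_content`.

```python
from functools import reduce

TEMPLATE = "Question: {question}\nContext: {context}\nAnswer:"


def pipe(*steps):
    """Compose steps left to right, mimicking LangChain's `a | b | c`."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def fake_retriever(question: str) -> list[str]:
    """Hypothetical stub for the vector-store retriever: returns fixed chunks."""
    return ["Task decomposition splits a hard task into smaller steps.",
            "Chain of thought asks the model to think step by step."]


def format_docs(docs: list[str]) -> str:
    return "\n\n".join(docs)  # same separator as the notebook's format_docs


def fake_llm(prompt_text: str) -> dict:
    """Hypothetical stub for the chat model: returns a message-like dict."""
    return {"content": "(stub) " + prompt_text.splitlines()[0]}


chain = pipe(
    lambda q: {"context": format_docs(fake_retriever(q)), "question": q},  # input-mapping step
    lambda inputs: TEMPLATE.format(**inputs),  # prompt step
    fake_llm,  # model step
    lambda message: message["content"],  # output-parser step
)

print(chain("What is Task Decomposition?"))  # (stub) Question: What is Task Decomposition?
```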