# Naive RAG (Retrieval-Augmented Generation) Demo

This notebook provides a compact, step-by-step demonstration of a naive RAG pipeline for learning and testing purposes.
It shows how to: load web content, split it into chunks, compute embeddings, store/retrieve vectors with Chroma, and run a small prompt + LLM chain to answer questions using retrieved context.

## What this notebook contains
- A tiny embedding similarity example (compute embeddings and cosine similarity).
- A web loader example that fetches an article and splits it into chunks.
- Building a Chroma vector store from embeddings and retrieving relevant chunks.
- A simple prompt template and a ChatOpenAI LLM invocation to answer a question using the retrieved context.


In [1]:
from langchain import hub  
from langchain.text_splitter import RecursiveCharacterTextSplitter  
from langchain_community.document_loaders import WebBaseLoader  
from langchain_community.vectorstores import Chroma  
from langchain_core.output_parsers import StrOutputParser  
from langchain_core.runnables import RunnablePassthrough  
from langchain_openai import ChatOpenAI, OpenAIEmbeddings 
from langchain.prompts import ChatPromptTemplate
import numpy as np
import yaml
import bs4  
import os

USER_AGENT environment variable not set, consider setting it to identify your requests.
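
The warning above appears to come from importing `langchain_community` without a `USER_AGENT` environment variable. A minimal sketch of one way to avoid it, assuming the variable is set before the imports in the first cell run. The value `naive-rag-demo/0.1` is a hypothetical placeholder, not something this notebook defines:

```python
import os

# Set a User-Agent before importing langchain_community so that web requests
# identify themselves. The value below is a hypothetical placeholder.
# setdefault keeps any value that is already present in the environment.
os.environ.setdefault("USER_AGENT", "naive-rag-demo/0.1")

print(os.environ["USER_AGENT"])
```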


In [2]:
# Load API credentials from the config file
with open('configs/config.yaml', 'r') as file:
    config = yaml.safe_load(file)

# Set environment variables
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = config['API']['LANGCHAIN']
os.environ['OPENAI_API_KEY'] = config['API']['OPENAI']

- Compute embeddings for a query and a short document, then compare them with cosine similarity.

In [3]:
# Example question and small document for naive similarity demo
question = "What kinds of pets do I like?"  # user query
document = "My favorite pet is a cat."      # small text to compare against the query

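# Create an embeddings client and embed the query and the document
embd = OpenAIEmbeddings()
query_result = embd.embed_query(question)
document_result = embd.embed_query(document)  # embed_query returns one vector per input string
len(query_result)  # embedding dimensionality; not displayed because it is not the last expression in the cell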

# Compute cosine similarity between two vectors
def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)  # numerator: dot product
    norm_vec1 = np.linalg.norm(vec1)  # magnitude of vec1
    norm_vec2 = np.linalg.norm(vec2)  # magnitude of vec2
    return dot_product / (norm_vec1 * norm_vec2)  # normalized cosine similarity

similarity = cosine_similarity(query_result, document_result)  # compute similarity
print("Cosine Similarity:", similarity)  # print the score

Cosine Similarity: 0.880691583503541
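
To make the score above easier to interpret, here is a small sketch of the same formula applied to hand-made 2-D vectors, using only the standard library. The vectors and the name `toy_cosine_similarity` are illustrative assumptions. Vectors pointing the same way score close to 1, orthogonal vectors score 0, and opposite vectors score close to -1:

```python
import math


def toy_cosine_similarity(vec1, vec2):
    """Cosine similarity: dot(vec1, vec2) / (|vec1| * |vec2|)."""
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    return dot / (norm1 * norm2)


same_direction = toy_cosine_similarity([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = toy_cosine_similarity([1.0, 0.0], [0.0, 1.0])       # perpendicular vectors
opposite = toy_cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # anti-parallel vectors

print(round(same_direction, 6), round(orthogonal, 6), round(opposite, 6))
```

The score depends only on direction, not magnitude, which is why `[1, 2]` and `[2, 4]` count as maximally similar.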


- Load and parse a target web article using `WebBaseLoader` and BeautifulSoup.

In [4]:
# Create a loader that fetches and parses the target web page
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),  # tuple of URLs to load
    bs_kwargs=dict(  # pass BeautifulSoup-specific kwargs to limit parsing
        parse_only=bs4.SoupStrainer(  # only parse these parts of the page to reduce noise
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

# Fetch and return a list of Document objects
docs = loader.load()  

- Split long text into overlapping chunks with `RecursiveCharacterTextSplitter`.
- Create embeddings for each chunk using `OpenAIEmbeddings` and store them in a `Chroma` vector store.
- Build a retriever from the vector store to fetch context relevant to a query.




In [5]:
# Split long documents into smaller overlapping chunks suitable for embeddings
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
splits = text_splitter.split_documents(docs)  # list of smaller Document chunks

# Create embeddings and store them in a vector DB (Chroma)
vectorstore = Chroma.from_documents(documents=splits, 
                                    embedding=OpenAIEmbeddings())  # uses OpenAI embeddings under the hood

# Create a retriever to fetch relevant docs (return the top 1 result)
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})
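
To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window chunker. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which also tries to split on separators such as paragraph and line breaks. The function name `sliding_window_chunks` and the sample text are assumptions for illustration:

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size chunks where neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk when the overlap is large enough.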

In [6]:
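# Retrieve the most relevant documents for a question
# (invoke replaces the deprecated get_relevant_documents; a separate name keeps the loaded `docs` intact)
retrieved_docs = retriever.invoke("What is Task Decomposition?")  # list[Document]
print(retrieved_docs)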

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.')]
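
Conceptually, the retriever embeds the question and returns the `k` stored chunks whose vectors are closest to it. The sketch below mimics that with hand-made 3-D vectors and cosine similarity. The vectors, the chunk labels, and the `top_k` helper are illustrative assumptions, and Chroma's actual index and distance metric may differ:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_k(query_vec, indexed_chunks, k=1):
    """Return the texts of the k chunks most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]


# Toy "vector store": (chunk text, made-up embedding) pairs.
toy_index = [
    ("chunk about task decomposition", [0.9, 0.1, 0.0]),
    ("chunk about memory", [0.1, 0.9, 0.0]),
    ("chunk about tool use", [0.0, 0.2, 0.9]),
]
toy_query = [1.0, 0.2, 0.0]  # made-up embedding of the question

print(top_k(toy_query, toy_index, k=1))
```

With `k=1`, as in the retriever above, only the single best match reaches the prompt, so the answer can be no better than that one chunk.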



- Build a prompt template that restricts answers to the provided context.

In [7]:
# Build a simple template that restricts answers to the provided context
# (alternative: pull a prebuilt prompt with hub.pull("rlm/rag-prompt"))
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

# Create a prompt object from the template
prompt = ChatPromptTemplate.from_template(template) 

# Print the template string stored in the prompt's first message
print(prompt.messages[0].prompt.template)

Answer the question based only on the following context:
{context}

Question: {question}
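
To see what the model receives once the placeholders are filled, the sketch below substitutes values with plain `str.format`. This is a stand-in for the substitution that `ChatPromptTemplate` performs, not its implementation. The context string is the chunk retrieved earlier in this notebook:

```python
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

# Fill the two placeholders the same way the chain supplies them:
# retrieved text for {context}, the user's query for {question}.
filled = template.format(
    context=(
        "Subgoal and decomposition: The agent breaks down large tasks into "
        "smaller, manageable subgoals, enabling efficient handling of complex tasks."
    ),
    question="What is Task Decomposition?",
)

print(filled)
```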



- Compose a small runnable chain: retriever -> prompt -> LLM -> output parser.
- Demonstrate running the chain with a sample question.

In [8]:
# Configure the chat LLM to use for generation
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Wire up retriever -> prompt -> llm -> parser as a runnable chain
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}  # map inputs: context is the retriever's Document list (inserted into the prompt as-is, no formatter); question passes through
    | prompt  # inject prompt template
    | llm  # call the language model
    | StrOutputParser()  # ensure final output is a plain string
)

# Run the chain with a test question (synchronous invocation)
rag_chain.invoke("What is Task Decomposition?")  # returns model output as a string

'Task decomposition is the process of breaking down large tasks into smaller, manageable subgoals in order to efficiently handle complex tasks.'
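
In the chain above, the retriever's `Document` list is placed into the prompt directly, so the context includes metadata and object formatting. A common refinement is to join only the chunk text first. The sketch below shows such a helper on stand-in objects. `ToyDocument` is a hypothetical substitute for LangChain's `Document`, and `format_docs` is an illustrative name rather than something this notebook defines:

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in exposing the same page_content attribute as a Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the text of each retrieved chunk, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


toy_docs = [ToyDocument("First chunk."), ToyDocument("Second chunk.")]
print(format_docs(toy_docs))

# In the chain, the helper could be wired in like this (not executed here):
# {"context": retriever | format_docs, "question": RunnablePassthrough()}
```

Passing plain text keeps the prompt shorter and avoids leaking metadata such as source URLs into the context unless that is intended.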