# Naive RAG (Retrieval-Augmented Generation) Demo

This notebook is a compact, step-by-step demonstration of a naive RAG pipeline, intended for learning and testing.
It shows how to load web content, split it into chunks, compute embeddings, store and retrieve vectors with Chroma, and run a small prompt + LLM chain that answers questions from the retrieved context.

## What you'll find here
- A tiny embedding similarity example (compute embeddings and cosine similarity).
- A web loader example that fetches an article and splits it into chunks.
- A Chroma vector store built from the chunk embeddings, with retrieval of the relevant chunks.
- A simple prompt template and a ChatOpenAI LLM invocation to answer a question using the retrieved context.

## Quick tips
- Adjust `chunk_size`/`chunk_overlap` for your dataset and token limits.
- This notebook is intentionally simple; for production use, add error handling, batching, and caching.
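
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-width character chunking with overlap. It is a simplification for intuition only and not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and sentences). The helper name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-width chunks; consecutive chunks share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_chunks(text, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

A larger overlap repeats more text across chunk boundaries, which can help keep a sentence intact in at least one chunk at the cost of more chunks to embed.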


In [None]:
import os

import bs4
import numpy as np
import yaml
from langchain import hub
from langchain.prompts import ChatPromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [2]:
# Load credentials from the config file
with open('configs/config.yaml', 'r') as file:
    config = yaml.safe_load(file)

# Set environment variables
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = config['API']['LANGCHAIN']
os.environ['OPENAI_API_KEY'] = config['API']['OPENAI']
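
The cell above assumes the YAML file has an `API` section with `LANGCHAIN` and `OPENAI` entries. As one possible form of the error handling mentioned in the tips, the sketch below checks a loaded config dictionary before its values are used. The helper name `require_keys` and the placeholder values are hypothetical and not part of the notebook.

```python
def require_keys(config: dict, section: str, keys: list[str]) -> dict:
    """Return config[section] after checking that every expected key is present and non-empty."""
    values = config.get(section) if isinstance(config, dict) else None
    if not isinstance(values, dict):
        raise KeyError(f"missing section '{section}' in config")
    missing = [k for k in keys if not values.get(k)]
    if missing:
        raise KeyError(f"missing or empty keys in '{section}': {', '.join(missing)}")
    return values


# Placeholder values only; a real config file would hold actual API keys.
example_config = {"API": {"LANGCHAIN": "<langchain-key>", "OPENAI": "<openai-key>"}}
api = require_keys(example_config, "API", ["LANGCHAIN", "OPENAI"])
print(sorted(api))
```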

In [3]:
# Example question and small document for naive similarity demo
question = "What kinds of pets do I like?"  # user query
document = "My favorite pet is a cat."      # small text to compare against the query

In [4]:
# Create an embeddings client, then embed the query and the document
embd = OpenAIEmbeddings()  # OpenAI-based embeddings wrapper
query_result = embd.embed_query(question)  # embedding vector for the question
document_result = embd.embed_query(document)  # embedding vector for the document
len(query_result)  # display length of the resulting vector

1536

In [5]:
# Compute cosine similarity between two vectors
def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)  # numerator: dot product
    norm_vec1 = np.linalg.norm(vec1)  # magnitude of vec1
    norm_vec2 = np.linalg.norm(vec2)  # magnitude of vec2
    return dot_product / (norm_vec1 * norm_vec2)  # normalized cosine similarity

similarity = cosine_similarity(query_result, document_result)  # compute similarity
print("Cosine Similarity:", similarity)  # print the score

Cosine Similarity: 0.880691583503541
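
To build intuition for the score above, the following sketch computes cosine similarity on small hand-made vectors using only the standard library. It shows that the score depends on direction and not on vector length. The name `toy_cosine_similarity` is hypothetical, and the zero-vector check is an addition that the notebook's function does not have.

```python
import math


def toy_cosine_similarity(vec1: list[float], vec2: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(vec1) != len(vec2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


print(toy_cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # same direction, different length
print(toy_cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal vectors
print(toy_cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```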


In [6]:
# Create a loader that fetches and parses the target web page
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),  # tuple of URLs to load
    bs_kwargs=dict(  # pass BeautifulSoup-specific kwargs to limit parsing
        parse_only=bs4.SoupStrainer(  # only parse these parts of the page to reduce noise
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

# Fetch the page and return a list of Document objects
docs = loader.load()

In [7]:
# Split long documents into smaller overlapping chunks suitable for embeddings
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
splits = text_splitter.split_documents(docs)  # list of smaller Document chunks

# Create embeddings and store them in a vector DB (Chroma)
vectorstore = Chroma.from_documents(documents=splits, 
                                    embedding=OpenAIEmbeddings())  # uses OpenAI embeddings under the hood

# Create a retriever to fetch relevant docs (return the top 1 result)
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

In [8]:
# Retrieve the most relevant documents for a question
docs = retriever.invoke("What is Task Decomposition?")  # list[Document]; replaces the deprecated get_relevant_documents


In [9]:
# Build a simple template that restricts answers to the provided context
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

# Create a prompt object from the template
prompt = ChatPromptTemplate.from_template(template)

# Print the underlying template string
print(prompt.messages[0].prompt.template)

Answer the question based only on the following context:
{context}

Question: {question}



In [10]:
# Configure and run the chat model with the prompt
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)  # temperature=0 keeps replies as repeatable as possible
chain = prompt | llm  # compose prompt -> llm
response = chain.invoke({"context": docs, "question": "What is Task Decomposition?"})  # context is the list of retrieved Documents
print(response.content)
print(response.content)

Task decomposition is the process of breaking down large tasks into smaller, manageable subgoals in order to efficiently handle complex tasks.
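
The cell above passes the list of retrieved `Document` objects directly as `context`, so the prompt receives their string representation, metadata included. A common alternative is to join only the chunk text before filling the template. The sketch below illustrates this with a small stand-in class: `Doc` and `format_docs` are hypothetical names, and `Doc` only mimics the `page_content` and `metadata` fields of a LangChain `Document`.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (only the fields used here)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs) -> str:
    """Join the text of each retrieved chunk, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""

retrieved = [
    Doc("First retrieved chunk.", {"source": "example"}),
    Doc("Second retrieved chunk."),
]
filled = TEMPLATE.format(
    context=format_docs(retrieved),
    question="What is Task Decomposition?",
)
print(filled)
```

In the notebook itself, the same helper could be applied to the retriever output, for example `chain.invoke({"context": format_docs(docs), "question": "What is Task Decomposition?"})`.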
