### Vector databases 

- FAISS → “Fast local similarity-search library (not a full database), no server”
- Chroma → “Beginner-friendly persistent vector DB”
- Pinecone → “Managed cloud vector DB for production”
- Weaviate → “Hybrid semantic + keyword search”
- Milvus → “Distributed vector DB for massive scale”
- Qdrant → “High-performance, self-hosted option”
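All of these options perform the same core operation. They store embedding vectors and, given a query vector, return the `k` stored vectors most similar to it. The sketch below does this by brute force with cosine similarity. It is a conceptual illustration only, not the API of any product listed above. FAISS's flat (`IndexFlat*`) indexes also scan every vector like this. FAISS and the other systems additionally offer approximate indexes such as HNSW, which trade a little recall for speed at scale. The 3-dimensional "embeddings" are hand-made and hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Brute-force in-memory store: keeps (vector, text) pairs, ranks by cosine."""

    def __init__(self):
        self.items = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self.items.append((vector, text))

    def search(self, query, k=2):
        # Score every stored vector against the query, then keep the top k.
        scored = [(cosine(query, vec), text) for vec, text in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


# Hypothetical hand-made vectors; real embedding models output hundreds of dimensions.
store = ToyVectorStore()
store.add([1.0, 0.0, 0.0], "RAG retrieves documents")
store.add([0.0, 1.0, 0.0], "Splitters make chunks")
store.add([0.9, 0.1, 0.0], "Retrieval happens before answering")

print(store.search([0.8, 0.2, 0.0], k=2))
# → ['Retrieval happens before answering', 'RAG retrieves documents']
```

Brute force is exact but costs one comparison per stored vector on every query. That cost is why the larger systems build approximate indexes.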

```python
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS  # needs the faiss-cpu (or faiss-gpu) package
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

load_dotenv()  # read OPENAI_API_KEY from a local .env file, if present

# Check API key
assert os.getenv("OPENAI_API_KEY"), "Set OPENAI_API_KEY"

# -------------------------
# 1. Documents (Loader step)
# -------------------------
docs = [
    Document(page_content="RAG retrieves relevant documents before answering."),
    Document(page_content="Text splitters break large documents into chunks."),
    Document(page_content="FAISS is a local vector database used for similarity search."),
]

# -------------------------
# 2. Splitter
# -------------------------
# Each sample document is shorter than chunk_size, so here every document
# becomes exactly one chunk; the settings only matter for longer texts.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=120,    # max characters per chunk
    chunk_overlap=30   # characters shared by adjacent chunks
)
chunks = splitter.split_documents(docs)

# -------------------------
# 3. Embeddings + Vector DB
# -------------------------
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(chunks, embeddings)  # embeds the chunks and builds an in-memory index

# -------------------------
# 4. Retriever
# -------------------------
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})  # return the 2 most similar chunks

# -------------------------
# 5. Prompt + LLM
# -------------------------
prompt = ChatPromptTemplate.from_template(
    """Answer the question using ONLY the context below.

Context:
{context}

Question:
{question}
"""
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# -------------------------
# 6. RAG Pipeline (LCEL)
# -------------------------
rag_chain = (
    {
        "context": retriever,      # list of retrieved Document objects
        "question": lambda x: x    # forwards the input question unchanged
    }
    | prompt
    | llm
    | StrOutputParser()
)

# -------------------------
# 7. Ask a question
# -------------------------
response = rag_chain.invoke("Why do we use text splitters in RAG?")
print(response)
```


```text
Text splitters are used in RAG to break large documents into chunks, which helps in retrieving relevant documents before answering.
```


```python
# RAG with ChromaDB, storing metadata with each chunk in the vector store

import os
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma  # newer LangChain releases ship this as langchain_chroma.Chroma
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# -------------------------
# 0) API key (loaded by load_dotenv() in the previous cell)
# -------------------------
assert os.getenv("OPENAI_API_KEY"), "Set OPENAI_API_KEY"

# -------------------------
# 1) Loader step (simulated) with METADATA
# -------------------------
docs = [
    Document(
        page_content="RAG retrieves relevant documents before answering using an LLM.",
        metadata={"source": "notes", "topic": "rag", "level": "beginner", "doc_id": "D1"}
    ),
    Document(
        page_content="Chroma is a vector database that can run locally and persist embeddings.",
        metadata={"source": "notes", "topic": "vector_db", "product": "chroma", "doc_id": "D2"}
    ),
    Document(
        page_content="Chunk overlap helps preserve meaning across chunk boundaries in long documents.",
        metadata={"source": "lecture", "topic": "chunking", "level": "beginner", "doc_id": "D3"}
    ),
]

# -------------------------
# 2) Splitter (each chunk inherits its source document's metadata)
# -------------------------
splitter = RecursiveCharacterTextSplitter(chunk_size=140, chunk_overlap=30)
chunks = splitter.split_documents(docs)

# OPTIONAL: add a chunk index to the metadata (useful for tracing answers back to chunks)
for i, d in enumerate(chunks):
    d.metadata["chunk_id"] = f"C{i+1}"

# -------------------------
# 3) Embeddings
# -------------------------
embeddings = OpenAIEmbeddings()

# -------------------------
# 4) Vector DB: Chroma (with persistence)
# -------------------------
persist_dir = "./chroma_store"  # folder created locally
collection_name = "rag_demo"

# Since Chroma 0.4, passing persist_directory is enough: data is written to disk
# automatically. The old vectorstore.persist() call is deprecated and now only
# emits a warning, so it is omitted here.
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name=collection_name,
    persist_directory=persist_dir
)

# After a kernel restart, reopen the stored collection instead of re-embedding:
# vectorstore = Chroma(
#     collection_name=collection_name,
#     embedding_function=embeddings,
#     persist_directory=persist_dir,
# )

# -------------------------
# 5) Retriever (basic)
# -------------------------
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# -------------------------
# 6) Prompt + LLM
# -------------------------
prompt = ChatPromptTemplate.from_template(
    """Answer using ONLY the context below.

Context:
{context}

Question:
{question}
"""
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# -------------------------
# 7) RAG chain (LCEL)
# -------------------------
rag_chain = (
    {"context": retriever, "question": lambda x: x}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is Chroma used for in RAG?"))
```


```text
Chroma is a vector database that can run locally and persist embeddings, which can be used in RAG to store and retrieve relevant documents.
```
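When no IDs are given, the `langchain_community` Chroma wrapper assigns random UUIDs. Re-running this cell against the same `persist_directory` can therefore store every chunk again, and later searches may return duplicates. One option is to derive deterministic IDs from metadata the chunks already carry (`doc_id`, a per-document chunk counter, and a short content hash) and pass them through the `ids` argument of `Chroma.from_documents`.

A few caveats apply to this sketch:
- The helper name `stable_chunk_ids` is hypothetical.
- Plain `(text, metadata)` tuples stand in for LangChain `Document`s, so the snippet runs on the standard library alone.
- Whether a re-run then overwrites or skips existing entries depends on your LangChain and Chroma versions (recent wrappers write with upsert), so verify on your setup.

```python
import hashlib
from collections import defaultdict


def stable_chunk_ids(chunks):
    """Return one deterministic ID per (text, metadata) pair.

    Format: "<doc_id>-<chunk index within that doc>-<first 8 hex chars of SHA-1(text)>".
    Identical inputs always produce identical IDs, so a re-run targets the
    same entries instead of adding new random-ID copies.
    """
    per_doc_counter = defaultdict(int)
    ids = []
    for text, metadata in chunks:
        doc_id = metadata["doc_id"]
        index = per_doc_counter[doc_id]
        per_doc_counter[doc_id] += 1
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]
        ids.append(f"{doc_id}-{index}-{digest}")
    return ids


# Stand-ins for the notebook's chunks: (page_content, metadata)
sample = [
    ("RAG retrieves relevant documents before answering using an LLM.", {"doc_id": "D1"}),
    ("Chroma is a vector database that can run locally.", {"doc_id": "D2"}),
    ("It can also persist embeddings to disk.", {"doc_id": "D2"}),
]
print(stable_chunk_ids(sample))

# Assumed notebook usage (not run here):
# ids = stable_chunk_ids([(c.page_content, c.metadata) for c in chunks])
# vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, ids=ids,
#                                     collection_name=collection_name,
#                                     persist_directory=persist_dir)
```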


```python
# Example: retrieve only chunks whose metadata has topic="rag"
# (reuses vectorstore, prompt, and llm from the previous cell)

filtered_retriever = vectorstore.as_retriever(
    search_kwargs={
        "k": 2,
        "filter": {"topic": "rag"}   # metadata equality filter
    }
)

filtered_rag_chain = (
    {"context": filtered_retriever, "question": lambda x: x}
    | prompt
    | llm
    | StrOutputParser()
)

print(filtered_rag_chain.invoke("Explain RAG in one line."))
```


```text
RAG retrieves relevant documents before answering using an LLM.
```
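Conceptually, a metadata filter restricts the candidate set before similarity ranking. The `k` results are drawn only from matching chunks, and fewer than `k` come back if fewer match. In the example above, only one chunk (D1) carries `topic="rag"`, so the context holds a single chunk even though `k=2`.

The toy sketch below mimics simple equality filters such as `{"topic": "rag"}` in plain Python; it is not Chroma's implementation. It ANDs together all keys in one dict for convenience. Recent Chroma versions instead expect multiple conditions to be combined explicitly, for example `{"$and": [{"topic": "rag"}, {"level": "beginner"}]}`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_search(items, query, k, where=None):
    """items: list of (vector, text, metadata).

    where: {key: value} equality filter; in this toy, every pair must match (AND).
    Filtering happens first, then only the survivors are ranked by similarity.
    """
    where = where or {}
    candidates = [
        (vec, text)
        for vec, text, meta in items
        if all(meta.get(key) == value for key, value in where.items())
    ]
    ranked = sorted(candidates, key=lambda item: cosine(query, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


# Hypothetical 2-d vectors with metadata shaped like the notebook's chunks
items = [
    ([1.0, 0.0], "RAG retrieves relevant documents", {"topic": "rag", "source": "notes"}),
    ([0.9, 0.1], "Chroma persists embeddings", {"topic": "vector_db", "source": "notes"}),
    ([0.0, 1.0], "Overlap preserves meaning", {"topic": "chunking", "source": "lecture"}),
]

query = [1.0, 0.0]
print(filtered_search(items, query, k=2))
# → ['RAG retrieves relevant documents', 'Chroma persists embeddings']
print(filtered_search(items, query, k=2, where={"topic": "rag"}))
# → ['RAG retrieves relevant documents']
print(filtered_search(items, query, k=2, where={"source": "lecture"}))
# → ['Overlap preserves meaning']
```

The last call shows that the filter takes precedence over similarity. The lecture chunk is returned even though it is the least similar to the query, because it is the only chunk that matches.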


```python
# Example: retrieve only chunks whose metadata has source="lecture"

lecture_retriever = vectorstore.as_retriever(
    search_kwargs={"k": 2, "filter": {"source": "lecture"}}
)

lecture_chain = (
    {"context": lecture_retriever, "question": lambda x: x}
    | prompt
    | llm
    | StrOutputParser()
)

print(lecture_chain.invoke("Why do we use overlap?"))
```


```text
We use overlap to help preserve meaning across chunk boundaries in long documents.
```
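The sketch below shows what `chunk_size` and `chunk_overlap` control. It splits text into fixed-size character windows, and each window repeats the last `chunk_overlap` characters of the previous one. A sentence cut at one boundary therefore still appears whole in at least one neighbor when it is shorter than the overlap. This is a deliberately simplified sliding window, not `RecursiveCharacterTextSplitter`'s algorithm. LangChain's splitter first tries to break on separators (paragraphs, newlines, spaces), so its chunks usually end on natural boundaries. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "abcdefghijklmnopqrst"  # 20 characters
for chunk in sliding_window_chunks(text, chunk_size=8, chunk_overlap=3):
    print(chunk)
# abcdefgh
# fghijklm
# klmnopqr
# pqrst
```

Each chunk starts with the last three characters of the one before it ("fgh", "klm", "pqr"). Larger overlaps preserve more context across boundaries, at the cost of storing and embedding more repeated text.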
