# Phase 3: Building the RAG System

In this notebook, you'll learn how to:
1. Store embeddings in a vector database (ChromaDB)
2. Retrieve relevant documents based on a query
3. Build a complete RAG chain that answers questions using your documents

In [None]:
# Add the project root to the path so the `src` package can be imported
import sys
sys.path.insert(0, '../..')

## 1. Creating a Vector Store

ChromaDB is a lightweight vector database that runs locally. It stores embeddings alongside the original text and supports fast similarity search over them.
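To make the idea of "store embeddings, then search them" concrete, here is a minimal, dependency-free sketch. `ToyVectorStore` and its hand-written 2-dimensional vectors are hypothetical and are not ChromaDB's API. It scans every stored vector, whereas real vector databases typically use an approximate nearest-neighbor index.

```python
class ToyVectorStore:
    """Hypothetical in-memory store: keeps (vector, text) pairs and searches by distance."""

    def __init__(self):
        self._vectors = []
        self._texts = []

    def add(self, vector, text):
        # Store the embedding and the text it came from side by side.
        self._vectors.append(list(vector))
        self._texts.append(text)

    def search(self, query_vector, k=2):
        # Squared Euclidean distance to every stored vector: smaller means closer.
        scored = [
            (sum((a - b) ** 2 for a, b in zip(vec, query_vector)), text)
            for vec, text in zip(self._vectors, self._texts)
        ]
        scored.sort(key=lambda pair: pair[0])
        return scored[:k]


store = ToyVectorStore()
store.add([1.0, 0.0], "supervised learning")
store.add([0.9, 0.1], "classification")
store.add([0.0, 1.0], "cooking recipes")

# Each result is a (distance, text) pair, closest first.
nearest = store.search([1.0, 0.0], k=2)
print(nearest)
```

The unrelated text ("cooking recipes") sits far from the query vector and is not returned when `k=2`.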

In [None]:
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings
from src.document_loader import load_and_split

# Load and split documents
chunks = load_and_split()

# Initialize embeddings
embeddings = OllamaEmbeddings(model="nomic-embed-text")

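# Create vector store from documents
# This embeds every chunk and stores the vectors alongside the text
# No persist_directory is given, so the collection is held in memory only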
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name="learning_demo"
)

print(f"Created vector store with {len(chunks)} chunks")

## 2. Similarity Search

The vector store embeds the query with the same embedding model, then returns the `k` chunks whose embedding vectors are closest to the query vector.
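As a rough illustration of "comparing embedding vectors", the sketch below ranks three sentences against a query using cosine similarity. The 3-dimensional vectors are made up for this example; a real embedding model produces vectors with hundreds of dimensions, and the metric a given vector store uses may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical hand-written "embeddings" (assumption: not produced by any model).
toy_embeddings = {
    "Supervised learning uses labeled data.": [0.9, 0.1, 0.0],
    "Reinforcement learning uses rewards.": [0.2, 0.9, 0.1],
    "Paris is the capital of France.": [0.0, 0.1, 0.9],
}
query_vector = [1.0, 0.2, 0.0]  # stand-in for the embedded query

# Rank texts from most to least similar to the query.
ranked = sorted(
    toy_embeddings,
    key=lambda text: cosine_similarity(query_vector, toy_embeddings[text]),
    reverse=True,
)

for text in ranked:
    print(f"{cosine_similarity(query_vector, toy_embeddings[text]):.3f}  {text}")
```

Note the direction of the score: with a similarity, higher means closer; with a distance, lower means closer.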

In [None]:
# Search for documents similar to a query
query = "What is supervised learning?"
results = vectorstore.similarity_search(query, k=3)

print(f"Query: {query}")
print(f"\nFound {len(results)} relevant chunks:\n")

for i, doc in enumerate(results):
    print(f"--- Result {i+1} ---")
    print(doc.page_content[:300])
    print()

In [None]:
# Search with scores to see how close each match is to the query
results_with_scores = vectorstore.similarity_search_with_score(query, k=3)

print("Results with distance scores:")
print("(ChromaDB returns a distance: lower score = more similar)\n")

for doc, score in results_with_scores:
    print(f"Score: {score:.4f}")
    print(f"Content: {doc.page_content[:100]}...\n")

## 3. Creating a Retriever

A retriever wraps the vector store in LangChain's standard runnable interface (`invoke`), so it can be plugged directly into a chain.
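The wrapper idea can be sketched without LangChain. `ToyStore` and `ToyRetriever` are hypothetical names, and the word-overlap scoring is only a stand-in for embedding search. The point is that the retriever fixes the search settings once and exposes a single `invoke(query)` method.

```python
class ToyStore:
    """Hypothetical store that ranks texts by word overlap with the query."""

    def __init__(self, texts):
        self.texts = texts

    def similarity_search(self, query, k=4):
        query_words = set(query.lower().split())
        # More shared words ranks higher; ties keep their original order.
        ranked = sorted(
            self.texts,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class ToyRetriever:
    """Hypothetical wrapper: stores search settings, exposes only invoke()."""

    def __init__(self, store, search_kwargs=None):
        self.store = store
        self.search_kwargs = search_kwargs or {}

    def invoke(self, query):
        return self.store.similarity_search(query, **self.search_kwargs)


toy_store = ToyStore([
    "supervised learning uses labeled data",
    "unsupervised learning finds structure",
    "bread needs flour and yeast",
])
toy_retriever = ToyRetriever(toy_store, search_kwargs={"k": 2})

toy_docs = toy_retriever.invoke("types of machine learning")
print(toy_docs)
```

Because callers only ever see `invoke`, the same retriever can be swapped for one backed by a different store without changing the code that uses it.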

In [None]:
# Create a retriever from the vector store
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Use the retriever
docs = retriever.invoke("What are the types of machine learning?")

print(f"Retrieved {len(docs)} documents")
for doc in docs:
    print(f"- {doc.page_content[:80]}...")

## 4. Building the RAG Chain

Now we combine the retriever with an LLM to create a complete RAG system.
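Before involving a model, it can help to see what the LLM actually receives. The sketch below joins retrieved chunks into one context string and fills a prompt template with plain `str.format`. `Doc`, `TEMPLATE`, and `build_prompt` are hypothetical stand-ins, and the template is a shortened version of the kind of prompt a RAG chain uses.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document with a page_content field."""
    page_content: str


# Shortened illustrative template (assumption: not the notebook's exact prompt).
TEMPLATE = "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"


def format_docs(docs):
    # Separate chunks with a blank line so the model can tell them apart.
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(docs, question):
    return TEMPLATE.format(context=format_docs(docs), question=question)


retrieved = [
    Doc("Supervised learning uses labeled examples."),
    Doc("Spam filtering is a classic example."),
]
prompt_text = build_prompt(retrieved, "What is supervised learning?")
print(prompt_text)
```

The model never sees the vector store itself, only this final string, which is why the quality of the retrieved chunks bounds the quality of the answer.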

In [None]:
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

# Initialize the LLM
llm = ChatOllama(model="llama3.2:3b")

# Create the RAG prompt
rag_prompt = ChatPromptTemplate.from_template("""
You are a helpful assistant. Answer the question based ONLY on the following context.
If the context doesn't contain the answer, say "I don't have enough information."

Context:
{context}

Question: {question}

Answer:""")

# Helper function to format documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Build the RAG chain
rag_chain = (
    RunnableParallel(
        context=retriever | format_docs,
        question=RunnablePassthrough()
    )
    | rag_prompt
    | llm
    | StrOutputParser()
)

print("RAG chain created!")

In [None]:
# Test the RAG chain
question = "What is supervised learning and what are some examples?"

print(f"Question: {question}")
print("\n" + "="*50 + "\n")

answer = rag_chain.invoke(question)
print(f"Answer:\n{answer}")

In [None]:
# Try more questions
questions = [
    "What is reinforcement learning used for?",
    "What is overfitting?",
    "What is the capital of France?"  # Not in our documents; the prompt asks the model to say so
]

for q in questions:
    print(f"Q: {q}")
    print(f"A: {rag_chain.invoke(q)}")
    print("-" * 50)

## 5. Using Our Project Modules

Our `src/` modules wrap all this functionality for easy reuse.

In [None]:
from src.vectorstore import get_or_create_vectorstore
from src.rag_chain import create_rag_chain, query_rag

# Get or create the vector store (uses default persist location)
vs = get_or_create_vectorstore(chunks)

# Create RAG chain
chain = create_rag_chain(vs)

# Query using our helper function
answer = query_rag(chain, "What are the three types of machine learning?", verbose=True)

## Key Takeaways

1. **Vector Stores** (ChromaDB) store embeddings for fast similarity search
2. **Similarity Search** finds documents semantically similar to a query
3. **Retrievers** are chain-compatible wrappers around vector stores
4. **RAG Chain** = Retriever → Format Context → Prompt → LLM → Parser
5. The LLM's answers are **grounded** in the retrieved documents
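The arrow notation in takeaway 4 mirrors how the `|` operator chains steps. As a simplified sketch of that idea (LangChain's real runnables do considerably more), the hypothetical `Step` class below overloads `|` so that each step's output becomes the next step's input. The retriever and LLM are fake stand-ins that return fixed data.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `self | other` builds a new step that runs self, then other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Fake retriever: always returns the same single chunk.
retrieve = Step(lambda q: {
    "question": q,
    "docs": ["Overfitting means memorizing noise in training data."],
})
format_context = Step(lambda d: {
    "question": d["question"],
    "context": "\n\n".join(d["docs"]),
})
prompt = Step(lambda d: f"Context:\n{d['context']}\n\nQuestion: {d['question']}\n\nAnswer:")
# Fake LLM: returns a message-like dict instead of calling a model.
fake_llm = Step(lambda p: {"content": f"  (stand-in reply to a {len(p)}-character prompt)  "})
# Parser: extracts the text and trims whitespace.
parse = Step(lambda message: message["content"].strip())

toy_chain = retrieve | format_context | prompt | fake_llm | parse
result = toy_chain.invoke("What is overfitting?")
print(result)
```

Each stage can be tested by calling its own `invoke`, which is a convenient way to debug a chain one step at a time.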

## Next Steps

In the final phase, you'll add:
- A command-line interface (CLI) for easy interaction
- Conversation memory for follow-up questions
- Source citations to show which documents were used