# LangChain RAG with ChromaDB - Interactive Tutorial

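This notebook walks through the `ChromaDBRAG` class from `rag_chromadb` interactively: loading documents, building a ChromaDB vector store, and answering questions over it with an LLM.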

## Setup

First, import the required modules and load environment variables from a `.env` file. The OpenAI API key is read from `OPENAI_API_KEY`.

In [None]:
import os
from dotenv import load_dotenv
from rag_chromadb import ChromaDBRAG

# Load environment variables
load_dotenv()

# Check if API key is set
if not os.getenv("OPENAI_API_KEY"):
    print("⚠️  Warning: OPENAI_API_KEY not found. Please set it in your .env file.")
else:
    print("✅ OpenAI API key found")

## Step 1: Initialize the RAG System

In [None]:
# Initialize ChromaDB RAG
rag = ChromaDBRAG(
    persist_directory="./notebook_chroma_db",
    collection_name="notebook_collection"
)
print("RAG system initialized!")

## Step 2: Load and Process Documents

In [None]:
# Load documents from sample_documents directory
documents = rag.load_documents("./sample_documents", glob_pattern="**/*.txt")

In [None]:
# Split documents into chunks
chunks = rag.split_documents(documents, chunk_size=500, chunk_overlap=50)
print(f"Total chunks: {len(chunks)}")
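
To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal sketch of fixed-size character chunking with overlap. It is an illustration only: `sliding_chunks` is a hypothetical helper, and the splitter that `ChromaDBRAG.split_documents` actually uses is not shown in this notebook and may behave differently (for example, by preferring paragraph or sentence boundaries).

```python
def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap.

    Hypothetical helper for illustration; not part of rag_chromadb.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Toy input: 1200 characters of repeating letters.
demo = sliding_chunks("abcdefghij" * 120, chunk_size=500, chunk_overlap=50)
print(len(demo), [len(c) for c in demo])
```

Each chunk repeats the last `chunk_overlap` characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.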

## Step 3: Create Vector Store

In [None]:
# Create and persist the vector store
rag.create_vectorstore(chunks)

## Step 4: Similarity Search

Before involving the LLM, query the vector store directly to see which chunks it retrieves for a given question.

In [None]:
# Perform similarity search
query = "What is machine learning?"
results = rag.similarity_search(query, k=3)

print(f"Query: {query}\n")
for i, doc in enumerate(results, 1):
    print(f"--- Result {i} ---")
    print(doc.page_content[:300] + "...\n")
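
Conceptually, a similarity search embeds the query and returns the `k` stored chunks whose vectors are closest to it. The sketch below illustrates this with hand-written 3-dimensional toy vectors and cosine similarity. `cosine_similarity`, `top_k`, and the vectors are hypothetical; the embedding model and distance metric that `ChromaDBRAG` actually uses are not shown in this notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=3):
    """Return the k (text, score) pairs most similar to query_vec, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "vector store": (chunk text, made-up embedding). Real embeddings have
# hundreds or thousands of dimensions and come from an embedding model.
toy_store = [
    ("Machine learning learns patterns from data.", [0.9, 0.1, 0.0]),
    ("Neural networks are layered function approximators.", [0.6, 0.7, 0.1]),
    ("Bananas are rich in potassium.", [0.0, 0.1, 0.9]),
]
toy_query = [1.0, 0.2, 0.0]  # made-up embedding for "What is machine learning?"

toy_results = top_k(toy_query, toy_store, k=2)
for text, score in toy_results:
    print(f"{score:.3f}  {text}")
```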

## Step 5: Set Up Question Answering

In [None]:
# Set up the QA chain
rag.setup_qa_chain(llm_model="gpt-3.5-turbo", temperature=0)

## Step 6: Ask Questions

In [None]:
# Ask a question
question = "What is artificial intelligence?"
response = rag.query(question)

print(f"Question: {question}\n")
print(f"Answer: {response['result']}\n")
print(f"Sources: {len(response['source_documents'])} documents")

In [None]:
# Try more questions
questions = [
    "What are the types of machine learning?",
    "What is deep learning?",
    "What are neural networks used for?"
]

for q in questions:
    print(f"\n{'='*50}")
    print(f"Q: {q}")
    response = rag.query(q)
    print(f"A: {response['result']}")

## Step 7: View Source Documents

Let's see which document chunks were retrieved and used to generate the answer.

In [None]:
question = "Explain supervised learning"
response = rag.query(question)

print(f"Question: {question}\n")
print(f"Answer: {response['result']}\n")
print("\nSource Documents:")
for i, doc in enumerate(response['source_documents'], 1):
    print(f"\n--- Source {i} ---")
    print(doc.page_content[:200] + "...")
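
When inspecting sources, it often helps to see where each chunk came from. The sketch below assumes each source document exposes `page_content` and a `metadata` dictionary, as LangChain `Document` objects do, and that the loader stored the file path under the `source` key (common for LangChain file loaders, but an assumption here). `summarize_sources` is a hypothetical helper, and stand-in objects are used so the cell runs without an API call.

```python
from types import SimpleNamespace


def summarize_sources(docs, preview_chars=200):
    """Return one line per document: index, origin, and a whitespace-normalized preview."""
    lines = []
    for i, doc in enumerate(docs, 1):
        metadata = getattr(doc, "metadata", None) or {}
        origin = metadata.get("source", "unknown")  # "source" key is an assumption
        text = " ".join(doc.page_content.split())   # collapse newlines and repeated spaces
        # Only add an ellipsis when the text was actually truncated.
        preview = text if len(text) <= preview_chars else text[:preview_chars] + "..."
        lines.append(f"[{i}] {origin}: {preview}")
    return lines


# Stand-ins for retrieved documents; the file name is made up for the example.
fake_docs = [
    SimpleNamespace(
        page_content="Supervised learning uses\nlabeled examples.",
        metadata={"source": "sample_documents/ml.txt"},
    ),
    SimpleNamespace(page_content="x" * 300, metadata={}),
]

source_lines = summarize_sources(fake_docs)
for line in source_lines:
    print(line)
```

With a real response, the same helper could be called as `summarize_sources(response['source_documents'])`.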

## Step 8: Using Custom Prompts

In [None]:
custom_prompt = """You are a helpful AI teacher. Use the following context to answer the question.
If you don't know the answer, say so clearly. Provide a simple explanation suitable for beginners.

Context: {context}

Question: {question}

Answer:"""

question = "What is a neural network?"
response = rag.query_with_custom_prompt(question, custom_prompt)

print(f"Question: {question}\n")
print(f"Answer: {response['result']}")
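
The `{context}` and `{question}` placeholders are filled in before the prompt reaches the model. How `query_with_custom_prompt` does this internally is not shown in this notebook; the sketch below uses plain `str.format` and a hypothetical `build_prompt` helper to illustrate the idea, joining retrieved chunks into a single context string.

```python
TEMPLATE = """Use the following context to answer the question.

Context: {context}

Question: {question}

Answer:"""


def build_prompt(template, chunks, question):
    """Fill a template's {context} and {question} placeholders.

    Hypothetical helper for illustration; not part of rag_chromadb.
    """
    for placeholder in ("{context}", "{question}"):
        if placeholder not in template:
            raise ValueError(f"template is missing {placeholder}")
    # Separate retrieved chunks with blank lines so they stay distinguishable.
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    return template.format(context=context, question=question)


# Made-up chunks standing in for retrieved text.
example_prompt = build_prompt(
    TEMPLATE,
    ["A neural network is a stack of layers.", "Each layer transforms its input."],
    "What is a neural network?",
)
print(example_prompt)
```

A template that omits either placeholder raises a `ValueError` here, which surfaces typos such as `{contex}` before any model call is made.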

## Step 9: Loading Existing Vector Store

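If you've already created a vector store, you can load it from disk instead of rebuilding it. Point the new instance at the same `persist_directory` that was used when the store was created.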

In [None]:
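# Create a new RAG instance that points at the store persisted in Step 3,
# using the same directory and collection name as in Step 1
rag_reload = ChromaDBRAG(
    persist_directory="./notebook_chroma_db",
    collection_name="notebook_collection"
)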

# Load the existing vector store
rag_reload.load_vectorstore()

# Set up QA chain
rag_reload.setup_qa_chain()

# Query immediately
response = rag_reload.query("What are the applications of AI?")
print(response['result'])
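
Reloading only makes sense if the persist directory already holds data. The sketch below shows one way to guard for that with the standard library. `has_persisted_store` is a hypothetical helper, and it only checks that the directory exists and is non-empty, not that its contents are a valid Chroma store.

```python
import tempfile
from pathlib import Path


def has_persisted_store(persist_directory):
    """Return True if persist_directory exists and contains at least one entry."""
    path = Path(persist_directory)
    return path.is_dir() and any(path.iterdir())


# Demonstrate on a temporary directory so nothing in the project is touched.
with tempfile.TemporaryDirectory() as tmp:
    empty_result = has_persisted_store(tmp)                     # directory exists but is empty
    (Path(tmp) / "placeholder.bin").write_bytes(b"\x00")        # simulate persisted data
    filled_result = has_persisted_store(tmp)
missing_result = has_persisted_store(Path(tmp) / "does_not_exist")

print(empty_result, filled_result, missing_result)
```

In this notebook, such a check could decide between calling `rag_reload.load_vectorstore()` and rebuilding the store with `rag.create_vectorstore(chunks)`.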

## Experiment on Your Own!

Try asking your own questions or loading your own documents.

In [None]:
# Your experiments here
my_question = "Your question here"
response = rag.query(my_question)
print(response['result'])