# RAG (Retrieval-Augmented Generation) Demo

This notebook demonstrates how to use the RAG pipeline to:
1. Ingest documents and create vector embeddings
2. Query the index to retrieve relevant documents
3. Use an LLM to generate answers based on retrieved context

## Setup

In [None]:
import sys
from pathlib import Path

# Add project root to path (assumes the notebook runs from a subdirectory of the project root)
PROJECT_ROOT = Path.cwd().parent
sys.path.insert(0, str(PROJECT_ROOT))

print(f"Project root: {PROJECT_ROOT}")

In [None]:
# Install required packages if needed (requests is used by the Ollama and API cells)
# !pip install sentence-transformers faiss-cpu openai requests

## 1. Document Ingestion

Ingest documents from `data/docs/` and create a vector index.

In [None]:
from services.rag.ingest_vectors import VectorIngester

# Initialize ingester
ingester = VectorIngester(
    docs_dir=PROJECT_ROOT / "data" / "docs",
    index_dir=PROJECT_ROOT / "data" / "vector_index",
    model_name="all-MiniLM-L6-v2",
    chunk_size=512,
    chunk_overlap=50,
)

# Run ingestion
num_chunks = ingester.ingest()
print(f"\n✓ Ingested {num_chunks} document chunks")
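
The `chunk_size` and `chunk_overlap` settings above control how each document is split before embedding. The sketch below illustrates the general idea with a character-based sliding window. It is an assumption for illustration only: `chunk_text` is a hypothetical helper, and `VectorIngester` may split on tokens or sentences instead.

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# Toy document of 1000 characters
sample = "".join(chr(97 + i % 26) for i in range(1000))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
```

Consecutive chunks share `chunk_overlap` characters, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.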

## 2. Query the Index

Retrieve relevant documents for a query.

In [None]:
from services.rag.retriever import retrieve

# Test query
query = "What is machine learning?"
results = retrieve(query, top_k=3)

print(f"Query: {query}\n")
print("=" * 60)
for i, result in enumerate(results, 1):
    print(f"\n[{i}] Score: {result['score']:.4f}")
    print(f"    Source: {result['source']}")
    print(f"    Content: {result['content'][:200]}...")

In [None]:
# Another query
query = "How does RAG work?"
results = retrieve(query, top_k=3)

print(f"Query: {query}\n")
print("=" * 60)
for i, result in enumerate(results, 1):
    print(f"\n[{i}] Score: {result['score']:.4f}")
    print(f"    Source: {result['source']}")
    print(f"    Content: {result['content'][:200]}...")
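
To make the scores above easier to interpret, here is a minimal sketch of similarity-based ranking over embedding vectors. It assumes cosine similarity where higher means more similar; the actual `retrieve` function may use a different metric (for example, a FAISS distance where lower is better). `cosine_similarity` and `rank_top_k` are hypothetical names, and the vectors are toy values rather than real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 3) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings"
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranked = rank_top_k([1.0, 0.0], docs, k=2)
print(ranked)
```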

## 3. RAG with LLM

Combine retrieval with an LLM to generate grounded answers.

In [None]:
def build_rag_prompt(query: str, context_docs: list, max_context_length: int = 2000) -> str:
    """
    Build a RAG prompt with retrieved context.
    
    Args:
        query: User's question
        context_docs: Retrieved documents
        max_context_length: Maximum context characters
        
    Returns:
        Formatted prompt string
    """
    # Build context section
    context_parts = []
    total_length = 0
    
    for doc in context_docs:
        content = doc['content']
        if total_length + len(content) > max_context_length:
            content = content[:max_context_length - total_length]
        context_parts.append(f"[Source: {doc['source']}]\n{content}")
        total_length += len(content)
        if total_length >= max_context_length:
            break
    
    context = "\n\n---\n\n".join(context_parts)
    
    prompt = f"""You are a helpful assistant. Answer the user's question based on the provided context.
If the context doesn't contain relevant information, say so.

CONTEXT:
{context}

USER QUESTION: {query}

ANSWER:"""
    
    return prompt

### Option A: Using OpenAI API

In [None]:
import os

def rag_query_openai(query: str, top_k: int = 3) -> str:
    """
    RAG query using OpenAI API.
    
    Requires: OPENAI_API_KEY environment variable
    """
    try:
        from openai import OpenAI
    except ImportError:
        return "OpenAI package not installed. Run: pip install openai"
    
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        return "OPENAI_API_KEY not set. Set it with: export OPENAI_API_KEY=your_key"
    
    # Retrieve documents
    results = retrieve(query, top_k=top_k)
    
    # Build prompt
    prompt = build_rag_prompt(query, results)
    
    # Query LLM
    client = OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": prompt}
        ],
        max_tokens=500,
        temperature=0.7,
    )
    
    return response.choices[0].message.content

# Example (uncomment to run if you have an API key)
# answer = rag_query_openai("What are the types of machine learning?")
# print(answer)
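
The function above sends the whole prompt as a single user message. One possible variation, sketched here as an assumption rather than as part of the project, is to put the instructions in a `system` message and the context plus question in the `user` message. `build_chat_messages` is a hypothetical helper; the returned list has the shape accepted by the `messages` parameter of `client.chat.completions.create`.

```python
def build_chat_messages(query: str, context: str) -> list[dict]:
    """Split the RAG prompt into system and user chat messages (hypothetical helper)."""
    system_text = (
        "You are a helpful assistant. Answer the user's question based on the "
        "provided context. If the context doesn't contain relevant information, say so."
    )
    user_text = f"CONTEXT:\n{context}\n\nUSER QUESTION: {query}"
    return [
        {"role": "system", "content": system_text},
        {"role": "user", "content": user_text},
    ]


messages = build_chat_messages(
    "What is RAG?",
    "[Source: example.md]\nRAG combines retrieval with generation.",
)
for message in messages:
    print(message["role"], "->", message["content"][:60])
```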

### Option B: Using Local LLM (Ollama)

In [None]:
import requests

def rag_query_ollama(
    query: str, 
    top_k: int = 3, 
    model: str = "llama2",
    base_url: str = "http://localhost:11434"
) -> str:
    """
    RAG query using Ollama local LLM.
    
    Requires: Ollama running locally with a model pulled
    """
    # Retrieve documents
    results = retrieve(query, top_k=top_k)
    
    # Build prompt
    prompt = build_rag_prompt(query, results)
    
    try:
        response = requests.post(
            f"{base_url}/api/generate",
            json={
                "model": model,
                "prompt": prompt,
                "stream": False,
            },
            timeout=60,
        )
        response.raise_for_status()
        return response.json().get("response", "No response")
    except requests.exceptions.ConnectionError:
        return "Ollama not running. Start it with: ollama serve"
    except Exception as e:
        return f"Error: {e}"

# Example (uncomment to run if you have Ollama)
# answer = rag_query_ollama("What are the components of the enterprise AI system?")
# print(answer)

### Option C: Simple Template-Based Response (No LLM)

In [None]:
def rag_query_simple(query: str, top_k: int = 3) -> str:
    """
    Simple RAG query without LLM - just formats retrieved results.
    Useful for testing the retrieval pipeline.
    """
    results = retrieve(query, top_k=top_k)
    
    if not results:
        return f"No relevant documents found for: {query}"
    
    response = f"Based on the knowledge base, here's what I found about '{query}':\n\n"
    
    for i, doc in enumerate(results, 1):
        response += f"**Finding {i}** (Score: {doc['score']:.2f}, Source: {doc['source']})\n"
        response += f"{doc['content'][:300]}...\n\n"
    
    return response

# Test
answer = rag_query_simple("What is RAG and how does it work?")
print(answer)

## 4. Interactive RAG Demo

In [None]:
# Interactive query function
def ask(question: str, use_llm: str = "simple") -> None:
    """
    Interactive query interface.
    
    Args:
        question: Your question
        use_llm: "simple", "openai", or "ollama"
    """
    print(f"\n🔍 Question: {question}\n")
    print("=" * 60)
    
    if use_llm == "openai":
        answer = rag_query_openai(question)
    elif use_llm == "ollama":
        answer = rag_query_ollama(question)
    else:
        answer = rag_query_simple(question)
    
    print(f"\n💡 Answer:\n{answer}")

In [None]:
# Try some questions!
ask("What are the key algorithms in machine learning?")

In [None]:
ask("How does the feature store work in this system?")

In [None]:
ask("What are the benefits of using RAG?")

## 5. API Usage Example

Query the retriever API (requires running the server first).

In [None]:
# Start the API server in a terminal:
# uvicorn services.rag.retriever:app --port 8002 --reload

import requests

def query_api(query: str, top_k: int = 3, base_url: str = "http://localhost:8002"):
    """Query the retriever API."""
    try:
        response = requests.post(
            f"{base_url}/retrieve",
            json={"query": query, "top_k": top_k},
            timeout=10,
        )
        response.raise_for_status()
        return response.json()
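    except requests.exceptions.ConnectionError:
        return {"error": "API not running. Start it with: uvicorn services.rag.retriever:app --port 8002"}
    except requests.exceptions.RequestException as e:
        # Timeouts and non-2xx responses land here instead of raising
        return {"error": f"Request failed: {e}"}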

# Example (uncomment when API is running)
# result = query_api("What is supervised learning?")
# print(result)

## Summary

This notebook demonstrated:

1. **Document Ingestion**: Loading docs, chunking, and creating embeddings
2. **Vector Search**: Finding relevant documents using semantic similarity
3. **RAG Pipeline**: Combining retrieval with an LLM for grounded responses
4. **API Integration**: Using the FastAPI retriever endpoint

### Next Steps

- Add more documents to `data/docs/`
- Fine-tune chunk size and overlap for your content
- Experiment with different embedding models
- Connect to a production LLM for better responses
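
When tuning chunk size and overlap, it can help to estimate how many chunks a setting will produce before re-running ingestion. The sketch below assumes character-based sliding-window chunking, which may not match what `VectorIngester` does internally; `estimate_num_chunks` is a hypothetical helper.

```python
import math


def estimate_num_chunks(num_chars: int, chunk_size: int, chunk_overlap: int) -> int:
    """Estimate the chunk count for a sliding character window (hypothetical helper)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if num_chars <= 0:
        return 0
    step = chunk_size - chunk_overlap  # characters of new text per chunk
    return max(1, math.ceil((num_chars - chunk_overlap) / step))


# Compare a few chunk sizes for a 10,000-character document
for size in (256, 512, 1024):
    count = estimate_num_chunks(10_000, chunk_size=size, chunk_overlap=50)
    print(f"chunk_size={size}: about {count} chunks")
```

Smaller chunks give more, finer-grained index entries; larger chunks give fewer entries that each carry more context.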