# Retrieval Augmented Generation (RAG) with Gemini

This notebook demonstrates how to build a simple RAG system using:
- **FAISS** for fast vector similarity search
- **TF-IDF** for document vectorization
- **Google Gemini** for generating answers based on retrieved context

**Free Tier:** No credit card required, just a Google account.

## Step 1: Install Dependencies

In [None]:
%pip install google-generativeai python-dotenv faiss-cpu scikit-learn

## Step 2: Set Up the Knowledge Base and Retriever

In [None]:
import faiss
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Sample documents (small knowledge base)
documents = [
    "Python is a programming language known for its simplicity and readability.",
    "The capital of France is Paris, known for the Eiffel Tower.",
    "Gemini is Google's most capable AI model, available through Google AI Studio.",
    "The Great Wall of China is a historic landmark spanning over 13,000 miles.",
    "Machine learning is a subset of artificial intelligence that learns from data.",
    "DocuSign is a company that provides electronic signature technology.",
    "RAG stands for Retrieval Augmented Generation, a technique to enhance LLM responses."
]

# Convert documents into TF-IDF vectors
vectorizer = TfidfVectorizer()
document_vectors = vectorizer.fit_transform(documents).toarray()

# Index the document vectors using FAISS (IndexFlatL2 is an exact, brute-force L2 index)
index = faiss.IndexFlatL2(document_vectors.shape[1])
index.add(document_vectors.astype(np.float32))

print(f"Indexed {len(documents)} documents")
print(f"Vector dimension: {document_vectors.shape[1]}")

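A note on the choice of index: `TfidfVectorizer` L2-normalizes each row by default, and `IndexFlatL2` reports squared Euclidean distances. For unit vectors, squared distance and cosine similarity are tied by d² = 2 − 2·cos, so ranking by smallest L2 distance gives the same order as ranking by highest cosine similarity. The sketch below checks that identity on two hand-made vectors; the helper names and the vectors are illustrative and not part of the notebook.

```python
import math

def normalize(v):
    """Scale a vector to unit length (what norm='l2' does per row)."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_unit(a, b):
    """Cosine similarity; a plain dot product because both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))

# Two made-up, non-negative vectors standing in for TF-IDF rows
a = normalize([1.0, 2.0, 0.0])
b = normalize([2.0, 1.0, 1.0])

d2 = squared_l2(a, b)
cos = cosine_unit(a, b)
print(f"squared L2: {d2:.6f}   2 - 2*cos: {2 - 2 * cos:.6f}")
```
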
In [None]:
def retrieve(query, top_n=1):
    """Retrieve the most relevant documents for a query."""
    query_vector = vectorizer.transform([query]).toarray()
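    distances, indices = index.search(query_vector.astype(np.float32), top_n)
    # FAISS pads results with -1 when top_n exceeds the number of indexed vectors;
    # skip those so documents[-1] is not returned by accident
    return [documents[i] for i in indices[0] if i != -1]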

# Test retrieval
test_query = "What is the capital of France?"
retrieved = retrieve(test_query)
print(f"Query: {test_query}")
print(f"Retrieved: {retrieved}")

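`retrieve` always returns its nearest neighbors, even when the nearest one is a poor match. One possible refinement is to drop hits whose distance exceeds a cutoff. The sketch below works on plain sequences shaped like one row of the FAISS output; `filter_hits` and the `max_distance` value of 0.9 are hypothetical choices for illustration, and a useful cutoff would need tuning on real queries.

```python
def filter_hits(distances, indices, documents, max_distance=0.9):
    """Keep (document, distance) pairs that are real hits and close enough.

    distances, indices: one row of search results (e.g. distances[0], indices[0]).
    max_distance: assumed cutoff on squared L2 distance, not a recommended value.
    """
    hits = []
    for dist, idx in zip(distances, indices):
        if idx < 0:
            # FAISS uses -1 to mark "no result" slots
            continue
        if dist > max_distance:
            # Too far from the query to count as relevant
            continue
        hits.append((documents[idx], dist))
    return hits

# Toy data standing in for a search over three documents
docs = ["doc A", "doc B", "doc C"]
hits = filter_hits([0.4, 1.3, 9.9], [2, 0, -1], docs, max_distance=0.9)
print(hits)
```

With the notebook's objects, the equivalent call would be `filter_hits(distances[0], indices[0], documents)` inside `retrieve`.
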
## Step 3: Set Up Gemini Client

In [None]:
import os
from dotenv import load_dotenv
import google.generativeai as genai

# Load environment variables
load_dotenv()

# Configure the Gemini API
api_key = os.getenv("GOOGLE_API_KEY")
genai.configure(api_key=api_key)

# Create model with system instruction for RAG
model = genai.GenerativeModel(
    "gemini-2.0-flash",
    system_instruction=(
        "You are an AI assistant that answers questions based only on the "
        "provided context. If the context doesn't contain enough information "
        "to answer, say so."
    ),
)

print(f"API key configured: {'Yes' if api_key else 'No - please set GOOGLE_API_KEY'}")

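The cell above only prints a warning when the key is missing, so the first failure would surface later, at the first API call. A stricter alternative is to fail fast. The helper below is a minimal sketch; `require_api_key` is a hypothetical name, and the optional `env` argument exists only so the function can be exercised without touching the real environment.

```python
import os

def require_api_key(name="GOOGLE_API_KEY", env=None):
    """Return the API key from the environment, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to your environment or .env file"
        )
    return key

# Demonstration with an explicit mapping instead of the real environment
demo_key = require_api_key(env={"GOOGLE_API_KEY": "example-key"})
print(f"Key loaded, {len(demo_key)} characters")
```

In the notebook this would be called as `api_key = require_api_key()` after `load_dotenv()`.
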
## Step 4: Create the RAG Pipeline

In [None]:
def rag_pipeline(query, top_n=1):
    """
    Run the full RAG pipeline: retrieve context and generate answer.
    """
    # Step 1: Retrieve relevant documents
    retrieved_docs = retrieve(query, top_n=top_n)
    context = " ".join(retrieved_docs)
    print(f"Retrieved Document(s): {context}")
    print("-" * 50)
    
    # Step 2: Create the prompt with context
    user_prompt = f"""Based on the following context, answer the question.

Context: {context}

Question: {query}

Answer:"""
    
    # Step 3: Generate answer using Gemini
    generation_config = genai.GenerationConfig(
        temperature=0.3,
        max_output_tokens=200
    )
    
    response = model.generate_content(
        user_prompt,
        generation_config=generation_config
    )
    
    return response.text

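When `top_n` is greater than 1, joining the retrieved documents with a single space blurs where one ends and the next begins. One option is to number each document on its own line. The sketch below keeps the notebook's prompt wording; `build_prompt` and the `[1]`, `[2]` labels are illustrative assumptions, not part of the original pipeline.

```python
def build_prompt(query, docs):
    """Format retrieved documents as a numbered context block followed by the question."""
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, start=1))
    return (
        "Based on the following context, answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\n"
        "Answer:"
    )

prompt = build_prompt(
    "What is RAG?",
    [
        "RAG stands for Retrieval Augmented Generation, a technique to enhance LLM responses.",
        "Machine learning is a subset of artificial intelligence that learns from data.",
    ],
)
print(prompt)
```
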
## Step 5: Test the RAG System

In [None]:
# Test 1: Question with relevant context
query1 = "What is the capital of France?"
print(f"Question: {query1}\n")
response1 = rag_pipeline(query1)
print(f"\nAnswer: {response1}")

In [None]:
# Test 2: Question about technology
query2 = "What is Python?"
print(f"Question: {query2}\n")
response2 = rag_pipeline(query2)
print(f"\nAnswer: {response2}")

In [None]:
# Test 3: Question about Gemini
query3 = "What is Gemini?"
print(f"Question: {query3}\n")
response3 = rag_pipeline(query3)
print(f"\nAnswer: {response3}")

In [None]:
# Test 4: Question NOT in knowledge base
query4 = "What is the population of Tokyo?"
print(f"Question: {query4}\n")
response4 = rag_pipeline(query4)
print(f"\nAnswer: {response4}")

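Test 4 still retrieves a document even though nothing in the knowledge base mentions Tokyo. A likely reason is that the vectorizer is created without stop-word removal, so function words such as "is", "the", and "of" are part of the vocabulary and can match. The sketch below lists the tokens the query shares with three of the documents, using the same two-or-more-character token pattern that `TfidfVectorizer` applies by default. It does not predict which document FAISS ranks first.

```python
import re

# Tokens of two or more word characters, lowercased
TOKEN = re.compile(r"\b\w\w+\b")

def tokens(text):
    """Return the set of lowercase tokens in a text."""
    return set(TOKEN.findall(text.lower()))

sample_docs = [
    "The capital of France is Paris, known for the Eiffel Tower.",
    "The Great Wall of China is a historic landmark spanning over 13,000 miles.",
    "RAG stands for Retrieval Augmented Generation, a technique to enhance LLM responses.",
]
query = "What is the population of Tokyo?"

# For each document, the tokens it has in common with the query
shared = [sorted(tokens(query) & tokens(doc)) for doc in sample_docs]
for doc, overlap in zip(sample_docs, shared):
    print(f"{overlap} <- {doc[:40]}...")
```

Only function words overlap, which is why the system instruction, rather than retrieval, has to handle the "not enough information" case here.
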
## Step 6: Compare RAG vs Direct Query

In [None]:
def direct_query(query):
    """Ask Gemini directly without RAG context."""
    general_model = genai.GenerativeModel("gemini-2.0-flash")
    response = general_model.generate_content(query)
    return response.text

# Compare responses
test_question = "What does DocuSign do?"

print("=" * 50)
print("DIRECT QUERY (no RAG)")
print("=" * 50)
print(direct_query(test_question))

print("\n" + "=" * 50)
print("RAG QUERY (with retrieved context)")
print("=" * 50)
print(rag_pipeline(test_question))

## Summary

| Component | Purpose |
|-----------|----------|
| **TF-IDF** | Converts text to numerical vectors |
| **FAISS** | Fast similarity search for retrieval |
| **Gemini** | Generates natural language answers |
| **System Instruction** | Constrains answers to provided context |

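To make the TF-IDF row concrete, here is a hand-rolled approximation of what `TfidfVectorizer` computes with its default settings: raw term counts, a smoothed inverse document frequency of ln((1 + n) / (1 + df)) + 1, and L2 normalization per row. It is a sketch for intuition only; the function name `tfidf` is illustrative, and the notebook itself relies on scikit-learn.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"\b\w\w+\b")  # tokens of two or more word characters

def tfidf(docs):
    """Return (vocabulary, rows) where each row is an L2-normalized TF-IDF vector."""
    tokenized = [TOKEN.findall(doc.lower()) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: in how many documents each token appears
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return vocab, rows

vocab, rows = tfidf([
    "The capital of France is Paris, known for the Eiffel Tower.",
    "The Great Wall of China is a historic landmark spanning over 13,000 miles.",
])
paris_weight = rows[0][vocab.index("paris")]
is_weight = rows[0][vocab.index("is")]
print(f"paris: {paris_weight:.3f}   is: {is_weight:.3f}")
```

In the first document, "paris" receives a higher weight than "is" because "is" appears in both documents and therefore has a lower inverse document frequency.
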
### API Comparison

| Feature | OpenAI | Claude | Gemini |
|---------|--------|--------|--------|
| Generate | `chat.completions.create()` | `messages.create()` | `generate_content()` |
| System msg | In messages | `system` param | `system_instruction` |
| Free tier | No | No | **Yes!** |