## Step 1: Install Dependencies

In [None]:
# Install required packages
!pip install google-generativeai python-dotenv numpy -q

## Step 2: Import Libraries

In [None]:
import os
from dotenv import load_dotenv
import google.generativeai as genai
import numpy as np
from typing import Dict, List, Tuple

## Step 3: Configure API Key

In [None]:
# Option 1: Direct input
GOOGLE_API_KEY = "YOUR_API_KEY_HERE"

genai.configure(api_key=GOOGLE_API_KEY)
print("✅ API configured successfully")

In [None]:
# Option 2: Using Colab Secrets (uncomment to use)
# from google.colab import userdata
# GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
# genai.configure(api_key=GOOGLE_API_KEY)
# print("✅ API configured successfully using Colab Secrets")
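
Hard-coding a key in a notebook makes it easy to leak. A third option is to read the key from an environment variable. The sketch below assumes a variable named `GOOGLE_API_KEY` and a hypothetical helper `load_api_key`; neither is defined elsewhere in this notebook.

```python
import os

def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key stored in an environment variable (hypothetical helper)."""
    key = os.environ.get(env_var, "").strip()
    if not key or key == "YOUR_API_KEY_HERE":
        # Fail early with a clear message instead of at the first API call
        raise RuntimeError(f"Set the {env_var} environment variable before running this notebook")
    return key
```

With this in place, `genai.configure(api_key=load_api_key())` could replace the hard-coded assignment. Calling `load_dotenv()` first would populate the variable from a local `.env` file.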

## 📚 What is RAG?

**Retrieval-Augmented Generation (RAG)** is a technique that:
1. **Retrieves** relevant information from a knowledge base
2. **Augments** the AI prompt with that information
3. **Generates** a response based on the retrieved context

### Why RAG?
- ✅ Provides up-to-date information
- ✅ Reduces hallucinations
- ✅ Grounds responses in facts
- ✅ Works with private/custom data
- ✅ More accurate domain-specific answers

### Basic RAG Pipeline:
```
Query → Retrieve Relevant Docs → Combine with Query → AI Generation → Answer
```
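
The flow above can be sketched without any API call. The version below assumes a word-overlap retriever and a placeholder `fake_generate` function standing in for the model; both are illustrative stand-ins, not what the rest of this notebook uses.

```python
from typing import Callable, Dict

def retrieve(query: str, docs: Dict[str, str]) -> str:
    """Return the document sharing the most lowercase words with the query."""
    query_words = set(query.lower().split())
    return max(docs.values(), key=lambda text: len(query_words & set(text.lower().split())))

def augment(query: str, context: str) -> str:
    """Combine the retrieved context with the query into one prompt."""
    return f"Context: {context}\n\nQuestion: {query}\n\nAnswer based on the context provided:"

def rag(query: str, docs: Dict[str, str], generate: Callable[[str], str]) -> str:
    """Query -> retrieve -> augment -> generate -> answer."""
    return generate(augment(query, retrieve(query, docs)))

def fake_generate(prompt: str) -> str:
    # Placeholder for a real model call such as model.generate_content(prompt).text
    return f"(model output for a {len(prompt)}-character prompt)"

docs = {
    "python": "python is a programming language",
    "ml": "machine learning finds patterns in data",
}
print(rag("what is python", docs, fake_generate))
```

Passing the generator in as a parameter keeps retrieval testable on its own, before any model is involved.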

## 1. Create Knowledge Base

Start with a simple dictionary-based knowledge base.

In [None]:
def create_knowledge_base() -> Dict[str, str]:
    """Create a simple knowledge base"""
    return {
        "Python Basics": "Python is a high-level programming language known for its simplicity and readability. It was first released by Guido van Rossum in 1991.",
        "Machine Learning": "ML is a subset of AI that enables systems to learn from data without explicit programming. It uses algorithms to find patterns.",
        "Neural Networks": "Neural networks are computing systems inspired by biological neural networks in animal brains. They consist of interconnected nodes (neurons).",
        "Deep Learning": "Deep learning uses neural networks with multiple layers to learn from large amounts of data. It excels at image, speech, and text processing.",
        "Natural Language Processing": "NLP is a field of AI focused on enabling computers to understand, interpret, and generate human language.",
        "Computer Vision": "Computer vision enables computers to derive meaningful information from digital images and videos. It's used in facial recognition and self-driving cars."
    }

# Create the knowledge base
kb = create_knowledge_base()

print("📚 Knowledge Base Created:")
print("="*60)
for i, (topic, content) in enumerate(kb.items(), 1):
    print(f"\n{i}. {topic}:")
    print(f"   {content}")

## 2. Simple Text Embeddings

Convert text to numerical vectors for similarity comparison.

In [None]:
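import hashlib

def simple_embedding(text: str, seed: int = None) -> np.ndarray:
    """Create a simple pseudo-random embedding (for demo purposes)"""
    # In production, use proper embeddings like sentence-transformers
    # This is just for demonstration: the vector carries no semantic meaning
    if seed is None:
        return np.random.rand(10)
    # Built-in hash() is salted per Python process, so derive a stable value
    # from the text and combine it with the caller's seed for repeatable output
    text_hash = int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)
    rng = np.random.default_rng((text_hash + seed) % (2**32))
    return rng.random(10)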

# Demo: Create embeddings
text1 = "What is Python?"
text2 = "Tell me about machine learning"

emb1 = simple_embedding(text1, seed=42)
emb2 = simple_embedding(text2, seed=42)

print("🔢 Text Embeddings Demo:")
print("="*60)
print(f"\nText 1: {text1}")
print(f"Embedding: {emb1[:5]}... (showing first 5 dimensions)")
print(f"\nText 2: {text2}")
print(f"Embedding: {emb2[:5]}... (showing first 5 dimensions)")
print(f"\nEmbedding shape: {emb1.shape}")

### ⚠️ Note on Embeddings

This demo uses **random embeddings** for simplicity. In production:
- Use **Google's Embedding API** (`models/embedding-001`)
- Or use **sentence-transformers** models
- Or use **OpenAI embeddings**

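Real embeddings capture semantic meaning. The random vectors used here do not, so the documents this notebook retrieves are effectively arbitrary.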
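
To see why the choice of embedding matters, a word-count vector is the simplest deterministic alternative to random numbers. The sketch below is only a stand-in: the `bow_embedding` and `cosine` helpers are hypothetical, and counting words is far cruder than a trained model. Unlike random vectors, though, it gives texts that share words a higher similarity.

```python
import math
import re
from collections import Counter

def bow_embedding(text: str) -> Counter:
    """Map text to a sparse word-count vector (hypothetical helper, not a trained model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

query = bow_embedding("What is Python?")
python_doc = bow_embedding("Python is a high-level programming language")
vision_doc = bow_embedding("Computer vision works with digital images")

# The Python document shares words with the query; the vision document shares none
print(cosine(query, python_doc) > cosine(query, vision_doc))
```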

## 3. Similarity Search

Find the most relevant document for a query.

In [None]:
def calculate_similarity(emb1: np.ndarray, emb2: np.ndarray) -> float:
    """Calculate cosine similarity between embeddings"""
    return np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))

def find_similar(query: str, knowledge_base: Dict[str, str], top_k: int = 1) -> List[Tuple[str, str, float]]:
    """Find most similar documents to query"""
    query_emb = simple_embedding(query, seed=42)
    
    similarities = []
    for topic, content in knowledge_base.items():
        content_emb = simple_embedding(content, seed=42)
        similarity = calculate_similarity(query_emb, content_emb)
        similarities.append((topic, content, similarity))
    
    # Sort by similarity (highest first)
    similarities.sort(key=lambda x: x[2], reverse=True)
    return similarities[:top_k]

# Demo: Find similar documents
query = "What is Python?"

print("🔍 Similarity Search Demo:")
print("="*60)
print(f"\nQuery: {query}\n")

results = find_similar(query, kb, top_k=3)

print("Top 3 Most Relevant Documents:")
for i, (topic, content, score) in enumerate(results, 1):
    print(f"\n{i}. {topic} (similarity: {score:.4f})")
    print(f"   {content}")
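
A small worked example helps check `calculate_similarity` by hand. The vectors below are made up for illustration. Cosine similarity is the dot product divided by the product of the two vector lengths.

```python
import math

a = [1.0, 0.0, 1.0]
b = [1.0, 1.0, 0.0]

dot = sum(x * y for x, y in zip(a, b))        # 1*1 + 0*1 + 1*0 = 1
norm_a = math.sqrt(sum(x * x for x in a))     # sqrt(2)
norm_b = math.sqrt(sum(y * y for y in b))     # sqrt(2)
similarity = dot / (norm_a * norm_b)          # 1 / 2

print(round(similarity, 4))  # 0.5
```

Vectors pointing the same way score 1 and orthogonal vectors score 0. Because `np.random.rand` produces only non-negative components, every score in this demo falls between 0 and 1.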

## 4. Basic RAG Pipeline

Combine retrieval with generation.

In [None]:
def rag_query(query: str, knowledge_base: Dict[str, str], show_context: bool = True) -> str:
    """Execute RAG query"""
    # Step 1: Retrieve relevant context
    results = find_similar(query, knowledge_base, top_k=1)
    topic, relevant_context, score = results[0]
    
    if show_context:
        print("📄 Retrieved Context:")
        print(f"   Topic: {topic}")
        print(f"   Similarity: {score:.4f}")
        print(f"   Content: {relevant_context}")
        print()
    
    # Step 2: Create prompt with context
    model = genai.GenerativeModel('gemini-2.0-flash')
    prompt = f"""Context: {relevant_context}

Question: {query}

Answer based on the context provided:"""
    
    # Step 3: Generate response
    response = model.generate_content(prompt)
    return response.text

# Demo: RAG query
query = "What is Python?"

print("🤖 RAG Query Demo:")
print("="*60)
print(f"\n❓ Query: {query}\n")

answer = rag_query(query, kb)
print(f"✅ Answer:\n{answer}")
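
The prompt in `rag_query` does not tell the model what to do when the context lacks the answer. One possible variation asks the model to say so explicitly. It is sketched below with a hypothetical `build_grounded_prompt` helper, and the wording is chosen only for illustration.

```python
def build_grounded_prompt(query: str, context: str) -> str:
    """Build a RAG prompt that asks the model to admit when the context is insufficient."""
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, reply: "
        "\"I don't know based on the provided context.\"\n\n"
        f"Context: {context}\n\n"
        f"Question: {query}\n\n"
        "Answer:"
    )

print(build_grounded_prompt("What is Python?", "Python is a high-level programming language."))
```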

## 5. Multiple RAG Queries

In [None]:
queries = [
    "What is Python?",
    "Tell me about machine learning",
    "How do neural networks work?",
    "Explain deep learning"
]

print("🔄 Multiple RAG Queries:")
print("="*60)

for i, query in enumerate(queries, 1):
    print(f"\n{'='*60}")
    print(f"Query {i}/{len(queries)}: {query}")
    print('='*60)
    answer = rag_query(query, kb, show_context=True)
    print(f"\n✅ Answer:\n{answer}\n")

## 6. RAG vs Non-RAG Comparison

See the difference between RAG and regular generation.

In [None]:
def compare_rag_vs_non_rag(query: str, knowledge_base: Dict[str, str]):
    """Compare RAG vs non-RAG responses"""
    model = genai.GenerativeModel('gemini-2.0-flash')
    
    print(f"❓ Query: {query}")
    print("="*60)
    
    # Non-RAG response
    print("\n🔵 WITHOUT RAG (No Context):")
    print("-"*60)
    response_no_rag = model.generate_content(query)
    print(response_no_rag.text)
    
    # RAG response
    print("\n🟢 WITH RAG (Context-Based):")
    print("-"*60)
    results = find_similar(query, knowledge_base, top_k=1)
    topic, context, score = results[0]
    print(f"Retrieved: {topic} (score: {score:.4f})\n")
    
    prompt_rag = f"""Context: {context}

Question: {query}

Answer based on the context provided:"""
    response_rag = model.generate_content(prompt_rag)
    print(response_rag.text)

# Run comparison
print("⚖️ RAG vs Non-RAG Comparison\n")
compare_rag_vs_non_rag("What is Python?", kb)

## 7. Multi-Document Retrieval

Retrieve multiple relevant documents.

In [None]:
def rag_multi_doc(query: str, knowledge_base: Dict[str, str], top_k: int = 2) -> str:
    """RAG with multiple documents"""
    # Retrieve top K documents
    results = find_similar(query, knowledge_base, top_k=top_k)
    
    print(f"📚 Retrieved {len(results)} Documents:")
    contexts = []
    for i, (topic, content, score) in enumerate(results, 1):
        print(f"\n{i}. {topic} (score: {score:.4f})")
        print(f"   {content}")
        contexts.append(f"[{topic}]: {content}")
    
    # Combine contexts
    combined_context = "\n\n".join(contexts)
    
    # Generate response
    model = genai.GenerativeModel('gemini-2.0-flash')
    prompt = f"""Context from multiple sources:
{combined_context}

Question: {query}

Answer based on the contexts provided:"""
    
    response = model.generate_content(prompt)
    return response.text

# Demo
query = "How does AI learn?"

print("🔍 Multi-Document RAG Demo:")
print("="*60)
print(f"\n❓ Query: {query}\n")

answer = rag_multi_doc(query, kb, top_k=3)
print(f"\n✅ Answer:\n{answer}")
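
When `top_k` grows, the combined context can exceed what the model accepts. A rough guard is to keep the highest-ranked documents that fit a character budget, used here as a stand-in for a real token count. The `fit_contexts` helper and the 500-character default are hypothetical.

```python
from typing import List

def fit_contexts(contexts: List[str], max_chars: int = 500) -> List[str]:
    """Keep contexts in ranked order until the character budget is used up."""
    kept: List[str] = []
    used = 0
    for text in contexts:
        # Account for the "\n\n" separator added when the contexts are joined
        cost = len(text) + (2 if kept else 0)
        if used + cost > max_chars:
            break
        kept.append(text)
        used += cost
    return kept

ranked = ["a" * 200, "b" * 200, "c" * 200]
print(len(fit_contexts(ranked, max_chars=500)))  # 2
```

Stopping at the first document that does not fit preserves the ranking order, at the cost of sometimes leaving part of the budget unused.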

## 8. Custom Knowledge Base

Create your own knowledge base.

In [None]:
# Create a custom knowledge base about your company/product
my_knowledge_base = {
    "Product Overview": "Our AI chatbot helps businesses automate customer support. It integrates with existing systems and handles 80% of common queries.",
    "Pricing": "We offer three tiers: Starter ($99/mo), Professional ($299/mo), and Enterprise (custom pricing). All include 24/7 support.",
    "Integration": "Our platform integrates with Slack, Discord, Teams, and custom APIs. Setup takes less than 30 minutes with our guided onboarding.",
    "Security": "All data is encrypted at rest and in transit. We're SOC 2 Type II certified and GDPR compliant. Data is never used for training.",
    "Support": "Free tier includes email support. Paid tiers get priority support, dedicated account managers, and custom training sessions."
}

print("🏢 Custom Knowledge Base:")
print("="*60)
for topic in my_knowledge_base.keys():
    print(f"  • {topic}")

# Test queries
customer_queries = [
    "How much does it cost?",
    "Can you integrate with Slack?",
    "Is my data secure?"
]

print("\n💬 Customer Support Bot Demo:")
print("="*60)

for query in customer_queries:
    print(f"\n👤 Customer: {query}")
    answer = rag_query(query, my_knowledge_base, show_context=False)
    print(f"🤖 Bot: {answer}")
    print("-"*60)

## 9. Document Analysis

Analyze what's in your knowledge base.

In [None]:
def analyze_knowledge_base(kb: Dict[str, str]):
    """Analyze knowledge base statistics"""
    print("📊 Knowledge Base Analysis:")
    print("="*60)
    
    print(f"\n📚 Total Documents: {len(kb)}")
    
    # Word counts
    word_counts = {topic: len(content.split()) for topic, content in kb.items()}
    avg_words = sum(word_counts.values()) / len(word_counts)
    
    print(f"\n📝 Average Words per Document: {avg_words:.1f}")
    print("\nDocument Lengths:")
    for topic, count in sorted(word_counts.items(), key=lambda x: x[1], reverse=True):
        print(f"  • {topic}: {count} words")
    
    # Character counts
    total_chars = sum(len(content) for content in kb.values())
    print(f"\n📏 Total Characters: {total_chars:,}")
    
    return {
        'total_docs': len(kb),
        'avg_words': avg_words,
        'total_chars': total_chars
    }

# Analyze both knowledge bases
analyze_knowledge_base(kb)
print("\n" + "="*60 + "\n")
analyze_knowledge_base(my_knowledge_base)

## 10. Interactive RAG Chat

In [None]:
def rag_chat_session(knowledge_base: Dict[str, str], queries: List[str]):
    """Interactive RAG chat with history"""
    model = genai.GenerativeModel('gemini-2.0-flash')
    chat = model.start_chat(history=[])
    
    print("💬 RAG Chat Session:")
    print("="*60)
    print(f"Knowledge Base: {len(knowledge_base)} documents loaded\n")
    
    for i, query in enumerate(queries, 1):
        print(f"\n[Turn {i}]")
        print(f"👤 You: {query}")
        
        # Retrieve context
        results = find_similar(query, knowledge_base, top_k=1)
        topic, context, score = results[0]
        
        # Add context to query
        context_query = f"""[Retrieved Context: {context}]

User question: {query}

Answer based on the context:"""
        
        response = chat.send_message(context_query)
        print(f"🤖 Assistant: {response.text}")
        print("-"*60)

# Demo chat
chat_queries = [
    "What is Python?",
    "Can you give me an example?",
    "What makes it different from other languages?"
]

rag_chat_session(kb, chat_queries)
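
A follow-up such as "Can you give me an example?" carries no topic of its own, so retrieving on that text alone can pull the wrong document even with real embeddings. One common workaround is to prepend recent user turns before retrieving, sketched here with a hypothetical `build_retrieval_query` helper.

```python
from typing import List

def build_retrieval_query(history: List[str], query: str, max_turns: int = 2) -> str:
    """Join the last few user turns with the current query for retrieval."""
    recent = history[-max_turns:] if max_turns > 0 else []
    return " ".join(recent + [query])

history = ["What is Python?"]
print(build_retrieval_query(history, "Can you give me an example?"))
# What is Python? Can you give me an example?
```

Only the retrieval step would use the expanded text. The model can still receive the original question, since the chat object already carries the conversation history.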

## 🎉 Summary

You've learned how to:
- ✅ Create a knowledge base
- ✅ Generate text embeddings
- ✅ Perform similarity search
- ✅ Build a basic RAG pipeline
- ✅ Compare RAG vs non-RAG responses
- ✅ Retrieve multiple documents
- ✅ Create custom knowledge bases
- ✅ Analyze document collections
- ✅ Build RAG chat sessions

## 💡 Key Concepts:

### RAG Pipeline:
```
1. User Query
   ↓
2. Embed Query
   ↓
3. Search Knowledge Base (similarity)
   ↓
4. Retrieve Top K Documents
   ↓
5. Combine Query + Context
   ↓
6. Generate Response with AI
   ↓
7. Return Answer
```

### Embeddings:
- Convert text to numerical vectors
- Capture semantic meaning
- Enable similarity comparison
- Essential for retrieval

### Similarity Search:
- Compare query embedding with document embeddings
- Use cosine similarity
- Return top K most similar
- Fast with proper indexing
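
`find_similar` re-embeds every document on each query. A simple step toward proper indexing is to embed the documents once and reuse the vectors. The `SimpleIndex` class below is a hypothetical sketch that accepts any embedding function returning a list of floats. At larger scale, a vector database would replace its linear scan.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]

class SimpleIndex:
    """Embed documents once, then answer many queries (hypothetical sketch)."""

    def __init__(self, docs: Dict[str, str], embed: Callable[[str], Vector]):
        self.embed = embed
        # Document vectors are computed a single time, at build time
        self.vectors = {topic: embed(text) for topic, text in docs.items()}

    @staticmethod
    def _cosine(a: Vector, b: Vector) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query: str, top_k: int = 1) -> List[Tuple[str, float]]:
        """Return the top_k (topic, score) pairs, highest score first."""
        q = self.embed(query)
        scored = [(topic, self._cosine(q, vec)) for topic, vec in self.vectors.items()]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]

def toy_embed(text: str) -> Vector:
    # Toy embedding for the example only: counts of the letters "p" and "m"
    lowered = text.lower()
    return [float(lowered.count("p")), float(lowered.count("m"))]

index = SimpleIndex({"Python": "python tips", "ML": "machine memory"}, toy_embed)
print(index.search("pip", top_k=1)[0][0])  # Python
```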

## 📋 RAG Template:

```python
# 1. Create knowledge base
kb = {...}

# 2. Find relevant context (find_similar returns (topic, content, score) tuples)
topic, context, score = find_similar(query, kb, top_k=1)[0]

# 3. Create prompt with context
prompt = f"""Context: {context}
Question: {query}
Answer:"""

# 4. Generate response
model = genai.GenerativeModel('gemini-2.0-flash')
response = model.generate_content(prompt)
```

## 🎯 Best Practices:

1. **Chunking**: Break large documents into smaller chunks
2. **Quality Data**: Ensure knowledge base is accurate
3. **Proper Embeddings**: Use production-grade models
4. **Top K**: Retrieve 1-3 most relevant docs
5. **Context Length**: Stay within model limits
6. **Citations**: Track which docs were used
7. **Refresh**: Update knowledge base regularly
8. **Fallback**: Handle no-match scenarios
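
As one illustration of the chunking practice, the sketch below splits text into fixed-size word windows with overlap. The `chunk_text` name and the default sizes are assumptions for the example. Real pipelines often split on sentences or headings instead.

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into chunks of chunk_size words, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
print(chunk_text(sample, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, which helps retrieval at the cost of some duplicated text.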

## ⚠️ Limitations of This Demo:

This demo uses **random embeddings** for simplicity. For production:

### Use Real Embeddings:
```python
# Google's Embedding API
result = genai.embed_content(
    model="models/embedding-001",
    content=text
)
embedding = result['embedding']
```

### Or Sentence Transformers:
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
embedding = model.encode(text)
```

## 🚀 Real-World RAG Use Cases:

1. **Customer Support**: Answer based on documentation
2. **Internal Knowledge**: Company wikis, policies
3. **Legal**: Search case law, contracts
4. **Medical**: Research papers, patient data
5. **Education**: Course materials, textbooks
6. **Code**: Search codebases, documentation
7. **E-commerce**: Product catalogs, reviews
8. **News**: Search articles, archives

## 📊 RAG Metrics:

Evaluate your RAG system:
- **Retrieval Accuracy**: Did it find the right docs?
- **Answer Relevance**: Is the answer on-topic?
- **Faithfulness**: Is it based on the context?
- **Context Utilization**: Did it use the retrieved info?
- **Response Quality**: Is it clear and helpful?
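
Retrieval accuracy is the easiest of these to measure automatically. The sketch below computes a hit rate at k over a small labeled set. The `hit_rate_at_k` helper, the retriever signature, and the test cases are all hypothetical.

```python
from typing import Callable, List, Tuple

def hit_rate_at_k(
    cases: List[Tuple[str, str]],
    retrieve: Callable[[str, int], List[str]],
    k: int = 1,
) -> float:
    """Fraction of (query, expected_topic) cases whose expected topic is in the top k results."""
    if not cases:
        return 0.0
    hits = sum(1 for query, expected in cases if expected in retrieve(query, k))
    return hits / len(cases)

def keyword_retrieve(query: str, k: int) -> List[str]:
    # Toy retriever for the example: rank topics by how many of their words appear in the query
    topics = ["Python Basics", "Machine Learning", "Computer Vision"]
    words = set(query.lower().replace("?", "").split())
    ranked = sorted(topics, key=lambda t: len(words & set(t.lower().split())), reverse=True)
    return ranked[:k]

cases = [
    ("What is Python?", "Python Basics"),
    ("Tell me about machine learning", "Machine Learning"),
    ("How do neural networks work?", "Neural Networks"),
]
print(round(hit_rate_at_k(cases, keyword_retrieve, k=1), 2))  # 0.67
```

The third case misses because the toy retriever has no "Neural Networks" topic, which is the kind of gap such a metric is meant to surface.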

## 🔄 RAG Improvements:

1. **Better Embeddings**: Use semantic models
2. **Vector Databases**: Pinecone, Weaviate, ChromaDB
3. **Hybrid Search**: Combine keyword + semantic
4. **Reranking**: Score and reorder results
5. **Query Expansion**: Rephrase for better retrieval
6. **Metadata Filtering**: Filter by date, category
7. **Multi-hop**: Chain multiple retrievals
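
Hybrid search is often implemented as a weighted blend of a keyword score and a semantic score. The sketch below assumes both scores are already scaled to the range 0 to 1. The `keyword_score` and `hybrid_score` helpers and the `alpha` weight are hypothetical.

```python
def keyword_score(query: str, document: str) -> float:
    """Fraction of query words that also appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0

def hybrid_score(keyword: float, semantic: float, alpha: float = 0.5) -> float:
    """Blend the two scores: alpha=1.0 is keyword only, alpha=0.0 is semantic only."""
    return alpha * keyword + (1 - alpha) * semantic

kw = keyword_score("python language", "python is a programming language")
print(round(hybrid_score(kw, semantic=0.4, alpha=0.5), 2))  # 0.7
```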

## Next Steps:
1. Experiment with your own documents
2. Try different retrieval strategies
3. Measure RAG performance
4. Move on to lesson 10 (RAG with Pinecone vector database)