## Step 1: Install Dependencies

In [None]:
# Install required packages
!pip install google-generativeai python-dotenv pinecone-client sentence-transformers -q

## Step 2: Import Libraries

In [None]:
import os
from dotenv import load_dotenv
import google.generativeai as genai
from typing import List, Dict, Tuple
import time

## Step 3: Configure API Keys

In [None]:
# Option 1: Direct input
GOOGLE_API_KEY = "YOUR_GOOGLE_API_KEY_HERE"
# PINECONE_API_KEY = "YOUR_PINECONE_API_KEY_HERE"  # Optional: For real Pinecone

genai.configure(api_key=GOOGLE_API_KEY)
print("✅ Google API configured successfully")

In [None]:
# Option 2: Using Colab Secrets (uncomment to use)
# from google.colab import userdata
# GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
# PINECONE_API_KEY = userdata.get('PINECONE_API_KEY')
# genai.configure(api_key=GOOGLE_API_KEY)
# print("✅ API configured successfully using Colab Secrets")

## 📊 What is a Vector Database?

A **vector database** is a specialized database for storing and searching embeddings.

### Key Features:
- ✅ **Fast Similarity Search**: Find similar vectors in milliseconds
- ✅ **Scalability**: Handle millions/billions of vectors
- ✅ **Metadata Filtering**: Combine semantic + traditional search
- ✅ **Real-time Updates**: Add/delete vectors on the fly
- ✅ **Production-Ready**: Built for high-traffic applications

### Popular Vector Databases:
- **Pinecone**: Fully managed, easy to use
- **Weaviate**: Open-source, flexible
- **ChromaDB**: Lightweight, embeddable
- **Qdrant**: Fast, open-source
- **Milvus**: Scalable, open-source

### Why Vector DB vs Basic Search?
```
Basic In-Memory Search:
- Limited to small datasets
- Slow with many documents
- No persistence
- Linear search O(n)

Vector Database:
- Millions of documents
- Fast approximate search
- Persistent storage
- Sublinear search (often near O(log n) with ANN indexes such as HNSW)
```

## 1. Simple Vector Database Simulator

First, let's create a simple vector DB for demonstration.

In [None]:
class SimpleVectorDB:
    """Simple in-memory vector database for demonstration"""
    
    def __init__(self, name: str = "demo-index"):
        self.name = name
        self.documents = []
        self.embeddings = []
        self.metadata = []
        print(f"✅ Created vector database: {name}")
    
    def add_document(self, doc_id: str, text: str, embedding: List[float], metadata: Dict = None):
        """Add document with embedding"""
        self.documents.append({"id": doc_id, "text": text})
        self.embeddings.append(embedding)
        self.metadata.append(metadata or {})
        print(f"  ✓ Added: {doc_id}")
    
    def search(self, query_embedding: List[float], top_k: int = 3, filter_metadata: Dict = None) -> List[Dict]:
        """Search for similar documents"""
        import numpy as np

        # Convert the query once instead of on every loop iteration
        query_arr = np.array(query_embedding, dtype=float)
        query_norm = np.linalg.norm(query_arr)

        similarities = []
        for i, doc_emb in enumerate(self.embeddings):
            # Apply metadata filter if specified (exact match on every key)
            if filter_metadata:
                match = all(self.metadata[i].get(k) == v for k, v in filter_metadata.items())
                if not match:
                    continue

            # Cosine similarity; treat zero-length vectors as similarity 0
            doc_arr = np.array(doc_emb, dtype=float)
            denom = query_norm * np.linalg.norm(doc_arr)
            similarity = float(np.dot(query_arr, doc_arr) / denom) if denom else 0.0
            similarities.append((i, similarity))
        
        # Sort by similarity
        similarities.sort(key=lambda x: x[1], reverse=True)
        
        # Return top K
        results = []
        for i, score in similarities[:top_k]:
            results.append({
                "id": self.documents[i]["id"],
                "text": self.documents[i]["text"],
                "score": float(score),
                "metadata": self.metadata[i]
            })
        return results
    
    def stats(self):
        """Get database statistics"""
        print(f"\n📊 Database Stats:")
        print(f"  Name: {self.name}")
        print(f"  Documents: {len(self.documents)}")
        print(f"  Embedding dimension: {len(self.embeddings[0]) if self.embeddings else 0}")

# Create database
db = SimpleVectorDB("tech-docs")
print("\n🎉 Vector database ready!")

## 2. Generate Real Embeddings

Use Google's embedding model for semantic understanding.

In [None]:
def get_embedding(text: str) -> List[float]:
    """Get embedding using Google's API"""
    result = genai.embed_content(
        model="models/embedding-001",
        content=text,
        task_type="retrieval_document"
    )
    return result['embedding']

def get_query_embedding(text: str) -> List[float]:
    """Get embedding for search query"""
    result = genai.embed_content(
        model="models/embedding-001",
        content=text,
        task_type="retrieval_query"
    )
    return result['embedding']

# Test embeddings
print("🔢 Testing Google Embeddings:\n")
test_text = "What is machine learning?"
embedding = get_embedding(test_text)

print(f"Text: {test_text}")
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
print("\n✅ Embeddings working!")

## 3. Create Knowledge Base with Embeddings

In [None]:
def create_vector_store():
    """Create vector store with documents"""
    
    documents = [
        {
            "id": "doc1",
            "text": "Python is a versatile programming language used for web development, data science, and automation. It's known for its simplicity and readability.",
            "metadata": {"category": "programming", "language": "python", "level": "beginner"}
        },
        {
            "id": "doc2",
            "text": "Machine learning algorithms can learn patterns from data to make predictions. They improve automatically through experience without explicit programming.",
            "metadata": {"category": "ai", "topic": "machine-learning", "level": "intermediate"}
        },
        {
            "id": "doc3",
            "text": "Neural networks consist of layers of interconnected nodes that process information. They're inspired by biological neural networks in animal brains.",
            "metadata": {"category": "ai", "topic": "neural-networks", "level": "advanced"}
        },
        {
            "id": "doc4",
            "text": "Deep learning models require large amounts of data and computational power. They use neural networks with multiple hidden layers to learn complex patterns.",
            "metadata": {"category": "ai", "topic": "deep-learning", "level": "advanced"}
        },
        {
            "id": "doc5",
            "text": "Natural language processing enables computers to understand human language. NLP powers chatbots, translation, and sentiment analysis.",
            "metadata": {"category": "ai", "topic": "nlp", "level": "intermediate"}
        },
        {
            "id": "doc6",
            "text": "JavaScript is the programming language of the web. It runs in browsers and enables interactive web pages and dynamic user interfaces.",
            "metadata": {"category": "programming", "language": "javascript", "level": "beginner"}
        },
        {
            "id": "doc7",
            "text": "Data visualization helps communicate insights from data. Tools like matplotlib, seaborn, and plotly create charts and graphs.",
            "metadata": {"category": "data-science", "topic": "visualization", "level": "beginner"}
        }
    ]
    
    print("📚 Adding documents to vector database:\n")
    print("="*60)
    
    for doc in documents:
        # Get embedding
        embedding = get_embedding(doc["text"])
        
        # Add to database
        db.add_document(
            doc_id=doc["id"],
            text=doc["text"],
            embedding=embedding,
            metadata=doc["metadata"]
        )
        time.sleep(0.1)  # Rate limiting
    
    print("\n✅ Vector store created!")
    db.stats()

# Create the vector store
create_vector_store()
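
The fixed `time.sleep(0.1)` above spaces out requests but does not recover if a call fails. Here is a minimal sketch of retrying with exponential backoff, assuming transient failures surface as exceptions. `with_retries` and its parameters are hypothetical names, not part of any library, and it catches the broad `Exception` only for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(fn: Callable[[], T], max_attempts: int = 3, base_delay: float = 0.5) -> T:
    """Call fn(), retrying with exponential backoff on any exception (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, then 4x, ... before the next attempt
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be >= 1")
```

In this notebook it could wrap the embedding call, for example `with_retries(lambda: get_embedding(doc["text"]))`. In real use, narrow the `except` clause to the error types your client raises for rate limits.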

## 4. Semantic Search

In [None]:
def semantic_search(query: str, top_k: int = 3, filter_metadata: Dict = None):
    """Perform semantic search"""
    print(f"🔍 Searching for: '{query}'")
    if filter_metadata:
        print(f"📋 Filters: {filter_metadata}")
    print("="*60)
    
    # Get query embedding
    query_embedding = get_query_embedding(query)
    
    # Search
    results = db.search(query_embedding, top_k=top_k, filter_metadata=filter_metadata)
    
    print(f"\n📄 Found {len(results)} results:\n")
    for i, result in enumerate(results, 1):
        print(f"{i}. [{result['id']}] (score: {result['score']:.4f})")
        print(f"   {result['text']}")
        print(f"   Metadata: {result['metadata']}")
        print()
    
    return results

# Test semantic search
semantic_search("What is Python used for?", top_k=3)

## 5. RAG with Vector Database

In [None]:
def rag_with_vector_db(query: str, top_k: int = 2, filter_metadata: Dict = None, show_context: bool = True) -> str:
    """RAG query using vector database"""
    
    # Step 1: Retrieve
    query_embedding = get_query_embedding(query)
    results = db.search(query_embedding, top_k=top_k, filter_metadata=filter_metadata)
    
    if show_context:
        print("📚 Retrieved Context:")
        for i, result in enumerate(results, 1):
            print(f"\n  {i}. [{result['id']}] (score: {result['score']:.4f})")
            print(f"     {result['text'][:100]}...")
        print()
    
    # Step 2: Combine contexts
    context = "\n\n".join([f"[{r['id']}]: {r['text']}" for r in results])
    
    # Step 3: Generate
    model = genai.GenerativeModel('gemini-2.0-flash')
    prompt = f"""Use the following context to answer the question.

Context:
{context}

Question: {query}

Answer based on the context:"""
    
    response = model.generate_content(prompt)
    return response.text

# Test RAG
print("🤖 RAG Query Demo:")
print("="*60)
query = "What is Python used for?"
print(f"\n❓ Query: {query}\n")

answer = rag_with_vector_db(query, top_k=2)
print(f"✅ Answer:\n{answer}")

## 6. Multiple RAG Queries

In [None]:
queries = [
    "How do neural networks work?",
    "What is machine learning?",
    "Explain natural language processing",
    "What programming languages are popular?"
]

print("🔄 Multiple RAG Queries:")
print("="*60)

for i, query in enumerate(queries, 1):
    print(f"\n{'='*60}")
    print(f"Query {i}/{len(queries)}: {query}")
    print('='*60)
    answer = rag_with_vector_db(query, top_k=2, show_context=False)
    print(f"\n✅ Answer: {answer}\n")

## 7. Metadata Filtering

Filter search results by metadata.

In [None]:
# Search only in AI category
print("🎯 Filtered Search Demo:\n")
print("="*60)

query = "How does learning work?"
print(f"\n❓ Query: {query}")
print("📋 Filter: category='ai'\n")

results = semantic_search(query, top_k=3, filter_metadata={"category": "ai"})

In [None]:
# Search only beginner-level content
print("\n" + "="*60)
print("\n❓ Query: Tell me about programming")
print("📋 Filter: level='beginner'\n")

results = semantic_search(
    "Tell me about programming",
    top_k=3,
    filter_metadata={"level": "beginner"}
)

In [None]:
# RAG with filtering
print("\n" + "="*60)
print("\n🤖 RAG with Metadata Filter\n")

query = "Explain AI concepts"
print(f"❓ Query: {query}")
print("📋 Filter: category='ai', level='advanced'\n")

answer = rag_with_vector_db(
    query,
    top_k=2,
    filter_metadata={"category": "ai", "level": "advanced"},
    show_context=True
)
print(f"\n✅ Answer:\n{answer}")

## 8. Semantic Search Quality Test

In [None]:
# Test semantic understanding
test_queries = [
    ("How do I make websites interactive?", "JavaScript"),
    ("What helps computers understand speech?", "NLP"),
    ("How can I visualize my data?", "Data visualization"),
    ("What's good for beginners in coding?", "Python or JavaScript")
]

print("🧪 Semantic Understanding Test:")
print("="*60)

for query, expected in test_queries:
    print(f"\n❓ Query: {query}")
    print(f"🎯 Expected: {expected}")
    
    query_embedding = get_query_embedding(query)
    results = db.search(query_embedding, top_k=1)
    
    if results:
        print(f"✅ Found: [{results[0]['id']}] (score: {results[0]['score']:.4f})")
        print(f"   {results[0]['text'][:80]}...")
    print("-"*60)

## 9. Performance Comparison

In [None]:
import time

def benchmark_search(num_queries: int = 5):
    """Benchmark search performance"""
    test_queries = [
        "What is Python?",
        "How does machine learning work?",
        "Explain neural networks",
        "What is data visualization?",
        "Tell me about web development"
    ]
    
    print("⚡ Performance Benchmark:")
    print("="*60)
    
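    queries_run = test_queries[:num_queries]
    total_time = 0.0
    for i, query in enumerate(queries_run, 1):
        # Timing covers the embedding API call plus the in-memory search
        start = time.perf_counter()
        query_embedding = get_query_embedding(query)
        results = db.search(query_embedding, top_k=3)
        elapsed = time.perf_counter() - start
        total_time += elapsed
        
        print(f"\nQuery {i}: {elapsed*1000:.2f}ms")
        print(f"  Found {len(results)} results")
    
    # Average over the queries actually run (num_queries may exceed the list length)
    avg_time = (total_time / len(queries_run)) * 1000 if queries_run else 0.0
    print(f"\n📊 Average time per query (embedding + search): {avg_time:.2f}ms")
    print(f"📊 Total time: {total_time*1000:.2f}ms")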

benchmark_search()

## 10. Complete RAG Chatbot

In [None]:
def rag_chatbot(queries: List[str], top_k: int = 2):
    """Interactive RAG chatbot"""
    model = genai.GenerativeModel('gemini-2.0-flash')
    chat = model.start_chat(history=[])
    
    print("🤖 RAG Chatbot Session:")
    print("="*60)
    print(f"Vector Database: {db.name}")
    print(f"Documents loaded: {len(db.documents)}\n")
    
    for i, query in enumerate(queries, 1):
        print(f"\n[Turn {i}]")
        print(f"👤 You: {query}")
        
        # Retrieve context
        query_embedding = get_query_embedding(query)
        results = db.search(query_embedding, top_k=top_k)
        context = "\n".join([r['text'] for r in results])
        
        # Create contextualized query
        context_query = f"""[Context from knowledge base: {context}]

User question: {query}

Answer based on the context:"""
        
        response = chat.send_message(context_query)
        print(f"🤖 Assistant: {response.text}")
        print("-"*60)

# Demo chatbot
chat_queries = [
    "What is Python?",
    "Can you give me more details?",
    "What about machine learning?",
    "How are they related?"
]

rag_chatbot(chat_queries, top_k=2)

## 11. Real Pinecone Integration (Optional)

For production use, integrate with real Pinecone.

In [None]:
# Uncomment to use real Pinecone

# from pinecone import Pinecone, ServerlessSpec

# def setup_pinecone():
#     """Setup real Pinecone vector database"""
#     
#     # Initialize Pinecone
#     pc = Pinecone(api_key=PINECONE_API_KEY)
#     
#     # Create index
#     index_name = "tech-docs"
#     
#     if index_name not in pc.list_indexes().names():
#         pc.create_index(
#             name=index_name,
#             dimension=768,  # Google embedding dimension
#             metric="cosine",
#             spec=ServerlessSpec(
#                 cloud="aws",
#                 region="us-east-1"
#             )
#         )
#     
#     # Connect to index
#     index = pc.Index(index_name)
#     print(f"✅ Connected to Pinecone index: {index_name}")
#     return index

# def upsert_to_pinecone(index, documents):
#     """Upload documents to Pinecone"""
#     vectors = []
#     
#     for doc in documents:
#         embedding = get_embedding(doc['text'])
#         vectors.append((
#             doc['id'],
#             embedding,
#             {"text": doc['text'], **doc['metadata']}
#         ))
#     
#     index.upsert(vectors=vectors)
#     print(f"✅ Uploaded {len(vectors)} vectors to Pinecone")

# def search_pinecone(index, query, top_k=3):
#     """Search Pinecone index"""
#     query_embedding = get_query_embedding(query)
#     results = index.query(
#         vector=query_embedding,
#         top_k=top_k,
#         include_metadata=True
#     )
#     return results['matches']

print("ℹ️ To use real Pinecone:")
print("1. Get API key from pinecone.io")
print("2. Uncomment the code above")
print("3. Set PINECONE_API_KEY variable")
print("4. Run the setup functions")

## 🎉 Summary

You've learned how to:
- ✅ Understand vector databases
- ✅ Create embeddings with Google API
- ✅ Build a simple vector database
- ✅ Perform semantic search
- ✅ Implement production RAG pipeline
- ✅ Use metadata filtering
- ✅ Build RAG chatbots
- ✅ Benchmark performance
- ✅ Integrate with real Pinecone

## 💡 Key Concepts:

### Vector Database Architecture:
```
Documents → Embeddings → Vector Index
                              ↓
Query → Embedding → Similarity Search → Top K Results
```

### Embedding Model:
- **Google's embedding-001**: 768 dimensions
- **Task types**: retrieval_document, retrieval_query
- **Captures semantic meaning**

### Similarity Metrics:
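- **Cosine similarity**: Compares direction and ignores vector length; the usual default for text embeddings
- **Euclidean distance**: Measures straight-line distance, so vector magnitude matters
- **Dot product**: Cheapest to compute; ranks results the same as cosine similarity when vectors are normalized to unit length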
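
To make the three metrics concrete, here is a small pure-Python sketch on toy 3-dimensional vectors. The vectors and helper names are illustrative assumptions, not values from the embedding model.

```python
import math
from typing import List, Sequence

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))

def norm(u: Sequence[float]) -> float:
    return math.sqrt(dot(u, u))

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    return dot(u, v) / (norm(u) * norm(v))

def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def normalize(u: Sequence[float]) -> List[float]:
    n = norm(u)
    return [x / n for x in u]

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice the length
c = [3.0, -1.0, 0.5]  # a different direction

print(cosine_similarity(a, b))   # about 1.0: direction is identical
print(euclidean_distance(a, b))  # about 3.74: the length difference still counts
# On unit-length vectors the dot product equals the cosine similarity
print(dot(normalize(a), normalize(c)), cosine_similarity(a, c))
```

This is why a dot-product index is a reasonable choice when embeddings are already normalized: the ranking matches cosine similarity without the division.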

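## 📋 Production RAG Template:

```python
# 1. Setup
import google.generativeai as genai
from pinecone import Pinecone

pc = Pinecone(api_key=API_KEY)
index = pc.Index("my-index")
model = genai.GenerativeModel("gemini-2.0-flash")

# 2. Index documents (store the text in metadata so it comes back at query time)
for doc in documents:
    embedding = get_embedding(doc['text'])
    index.upsert(vectors=[(doc['id'], embedding, {"text": doc['text'], **doc['metadata']})])

# 3. Search (include_metadata=True returns the stored text with each match)
query_emb = get_query_embedding(query)
results = index.query(vector=query_emb, top_k=3, include_metadata=True)

# 4. Generate
context = "\n\n".join(match['metadata']['text'] for match in results['matches'])
answer = model.generate_content(f"Context: {context}\nQ: {query}").text
```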

## 🎯 Best Practices:

### 1. Chunking Strategy:
- Break documents into 200-500 word chunks
- Overlap chunks by 10-20%
- Preserve context boundaries
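
The first two points can be sketched as a word-based splitter. This is a minimal illustration, assuming whitespace-separated words are an acceptable unit; `chunk_words` is a hypothetical helper, and the defaults (300 words, 45 words of overlap, i.e. 15%) are just one choice inside the ranges above. It does not try to respect sentence or section boundaries.

```python
from typing import List

def chunk_words(text: str, chunk_size: int = 300, overlap: int = 45) -> List[str]:
    """Split text into chunks of up to chunk_size words; neighbours share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks

# Tiny example with a 4-word window and 1 word of overlap
print(chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1))
# ['a b c d', 'd e f g', 'g h i j']
```

To preserve context boundaries, one option is to split on paragraphs or sentences first and then pack those units into chunks of roughly this size.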

### 2. Metadata Design:
```python
metadata = {
    "category": "docs",
    "source": "file.pdf",
    "date": "2024-01-01",
    "author": "name",
    "page": 5
}
```

### 3. Search Optimization:
- Use appropriate top_k (2-5 for most cases)
- Filter by metadata when possible
- Cache frequently searched queries
- Use async for batch processing
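
Query caching can be as simple as memoizing the embedding call. The sketch below uses `functools.lru_cache` from the standard library; `embed_query_uncached` is a hypothetical placeholder that fakes an embedding so the example runs without an API key, and in this notebook it would be replaced by a call to `get_query_embedding`.

```python
from functools import lru_cache
from typing import Dict, Tuple

call_count: Dict[str, int] = {"n": 0}

def embed_query_uncached(text: str) -> Tuple[float, ...]:
    """Placeholder for a real embedding API call (assumption for this sketch)."""
    call_count["n"] += 1
    return (float(len(text)), float(text.count(" ")))

@lru_cache(maxsize=1024)
def embed_query(text: str) -> Tuple[float, ...]:
    # Return a tuple so callers cannot mutate the cached value
    return embed_query_uncached(text)

embed_query("what is python?")
embed_query("what is python?")         # served from the cache
print(call_count["n"])                 # 1
print(embed_query.cache_info().hits)   # 1
```

Normalizing the query first (for example, stripping whitespace and lowercasing) would raise the hit rate, at the cost of treating differently cased queries as identical.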

### 4. Quality Control:
- Monitor retrieval accuracy
- Track answer relevance
- Log failed queries
- A/B test retrieval strategies

### 5. Cost Optimization:
- Batch embed documents
- Cache embeddings
- Use appropriate index size
- Monitor API usage

## 📊 Metrics to Track:

### Retrieval Metrics:
- **Precision@K**: Relevant docs in top K
- **Recall@K**: Found relevant docs / total relevant
- **MRR**: Mean reciprocal rank
- **NDCG**: Normalized discounted cumulative gain
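
The first three retrieval metrics are short enough to compute by hand. Here is a minimal sketch, assuming you have a list of retrieved document IDs in rank order and a hand-labeled set of relevant IDs; the function names and the sample labels are illustrative. NDCG is omitted because it needs graded relevance scores.

```python
from typing import List, Sequence, Set, Tuple

def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k

def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / len(relevant)

def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over several (retrieved, relevant) query runs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

retrieved = ["doc3", "doc1", "doc4"]  # ranked output for one query
relevant = {"doc1", "doc6"}           # hand-labeled ground truth (assumed)

print(precision_at_k(retrieved, relevant, 3))  # 1 of 3 results is relevant
print(recall_at_k(retrieved, relevant, 3))     # 1 of 2 relevant docs was found
print(reciprocal_rank(retrieved, relevant))    # first relevant hit is at rank 2
```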

### Generation Metrics:
- **Faithfulness**: Based on retrieved context
- **Answer relevance**: Addresses the query
- **Context utilization**: Uses retrieved info
- **Hallucination rate**: Made-up information

### System Metrics:
- **Latency**: End-to-end response time
- **Throughput**: Queries per second
- **Cost**: Per query cost
- **Uptime**: System availability

## 🚀 Advanced Techniques:

### 1. Hybrid Search:
Combine semantic + keyword search
```python
semantic_results = vector_search(query)
keyword_results = keyword_search(query)
final_results = rerank(semantic_results + keyword_results)
```
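
The `rerank` step above is left abstract. One common way to merge two ranked lists is reciprocal rank fusion (RRF), sketched below as an assumption about how the merge could work rather than the method this notebook prescribes. It needs only the rank positions, so scores from different systems do not have to be comparable. `k=60` is a commonly used default, and the document IDs are illustrative.

```python
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

semantic_ids = ["doc1", "doc6", "doc2"]  # e.g. from vector search
keyword_ids = ["doc6", "doc7", "doc1"]   # e.g. from keyword search

print(reciprocal_rank_fusion([semantic_ids, keyword_ids]))
# ['doc6', 'doc1', 'doc7', 'doc2']
```

Documents that appear in both lists (`doc6`, `doc1`) rise to the top, which is the behavior hybrid search is after.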

### 2. Reranking:
Re-score results with cross-encoder
```python
results = initial_search(query, top_k=20)
reranked = cross_encoder.rank(query, results)
final = reranked[:5]
```

### 3. Query Expansion:
Generate multiple query variants
```python
variants = [
    original_query,
    rephrase(original_query),
    expand(original_query)
]
results = [search(v) for v in variants]
combined = deduplicate(results)
```

### 4. Multi-hop Retrieval:
Iterative retrieval for complex queries
```python
context1 = search(query)
refined_query = generate_followup(query, context1)
context2 = search(refined_query)
final_answer = generate(query, context1 + context2)
```

## ⚠️ Common Pitfalls:

1. **Too many results**: More isn't always better
2. **Wrong embedding model**: Match to your domain
3. **No metadata**: Miss filtering opportunities
4. **Large chunks**: Lose precision
5. **No monitoring**: Can't improve
6. **Stale data**: Update your index
7. **No fallback**: Handle no-result cases
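
For the last pitfall, a fallback can be a guard in front of the generation step. Here is a minimal sketch, assuming results have the same shape as those returned by `SimpleVectorDB.search` (dicts with `text` and `score`). The helper names, the fallback message, and the `0.5` threshold are all assumptions to tune for your data.

```python
from typing import Callable, Dict, List

NO_CONTEXT_MESSAGE = "I couldn't find anything relevant in the knowledge base."

def select_context(results: List[Dict], min_score: float = 0.5) -> List[Dict]:
    """Keep only results whose similarity score clears the threshold."""
    return [r for r in results if r["score"] >= min_score]

def answer_or_fallback(
    results: List[Dict],
    generate: Callable[[str], str],
    min_score: float = 0.5,
) -> str:
    """Call generate(context) only when there is usable context; otherwise fall back."""
    usable = select_context(results, min_score)
    if not usable:
        return NO_CONTEXT_MESSAGE
    context = "\n\n".join(r["text"] for r in usable)
    return generate(context)

results = [
    {"id": "doc1", "text": "Python is versatile.", "score": 0.82},
    {"id": "doc7", "text": "Charts and graphs.", "score": 0.31},
]
# `generate` stands in for the LLM call; here it just echoes the context
print(answer_or_fallback(results, lambda ctx: f"ANSWER from: {ctx}"))
print(answer_or_fallback([], lambda ctx: ctx))  # prints the fallback message
```

Returning a fixed message avoids sending the model an empty context, where it is more likely to answer from its own knowledge instead of your documents.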

## 🔧 Troubleshooting:

### Poor Retrieval Quality:
- Try different embedding models
- Adjust chunk size
- Add more metadata
- Use hybrid search

### Slow Performance:
- Reduce dimensionality
- Use approximate search
- Batch requests
- Cache results

### High Costs:
- Optimize chunk size
- Batch embeddings
- Use cheaper tiers
- Cache embeddings

## 🌟 Production Checklist:

- [ ] Use production embedding model
- [ ] Set up real vector database (Pinecone, etc.)
- [ ] Implement proper chunking
- [ ] Add comprehensive metadata
- [ ] Set up monitoring
- [ ] Implement rate limiting
- [ ] Add error handling
- [ ] Cache frequent queries
- [ ] Set up logging
- [ ] Test retrieval quality
- [ ] Document API
- [ ] Plan for updates

## 🎓 What's Next?

Continue learning:
1. Try with your own documents
2. Experiment with different embedding models
3. Build a domain-specific RAG system
4. Implement advanced techniques (reranking, hybrid search)
5. Deploy to production
6. Monitor and iterate

## 🎊 Congratulations!

You've completed all 10 lessons! You now know:
1. ✅ Model preparation and API setup
2. ✅ Text chat and generation
3. ✅ Image understanding and VQA
4. ✅ Video frame analysis
5. ✅ Streaming responses
6. ✅ Conversation memory
7. ✅ Model configurations
8. ✅ System instructions and personas
9. ✅ Basic RAG
10. ✅ Production RAG with vector databases

You're now ready to build production AI applications! 🚀