In [None]:
from IPython.display import HTML, display

# Vectors, Embeddings, and Smarter RAG
<hr/>

## 1. Limitations of LLMs Requiring Smarter RAG
**Definition / Idea:**  
Large Language Models (LLMs) are powerful, but they don’t “know everything.” They rely on training data up to a cutoff date and may hallucinate when asked about specific or niche information.  

**Need:**  
- LLMs lack **up-to-date knowledge**.  
- They struggle with **domain-specific data** (e.g., your company’s documents).  
- They may give **confident but wrong answers**.  

**Solution:**  
Retrieval-Augmented Generation (RAG) → Combine LLMs with external knowledge sources for factual, grounded responses.  

**Example:**  
- Asking GPT about your company’s HR policies → It won’t know.  
- With RAG, it can retrieve the document and then answer.
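
The retrieve-then-answer flow can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the documents are hypothetical, retrieval is plain word overlap standing in for vector search, and the final LLM call is left out so the block only builds the grounded prompt.

```python
import re

# Hypothetical in-memory knowledge base (illustrative text, not real policy).
DOCUMENTS = [
    "HR policy: employees accrue 20 days of paid leave per year.",
    "IT policy: passwords must be rotated every 90 days.",
    "Travel policy: book flights at least 14 days in advance.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, k=1):
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, context_docs):
    """Place retrieved text ahead of the question so the LLM can ground its answer."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "How many days of paid leave do employees get?"
prompt = build_prompt(question, retrieve(question, DOCUMENTS))
print(prompt)
```

The prompt, not the model's weights, carries the company-specific facts, which is why the answer can stay current without retraining.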

<img src="./data/rag_pipeline_flow.png" alt="RAG Pipeline flow" width="600"> 
<hr/>

## 2. Vectors
**Definition / Idea:**  
A **vector** is simply a list of numbers that represent data in a way computers can compare.  

**Need:**  
Computers can compare text character by character, but that says nothing about what the words mean. Vectors let us compare **meaning** mathematically.  

**Example:**  
- Word “dog” → [0.13, 0.55, -0.22, …]  
- Word “cat” → [0.14, 0.52, -0.20, …]  
- The vectors for “dog” and “cat” will be **closer** to each other than the vectors for “dog” and “car.”
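
"Closer" can be checked numerically. The sketch below uses only the three components shown for “dog” and “cat” (real embeddings have hundreds of dimensions), and the “car” vector is a hypothetical value chosen for illustration.

```python
import math

# Only the three components shown above are used.
dog = [0.13, 0.55, -0.22]
cat = [0.14, 0.52, -0.20]
car = [0.80, -0.10, 0.45]  # hypothetical vector, not from a real model

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(f"dog-cat distance: {euclidean(dog, cat):.3f}")
print(f"dog-car distance: {euclidean(dog, car):.3f}")
```

A smaller distance means the two vectors, and so the two words, are treated as more similar.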

In [27]:
display(HTML('''<!doctypehtml><html lang=en><meta charset=UTF-8><meta content="width=device-width,initial-scale=1"name=viewport><title>Vector Embeddings Scatter Plot</title><style>body{font-family:'Segoe UI',Tahoma,Geneva,Verdana,sans-serif;margin:0;padding:40px;display:flex;justify-content:center;align-items:center;min-height:100vh}.chart-container{border-radius:12px;background:#f8f9fa;padding:30px;box-shadow:0 4px 20px rgba(0,0,0,.1);max-width:700px;width:100%}.chart-title{text-align:center;color:#2c3e50;font-size:1.5em;font-weight:600;margin-bottom:30px}svg{width:100%;height:500px}.axis{stroke:#34495e;stroke-width:2}.grid-line{stroke:#ecf0f1;stroke-width:1}.point{cursor:pointer;transition:all .2s ease}.label{font-family:'Segoe UI',sans-serif;font-size:16px;font-weight:600;pointer-events:none}.axis-label{font-family:'Segoe UI',sans-serif;font-size:14px;fill:#7f8c8d;font-weight:500}.similarity-line{stroke:#e74c3c;stroke-width:2;stroke-dasharray:8,4;opacity:.7}.distance-line{stroke:#95a5a6;stroke-width:1.5;stroke-dasharray:4,4;opacity:.5}</style><div class=chart-container><div class=chart-title>Word Vectors in 2D Space</div><svg viewBox="0 0 600 450"><defs><pattern height=40 id=grid patternUnits=userSpaceOnUse width=40><path class=grid-line d="M 40 0 L 0 0 0 40"fill=none /></pattern></defs><rect fill=url(#grid) height=100% opacity=0.3 width=100% /><line class=axis x1=60 x2=540 y1=390 y2=390 /><line class=axis x1=60 x2=60 y1=390 y2=60 /><text class=axis-label text-anchor=middle x=300 y=425>Vector Dimension 1</text><text class=axis-label text-anchor=middle x=25 y=225 transform="rotate(-90, 25, 225)">Vector Dimension 2</text><line class=similarity-line x1=180 x2=220 y1=150 y2=170 /><line class=distance-line x1=200 x2=420 y1=160 y2=320 /><circle class=point cx=180 cy=150 fill=#e74c3c r=10 /><text class=label text-anchor=middle x=180 y=130 fill=#2c3e50>🐕 dog</text><circle class=point cx=220 cy=170 fill=#e74c3c r=10 /><text class=label text-anchor=middle x=220 y=195 
fill=#2c3e50>🐱 cat</text><circle class=point cx=420 cy=320 fill=#3498db r=10 /><text class=label text-anchor=middle x=420 y=305 fill=#2c3e50>🚗 car</text><text class=axis-label text-anchor=middle x=200 y=115 fill=#e74c3c font-weight=600>Similar meaning = close together</text><text class=axis-label text-anchor=middle x=310 y=240 fill=#95a5a6 font-weight=600>Different meaning = far apart</text></svg></div>'''))

<hr/>

## 3. Vectors and Information Retrieval
**Idea:**  
We can represent documents, sentences, or chunks of text as vectors. Then we retrieve information by finding the **closest vectors** to a query.  

**Need:**  
Traditional keyword search matches the literal terms in the query, so it misses documents that express the same idea in other words. Vector search finds documents that are **semantically related**, even if different words are used.  

**Example:**  
- Query: “How to fix a bike chain”  
- Traditional search → looks for “fix” + “bike chain.”  
- Vector search → also finds “repair bicycle gear system.”  
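
The contrast can be reproduced with a toy example. Everything numeric here is an assumption: the 2-D vectors are assigned by hand so that bicycle-repair texts point in a similar direction, whereas a real system would get them from an embedding model. The query is reduced to its content terms.

```python
import math

# Document titles paired with hypothetical, hand-picked 2-D vectors.
DOCS = {
    "How to fix your bike chain quickly": [0.9, 0.1],
    "Repair bicycle gear system": [0.8, 0.3],
    "Cooking recipes for beginners": [0.1, 0.9],
}
QUERY_TEXT = "fix bike chain"
QUERY_VECTOR = [0.85, 0.2]  # hypothetical vector for the query

def keyword_search(query, docs):
    """Return documents that contain every query word verbatim."""
    words = query.lower().split()
    return [d for d in docs if all(w in d.lower().split() for w in words)]

def cosine(a, b):
    """Cosine similarity of two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def vector_search(query_vector, docs, k=2):
    """Return the k documents whose vectors are most similar to the query."""
    return sorted(docs, key=lambda d: cosine(query_vector, docs[d]), reverse=True)[:k]

print("keyword:", keyword_search(QUERY_TEXT, DOCS))
print("vector: ", vector_search(QUERY_VECTOR, DOCS))
```

Keyword search returns only the title that literally contains all three words, while the vector ranking also surfaces the "Repair bicycle gear system" title.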

In [13]:
display(HTML('''<!doctypehtml><html lang=en><meta charset=UTF-8><meta content="width=device-width,initial-scale=1"name=viewport><title>Keyword vs Vector Search</title><style>body{font-family:'Segoe UI',Tahoma,Geneva,Verdana,sans-serif;margin:0;padding:20px;min-height:20vh}.container{max-width:800px;margin:0 auto;background:#fff;border-radius:20px;padding:30px;box-shadow:0 20px 40px rgba(0,0,0,.1)}.header{text-align:center;margin-bottom:40px}.header h1{color:#2c3e50;font-size:2.5em;margin-bottom:10px}.subtitle{color:#7f8c8d;font-size:1.2em}.comparison-container{display:grid;grid-template-columns:1fr 1fr;gap:40px;margin-top:30px}.search-method{background:#f8f9fa;border-radius:15px;padding:25px;box-shadow:0 8px 20px rgba(0,0,0,.1)}.method-title{text-align:center;font-size:1.8em;font-weight:700;margin-bottom:20px;padding:15px;border-radius:10px}.keyword-title{background:linear-gradient(135deg,#ff6b6b 0,#ee5a52 100%);color:#fff}.vector-title{background:linear-gradient(135deg,#4ecdc4 0,#44a08d 100%);color:#fff}.query-box{background:#fff;border:2px solid #3498db;border-radius:10px;padding:15px;margin-bottom:20px;text-align:center;font-weight:700;color:#2c3e50}.flow-step{margin:15px 0;padding:15px;background:#fff;border-radius:8px;border-left:4px solid #3498db;box-shadow:0 2px 8px rgba(0,0,0,.1)}.step-title{font-weight:700;color:#2c3e50;margin-bottom:8px}.step-content{color:#7f8c8d;font-size:.95em}.results-section{margin-top:20px}.result-item{padding:12px;margin:8px 0;border-radius:8px;font-size:.9em}.keyword-result{background:#ffe6e6;border-left:4px solid #ff6b6b}.vector-result{background:#e6f7f5;border-left:4px solid #4ecdc4}.matched-words{background:#fff3cd;padding:2px 6px;border-radius:4px;font-weight:700}.semantic-match{background:#d4edda;padding:2px 6px;border-radius:4px;font-weight:700}.arrow{text-align:center;font-size:1.5em;color:#3498db;margin:10px 0}.vector-space{background:#fff;border-radius:10px;padding:20px;margin:15px 0;border:2px dashed 
#4ecdc4}.vector-dots{display:flex;justify-content:space-around;align-items:center;height:80px;position:relative}.vector-dot{width:12px;height:12px;border-radius:50%;position:relative}.query-dot{background:#3498db;box-shadow:0 0 15px rgba(52,152,219,.5)}.close-dot{background:#4ecdc4;box-shadow:0 0 10px rgba(78,205,196,.5)}.far-dot{background:#95a5a6}.vector-label{position:absolute;top:-25px;left:50%;transform:translateX(-50%);font-size:.8em;font-weight:700;color:#2c3e50;white-space:nowrap}.similarity-line{position:absolute;height:2px;background:linear-gradient(to right,#3498db,#4ecdc4);top:50%;transform:translateY(-50%)}.insight-box{background:linear-gradient(135deg,#74b9ff 0,#0984e3 100%);color:#fff;padding:25px;border-radius:15px;margin-top:30px;text-align:center}.insight-box h3{margin-top:0;font-size:1.4em}</style><div class=container><div class=header><h1>Keyword vs Vector Search</h1><p class=subtitle>Understanding the difference between exact matching and semantic similarity</div><div class=comparison-container><div class=search-method><div class="method-title keyword-title">🔍 Traditional Keyword Search</div><div class=query-box>Query: "How to fix a bike chain"</div><div class=flow-step><div class=step-title>Step 1: Text Matching</div><div class=step-content>Looks for exact word matches: "fix", "bike", "chain"</div></div><div class=arrow>↓</div><div class=flow-step><div class=step-title>Step 2: Index Lookup</div><div class=step-content>Searches document index for these specific terms</div></div><div class=arrow>↓</div><div class=results-section><div class=step-title>Results Found:</div><div class="result-item keyword-result">✅ "How to <span class=matched-words>fix</span> your <span class=matched-words>bike chain</span> quickly"</div><div class="result-item keyword-result">✅ "Common <span class=matched-words>bike chain</span> problems and how to <span class=matched-words>fix</span> them"</div><div class="result-item keyword-result"style=opacity:.5>❌ "Repair 
bicycle gear system" (no exact matches)</div></div></div><div class=search-method><div class="method-title vector-title">🧠 Vector Semantic Search</div><div class=query-box>Query: "How to fix a bike chain"</div><div class=flow-step><div class=step-title>Step 1: Convert to Vector</div><div class=step-content>Transform query into numerical vector representation</div></div><div class=arrow>↓</div><div class=vector-space><div class=step-title style=text-align:center;margin-bottom:15px>Step 2: Vector Space Comparison</div><div class=vector-dots><div class="vector-dot query-dot"><div class=vector-label>Query</div></div><div class=similarity-line style=left:16%;width:30%></div><div class="vector-dot close-dot"><div class=vector-label>Similar Doc</div></div><div class="vector-dot far-dot"><div class=vector-label>Different Doc</div></div></div></div><div class=arrow>↓</div><div class=results-section><div class=step-title>Results Found:</div><div class="result-item vector-result">✅ "How to <span class=semantic-match>fix</span> your <span class=semantic-match>bike chain</span> quickly"</div><div class="result-item vector-result">✅ "<span class=semantic-match>Repair bicycle gear system</span>" (semantically similar!)</div><div class="result-item vector-result">✅ "Bicycle maintenance and <span class=semantic-match>troubleshooting</span>"</div></div></div></div><div class=insight-box><h3>💡 Key Difference</h3><p><strong>Keyword search</strong> only finds documents with exact word matches, potentially missing relevant content that uses different terminology.<p><strong>Vector search</strong> finds documents that are semantically related, even when they use completely different words like "repair" instead of "fix" or "bicycle" instead of "bike".</div></div>'''))

<hr/>

## 4. Vector Similarities
**Definition:**  
Vector similarity measures how “close” two vectors are.  

**Types:**  
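- **Cosine similarity:** Cosine of the angle between vectors; ignores their length  
- **Euclidean distance:** Straight-line distance; smaller means more similar  
- **Dot product:** Sum of element-wise products; grows with both alignment and vector length  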

**Need:**  
Similarity metrics allow ranking documents by relevance.  

**Example:**  
- Query: “What is AI?”  
- Candidate Docs → Calculate similarity scores → Pick top-k closest
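
The three metrics and the top-k step can be written out directly. This is a minimal sketch with hypothetical 3-D vectors standing in for real embeddings; the document names are made up for the example.

```python
import math

def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance between the two vector endpoints."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, candidates, k=2):
    """Rank candidate names by cosine similarity to the query, highest first."""
    ranked = sorted(
        candidates,
        key=lambda name: cosine_similarity(query, candidates[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical vectors, not produced by a real embedding model.
query = [1.0, 2.0, 0.0]
candidates = {
    "intro_to_ai": [2.0, 4.0, 0.0],  # same direction as the query, twice as long
    "ml_basics": [1.0, 1.0, 1.0],
    "cooking": [0.0, 0.0, 3.0],      # orthogonal to the query
}

for name, vec in candidates.items():
    print(name, cosine_similarity(query, vec), euclidean_distance(query, vec), dot(query, vec))
print(top_k(query, candidates))
```

Note that `intro_to_ai` has a cosine similarity of 1 with the query yet a non-zero Euclidean distance: cosine ignores vector length, the other two metrics do not.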

In [18]:
display(HTML('''<!doctypehtml><html lang=en><meta charset=UTF-8><meta content="width=device-width,initial-scale=1"name=viewport><title>Vector Similarity Measures</title><style>body{font-family:'Segoe UI',Tahoma,Geneva,Verdana,sans-serif;margin:0;padding:20px}.container{max-width:800px;margin:0 auto;background:#fff;border-radius:20px;padding:30px;box-shadow:0 20px 40px rgba(0,0,0,.1)}.header{text-align:center;margin-bottom:40px}.header h1{color:#2c3e50;font-size:2.5em;margin-bottom:10px}.subtitle{color:#7f8c8d;font-size:1.2em}.definition-box{background:linear-gradient(135deg,#74b9ff 0,#0984e3 100%);color:#fff;padding:25px;border-radius:15px;margin-bottom:30px;text-align:center}.definition-box h3{margin-top:0;font-size:1.4em}.similarity-types{display:grid;grid-template-columns:repeat(auto-fit,minmax(400px,1fr));gap:25px;margin-bottom:40px}.similarity-card{background:#f8f9fa;border-radius:15px;padding:25px;box-shadow:0 8px 20px rgba(0,0,0,.1)}.card-title{font-size:1.3em;font-weight:700;margin-bottom:15px;color:#2c3e50;text-align:center}.cosine-title{border-left:5px solid #e74c3c;padding-left:15px}.euclidean-title{border-left:5px solid #f39c12;padding-left:15px}.dot-title{border-left:5px solid #27ae60;padding-left:15px}.visual-container{background:#fff;border-radius:10px;padding:20px;margin:15px 0;border:2px solid #ecf0f1}.formula{background:#2c3e50;color:#fff;padding:10px;border-radius:8px;font-family:'Courier New',monospace;text-align:center;margin:10px 0;font-size:.9em}.example-section{background:#fff;border-radius:15px;padding:25px;margin-top:30px;box-shadow:0 8px 20px rgba(0,0,0,.1)}.example-title{font-size:1.5em;font-weight:700;color:#2c3e50;text-align:center;margin-bottom:20px}.query-box{background:#e3f2fd;border:2px solid #2196f3;border-radius:10px;padding:15px;text-align:center;font-weight:700;color:#1976d2;margin-bottom:20px}.documents-grid{display:grid;grid-template-columns:repeat(auto-fit,minmax(200px,1fr));gap:15px;margin:20px 
0}.document-card{background:#f8f9fa;border-radius:10px;padding:15px;text-align:center;border:2px solid #dee2e6;transition:all .3s ease}.document-card:hover{transform:translateY(-2px);box-shadow:0 4px 12px rgba(0,0,0,.15)}.doc-title{font-weight:700;color:#2c3e50;margin-bottom:10px;font-size:.9em}.similarity-score{font-size:1.2em;font-weight:700;padding:8px;border-radius:6px;margin:8px 0}.high-sim{background:#d4edda;color:#155724}.medium-sim{background:#fff3cd;color:#856404}.low-sim{background:#f8d7da;color:#721c24}.ranking{background:#e9ecef;padding:5px 10px;border-radius:15px;font-size:.8em;font-weight:700;color:#495057}svg{width:100%;height:300px}.vector-arrow{marker-end:url(#arrowhead);stroke-width:3}.query-vector{stroke:#3498db}.doc1-vector{stroke:#e74c3c}.doc2-vector{stroke:#f39c12}.doc3-vector{stroke:#95a5a6}.angle-arc{fill:none;stroke-width:2;opacity:.7}.distance-line{stroke:#34495e;stroke-width:2;stroke-dasharray:5,5;opacity:.6}.vector-label{font-size:12px;font-weight:700;text-anchor:middle}.insight-box{background:linear-gradient(135deg,#55efc4 0,#00b894 100%);color:#fff;padding:25px;border-radius:15px;margin-top:30px}.insight-box h3{margin-top:0;font-size:1.3em}</style><div class=container><div class=header><h1>Vector Similarity Measures</h1><p class=subtitle>How to measure "closeness" between vectors</div><div class=definition-box><h3>📏 Definition</h3><p>Vector similarity measures how "close" two vectors are, allowing us to rank documents by relevance to a query.</div><div class=similarity-types><div class=similarity-card><div class="card-title cosine-title">📐 Cosine Similarity</div><p><strong>Measures:</strong> Angle between vectors<p><strong>Range:</strong> -1 to 1 (1 = identical direction)<div class=formula>cos(θ) = (A·B) / (|A|×|B|)</div><div class=visual-container><svg viewBox="0 0 300 200"><defs><marker id=arrowhead1 markerHeight=7 markerWidth=10 orient=auto refX=9 refY=3.5><polygon fill=#e74c3c points="0 0, 10 3.5, 0 7"/></marker></defs><circle 
cx=50 cy=150 fill=#2c3e50 r=3 /><line stroke=#3498db stroke-width=3 x1=50 x2=180 y1=150 y2=80 marker-end=url(#arrowhead1) /><line stroke=#e74c3c stroke-width=3 x1=50 x2=200 y1=150 y2=100 marker-end=url(#arrowhead1) /><path d="M 80 150 A 30 30 0 0 0 85 135"fill=none stroke=#f39c12 stroke-width=2 /><text x=115 y=110 class=vector-label fill=#3498db>Query</text><text x=140 y=95 class=vector-label fill=#e74c3c>Doc</text><text x=95 y=145 class=vector-label fill=#f39c12>θ</text><text x=150 y=180 style=font-size:10px;text-anchor:middle;fill:#7f8c8d>Small angle = High similarity</text></svg></div></div><div class=similarity-card><div class="card-title euclidean-title">📏 Euclidean Distance</div><p><strong>Measures:</strong> Straight-line distance<p><strong>Range:</strong> 0 to ∞ (0 = identical)<div class=formula>d = √[(x₁-x₂)² + (y₁-y₂)²]</div><div class=visual-container><svg viewBox="0 0 300 200"><defs><marker id=arrowhead2 markerHeight=7 markerWidth=10 orient=auto refX=9 refY=3.5><polygon fill=#f39c12 points="0 0, 10 3.5, 0 7"/></marker></defs><defs><pattern height=20 id=smallGrid patternUnits=userSpaceOnUse width=20><path d="M 20 0 L 0 0 0 20"fill=none stroke=#ecf0f1 /></pattern></defs><rect fill=url(#smallGrid) height=100% opacity=0.5 width=100% /><circle cx=80 cy=120 fill=#3498db r=6 /><circle cx=180 cy=80 fill=#e74c3c r=6 /><line stroke=#f39c12 stroke-width=3 x1=80 x2=180 y1=120 y2=80 stroke-dasharray=5,5 /><text x=80 y=140 class=vector-label fill=#3498db>Query</text><text x=180 y=70 class=vector-label fill=#e74c3c>Doc</text><text x=130 y=95 class=vector-label fill=#f39c12>Distance</text><text x=150 y=180 style=font-size:10px;text-anchor:middle;fill:#7f8c8d>Short distance = High similarity</text></svg></div></div><div class=similarity-card><div class="card-title dot-title">⚖️ Dot Product</div><p><strong>Measures:</strong> Weighted overlap<p><strong>Range:</strong> -∞ to ∞ (higher = more similar)<div class=formula>A·B = Σ(aᵢ × bᵢ)</div><div class=visual-container><svg 
viewBox="0 0 300 200"><defs><marker id=arrowhead3 markerHeight=7 markerWidth=10 orient=auto refX=9 refY=3.5><polygon fill=#27ae60 points="0 0, 10 3.5, 0 7"/></marker></defs><circle cx=50 cy=150 fill=#2c3e50 r=3 /><line stroke=#3498db stroke-width=3 x1=50 x2=160 y1=150 y2=90 marker-end=url(#arrowhead3) /><line stroke=#e74c3c stroke-width=3 x1=50 x2=180 y1=150 y2=70 marker-end=url(#arrowhead3) /><line stroke=#27ae60 stroke-width=2 x1=160 x2=140 y1=90 y2=110 stroke-dasharray=3,3 /><text x=105 y=115 class=vector-label fill=#3498db>Query</text><text x=125 y=75 class=vector-label fill=#e74c3c>Doc</text><text x=150 y=180 style=font-size:10px;text-anchor:middle;fill:#7f8c8d>Magnitude × Cosine = Dot Product</text></svg></div></div></div><div class=example-section><div class=example-title>🔍 Example: "What is AI?" Query Ranking</div><div class=query-box>Query: "What is AI?"</div><div class=documents-grid><div class=document-card><div class=doc-title>📄 "Introduction to Artificial Intelligence"</div><div class="similarity-score high-sim">Cosine: 0.95</div><div class="similarity-score high-sim">Distance: 0.12</div><div class="similarity-score high-sim">Dot Product: 8.7</div><div class=ranking>🥇 Rank #1</div></div><div class=document-card><div class=doc-title>📄 "Machine Learning Fundamentals"</div><div class="similarity-score medium-sim">Cosine: 0.78</div><div class="similarity-score medium-sim">Distance: 0.31</div><div class="similarity-score medium-sim">Dot Product: 6.4</div><div class=ranking>🥈 Rank #2</div></div><div class=document-card><div class=doc-title>📄 "Cooking Recipes for Beginners"</div><div class="similarity-score low-sim">Cosine: 0.05</div><div class="similarity-score low-sim">Distance: 2.8</div><div class="similarity-score low-sim">Dot Product: 0.3</div><div class=ranking>🥉 Rank #3</div></div></div><div style=text-align:center;margin-top:20px;color:#7f8c8d><strong>Result:</strong> Documents ranked by similarity → Return top-k most relevant 
documents</div></div><div class=insight-box><h3>💡 Key Insights</h3><p><strong>Cosine Similarity:</strong> Best for text where document length doesn't matter - focuses on direction/meaning.<p><strong>Euclidean Distance:</strong> Good when both direction and magnitude matter - considers absolute differences.<p><strong>Dot Product:</strong> Considers both similarity and magnitude - longer documents get higher scores.<p><strong>Usage in RAG:</strong> Calculate similarity between query vector and all document vectors, then retrieve the top-k most similar documents.</div></div>'''))

<hr/>

## 5. Embeddings
**Definition:**  
An **embedding** is the vector representation of data (text, image, etc.) produced by a model.  

**Need:**  
Embeddings turn unstructured data into a numerical form that’s machine-comparable.  

**Example:**  
- Text “AI is powerful” → embedding model → [0.12, 0.88, -0.34, …]  
- Used for semantic search, clustering, classification.
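
To make the "text in, fixed-length vector out" contract concrete, here is a toy stand-in for an embedding model. It is feature hashing over a bag of words, not a learned model, so it reflects shared words rather than meaning; the function name `toy_embed` and the 8-dimensional size are assumptions made for the example.

```python
import hashlib
import math
import re

def toy_embed(text, dim=8):
    """Map text to a fixed-length unit vector by hashing each word to a slot.

    A stand-in for a real embedding model: the output has the right shape,
    but it encodes word overlap, not semantics.
    """
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0  # first hash byte picks the slot
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

embedding = toy_embed("AI is powerful")
print(len(embedding), embedding)
```

Whatever the input length, the output always has `dim` entries, which is what lets any two texts be compared with the same similarity function.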
<hr/>

## 6. Embedding Models and Their Differences
**Definition:**  
Different models generate embeddings differently → trade-offs in accuracy, size, and domain specialization.  

**Types of differences:**  
- **Dimensionality:** Some models give 384D vectors, others 1536D.  
- **Training data:** General-purpose vs domain-specific.  
- **Speed vs Accuracy:** Higher-dimensional embeddings can capture more nuance but cost more to compute, store, and compare.  

**Examples:**  
- **OpenAI text-embedding-3-small** → smaller, cheaper, faster.  
- **OpenAI text-embedding-3-large** → more accurate for nuanced meanings.  
- **Sentence-BERT** → optimized for sentence-level tasks.  

**Comparison Table:**  

| Model                          | Dimensions            | Speed  | Cost               | Accuracy |
|--------------------------------|-----------------------|--------|--------------------|----------|
| OpenAI text-embedding-3-small  | 1536                  | Fast   | Low                | Good     |
| OpenAI text-embedding-3-large  | 3072                  | Medium | Medium             | Higher   |
| Sentence-BERT (SBERT)          | 768 (varies by model) | Fast   | Free (self-hosted) | Medium   |
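
Dimensionality translates directly into storage. The arithmetic below is a rough sketch that assumes vectors are stored as 32-bit floats and ignores index overhead; the corpus size of one million vectors is a hypothetical figure.

```python
BYTES_PER_FLOAT32 = 4

def raw_size_gb(num_vectors, dimensions):
    """Raw storage for float32 vectors in decimal gigabytes (no index overhead)."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 / 1e9

# Hypothetical corpus of one million chunks, at the dimensions from the table.
for dims in (768, 1536, 3072):
    print(f"{dims} dims x 1,000,000 vectors: {raw_size_gb(1_000_000, dims):.2f} GB")
```

Doubling the dimension doubles both the storage and the work per similarity comparison, which is the cost side of the speed-versus-accuracy trade-off.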

<hr/>

## 7. Vector Databases
**Definition:**  
A **vector database** is a specialized system to store and search vectors efficiently.  

**Need:**  
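- With billions of vectors, comparing the query against every stored vector (brute force) is too slow.  
- An index such as Approximate Nearest Neighbor (ANN) search trades a little accuracy for much faster lookups.  
- Metadata filtering narrows the search (e.g., “find documents about finance”).  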

**Example use cases:**  
- Semantic search across millions of documents.  
- Image similarity search (find “similar shoes”).  
- Recommendation systems.  
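
The baseline that a vector database improves on is exact (brute-force) search combined with a metadata filter. The sketch below shows that baseline only; it does not implement ANN indexing, and the records, field names, and 2-D vectors are hypothetical.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query, records, k=2, category=None):
    """Exact top-k search: filter by metadata, then score every remaining vector."""
    pool = [r for r in records if category is None or r["category"] == category]
    return heapq.nlargest(k, pool, key=lambda r: cosine(query, r["vector"]))

# Hypothetical records: an id, one metadata field, and a toy 2-D vector.
RECORDS = [
    {"id": "q3-report", "category": "finance", "vector": [0.9, 0.1]},
    {"id": "budget-memo", "category": "finance", "vector": [0.7, 0.4]},
    {"id": "team-offsite", "category": "hr", "vector": [0.95, 0.05]},
    {"id": "tax-guide", "category": "finance", "vector": [0.1, 0.9]},
]

hits = search([1.0, 0.0], RECORDS, k=2, category="finance")
print([h["id"] for h in hits])
```

Every query scores the whole filtered pool, so the cost grows linearly with the number of vectors. An ANN index avoids that by visiting only a small part of the data.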

In [29]:
display(HTML("""<!doctypehtml><html lang=en><meta charset=UTF-8><meta content="width=device-width,initial-scale=1"name=viewport><title>Vector Databases</title><style>body{font-family:'Segoe UI',Tahoma,Geneva,Verdana,sans-serif;margin:0;padding:20px;min-height:100vh}.container{max-width:800px;margin:0 auto;background:#fff;border-radius:20px;padding:30px;box-shadow:0 20px 40px rgba(0,0,0,.1)}.header{text-align:center;margin-bottom:40px}.header h1{color:#2c3e50;font-size:2.5em;margin-bottom:10px}.subtitle{color:#7f8c8d;font-size:1.2em}.concept-grid{display:grid;grid-template-columns:1fr 1fr;gap:30px;margin-bottom:40px}.definition-box{background:linear-gradient(135deg,#74b9ff 0,#0984e3 100%);color:#fff;padding:25px;border-radius:15px;box-shadow:0 8px 20px rgba(116,185,255,.3)}.definition-box h3{margin-top:0;font-size:1.4em}.need-box{background:linear-gradient(135deg,#fd79a8 0,#e84393 100%);color:#fff;padding:25px;border-radius:15px;box-shadow:0 8px 20px rgba(253,121,168,.3)}.need-box h3{margin-top:0;font-size:1.4em}.need-box ul{margin:0;padding-left:20px}.need-box li{margin-bottom:8px}.comparison-section{background:#f8f9fa;border-radius:15px;padding:25px;margin-bottom:30px}.comparison-title{text-align:center;font-size:1.5em;font-weight:700;color:#2c3e50;margin-bottom:20px}.comparison-grid{display:grid;grid-template-columns:1fr 1fr;gap:20px}.search-method{background:#fff;border-radius:12px;padding:20px;box-shadow:0 4px 12px rgba(0,0,0,.1)}.method-header{text-align:center;padding:10px;border-radius:8px;font-weight:700;margin-bottom:15px}.brute-force{background:#ffebee;color:#c62828}.ann-search{background:#e8f5e8;color:#2e7d32}.method-details{color:#555;line-height:1.6}.performance-bar{background:#ecf0f1;height:8px;border-radius:4px;margin:10px 0;overflow:hidden}.perf-fill{height:100%;transition:width .3s ease}.slow{background:#e74c3c;width:20%}.fast{background:#27ae60;width:95%}.process-flow{background:#fff;border-radius:15px;padding:25px;margin-bottom:30px;box-shadow:0 
8px 20px rgba(0,0,0,.1)}.flow-title{text-align:center;font-size:1.5em;font-weight:700;color:#2c3e50;margin-bottom:30px}.flow-container{display:flex;align-items:center;justify-content:space-between;flex-wrap:wrap;gap:15px}.flow-step{background:#f8f9fa;border:2px solid #dee2e6;border-radius:12px;padding:20px;text-align:center;min-width:150px;position:relative;transition:all .3s ease}.flow-step:hover{transform:translateY(-3px);box-shadow:0 6px 20px rgba(0,0,0,.15)}.step-icon{font-size:2em;margin-bottom:10px}.step-title{font-weight:700;color:#2c3e50;margin-bottom:5px}.step-desc{color:#6c757d;font-size:.9em}.flow-arrow{font-size:1.5em;color:#3498db;margin:0 10px}.use-cases{display:grid;grid-template-columns:repeat(auto-fit,minmax(300px,1fr));gap:20px;margin-bottom:30px}.use-case{background:#fff;border-radius:12px;padding:20px;box-shadow:0 4px 12px rgba(0,0,0,.1);border-left:5px solid;transition:transform .3s ease}.use-case:hover{transform:translateY(-3px)}.semantic-search{border-left-color:#3498db}.image-search{border-left-color:#e74c3c}.recommendations{border-left-color:#f39c12}.use-case-title{font-weight:700;color:#2c3e50;margin-bottom:10px;font-size:1.1em}.use-case-desc{color:#6c757d;line-height:1.5}.scale-demo{background:linear-gradient(135deg,#667eea 0,#764ba2 100%);color:#fff;border-radius:15px;padding:25px;text-align:center;margin-bottom:30px}.scale-demo h3{margin-top:0;font-size:1.4em}.scale-numbers{display:flex;justify-content:space-around;margin:20px 0}.scale-item{text-align:center}.scale-number{font-size:2em;font-weight:700;display:block}.scale-label{font-size:.9em;opacity:.9}.metadata-section{background:#fff3e0;border-radius:12px;padding:20px;margin-top:20px;border-left:5px solid #ff9800}.metadata-title{font-weight:700;color:#e65100;margin-bottom:10px}.filter-example{background:#fff;border-radius:8px;padding:15px;margin:10px 0;border:1px solid #ffcc02;font-family:'Courier New',monospace;color:#d84315}</style><div class=container><div class=header><h1>Vector 
Databases</h1><p class=subtitle>Specialized systems for efficient vector storage and search</div><div class=concept-grid><div class=definition-box><h3>🗄️ Definition</h3><p>A <strong>vector database</strong> is a specialized system designed to store and search vectors efficiently at massive scale.</div><div class=need-box><h3>⚡ Why We Need Vector Databases</h3><ul><li>Billions of vectors = can't brute-force search<li>Need efficient indexing (ANN search)<li>Metadata filtering capabilities<li>Real-time performance requirements</ul></div></div><div class=scale-demo><h3>📊 Scale Challenge</h3><div class=scale-numbers><div class=scale-item><span class=scale-number>1B+</span> <span class=scale-label>Vectors stored</span></div><div class=scale-item><span class=scale-number>&lt;100ms</span> <span class=scale-label>Search time needed</span></div><div class=scale-item><span class=scale-number>1000s</span> <span class=scale-label>Concurrent queries</span></div></div><p>Traditional databases can't handle this scale for vector similarity search!</div><div class=comparison-section><div class=comparison-title>🚀 Brute Force vs ANN Search</div><div class=comparison-grid><div class=search-method><div class="method-header brute-force">❌ Brute Force Search</div><div class=method-details><p><strong>Process:</strong> Compare query to every single vector<p><strong>Accuracy:</strong> 100% (finds exact matches)<p><strong>Speed:</strong> Very slow for large datasets<div class=performance-bar><div class="perf-fill slow"></div></div><p><strong>Complexity:</strong> O(n) - linear with data size</div></div><div class=search-method><div class="method-header ann-search">✅ ANN (Approximate Nearest Neighbor)</div><div class=method-details><p><strong>Process:</strong> Use smart indexing to skip irrelevant vectors<p><strong>Accuracy:</strong> ~95-99% (good enough for most cases)<p><strong>Speed:</strong> Very fast, even with billions of vectors<div class=performance-bar><div class="perf-fill 
fast"></div></div><p><strong>Complexity:</strong> O(log n) - logarithmic scaling</div></div></div></div><div class=process-flow><div class=flow-title>🔍 Vector Database Query Flow</div><div class=flow-container><div class=flow-step><div class=step-icon>🎯</div><div class=step-title>Query Vector</div><div class=step-desc>User's search converted to vector</div></div><div class=flow-arrow>→</div><div class=flow-step><div class=step-icon>🗂️</div><div class=step-title>ANN Index</div><div class=step-desc>Smart search through optimized index</div></div><div class=flow-arrow>→</div><div class=flow-step><div class=step-icon>🏆</div><div class=step-title>Top-k Similar</div><div class=step-desc>Find k most similar vectors</div></div><div class=flow-arrow>→</div><div class=flow-step><div class=step-icon>📄</div><div class=step-title>Return Documents</div><div class=step-desc>Retrieve original documents</div></div></div></div><div class=metadata-section><div class=metadata-title>🏷️ Metadata Filtering</div><p>Vector databases can combine similarity search with traditional filters:<div class=filter-example>SELECT * FROM documents WHERE vector_similarity(query_vector, doc_vector) > 0.8 AND category = 'finance' AND date > '2023-01-01'</div><p>This allows queries like: "Find documents about finance that are similar to my query"</div><div class=use-cases><div class="use-case semantic-search"><div class=use-case-title>📚 Semantic Search</div><div class=use-case-desc>Search across millions of documents by meaning, not just keywords. 
Find relevant content even when different words are used.</div></div><div class="use-case image-search"><div class=use-case-title>🖼️ Image Similarity</div><div class=use-case-desc>"Find similar shoes" - compare visual features encoded as vectors to find products with similar appearance or style.</div></div><div class="use-case recommendations"><div class=use-case-title>🎯 Recommendation Systems</div><div class=use-case-desc>User preferences and item features as vectors. Find similar users or recommend similar products based on past behavior.</div></div></div><div style="background:linear-gradient(135deg,#55efc4 0,#00b894 100%);color:#fff;padding:25px;border-radius:15px;text-align:center"><h3 style=margin-top:0>💡 Key Insight</h3><p>Vector databases make the impossible possible: searching through billions of high-dimensional vectors in real-time. They're the backbone that enables RAG, semantic search, and AI-powered applications to work at scale.</div></div>"""))

<hr/>

## 8. Pinecone, PGVector, and FAISS
**Overview:**  
Three common options for storing and searching vectors (a managed service, a database extension, and a library):  

1. **Pinecone** (Managed SaaS)  
   - Cloud-native, fully managed.  
   - Easy to integrate, scalable.  
   - Best for production without ops overhead.  

2. **PGVector** (Postgres extension)  
   - Store vectors inside PostgreSQL.  
   - Good for projects already using Postgres.  
   - Not as fast as specialized DBs for huge scale.  

3. **FAISS** (Facebook AI Similarity Search)  
   - Open-source library by Meta.  
   - Runs on CPU, with optional GPU acceleration for very fast search.  
   - A library, not a server: storage, serving, and scaling are up to you.  

**Example use cases:**  
- **Pinecone** → AI-powered search in SaaS.  
- **PGVector** → Add semantic search to existing Postgres app.  
- **FAISS** → Large-scale research prototypes, GPU acceleration.  

**Comparison Table:**  

| Feature      | Pinecone        | PGVector           | FAISS              |
|--------------|----------------|-------------------|--------------------|
| Type         | Managed cloud   | DB extension       | Library            |
| Scale        | Enterprise      | Small–medium       | Huge (GPU-ready)   |
| Ease of use  | Easy            | Moderate           | Harder             |
| Best for     | SaaS, Prod apps | Postgres projects  | Custom ML systems  |

<hr/>

## 9. Agentic RAG
**Definition / Idea:**  
Agentic RAG extends standard RAG by giving the LLM **agency**: the ability to plan, reason, and decide **what to retrieve** and **how to use it**, instead of just retrieving top-k results.  

**Need:**  
- Standard RAG may retrieve irrelevant or redundant chunks.  
- Agentic RAG allows the LLM to dynamically:  
  - Reformulate queries  
  - Decide which sources to query  
  - Iterate retrieval until it has enough context  

**Example:**  
- Query: “Summarize the last quarter’s financial trends.”  
- Standard RAG → Pulls fixed top-k chunks.  
- Agentic RAG → LLM first searches for revenue data, then expenses, then compares, and finally writes a summary.  
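
The plan, retrieve, and repeat loop can be sketched without a real model. In this illustration the planning step is a plain function standing in for the LLM, the retriever is an exact dictionary lookup standing in for vector search, and the corpus text is placeholder content; all names are hypothetical.

```python
# Hypothetical corpus keyed by topic; values are placeholder chunks.
CORPUS = {
    "revenue": "Revenue figures for the last quarter (placeholder text).",
    "expenses": "Expense figures for the last quarter (placeholder text).",
    "hiring": "Headcount notes for the last quarter (placeholder text).",
}

def plan_next_query(goal_topics, gathered):
    """Stand-in for the LLM's planning step: pick the first topic not yet covered."""
    for topic in goal_topics:
        if topic not in gathered:
            return topic
    return None  # nothing missing, so enough context has been collected

def retrieve(topic):
    """Stand-in retriever: exact lookup instead of vector search."""
    return CORPUS.get(topic)

def agentic_rag(goal_topics, max_steps=5):
    """Alternate planning and retrieval until the planner reports nothing is missing."""
    gathered = {}
    for _ in range(max_steps):
        topic = plan_next_query(goal_topics, gathered)
        if topic is None:
            break
        gathered[topic] = retrieve(topic) or "(no result)"
    return gathered

context = agentic_rag(["revenue", "expenses"])
print(context)
```

The `max_steps` cap is a design choice worth keeping in any real agent loop: it bounds cost if the planner never decides it has enough context.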

<img src="./data/agentic_rag.webp" alt="Agentic RAG" width="800">

<hr/>

## 10. Chunking Techniques for RAG
**Definition / Idea:**  
Chunking is the process of breaking large documents into smaller pieces before embedding them.  

**Need:**  
- LLMs have token limits → entire documents can’t fit.  
- Smaller chunks improve retrieval precision.  
- But chunks that are too small may lose context.  

**Common Techniques:**  
1. **Fixed-size chunks** → e.g., 500 tokens each.  
2. **Sliding window** → overlap between chunks to preserve context.  
3. **Semantic chunking** → split by meaning (e.g., headings, paragraphs).  
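
The first two techniques can be sketched as follows. For simplicity the example counts words rather than tokens, which is an assumption; a real pipeline would count tokens with the tokenizer of its embedding model.

```python
def chunk_fixed(words, size):
    """Split a word list into consecutive chunks of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]

def chunk_sliding(words, size, overlap):
    """Split into chunks of `size` words where neighbours share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks

words = [f"w{i}" for i in range(10)]
print(chunk_fixed(words, 4))
print(chunk_sliding(words, 4, 2))
```

With overlap, a sentence that straddles a boundary appears whole in at least one chunk, at the price of storing some words twice.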

**Example:**  
- PDF with 30 pages → split into sections per heading → embed each section → store in vector DB.
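
Splitting per heading can be sketched like this. The example assumes the PDF text has already been extracted and that headings are marked Markdown-style with `#`; both are assumptions, and the sample text is hypothetical. Embedding and storage are left out.

```python
import re

def split_by_heading(markdown_text):
    """Return (heading, body) pairs, one per Markdown heading line."""
    sections = []
    heading, body = None, []
    for line in markdown_text.splitlines():
        if re.match(r"#{1,6}\s", line):
            # A new heading closes the section collected so far.
            if heading is not None or body:
                sections.append((heading, "\n".join(body).strip()))
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    if heading is not None or body:
        sections.append((heading, "\n".join(body).strip()))
    return sections

# Hypothetical document text, standing in for text extracted from a PDF.
doc = "# Leave\nTwenty days per year.\n\n# Travel\nBook early.\nKeep receipts."
for title, text in split_by_heading(doc):
    print(title, "->", text)
```

Each `(heading, body)` pair would then be embedded and stored as one record, with the heading kept as metadata for filtering.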