# 🚀 Multi-Vector Image Retrieval Optimization

Welcome! This is an **open-ended project** where you'll use **Jupyter AI** to implement vector optimization techniques. Jupyter AI is an AI-powered coding assistant built directly into JupyterLab: describe what you want to build in the chat panel, and it generates code for you to run.

---

**Goal:** Optimize multi-vector search using **Scalar Quantization** with optional **HNSW** for speed

> ### 🎯 What You'll Achieve:
> **75% memory reduction** + ⚡ **Faster, more efficient search**, with **little to no accuracy loss** on this small dataset!

**What's pre-built:** Baseline ColPali search with 5 paper pages already indexed

**Your task:** Implement scalar quantization and see the memory/speed trade-offs!

---

![Jupyter AI Chat](images/jupyter_chat_bordered.png)

> 💡 **New to Jupyter AI?** Learn more at [JupyterAI: Coding in Notebooks](https://www.deeplearning.ai/short-courses/jupyter-ai-coding-in-notebooks/)

> ⚠️ **Session expires in 2 hours** - download `project.ipynb` regularly!

---

### ⚡ IMPORTANT: Attach Files to Jupyter AI

**Before asking Jupyter AI for help, attach these files to your prompts:**
- `project.ipynb` - Your code and progress
- `spec.md` - Optimization challenge details  
- `docs.md` - Qdrant quantization API reference

This gives Jupyter AI the context it needs to generate correct code!

## Setup: Load Baseline

Run the cell below to:
1. Load pre-computed ColPali embeddings for 5 pages of the "Attention Is All You Need" paper
2. Create Qdrant collection and index images
3. Load ColPali model
4. Run baseline search for: `"transformer architecture diagram"`

You'll see results immediately!
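
The baseline collection compares multi-vector embeddings with MaxSim. As a rough illustration of that scoring rule (not the code inside `baseline_helper` or Qdrant), each query vector is matched to its best-scoring page vector and those maxima are summed. The sketch assumes dot-product similarity; the toy 2-dimensional vectors and page names are made up for illustration, whereas the real ColPali vectors have 128 dimensions.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vectors: Sequence[Vector], page_vectors: Sequence[Vector]) -> float:
    """Sum, over query vectors, of the best dot product against any page vector."""
    return sum(max(dot(q, p) for p in page_vectors) for q in query_vectors)


# Hypothetical toy data: two query vectors and two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page-a": [[1.0, 0.0], [0.0, 1.0]],  # has a good match for both query vectors
    "page-b": [[1.0, 0.0], [1.0, 0.0]],  # only matches the first query vector
}

scores = {name: maxsim(query, vectors) for name, vectors in pages.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

print(scores)
print(ranking)
```

Because every query vector is compared with every page vector, the cost of a brute-force MaxSim search grows with the total number of stored vectors, which is why the memory per vector matters.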

> 💡 **Want to verify the search results?** All page images are available in the `attention_paper/` folder (page-0.png through page-9.png). Check them to see if you agree with the ranking!

```python
from baseline_helper import BaselineSetup

# Run complete baseline setup
baseline = BaselineSetup()
query_embedding, baseline_metrics = baseline.setup_all()

print("\n✅ Baseline ready! Now choose your optimization below...")
```

```
📦 Loading pre-computed embeddings...
✅ Loaded 5 images
   Total vectors: 5120
   Vector dimension: 128
   Memory (float32): 2.50 MB

🗄️  Creating baseline collection...
✅ Indexed 5 images

🤖 Loading ColPali model...
```



## 🎯 Your Task: Implement Scalar Quantization

**Choose one approach:**
- **Option 1: Memory Optimization** (Scalar Quantization only)
- **Option 2: Memory + Speed** (Scalar Quantization + HNSW)

---

### Example Prompts for Jupyter AI

Copy ONE of these prompts to the Jupyter AI chatbot:

<details>
<summary><strong>Option 1: Scalar Quantization Only (Recommended - Start Here)</strong></summary>

```
Implement scalar quantization optimization for ColPali multi-vector search:

1. Import necessary Qdrant models (check docs.md for ScalarQuantization imports)
2. Create a new collection called "optimized" with scalar quantization enabled (INT8)
3. Use the same multi-vector configuration as baseline (MaxSim)
4. Disable HNSW for exact search: hnsw_config=HnswConfigDiff(m=0)
5. Index all 5 images from baseline.embeddings_df using their original embeddings
6. Run the same query as baseline using baseline.processor and baseline.model
7. Measure search time and store results
8. Calculate memory metrics (quantized uses 1 byte per dim vs baseline's 4 bytes)
9. Use print_comparison() from baseline_helper to show results

Expected results:
- 75% memory reduction (2.50 MB → 0.62 MB)
- Same or better accuracy (exact search)
- Similar speed (brute-force MaxSim)

Remember: Qdrant handles quantization internally, just send original float32 embeddings!
```
</details>

<details>
<summary><strong>Option 2: Scalar Quantization + HNSW (Advanced - Try Second)</strong></summary>

```
Implement scalar quantization with HNSW for combined memory + speed optimization:

1. Import necessary Qdrant models (check docs.md for ScalarQuantization imports)
2. Create a new collection called "optimized_hnsw" with scalar quantization enabled (INT8)
3. Use the same multi-vector configuration as baseline (MaxSim)
4. Enable HNSW for approximate search: hnsw_config=HnswConfigDiff(m=16, ef_construct=100)
5. Index all 5 images from baseline.embeddings_df using their original embeddings
6. Run the same query as baseline using baseline.processor and baseline.model
7. Measure search time and store results
8. Calculate memory metrics (same as scalar-only: 1 byte per dim)
9. Use print_comparison() from baseline_helper to show results

Expected results:
- 75% memory reduction (same as scalar-only)
- Potentially faster on large datasets (note: 5 images is too small to see HNSW benefits)
- Slightly lower accuracy (approximate search trade-off)

Key insight: HNSW shows speed benefits on collections with 1000+ documents, not 5 images!
```
</details>
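
The memory figures quoted in both prompts follow from simple arithmetic on the baseline output (5120 vectors of 128 dimensions). The sketch below reproduces them, assuming 1 MB means 1024 × 1024 bytes and counting only the raw vector values, not any index or storage overhead. The helper name `embedding_memory_mb` is made up for this example and is not part of `baseline_helper`.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: the notebook reports binary megabytes


def embedding_memory_mb(num_vectors: int, dim: int, bytes_per_value: int) -> float:
    """Raw size of the stored vector values, ignoring index overhead."""
    return num_vectors * dim * bytes_per_value / BYTES_PER_MB


baseline_mb = embedding_memory_mb(5120, 128, 4)   # float32: 4 bytes per dimension
quantized_mb = embedding_memory_mb(5120, 128, 1)  # int8: 1 byte per dimension
reduction = 1 - quantized_mb / baseline_mb

print(f"{baseline_mb:.2f} MB -> {quantized_mb:.3f} MB ({reduction:.0%} reduction)")
```

The quantized size is 0.625 MB, which the notebook shows as 0.62 MB when formatted to two decimals.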

---

**After copying a prompt:** Paste it into the Jupyter AI chat, then paste the generated code into the cell below.

```python
# 🎯 IMPLEMENT YOUR OPTIMIZATION HERE
# Paste the code generated by Jupyter AI

```


## 🎉 Congratulations!

You've completed the multi-vector optimization challenge!

### What You Learned

- **Scalar Quantization**: 75% memory reduction by compressing float32 → int8
- **Trade-offs**: Memory vs Speed vs Accuracy in vector search
- **HNSW**: Approximate search for speed (benefits appear on large datasets)
- **Stacking optimizations**: Quantization + HNSW for memory AND speed
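
To make the float32 → int8 idea concrete, here is a simplified min-max quantizer in plain Python. It is only a sketch of the general technique: Qdrant performs quantization internally and its actual scheme may differ (for example in how it chooses the value range), and the function names below are hypothetical.

```python
from array import array


def quantize_int8(values):
    """Map floats onto the 256 int8 levels between min(values) and max(values)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if all values are equal
    codes = [round((v - lo) / scale) - 128 for v in values]
    return codes, lo, scale


def dequantize_int8(codes, lo, scale):
    """Approximate reconstruction of the original floats."""
    return [(c + 128) * scale + lo for c in codes]


vector = [-0.50, -0.10, 0.00, 0.25, 0.50]  # made-up example values
codes, lo, scale = quantize_int8(vector)
restored = dequantize_int8(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(vector, restored))

# Compare storage: 4-byte floats versus 1-byte signed integers.
float_bytes = len(array("f", vector).tobytes())
int8_bytes = len(array("b", codes).tobytes())

print(codes)
print(f"max round-trip error: {max_error:.4f} (step size {scale:.4f})")
print(f"{float_bytes} bytes -> {int8_bytes} bytes")
```

The round-trip error is bounded by half a quantization step, which is why rankings often survive quantization even though individual values change slightly.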

### Key Takeaways

**Option 1 (Scalar Only):**
- ✅ 75% memory reduction
- ✅ No accuracy loss expected on this dataset (exact search)
- ✅ Simple implementation
- Best for: Memory-constrained environments

**Option 2 (Scalar + HNSW):**
- ✅ 75% memory reduction (same)
- ⚡ Faster on large datasets (1000+ documents)
- ⚠️ Slight accuracy trade-off (approximate search)
- Best for: Large-scale production systems

### Next Steps

1. **Compare both approaches**: Try both Option 1 and Option 2 to see how the HNSW settings affect speed and ranking
2. **Production tip**: Start with Scalar-only, add HNSW when scaling to 1000+ documents

### 🔬 Extensions to Try (After Session)

<details>
<summary><strong>Experiment with Different Queries</strong></summary>

Test how ColPali handles different semantic concepts:

```python
# In Cell 2, modify the query:
query_embedding, baseline_metrics = baseline.setup_all(
    query="attention mechanism"  # Try different queries!
)
```

**Try these queries:**
- `"attention mechanism"` - Focus on specific component
- `"encoder decoder stacks"` - More specific architecture detail
- `"multi head attention"` - Another key component
- `"figure 1"` - Meta query (searching by reference)
- `"model architecture"` - More general query

**What to observe:**
- Which pages rank highest for each query?
- Do text-heavy pages rank higher than diagram pages?
- How does quantization affect accuracy for different queries?
</details>

<details>
<summary><strong>Use More Pages from the Paper</strong></summary>

Index all 10 available pages instead of just 5:

```python
# In Cell 2, modify to use all 10 pages:
query_embedding, baseline_metrics = baseline.setup_all(
    available_pages=list(range(10))  # All 10 pages instead of default [0,1,2,3,4]
)
```

**Expected changes:**
- Memory: 2.50 MB → 5.00 MB (baseline)
- Memory: 0.62 MB → 1.25 MB (optimized)
- Still 75% reduction!
- More diverse search results
- HNSW benefits become slightly more noticeable

**What to observe:**
- Does the ranking change with more pages?
- How does search time scale with 2x the vectors?
- Is quantization accuracy still perfect?
</details>
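
Both extensions ask whether quantization changes the ranking. One simple way to check is to compare the ordered page ids returned by the baseline and by the optimized collection. The sketch below assumes you have already extracted those ids into plain lists; the rankings shown are invented, and the helper names are hypothetical rather than part of `baseline_helper`.

```python
def top_k_overlap(baseline_ids, optimized_ids, k):
    """Fraction of the baseline top-k that also appears in the optimized top-k."""
    base, opt = list(baseline_ids[:k]), list(optimized_ids[:k])
    if not base:
        return 1.0
    return len(set(base) & set(opt)) / len(base)


def same_order(baseline_ids, optimized_ids, k):
    """True if the first k results are identical and in the same order."""
    return list(baseline_ids[:k]) == list(optimized_ids[:k])


# Made-up rankings: the same three pages on top, but two of them swapped.
baseline_ranking = [2, 0, 3, 1, 4]
optimized_ranking = [2, 3, 0, 1, 4]

print(top_k_overlap(baseline_ranking, optimized_ranking, k=3))
print(same_order(baseline_ranking, optimized_ranking, k=3))
print(same_order(baseline_ranking, optimized_ranking, k=1))
```

Overlap tells you whether the same pages are retrieved, while the order check is stricter and catches swaps among near-tied pages.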

---

### 📦 Package as a Python Script

Want to turn your notebook into a reusable `.py` script? **Jupyter AI can help!** Ask it to:
- Extract your optimization code from Cell 4
- Package it as a standalone Python script
- Add command-line arguments for different queries and page ranges

This makes your optimization code production-ready!
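
As a starting point for the command-line part, a script could parse the query and page list with `argparse`. This is a minimal sketch: the flag names `--query` and `--pages` are assumptions chosen for this example, and the defaults mirror the baseline query and the default pages `[0, 1, 2, 3, 4]` used in this notebook.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical CLI options for a packaged optimization script."""
    parser = argparse.ArgumentParser(
        description="Run quantized ColPali search over selected paper pages."
    )
    parser.add_argument(
        "--query",
        default="transformer architecture diagram",
        help="Text query to search for.",
    )
    parser.add_argument(
        "--pages",
        type=int,
        nargs="+",
        default=[0, 1, 2, 3, 4],
        help="Page indices to index, e.g. --pages 0 1 2.",
    )
    return parser.parse_args(argv)


# Example: parse an explicit argument list instead of the real command line.
args = parse_args(["--query", "attention mechanism", "--pages", "0", "1", "2"])
print(args.query, args.pages)

# In the script, these values could then be passed on as in the extension cells:
# baseline.setup_all(query=args.query, available_pages=args.pages)
```

Calling `parse_args()` with no argument reads the real command line, so the same function works both in tests and in the finished script.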

---

## 📋 Optional Feedback Survey

We'd love to hear about your experience! Your feedback helps us create more valuable educational experiences.

**[Take the short survey →](https://rebrand.ly/xudvvw2)**

This optional survey asks about:

- Project quality and engagement
- What you found most valuable
- How we can improve future projects

Thank you for your time! 🙏

---

> ⚠️ **Session expires in 2 hours** - download your completed `project.ipynb` now!

> 📚 **Learn more**: Check out the full [Qdrant Optimization course](https://www.deeplearning.ai/short-courses/)