<a href="https://colab.research.google.com/github/abhijeet3922/vision-RAG/blob/main/1_search_with_colpali_late_interaction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Vision-Augmented Search**

This notebook demonstrates how to run a query search over a PDF, including the infographics it contains.
- [Installing Libraries](#)
- [Loading Visual Language Model (VLM): ColPali](#)
- [Read PDF as Images](#)
- [Convert PDF pages to Vision Embedding](#)
- [Convert Query into Embedding](#)
- [Retrieve the best matched Page](#)

### Install Libraries

In [None]:
!pip install colpali-engine==0.3.2
!pip install pdf2image
!sudo apt-get install -y poppler-utils

### Load Visual Language Model (ColPali)

In [None]:
import torch
from colpali_engine.models import ColPali, ColPaliProcessor

model_name = "vidore/colpali-v1.3"

model = ColPali.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",  # or "mps" if on Apple Silicon
).eval()

processor = ColPaliProcessor.from_pretrained(model_name)

### Read PDF data
- Convert it to images (one per page)

In [None]:
from pdf2image import convert_from_path
images = convert_from_path('/content/google-alphabet-2024.pdf')

In [None]:
print("Number of pages:", len(images))

### Convert Pages into Image Embeddings

In [None]:
from torch.utils.data import DataLoader

# One page per batch; collate_fn preprocesses the images and moves them to the model's device.
dataloader = DataLoader(
    images,
    batch_size=1,
    shuffle=False,
    collate_fn=lambda x: processor.process_images(x).to(model.device),
)

In [None]:
from tqdm import tqdm

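dataset = []  # one multi-vector embedding tensor per page
for batch in tqdm(dataloader):
  with torch.no_grad():
    # Batches are already on the model's device (collate_fn moves them there).
    embeddings = model(**batch)
  # Split the batch into per-page tensors and store them on the CPU as float32.
  dataset.extend(list(torch.unbind(embeddings.to("cpu").to(torch.float32))))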

### Convert User Query into Embeddings

In [None]:
query = ["What is the revenue from Google Cloud for 2023 and 2024 ?"]

batch_queries = processor.process_queries(query).to(model.device)
with torch.no_grad():
  query_embeddings = model(**batch_queries)

### Retriever: Find Best Matching Page
- Uses in-memory late interaction scoring
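
Late interaction is usually computed as a MaxSim score: each query token is matched to its most similar page patch, and those best similarities are summed. The NumPy snippet below is a minimal sketch of that idea on toy data. The function name `maxsim_score` and the toy arrays are illustrative assumptions, and this is not the `colpali_engine` implementation.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """MaxSim score between one query and one page (hypothetical helper).

    query_emb: (n_query_tokens, dim); page_emb: (n_page_patches, dim).
    """
    # Dot-product similarity of every query token with every page patch.
    sim = query_emb @ page_emb.T  # shape: (n_query_tokens, n_page_patches)
    # Keep each query token's best-matching patch, then sum over query tokens.
    return float(sim.max(axis=1).sum())

# Toy data (made up): 2 query tokens, pages with 3 and 2 patches, dim = 2.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
page_a = np.array([[1.0, 0.0], [0.0, 0.5], [0.25, 0.25]])
page_b = np.array([[0.5, 0.5], [0.25, 0.75]])

toy_scores = [maxsim_score(query, p) for p in (page_a, page_b)]
print(toy_scores)  # → [1.5, 1.25]
```

Because every query token picks its own best patch, pages with different numbers of patches can be compared without pooling them into a single vector first.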

In [None]:
import numpy as np

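def score(query_embedding, dataset):
  # Late-interaction scores: one row per query, one column per page.
  scores = processor.score_multi_vector(query_embedding, dataset)
  scores = np.array(scores)
  # Page indices sorted from best to worst match (assumes a single query).
  matched_pages = scores.flatten().argsort()[::-1]
  return scores, matched_pages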

In [None]:
scores, matched_pages = score(list(torch.unbind(query_embeddings.to("cpu").to(torch.float32))), dataset)

In [None]:
scores, matched_pages
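
The raw output above uses zero-based indices into `images`. To read the result as PDF page numbers, a small helper along these lines may be convenient. This is a sketch only: `top_k_pages` is a hypothetical name, and the example scores are made up rather than taken from the model.

```python
import numpy as np

def top_k_pages(scores, k=3):
    """Return [(page_number, score), ...] for the k best pages of one query."""
    flat = np.asarray(scores, dtype=float).ravel()
    # Indices of the highest scores first, truncated to k entries.
    order = np.argsort(flat)[::-1][:k]
    # Index 0 corresponds to page 1 of the PDF, so shift by one.
    return [(int(i) + 1, float(flat[i])) for i in order]

# Made-up scores for a 4-page document and a single query.
example_scores = np.array([[11.5, 17.25, 9.0, 14.75]])
print(top_k_pages(example_scores, k=2))  # → [(2, 17.25), (4, 14.75)]
```

With the notebook's own variables, `top_k_pages(scores, k=3)` would list the three best pages for the query.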

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(16, 12))
ax.imshow(images[matched_pages[0]])
ax.axis("off")
plt.show()