# CLIP Multimodal RAG — Demo

This notebook runs the **multimodal RAG** pipeline from the `clip_multimodal_rag` package: it indexes a PDF (text + images) with **CLIP**, retrieves relevant chunks with **FAISS**, and generates answers with **Google Gemini**.

Run the cells in order. Before you start, make sure `.env` defines `GOOGLE_API_KEY` and that the `pdf_path` set in cell 3 points to an existing PDF.
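Both prerequisites can be checked up front before any model is loaded. The sketch below is one possible way to do this; `preflight` is a hypothetical helper and is not part of the `clip_multimodal_rag` package. It assumes `load_dotenv()` has already copied `.env` into the process environment.

```python
import os
from pathlib import Path


def preflight(pdf_path, env=None):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for illustration, not part of clip_multimodal_rag.
    """
    env = os.environ if env is None else env
    problems = []
    # The notebook expects the Gemini key under this exact variable name.
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    # The PDF must exist on disk before it can be indexed.
    if not Path(pdf_path).is_file():
        problems.append(f"PDF not found: {pdf_path}")
    return problems
```

Calling `preflight("CV_HZolfaghari_new.pdf")` after `load_dotenv()` returns an empty list when both conditions hold, so a cell can stop early with a readable message instead of failing partway through indexing.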




In [1]:
# Make the package importable when the notebook is run from the project root
import sys
from pathlib import Path
root = Path.cwd()
if str(root / "src") not in sys.path:
    sys.path.insert(0, str(root / "src"))

from dotenv import load_dotenv
load_dotenv()

from clip_multimodal_rag import (
    CLIPEmbedder,
    PDFProcessor,
    MultimodalRetriever,
    MultimodalRAGPipeline,
)
from clip_multimodal_rag.config import get_google_api_key, CLIP_MODEL_ID, GEMINI_MODEL



True

In [2]:
# Require Google API key for Gemini
get_google_api_key()
print("GOOGLE_API_KEY found.")

In [3]:
# Load CLIP and process PDF
embedder = CLIPEmbedder(model_id=CLIP_MODEL_ID)
processor = PDFProcessor(chunk_size=500, chunk_overlap=100, embedder=embedder)

pdf_path = "CV_HZolfaghari_new.pdf"  # Change to your PDF
docs, embeddings, image_store = processor.process(pdf_path)
print(f"Indexed {len(docs)} chunks (text + images).")

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.


CLIP model loaded!
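The `chunk_size=500` and `chunk_overlap=100` arguments in cell 3 control how page text is split before embedding. How `PDFProcessor` actually splits is not shown in this notebook, so the sketch below is only an assumption: a plain character-based sliding window in which each chunk repeats the last `chunk_overlap` characters of the previous one. `chunk_text` is a hypothetical name.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=100):
    """Split text into fixed-size character windows that overlap.

    Illustrative only: the real splitter may work on separators or tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the notebook's settings each window advances by 400 characters, so a sentence that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.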


In [4]:
# Build retriever and RAG pipeline
retriever = MultimodalRetriever(embedder, docs, embeddings)
pipeline = MultimodalRAGPipeline(
    embedder, retriever, image_store,
    gemini_model=GEMINI_MODEL,
    top_k=5,
)
print("Pipeline ready. Use pipeline.query('Your question') to run RAG.")
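The `top_k=5` setting means each query pulls the five indexed chunks whose embeddings are closest to the query embedding. The sketch below illustrates that ranking step in plain Python as a stand-in for FAISS. It assumes cosine similarity as the metric, which this notebook does not confirm, and `cosine` and `top_k` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=5):
    """Return (index, score) pairs for the k most similar vectors.

    Brute-force ranking for illustration; FAISS does this with an index.
    """
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Because CLIP embeds text and images into the same vector space, the same ranking applies to both kinds of chunk, which is what lets one query return a mix of text passages and images.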

In [5]:
# Inspect indexed documents (optional)
print(f"Sample text chunk: {docs[0].page_content[:150]}...")
print(f"Image store keys: {list(image_store.keys())[:5]}...")

In [6]:
# Example: single query
answer = pipeline.query("Summarize the main findings from the document.")
print("Answer:", answer)

Document('CV_HZolfaghari_new.pdf')

In [7]:
# Run multiple example queries
queries = [
    "Summarize the main findings from the document",
    "What visual elements are present in the document?",
]
for q in queries:
    print("Query:", q)
    print("-" * 50)
    print(pipeline.query(q, verbose=True))
    print("=" * 70)

[Document(metadata={'page': 0, 'type': 'text'}, page_content='Hossein Zolfaghari\nAI/ML Engineer\nParis, France\nOpen to relocation\n\x83 (+33) 0753142705\n# hossein.xolf@gmail.com\nï LinkedIn\n§ GitHub\nAbout Me\nMachine Learning Engineer with 5+ years of experience building and deploying AI solutions across cloud and edge\ndevices. I specialize in multimodal deep learning, Generative AI, and MLOps, with a strong focus on delivering models\nthat reliably move from prototype to production. I enjoy designing scalable pipelines, improving model performance, and'),
 Document(metadata={'page': 0, 'type': 'text'}, page_content='collaborating with teams to turn complex ideas into practical, high-impact systems. I stay aligned with the latest AI\nadvancements while keeping solutions simple, efficient, and production-ready.\nWork Experiences\n• Machine Learning Engineer\n2023 – 2025\nSESA\nPerpignan, France\n– Developed a hybrid multimodal machine learning model for large time-series datasets 

In [8]:
# Add your own queries here
# pipeline.query("Your question here")

In [9]:
# Optional: save/load index (see package docs for persistence)
# retriever._store.save_local("index"); FAISS.load_local("index", _FakeEmbeddings(...))

`embedding_function` is expected to be an Embeddings object, support for passing in a function will soon be removed.


In [14]:
# Done. Use pipeline.query("...") for more questions.


Query: Summarize the main findings from the document
--------------------------------------------------

Retrieved 5 documents:
  - Text from page 1: Languages
• Languages: English: Fluent, French: Intermediate, Persian: Native
  - Text from page 1: • AWS | 2022, AWS Cloud Practitioner.
• Udemy | 2020, Data Science and ML in Python.
• Coursera | 20...
  - Text from page 1: Techincal Skills
• Generative AI & LLMs: Transformers, Multimodal
AI, Agentic AI, LangChain, RAG, Cr...
  - Text from page 0: collaborating with teams to turn complex ideas into practical, high-impact systems. I stay aligned w...
  - Text from page 0: feature engineering. The approach improved forecast accuracy by 5% over the previous model.
– Create...


Answer: The document describes an experienced Machine Learning and AI Engineer with a strong background in developing and deploying high-impact, production-ready AI systems.

Key findings include:
*   **Extensive Technical Expertise:** Proficient in Generative AI, 