# RAG ENHANCED VISUAL QUESTION ANSWERING

This notebook is for educational purposes only.
It accompanies the blog post "A Simple Framework for RAG Enhanced Visual Question Answering". Refer to the post and the code for further information.




## Install and Import

In [1]:
!pip install git+https://github.com/GabrieleSgroi/vision_rag

Collecting git+https://github.com/GabrieleSgroi/vision_rag
  Cloning https://github.com/GabrieleSgroi/vision_rag to /tmp/pip-req-build-xjs2nxe8
  Running command git clone --filter=blob:none --quiet https://github.com/GabrieleSgroi/vision_rag /tmp/pip-req-build-xjs2nxe8
  Resolved https://github.com/GabrieleSgroi/vision_rag to commit b41c851e2b68b14e4dbfca0660b9167c88835d5f
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: vision_rag
  Building wheel for vision_rag (pyproject.toml) ... done
  Created wheel for vision_rag: filename=vision_rag-0.0.1-py3-none-any.whl size=6056 sha256=161c712a6896df8a5c23cb8c46775f7436e262a230bf152fcc0ac1c45f52ceb8
  Stored in directory: /tmp/pip-ephem-wheel-cache-01hnvp8_/wheels/4c/c9/34/5cb41ca272f53005346a064602a613689acd4477a1e13c5091
Successfully built vision_rag
Installing collect

In [2]:
from vrag.pipeline import VisualQuestionRAG
from vrag.wiki_retrieval import WikipediaRetriever
from vrag.cfg import EmbeddingModelCfg, KeywordGenerationCfg, RetrievalCfg

In [3]:
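from vrag.cfg import ModelCfg

# Vision-language model used for keyword generation and answering.
cfg = ModelCfg(model="microsoft/Phi-3.5-vision-instruct",
               # trust_remote_code lets transformers run the custom model code from the Hub;
               # 'eager' selects the plain attention implementation (no flash attention needed).
               model_kwargs={'trust_remote_code':True, '_attn_implementation': 'eager'},
               # num_crops sets how many image crops the processor feeds to the model.
               processor_kwargs={'trust_remote_code':True, 'num_crops':16},
               # Let the weights be placed automatically on the available devices.
               device_map="auto")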

## Instantiate the model

In [4]:
retriever = WikipediaRetriever()
model = VisualQuestionRAG(retriever, cfg=cfg)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


## Generation

In [5]:
question = "How much does this species weigh?"
img = "https://images.unsplash.com/photo-1589656966895-2f33e7653819?utm_medium=medium&w=700&q=50&auto=format"

In [6]:
answer, keywords, selected_passages = model(question=question, img_or_img_path=img)

The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.


Model answer

In [7]:
answer

'The weight of this species, the polar bear, ranges from 300-800 kg (660-1,760 lb) for males and 150-300 kg (330-660 lb) for females.'

Search keywords


In [8]:
keywords

'polar bear, weight'

Retrieved passages

In [9]:
for p in selected_passages:
  print(f"""From page {p.page} ({p.url}): \n "{p.passage}".""")

From page Polar bear (https://en.wikipedia.org/wiki/Polar_bear): 
 "Males are generally 200–250 cm (6.6–8.2 ft) long with a weight of 300–800 kg (660–1,760 lb). Females are smaller at 180–200 cm (5.9–6.6 ft) with a weight of 150–300 kg (330–660 lb). Sexual dimorphism in the species is particularly high compared with most other mammals. Male polar bears also have".
From page Polar bear (https://en.wikipedia.org/wiki/Polar_bear): 
 "== Notes ==


== References ==


== Bibliography ==


== External links ==
Polar Bears International website
ARKive—images and movies of the polar bear (Ursus maritimus)".
From page Polar bear (https://en.wikipedia.org/wiki/Polar_bear): 
 "weight of 150–300 kg (330–660 lb). Sexual dimorphism in the species is particularly high compared with most other mammals. Male polar bears also have proportionally larger heads than females. The weight of polar bears fluctuates during the year, as they can bulk up on fat and increase their mass by".
From page List of ursid
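
The first and third passages above share a run of text ("weight of 150–300 kg (330–660 lb). Sexual dimorphism ..."), which is what one would expect if the page is split into overlapping windows before retrieval. The snippet below is a minimal sketch of that idea, not the library's actual chunking code: the function name `chunk_words` and the `size` and `overlap` parameters are hypothetical and chosen only for illustration.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 25) -> list[str]:
    """Split text into windows of `size` words.

    Consecutive windows share `overlap` words, so a sentence cut at one
    window boundary still appears whole in a neighbouring window.
    (Hypothetical helper: the real vrag chunking may differ.)
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once the current window reaches the end of the text.
        if start + size >= len(words):
            break
    return chunks


# Toy input: ten placeholder words, windows of 4 words sharing 2 words.
toy_text = " ".join(f"w{i}" for i in range(10))
toy_chunks = chunk_words(toy_text, size=4, overlap=2)
for c in toy_chunks:
    print(c)
```

Overlap trades some redundancy among the retrieved passages for a lower chance of splitting a relevant fact across two chunks. It does nothing about passages that contain only section headers, such as the second one above, which would need a separate filter.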

## Compare with the base version of the model

In [10]:
from transformers.image_utils import load_image


model.generate(question, load_image(img), max_new_tokens=128)

'Polar bears can weigh between 900 to 1,600 pounds (408 to 727 kilograms).'
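
To put the two answers side by side, it helps to convert them to the same unit. The sketch below uses only the numbers printed in the outputs above and the exact definition of the pound (0.45359237 kg). The variable and function names are illustrative and not part of the notebook's library.

```python
KG_PER_LB = 0.45359237  # exact, by definition of the international pound


def lb_to_kg(pounds: float) -> float:
    """Convert pounds to kilograms."""
    return pounds * KG_PER_LB


# Ranges copied from the two model outputs in this notebook.
rag_male_kg = (300, 800)      # RAG answer, males
rag_female_kg = (150, 300)    # RAG answer, females
base_lb = (900, 1600)         # base model answer, in pounds

base_kg = tuple(round(lb_to_kg(x)) for x in base_lb)
print("Base model range in kg:", base_kg)

# Does the base model's range fall inside the male range cited by the RAG answer?
inside_male_range = rag_male_kg[0] <= base_kg[0] and base_kg[1] <= rag_male_kg[1]
print("Inside RAG male range:", inside_male_range)
```

Under these figures, the base model's range sits inside the male range quoted from Wikipedia but is narrower and makes no distinction between males and females, whereas the RAG answer reproduces the retrieved passage.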