# 🤖 Project: Visual Question Answering (VQA) System
This notebook demonstrates how to use a pretrained BLIP model for Visual Question Answering. We input an image and a question, and the model predicts the answer.

In [None]:
!pip install transformers accelerate timm Pillow torch requests

In [None]:
from transformers import BlipProcessor, BlipForQuestionAnswering
from PIL import Image
import requests
import torch

In [None]:
# Load BLIP processor and model
processor = BlipProcessor.from_pretrained('Salesforce/blip-vqa-base')
model = BlipForQuestionAnswering.from_pretrained('Salesforce/blip-vqa-base')
model.eval()

In [None]:
# Load an image from URL
img_url = 'https://raw.githubusercontent.com/salesforce/BLIP/main/demo/BLIP_demo.jpg'
image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
display(image)  # renders inline in Jupyter/Colab; image.show() opens an external viewer instead

In [None]:
# Define a question
question = 'What is the man doing?'
inputs = processor(image, question, return_tensors='pt')

In [None]:
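# Get the answer.
# BLIP answers by generating text, so call generate(); a plain forward pass
# expects labels or decoder inputs and is meant for training.
with torch.no_grad():
    out = model.generate(**inputs)
answer = processor.decode(out[0], skip_special_tokens=True)
print(f'Q: {question}\nA: {answer}')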
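
To ask several questions about the same image, the preprocessing, generation, and decoding steps can be wrapped in a small helper. This is a minimal sketch, not part of the original notebook: the function name `answer_questions` is hypothetical, and it assumes `processor` and `model` are the BLIP objects loaded earlier.

```python
def answer_questions(image, questions, processor, model):
    """Return a {question: answer} dict for several questions about one image.

    Hypothetical helper: `processor` and `model` are assumed to behave like
    the BlipProcessor and BlipForQuestionAnswering objects loaded above.
    """
    answers = {}
    for q in questions:
        # Encode the image together with the current question
        inputs = processor(image, q, return_tensors='pt')
        # Generate answer token ids, then decode them back to text
        out = model.generate(**inputs)
        answers[q] = processor.decode(out[0], skip_special_tokens=True)
    return answers
```

A call might look like `answer_questions(image, [question, 'How many people are in the picture?'], processor, model)`, which returns one decoded answer per question.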

## ✅ Summary
We used a BLIP model from Hugging Face to perform VQA. You can further extend this by:
- Using your own dataset with custom Q&A.
- Creating a Flask or Gradio app interface.
- Combining it with OCR or adding multilingual support.
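
As a starting point for the custom dataset idea, the sketch below scores predictions against your own question and answer pairs. It rests on several assumptions: the dataset is a list of `(image, question, expected_answer)` tuples, `predict` is any callable you supply that returns an answer string, and the metric is simple exact match after normalization, which is a simplification rather than an official VQA metric. The names `normalize` and `exact_match_accuracy` are hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def exact_match_accuracy(samples, predict):
    """Fraction of samples whose predicted answer matches the expected one.

    samples: iterable of (image, question, expected_answer) tuples (assumed layout).
    predict: callable taking (image, question) and returning an answer string.
    """
    samples = list(samples)
    if not samples:
        return 0.0  # avoid division by zero on an empty dataset
    correct = sum(
        normalize(predict(img, q)) == normalize(expected)
        for img, q, expected in samples
    )
    return correct / len(samples)
```

With the objects from this notebook, `predict` could be a function that runs `processor(img, q, return_tensors='pt')`, passes the result to `model.generate`, and decodes the first output sequence.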