### pipeline example: Visual question answering `vqa`

![](./static/multi-modal/visual-question-answering.png)

> Given an image and a question about it, answer the question.
>
> https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.VisualQuestionAnsweringPipeline

In [1]:
from transformers import pipeline
my_visual_question_answerer = pipeline(model="dandelin/vilt-b32-finetuned-vqa")

my_visual_question_answerer.task

Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.


'visual-question-answering'

![](./output/astronaut_rides_horse.png)

In [2]:
# Ask a question about the astronaut image shown above

my_visual_question_answerer(question="What animal is the person riding?", image="./output/astronaut_rides_horse.png")

[{'score': 0.9961353540420532, 'answer': 'horse'},
 {'score': 0.011806492693722248, 'answer': 'pony'},
 {'score': 0.011430073529481888, 'answer': 'yes'},
 {'score': 0.002674927469342947, 'answer': 'donkey'},
 {'score': 0.0014789114939048886, 'answer': 'human'}]
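
The pipeline returns a list of dicts with `score` and `answer` keys, as in the output above. A minimal sketch of post-processing such a list follows. The helper name `confident_answers` and the 0.5 threshold are illustrative assumptions, not part of the transformers API, and the data is copied from the output above so the snippet runs without loading a model.

```python
# Candidate answers copied from the pipeline output above.
results = [
    {"score": 0.9961353540420532, "answer": "horse"},
    {"score": 0.011806492693722248, "answer": "pony"},
    {"score": 0.011430073529481888, "answer": "yes"},
    {"score": 0.002674927469342947, "answer": "donkey"},
    {"score": 0.0014789114939048886, "answer": "human"},
]


def confident_answers(results, min_score=0.5):
    """Return the answers whose score is at least min_score (hypothetical helper)."""
    return [r["answer"] for r in results if r["score"] >= min_score]


# Pick the highest-scoring candidate explicitly instead of relying on list order.
best = max(results, key=lambda r: r["score"])
print(best["answer"], round(best["score"] * 100, 2))
print(confident_answers(results))
```

Note that the five scores above add up to slightly more than 1, so they are probably better read as independent per-answer scores than as a single probability distribution. A fixed threshold is one simple way to act on them.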

![](./traffic-captcha/full.png)

In [3]:
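# Traffic CAPTCHA example: ask the same yes/no question about each of the 16 tiles

for i in range(1, 17):
    result = my_visual_question_answerer(question="Does this image contain a traffic light?", image=f"./traffic-captcha/square-{i}.png")
    # The pipeline returns answers ranked by score, so result[0] is the top answer
    answer = result[0]["answer"]
    score = round(result[0]["score"] * 100, 2)

    # Report only the tiles whose top answer is "yes"
    if answer == "yes":
        print(f"square-{i}.png - ({score}% confidence)")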


square-5.png - (99.91% confidence)
square-6.png - (99.91% confidence)
square-7.png - (76.56% confidence)
square-14.png - (86.83% confidence)
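
The loop above looks only at the top answer for each tile and accepts any "yes", whatever its score. A slightly more reusable sketch wraps that logic in a function with a minimum score. The names `find_tiles`, `min_score`, and `fake_vqa` are hypothetical, chosen for illustration. `fake_vqa` is a stand-in for the real pipeline so the snippet runs without downloading a model: its "yes" scores are the rounded values printed above, and the 0.9 it returns for "no" tiles is a made-up placeholder, not a measured result.

```python
from pathlib import Path


def find_tiles(vqa, question, image_paths, wanted="yes", min_score=0.5):
    """Return (path, score) pairs whose top answer is `wanted` with score >= min_score.

    `vqa` is any callable that accepts `question=` and `image=` keywords and
    returns a ranked list of {"score", "answer"} dicts, like the pipeline above.
    """
    matches = []
    for path in image_paths:
        top = vqa(question=question, image=path)[0]  # highest-ranked answer
        if top["answer"] == wanted and top["score"] >= min_score:
            matches.append((path, top["score"]))
    return matches


# Stand-in for the real pipeline (assumption). The "yes" scores are the rounded
# values from the output above; the 0.9 for "no" tiles is a placeholder.
KNOWN_YES = {5: 0.9991, 6: 0.9991, 7: 0.7656, 14: 0.8683}


def fake_vqa(question, image):
    index = int(Path(image).stem.split("-")[1])  # "square-5" -> 5
    if index in KNOWN_YES:
        return [{"score": KNOWN_YES[index], "answer": "yes"}]
    return [{"score": 0.9, "answer": "no"}]


paths = [f"./traffic-captcha/square-{i}.png" for i in range(1, 17)]
question = "Does this image contain a traffic light?"
for path, score in find_tiles(fake_vqa, question, paths, min_score=0.8):
    print(f"{path} - ({score:.2%} confidence)")
```

With the real pipeline, `my_visual_question_answerer` would be passed in place of `fake_vqa`. At a threshold of 0.8, the 76.56% tile (`square-7.png`) would be dropped, which shows how the cutoff trades missed tiles against false positives.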
