# Hugging Face Multimodal Inference (Visual Question Answering) with vilt-b32-finetuned-vqa


---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-2/inference|generativeai|huggingface-multimodal|vilt-b32-finetuned-vqa.ipynb)

---




## Overview

This notebook demonstrates how to deploy the Hugging Face multimodal model vilt-b32-finetuned-vqa on Amazon SageMaker and run visual question answering inference against it.


Visual Question Answering (VQA) is a task where a model answers questions about an image. The input consists of an image and a textual question about the image. The output is the model's answer to the question, bridging the gap between computer vision and natural language understanding.

vilt-b32-finetuned-vqa is a Vision-and-Language Transformer (ViLT) model fine-tuned on VQAv2.
For more information, see the [model card](https://huggingface.co/dandelin/vilt-b32-finetuned-vqa) on Hugging Face.


## Setup

### Install or update the SageMaker Python SDK
First, we need to make sure the latest version of the SageMaker Python SDK is installed.

In [None]:
!pip install --upgrade pip --quiet
!pip install "sagemaker>=2.48.0" --upgrade

### Set up Python modules and roles
Then, we import the SageMaker Python SDK and instantiate a SageMaker `session`, which we use to determine the current region. We also look up the execution role that the endpoint will use.

In [None]:
from datetime import datetime
from pathlib import Path
from uuid import uuid4

import sagemaker
import boto3
import json
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer
from botocore.exceptions import NoCredentialsError

session = sagemaker.Session()
role = sagemaker.get_execution_role()
s3 = boto3.client("s3")
runtime = boto3.client("sagemaker-runtime")
region = session.boto_region_name

print(f"Role: {role}")
print(f"Region: {region}")

## Create the Hugging Face model
Next, we configure the `HuggingFaceModel` object by specifying a unique model name, `transformers_version`, `pytorch_version`, `py_version`, and the execution role for the endpoint. We also set two environment variables: `HF_MODEL_ID`, which identifies the model on the Hugging Face Hub, and `HF_TASK`, which selects the inference task to perform.

In [None]:
HF_MODEL_ID = "dandelin/vilt-b32-finetuned-vqa"
model_name = str(Path(HF_MODEL_ID.split("/")[-1]))
suffix = f"{str(uuid4())[:5]}-{datetime.now().strftime('%d%b%Y')}"

# define model name, endpoint_name
model_name = f"{model_name}-{suffix}"
endpoint_name = model_name

hub = {"HF_MODEL_ID": HF_MODEL_ID, "HF_TASK": "visual-question-answering"}

huggingface_model = HuggingFaceModel(
    name=model_name,
    transformers_version="4.26.0",
    pytorch_version="1.13.1",
    py_version="py39",
    env=hub,
    role=role,
)
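Because the model and endpoint names are generated from the model ID plus a random suffix, it can be useful to check them before deploying. The following is a minimal sketch that assumes SageMaker names must be at most 63 characters long and consist of alphanumeric characters separated by hyphens. The helper `is_valid_sagemaker_name` is hypothetical and not part of the SageMaker SDK.

```python
import re

# Assumption: SageMaker model/endpoint names allow only alphanumerics and
# hyphens, must start and end with an alphanumeric, and are at most 63 chars.
_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")
_MAX_NAME_LENGTH = 63


def is_valid_sagemaker_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed naming constraints."""
    return len(name) <= _MAX_NAME_LENGTH and bool(_NAME_PATTERN.match(name))


# Example with a name shaped like the one generated in this notebook.
print(is_valid_sagemaker_name("vilt-b32-finetuned-vqa-1a2b3-05Mar2024"))
# Underscores are not allowed under the assumed pattern.
print(is_valid_sagemaker_name("vilt_b32_finetuned_vqa"))
```

If the check fails for your own model ID, shortening the suffix or replacing unsupported characters with hyphens is usually enough.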

## Creating a SageMaker Endpoint
Next, we deploy the model by calling the `deploy()` method. Here we use an `ml.m5.xlarge` instance, which has 4 vCPUs and 16 GiB of memory.

In [None]:
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name=endpoint_name,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

## Run Inference
To run inference with the visual question answering model, we first need to prepare the input. The input consists of an image and a question (a text string). The image can be stored in Amazon S3 and supplied to the endpoint through a presigned S3 URL.

Replace `BUCKET_NAME`, `IMAGE_NAME`, and `QUESTION_INPUT` with your S3 bucket, image object key, and question.

In [None]:
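try:
    # Generate a presigned URL (valid for one hour) so the endpoint can download the image
    signed_url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "BUCKET_NAME", "Key": "IMAGE_NAME"},
        ExpiresIn=3600,
    )
except NoCredentialsError:
    print("Credentials not available")
    raise  # stop here: signed_url would be undefined in the next statement

input_data = {"inputs": {"image": signed_url, "question": "QUESTION_INPUT"}}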
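The request body is a plain dictionary, so it is easy to send a malformed one by accident (for example, an empty question or an `s3://` URI instead of an HTTP URL). The sketch below wraps the payload construction in a small helper with basic validation. The function name `build_vqa_payload` and the example URL are hypothetical; the payload shape mirrors the `input_data` dictionary used in this notebook.

```python
import json


def build_vqa_payload(image_url: str, question: str) -> dict:
    """Build the request body for the visual question answering endpoint.

    Raises ValueError if the image URL is not an HTTP(S) URL or the
    question is empty.
    """
    if not image_url.startswith(("http://", "https://")):
        raise ValueError("image_url must be an HTTP(S) URL, such as a presigned S3 URL")
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    return {"inputs": {"image": image_url, "question": question}}


# Placeholder URL for illustration only; in the notebook this would be signed_url.
payload = build_vqa_payload("https://example.com/cats.jpg", "How many cats are there?")
print(json.dumps(payload, indent=2))
```

In the notebook, `build_vqa_payload(signed_url, "QUESTION_INPUT")` would produce the same dictionary as `input_data`.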

Next, we call the SageMaker endpoint we created in this notebook, passing the image URL and the question for inference.

In [None]:
result = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(input_data),
)

inference_result = json.loads(result["Body"].read().decode())
if inference_result:
    print("Inference Result:")
    print(json.dumps(inference_result, indent=2))
else:
    print("Failed to get inference result.")
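The printed result is typically a list of candidate answers with confidence scores. The sketch below selects the highest-scoring answer, assuming the endpoint returns a list of dictionaries with `answer` and `score` keys (the shape the Hugging Face visual question answering pipeline usually produces). The helper `top_answer` and the sample values are hypothetical and for illustration only, not real model output.

```python
from typing import Optional


def top_answer(predictions: list, min_score: float = 0.0) -> Optional[dict]:
    """Return the prediction with the highest score, or None if none qualifies.

    Assumes each prediction is a dict with "answer" and "score" keys.
    """
    candidates = [p for p in predictions if p.get("score", 0.0) >= min_score]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p["score"])


# Made-up sample in the assumed response shape.
sample = [
    {"score": 0.03, "answer": "3"},
    {"score": 0.95, "answer": "2"},
    {"score": 0.01, "answer": "1"},
]
best = top_answer(sample, min_score=0.5)
print(best["answer"] if best else "No confident answer")
```

Setting `min_score` lets you treat low-confidence predictions as "no answer" instead of returning a guess.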

## Cleanup
After you've finished testing the endpoint, it's important to delete the `model` and `endpoint` resources to avoid incurring charges.

In [None]:
try:
    print(f"Deleting model: {model_name}")
    predictor.delete_model()
except Exception as e:
    print(f"{e}")

try:
    print(f"Deleting endpoint: {endpoint_name}")
    predictor.delete_endpoint()
except Exception as e:
    print(f"{e}")

## Conclusion

In this tutorial, we deployed the Hugging Face multimodal model vilt-b32-finetuned-vqa to an Amazon SageMaker real-time endpoint.

With SageMaker Hosting, you can host multimodal models and run inference against them with only a few lines of code.

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.


![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/inference|generativeai|huggingface-multimodal|vilt-b32-finetuned-vqa.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/inference|generativeai|huggingface-multimodal|vilt-b32-finetuned-vqa.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/inference|generativeai|huggingface-multimodal|vilt-b32-finetuned-vqa.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/inference|generativeai|huggingface-multimodal|vilt-b32-finetuned-vqa.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/inference|generativeai|huggingface-multimodal|vilt-b32-finetuned-vqa.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/inference|generativeai|huggingface-multimodal|vilt-b32-finetuned-vqa.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/inference|generativeai|huggingface-multimodal|vilt-b32-finetuned-vqa.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/inference|generativeai|huggingface-multimodal|vilt-b32-finetuned-vqa.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/inference|generativeai|huggingface-multimodal|vilt-b32-finetuned-vqa.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/inference|generativeai|huggingface-multimodal|vilt-b32-finetuned-vqa.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/inference|generativeai|huggingface-multimodal|vilt-b32-finetuned-vqa.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/inference|generativeai|huggingface-multimodal|vilt-b32-finetuned-vqa.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/inference|generativeai|huggingface-multimodal|vilt-b32-finetuned-vqa.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/inference|generativeai|huggingface-multimodal|vilt-b32-finetuned-vqa.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/inference|generativeai|huggingface-multimodal|vilt-b32-finetuned-vqa.ipynb)