# HuggingFace Pretrained ViT Inference on Trn1

## Introduction

This notebook demonstrates how to compile and run a HuggingFace 🤗 Google Vision Transformer (ViT) model for accelerated inference on Neuron. It uses the [`google/vit-base-patch16-224`](https://huggingface.co/google/vit-base-patch16-224) model, which is primarily used for image classification.

This Jupyter notebook should be run on a Trn1 instance (`trn1.2xlarge` or larger).

## Install Dependencies

This tutorial requires the following pip packages:

- `torch-neuronx`
- `neuronx-cc`
- `transformers`

Most of these packages are installed when you configure your environment using the Trn1 setup guide. The remaining dependency, `transformers`, is installed here:

In [None]:
!pip install -U transformers
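
Before compiling, it can help to confirm that all three packages are visible to the notebook kernel. The following is a minimal sketch using only the Python standard library; the helper name `installed_versions` is hypothetical and not part of any Neuron tooling.

```python
from importlib.metadata import PackageNotFoundError, version

# The pip packages listed above
REQUIRED = ("torch-neuronx", "neuronx-cc", "transformers")

def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as missing should be installed before continuing.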

## Compile the model into an AWS Neuron optimized TorchScript

In the following section, we load the model and feature extractor, get a sample input, run inference on CPU, compile the model for Neuron using `torch_neuronx.trace()`, and save the optimized model as `TorchScript`.

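`torch_neuronx.trace()` expects a tensor or a tuple of tensors as example input, so we unpack the feature extractor output into a one-element tuple containing `pixel_values`. Additionally, the input shape used during compilation must match the input shape used during inference. For this model the feature extractor resizes every image to 224×224 pixels, so `pixel_values` has the fixed shape `(1, 3, 224, 224)` for a single image and no padding is needed; only the batch size must stay the same as the one used for tracing.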
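
Because a traced model only accepts the input shape it was compiled with, a small guard before each call can turn an obscure runtime failure into a clear error message. The sketch below is illustrative only: `check_traced_shape` is a hypothetical helper, and the default shape `(1, 3, 224, 224)` assumes a single RGB image at the 224×224 resolution implied by the model name.

```python
def check_traced_shape(shape, traced_shape=(1, 3, 224, 224)):
    """Raise ValueError if `shape` differs from the shape used for tracing.

    `shape` may be a tuple, a list, or a torch.Size (a tuple subclass).
    """
    actual, expected = tuple(shape), tuple(traced_shape)
    if actual != expected:
        raise ValueError(
            f"input shape {actual} does not match traced shape {expected}"
        )
    return actual

check_traced_shape((1, 3, 224, 224))      # matches, passes silently
try:
    check_traced_shape((2, 3, 224, 224))  # a batch of 2 was not traced
except ValueError as err:
    print(err)
```

Later in this notebook, the check could be applied as `check_traced_shape(example[0].shape)` before calling the compiled model.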

In [None]:
from PIL import Image
import requests

import torch
import torch_neuronx
from transformers import ViTImageProcessor, ViTForImageClassification

# Create the feature extractor and model
# (ViTImageProcessor replaces the deprecated ViTFeatureExtractor in recent transformers releases)
feature_extractor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224')
model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224', torchscript=True)
model.eval()

# Get an example input
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")

example = (inputs['pixel_values'],)

# Run inference on CPU (gradients are not needed for inference)
with torch.no_grad():
    output_cpu = model(*example)

# Compile the model
model_neuron = torch_neuronx.trace(model, example)

# Save the TorchScript for inference deployment
filename = 'model.pt'
torch.jit.save(model_neuron, filename)

## Run inference and compare results

In this section we load the compiled model, run inference on Neuron, and compare the CPU and Neuron outputs.

In [None]:
# Load the TorchScript compiled model
model_neuron = torch.jit.load(filename)

# Run inference using the Neuron model
output_neuron = model_neuron(*example)

# Compare the results
print(f"CPU tensor:            {output_cpu[0][0][0:10]}")
print(f"Neuron tensor:         {output_neuron[0][0][0:10]}")
print(f"CPU classification:    {model.config.id2label[output_cpu[0].argmax(-1).item()]}")
print(f"Neuron classification: {model.config.id2label[output_neuron[0].argmax(-1).item()]}")
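
Printing the first few logits gives a quick visual check. For a more systematic comparison, one could measure the largest elementwise difference and check that both models rank the same classes highest. The following is a plain-Python sketch operating on lists of floats; `max_abs_diff` and `top_k` are hypothetical helper names, and the sample values are made up purely for illustration, not measured results.

```python
def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))

def top_k(logits, k=5):
    """Indices of the k largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

# Made-up logits standing in for output_cpu[0][0].tolist() and
# output_neuron[0][0].tolist(); real values will differ.
cpu_logits = [0.10, 2.50, -1.00, 0.75]
neuron_logits = [0.11, 2.48, -1.02, 0.76]

print(f"max abs diff: {max_abs_diff(cpu_logits, neuron_logits):.3f}")
print(f"same top-2:   {top_k(cpu_logits, 2) == top_k(neuron_logits, 2)}")
```

With the tensors from this notebook, `torch.allclose(output_cpu[0], output_neuron[0], atol=...)` offers a one-line alternative; a suitable tolerance depends on the compiler's precision settings and is left open here.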