# HuggingFace Pretrained CLIP Inference on Inf2

## Introduction

This notebook demonstrates how to compile and run a HuggingFace 🤗 Contrastive Language-Image Pretraining (CLIP) model for accelerated inference on Neuron. It uses the [`openai/clip-vit-large-patch14`](https://huggingface.co/openai/clip-vit-large-patch14) model.
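CLIP classifies an image without task-specific training. It embeds the image and each text caption into a shared space, then scores each pair by cosine similarity. In `transformers`, `CLIPModel` multiplies these similarities by a learned scale and returns them as `logits_per_image`. With `return_dict=False`, that is the first element of the output tuple, and the notebook applies `softmax` to it at the end. The pure-Python sketch below shows that scoring step. It assumes toy 3-dimensional embeddings and an illustrative scale of 100. The helper names are hypothetical and are not part of `transformers`.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clip_style_probs(image_emb: Sequence[float],
                     text_embs: Sequence[Sequence[float]],
                     logit_scale: float = 100.0) -> List[float]:
    """Cosine similarity between the image and each caption, scaled, then softmaxed."""
    img = l2_normalize(image_emb)
    logits = [logit_scale * sum(a * b for a, b in zip(img, l2_normalize(t)))
              for t in text_embs]
    return softmax(logits)


# Toy 3-d embeddings, made up for illustration (not produced by the real model)
labels = ["apple", "rocket", "tulip"]
image = [0.9, 0.1, 0.0]
texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

probs = clip_style_probs(image, texts)
for label, p in zip(labels, probs):
    print(f"{label:>8s}: {100 * p:.2f}%")
```

The large scale sharpens the softmax. Even a moderate gap in cosine similarity puts almost all of the probability on the best-matching caption.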

Run this Jupyter notebook on an Inf2 or Trn1 instance of size inf2.8xlarge or trn1.2xlarge or larger.

Note: for deployment, we recommend compiling the model ahead of time on a compute instance using `torch_neuronx.trace()`, saving the compiled model as a `.pt` file, and then distributing that `.pt` file to inf2.xlarge instances for inference.

Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [PyTorch Installation Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/torch-neuronx.html#setup-torch-neuronx). You can select the kernel from the 'Kernel -> Change Kernel' option at the top of this Jupyter notebook page.
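As a quick sanity check from inside the notebook, the sketch below reports which Python executable the kernel uses. It also checks whether the packages this tutorial relies on are installed, using only the standard library. The helper `installed_versions` is hypothetical. The names passed to it are the pip distribution names used in this tutorial. It tries the underscore spelling as well, on the assumption that some installed metadata may use underscores.

```python
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each pip distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        versions[name] = None
        # Try both spellings, since installed metadata may use underscores
        for candidate in (name, name.replace("-", "_")):
            try:
                versions[name] = metadata.version(candidate)
                break
            except metadata.PackageNotFoundError:
                continue
    return versions


print("Kernel Python:", sys.executable)
for pkg, version in installed_versions(["torch-neuronx", "neuronx-cc", "transformers"]).items():
    print(f"{pkg:>14s}: {version or 'NOT INSTALLED'}")
```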

## Install Dependencies
This tutorial requires the following pip packages:

- `torch-neuronx`
- `neuronx-cc`
- `transformers`
- `torchvision` (used to download the CIFAR100 dataset)

Most of these packages are installed when you configure your environment with the Trn1 setup guide. Install the additional dependencies here:

In [None]:
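# Setting TOKENIZERS_PARALLELISM explicitly suppresses tokenizer warnings, making errors easier to spot
%env TOKENIZERS_PARALLELISM=True
# torchvision is not upgraded here; upgrading it could pull in torch 2.0, which this environment's torch-neuronx may not support
!pip install -U transformers opencv-python Pillow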
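The `%env` line is IPython magic, so it only works inside Jupyter. In a plain Python script, such as one running on the deployment instances described above, the equivalent is to set the variable through `os.environ`. The sketch below does this early in the script, which we assume is the safest place: before `transformers` and its `tokenizers` dependency are imported.

```python
import os

# Set early, before transformers (and the tokenizers library it uses) is imported
os.environ["TOKENIZERS_PARALLELISM"] = "true"

# ...then import transformers as usual, e.g.
# from transformers import CLIPProcessor, CLIPModel
```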

## Compile the model into an AWS Neuron-optimized TorchScript

In the following section, we load the model and input preprocessor, get a sample input, run inference on CPU, compile the model for Neuron using `torch_neuronx.trace()`, and save the optimized model as `TorchScript`.

`torch_neuronx.trace()` expects a tensor or a tuple of tensors as the example input for tracing, so we unpack the input preprocessor's output into a tuple. In addition, the input shapes used during compilation must match the input shapes used during inference. Here, those inputs are 100 captions padded to a common length and one image.
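With `padding=True`, the processor pads every caption only to the length of the longest caption in that batch. A different list of captions can therefore produce a different `input_ids` shape, and the traced model would not accept it. The pure-Python sketch below contrasts padding to the longest sequence with padding to a fixed length. It uses made-up token ids, a hypothetical `pad_batch` helper and a hypothetical pad id of 0. With the real processor, the matching option should be `padding="max_length"` together with `max_length`; CLIP's text encoder accepts at most 77 tokens.

```python
from typing import List, Optional, Sequence, Tuple

PAD_ID = 0  # hypothetical padding token id, for illustration only


def pad_batch(seqs: Sequence[Sequence[int]], length: Optional[int] = None) -> List[List[int]]:
    """Pad token-id lists to `length`, or to the longest sequence if length is None."""
    target = length if length is not None else max(len(s) for s in seqs)
    if any(len(s) > target for s in seqs):
        raise ValueError("a sequence is longer than the fixed length; truncate it first")
    return [list(s) + [PAD_ID] * (target - len(s)) for s in seqs]


def shape(batch: List[List[int]]) -> Tuple[int, int]:
    """(batch size, sequence length) of a padded batch."""
    return (len(batch), len(batch[0]))


traced = [[5, 8, 2], [5, 9, 4, 7, 2]]  # captions used when tracing
later = [[5, 3, 2], [5, 6, 2]]         # same batch size, shorter captions

# Padding to the longest sequence: the two batches end up with different shapes
print(shape(pad_batch(traced)), shape(pad_batch(later)))  # (2, 5) (2, 3)

# Padding to a fixed length: the shapes always match
print(shape(pad_batch(traced, 8)), shape(pad_batch(later, 8)))  # (2, 8) (2, 8)
```

The batch dimension is fixed as well. This notebook traces with 100 captions and one image, so inference calls must also supply 100 captions and one image.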

In [None]:
import os

import torch
import torch_neuronx
from transformers import CLIPProcessor, CLIPModel
from torchvision.datasets import CIFAR100

model_name = 'openai/clip-vit-large-patch14'

# Create the input preprocessor and model
processor = CLIPProcessor.from_pretrained(model_name)
model = CLIPModel.from_pretrained(model_name, return_dict=False)
model.eval()

# Get text captions for the model to classify the image against
cifar100 = CIFAR100(root=os.path.expanduser("~/.cache"), download=True, train=False)
# Build one caption per CIFAR100 class; CIFAR100 has exactly 100 classes,
# so the image is classified against every class
text = [f'a photo of a {label}' for label in cifar100.classes]

# Get an example input
image = cifar100[0][0]

inputs = processor(text=text, images=image, return_tensors="pt", padding=True)

example = (inputs['input_ids'], inputs['pixel_values'])

# Run inference on CPU
output_cpu = model(*example)

# Compile the model
model_neuron = torch_neuronx.trace(model, example, compiler_args='--enable-saturate-infinity')

# Save the TorchScript for inference deployment
filename = 'model.pt'
torch.jit.save(model_neuron, filename)
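The compiled model accepts only the input shapes it was traced with. When `model.pt` is distributed to other instances, it can therefore help to ship those shapes alongside it. The sketch below assumes a hypothetical JSON sidecar file, `model_shapes.json`, and hypothetical helper functions. This is an illustrative convention, not a Neuron or PyTorch feature.

```python
import json
import os
import tempfile
from typing import Sequence


def save_input_shapes(path: str, shapes: Sequence[Sequence[int]]) -> None:
    """Write the traced input shapes to a JSON sidecar file (hypothetical format)."""
    with open(path, "w") as f:
        json.dump({"input_shapes": [list(s) for s in shapes]}, f)


def check_input_shapes(path: str, shapes: Sequence[Sequence[int]]) -> None:
    """Raise ValueError if `shapes` differ from the shapes recorded at trace time."""
    with open(path) as f:
        expected = [tuple(s) for s in json.load(f)["input_shapes"]]
    actual = [tuple(s) for s in shapes]
    if actual != expected:
        raise ValueError(f"input shapes {actual} do not match traced shapes {expected}")


# Illustrative shapes only: (num_captions, num_tokens) and (batch, channels, height, width)
sidecar = os.path.join(tempfile.mkdtemp(), "model_shapes.json")
save_input_shapes(sidecar, [(100, 12), (1, 3, 224, 224)])
check_input_shapes(sidecar, [(100, 12), (1, 3, 224, 224)])  # matches, so no error
```

In this notebook, the shapes to save would come from `[tuple(t.shape) for t in example]`. On an inference instance, you would check the freshly preprocessed tensors the same way before calling the loaded model.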

## Run inference and compare results

In this section, we load the compiled model, run inference on Neuron, and compare the CPU and Neuron outputs.

In [None]:
# Load the TorchScript compiled model
model_neuron = torch.jit.load(filename)

# Run inference using the Neuron model
output_neuron = model_neuron(*example)

# Compare the results
cpu_top5 = output_cpu[0][0].softmax(dim=-1).topk(5)
neuron_top5 = output_neuron[0][0].softmax(dim=-1).topk(5)

print('CPU top 5 classifications')
for value, index in zip(cpu_top5[0], cpu_top5[1]):
    print(f"{cifar100.classes[index.item()]:>16s}: {100 * value.item():.2f}%")

print('Neuron top 5 classifications')
for value, index in zip(neuron_top5[0], neuron_top5[1]):
    print(f"{cifar100.classes[index.item()]:>16s}: {100 * value.item():.2f}%")
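Printing the two top-5 lists gives a visual check. A stricter check compares the probability vectors numerically and confirms that the top-5 classes appear in the same order. The Neuron compiler can cast some operations to lower precision, so small differences are expected, and the tolerance is a judgment call. The pure-Python sketch below assumes the probabilities were converted to plain lists, for example with `output_cpu[0][0].softmax(dim=-1).tolist()`. The helper names, the made-up vectors and the default tolerance of `1e-2` are all hypothetical.

```python
from typing import Dict, List, Sequence, Union


def top_k_indices(probs: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest values, largest first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def compare_outputs(cpu: Sequence[float], neuron: Sequence[float],
                    k: int = 5, atol: float = 1e-2) -> Dict[str, Union[float, bool]]:
    """Summarize how closely two probability vectors agree."""
    if len(cpu) != len(neuron):
        raise ValueError("outputs have different lengths")
    max_abs_diff = max(abs(a - b) for a, b in zip(cpu, neuron))
    return {
        "max_abs_diff": max_abs_diff,
        "same_top_k": top_k_indices(cpu, k) == top_k_indices(neuron, k),
        "within_tol": max_abs_diff <= atol,
    }


# Made-up probability vectors standing in for the CPU and Neuron softmax outputs
cpu_probs = [0.70, 0.20, 0.05, 0.03, 0.012, 0.008]
neuron_probs = [0.695, 0.205, 0.05, 0.03, 0.012, 0.008]
print(compare_outputs(cpu_probs, neuron_probs))
```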