<a href="https://colab.research.google.com/github/Lednik7/CLIP-ONNX/blob/main/examples/clip_onnx_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Restart the Colab session after installation
If something doesn't work after installing the packages below, restart the session and run the cells again.

In [1]:
%%capture
!pip install git+https://github.com/Lednik7/CLIP-ONNX.git
!pip install git+https://github.com/openai/CLIP.git
!pip install onnxruntime-gpu

In [2]:
%%capture
!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true

In [3]:
!nvidia-smi

Thu Jan  6 16:36:44 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P8    26W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
import onnxruntime
print(onnxruntime.get_device())  # "GPU" when the onnxruntime-gpu build is installed, else "CPU"
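`onnxruntime.get_device()` only reports which device the installed build targets. When choosing providers for `start_sessions`, a small helper can put CUDA first when it is available and always keep CPU as a fallback. This is a sketch: `pick_providers` is a hypothetical helper name, and in the notebook you would pass it `onnxruntime.get_available_providers()`.

```python
from typing import List, Sequence

# Preference order for this sketch; TensorRT is deliberately left out.
PREFERRED = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available: Sequence[str], use_gpu: bool = True) -> List[str]:
    """Return providers for an ONNX Runtime session, highest priority first.

    `available` is expected to come from onnxruntime.get_available_providers().
    CPUExecutionProvider is always kept as the last-resort fallback.
    """
    wanted = PREFERRED if use_gpu else ["CPUExecutionProvider"]
    chosen = [p for p in wanted if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen
```

Usage in the notebook would look like `onnx_model.start_sessions(providers=pick_providers(onnxruntime.get_available_providers()))`.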

## CPU inference mode

### Torch CLIP

In [1]:
import clip
from PIL import Image
import numpy as np

# load the model on CPU: the ONNX export in this notebook is done from the CPU model
model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)

# batch first
image = preprocess(Image.open("CLIP.png")).unsqueeze(0).cpu() # [1, 3, 224, 224]
image_onnx = image.detach().cpu().numpy().astype(np.float32)

# batch first
text = clip.tokenize(["a diagram", "a dog", "a cat"]).cpu() # [3, 77]
text_onnx = text.detach().cpu().numpy().astype(np.int64)
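The ONNX sessions receive plain NumPy arrays, so it can help to check shapes and dtypes before the call. The expected layout below is taken from the comments in the cell above (`[1, 3, 224, 224]` float32 images, `[3, 77]` int64 tokens, batch first); `check_clip_inputs` is a hypothetical helper, not part of CLIP-ONNX.

```python
import numpy as np


def check_clip_inputs(image: np.ndarray, text: np.ndarray, context_length: int = 77) -> None:
    """Raise ValueError if the arrays do not match the batch-first layout used above."""
    # Images: [batch, 3 channels, 224, 224], float32.
    if image.ndim != 4 or image.shape[1:] != (3, 224, 224):
        raise ValueError(f"image shape {image.shape} != [N, 3, 224, 224]")
    if image.dtype != np.float32:
        raise ValueError(f"image dtype {image.dtype} != float32")
    # Text: [batch, context_length] integer token ids, as produced by clip.tokenize.
    if text.ndim != 2 or text.shape[1] != context_length:
        raise ValueError(f"text shape {text.shape} != [M, {context_length}]")
    if text.dtype != np.int64:
        raise ValueError(f"text dtype {text.dtype} != int64")


# Dummy arrays with the same layout as image_onnx and text_onnx.
check_clip_inputs(np.zeros((1, 3, 224, 224), dtype=np.float32),
                  np.zeros((3, 77), dtype=np.int64))
```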

In [2]:
%timeit model(image, text)

1 loop, best of 5: 636 ms per loop
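`%timeit` is an IPython magic. Outside a notebook, a comparable "best of 5" figure can be taken with the standard-library `timeit` module: time each repeat, divide by the loop count, and keep the minimum. This is a sketch; `best_of` is a hypothetical helper, and unlike `%timeit` it does not choose the loop count automatically.

```python
import timeit
from typing import Callable


def best_of(fn: Callable[[], object], repeat: int = 5, number: int = 1) -> float:
    """Return the fastest per-call time in seconds over `repeat` runs of `number` calls."""
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    return min(totals) / number


# Stand-in workload; in the notebook this would be lambda: model(image, text).
elapsed = best_of(lambda: sum(range(100_000)))
print(f"best of 5: {elapsed * 1e3:.2f} ms per loop")
```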


### CLIP-ONNX

In [2]:
from clip_onnx import clip_onnx, attention
# patch CLIP's attention block with the clip_onnx version before exporting
clip.model.ResidualAttentionBlock.attention = attention

onnx_model = clip_onnx(model)
onnx_model.convert2onnx(image, text, verbose=True)
# possible providers: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
onnx_model.start_sessions(providers=["CPUExecutionProvider"]) # cpu mode

[CLIP ONNX] Start convert visual model
  head_dim = q.shape[2] // num_heads
[CLIP ONNX] Start check visual model
[CLIP ONNX] Start convert textual model
  "If indices include negative values, the exported graph will produce incorrect results.")
[CLIP ONNX] Start check textual model
[CLIP ONNX] Models converts successfully
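The cell above assigns to `clip.model.ResidualAttentionBlock.attention`, which replaces the method on the class rather than on one object. That is why it also affects `model`, which was loaded earlier: Python looks methods up on the class at call time. The toy classes below are hypothetical stand-ins, not CLIP's code, and only illustrate that lookup rule.

```python
class Block:
    """Toy stand-in for clip.model.ResidualAttentionBlock."""

    def attention(self, x):
        return f"original({x})"

    def forward(self, x):
        # self.attention is resolved through the class on every call
        return self.attention(x)


def export_friendly_attention(self, x):
    """Toy stand-in for the `attention` function imported from clip_onnx."""
    return f"patched({x})"


block = Block()                              # created before patching, like `model`
before = block.forward("q")
Block.attention = export_friendly_attention  # class-level patch
after = block.forward("q")
print(before, after)  # original(q) patched(q)
```

The same rule explains the ordering in the notebook: the patch has to run before `convert2onnx`, since conversion captures whichever `attention` method the class holds when it runs.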


In [5]:
%timeit onnx_model(image_onnx, text_onnx)

1 loop, best of 5: 550 ms per loop
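The timed call only measures speed. To read predictions, CLIP-style logits are turned into probabilities with a softmax over the text prompts. This sketch assumes `onnx_model(image_onnx, text_onnx)` returns `(logits_per_image, logits_per_text)` as NumPy arrays, mirroring `model(image, text)` in the original CLIP package; the logit values are made-up placeholders, not results from this notebook.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


labels = ["a diagram", "a dog", "a cat"]
# Placeholder values with shape [1 image, 3 prompts]; NOT output from the notebook.
logits_per_image = np.array([[25.0, 20.0, 19.5]], dtype=np.float32)

probs = softmax(logits_per_image)  # each row sums to 1
best = labels[int(probs[0].argmax())]
print(probs.round(3), best)
```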


## GPU inference mode
Select a GPU runtime to continue:

Click Runtime -> Change runtime type, set "Hardware accelerator" to GPU, and save. Colab should then connect you to a GPU machine.

### CLIP-ONNX

In [6]:
onnx_model.start_sessions(providers=["CUDAExecutionProvider"]) # GPU mode

In [7]:
onnx_model.visual_session.get_providers() # optional: check which providers the session uses

['CUDAExecutionProvider', 'CPUExecutionProvider']
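`get_providers()` lists the providers a session uses in priority order; here CUDA comes first and CPU follows as a fallback. If the CUDA provider cannot be loaded, ONNX Runtime may fall back to CPU with only a warning, which would quietly skew a benchmark. A small guard makes that explicit; `assert_cuda_first` is a hypothetical helper for this sketch.

```python
from typing import Sequence


def assert_cuda_first(providers: Sequence[str]) -> None:
    """Fail loudly if a session's highest-priority provider is not CUDA."""
    if not providers or providers[0] != "CUDAExecutionProvider":
        raise RuntimeError(f"session is not running on CUDA: {list(providers)}")


# The list printed by onnx_model.visual_session.get_providers() above passes:
assert_cuda_first(["CUDAExecutionProvider", "CPUExecutionProvider"])
```

In the notebook this would be called as `assert_cuda_first(onnx_model.visual_session.get_providers())` before timing.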

In [9]:
%timeit onnx_model(image_onnx, text_onnx)

The slowest run took 79.70 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 5: 60.8 ms per loop
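The warning means the slowest repeat took about 80 times as long as the fastest; with a 60.8 ms best, that is roughly 4.8 s (60.8 ms × 79.70), which is consistent with one-time CUDA setup on the first call rather than steady-state inference. A common remedy is to make a few untimed warm-up calls first. The sketch below uses only the standard library; `timed_after_warmup` is a hypothetical helper, and `fake_inference` simulates a slow first call.

```python
import time
import timeit
from typing import Callable


def timed_after_warmup(fn: Callable[[], object], warmup: int = 3,
                       repeat: int = 5, number: int = 1) -> float:
    """Call `fn` `warmup` times untimed, then return the best per-call time in seconds."""
    for _ in range(warmup):
        fn()  # absorbs one-time costs such as lazy initialization
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


calls = {"n": 0}


def fake_inference() -> None:
    """Simulated workload: the first call is slow, later calls are fast."""
    calls["n"] += 1
    time.sleep(0.2 if calls["n"] == 1 else 0.01)


best = timed_after_warmup(fake_inference)
print(f"{best * 1e3:.1f} ms per loop")
```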


### Torch CLIP

In [10]:
import clip
from PIL import Image

device = "cuda"
# PyTorch model on the GPU, for comparison with the ONNX sessions above
model, preprocess = clip.load("ViT-B/32", device=device, jit=False)
# batch first
image = preprocess(Image.open("CLIP.png")).unsqueeze(0).to(device) # [1, 3, 224, 224]
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device) # [3, 77]

In [11]:
%timeit model(image, text)

10 loops, best of 5: 72.2 ms per loop
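Putting the reported numbers side by side: speedup is the PyTorch time divided by the ONNX Runtime time for the same inputs. The figures below are copied from the `%timeit` outputs in this notebook (a single Colab session with a Tesla K80), so treat the ratios as indicative only.

```python
# Best-of-5 timings in ms, copied from the %timeit outputs above.
timings_ms = {
    "CPU": {"torch": 636.0, "onnx": 550.0},
    "GPU": {"torch": 72.2, "onnx": 60.8},
}

speedups = {}
for mode, t in timings_ms.items():
    speedups[mode] = t["torch"] / t["onnx"]  # > 1 means ONNX Runtime was faster
    print(f"{mode}: torch {t['torch']} ms, onnx {t['onnx']} ms, speedup {speedups[mode]:.2f}x")
```

With these numbers, ONNX Runtime is about 1.16x faster on CPU (636 / 550) and about 1.19x faster on GPU (72.2 / 60.8).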
