# CLIP ViT-B/32 Transformer (from ER) 

## Imports

In [None]:
import json
import os
import requests

import wallaroo
from wallaroo.pipeline import Pipeline
from wallaroo.deployment_config import DeploymentConfigBuilder
from wallaroo.framework import Framework

import pyarrow as pa
import numpy as np
import pandas as pd

from PIL import Image

wl = wallaroo.Client(auth_type="sso", interactive=True)

## Configure & Upload Model

### Save 🤗 Hugging Face pipeline locally

We will use the `openai/clip-vit-base-patch32` model for the `zero-shot-image-classification` pipeline task from the official `🤗 Hugging Face` [hub](https://huggingface.co/openai/clip-vit-base-patch32). The model can also be found in the model zoo [here](https://storage.cloud.google.com/wallaroo-model-zoo/model-auto-conversion/hugging-face/dummy-pipelines/zero-shot-image-classification-pipeline.zip?authuser=0).

You can create a `zero-shot-image-classification` pipeline with the aforementioned model and save it locally as follows:

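```python
import torch
from transformers import pipeline

# Build a zero-shot image classification pipeline around CLIP ViT-B/32
pipe = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-base-patch32",
    device=0 if torch.cuda.is_available() else -1,  # first GPU if available, otherwise CPU
)
# Write the model weights, tokenizer and configuration files to a local directory
pipe.save_pretrained("clip-vit-base-patch-32/")
```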

> **Important:** You also have to download [preprocessor_config.json](https://huggingface.co/openai/clip-vit-base-patch32/blob/main/preprocessor_config.json) and place it inside the same directory. This is a workaround for an issue in `Hugging Face 🤗`, which is not able to load the file properly.

As a last step, `zip` the saved pipeline directory:

```bash
zip -r clip-vit-base-patch-32.zip clip-vit-base-patch-32/
```
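If the `zip` command is not available, the same archive can be built from Python. The following is a minimal sketch using the standard library's `shutil.make_archive`; the helper name `package_pipeline` and its pre-flight check for `preprocessor_config.json` are illustrative assumptions, not part of the Wallaroo workflow.

```python
import os
import shutil


def package_pipeline(pipeline_dir, archive_base):
    """Zip a saved pipeline directory (hypothetical helper).

    The directory itself is kept as the top-level entry in the archive,
    mirroring `zip -r clip-vit-base-patch-32.zip clip-vit-base-patch-32/`.
    Returns the path of the created .zip file.
    """
    pipeline_dir = os.path.normpath(pipeline_dir)
    # Fail early if the manually downloaded preprocessor config is missing
    if not os.path.isfile(os.path.join(pipeline_dir, "preprocessor_config.json")):
        raise FileNotFoundError(f"preprocessor_config.json not found in {pipeline_dir}")
    parent, name = os.path.split(os.path.abspath(pipeline_dir))
    # root_dir=parent and base_dir=name store entries as "<name>/..."
    return shutil.make_archive(archive_base, "zip", root_dir=parent, base_dir=name)


# Usage: creates clip-vit-base-patch-32.zip in the current directory
# package_pipeline("clip-vit-base-patch-32/", "clip-vit-base-patch-32")
```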

### Get Framework for the `zero-shot-image-classification` pipeline

Let's see what frameworks are supported via the `Framework` Enum:

In [2]:
[e.value for e in Framework]

['onnx',
 'tensorflow',
 'python',
 'keras',
 'sklearn',
 'pytorch',
 'xgboost',
 'hugging-face-feature-extraction',
 'hugging-face-image-classification',
 'hugging-face-image-segmentation',
 'hugging-face-image-to-text',
 'hugging-face-object-detection',
 'hugging-face-question-answering',
 'hugging-face-stable-diffusion-text-2-img',
 'hugging-face-summarization',
 'hugging-face-text-classification',
 'hugging-face-translation',
 'hugging-face-zero-shot-classification',
 'hugging-face-zero-shot-image-classification',
 'hugging-face-zero-shot-object-detection',
 'hugging-face-sentiment-analysis',
 'hugging-face-text-generation',
 'custom']

The appropriate one for the `zero-shot-image-classification` pipeline is the following:

In [3]:
Framework.HUGGING_FACE_ZERO_SHOT_IMAGE_CLASSIFICATION

<Framework.HUGGING_FACE_ZERO_SHOT_IMAGE_CLASSIFICATION: 'hugging-face-zero-shot-image-classification'>

### Configure PyArrow Schema

You can find more information on the available inputs in the [official source code](https://github.com/huggingface/transformers/blob/v4.28.1/src/transformers/pipelines/zero_shot_image_classification.py#L78) of the `🤗 Hugging Face` `ZeroShotImageClassificationPipeline`.

> ⚠️ Any input in the schema that the pipeline does not accept will raise an error when running inference.

In [4]:
input_schema = pa.schema([
    # required, fixed image dimensions: height (480) x width (640) x RGB channels (3)
    pa.field('inputs',
        pa.list_(
            pa.list_(
                pa.list_(
                    pa.int64(),
                    list_size=3
                ),
                list_size=640
            ),
            list_size=480
        )),
    pa.field('candidate_labels', pa.list_(pa.string(), list_size=4)), # required, equivalent to `options` in the provided demo
])

output_schema = pa.schema([
    pa.field('score', pa.list_(pa.float64(), list_size=4)), # must match the number of candidate labels
    pa.field('label', pa.list_(pa.string(), list_size=4)), # must match the number of candidate labels
])

### Upload Model

In [5]:
model = wl.upload_model('er-clip-vit',
                        'clip-vit-base-patch-32.zip',
                        framework=Framework.HUGGING_FACE_ZERO_SHOT_IMAGE_CLASSIFICATION,
                        input_schema=input_schema,
                        output_schema=output_schema)
model

Waiting for model conversion... It may take up to 10.0min.
Model is Pending conversion..Converting..............Ready.


| Property | Value |
|---|---|
| Name | er-clip-vit |
| Version | 4564b16e-bab6-4c12-a014-6ff042051f1a |
| File Name | clip-vit-base-patch-32.zip |
| SHA | 4efc24685a14e1682301cc0085b9db931aeb5f3f8247854bedc6863275ed0646 |
| Status | ready |
| Image Path | proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mlflow-deploy:v2023.2.1-3530 |
| Updated At | 2023-25-Oct 13:36:45 |


## Deploy Pipeline

In [8]:
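deployment_config = (
    wallaroo.DeploymentConfigBuilder()
    .cpus(.25).memory('1Gi')          # resources for the pipeline engine
    .sidekick_memory(model, '4Gi')    # memory for the model's sidekick container
    .sidekick_cpus(model, 1.)         # CPUs for the model's sidekick container
    .build()
)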

In [9]:
pipeline_name = "er-clip-vit-pipeline-new"
pipeline = wl.build_pipeline(pipeline_name)
pipeline.add_model_step(model)

pipeline.deploy(deployment_config=deployment_config)
pipeline.status()

Waiting for deployment - this will take up to 90s ......................... ok


{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.244.3.53',
   'name': 'engine-78f6bdb9fd-ml6ld',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'er-clip-vit-pipeline-new',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'er-clip-vit',
      'version': '4564b16e-bab6-4c12-a014-6ff042051f1a',
      'sha': '4efc24685a14e1682301cc0085b9db931aeb5f3f8247854bedc6863275ed0646',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.244.4.61',
   'name': 'engine-lb-584f54c899-5clf5',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': [{'ip': '10.244.4.60',
   'name': 'engine-sidekick-er-clip-vit-25-7bcff8d494-8986c',
   'status': 'Running',
   'reason': None,
   'details': [],
   'statuses': '\n'}]}

## Run inference

In [10]:
pipeline.status()

{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.244.3.53',
   'name': 'engine-78f6bdb9fd-ml6ld',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'er-clip-vit-pipeline-new',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'er-clip-vit',
      'version': '4564b16e-bab6-4c12-a014-6ff042051f1a',
      'sha': '4efc24685a14e1682301cc0085b9db931aeb5f3f8247854bedc6863275ed0646',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.244.4.61',
   'name': 'engine-lb-584f54c899-5clf5',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': [{'ip': '10.244.4.60',
   'name': 'engine-sidekick-er-clip-vit-25-7bcff8d494-8986c',
   'status': 'Running',
   'reason': None,
   'details': [],
   'statuses': '\n'}]}

In [11]:
image_urls = [
    "https://farm6.staticflickr.com/5200/5893309516_d22a116a65_z.jpg",
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    "https://farm4.staticflickr.com/3726/9780496575_ec5d9c0e4f_z.jpg",
    "https://farm5.staticflickr.com/4021/4548948723_ab46d70f85_z.jpg",
    "https://farm1.staticflickr.com/162/342939460_6a7744c3c2_z.jpg"
]
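images = []

for iu in image_urls:
    response = requests.get(iu, stream=True)
    response.raise_for_status()  # fail loudly on a bad download
    image = Image.open(response.raw).convert("RGB")  # guarantee 3 channels, as the schema requires
    image = image.resize((640, 480))  # fixed image dimensions, given as (width, height)
    images.append(np.array(image))  # array shape: (height, width, channels) = (480, 640, 3)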
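Note that PIL's `resize` takes `(width, height)`, while the resulting NumPy array is indexed `(height, width, channels)`; that is why the schema nests `list_size=480` outside `list_size=640`. Below is a minimal offline sketch of the same preprocessing, using a synthetic image instead of the URLs above; the helper `to_model_input` is hypothetical.

```python
import numpy as np
from PIL import Image

TARGET_SIZE = (640, 480)  # PIL convention: (width, height)


def to_model_input(image):
    """Convert any PIL image to the uint8 array layout expected by input_schema."""
    # convert("RGB") guarantees exactly 3 channels, e.g. for grayscale or RGBA images
    resized = image.convert("RGB").resize(TARGET_SIZE)
    return np.array(resized)  # NumPy convention: (height, width, channels)


# A grayscale 100x50 image still comes out with the schema's shape
arr = to_model_input(Image.new("L", (100, 50)))
print(arr.shape)  # (480, 640, 3)
```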

In [12]:
input_data = {
    "inputs": images,
    "candidate_labels": [["cat", "dog", "horse", "elephant"]] * len(images),  # one label list per image
}
dataframe = pd.DataFrame(input_data)
dataframe

| | inputs | candidate_labels |
|---|---|---|
| 0 | [[[177, 177, 177], [177, 177, 177], [177, 177,... | [cat, dog, horse, elephant] |
| 1 | [[[140, 25, 56], [144, 25, 67], [146, 24, 73],... | [cat, dog, horse, elephant] |
| 2 | [[[228, 235, 241], [229, 236, 242], [230, 237,... | [cat, dog, horse, elephant] |
| 3 | [[[60, 62, 61], [62, 64, 63], [67, 69, 68], [7... | [cat, dog, horse, elephant] |
| 4 | [[[24, 20, 11], [22, 18, 9], [18, 14, 5], [21,... | [cat, dog, horse, elephant] |


In [13]:
pipeline.infer(dataframe, timeout=600, dataset=["in", "out", "metadata.elapsed", "time", "check_failures"])

| | time | in.candidate_labels | in.inputs | out.label | out.score | check_failures | metadata.elapsed |
|---|---|---|---|---|---|---|---|
| 0 | 2023-10-25 13:39:28.824 | [cat, dog, horse, elephant] | [177, 177, 177, 177, 177, 177, 177, 177, 177, ... | [horse, dog, elephant, cat] | [0.7596803307533264, 0.21711139380931854, 0.02... | 0 | [1854798121, 4294967295] |
| 1 | 2023-10-25 13:39:28.824 | [cat, dog, horse, elephant] | [140, 25, 56, 144, 25, 67, 146, 24, 73, 142, 1... | [cat, dog, elephant, horse] | [0.9870228171348572, 0.00664688041433692, 0.00... | 0 | [1854798121, 4294967295] |
| 2 | 2023-10-25 13:39:28.824 | [cat, dog, horse, elephant] | [228, 235, 241, 229, 236, 242, 230, 237, 243, ... | [elephant, horse, dog, cat] | [0.9981434345245361, 0.001765866531059146, 6.8... | 0 | [1854798121, 4294967295] |
| 3 | 2023-10-25 13:39:28.824 | [cat, dog, horse, elephant] | [60, 62, 61, 62, 64, 63, 67, 69, 68, 72, 74, 7... | [elephant, dog, horse, cat] | [0.41468727588653564, 0.3483794331550598, 0.12... | 0 | [1854798121, 4294967295] |
| 4 | 2023-10-25 13:39:28.824 | [cat, dog, horse, elephant] | [24, 20, 11, 22, 18, 9, 18, 14, 5, 21, 17, 8, ... | [dog, horse, cat, elephant] | [0.5713930130004883, 0.1722952425479889, 0.155... | 0 | [1854798121, 4294967295] |
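The `out.label` and `out.score` columns hold parallel lists, one score per candidate label. Below is a minimal sketch for pulling out the single best prediction per image, assuming the result of `pipeline.infer` is assigned to a variable such as `results`. It uses `argmax` instead of relying on the lists being sorted; the helper `top_predictions` is hypothetical.

```python
import numpy as np
import pandas as pd


def top_predictions(results: pd.DataFrame) -> pd.DataFrame:
    """Return the highest-scoring label and its score for each inference row."""
    best_idx = results["out.score"].apply(lambda scores: int(np.argmax(scores)))
    top_label = [labels[i] for labels, i in zip(results["out.label"], best_idx)]
    top_score = [scores[i] for scores, i in zip(results["out.score"], best_idx)]
    return pd.DataFrame({"label": top_label, "score": top_score}, index=results.index)


# Usage, where results = pipeline.infer(dataframe, ...):
# top_predictions(results)
```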


## Undeploy Pipelines

In [9]:
# Undeploy every pipeline returned by wl.list_pipelines(), not only the one created above
for p in wl.list_pipelines():
    p.undeploy()

Waiting for undeployment - this will take up to 45s ..................................... ok
 ok
 ok
 ok
 ok
 ok
 ok
 ok
 ok
 ok
 ok
 ok
 ok
 ok
