feat(onnx): ViT zero-shot tasks #858

Closed
QIN2DIM opened this issue Oct 22, 2023 · 1 comment · Fixed by #863 or #865
Labels
🦜 blog · feature (new feature or requirement) · fixed (bug fixed or issue resolved)

Comments

@QIN2DIM
Owner

QIN2DIM commented Oct 22, 2023

Intro

See the example code for details.

The CLIP multimodal model enables zero-shot image classification. I've tested it on multiple datasets, and the model is over 99.9% accurate as long as an appropriate prompt is provided.

We only need to write positive_labels and negative_labels based on the prompt words of known challenges (image_binary_challenge). If a prompt is encountered that has never been handled before, the program automatically decomposes it and adjusts the labels so the binary classification task can still run.

We reimplemented the preprocessing and inference pipeline with NumPy, i.e., the process no longer needs to rely on PyTorch.
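
For reference, here is a minimal sketch of what that NumPy-only path looks like. It is illustrative only: the ONNX file names match the defaults listed below, but the tokenized-label input and the preprocessing details are assumptions, not the project's exact internals.

import numpy as np
import onnxruntime
from PIL import Image

# Hypothetical ONNX sessions for the visual and textual CLIP encoders
visual = onnxruntime.InferenceSession("visual_CLIP_RN50.openai.onnx")
textual = onnxruntime.InferenceSession("textual_CLIP_RN50.openai.onnx")


def preprocess(image: Image.Image) -> np.ndarray:
    """Simplified CLIP preprocessing in pure NumPy: resize, scale, normalize, NCHW layout."""
    image = image.convert("RGB").resize((224, 224))
    x = np.asarray(image, dtype=np.float32) / 255.0
    mean = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
    std = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)
    x = (x - mean) / std
    return x.transpose(2, 0, 1)[None]  # HWC -> NCHW


def classify(image: Image.Image, tokenized_labels: np.ndarray) -> np.ndarray:
    """Return softmax probabilities over the candidate labels without touching torch."""
    image_embed = visual.run(None, {visual.get_inputs()[0].name: preprocess(image)})[0]
    text_embeds = textual.run(None, {textual.get_inputs()[0].name: tokenized_labels})[0]
    # Cosine similarity between L2-normalized embeddings, scaled the way CLIP does
    image_embed /= np.linalg.norm(image_embed, axis=-1, keepdims=True)
    text_embeds /= np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    logits = 100.0 * image_embed @ text_embeds.T
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()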

By default, we use the RN50.openai variant of the model for classification tasks. Both the ONNX branch and the ViT Transformers pipeline branch are encapsulated, so the program switches automatically: if torch and transformers are installed in your runtime environment and a CUDA GPU is available, the Transformers pipeline is used; otherwise it falls back to ONNX running on the CPU.

DEFAULT_CLIP_VISUAL_MODEL: str = "visual_CLIP_RN50.openai.onnx"
DEFAULT_CLIP_TEXTUAL_MODEL: str = "textual_CLIP_RN50.openai.onnx"
"""
Available Models
--- 1180+ MiB
DEFAULT_CLIP_VISUAL_MODEL: str = "visual_CLIP_ViT-B-32.openai.onnx"
DEFAULT_CLIP_TEXTUAL_MODEL: str = "textual_CLIP_ViT-B-32.openai.onnx"
--- 658.3 MiB
DEFAULT_CLIP_VISUAL_MODEL: str = "visual_CLIP_RN50.openai.onnx"
DEFAULT_CLIP_TEXTUAL_MODEL: str = "textual_CLIP_RN50.openai.onnx"
--- 3300+ MiB
DEFAULT_CLIP_VISUAL_MODEL: str = "visual_CLIP-ViT-L-14-DataComp.XL-s13B-b90K.onnx"
DEFAULT_CLIP_TEXTUAL_MODEL: str = "textual_CLIP-ViT-L-14-DataComp.XL-s13B-b90K.onnx"
"""

DEMO

[Figure: datalake_post workflow diagram (draw.io)]

"""
1. **positive_labels** can contain only the slashed prompt, i.e., the meaning specified by the prompt

2. **negative_labels** usually have multiple categories,

please observe the other labels in the 9 images and fill in the label_name

3. **positive_labels** can fill in more than one, when there is ambiguity in the prompt.

   For example, if the prompt asks to select a `vehicle`, but `car` and `airplane` appear in the task.

   You can fill in this: `positive_labels = ["vehicle", "car", "airplane"]`

4. Sometimes the prompt doesn't change, but its corresponding image group is replaced.
   If you observe this, update your `datalake_post` to do so!

5. If a prompt never appears, i.e. you don't update it to datalake, the program automatically disassembles the prompt
and adds simple antonyms to the mapping network to ensure that the binary classification task proceeds properly.

   This process works sometimes, but the correctness rate is obviously no better than the way you fill it out manually
"""
from hcaptcha_challenger import split_prompt_message, label_cleaning, DataLake


def handle(x):
    # Clean the raw challenge prompt and split out the target label (English)
    return split_prompt_message(label_cleaning(x), "en")


datalake_post = {
    # --> off-road vehicle
    handle("Please click each image containing an off-road vehicle"): {
        "positive_labels": ["off-road vehicle"],
        "negative_labels": ["car", "bicycle"],
    },
    # --> pair of headphones
    handle("Please click each image containing a pair of headphones"): {
        "positive_labels": ["headphones"],
        "negative_labels": ["car", "elephant", "cat"]
    },
    # --> item of office equipment
    handle("Please click each image containing an item of office equipment"): {
        "positive_labels": ["office equipment", "chair"],
        "negative_labels": ["shoes", "guitar", "drum", "musical instruments"]
    }
}


def common():
    from hcaptcha_challenger import ModelHub

    # ... the other operations you are already familiar with

    modelhub = ModelHub.from_github_repo()
    modelhub.parse_objects()

    print(f"Before {modelhub.datalake.keys()=}")

    # Merge the new entries, and keep using this modelhub object afterwards
    for prompt, serialized_binary in datalake_post.items():
        modelhub.datalake[prompt] = DataLake.from_serialized(serialized_binary)

    print(f"After {modelhub.datalake.keys()=}\n")

    for prompt, dl in modelhub.datalake.items():
        print(f"{prompt=}")
        print(f"{dl=}\n")

    # ... the other operations you are already familiar with


if __name__ == '__main__':
    common()

datalake:
  furniture:
    positive_labels:
      - furniture
    negative_labels:
      - headphones
      - guitar
      - game tool
      - keyboard
  off-road vehicle:
    positive_labels:
      - off-road vehicle
    negative_labels:
      - car
      - bicycle
  pair of headphones:
    positive_labels:
      - pair of headphones
    negative_labels:
      - elephant
      - car
      - cat
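
A minimal sketch of consuming such a YAML file, assuming PyYAML is available; the file name is illustrative, and DataLake.from_serialized is used exactly as in the merge loop above:

import yaml

from hcaptcha_challenger import DataLake, ModelHub

# Illustrative file name; any YAML with the structure shown above works
with open("datalake.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

modelhub = ModelHub.from_github_repo()
modelhub.parse_objects()

# Convert each serialized entry into a DataLake object and merge it in
for prompt, serialized_binary in config["datalake"].items():
    modelhub.datalake[prompt] = DataLake.from_serialized(serialized_binary)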

@QIN2DIM QIN2DIM added the feature (new feature or requirement) and 🦜 blog labels on Oct 22, 2023
@QIN2DIM QIN2DIM changed the title feat(onnx): CLIP-ViT zero-shot image classification feat(onnx): ViT zero-shot tasks Oct 22, 2023
@QIN2DIM QIN2DIM added the fixed (bug fixed or issue resolved) label on Oct 25, 2023
@QIN2DIM QIN2DIM linked a pull request Oct 25, 2023 that will close this issue
@QIN2DIM QIN2DIM added this to the CLIP-as-service milestone Oct 25, 2023
@QIN2DIM QIN2DIM linked a pull request Oct 25, 2023 that will close this issue
@QIN2DIM
Owner Author

QIN2DIM commented Oct 25, 2023

https://github.com/QIN2DIM/awesome-clip-production

Preview Blog

- Tutorials
- Self Hosting
- Model Hub
- Model Card
- Benchmarks
  - Open-CLIP
  - EVA-CLIP [Submitted on 27 Mar 2023]
  - DINOv2 [Submitted on 14 Apr 2023]
- Datasets
  - LAION-400M [Submitted on 3 Nov 2021]
  - LAION-2B [Submitted on 16 Oct 2022]
  - DataComp [Submitted on 27 Apr 2023 (v1), last revised 25 Jul 2023 (this version, v4)]

demo

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load the Hugging Face CLIP checkpoint matching the export name below; the sample image path is illustrative
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

# Build a sample batch of text prompts and one image to trace the export
image = Image.open("sample.jpg")
inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image,
                   return_tensors="pt", padding=True)

torch.onnx.export(
    model,  # model being run
    # model input in one of the acceptable formats: a torch.Tensor (single input), a tuple or list
    # of tensors (multiple inputs), or a dictionary with string keys and tensors as values
    dict(inputs),
    "clip-vit-base-patch16.onnx",  # where to save the exported model
    opset_version=14,  # the ONNX opset version to export the model to
    input_names=["input_ids", "pixel_values", "attention_mask"],  # the model's input names
    output_names=["logits_per_image", "logits_per_text", "text_embeds", "image_embeds"],  # the model's output names
    dynamic_axes={  # variable-length axes
        "input_ids": {0: "batch", 1: "sequence"},
        "pixel_values": {0: "batch", 1: "num_channels", 2: "height", 3: "width"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits_per_image": {0: "batch"},
        "logits_per_text": {0: "batch"},
        "text_embeds": {0: "batch"},
        "image_embeds": {0: "batch"},
    },
)
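
Continuing from the export above, the ONNX graph can be sanity-checked with onnxruntime against the same inputs; this is a minimal sketch, not part of the original demo:

import numpy as np
import onnxruntime

# Run the exported graph on CPU and compare with the PyTorch outputs
session = onnxruntime.InferenceSession("clip-vit-base-patch16.onnx", providers=["CPUExecutionProvider"])
onnx_inputs = {name: tensor.numpy() for name, tensor in inputs.items()}
onnx_logits = session.run(["logits_per_image"], onnx_inputs)[0]

with torch.no_grad():
    torch_logits = model(**inputs).logits_per_image.numpy()

# The two backends should agree within a small numerical tolerance
print(np.abs(onnx_logits - torch_logits).max())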
