Changes:

Setting e.g. clipmodel = "ViT-L/14@336px" in demo.py now works -> the input_dims variable is set automatically.
Small change to clip_model.py so that it accepts this variable.
In clip.py, SHA256 checksum verification is bypassed -> you can put your fine-tuned model in place of .cache/clip/<original_model>.pt.
model.py from OpenAI/CLIP is included -> it provides the config for fine-tuned .pt files saved with torch.save that lack a built-in model config.
Plots are saved to disk rather than shown with plt.show().
⚠️ Note: No changes were made to demo.ipynb. Use demo.py from the console!
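
As an illustration of the automatic input_dims idea, here is a minimal sketch that derives the input resolution from a model name such as "ViT-L/14@336px". It is not the code in demo.py: the function name infer_input_dims and the fallback of 224 are assumptions made for this example. The fallback does not hold for every OpenAI checkpoint (the larger ResNet variants use other sizes), so reading the resolution from the loaded model is likely more robust.

```python
import re

# Assumption for this sketch: names without an "@<N>px" suffix use 224x224 input.
DEFAULT_INPUT_DIMS = 224


def infer_input_dims(clipmodel: str, default: int = DEFAULT_INPUT_DIMS) -> int:
    """Return the resolution encoded in a CLIP model name, or the default."""
    # Look for a trailing "@<digits>px", as in "ViT-L/14@336px".
    match = re.search(r"@(\d+)px$", clipmodel)
    return int(match.group(1)) if match else default


if __name__ == "__main__":
    for name in ("ViT-B/16", "ViT-L/14", "ViT-L/14@336px"):
        print(name, "->", infer_input_dims(name))
```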

Advanced:

Use runall.py (type "python runall.py --help"). It batch-processes images and performs CLIP Surgery in a fully automated way:
1. Gets CLIP's "opinions" via gradient ascent -> the model's own texts (labels) about the images.
2. Performs CLIP Surgery with whatever CLIP "saw" in the images.
⚠️ You can use large models, but models from CLIP ViT-L/14 upward will require >24 GB of memory.
FUN: After running the above, run FUN_word-world-records.py to get a list of CLIP's craziest predicted long words.
ℹ️ Requires: Original OpenAI/CLIP: "pip install git+https://github.com/openai/CLIP.git"
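
The two-stage batch flow described above can be sketched as a small driver loop. This is only an outline of the control flow, not the contents of runall.py: run_batch, get_opinions, and run_surgery are hypothetical names, and the two stages are passed in as plain callables so the sketch stays independent of any real model code.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Assumption: these are the image types of interest in the input folder.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def run_batch(
    image_dir: str,
    get_opinions: Callable[[Path], List[str]],
    run_surgery: Callable[[Path, List[str]], None],
) -> Dict[str, List[str]]:
    """Run stage 1 (labels) and stage 2 (surgery) for every image in a folder."""
    results: Dict[str, List[str]] = {}
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not an image
        labels = get_opinions(path)   # stage 1: the model's own texts for the image
        run_surgery(path, labels)     # stage 2: explain the image using those texts
        results[path.name] = labels
    return results
```

In this sketch the labels found in stage 1 are handed straight to stage 2, which mirrors the "whatever CLIP saw" step in the list above.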

Original CLIP Gradient Ascent Script: Used with permission from @advadnoun (Twitter / X).
CLIP 'opinions' may contain biased rants, slurs, and profanity. For more information, refer to the CLIP Model Card.

ORIGINAL README.md:

CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks (arxiv)

Introduction

This work focuses on the explainability of CLIP via its raw predictions. We identify two problems with CLIP's explainability: opposite visualization and noisy activations. We then propose CLIP Surgery, which does not require any fine-tuning or additional supervision. It greatly improves the explainability of CLIP and enhances downstream open-vocabulary tasks such as multi-label recognition, semantic segmentation, interactive segmentation (specifically the Segment Anything Model, SAM), and multimodal visualization. Currently, we offer a simple demo for interpretability analysis that also shows how to convert text to point prompts for SAM. The remaining code, including evaluation and other tasks, will be released later.
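
To make the noisy-activation problem more concrete, here is a simplified sketch of removing the part of a token-to-label similarity that is shared by all labels. It is based on one reading of the paper, not on the released code: the function names are hypothetical, the paper's class weighting is omitted, and under that simplification the step reduces to subtracting each image token's mean similarity across labels.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity_map(image_tokens: List[Vector], text_feats: List[Vector]) -> List[List[float]]:
    """Cosine similarity of every image token with every label feature."""
    img = [normalize(v) for v in image_tokens]
    txt = [normalize(v) for v in text_feats]
    return [[sum(a * b for a, b in zip(i, t)) for t in txt] for i in img]


def remove_redundant(sim: List[List[float]]) -> List[List[float]]:
    """Subtract, per token, the similarity that is common to all labels."""
    cleaned = []
    for row in sim:
        shared = sum(row) / len(row)  # mean over labels: the redundant part
        cleaned.append([s - shared for s in row])
    return cleaned
```

After this step a token that responds equally to every label scores zero for all of them, so only label-specific evidence remains in the map.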

Opposite visualization is caused by incorrect relations in self-attention:

Noisy activations are caused by redundant features across labels:

Our visualization results:

Text2Points to guide SAM:

Multimodal visualization:

Segmentation results:

Multilabel results:

Demo

First, install SAM and download the model checkpoint:

pip install git+https://github.com/facebookresearch/segment-anything.git
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

Then explain CLIP via the Jupyter notebook "demo.ipynb", or run the Python script:

python demo.py

(Note: the demo's results differ slightly from those of the experimental code; specifically, the demo does not use apex amp fp16, which makes it easier to run.)
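
The text-to-points step for SAM can be illustrated with a minimal sketch that turns a similarity map into point prompts. It is not the demo's implementation: the function name text_map_to_points, the 0.8 threshold, and the rule for picking negative points are assumptions for this example, and the map is assumed to be already normalized to [0, 1].

```python
from typing import List, Tuple

Point = Tuple[int, int]


def text_map_to_points(sim_map: List[List[float]], threshold: float = 0.8) -> Tuple[List[Point], List[int]]:
    """Convert a 2D similarity map into (x, y) points with foreground/background labels."""
    # Flatten to (score, x, y); x is the column index and y is the row index.
    scored = [(v, x, y) for y, row in enumerate(sim_map) for x, v in enumerate(row)]

    # Positive prompts: locations whose score exceeds the threshold.
    positives = [(x, y) for v, x, y in scored if v > threshold]

    # Negative prompts: the same number of lowest-scoring locations.
    lowest = sorted(scored, key=lambda item: item[0])[: len(positives)]
    negatives = [(x, y) for _, x, y in lowest]

    points = positives + negatives
    labels = [1] * len(positives) + [0] * len(negatives)  # 1 = foreground, 0 = background
    return points, labels
```

The points here are in map coordinates and would need scaling to the image size before use. They could then be passed to SAM, for example as the point_coords and point_labels arguments of SamPredictor.predict.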

Cite

@misc{li2023clip,
      title={CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks}, 
      author={Yi Li and Hualiang Wang and Yiqun Duan and Xiaomeng Li},
      year={2023},
      eprint={2304.05653},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
