# **CLIP Captioner API**

An API endpoint for captioning images with OpenAI CLIP.

This API is meant to be used by tools that automate image captioning. If you don't have a GPU, you can run it from this Colab notebook.

*Author:* [_CypherpunkSamurai_](https://github.com/CypherpunkSamurai)

If you find any bugs, feel free to contact me 😊

## Credits
- OpenAI CLIP
- pharmapsychotic (for the CLIP2 Colab)


**Shameless Self-promotion:**
Check out my Qt5-based captioning tool [Captioner](https://github.com/CypherpunkSamurai/captioner) 😛

---

In [2]:
#@markdown # Check Colab GPU
#@markdown Check whether a GPU is available. If not, switch the runtime to a GPU.

!nvidia-smi

Sun Feb 19 19:07:26 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P8     9W /  70W |      3MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
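
The same check can be done from Python so that later cells can stop early when no GPU is attached. This is a minimal sketch using only the standard library; the helper name `gpu_available` is hypothetical and not part of the notebook. It assumes that a working `nvidia-smi -L` (which lists detected GPUs) is a good enough signal.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if `nvidia-smi` is installed and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # NVIDIA driver tools are not installed
    try:
        # `nvidia-smi -L` prints one line per detected GPU
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if not gpu_available():
    print("No GPU detected: switch the Colab runtime to a GPU before continuing.")
```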

In [3]:
#@markdown # Install requirements
#@markdown Install the required modules to run CLIP
!pip -q install gradio open_clip_torch clip-interrogator fastapi nest-asyncio python-multipart uvicorn pyngrok

In [4]:
#@markdown ## Config
#@markdown Choose a Model.
#@markdown - ViT-L-14 works well for Stable Diffusion 1.0 to 1.4
#@markdown - ViT-H-14 works well for Stable Diffusion 1.5 to 2.*
clip_model_name = 'ViT-H-14/laion2b_s32b_b79k' #@param ["ViT-L-14/openai", "ViT-H-14/laion2b_s32b_b79k"]

In [5]:
#@markdown ## Setup CLIP
import gradio as gr
from clip_interrogator import Config, Interrogator

# Interrogator Conf
config = Config(
  clip_model_name=clip_model_name,
  clip_model_path='cache',
  device='cuda:0',
  blip_num_beams=64,
  blip_offload=False
)

#@markdown
# Custom checkpoints for CLIP
# Uncomment the lines below to download them; leave them commented to use the official CLIP model
# !apt-get install -qy aria2 > /dev/null
# !mkdir -p "cache"
# !aria2c -x 16 'https://huggingface.co/pharma/ci-preprocess/resolve/main/ViT-H-14_laion2b_s32b_b79k_artists.pkl' \
#     'https://huggingface.co/pharma/ci-preprocess/resolve/main/ViT-H-14_laion2b_s32b_b79k_flavors.pkl' \
#     'https://huggingface.co/pharma/ci-preprocess/resolve/main/ViT-H-14_laion2b_s32b_b79k_mediums.pkl' \
#     'https://huggingface.co/pharma/ci-preprocess/resolve/main/ViT-H-14_laion2b_s32b_b79k_movements.pkl' \
#     'https://huggingface.co/pharma/ci-preprocess/resolve/main/ViT-H-14_laion2b_s32b_b79k_trendings.pkl' \
#     -d "cache"

# New Interrogator
ci = Interrogator(config)


def image_analysis(image):
    image = image.convert('RGB')
    image_features = ci.image_to_features(image)

    top_mediums = ci.mediums.rank(image_features, 5)
    top_artists = ci.artists.rank(image_features, 5)
    top_movements = ci.movements.rank(image_features, 5)
    top_trendings = ci.trendings.rank(image_features, 5)
    top_flavors = ci.flavors.rank(image_features, 5)

    medium_ranks = {medium: sim for medium, sim in zip(top_mediums, ci.similarities(image_features, top_mediums))}
    artist_ranks = {artist: sim for artist, sim in zip(top_artists, ci.similarities(image_features, top_artists))}
    movement_ranks = {movement: sim for movement, sim in zip(top_movements, ci.similarities(image_features, top_movements))}
    trending_ranks = {trending: sim for trending, sim in zip(top_trendings, ci.similarities(image_features, top_trendings))}
    flavor_ranks = {flavor: sim for flavor, sim in zip(top_flavors, ci.similarities(image_features, top_flavors))}
    
    return medium_ranks, artist_ranks, movement_ranks, trending_ranks, flavor_ranks


def image_to_prompt(image, mode):
    """
      Convert an image to a prompt using the given mode
      ('best', 'classic', 'fast' or 'negative').
    """
    ci.config.chunk_size = 2048 if ci.config.clip_model_name == "ViT-L-14/openai" else 1024
    ci.config.flavor_intermediate_count = 2048 if ci.config.clip_model_name == "ViT-L-14/openai" else 1024
    image = image.convert('RGB')
    if mode == 'best':
        return ci.interrogate(image)
    elif mode == 'classic':
        return ci.interrogate_classic(image)
    elif mode == 'fast':
        return ci.interrogate_fast(image)
    elif mode == 'negative':
        return ci.interrogate_negative(image)
    # Fail loudly instead of silently returning None for an unknown mode
    raise ValueError(f"unknown prompt mode: {mode!r}")


Loading BLIP model...
load checkpoint from https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_caption.pth
Loading CLIP model...


ViT-H-14_laion2b_s32b_b79k_artists.safetensors: 100%|██████████| 21.6M/21.6M [00:00<00:00, 326MB/s]
ViT-H-14_laion2b_s32b_b79k_flavors.safetensors: 100%|██████████| 207M/207M [00:00<00:00, 268MB/s]
ViT-H-14_laion2b_s32b_b79k_mediums.safetensors: 100%|██████████| 195k/195k [00:00<00:00, 27.9MB/s]
ViT-H-14_laion2b_s32b_b79k_movements.safetensors: 100%|██████████| 410k/410k [00:00<00:00, 23.7MB/s]
ViT-H-14_laion2b_s32b_b79k_negative.safetensors: 100%|██████████| 84.2k/84.2k [00:00<00:00, 15.7MB/s]
ViT-H-14_laion2b_s32b_b79k_trendings.safetensors: 100%|██████████| 148k/148k [00:00<00:00, 26.7MB/s]


Loaded CLIP model and data in 46.65 seconds.
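
`image_analysis` returns five dictionaries that map a label to its similarity score. The sketch below shows one way to pick the strongest matches from such a dictionary. The helper `top_matches` is hypothetical, the scores are made up for illustration, and it assumes the similarity values are plain floats.

```python
from typing import Dict, List, Tuple


def top_matches(ranks: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, similarity) pairs, best first."""
    return sorted(ranks.items(), key=lambda item: item[1], reverse=True)[:k]


# Made-up scores standing in for one of the dictionaries from image_analysis
example_ranks = {"oil painting": 0.21, "photograph": 0.27, "sketch": 0.18}

for label, score in top_matches(example_ranks, k=2):
    print(f"{label}: {score:.2f}")
```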


# Run the API
Start the API server and print its URL.

In [23]:
#@markdown ### FastAPI Code
#@markdown HTTP Server Code for captioning images

#@markdown __Available API Endpoints:__
#@markdown - `/` - Shows the server status
#@markdown - `/help` - Shows help
#@markdown - `/caption` - Captions an image (multipart `POST`)
#@markdown  - `file` is the image
#@markdown  - `prompt_mode` is the captioning mode (`best`, `classic`, `fast` or `negative`; default is `best`)
#@markdown - `/docs` - Swagger UI

# Log
import logging
import traceback
# logging config
logging.basicConfig(
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    level=logging.INFO
)
clip_logger = logging.getLogger('CLIP')

# FastAPI
from typing import Optional # Optional field
from fastapi import FastAPI
from fastapi import File, UploadFile, Form
# Image API
from PIL import Image
from IPython.display import display
from io import BytesIO


# new api
app = FastAPI()


@app.get("/")
async def root():
  return {"status": "running", "response": {"clip_model": clip_model_name} }

@app.get("/help")
async def show_help():
  return {"status": "ok", "response": "please check /docs for documentation."}

@app.post("/caption")
async def caption_img(file: UploadFile = File(...), prompt_mode : Optional[str] = Form("best")):
    try:
        # read "file" and "prompt_mode" from multipart request
        contents = await file.read()
        
        # read PIL image
        img = Image.open(BytesIO(contents))
        
        # caption
        clip_logger.info(f"captioning image in prompt mode: {prompt_mode}")
        caption = image_to_prompt(img, prompt_mode)
        
        # return
        return {"status": "ok", "caption": caption, "prompt_mode": prompt_mode}

    except Exception:
        return {"status": "error", "response": traceback.format_exc()}
    finally:
        # UploadFile.close() is a coroutine and must be awaited
        await file.close()

In [None]:
import nest_asyncio
from pyngrok import ngrok
import uvicorn

#@markdown ### Enable Ngrok (disable for local machine)
use_ngrok = True #@param {type: "boolean"}

#@markdown Enter `NGROK_TOKEN` to get a public API url
if use_ngrok:
  NGROK_TOKEN = "YOUR TOKEN HERE" #@param {"type": "string"}
  ngrok.set_auth_token(NGROK_TOKEN)

  # stop any ngrok process that is already running
  ngrok_process = ngrok.get_ngrok_process()
  ngrok_process.stop_monitor_thread()
  ngrok.kill()

  ngrok_tunnel = ngrok.connect(9000)
  print("Public URL:", ngrok_tunnel.public_url)
  print("Note: add `ngrok-skip-browser-warning: True` header to your request to bypass ngrok verification\n")

nest_asyncio.apply()
uvicorn.run(app, port=9000)
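
Once the server is running, a client can upload an image to `/caption` as a multipart form. The sketch below uses only the Python standard library. The helper names (`build_multipart`, `extract_caption`, `caption_image`) are hypothetical and not part of the notebook; the field names, response keys, and the `ngrok-skip-browser-warning` header come from the cells above.

```python
import json
import urllib.request
import uuid


def build_multipart(image_bytes: bytes, filename: str, prompt_mode: str = "best"):
    """Encode the `file` and `prompt_mode` form fields as multipart/form-data."""
    boundary = uuid.uuid4().hex
    mode_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="prompt_mode"\r\n\r\n'
        f"{prompt_mode}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = mode_part + file_header + image_bytes + closing
    return body, f"multipart/form-data; boundary={boundary}"


def extract_caption(payload: dict) -> str:
    """Return the caption, or raise if the server reported an error."""
    if payload.get("status") != "ok":
        # On failure the server puts the traceback in the "response" key
        raise RuntimeError(payload.get("response", "unknown error"))
    return payload["caption"]


def caption_image(base_url: str, image_path: str, prompt_mode: str = "best") -> str:
    """POST an image file to <base_url>/caption and return the caption text."""
    with open(image_path, "rb") as fh:
        body, content_type = build_multipart(fh.read(), "image", prompt_mode)
    request = urllib.request.Request(
        base_url.rstrip("/") + "/caption",
        data=body,
        headers={
            "Content-Type": content_type,
            # Skips the ngrok browser warning page; harmless on a local server
            "ngrok-skip-browser-warning": "True",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return extract_caption(json.load(response))
```

For a local server, a call would look like `caption_image("http://127.0.0.1:9000", "photo.jpg", "fast")`, where `photo.jpg` is a placeholder path. With ngrok enabled, pass the printed public URL instead.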

---

# Thanks for Colaborating! 😊
If you find any bugs, please let me know on [GitHub](https://github.com/CypherpunkSamurai)

Join the SAIL Discord for more notebooks like this... 