# Image Classification using CLIP

This self-contained notebook uses `openai/CLIP` to classify the images in a folder against a text `token`, then moves the matching images into a new folder named after the `token`.

For example, if `token = "a bike"`, every image that resembles a bike is moved into a new folder named "a bike" in the parent directory.

### Folder Structure before running the script:
- Parent Directory
  - classify.ipynb
  - images
    - car.png
    - bike.png
    - bike2.png

### Folder Structure after running the script:
Here, `token = "a bike"`:

- Parent Directory
  - classify.ipynb
  - a bike
    - bike.png
    - bike2.png
  - images
    - car.png
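
The change from the first layout to the second can be reproduced without the model. The sketch below is only an illustration: it builds the example tree in a temporary directory and moves a hand-picked list of matches, which stands in for the images CLIP would select. The helper name `group_matches` is hypothetical and does not appear in the notebook.

```python
import shutil
import tempfile
from pathlib import Path


def group_matches(parent, matches, token):
    """Move each path in `matches` into `parent/token` and return that folder."""
    destination = Path(parent) / token
    destination.mkdir(exist_ok=True)
    for path in matches:
        shutil.move(str(path), str(destination))
    return destination


# Build the "before" layout in a throwaway directory.
parent = Path(tempfile.mkdtemp())
images = parent / "images"
images.mkdir()
for name in ("car.png", "bike.png", "bike2.png"):
    (images / name).touch()

# Hand-picked stand-in for the files the classifier would select.
matches = [images / "bike.png", images / "bike2.png"]
destination = group_matches(parent, matches, "a bike")

remaining = sorted(p.name for p in images.iterdir())
moved = sorted(p.name for p in destination.iterdir())
print(remaining)
print(moved)
```

Because the destination is an existing directory, `shutil.move` places each file inside it under its original name.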

### Installation

In [None]:
! pip install ftfy regex tqdm
! pip install git+https://github.com/openai/CLIP.git

### Import Modules

In [None]:
import numpy as np
import torch

print("Torch version:", torch.__version__)

In [None]:
import clip

### Loading the Model

In [4]:
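# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model.eval()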
input_resolution = model.visual.input_resolution
context_length = model.context_length
vocab_size = model.vocab_size

print("Model parameters:", f"{np.sum([int(np.prod(p.shape)) for p in model.parameters()]):,}")
print("Input resolution:", input_resolution)
print("Context length:", context_length)
print("Vocab size:", vocab_size)

Model parameters: 151,277,313
Input resolution: 224
Context length: 77
Vocab size: 49408


### Image Preprocessing

In [None]:
preprocess
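
Evaluating `preprocess` displays the transform pipeline returned by `clip.load`. For the model loaded here, it resizes and center-crops the image to the 224-pixel input resolution, converts it to a tensor, and normalizes each color channel. The sketch below illustrates only that last step in plain Python. The mean and standard deviation values are an assumption taken from the CLIP repository's transform, and `normalize_pixel` is a hypothetical helper, not part of the library.

```python
# Per-channel statistics assumed from the CLIP repository's preprocessing transform.
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [value / 255.0 for value in rgb]
    return [(s - m) / d for s, m, d in zip(scaled, CLIP_MEAN, CLIP_STD)]


black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print(black)
print(white)
```

A black pixel maps to negative values and a white pixel to positive ones, so the model sees inputs roughly centered on zero.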

### Main Code

In [48]:
import os
import shutil

import clip
import torch
from PIL import Image

# Load the model once rather than on every call to classify().
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)


def classify(img, token):
    """Return the probability that the image at `img` matches `token`."""
    image = preprocess(Image.open(img)).unsqueeze(0).to(device)
    # The blank string is the alternative class the token is compared against.
    text = clip.tokenize([token, " "]).to(device)

    with torch.no_grad():
        # logits_per_image has shape (1, 2): one image scored against two texts.
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1).cpu().numpy()

    return probs[0][0]

# Ask for the image folder and the text token
directory = input("Enter the name of folder containing the images :\n")
token = input("Enter token :\n")

# Score every file in the folder against the token
scores = {}
for entry in os.scandir(directory):
    if entry.is_file():
        scores[entry.path] = classify(entry.path, token)

# Keep the images whose match probability exceeds the threshold
result = [path for path, prob in scores.items() if prob > 0.8]

# Create the destination folder, named after the token, in the working directory
destination = os.path.join(os.getcwd(), token)
os.makedirs(destination, exist_ok=True)

# Each scanned path already includes the source folder, so it can be moved directly
for path in result:
    shutil.move(path, destination)
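
`classify` returns the first entry of a softmax over two logits, one for the token and one for the blank prompt, so the 0.8 threshold is effectively a condition on the gap between those two logits. A small sketch of that arithmetic in plain Python follows. The logit values are made up for illustration and do not come from a real CLIP run, and `token_probability` is a hypothetical helper.

```python
import math


def token_probability(token_logit, blank_logit):
    """Softmax probability of the token in a two-way comparison."""
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(token_logit, blank_logit)
    token_exp = math.exp(token_logit - top)
    blank_exp = math.exp(blank_logit - top)
    return token_exp / (token_exp + blank_exp)


# Hypothetical logits: the token scores 2.0 higher than the blank prompt.
prob = token_probability(25.0, 23.0)
print(round(prob, 3))

# p > 0.8 is equivalent to a logit gap greater than ln(0.8 / 0.2) = ln 4.
min_gap = math.log(4)
print(round(min_gap, 3))
```

Under this reading, an image is moved only when its token logit exceeds the blank-prompt logit by more than about 1.39, which may be worth keeping in mind when tuning the threshold.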
