# LAB 6: Image search using CLIP (Pre-trained version)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/biodatlab/xlab-recommendation/blob/notebook/solution_notebooks/06_CLIP_image_search_pretrained.ipynb)

In this lab, you will download pre-encoded image embeddings and the matching image index from Google Drive, then try them out in an interactive UI.

* Dataset ref: https://www.kaggle.com/competitions/h-and-m-personalized-fashion-recommendations/overview
    * images used in this notebook are resized versions of the H&M Personalized Fashion Recommendations images (resized to 100 x 100 pixels)
    * contains 100k+ images
    * hosted on Google Drive: https://drive.google.com/drive/folders/1jX1hasS6HysjEuKG0ucmTxdndB03uliJ?usp=sharing

* Objectives
    * find and recommend clothes for customers using image/text search

* Notes
    * openai-clip: https://github.com/openai/CLIP
    * faiss: https://github.com/facebookresearch/faiss/wiki
    * please switch the Google Colab runtime to GPU for faster computation
    * for the directly trained version, see [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/biodatlab/xlab-recommendation/blob/notebook/solution_notebooks/05_CLIP_image_search.ipynb)
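
The image/text search named in the objectives comes down to a nearest-neighbour lookup over embedding vectors: the query (an image or a piece of text) and every catalogue image are mapped to vectors, and the closest catalogue vectors are returned. The sketch below illustrates that idea with plain NumPy on random toy data. The function names (`normalize_rows`, `top_k_cosine`) and the toy arrays are assumptions made for illustration and are not part of this lab's code; an exact inner-product index in faiss computes the same kind of ranking, only faster on large collections.

```python
import numpy as np


def normalize_rows(x):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def top_k_cosine(query, embeddings, k=5):
    """Return (row ids, scores) of the k rows most similar to the query."""
    q = normalize_rows(np.asarray(query, dtype=np.float32).reshape(1, -1))
    e = normalize_rows(np.asarray(embeddings, dtype=np.float32))
    scores = (e @ q.T).ravel()          # cosine similarity to every row
    k = min(k, scores.shape[0])         # never ask for more rows than exist
    order = np.argsort(-scores)[:k]     # highest similarity first
    return order, scores[order]


# Toy data standing in for real embeddings (512 is the ViT-B/32 embedding size).
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(1000, 512)).astype(np.float32)

# A slightly perturbed copy of row 42 plays the role of the query,
# so row 42 should rank first.
toy_query = toy_embeddings[42] + 0.01 * rng.normal(size=512).astype(np.float32)

ids, scores = top_k_cosine(toy_query, toy_embeddings, k=3)
print(ids, scores)
```

Normalizing both sides first means the ranking depends only on the direction of the vectors, which is the usual choice when comparing CLIP embeddings.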

In [None]:
# install libraries

! pip install torch ftfy regex tqdm numpy
! pip install openai-clip
! pip install gradio
! pip install gdown

In [None]:
# import essential libraries

import os
import os.path as op
from PIL import Image
from zipfile import ZipFile

import numpy as np
from tqdm import tqdm
import torch
import gdown

import clip

In [None]:
# check the available runtime and install the matching faiss build

device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda": 
  ! pip install faiss-gpu 
else:
  ! pip install faiss-cpu 

print("Now running with " + device)

In [None]:
# load the ViT-B/32 model

model, preprocess = clip.load("ViT-B/32", device=device)

In [None]:
# download and extract the resized images used for recommendations
url = "https://drive.google.com/drive/folders/1jX1hasS6HysjEuKG0ucmTxdndB03uliJ?usp=sharing"
gdown.download_folder(url, use_cookies=False)

# path to the downloaded dataset zip file
path = op.join(os.getcwd(), "h-and-m-resize-image-zip", "h-and-m-resize-image.zip")

# open the zip file in read mode (named zf to avoid shadowing the built-in zip)
with ZipFile(path, 'r') as zf:

    # extract all the files
    print('Extracting all the files now...')
    %time zf.extractall()
    print('Done!')

In [None]:
# download the encoded image embeddings from the shared Google Drive folder

url = "https://drive.google.com/drive/folders/132_YbF_cSFZMGesD0wLU_3CMoe0LOtI2?usp=sharing"
gdown.download_folder(url, use_cookies=False)
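
The downloaded folder holds an embeddings array and a text file of image paths, which the next cell loads. For search results to point at the right pictures, row `i` of the array has to correspond to entry `i` of the path list. The sketch below is one way to check that, assuming `item_path.txt` stores one image path per line; that layout is an assumption, and the helper names `load_item_paths` and `check_alignment` are hypothetical, not part of the lab.

```python
import numpy as np


def load_item_paths(path_file):
    """Read one image path per line, skipping blank lines (assumed file layout)."""
    with open(path_file, "r") as f:
        return [line.strip() for line in f if line.strip()]


def check_alignment(embeddings, item_paths):
    """Raise ValueError unless there is exactly one embedding row per path."""
    if embeddings.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {embeddings.shape}")
    if embeddings.shape[0] != len(item_paths):
        raise ValueError(
            f"{embeddings.shape[0]} embedding rows but {len(item_paths)} paths"
        )
    return embeddings.shape
```

A possible use, once both files are loaded, is `check_alignment(embeddings_storage, load_item_paths(path_to_item_path_txt))`, where `path_to_item_path_txt` is a placeholder for wherever the file was saved.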


In [None]:
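# load the pre-computed CLIP image embeddings and the file listing the image paths
# (gdown saves the folder under the current working directory, as with the image zip above)
embeddings_storage = np.load(op.join(os.getcwd(), "encoded_embeddings", "h-and-m-CLIP-image-embeddings.npy"))
indices_file = open(op.join(os.getcwd(), "encoded_embeddings", "item_path.txt"), 'r')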