Running on Google Colab.

# Todo
- [x] load CLIP
- [x] load images
- [x] batch run inference on images
- [x] figure out how to load images from zip
- [x] save embeddings
- [x] benchmark (GPU: 2h)
- [x] upload to GCP bucket
- [ ] embed text as well
- [ ] predict a baseline score for Kaggle, using only image + description cosine similarity

# Setup

Don't forget to upload your **kaggle.json** to `/content` for authentication; the download command below points `KAGGLE_CONFIG_DIR` there.

1. get data

In [1]:
!KAGGLE_CONFIG_DIR=/content kaggle competitions download -c h-and-m-personalized-fashion-recommendations

Downloading h-and-m-personalized-fashion-recommendations.zip to /content
100% 28.7G/28.7G [07:47<00:00, 72.9MB/s]
100% 28.7G/28.7G [07:47<00:00, 65.9MB/s]


2. mount zip

In [2]:
!apt-get install -y fuse-zip

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libzip4
The following NEW packages will be installed:
  fuse-zip libzip4
0 upgraded, 2 newly installed, 0 to remove and 39 not upgraded.
Need to get 65.6 kB of archives.
After this operation, 178 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libzip4 amd64 1.1.2-1.1 [37.8 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 fuse-zip amd64 0.4.4-1 [27.9 kB]
Fetched 65.6 kB in 1s (53.8 kB/s)
Selecting previously unselected package libzip4:amd64.
(Reading database ... 156210 files and directories currently installed.)
Preparing to unpack .../libzip4_1.1.2-1.1_amd64.deb ...
Unpacking libzip4:amd64 (1.1.2-1.1) ...
Selecting previously unselected package fuse-zip.
Preparing to unpack .../fuse-zip_0.4.4-1_amd64.deb ...
Unpacking fuse-zip (0.4.4-1) ...
Setting up libzip4:amd64 (

In [3]:
!mkdir /content/archive
!fuse-zip /content/h-and-m-personalized-fashion-recommendations.zip /content/archive

In [4]:
# to unmount
# !fusermount -u /content/archive

3. get clip model

In [5]:
!pip install ftfy regex tqdm
!pip install git+https://github.com/openai/CLIP.git

Collecting ftfy
  Downloading ftfy-6.1.1-py3-none-any.whl (53 kB)
     |████████████████████████████████| 53 kB 1.1 MB/s 
Installing collected packages: ftfy
Successfully installed ftfy-6.1.1
Collecting git+https://github.com/openai/CLIP.git
  Cloning https://github.com/openai/CLIP.git to /tmp/pip-req-build-4w2a7qnz
  Running command git clone -q https://github.com/openai/CLIP.git /tmp/pip-req-build-4w2a7qnz
Building wheels for collected packages: clip
  Building wheel for clip (setup.py) ... done
  Created wheel for clip: filename=clip-1.0-py3-none-any.whl size=1369221 sha256=bf1c16e1b9ae199dde2d65035619136d5f6f31e2d

# Get embeddings

In [6]:
import torch as t
import clip

device = "cuda" if t.cuda.is_available() else "cpu"

In [7]:
# TODO: use TPU
# import torch_xla
# import torch_xla.core.xla_model as xm
# device = xm.xla_device()

In [8]:
# example usage of CLIP, following https://github.com/openai/CLIP
# model, preprocess = clip.load("ViT-B/32", device=device)
# image = preprocess(Image.open("/content/archive/images/010/0108775015.jpg")).unsqueeze(0).to(device)
# text = clip.tokenize(["a dress", "a dog", "a cat"]).to(device)

# with t.no_grad():
#     image_features = model.encode_image(image)
#     text_features = model.encode_text(text)
    
#     logits_per_image, logits_per_text = model(image, text)
#     probs = logits_per_image.softmax(dim=-1).cpu().numpy()

# print("Label probs:", probs)

In [9]:
import pandas as pd
import zipfile
from io import BytesIO
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from os.path import exists


class FashionImagesDataset(Dataset):
    """Articles from articles.csv that have an image in the mounted archive."""

    def __init__(self, transform=lambda image: image):
        self.articles = pd.read_csv('/content/archive/articles.csv')
        # The path prepends a zero to article_id, which pandas reads as an integer.
        # Each image sits in a folder named after the first three digits of the padded id.
        self.articles['img_path'] = self.articles['article_id'].map(
            lambda article_id: "/content/archive/images/0" + str(article_id)[0:2] + "/0" + str(article_id) + ".jpg")
        # Despite its name, valid_idx is a DataFrame: the rows whose image file exists.
        self.valid_idx = self.articles[self.articles['img_path'].map(exists)]
        print('valid and has image:', len(self.valid_idx), 'from:', len(self.articles))
        self.transform = transform

    def __len__(self):
        return len(self.valid_idx)

    def __getitem__(self, idx):
        row = self.valid_idx.iloc[idx]
        image = self.transform(Image.open(row['img_path']))
        # The default collate function turns the integer ids of a batch into a tensor.
        return image, row['article_id']
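
The dataset above reads images through the fuse-zip mount, so the `zipfile` and `BytesIO` imports go unused. As a rough alternative sketch, images could be read straight from the archive without mounting it. The helper names `image_member_name`, `read_member`, and `open_image_from_zip` are hypothetical, and the member layout `images/0xx/0xxxxxxxxx.jpg` is an assumption that mirrors the mounted paths used above.

```python
import zipfile
from io import BytesIO


def image_member_name(article_id):
    """Archive member name for an article id (assumed to mirror the mounted layout)."""
    padded = "0" + str(article_id)  # restore the leading zero dropped by integer parsing
    return "images/" + padded[:3] + "/" + padded + ".jpg"


def read_member(archive, name):
    """Return the raw bytes of one member, or None if it is not in the archive."""
    try:
        return archive.read(name)
    except KeyError:  # ZipFile.read raises KeyError for a missing member
        return None


def open_image_from_zip(archive, article_id):
    """Decode one article image with Pillow; returns None when the image is missing."""
    from PIL import Image  # imported here so the helpers above work without Pillow

    data = read_member(archive, image_member_name(article_id))
    return None if data is None else Image.open(BytesIO(data)).convert("RGB")
```

Usage would look like `with zipfile.ZipFile('/content/h-and-m-personalized-fashion-recommendations.zip') as archive: image = open_image_from_zip(archive, 108775015)`. Sharing one open `ZipFile` across DataLoader worker processes may not be safe, so this sketch fits the single-process loader used in this notebook.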

In [10]:
model_name = 'ViT-B/32'
# also ViT-L/14, etc.
clip.available_models()

['RN50',
 'RN101',
 'RN50x4',
 'RN50x16',
 'RN50x64',
 'ViT-B/32',
 'ViT-B/16',
 'ViT-L/14']

In [11]:
model, preprocess = clip.load(model_name, device=device)

100%|███████████████████████████████████████| 338M/338M [00:06<00:00, 54.4MiB/s]


In [12]:
batch_size = 64

In [13]:
dataset = FashionImagesDataset(transform=preprocess)

valid and has image: 105100 from: 105542


In [14]:
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)

In [15]:
images, labels = next(iter(data_loader))

In [16]:
images.size(), images.chunk(batch_size)[1].squeeze().size()

(torch.Size([64, 3, 224, 224]), torch.Size([3, 224, 224]))

In [None]:
from tqdm import tqdm

image_features = {}
with t.no_grad():
    for images, labels in tqdm(data_loader):
        features = model.encode_image(images.to(device))
        # store one CPU tensor per article id
        for label, feature in zip(labels, features):
            image_features[label.item()] = feature.to('cpu')

In [35]:
image_features[111565003].size()

torch.Size([512])

# Save

In [19]:
file_name = '/content/fashion-recommendation-image-embeddings-clip-' + model_name.replace('/', '-') + '.pt'
t.save(image_features, file_name)
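
To reuse the saved file in a later session, the dictionary can be loaded back and stacked into a single matrix. This is a minimal sketch: `embedding_path` and `load_embedding_matrix` are hypothetical helper names, and it assumes the file holds the `{article_id: tensor}` dictionary written above.

```python
def embedding_path(model_name, directory="/content"):
    """File name convention used for the saved embeddings."""
    return (directory + "/fashion-recommendation-image-embeddings-clip-"
            + model_name.replace("/", "-") + ".pt")


def load_embedding_matrix(path):
    """Return (article_ids, matrix) with one row per article, in sorted id order."""
    import torch  # imported lazily so embedding_path can be used without PyTorch

    features = torch.load(path, map_location="cpu")
    article_ids = sorted(features)
    # .float() because the features may be half precision when encoded on a GPU
    matrix = torch.stack([features[i] for i in article_ids]).float()
    return article_ids, matrix
```

For example, `article_ids, matrix = load_embedding_matrix(embedding_path('ViT-B/32'))` would give a matrix whose row `i` belongs to `article_ids[i]`.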

In [20]:
len(image_features.keys())

105088
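
The dictionary holds 105088 entries, while the dataset reported 105100 articles with an image. One way to look into the gap is a set difference over the ids, plus a check for ids that occur more than once, since repeated ids would overwrite each other in the dictionary. A small sketch with a hypothetical helper name; it only reports the ids and does not explain the cause.

```python
from collections import Counter


def compare_ids(expected_ids, embedded_ids):
    """Report ids without an embedding and ids listed more than once."""
    expected = list(expected_ids)
    embedded = set(embedded_ids)
    missing = sorted(set(expected) - embedded)
    duplicated = sorted(i for i, n in Counter(expected).items() if n > 1)
    return missing, duplicated
```

In this notebook the call would be along the lines of `missing, duplicated = compare_ids(dataset.valid_idx['article_id'], image_features.keys())`.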

In [21]:
!ls -lah $file_name

-rw-r--r-- 1 root root 130M Mar 28 10:52 /content/fashion-recommendation-image-embeddings-clip-ViT-B-32.pt


In [23]:
from google.colab import auth
auth.authenticate_user()

In [24]:
!gsutil cp $file_name gs://heii-public/

Copying file:///content/fashion-recommendation-image-embeddings-clip-ViT-B-32.pt [Content-Type=application/octet-stream]...
/
Operation completed over 1 objects/130.0 MiB.                                    


In [29]:
"https://storage.googleapis.com/heii-public/" + file_name.replace('/content/', '')

'https://storage.googleapis.com/heii-public/fashion-recommendation-image-embeddings-clip-ViT-B-32.pt'

# Get text embeddings

Predict images from their text descriptions and check the accuracy.
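
A minimal sketch of the text side, under several assumptions: `prod_name` and `detail_desc` are taken to be the relevant columns of `articles.csv` (they are not referenced elsewhere in this notebook), the helper names are hypothetical, and `truncate=True` is passed to `clip.tokenize` so that descriptions longer than CLIP's 77-token context are cut off instead of raising an error.

```python
def build_prompt(prod_name, detail_desc):
    """Join the non-empty text fields of one article into a single prompt."""
    # isinstance filters out NaN, which pandas uses for missing text
    parts = [p.strip() for p in (prod_name, detail_desc) if isinstance(p, str) and p.strip()]
    return ". ".join(parts)


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def encode_texts(model, prompts, device, batch_size=64):
    """Encode prompts with CLIP's text encoder; returns one CPU row per prompt."""
    import clip   # imported lazily: needs the CLIP package installed above
    import torch

    rows = []
    with torch.no_grad():
        for batch in batched(prompts, batch_size):
            tokens = clip.tokenize(batch, truncate=True).to(device)
            rows.append(model.encode_text(tokens).cpu())
    return torch.cat(rows)
```

The prompts could be built with `[build_prompt(n, d) for n, d in zip(articles['prod_name'], articles['detail_desc'])]` and then passed to `encode_texts(model, prompts, device)`.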

# Predict clothes
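
The baseline in the todo list ranks by cosine similarity alone. The computation is sketched here in plain Python for readability; `cosine_similarity` and `rank_by_similarity` are hypothetical helper names, and the default `k=12` is an assumption about how many predictions are wanted. For the full set of embeddings the same ranking would normally be done on tensors (normalize the rows, then take one matrix product).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query, candidates, k=12):
    """Return the ids of the k candidates most similar to `query`, best first.

    `candidates` maps an article id to its embedding; k=12 is an assumed default.
    """
    scored = sorted(candidates.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [article_id for article_id, _ in scored[:k]]
```

Here `query` would be one text embedding and `candidates` the image embeddings (or the reverse), both converted to lists of floats.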