### Training a Leboncoin vs. Vinted screenshot classifier

We want to classify the source of a screenshot so that we can later extract its data in a way suited to that source.<br>
There are two sources: Leboncoin and Vinted.<br>
To do this, we use a frozen CLIP image encoder to extract features and train a simple linear classifier on top of them.

In [7]:
import os
import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from PIL import Image
import clip

In [8]:
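device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)
# Freeze CLIP: only the linear head below is trained.
for p in clip_model.parameters():
    p.requires_grad = False
# ViT-B/32 image embeddings are 512-dimensional; 2 outputs for leboncoin / vinted.
classifier = nn.Linear(512, 2).to(device)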
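
To make the shapes concrete, here is a minimal sketch of what the linear head does with a batch of features. It uses random tensors as a stand-in for CLIP embeddings (an assumption made only so the snippet runs without the model), and the names `fake_feats` and `head` are hypothetical. The training loop further down L2-normalises the real features in the same way.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Random stand-in for a batch of 4 CLIP image embeddings (not real features).
fake_feats = torch.randn(4, 512)
# L2-normalise each row so every feature vector has unit length.
fake_feats = fake_feats / fake_feats.norm(dim=-1, keepdim=True)

head = nn.Linear(512, 2)  # same shape as `classifier`
logits = head(fake_feats)
probs = logits.softmax(dim=-1)  # one probability per platform

print(logits.shape)
print(probs.sum(dim=-1))  # each row sums to 1
```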

We then build the dataset class. It expects one sub-folder per platform (`leboncoin` and `vinted`) under the root directory.

In [9]:
class ScreenshotDataset(Dataset):
    def __init__(self, root, transform):
        self.samples = []
        self.transform = transform
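        # Label index follows the folder order: 0 = leboncoin, 1 = vinted.
        for label, platform in enumerate(["leboncoin", "vinted"]):
            folder = os.path.join(root, platform)
            # Sort for a reproducible sample order across runs and machines.
            for f in sorted(os.listdir(folder)):
                self.samples.append((os.path.join(folder, f), label))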

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        img = Image.open(path).convert("RGB")
        return self.transform(img), label

In [10]:
train_dataset = ScreenshotDataset("../data/processed/train", preprocess)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
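
Before training, it can be worth checking how many screenshots each platform contributes, since a strong imbalance would bias a two-class classifier. A minimal sketch, assuming a list of `(path, label)` pairs shaped like `train_dataset.samples`; the helper name `count_per_class` and the example paths are hypothetical:

```python
from collections import Counter

CLASS_NAMES = ["leboncoin", "vinted"]  # same order as in ScreenshotDataset


def count_per_class(samples, class_names=CLASS_NAMES):
    """Return {class_name: number_of_samples} for (path, label) pairs."""
    counts = Counter(label for _, label in samples)
    return {name: counts.get(i, 0) for i, name in enumerate(class_names)}


# Hypothetical samples standing in for train_dataset.samples.
example_samples = [
    ("leboncoin/a.png", 0),
    ("leboncoin/b.png", 0),
    ("vinted/c.png", 1),
]
print(count_per_class(example_samples))
```

In the notebook this would be called as `count_per_class(train_dataset.samples)`.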

We can now train the classifier. CLIP stays frozen, so only the linear head is updated.

In [11]:
optimizer = optim.AdamW(classifier.parameters(), lr=1e-3)
epoch_number = 20
for epoch in range(epoch_number):
    classifier.train()
    total_loss = 0

    for imgs, labels in train_loader:
        imgs, labels = imgs.to(device), labels.to(device)

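        with torch.no_grad():
            # clip.load returns half-precision weights on CUDA, so cast the
            # features to float32 to match the classifier's dtype.
            feats = clip_model.encode_image(imgs).float()
            feats = feats / feats.norm(dim=-1, keepdim=True)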

        logits = classifier(feats)
        loss = F.cross_entropy(logits, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    print(f"Epoch {epoch} - loss: {total_loss:.3f}")

Epoch 0 - loss: 0.690
Epoch 1 - loss: 0.685
Epoch 2 - loss: 0.680
Epoch 3 - loss: 0.675
Epoch 4 - loss: 0.670
Epoch 5 - loss: 0.665
Epoch 6 - loss: 0.659
Epoch 7 - loss: 0.654
Epoch 8 - loss: 0.650
Epoch 9 - loss: 0.645
Epoch 10 - loss: 0.640
Epoch 11 - loss: 0.635
Epoch 12 - loss: 0.630
Epoch 13 - loss: 0.625
Epoch 14 - loss: 0.620
Epoch 15 - loss: 0.616
Epoch 16 - loss: 0.611
Epoch 17 - loss: 0.606
Epoch 18 - loss: 0.602
Epoch 19 - loss: 0.597
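
The loss alone does not say how often the classifier picks the right platform. Below is a minimal sketch of an accuracy computation from logits, assuming a held-out set would be encoded the same way as in the training loop. `accuracy_from_logits` is a hypothetical helper, and the tensors are made-up values, not results from this notebook.

```python
import torch


def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    preds = logits.argmax(dim=-1)
    return (preds == labels).float().mean().item()


# Made-up logits for 4 screenshots (columns: leboncoin, vinted).
logits = torch.tensor([[2.0, 0.1], [0.3, 1.5], [1.2, 0.4], [0.2, 0.9]])
labels = torch.tensor([0, 1, 1, 1])
print(accuracy_from_logits(logits, labels))  # 3 of the 4 rows are correct
```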


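Because CLIP is frozen, its weights never change and can be reloaded from the model name, so the checkpoint only stores the classifier weights along with that name and the class labels.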

In [12]:
checkpoint = {
    "clip_model": "ViT-B/32",
    "classifier_state": classifier.state_dict(),
    "num_classes": 2,
    "class_names": ["leboncoin", "vinted"]
}
torch.save(checkpoint, "clip_screenshot_classifier.pt")
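
A checkpoint in this format can be turned back into a usable head roughly as follows. This is a sketch, not part of the original notebook: `load_classifier` is a hypothetical helper, the feature size of 512 is assumed to match ViT-B/32, and the demo round trip uses a freshly initialised head and a temporary file rather than the trained weights.

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_classifier(path, feature_dim=512, device="cpu"):
    """Rebuild the linear head from a checkpoint saved as in the cell above."""
    checkpoint = torch.load(path, map_location=device)
    head = nn.Linear(feature_dim, checkpoint["num_classes"]).to(device)
    head.load_state_dict(checkpoint["classifier_state"])
    head.eval()
    return head, checkpoint["class_names"], checkpoint["clip_model"]


# Round trip with an untrained head, only to show that the format works.
demo_head = nn.Linear(512, 2)
demo_path = os.path.join(tempfile.mkdtemp(), "demo_classifier.pt")
torch.save(
    {
        "clip_model": "ViT-B/32",
        "classifier_state": demo_head.state_dict(),
        "num_classes": 2,
        "class_names": ["leboncoin", "vinted"],
    },
    demo_path,
)
head, class_names, clip_name = load_classifier(demo_path)
print(class_names, clip_name)
```

At inference time, the stored `clip_model` name would be passed to `clip.load` again, and the image features would be L2-normalised as during training before being fed to `head`.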