<center> <h2> Re-identification with Market 1501 </h2></center>

The goal is to identify the same person across several viewpoints.

We first apply a pretrained model (VGG16) to find the most similar images, then use it as the backbone of a re-identification model.

### Imports

In [1]:
import os
import sys
import pickle
import time
import shutil
import torch
import cv2

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import pandas as pd

from tensorflow import keras
from scipy import spatial
from tqdm import tqdm
from argparse import Namespace

import reid_data
from model import ReIdModel
from utils import LogCollector

import logging
import tensorboard_logger as tb_logger

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


The Market dataset contains 5 folders:

* bounding_box_{train, test}: bounding boxes obtained with a Deformable Part Model.
* gt_bbox: hand-annotated bounding boxes
* gt_query / query: queries used to retrieve samples

We focus on the first two folders, which contain 12936 and 19732 images respectively, each of size (128, 64, 3).

In [None]:
import zipfile
with zipfile.ZipFile("drive/MyDrive/Market-1501-v15.09.15.zip", 'r') as zip_ref:
    zip_ref.extractall("Market_data")

### Prétraitement du dataset

The dataset (the test folder in particular) contains several false detections (a body part, the ground, etc.) whose identifier is 0000 or -1. We first remove these images before feeding the folder to the model for validation:

In [2]:
for file in os.listdir("../Market_data/Market-1501-v15.09.15/bounding_box_test/"):
    if file.startswith("0000") or file.startswith("-1"):
        os.remove(f"../Market_data/Market-1501-v15.09.15/bounding_box_test/{file}")

In [3]:
os.remove("../Market_data/Market-1501-v15.09.15/bounding_box_test/Thumbs.db")

In [4]:
os.remove("../Market_data/Market-1501-v15.09.15/bounding_box_train/Thumbs.db")
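The prefix test used in the cleanup above can also be written as a non-destructive filter that parses the identifier out of each file name. This is a minimal sketch, assuming the usual Market-1501 naming scheme `<person id>_c<camera>s<sequence>_<frame>_<detection>.jpg`; the helper names `parse_market_name` and `keep_for_validation` and the sample file names are hypothetical and not part of the project code.

```python
import re

# Assumed naming scheme: "<person id>_c<camera>s<sequence>_<frame>_<detection>.jpg"
_NAME_RE = re.compile(r"^(-?\d+)_c(\d+)s(\d+)_(\d+)_(\d+)\.jpg$")


def parse_market_name(filename):
    """Return (person_id, camera_id), or None if the name does not match."""
    match = _NAME_RE.match(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def keep_for_validation(filename):
    """True for images whose identifier is neither 0000 nor -1."""
    parsed = parse_market_name(filename)
    return parsed is not None and parsed[0] > 0


# Hypothetical file names, for illustration only
names = [
    "0002_c1s1_000451_03.jpg",
    "0000_c3s2_012345_01.jpg",
    "-1_c1s1_000401_03.jpg",
    "Thumbs.db",
]
kept = [n for n in names if keep_for_validation(n)]
print(kept)
```

Filtering a file list this way leaves the extracted archive untouched, which can be convenient if the rejected images are needed again later. It also skips stray files such as Thumbs.db without a separate removal step.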


### Data Loader
In this project we use the PyTorch library and its data loaders to encode the dataset images in batches and to load them cleanly and efficiently (the library is designed so that these operations cost little memory and time).

The file reid_data.py contains our ReIdDataset class, which defines the structure of the dataset; the collate_fn function, which the data loaders use to assemble a mini-batch, here a tuple (image, index, identifier); and the get_loader function, which creates a loader for our dataset.
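Since reid_data.py is not shown here, the following is only a framework-free sketch of what a collate function with that (image, index, identifier) layout might do. It uses NumPy arrays in place of tensors, and the dummy samples are made up for illustration. With PyTorch, a function of this shape would typically be passed through the `collate_fn` argument of `torch.utils.data.DataLoader` and would stack with `torch.stack` instead.

```python
import numpy as np


def collate_fn(batch):
    """Merge a list of (image, index, person_id) samples into one mini-batch."""
    images, indexes, ids = zip(*batch)
    # Stack the images along a new leading batch axis: (B, H, W, C)
    images = np.stack(images, axis=0)
    return images, list(indexes), list(ids)


# Three dummy samples shaped like Market-1501 crops (128, 64, 3)
samples = [
    (np.zeros((128, 64, 3), dtype=np.float32), i, pid)
    for i, pid in enumerate([2, 2, 7])
]
images, indexes, ids = collate_fn(samples)
print(images.shape, indexes, ids)
```

Keeping the dataset index in every sample is what lets the encoding step write each embedding back to its own row, whatever order the loader returns the batches in.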

In [2]:
def encode_data(model, data_loader):
    """Encode every image of data_loader and print the mean validation loss."""
    # switch to evaluate mode
    model.eval()

    # numpy array to keep all the embeddings
    img_embs = None
    loss_val = 0.0
    n_batches = 0
    # gradients are not needed for evaluation; skipping them saves memory
    with torch.no_grad():
        for images, indexes, ids in data_loader:
            # compute the embeddings
            img_emb = model.forward(images)

            # initialize the numpy array given the size of the embeddings
            if img_embs is None:
                img_embs = np.zeros((len(data_loader.dataset), img_emb.size(1)))
            # preserve the embeddings by copying from gpu and converting to numpy
            img_embs[indexes] = img_emb.cpu().numpy()

            # accumulate the loss as a Python float
            loss_val += float(model.forward_loss(img_emb, ids))
            n_batches += 1

    # guard against an empty loader when averaging
    print("loss test: ", loss_val / max(n_batches, 1))
    return img_embs

In [3]:
def get_logger(log_level, log_name='YourNameGoesHere'):
    logger = logging.getLogger(log_name)
    # attach a handler only once, even if the cell is run again
    if not logger.handlers:
        logger.setLevel(log_level)
        ch = logging.StreamHandler()
        ch.setLevel(log_level)
        ch.setFormatter(logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
        ))
        logger.addHandler(ch)
        # do not forward records to the root logger (avoids duplicate lines)
        logger.propagate = False
    return logger

## Model: VGG19 finetuned

### Encoding with VGG16
After loading the dataset, we apply a zero-shot VGG16 to these images; DarkNet53 (from YOLOv3) could also be used.
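Once the images are encoded, finding the most similar ones comes down to ranking gallery embeddings against a query embedding. The sketch below assumes cosine similarity as the metric, which the notebook does not specify; the function name `rank_gallery` and the toy 2-D vectors are hypothetical.

```python
import numpy as np


def rank_gallery(query_emb, gallery_embs):
    """Return gallery indices sorted from most to least similar to the query."""
    # L2-normalize so that the dot product equals the cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    similarities = g @ q
    # argsort is ascending, so negate to put the most similar first
    return np.argsort(-similarities), similarities


# Toy embeddings: rows 0 and 2 point roughly the same way as the query
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
query = np.array([1.0, 0.0])
order, sims = rank_gallery(query, gallery)
print(order)
```

With real embeddings, `gallery_embs` would be the array returned by `encode_data` and `query_emb` one of its rows or the embedding of a query image.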

In [4]:
%load_ext autoreload
%autoreload 2

In [5]:
optdict = {"batch_size":128, "cnn_type":'vgg19', "data_path":'../Market_data/Market-1501-v15.09.15/bounding_box_train', "embed_size":1024, "learning_rate":0.0002, "margin":0.2, "num_epochs":30}
options = Namespace(**optdict)
logging.basicConfig(format='%(asctime)s %(message)s', level=logging.INFO)
tb_logger.configure(options.cnn_type+"_reidmodel", flush_secs=5)   


In [6]:
logger = get_logger(0, options.cnn_type+"_reidmodel")
model_cnn = ReIdModel(options)
model_cnn.logger = LogCollector()
train_loader = reid_data.get_loader(options.data_path, batch_size=options.batch_size)
val_loader = reid_data.get_loader('../Market_data/Market-1501-v15.09.15/bounding_box_test', batch_size=options.batch_size)
for epoch in range(options.num_epochs):
    # encode_data() switches to eval mode, so re-enable train mode each epoch
    model_cnn.train()
    for i, train_data in enumerate(tqdm(train_loader)):
        # one update of the model on the current mini-batch
        model_cnn.train_emb(*train_data)
    # log progress to TensorBoard once per epoch
    tb_logger.log_value('epoch', epoch, step=model_cnn.Eiters)
    tb_logger.log_value('step', i, step=model_cnn.Eiters)
    model_cnn.logger.tb_log(tb_logger, step=model_cnn.Eiters)
    print("training loss: ", model_cnn.loss_t)
    # encode the test split and print its loss
    encode_data(model_cnn, val_loader)
model_cnn.save(f"model_{options.cnn_type}.PTH")

  positive = np.array([np.where(np.array(ids)==id) for id in np.array(ids)]) # shape (batch_size, vary)
  0%|          | 0/102 [00:35<?, ?it/s]


TypeError: max() received an invalid combination of arguments - got (axis=int, out=NoneType, ), but expected one of:
 * ()
 * (Tensor other)
 * (int dim, bool keepdim)
      didn't match because some of the keywords were incorrect: axis, out
 * (name dim, bool keepdim)
      didn't match because some of the keywords were incorrect: axis, out
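The first line of the output above comes from code that collects, for each sample, the indices of the samples sharing its identifier. Those index lists have varying lengths, as its own `# shape (batch_size, vary)` comment notes. The TypeError is the kind of message PyTorch raises when a tensor's `max` is called with NumPy-style `axis`/`out` keywords rather than `dim`. Since model.py is not shown here, the following is only a sketch of one way to avoid ragged arrays: a rectangular boolean mask, assuming the identifiers are available as a NumPy integer array. The `ids` values are made up.

```python
import numpy as np

ids = np.array([2, 2, 7, 5, 7])  # hypothetical person ids of one mini-batch

# same_id[i, j] is True when samples i and j show the same person
same_id = ids[:, None] == ids[None, :]
# a sample is not its own positive
positive_mask = same_id & ~np.eye(len(ids), dtype=bool)
negative_mask = ~same_id

# Rectangular masks work directly with axis-wise reductions
n_positives = positive_mask.sum(axis=1)
print(n_positives)
```

If the distances live in a torch tensor, the same masks can be converted with `torch.from_numpy`, and reductions over them should then use `dim=` instead of `axis=`.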


### Updating the code on git:

In [None]:
!git add -f checkpoints/*

In [None]:
!git commit -m "model unet trained"

[master 2283f77] model unet trained
 6 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 checkpoints/MODEL.pth
 create mode 100644 checkpoints/checkpoint_epoch1.pth
 create mode 100644 checkpoints/checkpoint_epoch2.pth
 create mode 100644 checkpoints/checkpoint_epoch3.pth
 create mode 100644 checkpoints/checkpoint_epoch4.pth
 create mode 100644 checkpoints/checkpoint_epoch5.pth


In [None]:
!git config --global user.email "knzmakhlouf@gmail.com"

In [None]:
!git config --global user.name "Mkenza"

In [None]:
!git pull origin master

remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://github.com/Mkenza/Unet
 * branch            master     -> FETCH_HEAD
   a462c77..1979bc9  master     -> origin/master
Updating a462c77..1979bc9
Fast-forward
 .gitignore | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


In [None]:
!git remote remove origin 

In [None]:
!git remote add origin https://Mkenza:<GITHUB_TOKEN>@github.com/Mkenza/Unet.git

In [None]:
%cd Unet

/content/Unet


In [None]:
%cd ..