# Triplet Loss Based Re-Id
### Ref: "In Defense of the Triplet Loss for Person Re-Identification": https://arxiv.org/abs/1703.07737

Triplet loss frames re-identification as an optimization problem whose loss function is defined over three images: the anchor, <i>a</i>, the positive, <i>p</i>, and the negative, <i>n</i>. Here <i>a</i> is the query image, <i>p</i> belongs to the same class as <i>a</i>, and <i>n</i> belongs to a different class from <i>a</i>.

<br>

Below we build and train a triplet-loss model and evaluate it on our own dataset. We train on the Market1501 dataset (due to availability issues with other datasets). The model's results are then visualized within the subdirectory /visrank_cvision_results/visrank_triplet_dataset/.

<br>

This is tested via the torchreid API provided by Kaiyang Zhou, who has authored numerous papers in this domain. Link: https://github.com/KaiyangZhou/deep-person-reid


In [1]:
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


In [2]:
cd drive/MyDrive/person-reid/deep-person-reid

/content/drive/MyDrive/person-reid/deep-person-reid


In [3]:
import torchreid
from comp_vis_data_f import CvDataSet # our dataloader

Register our dataset with the Torchreid API and load our data with the ImageDataManager. In this case, we expect the input to have height 256 and width 128, as images are resized to fit the prebuilt resnet50 model. Additionally, `combineall=True` merges the Market1501 query and gallery data into the training data for more samples (the model is evaluated on our dataset, not Market1501). The training sampler is set to RandomIdentitySampler so that each batch contains several images per identity (`num_instances=3`), from which image triplets can be formed.
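
To make the sampling idea concrete, here is a minimal pure-Python sketch of identity-balanced batching. It is a simplified illustration, not torchreid's actual `RandomIdentitySampler`; the function name `sample_identity_batch` and the toy labels are hypothetical. It assumes the number of identities per batch is `batch_size // num_instances`.

```python
import random
from collections import defaultdict

def sample_identity_batch(labels, batch_size=64, num_instances=3, seed=0):
    """Pick batch_size // num_instances identities, then num_instances images of each.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Group image indices by person ID.
    index_by_pid = defaultdict(list)
    for idx, pid in enumerate(labels):
        index_by_pid[pid].append(idx)

    num_pids = batch_size // num_instances  # 64 // 3 = 21 identities
    # Only identities with enough images can supply num_instances samples.
    eligible = [pid for pid, idxs in index_by_pid.items() if len(idxs) >= num_instances]
    chosen = rng.sample(eligible, num_pids)

    batch = []
    for pid in chosen:
        batch.extend(rng.sample(index_by_pid[pid], num_instances))
    return batch

# Toy data: 50 identities with 4 images each.
labels = [i // 4 for i in range(200)]
batch = sample_identity_batch(labels)
print(len(batch))  # 63 images: 21 identities x 3 instances
```

Because every identity in the batch appears at least twice, each image has a positive partner in the same batch, and images of the other identities can serve as negatives.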

In [4]:
torchreid.data.register_image_dataset('cv_data', CvDataSet)

In [7]:
datamanager = torchreid.data.ImageDataManager(
    root='reid-data-triplet',
    sources='market1501',
    targets='cv_data',
    height=256,    
    width=128,
    combineall=True,
    batch_size_train=64,
    batch_size_test=64,
    num_instances=3,
    train_sampler='RandomIdentitySampler', # Image Triplet
    transforms=['random_flip', 'random_crop', 'color_jitter']
)

Building train transforms ...
+ resize to 256x128
+ random flip
+ random crop (enlarge to 288x144 and crop 256x128)
+ color jitter
+ to torch tensor of range [0, 1]
+ normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
Building test transforms ...
+ resize to 256x128
+ to torch tensor of range [0, 1]
+ normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
=> Loading train (source) dataset
Creating directory "/content/drive/My Drive/person-reid/deep-person-reid/reid-data-triplet/market1501"
Downloading Market1501 dataset to "/content/drive/My Drive/person-reid/deep-person-reid/reid-data-triplet/market1501"
* url="http://188.138.127.15:81/Datasets/Market-1501-v15.09.15.zip"
* destination="/content/drive/My Drive/person-reid/deep-person-reid/reid-data-triplet/market1501/Market-1501-v15.09.15.zip"
...100%, 145 MB, 7638 KB/s, 19 seconds passed
Extracting "/content/drive/My Drive/person-reid/deep-person-reid/reid-data-triplet/market1501/Market-1501-v15.09.15

Below we build our model. As mentioned above, we use transfer learning, starting from the pretrained resnet50 model. The engine is configured with a triplet loss (`weight_t=0.7`) alongside a softmax cross-entropy loss (`weight_x=1`); the triplet term is

$L(a, p, n) = \max(0, D(a, p) - D(a, n) + \alpha)$,

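where a is the anchor, p is the positive, n is the negative case, D is the Euclidean distance between two embeddings, and alpha ($\alpha$) is the margin, set to 0.3 in the code below. Taking the maximum of 0 and the margin-adjusted gap between the anchor-positive and anchor-negative distances means that only triplets violating the margin (i.e., those with a positive loss value) contribute, so the model learns from its incorrect cases.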
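
The formula can be checked on toy vectors. The following is a minimal sketch in plain Python; the helper names `euclidean` and `triplet_loss` and the 2-D "embeddings" are made up for illustration, while the real model works on 2048-dimensional features.

```python
import math

def euclidean(u, v):
    """Euclidean distance D between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """L(a, p, n) = max(0, D(a, p) - D(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Easy triplet: the positive is close and the negative is far, so the loss is zero.
print(triplet_loss([0, 0], [0, 1], [3, 4]))  # 0.0

# Violating triplet: the negative is closer than the positive, so the loss is positive.
print(triplet_loss([0, 0], [3, 4], [0, 1]))
```

Only the second triplet would produce a gradient, which is why the choice of triplets matters during training.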

In [8]:
model = torchreid.models.build_model(
    name='resnet50',
    num_classes=datamanager.num_train_pids,
    loss='triplet'
)
model = model.cuda()
optimizer = torchreid.optim.build_optimizer(
    model, optim='adam', lr=0.0003
)
scheduler = torchreid.optim.build_lr_scheduler(
    optimizer,
    lr_scheduler='single_step',
    stepsize=20
)
engine = torchreid.engine.ImageTripletEngine(
    datamanager, model, optimizer, margin=0.3,
    weight_t=0.7, weight_x=1, scheduler=scheduler
)

Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /root/.cache/torch/hub/checkpoints/resnet50-19c8e357.pth
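
For intuition about the `single_step` scheduler with `stepsize=20`, the sketch below computes a step-decay learning rate over the 60 training epochs. It assumes a decay factor `gamma=0.1`, which is not set in the cell above, so treat that value as an assumption; `step_lr` is a hypothetical helper, not a torchreid function.

```python
def step_lr(base_lr, epoch, stepsize=20, gamma=0.1):
    """Learning rate after multiplying by gamma once every `stepsize` epochs.

    gamma=0.1 is an assumed decay factor, not taken from the notebook.
    """
    return base_lr * gamma ** (epoch // stepsize)

# Learning rate at the start of each 20-epoch block (0-indexed epochs).
for epoch in (0, 20, 40):
    print(epoch, step_lr(0.0003, epoch))
```

Under that assumption the rate drops by a factor of ten at epochs 20 and 40, so the last third of training makes only small updates.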






In [9]:
engine.run(
    max_epoch=60,
    save_dir='log/resnet50-triplet-market1501xcv_data_fixed',
    eval_freq=10,
    print_freq=10
)

=> Start training


	addmm_(Number beta, Number alpha, Tensor mat1, Tensor mat2)
Consider using one of the following signatures instead:
	addmm_(Tensor mat1, Tensor mat2, *, Number beta, Number alpha) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
  dist.addmm_(1, -2, inputs, inputs.t())


epoch: [1/60][10/435]	time 0.519 (1.051)	data 0.000 (0.513)	eta 7:36:56	loss_t 2.2532 (2.9507)	loss_x 7.5569 (7.4681)	acc 0.0000 (0.3125)	lr 0.000300
epoch: [1/60][20/435]	time 0.484 (0.771)	data 0.000 (0.257)	eta 5:35:00	loss_t 1.1513 (2.3671)	loss_x 8.7016 (7.9279)	acc 0.0000 (0.1562)	lr 0.000300
epoch: [1/60][30/435]	time 0.488 (0.677)	data 0.000 (0.171)	eta 4:54:21	loss_t 1.8069 (2.1304)	loss_x 8.3886 (8.1472)	acc 0.0000 (0.2083)	lr 0.000300
epoch: [1/60][40/435]	time 0.483 (0.629)	data 0.000 (0.128)	eta 4:33:19	loss_t 0.7063 (1.8250)	loss_x 7.6927 (8.1374)	acc 0.0000 (0.1562)	lr 0.000300
epoch: [1/60][50/435]	time 0.481 (0.601)	data 0.000 (0.103)	eta 4:20:44	loss_t 0.6482 (1.5853)	loss_x 7.5404 (8.0306)	acc 0.0000 (0.1562)	lr 0.000300
epoch: [1/60][60/435]	time 0.519 (0.585)	data 0.000 (0.086)	eta 4:13:51	loss_t 0.5127 (1.4061)	loss_x 7.6168 (7.9452)	acc 0.0000 (0.1302)	lr 0.000300
epoch: [1/60][70/435]	time 0.486 (0.573)	data 0.000 (0.073)	eta 4:08:46	loss_t 0.4218 (1.2647)	loss_
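
The log reports the triplet term (`loss_t`) and the cross-entropy term (`loss_x`) separately. As a worked example, and assuming the engine optimizes the weighted sum `weight_t * loss_t + weight_x * loss_x` (the name `combined_loss` is hypothetical), the first logged iteration corresponds to:

```python
def combined_loss(loss_t, loss_x, weight_t=0.7, weight_x=1.0):
    """Weighted sum of the triplet and cross-entropy terms (assumed form)."""
    return weight_t * loss_t + weight_x * loss_x

# Current values from the first log line: loss_t 2.2532, loss_x 7.5569.
total = combined_loss(2.2532, 7.5569)
print(round(total, 4))  # roughly 9.1341
```

Early in training the cross-entropy term dominates this sum, while the triplet term falls quickly within the first epoch.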

As we can see, the mAP is 45.8%, which isn't a huge improvement over using softmax + cross-entropy loss. Rank-1 and Rank-5 were also significantly worse. This may be due to the model learning colors. Another probable reason is that models built on hard triplet mining usually perform better than these models. The model learned the Market1501 data well but performed poorly on the evaluation data.
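
For reference, the hard triplet mining mentioned above can be sketched as the "batch hard" variant described in the paper cited at the top: for each anchor in a batch, take the farthest positive and the closest negative. This is a plain-Python illustration on toy 2-D vectors, with hypothetical helper names, not the code used for training.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Mean over anchors of max(0, hardest_positive - hardest_negative + margin)."""
    losses = []
    for i, (anchor, pid) in enumerate(zip(embeddings, labels)):
        # Distances to other samples of the same identity (excluding the anchor itself).
        pos = [euclidean(anchor, e)
               for j, (e, l) in enumerate(zip(embeddings, labels))
               if l == pid and j != i]
        # Distances to samples of every other identity.
        neg = [euclidean(anchor, e)
               for e, l in zip(embeddings, labels) if l != pid]
        if not pos or not neg:
            continue  # this anchor cannot form a triplet in the batch
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0

# Well-separated identities: no triplet violates the margin.
print(batch_hard_triplet_loss([[0, 0], [1, 0], [5, 0], [6, 0]], [0, 0, 1, 1]))  # 0.0

# Interleaved identities: every anchor has a violating triplet.
print(batch_hard_triplet_loss([[0, 0], [4, 0], [1, 0], [5, 0]], [0, 0, 1, 1]))
```

Mining needs at least two images per identity in each batch, which is what an identity-based sampler provides.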

Visualizations are produced in FeatureExtractor.py and saved in the visrank_cvision_data/ directory.

In [10]:
import os
save_dir = os.getcwd() + '/visrank_cvision_results/visrank_triplet_dataset_2_fixed'

In [12]:
save_dir

'/content/drive/My Drive/person-reid/deep-person-reid/visrank_cvision_results/visrank_triplet_dataset_2_fixed'

In [13]:
engine.run(save_dir='./', 
           max_epoch=10, 
           test_only=True,
           visrank=True,
           visrank_topk=10,
           dist_metric="euclidean"
           )

##### Evaluating cv_data (target) #####
Extracting features from query set ...
Done, obtained 79-by-2048 matrix
Extracting features from gallery set ...
Done, obtained 235-by-2048 matrix
Speed: 0.0108 sec/batch
Computing distance matrix with metric=euclidean ...
Computing CMC and mAP ...
** Results **
mAP: 95.4%
CMC curve
Rank-1  : 95.6%
Rank-5  : 98.5%
Rank-10 : 98.5%
Rank-20 : 98.5%
# query: 79
# gallery 235
Visualizing top-10 ranks ...
Done. Images have been saved to "./visrank_cv_data" ...
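
To clarify what the Rank-k and mAP figures measure, here is a simplified sketch that computes them from a query-by-gallery distance matrix. It ignores the camera-based filtering that standard re-ID evaluation protocols may apply, and the function name `rank_k_and_map` and the toy distances are hypothetical.

```python
def rank_k_and_map(dist, query_ids, gallery_ids, ks=(1, 5)):
    """Return ({k: Rank-k accuracy}, mAP) for a distance matrix dist[q][g]."""
    hits = {k: 0 for k in ks}
    aps = []
    for q, row in enumerate(dist):
        # Sort gallery items from closest to farthest.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue  # query identity is absent from the gallery
        first_hit = matches.index(True)
        for k in ks:
            if first_hit < k:
                hits[k] += 1
        # Average precision: mean of the precision at each correct match.
        num_hits, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                num_hits += 1
                precisions.append(num_hits / rank)
        aps.append(sum(precisions) / len(precisions))
    valid = len(aps)
    if valid == 0:
        return {k: 0.0 for k in ks}, 0.0
    return {k: hits[k] / valid for k in ks}, sum(aps) / valid

# Two queries against three gallery images.
dist = [[0.1, 0.5, 0.9],
        [0.2, 0.7, 0.3]]
cmc, mean_ap = rank_k_and_map(dist, query_ids=[0, 1], gallery_ids=[0, 1, 0])
print(cmc, mean_ap)
```

Rank-k only asks whether a correct match appears in the top k results, while mAP also rewards placing all correct matches early in the ranking.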
