# Triplet Loss Based Re-Id
### Ref: "In Defense of the Triplet Loss for Person Re-Identification": https://arxiv.org/abs/1703.07737

Triplet loss trains a model by optimizing a loss function defined over three images: the anchor, <i>a</i>, the positive, <i>p</i>, and the negative, <i>n</i>. Here <i>a</i> is the query image, <i>p</i> is of the same class as <i>a</i>, and <i>n</i> is from a different class than <i>a</i>.

<br>

Below we build a model trained with triplet loss and evaluate it on our dataset. We train on the Market1501 dataset (due to availability issues with other datasets). The results are then visualized in the subdirectory /visrank_cvision_results/visrank_triplet_dataset/.

<br>

This is tested via the torchreid API provided by Kaiyang Zhou, who has authored numerous papers in the domain. Link here: https://github.com/KaiyangZhou/deep-person-reid


In [None]:
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


In [None]:
cd drive/MyDrive/person-reid/deep-person-reid

/content/drive/MyDrive/person-reid/deep-person-reid


In [None]:
import torchreid
from comp_vis_data import CvDataSet # our custom dataset class

Register our dataset with the Torchreid API and load our data with the ImageDataManager. Images are resized to a height of 256 and a width of 128 to fit the prebuilt resnet50 model. Additionally, `combineall=True` merges the Market1501 query and gallery data into the training data for more training samples (the model is evaluated on our dataset, not Market1501). The training sampler is set to RandomIdentitySampler, which draws several images of each sampled identity (`num_instances=3`) into every batch so that triplets of anchor, positive, and negative images can be formed.
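To make the sampling idea concrete, here is a minimal pure-Python sketch of identity-balanced batching. It is a simplified illustration, not torchreid's actual RandomIdentitySampler; the function name `sample_identity_batch` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict

def sample_identity_batch(labels, batch_size, num_instances, seed=0):
    """Return dataset indices for one batch: several identities, num_instances images each."""
    rng = random.Random(seed)
    # Group dataset indices by person identity.
    index_by_pid = defaultdict(list)
    for index, pid in enumerate(labels):
        index_by_pid[pid].append(index)
    # Number of distinct identities that fit in one batch.
    num_pids = batch_size // num_instances
    chosen_pids = rng.sample(sorted(index_by_pid), num_pids)
    batch = []
    for pid in chosen_pids:
        pool = index_by_pid[pid]
        if len(pool) >= num_instances:
            batch.extend(rng.sample(pool, num_instances))
        else:
            # Too few images for this identity: sample with replacement.
            batch.extend(rng.choices(pool, k=num_instances))
    return batch

# Toy dataset (hypothetical): 10 identities with 4 images each.
labels = [pid for pid in range(10) for _ in range(4)]
batch = sample_identity_batch(labels, batch_size=12, num_instances=3)
pids_in_batch = [labels[i] for i in batch]
```

Because every identity in the batch appears `num_instances` times, each image has at least one positive partner and many negatives within the same batch.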

In [None]:
torchreid.data.register_image_dataset('cv_data', CvDataSet)

In [None]:
datamanager = torchreid.data.ImageDataManager(
    root='reid-data',
    sources='market1501',
    targets='cv_data',
    height=256,    
    width=128,
    combineall=True,
    batch_size_train=64,
    batch_size_test=64,
    num_instances=3,
    train_sampler='RandomIdentitySampler' # several images per identity, so triplets can be formed
)

Building train transforms ...
+ resize to 256x128
+ random flip
+ to torch tensor of range [0, 1]
+ normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
Building test transforms ...
+ resize to 256x128
+ to torch tensor of range [0, 1]
+ normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
=> Loading train (source) dataset
=> Loaded Market1501
  ----------------------------------------
  subset   | # ids | # images | # cameras
  ----------------------------------------
  train    |  1501 |    29419 |         6
  query    |   750 |     3368 |         6
  gallery  |   751 |    15913 |         6
  ----------------------------------------
=> Loading test (target) dataset
self.dataset_dir /content/drive/My Drive/person-reid/deep-person-reid/reid-data/data
=> Loaded CvDataSet
  ----------------------------------------
  subset   | # ids | # images | # cameras
  ----------------------------------------
  train    |     3 |      628 |         9
  query    |   

Below we build our model. As mentioned above, we use transfer learning to build off the pre-existing resnet50 model. The model is trained with the triplet loss,

$L(a, p, n) = \max(0, D(a, p) - D(a, n) + \alpha)$,

where $a$ is the anchor, $p$ is the positive, $n$ is the negative, $D$ is the Euclidean distance between two embeddings, and $\alpha$ is the margin (`margin=0.3` below). The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only triplets that violate the margin (i.e., those with a positive loss value) contribute to training. The engine below also adds a softmax cross-entropy term (`weight_x=1`) alongside the triplet term (`weight_t=0.7`).
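As a small worked example of the formula, the sketch below evaluates the loss on hand-picked 2-D points. The embeddings and helper names are hypothetical and chosen only for illustration; the margin of 0.3 matches the engine setting used in this notebook.

```python
import math

def euclidean(u, v):
    """Euclidean distance D(u, v) between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """L(a, p, n) = max(0, D(a, p) - D(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

anchor = [0.0, 0.0]
positive = [0.0, 1.0]        # D(a, p) = 1
easy_negative = [3.0, 4.0]   # D(a, n) = 5, far outside the margin
hard_negative = [0.6, 0.8]   # D(a, n) is about 1, as close as the positive

easy = triplet_loss(anchor, positive, easy_negative)  # 0.0, margin already satisfied
hard = triplet_loss(anchor, positive, hard_negative)  # approximately 0.3, margin violated
```

The easy triplet produces no gradient, while the hard triplet is penalized by roughly the full margin.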

In [None]:
model = torchreid.models.build_model(
    name='resnet50',
    num_classes=datamanager.num_train_pids,
    loss='triplet'
)
model = model.cuda()
optimizer = torchreid.optim.build_optimizer(
    model, optim='adam', lr=0.0003
)
scheduler = torchreid.optim.build_lr_scheduler(
    optimizer,
    lr_scheduler='single_step',
    stepsize=20
)
engine = torchreid.engine.ImageTripletEngine(
    datamanager, model, optimizer, margin=0.3,
    weight_t=0.7, weight_x=1, scheduler=scheduler
)

In [None]:
engine.run(
    max_epoch=60,
    save_dir='log/resnet50-triplet-market1501xcv_data',
    eval_freq=10,
    print_freq=10
)

=> Start training
epoch: [1/60][10/425]	time 0.508 (0.628)	data 0.000 (0.099)	eta 4:26:56	loss_t 0.3775 (0.3479)	loss_x 7.4763 (7.4100)	acc 0.0000 (0.6250)	lr 0.000300
epoch: [1/60][20/425]	time 0.518 (0.569)	data 0.000 (0.050)	eta 4:01:30	loss_t 0.3306 (0.3464)	loss_x 7.2756 (7.4067)	acc 0.0000 (0.3125)	lr 0.000300
epoch: [1/60][30/425]	time 0.515 (0.550)	data 0.000 (0.033)	eta 3:53:17	loss_t 0.3297 (0.3393)	loss_x 7.4993 (7.4045)	acc 0.0000 (0.2083)	lr 0.000300
epoch: [1/60][40/425]	time 0.510 (0.538)	data 0.000 (0.025)	eta 3:48:26	loss_t 0.3239 (0.3366)	loss_x 7.4643 (7.4161)	acc 0.0000 (0.1562)	lr 0.000300
epoch: [1/60][50/425]	time 0.540 (0.534)	data 0.000 (0.020)	eta 3:46:31	loss_t 0.3471 (0.3346)	loss_x 7.7156 (7.4209)	acc 0.0000 (0.2500)	lr 0.000300
epoch: [1/60][60/425]	time 0.542 (0.534)	data 0.000 (0.017)	eta 3:46:17	loss_t 0.3200 (0.3293)	loss_x 7.2423 (7.4125)	acc 0.0000 (0.2083)	lr 0.000300
epoch: [1/60][70/425]	time 0.528 (0.532)	data 0.000 (0.014)	eta 3:45:33	loss_t 0.3

As we can see, the mAP is 45.8%, which isn't a huge improvement over using softmax + cross entropy loss. Rank-1 and Rank-5 were also significantly worse. This may be due to the fact that the model is learning colors. It is probably also because models built on hard triplet mining usually perform better than these models. The model learned the Market1501 training data well, but performed poorly on the evaluation data.
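For reference, the sketch below shows how Rank-k and average precision can be computed for a single query from a ranked gallery list. It is a simplified illustration with hypothetical identities: real re-id evaluation typically also filters gallery images by camera, and mAP is the mean of the average precision over all queries.

```python
def rank_k_hit(ranked_ids, query_id, k):
    """True if the query identity appears among the top-k gallery results."""
    return query_id in ranked_ids[:k]

def average_precision(ranked_ids, query_id):
    """Mean of the precision values at each position where a correct match occurs."""
    hits = 0
    precisions = []
    for position, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Gallery identities sorted by distance to a query of identity 3 (toy data).
ranked = [7, 3, 3, 9, 3]
rank1 = rank_k_hit(ranked, 3, 1)   # False: the closest gallery image is identity 7
rank5 = rank_k_hit(ranked, 3, 5)   # True: a match appears within the top 5
ap = average_precision(ranked, 3)  # (1/2 + 2/3 + 3/5) / 3, about 0.589
```

Rank-k only checks whether any correct match appears near the top, while average precision rewards placing all correct matches early, so the two metrics can move in different directions.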

Visualizations are produced by FeatureExtractor.py and can be found in the visrank_cvision_data/ directory.