Question about model pretrain method? #13

Closed

ZhangYuef opened this issue May 24, 2019 · 7 comments
ZhangYuef commented May 24, 2019

Thanks for sharing.

I see that the paper mentions the model is first pretrained using only $L_{AL}$. From Section 4.2:

> i.e. we first pretrain the network using only $L_{AL}$ (without enforcing the unit norm constraint) to endow the basic discriminative power with the embedding and to determine the directions of the reference agents in the hypersphere embedding....

But I don't see how to pretrain the model from the current code. Could you give some more detailed instructions, e.g. how many epochs I should pretrain the model for?

Thanks >.<


KovenYu commented May 30, 2019

Hi @ZhangYuef, thanks for your attention.

In fact, the pretraining is simply a softmax classification loss, with every identity being a unique class. I didn't pay much attention to it or tune it, so I don't remember the exact hyperparameter values, but they should be somewhere around:

epoch: 40
batchsize: 64 (not sure)
lr: 1e-2
wd: 1e-2

You may tune it a bit and obtain some reasonable results.
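A minimal sketch of what that pretraining amounts to, assuming a ResNet-50 backbone and one class per source identity (the identity count, loader, and training loop below are illustrative, not the repo's exact code):

```python
import torch
import torch.nn as nn
import torchvision

# Plain softmax (cross-entropy) classification over source identities:
# one class per identity, no unit-norm constraint, no scale factor.
num_identities = 1041  # e.g. MSMT17 training identities (illustrative)

model = torchvision.models.resnet50(pretrained=True)        # ImageNet init
model.fc = nn.Linear(model.fc.in_features, num_identities)  # identity classifier

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                            momentum=0.9, weight_decay=1e-2)

def train_one_epoch(loader):
    model.train()
    for images, identity_labels in loader:
        logits = model(images)  # raw, unnormalized logits
        loss = criterion(logits, identity_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```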


moodom commented Jun 17, 2019

Hi @KovenYu, thanks for sharing.
Following your settings, I reproduced your pretrained model, but I found that I can't get the results in the paper when I use this pretrained model in the second-stage training. I think the parameter distribution of the pretrained model is very important for the parameter settings of the second-stage training. Could you share the pretraining code?
Thank you very much.


KovenYu commented Jun 30, 2019

Hi @moodom, thank you for your attention. Did you try using the provided pretrained model, and does that work?


moodom commented Jun 30, 2019

Hi @KovenYu. I used the provided pretrained model and got a good result. But when I trained a pretrained model myself with the $L_{AL}$ loss as described in the paper, with the unit norm constraint removed, and then used it in the second stage of training, rank-1 could only reach about 56. I tried adjusting the LR and WD; the results were the same. I also measured the average of the provided pretrained model's FC-layer parameters and the Euclidean distances between the FC layer's column vectors. The results are as follows:
Average of FC-layer parameters: -0.00755771
Mean of column-vector Euclidean distances: -413379.0
Standard deviation of column-vector Euclidean distances: 1.8415e+08
I think that is a very good result: the parameters are very small, but the distances are very large. The pretrained model I trained myself did not reach that level.
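Roughly, these statistics can be computed like this (a sketch; the checkpoint path and the `fc.weight` key are assumptions about the state_dict layout):

```python
import torch

# Load the provided pretrained checkpoint and grab the FC weight matrix.
state = torch.load('pretrain.pth', map_location='cpu')  # path is illustrative
W = state['fc.weight']  # shape: (num_classes, feat_dim); key is an assumption

print('mean FC parameter:', W.mean().item())

# Pairwise Euclidean distances between the column vectors of W.
cols = W.t()  # rows of this tensor are the columns of W
dist = torch.cdist(cols, cols)
off_diag = ~torch.eye(dist.size(0), dtype=torch.bool)
print('mean column-vector distance:', dist[off_diag].mean().item())
print('std of column-vector distances:', dist[off_diag].std().item())
```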
Do you use any other training tricks?


KovenYu commented Jun 30, 2019

@moodom thank you for your detailed description!
I looked at the pretraining code and found two notable points:

1. By "without the unit norm constraint" I mean that both a and f(x) are left unnormalized, and the scale factor of 30 is not used either (see the sketch after this list).
2. I found that I actually tried a few different pretraining strategies and chose the best baseline, obtained by starting from ImageNet-initialized weights (downloaded from here), training for 60 epochs with batchsize=256 (256 SOURCE images without any target images, unlike in the current code), and increasing the LR to 1e-3. Other settings (incl. data augmentation, LR schedule, etc.) were the same as in the code.
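To make point 1 concrete, here is a rough sketch of the two variants (illustrative code, not the repo's implementation; `features` stands for f(x) and `agents` for the reference agents a):

```python
import torch
import torch.nn.functional as F

# Second-stage agent loss: f(x) and the agents a are unit-normalized,
# and the cosine logits are scaled by 30 before the softmax.
def agent_loss_with_norm(features, agents, labels, scale=30.0):
    f = F.normalize(features, dim=1)  # unit-norm f(x)
    a = F.normalize(agents, dim=1)    # unit-norm agents
    logits = scale * f.mm(a.t())      # scaled cosine similarities
    return F.cross_entropy(logits, labels)

# Pretraining variant described above: no normalization of f(x) or a,
# and no scale factor, i.e. a plain linear classifier + softmax.
def agent_loss_pretrain(features, agents, labels):
    logits = features.mm(agents.t())  # raw inner products
    return F.cross_entropy(logits, labels)
```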


pzhren commented Mar 21, 2020

Thank you for sharing the code. I set the corresponding parameters according to your description and wanted to redo the $L_{AL}$ pre-training. However, second-stage training with the resulting pre-trained weights produced a large number of NaNs. The following is my pre-training code: https://github.com/pzhren/Papers/blob/master/%E7%9B%AE%E6%A0%87%E6%A3%80%E6%B5%8B%E4%B8%8Ere-id%E4%BB%BB%E5%8A%A1/MAR-master/src/pretrain.py#L6
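In case it helps localize the NaNs, a generic (not repo-specific) way to find the operation where they first appear is PyTorch's anomaly detection; a minimal sketch with stand-in tensors:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(8, 4)    # stand-in for the real network
images = torch.randn(16, 8)      # stand-in batch
labels = torch.randint(0, 4, (16,))

# With anomaly detection enabled, backward() raises an error at the exact
# operation whose gradient produced a NaN, instead of failing silently later.
with torch.autograd.set_detect_anomaly(True):
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
```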


pzhren commented Mar 21, 2020

The following are the hyperparameter settings during pre-training.
```
python version : 3.5.4 |Continuum Analytics, Inc.| (default, Aug 14 2017, 13:26:58) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
torch version : 1.1.0

------------------------------------------------------- options --------------------------------------------------------
batch_size: 256     beta: 0.2             crop_size: (384, 128)
epochs: 60          gpu: 0                img_size: (384, 128)
lamb_1: 0.0002      lamb_2: 50.0          lr: 0.001
margin: 1.0         mining_ratio: 0.005   ml_path: ../data/ml_Market.dat
padding: 7          pretrain: True        pretrain_path: ../data/resnet50-19c8e357.pth
print_freq: 100     resume:               save_path: ../runs/debug
scala_ce: 30.0      source: MSMT17        target: Market
wd: 0.025

do not use pre-trained model. train from scratch.
loaded pre-trained model from ../data/resnet50-19c8e357.pth

==>>[2020-03-20 18:12:12] [Epoch=000/060] Stage 1, [Need: 00:00:00]
Iter: [000/969] Freq 37.5 loss_total 8.316 loss_source 8.316
```
