
the choice of loss function and learning rate #15

Closed
LiangXu123 opened this issue Apr 18, 2018 · 6 comments

LiangXu123 commented Apr 18, 2018

Hi guys, have you ever tried to figure out how the original Caffe L1 loss actually behaves when ported to PyTorch?

part 1:
In my experiments the following two loss functions give very different results:
1: loss_fn = torch.nn.L1Loss(size_average=False)
bad result, using lr=1e-5 (see part 2 below)
2: loss_fn = torch.nn.SmoothL1Loss(size_average=True)
relatively good result, using lr=5e-3, since the loss scale is roughly 1:100 compared with the L1 loss above
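A minimal sketch of that scale difference (not code from this repo; the boxes below are random dummies): a sum-reduced L1 loss grows with the 50 samples times 4 box coordinates in a batch, while a mean-reduced Smooth L1 loss is per element, so the two readouts naturally differ by roughly two orders of magnitude. `reduction='sum'`/`'mean'` is the modern spelling of `size_average=False`/`True`.

```python
import torch

# Hypothetical predicted and ground-truth boxes for a batch of 50, 4 coords each.
pred = torch.rand(50, 4) * 10
target = torch.rand(50, 4) * 10

l1_sum = torch.nn.L1Loss(reduction='sum')(pred, target)              # summed over 50*4 = 200 elements
smooth_mean = torch.nn.SmoothL1Loss(reduction='mean')(pred, target)  # averaged per element

# The summed loss is on the order of batch_size * coords (~200x) the averaged one.
print(l1_sum.item(), smooth_mean.item())
```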

part 2:
One more thing: I think your learning rate is not right. In the original GOTURN, base_lr: 0.000001, i.e. 1e-6, is fine, but in the corresponding tracker.prototxt file the learned fc layer has parameters like:

name: "fc6-new"
type: "InnerProduct"
bottom: "pool5_concat"
top: "fc6"
param {
  lr_mult: 10
  decay_mult: 1
}

which means the effective lr for the fc layer is base_lr * lr_mult = 1e-5,
so the lr for the fc layer should be set to 1e-5 in our PyTorch code.
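For reference, a hedged sketch of how Caffe's base_lr * lr_mult behaviour could be mimicked in PyTorch with per-parameter-group learning rates; the module names are made-up stand-ins, not the ones used in this repo's train.py:

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-in for the tracker: pretrained conv features plus a
# freshly initialised fc layer (the "fc6-new" layer from the prototxt).
model = nn.Sequential()
model.add_module('features', nn.Conv2d(3, 8, kernel_size=3))
model.add_module('fc6_new', nn.Linear(8, 4))

base_lr = 1e-6
optimizer = optim.SGD(
    [
        {'params': model.features.parameters(), 'lr': base_lr},      # conv layers: base_lr
        {'params': model.fc6_new.parameters(), 'lr': base_lr * 10},  # lr_mult: 10 -> 1e-5
    ],
    lr=base_lr, momentum=0.9,
)
```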

So, as far as I can see, two problems remain:
1: Should we use a better loss function like SmoothL1Loss?
2: Have you reproduced the original GOTURN results using this code? How? And what's the best learning rate schedule?

amoudgl (Owner) commented Apr 18, 2018

@cc786537662 thanks for your efforts.

In my experiments the following two loss functions give very different results:
1: loss_fn = torch.nn.L1Loss(size_average=False)
bad result, using lr=1e-5 (see part 2 below)
2: loss_fn = torch.nn.SmoothL1Loss(size_average=True)
relatively good result, using lr=5e-3, since the loss scale is roughly 1:100 compared with the L1 loss above

I believe that the results are almost the same with both loss functions. Smooth L1 loss torch.nn.SmoothL1Loss(size_average=True) reports the per-sample loss (on a batch of 50 samples), whereas torch.nn.L1Loss(size_average=False) reports the total loss on a batch of 50 samples. I get a per-sample Smooth L1 loss of around 1.1 in the first few iterations, which is equivalent to an L1 loss of ~180 on a batch (the loss is just defined in a different way, which is why we get different numbers). I haven't tested whether this Smooth L1 loss would work in long-term training, but my priority would be to replicate the original GOTURN results using their exact formulation, if possible.
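A back-of-the-envelope check of that equivalence, using the figures quoted above (taken from the comment, not freshly measured): a summed L1 loss of ~180 over 50 boxes with 4 coordinates each works out to roughly 0.9 per element, the same order of magnitude as the ~1.1 Smooth L1 value.

```python
batch_size, coords = 50, 4
l1_sum = 180.0                              # summed L1 loss quoted for one batch
per_element = l1_sum / (batch_size * coords)
print(per_element)                          # ~0.9, comparable to the reported ~1.1 Smooth L1 loss
```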

One more thing: I think your learning rate is not right. In the original GOTURN, base_lr: 0.000001, i.e. 1e-6, is fine, but in the corresponding tracker.prototxt file the learned fc layer has parameters like ...

Similar to the original GOTURN project, the learning rate is set differently for weights and biases in train.py.
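For illustration only, one common way to express this in PyTorch is to put weights and biases into separate optimizer parameter groups. The exact multipliers and decay values below are assumptions following the usual Caffe convention (2x learning rate and no weight decay for biases), not the values from train.py:

```python
import torch.nn as nn
import torch.optim as optim

net = nn.Linear(256, 4)  # hypothetical stand-in for one trainable layer

weights = [p for name, p in net.named_parameters() if 'bias' not in name]
biases = [p for name, p in net.named_parameters() if 'bias' in name]

optimizer = optim.SGD(
    [
        {'params': weights, 'lr': 1e-5, 'weight_decay': 5e-4},
        {'params': biases, 'lr': 2e-5, 'weight_decay': 0.0},  # biases: higher lr, no decay
    ],
    lr=1e-5, momentum=0.9,
)
```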

Have you reproduced the original GOTURN results using this code? How? And what's the best learning rate schedule?

I trained the pytorch model for 280k iterations; I get a loss of around ~90-100 and it saturates there. The original GOTURN project gets a loss of around ~50 in the same number of iterations, even though we follow the exact same batch formation procedure, learning rates, models, etc. I am still trying to replicate the GOTURN results using pytorch. As of now, I have found that exp_lr_scheduler in train.py needs to be modified to handle different learning rates for weights and biases. Currently, after one step (i.e. 1e5 iterations), it sets the learning rate to gamma*1e-6 for all the weights and biases, which I believe is incorrect. But even so, the loss at 1e5 iterations (~100) is higher than the original GOTURN loss at 1e5 iterations (~50).
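A minimal sketch of the kind of fix described above (not the actual exp_lr_scheduler from train.py): instead of overwriting every group's learning rate with a single gamma*1e-6 value after a step, scale each parameter group's own learning rate by gamma so the different weight/bias (and fc) rates keep their ratios.

```python
def exp_lr_scheduler(optimizer, it, gamma=0.1, step_size=100000):
    """Multiply each param group's current lr by gamma every step_size iterations."""
    if it > 0 and it % step_size == 0:
        for group in optimizer.param_groups:
            group['lr'] *= gamma  # per-group scaling keeps the weight/bias ratios intact
    return optimizer
```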

amoudgl (Owner) commented Apr 18, 2018

pygoturn training plot: [image]

Original GOTURN training plot (credits: @sydney0zq): [image]

LiangXu123 (Author) commented:
Great job man, thanks for your quick reply. My code is not the latest version, so I didn't know you had added different learning rates for weights and biases. As for now, I am still working on reproducing the paper's results, and I am trying to keep everything the same as the paper describes. After about 12 epochs of training, evaluation with the VOT toolkit gives results that look like this:

LiangXu123 (Author) commented:
[images: rankingplot_baseline_mean, tracker_legend]

LiangXu123 (Author) commented:
As you can see, there is a large margin between the paper's GOTURN result and my experiment result, marked as GOTURN_My.
I am also a little confused about the training iterations now. The paper uses 500,000 iterations with a batch size of 50, while the training list contains about 280,000 image pairs, which means one epoch is about 280,000/50 = 5,600 iterations, so the total is 500,000/5,600 ≈ 90 epochs. That really requires a lot of time to train the network; even with my TITAN Xp, one epoch needs about 2 hours, so 90 epochs need about 7.5 days to train. Is that correct?
The training set is the same as the paper used: ALOV300 + ImageNet DET.
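A quick sanity check of the arithmetic above, using the numbers quoted in this thread (not freshly measured):

```python
pairs, batch_size, total_iters = 280_000, 50, 500_000
iters_per_epoch = pairs // batch_size      # 5,600 iterations per epoch
epochs = total_iters / iters_per_epoch     # ~89 epochs
days = epochs * 2 / 24                     # ~2 hours per epoch on a TITAN Xp
print(iters_per_epoch, round(epochs), round(days, 1))   # 5600, 89, ~7.4 days
```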

amoudgl (Owner) commented Apr 18, 2018

That sounds correct; it took 4 days for ~280,000 iterations on a GeForce GTX 1080 Ti.
