the choice of loss function and learning rate #15
Comments
@cc786537662 thanks for your efforts.
I believe that the results are almost the same with both loss functions; Smooth L1 loss is the one used here in the code.
Similar to the original GOTURN project, the learning rate is set differently for weights and biases here in the training code.
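For anyone trying to mirror that in their own fork, here is a minimal sketch of per-parameter-group learning rates in PyTorch; the tiny stand-in model, the 2x bias multiplier, and the momentum/weight-decay values are assumptions for illustration, not necessarily what this repository uses.

```python
import torch
import torch.optim as optim

# Tiny stand-in model; substitute the actual GOTURN network here.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 4),
)

# Split parameters into weights and biases so each group can get its own
# learning rate, mirroring Caffe's per-blob lr_mult convention.
weights = [p for n, p in model.named_parameters() if not n.endswith("bias")]
biases = [p for n, p in model.named_parameters() if n.endswith("bias")]

base_lr = 1e-6  # base_lr from the original GOTURN solver
optimizer = optim.SGD(
    [
        {"params": weights, "lr": base_lr, "weight_decay": 5e-4},
        {"params": biases, "lr": base_lr * 2, "weight_decay": 0.0},  # assumed 2x multiplier for biases
    ],
    momentum=0.9,  # assumed value, for illustration only
)
```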
I trained the PyTorch model for 280k iterations and the loss saturates at around ~90-100. The original GOTURN project gets a loss of around ~50 in the same number of iterations, although we follow the exact same batch-formation procedure, learning rates, model, etc. I am still trying to replicate the GOTURN results using PyTorch. As of now, I found out that
Original GOTURN training plot (credits: @sydney0zq):
Great job, man, and thanks for your quick reply. My code is not the latest version, so I did not know you had added different learning rates for weights and biases. For now, I am still working on reproducing the paper's result, and I am trying to keep everything the same as the paper says. After about 12 epochs of training and evaluation with the VOT toolkit, the result looks like:
As you can see, there is a large margin between the paper's GOTURN result and my experiment's result, marked as GOTURN_My.
That sounds correct; it took 4 days for ~280,000 iterations on a GeForce GTX 1080 Ti.
Hi guys, have you ever tried to find out how the original Caffe L1 loss function actually maps to PyTorch?
Part 1:
In my experiments, the following two loss functions give completely different results (see the sketch below):
1. loss_fn = torch.nn.L1Loss(size_average=False)
   Bad result, using lr=1e-5; see Part 2 next.
2. loss_fn = torch.nn.SmoothL1Loss(size_average=True)
   Relatively good result, using lr=5e-3, since the loss scale is roughly 1:100 compared with the L1 loss above.
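For reference, a minimal sketch of the two setups described above; the random tensors are placeholders, and reduction='sum' / reduction='mean' are the modern equivalents of size_average=False / size_average=True.

```python
import torch

pred = torch.rand(50, 4)    # placeholder predicted bounding boxes for a batch of 50
target = torch.rand(50, 4)  # placeholder ground-truth bounding boxes

# Setup 1: summed L1 loss (Caffe-style), paired with lr = 1e-5.
l1_sum = torch.nn.L1Loss(reduction="sum")                  # old API: size_average=False

# Setup 2: averaged Smooth L1 loss, paired with lr = 5e-3, since its
# scale is roughly 1/100 of the summed loss above at this batch size.
smooth_l1_mean = torch.nn.SmoothL1Loss(reduction="mean")   # old API: size_average=True

print(l1_sum(pred, target).item())          # large value, grows with batch size
print(smooth_l1_mean(pred, target).item())  # small value, independent of batch size
```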
Part 2:
One more thing: I think your learning rate is not right. In the original GOTURN, base_lr: 0.000001, i.e. 1e-6, is fine, but in the corresponding tracker.prototxt file, the learned fc layers have parameters like:
name: "fc6-new"
type: "InnerProduct"
bottom: "pool5_concat"
top: "fc6"
param {
  lr_mult: 10
  decay_mult: 1
}
which means the effective lr for the fc layer is base_lr * lr_mult = 1e-5.
So the lr for the fc layers should be set to 1e-5 in our PyTorch code.
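A minimal sketch of how that base_lr * lr_mult behaviour could be reproduced in PyTorch with optimizer parameter groups; the split into "features" and "classifier" modules is illustrative and does not reflect the actual attribute names in this repository.

```python
import torch
import torch.optim as optim

# Stand-in model: "features" plays the role of the pretrained conv layers,
# "classifier" the newly learned fc layers (fc6-new, fc7-new, ...).
model = torch.nn.ModuleDict({
    "features": torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()),
    "classifier": torch.nn.Sequential(torch.nn.Linear(8, 4)),
})

base_lr = 1e-6   # base_lr from the GOTURN solver
fc_lr_mult = 10  # lr_mult on the fc layers in tracker.prototxt

optimizer = optim.SGD(
    [
        {"params": model["features"].parameters(), "lr": base_lr},
        # Effective lr for the fc layers: base_lr * lr_mult = 1e-5
        {"params": model["classifier"].parameters(), "lr": base_lr * fc_lr_mult},
    ],
    momentum=0.9,  # assumed value, for illustration only
)
```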
So, as far as I can see, two problems remain:
1. Should we use a better loss function like SmoothL1Loss?
2. Have you reproduced the original GOTURN result using this code? How? And what is the best learning rate schedule?