Contrastive loss layer differs from loss equation #2308
Comments
@SlevinKelevra I have been analysing it too recently. I was thinking it was a bug too, but then I understood it :)
So, except for the averaging, everything is ok.
Hm, I am not so sure... Look at your third point.
You are right, there is a difference, good catch.
I will try to implement your idea and see if there is any difference in a practical experiment.
Ok, I will test it independently. We'll see; maybe these changes significantly affect the results. I think it would be good to point @shelhamer to this thread. He may help us :)
I have a quick implementation of it: https://gist.github.com/melgor/962800c3200efcfb78c1 As a result, I get a bigger loss value at the last step of learning, but no change in the final result. How can we measure whether this influences the result? @SlevinKelevra Could you check it? Note that a bug in the gradient was reported in #2312; I think it is connected with the bug in the loss function.
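In short, the fix replaces the dissimilar-pair term. A minimal sketch of the corrected term from the paper (illustrative names, not the gist's exact code):

```cpp
#include <algorithm>  // std::max

// Dissimilar-pair term from Hadsell et al., eq. 4:
// 0.5 * {max(0, margin - D)}^2, where 'd' is the Euclidean
// distance between the two embeddings (not its square).
double dissimilar_term(double d, double margin) {
  double hinge = std::max(margin - d, 0.0);
  return 0.5 * hinge * hinge;
}
```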
Unfortunately, I haven't checked it yet. I am trying to figure out how to implement the backpropagation part in CUDA code.
At first glance I thought this wasn't really an issue, but it does result in a noticeably different cost function. It would be interesting to see if it results in better embeddings. An easy test would be with the MNIST data using the notebook in the examples folder. It might not result in a significantly different embedding, because both cost functions encourage similar things. However, learning might be easier / faster with the original cost function from Hadsell et al., especially when you consider the gradients for non-matching pairs near dist=0.0 and dist=margin.
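To make that last point concrete, here is a small sketch (my own illustration, not the layer code) of the dissimilar-pair gradients with respect to the distance d under both formulations:

```cpp
#include <algorithm>
#include <cstdio>
#include <initializer_list>

// Gradient w.r.t. d of the current term 0.5 * max(margin - d^2, 0):
double grad_current(double d, double margin) {
  return (d * d < margin) ? -d : 0.0;
}

// Gradient w.r.t. d of the paper's term 0.5 * max(margin - d, 0)^2:
double grad_paper(double d, double margin) {
  return (d < margin) ? -(margin - d) : 0.0;
}

int main() {
  const double margin = 0.5;
  for (double d : {0.0, 0.5, 0.7}) {
    std::printf("d=%.1f  current: %+.2f  paper: %+.2f\n",
                d, grad_current(d, margin), grad_paper(d, margin));
  }
  // d=0.0: current gives 0 (no push apart), paper gives -margin (strongest push).
  // d>=margin: the paper's gradient is 0, but current keeps pushing
  // while d^2 < margin (true here because margin < 1).
  return 0;
}
```

So with the current version a non-matching pair sitting at dist=0.0 receives no gradient at all, while the paper's loss pushes it apart hardest exactly there.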
@SlevinKelevra and @melgor, if you want to double-check it and give it a try, I created a PR (#2321) which fixes this. I ran the MNIST example using both versions and didn't see a big difference, but that is just a simple problem. My inclination would be to just fix it so that it matches the Hadsell et al. paper. Here are the learning curve and embedding using the current version.
Great. I didn't get big improvements in my project after applying the fix either. I will double-check a little bit later; maybe I missed something.
Closing for fix in #2321. |
Hi,
I am a little bit confused by the implementation of the contrastive loss function. As pointed out in Hadsell, Chopra, and LeCun's paper
http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf the equation of this loss function should be:
L = 0.5 * (1 - Y) * D^2 + 0.5 * Y * {max(0, margin - D)}^2
It's equation 4 in the original paper. As far as I understood, the source code of the contrastive loss layer implements this loss function in a different way; the relevant piece of code is at lines 48-52 of contrastive_loss_layer.cpp. Is it a bug in the implementation?
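For context, here is roughly what the layer computes per pair (a paraphrase from reading the source; variable names are illustrative, not the verbatim code):

```cpp
#include <algorithm>  // std::max

// Paraphrase of the per-pair loss in contrastive_loss_layer.cpp.
// 'similar' is true for matching pairs; 'dist_sq' is D^2, the squared
// Euclidean distance between the two embeddings.
double pair_loss(bool similar, double dist_sq, double margin) {
  if (similar)
    return 0.5 * dist_sq;  // matches the paper's 0.5 * D^2 term
  // The layer applies the margin to D^2 and never squares the hinge,
  // whereas eq. 4 calls for 0.5 * {max(0, margin - D)}^2:
  return 0.5 * std::max(margin - dist_sq, 0.0);
}
```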