
the loss function in paper #3

Closed
wcyjerry opened this issue Nov 22, 2021 · 7 comments

Comments

@wcyjerry

Hi, I read your paper again today along with the code, and I still have a question: why does $L_c$ work? Our goal seems to be to minimize the dissimilarity, but the function has a '-'; I think with the '-' it does just the opposite.

@ChongjianGE
Owner

ChongjianGE commented Nov 22, 2021

Hi @q671383789 ,
Thanks for the question.
We do want to minimize the dissimilarity of the two vectors $f_1, f_2$, i.e., to maximize their similarity.
The minus sign in the $L_c$ loss term does exactly the right job here.

Specifically, the $\langle \cdot, \cdot \rangle$ operation in $L_c$ can be read as a dot product. Take $f_1 = (1, 0)$ as an example: to maximize the similarity, $f_2$ should be close to $(1, 0)$, and $\langle f_1, f_2 \rangle$ reaches its maximum exactly when it is. However, the training procedure strives to decrease the loss term $L_c$. That is why $L_c$ negates the dot product: minimizing $L_c$ then maximizes the similarity.
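A minimal sketch of such a loss in PyTorch (an illustration under these assumptions, not the repository's actual code; `negative_cosine_loss` is a hypothetical name):

```python
import torch
import torch.nn.functional as F

def negative_cosine_loss(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
    """Return -<f1/||f1||, f2/||f2||>, averaged over the batch.

    Minimizing this value maximizes the cosine similarity of f1 and f2,
    which is why the minus sign appears in L_c.
    """
    f1 = F.normalize(f1, dim=1)  # L2-normalize each row
    f2 = F.normalize(f2, dim=1)
    return -(f1 * f2).sum(dim=1).mean()

# Identical inputs give the minimum possible value, -1.
x = torch.randn(4, 128)
print(negative_cosine_loss(x, x))  # tensor(-1.)
```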

@wcyjerry
Author


However, I don't see the L2 norm in the code.

@ChongjianGE
Owner

The L2 norm is implemented by the network's Normalize() module in the fc layer.
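A plausible sketch of such a module (hypothetical; the repository's actual Normalize() may differ in details):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Normalize(nn.Module):
    """L2-normalize features along a given dimension."""

    def __init__(self, dim: int = 1):
        super().__init__()
        self.dim = dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(x, p=2, dim=self.dim)

# Appended after the fc layer, so features entering the loss have unit norm.
projector = nn.Sequential(nn.Linear(2048, 128), Normalize())
```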

@ChongjianGE ChongjianGE reopened this Nov 22, 2021
@wcyjerry
Author

@ChongjianGE Thanks, but the code seems to differ from the paper: it looks like both vectors are L2-normalized before the dot product, rather than only dividing by the norms in the denominator.

@ChongjianGE
Owner

The implementation is mathematically equivalent to the loss term in the paper.
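This follows from bilinearity of the inner product: the scalars $1/\|f_1\|_2$ and $1/\|f_2\|_2$ factor out, so normalizing both vectors before the dot product equals dividing the raw dot product by the product of the norms:

$$\left\langle \frac{f_1}{\|f_1\|_2}, \frac{f_2}{\|f_2\|_2} \right\rangle = \frac{\langle f_1, f_2 \rangle}{\|f_1\|_2 \, \|f_2\|_2}.$$

A quick numerical check (illustrative only):

```python
import torch
import torch.nn.functional as F

f1, f2 = torch.randn(128), torch.randn(128)
lhs = torch.dot(F.normalize(f1, dim=0), F.normalize(f2, dim=0))
rhs = torch.dot(f1, f2) / (f1.norm() * f2.norm())
assert torch.allclose(lhs, rhs)  # both equal the cosine similarity
```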

@wcyjerry
Author

wcyjerry commented Nov 22, 2021 via email

@ChongjianGE
Owner

That's alright~
I will close this issue for now. Please feel free to reopen it if there are further questions.
