
foreground normalization #281

Closed
powder21 opened this issue Jun 26, 2019 · 3 comments

Comments

@powder21

powder21 commented Jun 26, 2019

[image: the attention-score equation being asked about]
How can the coefficient |x_i| be canceled in the right-hand term? Is there something that I misunderstand?

@JiahuiYu
Owner

@powder21 Hi, sorry for the confusion, and thank you for your patience. It has been a while since I wrote that code. I just took a careful look at it and found that this term is indeed not cancelled. On the other hand, since we apply a scale before the softmax anyway, the norm |x_i| may not matter much (it can be viewed as a high softmax temperature). Note that the softmax is over channels (that is, over w_j), so normalizing by |w_j| is the more important part.

In conclusion, I agree that there should be a norm |x_i|, and I appreciate your finding. But the results may not change much, because the softmax runs at a high temperature anyway. If you are interested, you can run the simple example of contextual attention shown in https://github.com/JiahuiYu/generative_inpainting#faq (How to implement contextual attention?) to verify the difference with and without the norm |x_i|.

# similarity scores: convolve the foreground xi with the L2-normalized background patches wi_normed (no |x_i| norm here)
yi = tf.nn.conv2d(xi, wi_normed, strides=[1,1,1,1], padding="SAME")
# softmax over the channel axis (one channel per background patch), sharpened by the scale
yi = tf.nn.softmax(yi*scale, 3)
# paste back the attention-weighted raw background patches (wi_center); the /4. compensates for patch overlap
yi = tf.nn.conv2d_transpose(yi, wi_center, tf.concat([[1], raw_fs[1:]], axis=0), strides=[1,rate,rate,1]) / 4.
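
For the with/without comparison suggested above, here is a minimal sketch (not the repository's code) of adding the |x_i| norm before the softmax. It reuses xi, wi_normed and scale from the snippet above and assumes the background patches are ksize x ksize, where ksize is a placeholder name:

import tensorflow as tf  # TF 1.x, as in the snippet above

c = xi.get_shape().as_list()[-1]                 # channel count of the foreground features
ones_kernel = tf.ones([ksize, ksize, c, 1])      # all-ones kernel sums x^2 over each patch window
xi_sq_sum = tf.nn.conv2d(tf.square(xi), ones_kernel, strides=[1,1,1,1], padding="SAME")
xi_norm = tf.maximum(tf.sqrt(xi_sq_sum), 1e-4)   # |x_i| at every spatial location, clamped away from zero
yi = tf.nn.conv2d(xi, wi_normed, strides=[1,1,1,1], padding="SAME") / xi_norm
yi = tf.nn.softmax(yi*scale, 3)                  # softmax over channels, as before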

@powder21
Author

powder21 commented Jun 27, 2019

@JiahuiYu Thanks! I understand now that |x_i| doesn't matter, rather than being canceled: the high softmax temperature serves to pick out the maximum similarity (nearly one-hot), and |x_i| is a common factor across channels when the softmax is computed over channels.
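
A quick numeric illustration of that point with made-up similarity values (just a sketch, not code from the repository): dividing the logits at one location by a common positive factor |x_i| does not change which patch gets the largest weight, and with a large scale the softmax stays peaked on the same patch.

import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

sims = np.array([0.9, 0.7, 0.2])    # inner products <x_i, w_j/|w_j|> over channels j at one location
scale = 10.                         # the scale applied before the softmax
print(softmax(scale * sims))        # without the |x_i| norm: roughly [0.88, 0.12, 0.00]
print(softmax(scale * sims / 2.5))  # with a hypothetical |x_i| = 2.5: same argmax, flatter weights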

Besides, normalization cannot be applied to x_i unless we extract patches from it, but extracting patches would give up the efficiency of computing the similarity by convolution.

@JiahuiYu
Owner

Thanks for pointing that out! These issues will be very helpful for others who have the same question or concern in the future.
