
Batch Normalization and the direction of softmax #27

Open
murphyyhuang opened this issue Sep 11, 2019 · 0 comments
Hi,

Thank you for sharing your code. It is a very exciting paper! I have a few concerns about some details in the code; please correct me if I am mistaken. :-)

  1. The parameters in BatchNormalization are not trainable, as discussed in this issue. In that case, I personally think it is better to call it standardization rather than batch normalization. I am also wondering whether it can be viewed through the lens of CNNs and image processing, where 1D batch normalization is applied across every image in a batch and every location in an image. Here the different nodes of a graph play the role of the different image locations where convolutional kernels are applied.

  2. Which dimension should BatchNormalization be applied to? Following (1), if we regard the nodes of a graph as equivalent to the different locations in an image, would it be better to apply batch normalization to the last dimension, i.e., the feature channels of the nodes? I personally suspect this makes more sense than applying BatchNormalization along the node dimension.

  3. The direction of the softmax when computing the assignment matrix. I am curious whether it is better to take the softmax across the node dimension or across the new-feature (cluster) dimension. I agree that to build an assignment matrix, each original node's contributions to the new clusters should sum to 1, which is achieved by applying softmax to the last dimension. But I am still wondering whether it also makes sense to apply softmax along the node dimension; in that case, each column of the assignment matrix would represent a distribution of contributions from the original nodes.
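To make points 1 and 2 concrete, here is a small sketch of the two normalization choices. This uses PyTorch with made-up tensor sizes; the framework, the shapes, and `affine=False` are my assumptions for illustration, not necessarily how this repository is written:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, num_nodes, channels = 4, 10, 16  # hypothetical sizes
x = torch.randn(batch, num_nodes, channels) * 3 + 5

# "CNN view" (points 1-2): treat nodes like spatial locations and
# normalize each feature channel over (batch, nodes). affine=False
# removes the learnable scale/shift, so this is pure standardization.
bn_feat = nn.BatchNorm1d(channels, affine=False)
y = bn_feat(x.transpose(1, 2)).transpose(1, 2)  # (B, N, C) -> (B, C, N) -> back

# Each feature channel now has ~0 mean and ~1 std over batch and nodes.
print(y.mean(dim=(0, 1)).abs().max())
print(y.std(dim=(0, 1)))

# Normalizing along the node dimension instead treats each node index
# as a channel, which ties the statistics to a fixed node ordering:
bn_node = nn.BatchNorm1d(num_nodes, affine=False)
z = bn_node(x)  # interprets dim 1 (nodes) as the channel dimension
```

The per-channel variant shares statistics across all node positions, mirroring how 1D batch normalization in CNNs pools statistics over spatial locations.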
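Point 3 can also be sketched with a toy assignment matrix. Again this is PyTorch with invented sizes, purely to show which dimension sums to 1 under each softmax direction:

```python
import torch

torch.manual_seed(0)
num_nodes, num_clusters = 6, 3  # hypothetical sizes
logits = torch.randn(num_nodes, num_clusters)

# Softmax over the cluster (last) dimension: each row is one node's
# membership distribution over the new clusters, so rows sum to 1.
S_row = torch.softmax(logits, dim=-1)

# Softmax over the node dimension: each column is a distribution of
# node contributions to one cluster, so columns sum to 1.
S_col = torch.softmax(logits, dim=0)

print(S_row.sum(dim=1))  # one value per node, each ~1
print(S_col.sum(dim=0))  # one value per cluster, each ~1
```

Which direction is preferable depends on the interpretation: rows summing to 1 distributes each node over clusters, while columns summing to 1 makes each cluster a weighted average of nodes.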
