After adding a self-implemented layer normalization, the backward time of gradient_penalty became much larger #10

Closed
santisy opened this issue Jun 14, 2017 · 5 comments

santisy commented Jun 14, 2017

My implementation of layer-normalization is:

import torch
import torch.nn as nn
from torch.nn import Parameter


class Layer_Norm(nn.Module):

    def __init__(self, dim):
        super(Layer_Norm, self).__init__()
        self.dim = dim
        # Learnable gain and bias, initialized to 1 and 0 in init_weights().
        self.g = Parameter(torch.zeros(1, dim))
        self.b = Parameter(torch.zeros(1, dim))
        self.init_weights()

    def forward(self, input):
        # Per-sample mean over the feature dimension.
        miu = torch.sum(input, 1).unsqueeze(1) / self.dim
        input_minus_miu = input - miu.expand_as(input)
        # Per-sample standard deviation (no epsilon added).
        sigma = (torch.sum(input_minus_miu.pow(2), 1) / self.dim).sqrt().unsqueeze(1)
        # Normalize, then apply the learnable gain and bias.
        input = input_minus_miu * self.g.expand(input.size()) / sigma.expand_as(input) \
            + self.b.expand(input.size())

        return input

    def init_weights(self):
        self.g.data.fill_(1)
        self.b.data.fill_(0)

After plugging this in before ReLU, the backward pass of gradient_penalty slowed from 0.0075 s to 0.1149 s.

I compiled the source code from the master branch, commit deb0aef30cdaa78f9840bfa4a919ad206e8e73a7, and also modified the ReLU source code before compiling, following your instructions.
I am wondering whether this is because my implementation of layer normalization contains something not suitable for double backward?
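
For context, the timing in question is the backward pass through a WGAN-GP style gradient penalty with Layer_Norm plugged in before ReLU. A minimal sketch of such a setup, using a toy critic, arbitrary sizes, and the current torch.autograd API rather than the exact code of this repository:

import time
import torch
import torch.nn as nn

dim, batch = 512, 64
# Toy critic with Layer_Norm plugged in before ReLU.
critic = nn.Sequential(nn.Linear(dim, dim), Layer_Norm(dim), nn.ReLU(), nn.Linear(dim, 1))

real = torch.randn(batch, dim)
fake = torch.randn(batch, dim)
alpha = torch.rand(batch, 1)
interpolates = (alpha * real + (1 - alpha) * fake).requires_grad_(True)

scores = critic(interpolates)
# First backward with create_graph=True, so the penalty itself is differentiable.
grads = torch.autograd.grad(scores.sum(), interpolates, create_graph=True)[0]
gradient_penalty = ((grads.norm(2, dim=1) - 1) ** 2).mean()

start = time.time()
gradient_penalty.backward()  # this second backward is the part being timed
print('gradient_penalty backward: %.4fs' % (time.time() - start))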

caogang commented Jun 15, 2017

Have you confirmed that the value is wrong? You could use gradgradcheck to test your double backward instead of just looking at the value. gradgradcheck will soon be added in PR pytorch/pytorch#1643; you can write a check similar to the one in that PR.
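
For reference, a minimal sketch of such a check, written against torch.autograd.gradgradcheck as it later landed (the double-precision input and sizes below are arbitrary assumptions):

import torch
from torch.autograd import gradgradcheck

dim = 8
layer = Layer_Norm(dim).double()  # numerical gradient checks want float64
x = torch.randn(4, dim, dtype=torch.float64, requires_grad=True)

# Returns True only if both the backward and the double backward of the
# module are numerically correct (within tolerance).
print(gradgradcheck(layer, (x,)))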

santisy commented Jun 15, 2017

@caogang The result seems reasonable, but the time seems a bit too long. I'll test it with gradgradcheck later. Thanks for your reply.

caogang commented Jun 15, 2017

Oh, so your problem is the larger time cost after plugging in your Layer_Norm module? My intuition is that it may be caused by the expand or expand_as calls. Maybe you can test that.
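
For what it's worth, one way to test that hypothesis is to time a double backward through Layer_Norm in isolation and see how much of the 0.1149 s it accounts for. A minimal sketch, assuming arbitrary sizes and the current torch.autograd API:

import time
import torch

dim, batch = 512, 64
layer = Layer_Norm(dim)
x = torch.randn(batch, dim, requires_grad=True)

# First backward with create_graph=True, so the gradient itself is differentiable.
out = layer(x).sum()
grad_x, = torch.autograd.grad(out, x, create_graph=True)

# Time only the second backward, which differentiates through the
# expand/expand_as graph built above.
start = time.time()
grad_x.norm().backward()
print('double backward through Layer_Norm alone: %.4fs' % (time.time() - start))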

santisy commented Jun 15, 2017

@caogang Is there another way to get the same behavior as expand and expand_as here? The broadcasting mechanism still seems to have some problems (pytorch/pytorch#1787), and repeat costs even more, about 0.3 s.

caogang commented Jun 15, 2017

Yeah, using expand is probably the best method on the current branch. Just wait for the broadcasting feature to be merged, or you can contribute to the above PR. :)
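
For reference, once broadcasting is available the forward no longer needs expand or expand_as at all. A sketch of how it could then look, assuming broadcasting and the keepdim argument as they later landed in PyTorch:

    def forward(self, input):
        # The (N, 1) statistics broadcast automatically against the (N, dim) input.
        mu = input.mean(1, keepdim=True)
        sigma = (input - mu).pow(2).mean(1, keepdim=True).sqrt()
        # self.g and self.b have shape (1, dim) and broadcast as well.
        return (input - mu) * self.g / sigma + self.b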

caogang closed this as completed on Sep 13, 2017.