Why do you disable cudnn for batch_norm? #8

Closed · jin-s13 opened this issue Aug 25, 2018 · 7 comments

@jin-s13 commented Aug 25, 2018

Thank you for releasing the code. The README says that it is required to disable cudnn for batch_norm. Could you please explain why we should do this?

@leoxiaobin (Contributor) commented Aug 27, 2018

We found that in our code, the global running stats computed by cudnn's batch-norm implementation produce totally wrong results at evaluation time on some GPU architectures (P40, P100, V100), so we disabled the cudnn implementation for BN.
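
For what it's worth, a coarse way to check whether cudnn's BN kernels are the culprit is to disable cudnn globally. This is a diagnostic suggestion only (not what the repo's README does), and it slows down convolutions as well:

import torch

# Assumption: disabling cuDNN globally also bypasses its batch-norm kernels.
# Diagnostic only; convolutions fall back to slower non-cuDNN kernels too.
torch.backends.cudnn.enabled = False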

@7color94 commented Sep 13, 2018

Hi, I wonder under what circumstances this kind of cudnn BN bug appears. Does it appear for all tasks, not only human pose estimation?

@leoxiaobin (Contributor) commented Sep 23, 2018

I am not sure whether any other task also has this problem, and I also don't know its root cause.

@makslevental commented Sep 3, 2019

@leoxiaobin you don't need a sed; you can monkey-patch:

import torch
import torch.nn.functional


def monkey_patch_bn():
    # Replacement for torch.nn.functional.batch_norm, identical to the stock
    # implementation except that it passes cudnn_enabled=False to
    # torch.batch_norm.
    # print(inspect.getsource(torch.nn.functional.batch_norm))
    def batch_norm(input, running_mean, running_var, weight=None, bias=None,
                   training=False, momentum=0.1, eps=1e-5):
        if training:
            # Training-mode BN needs more than one value per channel to
            # compute batch statistics.
            size = input.size()
            size_prods = size[0]
            for i in range(len(size) - 2):
                size_prods *= size[i + 2]
            if size_prods == 1:
                raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))

        # The trailing False is the cudnn_enabled flag; it keeps cuDNN's
        # batch-norm kernels out of the computation.
        return torch.batch_norm(
            input, weight, bias, running_mean, running_var,
            training, momentum, eps, False
        )

    torch.nn.functional.batch_norm = batch_norm

@LordLiang commented Nov 26, 2019

> @leoxiaobin you don't need a sed; you can monkey-patch: [quotes the monkey_patch_bn snippet above]

Hi, I want to know where I should call monkey_patch_bn. In main.py? Before training?

@makslevental commented Nov 26, 2019

@LordLiang I would call it right after importing torch.nn.
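
A minimal sketch of that ordering (assumption: monkey_patch_bn is defined in a local module, hypothetically named bn_patch):

# main.py: a minimal sketch of the suggested call order.
# Assumption: monkey_patch_bn lives in a local module named bn_patch;
# the module name is hypothetical.
import torch
import torch.nn

from bn_patch import monkey_patch_bn

monkey_patch_bn()  # patch F.batch_norm before any model is constructed

# ... build the model and start training as usual ...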

@mgarbade commented Feb 10, 2020

Is this still relevant in newer versions of PyTorch (e.g. version 1.4)?
