
When I increase the last (-1) dim, the big kernel becomes slower and slower compared to conv2d #56

Open
Marsjunwang opened this issue May 12, 2023 · 1 comment

@Marsjunwang

import time

import torch
import torch.nn as nn

# Import path assumed from the project layout; adjust to match your install.
from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM

if torch.cuda.is_available():
    x = torch.randn(64, 384, 256, 31).cuda()
    m1 = DepthWiseConv2dImplicitGEMM(384, 31, bias=False).cuda()
    m2 = nn.Conv2d(384, 384, 31, padding=31 // 2, bias=False, groups=384).cuda()
    m2.load_state_dict(m1.state_dict())
    with torch.cuda.amp.autocast(True):
        # Time the implicit-GEMM depthwise conv (wall clock, synchronized after the call).
        t1 = time.time()
        y1 = m1(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print(f'The big kernel time is {t2 - t1}')
        t1 = time.time()
        y2 = m2(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print(f'The pytorch time is {t2 - t1}')
    (y1.mean() * 1024).backward()
    (y2.mean() * 1024).backward()
    print("output difference:", ((y1 - y2) ** 2).mean())
    print("gradient difference:", ((m1.weight.grad - m2.weight.grad) ** 2).mean())

The big kernel time is 0.02849888801574707
The pytorch time is 0.1821727752685547

    # Same script as above, only the last dimension of the input increased from 31 to 200.
    x = torch.randn(64, 384, 256, 200).cuda()
    m1 = DepthWiseConv2dImplicitGEMM(384, 31, bias=False).cuda()
    m2 = nn.Conv2d(384, 384, 31, padding=31 // 2, bias=False, groups=384).cuda()
    m2.load_state_dict(m1.state_dict())
    with torch.cuda.amp.autocast(True):
        import time 
        t1 = time.time()
        y1 = m1(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print(f'The big kernel time is {t2 - t1}')
        t1 = time.time()
        y2 = m2(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print(f'The pytorch time is {t2 - t1}')
    (y1.mean() * 1024).backward()
    (y2.mean() * 1024).backward()
    print("output difference:", ((y1 - y2) ** 2).mean())
    print("gradient difference:", ((m1.weight.grad - m2.weight.grad) ** 2).mean())

The big kernel time is 0.951230525970459
The pytorch time is 1.1460661888122559

# Same imports as in the first snippet; the last dimension is now 256.
torch.random.manual_seed(0)
if torch.cuda.is_available():
    x = torch.randn(64, 384, 256, 256).cuda()
    m1 = DepthWiseConv2dImplicitGEMM(384, 31, bias=False).cuda()
    m2 = nn.Conv2d(384, 384, 31, padding=31 // 2, bias=False, groups=384).cuda()
    m2.load_state_dict(m1.state_dict())
    with torch.cuda.amp.autocast(True):
        import time 
        t1 = time.time()
        y1 = m1(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print(f'The big kernel time is {t2 - t1}')
        t1 = time.time()
        y2 = m2(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print(f'The pytorch time is {t2 - t1}')
    (y1.mean() * 1024).backward()
    (y2.mean() * 1024).backward()
    print("output difference:", ((y1 - y2) ** 2).mean())
    print("gradient difference:", ((m1.weight.grad - m2.weight.grad) ** 2).mean())

The big kernel time is 1.524620771408081
The pytorch time is 1.4657022953033447
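
For a steadier comparison, it may also help to warm both modules up and average over many iterations with torch.cuda.Event timing, since a single first call can include one-time setup. A minimal sketch (the benchmark helper name and iteration counts below are illustrative, not from the repo):

import torch

def benchmark(module, x, n_warmup=10, n_iters=50):
    # Warm-up: the first calls may include lazy initialization and kernel selection.
    for _ in range(n_warmup):
        module(x)
    torch.cuda.synchronize()

    # Event-based timing measures only the GPU work queued between the two records.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(n_iters):
        module(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / n_iters  # milliseconds per forward call

Called inside the same autocast context as above, e.g. print(benchmark(m1, x), benchmark(m2, x)), this averages out the cost of the first call.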

@Marsjunwang
Author

Could you please give me some guidance on how to fix this? Then I could try this great idea with a big kernel.
