
When I increase the last (-1) dim, the big kernel becomes slower and slower compared to conv2d #56

Open
Marsjunwang opened this issue May 12, 2023 · 1 comment

@Marsjunwang

import time

import torch
import torch.nn as nn

# Import path assumed from the project layout; adjust to match your install.
from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM

if torch.cuda.is_available():
    x = torch.randn(64, 384, 256, 31).cuda()
    m1 = DepthWiseConv2dImplicitGEMM(384, 31, bias=False).cuda()
    m2 = nn.Conv2d(384, 384, 31, padding=31 // 2, bias=False, groups=384).cuda()
    m2.load_state_dict(m1.state_dict())
    with torch.cuda.amp.autocast(True):
        # Time the implicit-GEMM depthwise conv (wall clock, synchronized after the call).
        t1 = time.time()
        y1 = m1(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print(f'The big kernel time is {t2 - t1}')
        t1 = time.time()
        y2 = m2(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print(f'The pytorch time is {t2 - t1}')
    (y1.mean() * 1024).backward()
    (y2.mean() * 1024).backward()
    print("output difference:", ((y1 - y2) ** 2).mean())
    print("gradient difference:", ((m1.weight.grad - m2.weight.grad) ** 2).mean())

The big kernel time is 0.02849888801574707
The pytorch time is 0.1821727752685547

    # Same script as above, only the last dimension of the input increased from 31 to 200.
    x = torch.randn(64, 384, 256, 200).cuda()
    m1 = DepthWiseConv2dImplicitGEMM(384, 31, bias=False).cuda()
    m2 = nn.Conv2d(384, 384, 31, padding=31 // 2, bias=False, groups=384).cuda()
    m2.load_state_dict(m1.state_dict())
    with torch.cuda.amp.autocast(True):
        import time 
        t1 = time.time()
        y1 = m1(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print(f'The big kernel time is {t2 - t1}')
        t1 = time.time()
        y2 = m2(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print(f'The pytorch time is {t2 - t1}')
    (y1.mean() * 1024).backward()
    (y2.mean() * 1024).backward()
    print("output difference:", ((y1 - y2) ** 2).mean())
    print("gradient difference:", ((m1.weight.grad - m2.weight.grad) ** 2).mean())

The big kernel time is 0.951230525970459
The pytorch time is 1.1460661888122559

# Same imports as in the first snippet; the last dimension is now 256.
torch.random.manual_seed(0)
if torch.cuda.is_available():
    x = torch.randn(64, 384, 256, 256).cuda()
    m1 = DepthWiseConv2dImplicitGEMM(384, 31, bias=False).cuda()
    m2 = nn.Conv2d(384, 384, 31, padding=31 // 2, bias=False, groups=384).cuda()
    m2.load_state_dict(m1.state_dict())
    with torch.cuda.amp.autocast(True):
        import time 
        t1 = time.time()
        y1 = m1(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print(f'The big kernel time is {t2 - t1}')
        t1 = time.time()
        y2 = m2(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print(f'The pytorch time is {t2 - t1}')
    (y1.mean() * 1024).backward()
    (y2.mean() * 1024).backward()
    print("output difference:", ((y1 - y2) ** 2).mean())
    print("gradient difference:", ((m1.weight.grad - m2.weight.grad) ** 2).mean())

The big kernel time is 1.524620771408081
The pytorch time is 1.4657022953033447
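
For a steadier comparison, it may also help to warm both modules up and average over many iterations with torch.cuda.Event timing, since a single first call can include one-time setup. A minimal sketch (the benchmark helper name and iteration counts below are illustrative, not from the repo):

import torch

def benchmark(module, x, n_warmup=10, n_iters=50):
    # Warm-up: the first calls may include lazy initialization and kernel selection.
    for _ in range(n_warmup):
        module(x)
    torch.cuda.synchronize()

    # Event-based timing measures only the GPU work queued between the two records.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(n_iters):
        module(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / n_iters  # milliseconds per forward call

Called inside the same autocast context as above, e.g. print(benchmark(m1, x), benchmark(m2, x)), this averages out the cost of the first call.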

@Marsjunwang
Author

Could you please give me some guidance on how to fix this? Then I could try this great idea with a big kernel.
