
[CONV] Asymmetric padding #4511

Merged
merged 7 commits into apache:master on Jan 6, 2020

Conversation

optima2005
Contributor

This PR implements the 1st item in issue #2682.

I tried to find as many as possible of the places where padding is handled as symmetric, but some may be missing. Please comment if you find more.

The following modules/files are kept using symmetric padding, either due to interface limitations or because I am not sure whether changing to asymmetric padding would cause any trouble:

topi/python/topi/x86/conv2d_alter_op.py
topi/python/topi/rocm/conv2d.py
topi/include/topi/nn.h
convolution with 'cudnn' lib.
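
For reviewers who want a concrete picture, here is a minimal NumPy sketch (illustrative only, not the TVM API) of what asymmetric padding means for conv2d: padding becomes a 4-tuple (pad_top, pad_left, pad_bottom, pad_right) instead of a symmetric (pad_h, pad_w) pair, and the output shape uses the sum of the two sides per dimension.

```python
import numpy as np

def conv2d_single_channel(data, kernel, padding, stride=1):
    """Naive single-channel conv2d with explicit 4-way padding.

    padding = (pad_top, pad_left, pad_bottom, pad_right); symmetric
    padding (pad_h, pad_w) is just the special case where
    pad_top == pad_bottom and pad_left == pad_right.
    """
    pad_top, pad_left, pad_bottom, pad_right = padding
    padded = np.pad(data, ((pad_top, pad_bottom), (pad_left, pad_right)))
    kh, kw = kernel.shape
    out_h = (data.shape[0] + pad_top + pad_bottom - kh) // stride + 1
    out_w = (data.shape[1] + pad_left + pad_right - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = padded[i * stride:i * stride + kh,
                            j * stride:j * stride + kw]
            out[i, j] = np.sum(window * kernel)
    return out

# "SAME" padding with an even kernel needs different amounts on the two
# sides (0 on top/left, 1 on bottom/right), which a single (pad_h, pad_w)
# pair cannot express.
print(conv2d_single_channel(np.ones((4, 4)), np.ones((2, 2)),
                            (0, 0, 1, 1)).shape)  # (4, 4)
```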

@optima2005
Contributor Author

@FrozenGene @Rasterer

@optima2005 optima2005 changed the title [CONV] Asymmetic padding [CONV] Asymmetric padding Dec 12, 2019
@FrozenGene
Copy link
Member

For topi/python/topi/x86/conv2d_alter_op.py, do you mean the legalize pass? We should modify it, because get_int_tuple["padding"] may return 4D padding. Then we should use pad_top, pad_bottom to represent (pad_h, pad_h), and pad_left, pad_right to represent (pad_w, pad_w), i.e. padding[0] and padding[1] there.

For topi/python/topi/rocm/conv2d.py, I think we could compute pad_h / pad_w from pad_top + pad_bottom / pad_left + pad_right, then pass them to miopen.conv2d_forward. This is the same as for cudnn: we don't change the cudnn interface, only the values passed to it.

For topi/include/topi/nn.h, conv2d_* / depthwise_conv2d_* shouldn't be used anywhere else, because we use the Python topi version of conv2d / depthwise_conv2d, so we can keep them as they are.
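
To make the padding handling described above concrete, here is a rough sketch of the kind of normalization involved (the helper name and the (top, left, bottom, right) ordering are illustrative assumptions, not the exact code in this PR):

```python
def normalize_padding(padding):
    """Expand a conv2d padding attribute to (pad_top, pad_left, pad_bottom, pad_right).

    Illustrative only: a 2-element padding (pad_h, pad_w) means
    pad_top == pad_bottom == pad_h and pad_left == pad_right == pad_w,
    i.e. padding[0] and padding[1] in the discussion above.
    """
    if isinstance(padding, int):
        padding = (padding, padding)
    if len(padding) == 2:
        pad_h, pad_w = padding
        return pad_h, pad_w, pad_h, pad_w
    if len(padding) == 4:
        return tuple(padding)  # already (top, left, bottom, right)
    raise ValueError("padding must have 1, 2 or 4 elements")

print(normalize_padding((1, 2)))        # (1, 2, 1, 2)
print(normalize_padding((0, 1, 2, 3)))  # (0, 1, 2, 3)
```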

@FrozenGene
Member

@Huyuwei It would be nice if you could help review too.

@optima2005
Contributor Author

@FrozenGene I have addressed your comments; please continue the review. Thanks!

@optima2005 optima2005 changed the title [CONV] Asymmetric padding [WIP] [CONV] Asymmetric padding Dec 15, 2019
@optima2005 optima2005 changed the title [WIP] [CONV] Asymmetric padding [CONV] Asymmetric padding Dec 16, 2019
@FrozenGene
Member

Thanks. Could you help to resolve the conflict?

cc @optima2005 @zhiics @yzhliu @Huyuwei could anyone help to review and handle it?

@optima2005
Contributor Author

Rebased to master to fix the conflict.

@yzhliu
Member

yzhliu commented Jan 1, 2020

@optima2005 Could you rebase again? Sorry for the delayed review.

@optima2005
Contributor Author

@yzhliu I have rebased to master. Thanks!

@optima2005
Contributor Author

I cannot reproduce the CI failure on my local box. How can I re-trigger the check?

@yzhliu
Member

yzhliu commented Jan 2, 2020

@optima2005 Do you have a GPU instance? If it's an environment difference, Docker is probably the easiest way to reproduce.

@optima2005
Contributor Author

@yzhliu, I have a GPU instance and am using the Docker environment. I am wondering whether the failure is occasional and might be due to the CI environment.
Anyway, I rebased again to re-trigger the CI.

@yzhliu
Member

yzhliu commented Jan 6, 2020

Thanks @optima2005 @FrozenGene

@yzhliu yzhliu merged commit 34b98eb into apache:master Jan 6, 2020
@tqchen
Member

tqchen commented Jan 6, 2020

NOTE: there have been two CI failures on master since this PR got merged.

Both are on the CI instance aws.g4.n0.cuda0. @yzhliu @zhiics, it would be great if you could confirm whether the error can be reproduced.

It could be due either to a real problem, which we need to address, or to some caching issue (as I see this PR changes the memoize format); that can be confirmed by running Docker from a clean folder.

@zhiics
Member

zhiics commented Jan 6, 2020

@tqchen Sure. Let me log into the machine and run the unit tests from master to see whether they fail.

@zhiics
Member

zhiics commented Jan 6, 2020

@tqchen I just tested the latest master branch locally and also tried to execute some of the failed tests in the docker image. Both worked well. Let me clean the folder and enable the CI again to see if that solves the problem.

@tqchen
Member

tqchen commented Jan 7, 2020

It seems we still get some flaky errors (see https://ci.tvm.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-4635/runs/2/nodes/244/log/?start=0 ). @zhiics, it would be great if you could try to confirm again.

@zhiics
Member

zhiics commented Jan 7, 2020

@tqchen Sure. I will double check tonight once I get home.

@tqchen
Member

tqchen commented Jan 7, 2020

Note, we might need to run the exact script, as the error could be due to interference between tests.

@zhiics
Member

zhiics commented Jan 7, 2020

Yeah, I can confirm that it fails with the following error:

topi/tests/python/test_topi_depthwise_conv2d_back_weight.py::test_topi_depthwise_conv2d_backward_weight_nhwc Fatal Python error: Segmentation fault

when I ran docker/bash.sh tvmai/ci-gpu:v0.56 ./tests/scripts/task_python_topi.sh on the G4 instance.

I get many errors of this type when I just run pytest -s -v topi/tests/python:

topi/tests/python/test_topi_transform.py:168: in verify_concatenate
    check_device(device)
topi/tests/python/test_topi_transform.py:159: in check_device
    foo = tvm.build(s, tensor_l + [out_tensor], device, name="concatenate")
python/tvm/build_module.py:638: in build
    fhost, mdev = _build_for_device(flist, tar, target_host)
python/tvm/build_module.py:504: in _build_for_device
    mdev = codegen.build_module(fdevice, str(target)) if fdevice else None
python/tvm/codegen.py:36: in build_module
    return _Build(lowered_func, target)
tvm/_ffi/_cython/./function.pxi:304: in tvm._ffi._cy3.core.FunctionBase.__call__
    ???
tvm/_ffi/_cython/./function.pxi:239: in tvm._ffi._cy3.core.FuncCall
    ???
tvm/_ffi/_cython/./function.pxi:228: in tvm._ffi._cy3.core.FuncCall3
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   tvm._ffi.base.TVMError: Traceback (most recent call last):
E     [bt] (5) /home/ubuntu/workspace/tvm/build/libtvm.so(TVMFuncCall+0x61) [0x7f22b56b8c01]
E     [bt] (4) /home/ubuntu/workspace/tvm/build/libtvm.so(+0xb592f5) [0x7f22b4ef42f5]
E     [bt] (3) /home/ubuntu/workspace/tvm/build/libtvm.so(tvm::codegen::Build(tvm::Array<tvm::LoweredFunc, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xb85) [0x7f22b501a835]
E     [bt] (2) /home/ubuntu/workspace/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::Array<tvm::LoweredFunc, void>)>::AssignTypedLambda<tvm::runtime::Module (*)(tvm::Array<tvm::LoweredFunc, void>)>(tvm::runtime::Module (*)(tvm::Array<tvm::LoweredFunc, void>))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x4e) [0x7f22b5045f4e]
E     [bt] (1) /home/ubuntu/workspace/tvm/build/libtvm.so(tvm::codegen::BuildCUDA(tvm::Array<tvm::LoweredFunc, void>)+0x3c2) [0x7f22b564ec52]
E     [bt] (0) /home/ubuntu/workspace/tvm/build/libtvm.so(+0x131912b) [0x7f22b56b412b]
E     File "tvm/_ffi/_cython/./function.pxi", line 55, in tvm._ffi._cy3.core.tvm_callback
E     File "/home/ubuntu/workspace/tvm/python/tvm/autotvm/measure/measure_methods.py", line 591, in tvm_callback_cuda_compile
E       ptx = nvcc.compile_cuda(code, target=target, arch=AutotvmGlobalScope.current.cuda_target_arch)
E     File "/home/ubuntu/workspace/tvm/python/tvm/contrib/nvcc.py", line 94, in compile_cuda
E       cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
E     File "/home/ubuntu/anaconda3/lib/python3.6/subprocess.py", line 709, in __init__
E       restore_signals, start_new_session)
E     File "/home/ubuntu/anaconda3/lib/python3.6/subprocess.py", line 1275, in _execute_child
E       restore_signals, start_new_session, preexec_fn)
E   OSError: [Errno 12] Cannot allocate memory

@zhiics
Member

zhiics commented Jan 7, 2020

I'm not sure if it is because we don't have enough swap memory on the instance. I can create a swap space and see if that helps.

@tqchen
Member

tqchen commented Jan 7, 2020

@zhiics The nvcc error might be irrelevant; we should focus on the error within the docker container.

Given that the error is pretty recent (it could be caused by this merge or by one of a few merges before it), can you first run a bisect over the git history to see if a particular commit caused the error?

Also, if the segfault in topi/tests/python/test_topi_depthwise_conv2d_back_weight.py is reproducible, it would be great to get a stack trace (you can install gdb in the docker container and run Python under gdb).

@yzhliu
Member

yzhliu commented Jan 7, 2020

Also, we need to update the tophub log files; the workload does not match after the change, as the padding dimension now becomes 4D.
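
To illustrate why the logs need updating (the workload tuples below are hypothetical, not the real tophub format): an entry recorded with 2D padding no longer matches a workload generated with 4D padding, so the lookup misses and falls back to an untuned default.

```python
# Pre-tuned entry keyed with the old 2D padding.
old_workload = ('conv2d', (1, 3, 224, 224), (64, 3, 7, 7), (2, 2), (3, 3), 'float32')
# The same conv2d now produces a key with 4D padding.
new_workload = ('conv2d', (1, 3, 224, 224), (64, 3, 7, 7), (2, 2), (3, 3, 3, 3), 'float32')

tuned_configs = {old_workload: 'best_schedule_from_tophub'}
print(tuned_configs.get(new_workload))  # None -> fall back to an untuned default
```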

@tqchen
Member

tqchen commented Jan 7, 2020

Opened a new thread, #4646, to track this.

alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 26, 2020
* [CONV] Asymmetic padding

* fix lint error

* update for legalize, rocm and cudnn

* add more test cases

* change more symmetric padding

* change conv2d winograd tests according orginal cases

* remove 'alter_op_layout.h' header in bitserial.cc
alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 28, 2020
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2020