add dcn from mmdetection #693

zimenglan-sysu-512 · 2019-04-19T14:23:26Z

Add deformable convolution and deformable pooling from mmdetection. Thanks to my friend Jinqiang, who helped me add them.

facebook-github-bot · 2019-04-19T14:44:13Z

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

facebook-github-bot · 2019-04-19T14:55:16Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

fmassa

This looks generally good, thanks!

I didn't try it myself but I'll merge this as is, let's see what happens :-)

fmassa · 2019-04-19T15:20:23Z

maskrcnn_benchmark/layers/dcn/deform_conv_func.py

+            )
+        return output
+
+    @staticmethod


You should ideally make it once_differentiable as well, given that there is no support for double backwards here

Hi @fmassa,
should I add once_differentiable like this?

@staticmethod
@once_differentiable
def backward(ctx, grad_output):

Yes, exactly, that's how it's done for RoIAlign / RoIPool

maskrcnn-benchmark/maskrcnn_benchmark/layers/roi_align.py

Lines 25 to 27 in b318c3e

@staticmethod
@once_differentiable
def backward(ctx, grad_output):
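
For readers unfamiliar with the decorator, here is a minimal sketch of the pattern on a toy op. It assumes PyTorch is installed, and the Square class is a hypothetical illustration, not code from this PR. once_differentiable (imported from torch.autograd.function) runs backward without recording a graph, which suits hand-written backward passes that do not support double backward.

```python
import torch
from torch.autograd import Function
from torch.autograd.function import once_differentiable


class Square(Function):
    """Toy op y = x * x with a hand-written backward (illustration only)."""

    @staticmethod
    def forward(ctx, x):
        # Keep the input around; backward needs it to compute dy/dx.
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    @once_differentiable  # backward itself is not meant to be differentiated again
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return 2 * x * grad_output


x = torch.tensor([3.0], requires_grad=True)
Square.apply(x).sum().backward()
print(x.grad)  # first-order gradient of x * x evaluated at x = 3
```

The decorator order matters: @staticmethod stays outermost, as in the RoIAlign snippet quoted above.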

Jacobew · 2020-04-19T06:34:30Z

@zimenglan-sysu-512 Hi, thanks for your excellent contribution! I'm building my own model but I've encountered an error when using DFConv2d from maskrcnn_benchmark.layers.misc.
When the kernel_size of DFConv2d (deformablev2) is 3, everything goes well. But when changing the kernel_size to 5 or 7, a CUDA error happens. Could you please take a look at it?
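
As a sanity check when changing kernel_size, it may help to compute what the offset branch is expected to produce. In deformable convolution each of the k x k sampling locations needs a (dy, dx) offset, and the modulated (v2) variant adds one mask value per location. The sketch below is a hypothetical helper, not part of maskrcnn_benchmark, and a mismatch in these numbers is only one thing to rule out, not a confirmed cause of the error described here.

```python
def dcn_offset_layout(kernel_size, dilation=1, deformable_groups=1, modulated=True):
    """Return (offset_branch_channels, padding) for a square k x k deformable conv.

    Hypothetical helper for sanity checks; the name and signature are assumptions.
    """
    taps = kernel_size * kernel_size      # sampling locations per output pixel
    per_tap = 3 if modulated else 2       # dy, dx, plus one mask value for v2
    channels = deformable_groups * per_tap * taps
    # Padding that keeps the spatial size for odd kernels at stride 1.
    padding = dilation * (kernel_size - 1) // 2
    return channels, padding


for k in (3, 5, 7):
    channels, padding = dcn_offset_layout(k)
    print(f"kernel_size={k}: offset branch channels={channels}, padding={padding}")
```

Comparing these values with the shapes of the offset and mask tensors actually passed to the CUDA op is a cheap first check before digging into the kernel itself.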

To reproduce this error, change the line kernel_size=3, to kernel_size=5 and execute:

python -m torch.distributed.launch --nproc_per_node=2 tools/train_net.py --config configs/dcn/e2e_mask_rcnn_mdconv_R_50_FPN_1x.yaml SOLVER.IMS_PER_BATCH 2

The error log:

RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc) (gemm at /opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/cuda/CUDABlas.cpp:174)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f4c2071a627 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x4168427 (0x7f4c265ad427 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #2: THCudaBlas_Sgemm + 0x7e (0x7f4c269b8e9e in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #3: + 0x45c479b (0x7f4c26a0979b in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #4: THCudaTensor_addmm + 0x57 (0x7f4c26a0e0a7 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #5: + 0x4239a64 (0x7f4c2667ea64 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #6: + 0x41c3ba5 (0x7f4c26608ba5 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #7: + 0x36e2931 (0x7f4c25b27931 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #8: at::Tensor::addmm_(at::Tensor const&, at::Tensor const&, c10::Scalar, c10::Scalar) const + 0x229 (0x7f4c51746849 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: modulated_deform_conv_cuda_forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int, int, int, int, int, int, int, int, int, int, bool) + 0xeab (0x7f4c032864a7 in /home/user/myenv/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
frame #10: modulated_deform_conv_forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int, int, int, int, int, int, int, int, int, int, bool) + 0x1aa (0x7f4c0322a5ca in /home/user/myenv/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
frame #11: + 0x6d7c0 (0x7f4c032407c0 in /home/user/myenv/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
frame #12: + 0x6dc4e (0x7f4c03240c4e in /home/user/myenv/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
frame #13: + 0x67a20 (0x7f4c0323aa20 in /home/user/myenv/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
frame #14: _PyMethodDef_RawFastCallKeywords + 0x264 (0x55eb34d58fc4 in /home/user/anaconda3/envs/myenv/bin/python)
frame #15: _PyCFunction_FastCallKeywords + 0x21 (0x55eb34d590e1 in /home/user/anaconda3/envs/myenv/bin/python)
frame #16: _PyEval_EvalFrameDefault + 0x4e3c (0x55eb34db51cc in /home/user/anaconda3/envs/myenv/bin/python)
frame #17: _PyEval_EvalCodeWithName + 0x2f9 (0x55eb34cf6059 in /home/user/anaconda3/envs/myenv/bin/python)
frame #18: _PyFunction_FastCallDict + 0x1d4 (0x55eb34cf7134 in /home/user/anaconda3/envs/myenv/bin/python)
frame #19: THPFunction_apply(_object*, _object*) + 0xa0f (0x7f4c51b75a3f in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #20: _PyMethodDef_RawFastCallKeywords + 0x1e0 (0x55eb34d58f40 in /home/user/anaconda3/envs/myenv/bin/python)
frame #21: _PyCFunction_FastCallKeywords + 0x21 (0x55eb34d590e1 in /home/user/anaconda3/envs/myenv/bin/python)
frame #22: _PyEval_EvalFrameDefault + 0x4784 (0x55eb34db4b14 in /home/user/anaconda3/envs/myenv/bin/python)
frame #23: _PyFunction_FastCallDict + 0x10b (0x55eb34cf706b in /home/user/anaconda3/envs/myenv/bin/python)
frame #24: _PyObject_Call_Prepend + 0x63 (0x55eb34d0da03 in /home/user/anaconda3/envs/myenv/bin/python)
frame #25: PyObject_Call + 0x62 (0x55eb34d028d2 in /home/user/anaconda3/envs/myenv/bin/python)
frame #26: _PyEval_EvalFrameDefault + 0x1e1b (0x55eb34db21ab in /home/user/anaconda3/envs/myenv/bin/python)
frame #27: _PyEval_EvalCodeWithName + 0x2f9 (0x55eb34cf6059 in /home/user/anaconda3/envs/myenv/bin/python)
frame #28: _PyFunction_FastCallDict + 0x1d4 (0x55eb34cf7134 in /home/user/anaconda3/envs/myenv/bin/python)
frame #29: _PyObject_Call_Prepend + 0x63 (0x55eb34d0da03 in /home/user/anaconda3/envs/myenv/bin/python)
frame #30: + 0x16fbaa (0x55eb34d50baa in /home/user/anaconda3/envs/myenv/bin/python)
frame #31: _PyObject_FastCallKeywords + 0x4ab (0x55eb34d5961b in /home/user/anaconda3/envs/myenv/bin/python)
frame #32: _PyEval_EvalFrameDefault + 0x5322 (0x55eb34db56b2 in /home/user/anaconda3/envs/myenv/bin/python)
frame #33: _PyFunction_FastCallDict + 0x10b (0x55eb34cf706b in /home/user/anaconda3/envs/myenv/bin/python)
frame #34: _PyObject_Call_Prepend + 0x63 (0x55eb34d0da03 in /home/user/anaconda3/envs/myenv/bin/python)
frame #35: PyObject_Call + 0x62 (0x55eb34d028d2 in /home/user/anaconda3/envs/myenv/bin/python)
frame #36: _PyEval_EvalFrameDefault + 0x1e1b (0x55eb34db21ab in /home/user/anaconda3/envs/myenv/bin/python)
frame #37: _PyEval_EvalCodeWithName + 0x2f9 (0x55eb34cf6059 in /home/user/anaconda3/envs/myenv/bin/python)
frame #38: _PyFunction_FastCallDict + 0x1d4 (0x55eb34cf7134 in /home/user/anaconda3/envs/myenv/bin/python)
frame #39: _PyObject_Call_Prepend + 0x63 (0x55eb34d0da03 in /home/user/anaconda3/envs/myenv/bin/python)
frame #40: + 0x16fbaa (0x55eb34d50baa in /home/user/anaconda3/envs/myenv/bin/python)
frame #41: _PyObject_FastCallKeywords + 0x4ab (0x55eb34d5961b in /home/user/anaconda3/envs/myenv/bin/python)
frame #42: _PyEval_EvalFrameDefault + 0x5322 (0x55eb34db56b2 in /home/user/anaconda3/envs/myenv/bin/python)
frame #43: _PyFunction_FastCallDict + 0x10b (0x55eb34cf706b in /home/user/anaconda3/envs/myenv/bin/python)
frame #44: _PyObject_Call_Prepend + 0x63 (0x55eb34d0da03 in /home/user/anaconda3/envs/myenv/bin/python)
frame #45: PyObject_Call + 0x62 (0x55eb34d028d2 in /home/user/anaconda3/envs/myenv/bin/python)
frame #46: _PyEval_EvalFrameDefault + 0x1e1b (0x55eb34db21ab in /home/user/anaconda3/envs/myenv/bin/python)
frame #47: _PyEval_EvalCodeWithName + 0x2f9 (0x55eb34cf6059 in /home/user/anaconda3/envs/myenv/bin/python)
frame #48: _PyFunction_FastCallDict + 0x1d4 (0x55eb34cf7134 in /home/user/anaconda3/envs/myenv/bin/python)
frame #49: _PyObject_Call_Prepend + 0x63 (0x55eb34d0da03 in /home/user/anaconda3/envs/myenv/bin/python)
frame #50: + 0x16fbaa (0x55eb34d50baa in /home/user/anaconda3/envs/myenv/bin/python)
frame #51: _PyObject_FastCallKeywords + 0x4ab (0x55eb34d5961b in /home/user/anaconda3/envs/myenv/bin/python)
frame #52: _PyEval_EvalFrameDefault + 0x4bc6 (0x55eb34db4f56 in /home/user/anaconda3/envs/myenv/bin/python)
frame #53: _PyFunction_FastCallDict + 0x10b (0x55eb34cf706b in /home/user/anaconda3/envs/myenv/bin/python)
frame #54: _PyObject_Call_Prepend + 0x63 (0x55eb34d0da03 in /home/user/anaconda3/envs/myenv/bin/python)
frame #55: PyObject_Call + 0x62 (0x55eb34d028d2 in /home/user/anaconda3/envs/myenv/bin/python)
frame #56: _PyEval_EvalFrameDefault + 0x1e1b (0x55eb34db21ab in /home/user/anaconda3/envs/myenv/bin/python)
frame #57: _PyEval_EvalCodeWithName + 0x2f9 (0x55eb34cf6059 in /home/user/anaconda3/envs/myenv/bin/python)
frame #58: _PyFunction_FastCallDict + 0x1d4 (0x55eb34cf7134 in /home/user/anaconda3/envs/myenv/bin/python)
frame #59: _PyObject_Call_Prepend + 0x63 (0x55eb34d0da03 in /home/user/anaconda3/envs/myenv/bin/python)
frame #60: + 0x16fbaa (0x55eb34d50baa in /home/user/anaconda3/envs/myenv/bin/python)
frame #61: _PyObject_FastCallKeywords + 0x4ab (0x55eb34d5961b in /home/user/anaconda3/envs/myenv/bin/python)
frame #62: _PyEval_EvalFrameDefault + 0x4bc6 (0x55eb34db4f56 in /home/user/anaconda3/envs/myenv/bin/python)
frame #63: _PyFunction_FastCallDict + 0x10b (0x55eb34cf706b in /home/user/anaconda3/envs/myenv/bin/python)

terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered (insert_events at /opt/conda/conda-bld/pytorch_1579040055865/work/c10/cuda/CUDACachingAllocator.cpp:764)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f4c2071a627 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x1ab04 (0x7f4c2095ab04 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: + 0x1cbd1 (0x7f4c2095cbd1 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x4d (0x7f4c20707b9d in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: + 0x687f7a (0x7f4c51b86f7a in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x1ad526 (0x55eb34d8e526 in /home/user/anaconda3/envs/myenv/bin/python)
frame #6: + 0x10e392 (0x55eb34cef392 in /home/user/anaconda3/envs/myenv/bin/python)
frame #7: + 0x10e8d7 (0x55eb34cef8d7 in /home/user/anaconda3/envs/myenv/bin/python)
frame #8: + 0x10e8d7 (0x55eb34cef8d7 in /home/user/anaconda3/envs/myenv/bin/python)
frame #9: + 0xfdb78 (0x55eb34cdeb78 in /home/user/anaconda3/envs/myenv/bin/python)
frame #10: + 0x10eea7 (0x55eb34cefea7 in /home/user/anaconda3/envs/myenv/bin/python)
frame #11: + 0x10eebd (0x55eb34cefebd in /home/user/anaconda3/envs/myenv/bin/python)
frame #12: + 0x10eebd (0x55eb34cefebd in /home/user/anaconda3/envs/myenv/bin/python)
frame #13: + 0x10eebd (0x55eb34cefebd in /home/user/anaconda3/envs/myenv/bin/python)
frame #14: + 0x10eebd (0x55eb34cefebd in /home/user/anaconda3/envs/myenv/bin/python)
frame #15: + 0x10eebd (0x55eb34cefebd in /home/user/anaconda3/envs/myenv/bin/python)
frame #16: PyDict_SetItem + 0x4d2 (0x55eb34d57032 in /home/user/anaconda3/envs/myenv/bin/python)
frame #17: PyDict_SetItemString + 0x4f (0x55eb34d57abf in /home/user/anaconda3/envs/myenv/bin/python)
frame #18: PyImport_Cleanup + 0x9e (0x55eb34d8f0ae in /home/user/anaconda3/envs/myenv/bin/python)
frame #19: Py_FinalizeEx + 0x72 (0x55eb34e02782 in /home/user/anaconda3/envs/myenv/bin/python)
frame #20: + 0x23976b (0x55eb34e1a76b in /home/user/anaconda3/envs/myenv/bin/python)
frame #21: _Py_UnixMain + 0x80 (0x55eb34e1ae30 in /home/user/anaconda3/envs/myenv/bin/python)
frame #22: __libc_start_main + 0xf0 (0x7f4c60104830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #23: + 0x1df052 (0x55eb34dc0052 in /home/user/anaconda3/envs/myenv/bin/python)

Jacobew · 2020-04-19T06:40:05Z

@fmassa @chengyangfu I'd really appreciate it if you could give me some advice on debugging!
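
A common first step with CUDA errors like the ones in the log above is to make kernel launches synchronous, since errors are otherwise reported asynchronously and the traceback may point at a later, unrelated call. CUDA_LAUNCH_BLOCKING is a real CUDA environment variable; the helper name below is an assumption, and the command simply repeats the one from the report. This is a sketch of how one might relaunch the run, not a fix.

```python
import os


def blocking_cuda_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled.

    Hypothetical helper; the input mapping is left unmodified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Same command as in the report above, split into an argument list.
cmd = [
    "python", "-m", "torch.distributed.launch", "--nproc_per_node=2",
    "tools/train_net.py",
    "--config", "configs/dcn/e2e_mask_rcnn_mdconv_R_50_FPN_1x.yaml",
    "SOLVER.IMS_PER_BATCH", "2",
]

# To actually launch it:
#   import subprocess
#   subprocess.run(cmd, env=blocking_cuda_env(), check=True)
print("CUDA_LAUNCH_BLOCKING=1", " ".join(cmd))
```

With synchronous launches the run is slower, but the first failing frame is more likely to be the call that actually triggered the illegal memory access.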

Jacobew · 2020-04-19T06:43:21Z

My environment:

2020-04-19 14:20:09,676 maskrcnn_benchmark INFO:
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.0

OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.12.3

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti
GPU 4: GeForce GTX 1080 Ti
GPU 5: GeForce GTX 1080 Ti
GPU 6: GeForce GTX 1080 Ti
GPU 7: GeForce GTX 1080 Ti

Nvidia driver version: 430.40
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.3.0
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.5.1.10
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.6.0.21

Versions of relevant libraries:
[pip] numpy==1.18.2
[pip] torch==1.4.0
[pip] torchvision==0.2.1
[conda] blas 1.0 mkl https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl 2020.0 166 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl-service 2.3.0 py37he904b0f_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl_fft 1.0.15 py37ha843d7b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl_random 1.1.0 py37hd6b4f25_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] pytorch 1.4.0 py3.7_cuda10.0.130_cudnn7.6.3_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
[conda] torchvision 0.2.1 py_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
Pillow (6.1.0)

* make pixel indexes 0-based for bounding box in pascal voc dataset
* replacing all instances of torch.distributed.deprecated with torch.distributed
* replacing all instances of torch.distributed.deprecated with torch.distributed
* add GroupNorm
* add GroupNorm -- sort out yaml files
* use torch.nn.GroupNorm instead, replace 'use_gn' with 'conv_block' and use 'BaseStem'&'Bottleneck' to simply codes
* modification on 'group_norm' and 'conv_with_kaiming_uniform' function
* modification on yaml files in configs/gn_baselines/ and reduce the amount of indentation and code duplication
* use 'kaiming_uniform' to initialize resnet, disable gn after fc layer, and add dilation into ResNetHead
* agnostic-regression for bbox
* please set 'STRIDE_IN_1X1' to be 'False' when backbone use GN
* add README.md for GN
* add dcn from mmdetection

zimenglan-sysu-512 added 30 commits November 25, 2018 22:12

make pixel indexes 0-based for bounding box in pascal voc dataset

f93b369

Merge remote-tracking branch 'upstream/master'

45e4ba8

Merge remote-tracking branch 'upstream/master'

86caae2

replacing all instances of torch.distributed.deprecated with torch.distributed

a46bfb9

replacing all instances of torch.distributed.deprecated with torch.distributed

7bbf46f

Merge remote-tracking branch 'upstream/master'

7c8cf41

Merge remote-tracking branch 'upstream/master'

07a0f9c

Merge remote-tracking branch 'upstream/master'

6f09a6f

Merge remote-tracking branch 'upstream/master'

c4e3245

add GroupNorm

baba31f

add GroupNorm -- sort out yaml files

4877e36

use torch.nn.GroupNorm instead, replace 'use_gn' with 'conv_block' and use 'BaseStem'&'Bottleneck' to simply codes

d4ae039

modification on 'group_norm' and 'conv_with_kaiming_uniform' function

333864d

modification on yaml files in configs/gn_baselines/ and reduce the amount of indentation and code duplication

58da4d5

Merge remote-tracking branch 'upstream/master'

1798e63

Merge remote-tracking branch 'upstream/master'

ecc68c4

Merge remote-tracking branch 'upstream/master'

02a86f3

use 'kaiming_uniform' to initialize resnet, disable gn after fc layer, and add dilation into ResNetHead

9808a21

Merge remote-tracking branch 'upstream/master'

d1ce06e

agnostic-regression for bbox

4de3488

Merge remote-tracking branch 'upstream/master'

52be8d7

Merge remote-tracking branch 'upstream/master'

288e16f

Merge remote-tracking branch 'upstream/master'

d91ff2d

Merge remote-tracking branch 'upstream/master'

d177092

Merge remote-tracking branch 'upstream/master'

a52e159

Merge remote-tracking branch 'upstream/master'

f7ad55e

please set 'STRIDE_IN_1X1' to be 'False' when backbone use GN

483fca8

merge from upstream/master

f5786e9

add README.md for GN

d2d55f9

Merge remote-tracking branch 'upstream/master'

fc08cdd

zimenglan-sysu-512 added 3 commits April 1, 2019 14:51

Merge remote-tracking branch 'upstream/master'

db01edb

Merge remote-tracking branch 'upstream/master'

ca7276d

add dcn from mmdetection

eee83fc

This was referenced Apr 19, 2019

Will this benchmark support the deformable convnets？ #681

Closed

support deformable convolution? #424

Closed

facebook-github-bot added the CLA Signed label Apr 19, 2019

fmassa approved these changes Apr 19, 2019

fmassa merged commit 1d6e9ad into facebookresearch:master Apr 19, 2019


fmassa left a comment

fmassa Apr 19, 2019

zimenglan-sysu-512 Apr 20, 2019

fmassa Apr 20, 2019
