This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

add dcn from mmdetection #693

Merged 33 commits on Apr 19, 2019

zimenglan-sysu-512

Add deformable convolution and deformable pooling from mmdetection. Thanks to my friend Jinqiang, who helped me add them.

…d use 'BaseStem'&'Bottleneck' to simply codes
@facebook-github-bot

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@facebook-github-bot added the "CLA Signed" label on Apr 19, 2019

@fmassa fmassa left a comment

This looks generally good, thanks!

I didn't try it myself but I'll merge this as is, let's see what happens :-)

)
return output

@staticmethod

You should ideally make it once_differentiable as well, given that there is no support for double backwards here

Hi @fmassa, should I add once_differentiable like this?

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output):

Yes, exactly, that's how it's done for RoIAlign / RoIPool

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output):
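
For readers unfamiliar with the decorator discussed above, here is a minimal sketch of a custom autograd function whose backward is marked once_differentiable. The ScaleByTwo class is a hypothetical toy op, not code from this pull request; it only stands in for a layer whose backward is a hand-written kernel with no double-backward support.

```python
import torch
from torch.autograd import Function
from torch.autograd.function import once_differentiable


class ScaleByTwo(Function):
    """Hypothetical toy op standing in for a layer with a hand-written backward."""

    @staticmethod
    def forward(ctx, x):
        return x * 2

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output):
        # once_differentiable runs this body with autograd recording disabled,
        # so the returned gradient is not meant to be differentiated again.
        return grad_output * 2


x = torch.ones(3, requires_grad=True)
ScaleByTwo.apply(x).sum().backward()
print(x.grad)
```

First-order gradients behave as usual; the decorator only documents and enforces that a second differentiation through this backward is not supported.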

@fmassa fmassa merged commit 1d6e9ad into facebookresearch:master Apr 19, 2019

Jacobew commented Apr 19, 2020

@zimenglan-sysu-512 Hi, thanks for your excellent contribution! I'm building my own model and ran into an error when using DFConv2d from maskrcnn_benchmark.layers.misc.
When the kernel_size of DFConv2d (deformable v2) is 3, everything works. But when I change the kernel_size to 5 or 7, a CUDA error occurs. Could you please take a look at it?

To reproduce this error, change the line kernel_size=3, to kernel_size=5, and execute:

python -m torch.distributed.launch --nproc_per_node=2 tools/train_net.py --config configs/dcn/e2e_mask_rcnn_mdconv_R_50_FPN_1x.yaml SOLVER.IMS_PER_BATCH 2

The error log:

RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc) (gemm at /opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/cuda/CUDABlas.cpp:174)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f4c2071a627 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x4168427 (0x7f4c265ad427 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #2: THCudaBlas_Sgemm + 0x7e (0x7f4c269b8e9e in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #3: + 0x45c479b (0x7f4c26a0979b in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #4: THCudaTensor_addmm + 0x57 (0x7f4c26a0e0a7 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #5: + 0x4239a64 (0x7f4c2667ea64 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #6: + 0x41c3ba5 (0x7f4c26608ba5 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #7: + 0x36e2931 (0x7f4c25b27931 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #8: at::Tensor::addmm_(at::Tensor const&, at::Tensor const&, c10::Scalar, c10::Scalar) const + 0x229 (0x7f4c51746849 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: modulated_deform_conv_cuda_forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int, int, int, int, int, int, int, int, int, int, bool) + 0xeab (0x7f4c032864a7 in /home/user/myenv/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
frame #10: modulated_deform_conv_forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int, int, int, int, int, int, int, int, int, int, bool) + 0x1aa (0x7f4c0322a5ca in /home/user/myenv/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
frame #11: + 0x6d7c0 (0x7f4c032407c0 in /home/user/myenv/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
frame #12: + 0x6dc4e (0x7f4c03240c4e in /home/user/myenv/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
frame #13: + 0x67a20 (0x7f4c0323aa20 in /home/user/myenv/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
frame #14: _PyMethodDef_RawFastCallKeywords + 0x264 (0x55eb34d58fc4 in /home/user/anaconda3/envs/myenv/bin/python)
frame #15: _PyCFunction_FastCallKeywords + 0x21 (0x55eb34d590e1 in /home/user/anaconda3/envs/myenv/bin/python)
frame #16: _PyEval_EvalFrameDefault + 0x4e3c (0x55eb34db51cc in /home/user/anaconda3/envs/myenv/bin/python)
frame #17: _PyEval_EvalCodeWithName + 0x2f9 (0x55eb34cf6059 in /home/user/anaconda3/envs/myenv/bin/python)
frame #18: _PyFunction_FastCallDict + 0x1d4 (0x55eb34cf7134 in /home/user/anaconda3/envs/myenv/bin/python)
frame #19: THPFunction_apply(_object*, _object*) + 0xa0f (0x7f4c51b75a3f in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #20: _PyMethodDef_RawFastCallKeywords + 0x1e0 (0x55eb34d58f40 in /home/user/anaconda3/envs/myenv/bin/python)
frame #21: _PyCFunction_FastCallKeywords + 0x21 (0x55eb34d590e1 in /home/user/anaconda3/envs/myenv/bin/python)
frame #22: _PyEval_EvalFrameDefault + 0x4784 (0x55eb34db4b14 in /home/user/anaconda3/envs/myenv/bin/python)
frame #23: _PyFunction_FastCallDict + 0x10b (0x55eb34cf706b in /home/user/anaconda3/envs/myenv/bin/python)
frame #24: _PyObject_Call_Prepend + 0x63 (0x55eb34d0da03 in /home/user/anaconda3/envs/myenv/bin/python)
frame #25: PyObject_Call + 0x62 (0x55eb34d028d2 in /home/user/anaconda3/envs/myenv/bin/python)
frame #26: _PyEval_EvalFrameDefault + 0x1e1b (0x55eb34db21ab in /home/user/anaconda3/envs/myenv/bin/python)
frame #27: _PyEval_EvalCodeWithName + 0x2f9 (0x55eb34cf6059 in /home/user/anaconda3/envs/myenv/bin/python)
frame #28: _PyFunction_FastCallDict + 0x1d4 (0x55eb34cf7134 in /home/user/anaconda3/envs/myenv/bin/python)
frame #29: _PyObject_Call_Prepend + 0x63 (0x55eb34d0da03 in /home/user/anaconda3/envs/myenv/bin/python)
frame #30: + 0x16fbaa (0x55eb34d50baa in /home/user/anaconda3/envs/myenv/bin/python)
frame #31: _PyObject_FastCallKeywords + 0x4ab (0x55eb34d5961b in /home/user/anaconda3/envs/myenv/bin/python)
frame #32: _PyEval_EvalFrameDefault + 0x5322 (0x55eb34db56b2 in /home/user/anaconda3/envs/myenv/bin/python)
frame #33: _PyFunction_FastCallDict + 0x10b (0x55eb34cf706b in /home/user/anaconda3/envs/myenv/bin/python)
frame #34: _PyObject_Call_Prepend + 0x63 (0x55eb34d0da03 in /home/user/anaconda3/envs/myenv/bin/python)
frame #35: PyObject_Call + 0x62 (0x55eb34d028d2 in /home/user/anaconda3/envs/myenv/bin/python)
frame #36: _PyEval_EvalFrameDefault + 0x1e1b (0x55eb34db21ab in /home/user/anaconda3/envs/myenv/bin/python)
frame #37: _PyEval_EvalCodeWithName + 0x2f9 (0x55eb34cf6059 in /home/user/anaconda3/envs/myenv/bin/python)
frame #38: _PyFunction_FastCallDict + 0x1d4 (0x55eb34cf7134 in /home/user/anaconda3/envs/myenv/bin/python)
frame #39: _PyObject_Call_Prepend + 0x63 (0x55eb34d0da03 in /home/user/anaconda3/envs/myenv/bin/python)
frame #40: + 0x16fbaa (0x55eb34d50baa in /home/user/anaconda3/envs/myenv/bin/python)
frame #41: _PyObject_FastCallKeywords + 0x4ab (0x55eb34d5961b in /home/user/anaconda3/envs/myenv/bin/python)
frame #42: _PyEval_EvalFrameDefault + 0x5322 (0x55eb34db56b2 in /home/user/anaconda3/envs/myenv/bin/python)
frame #43: _PyFunction_FastCallDict + 0x10b (0x55eb34cf706b in /home/user/anaconda3/envs/myenv/bin/python)
frame #44: _PyObject_Call_Prepend + 0x63 (0x55eb34d0da03 in /home/user/anaconda3/envs/myenv/bin/python)
frame #45: PyObject_Call + 0x62 (0x55eb34d028d2 in /home/user/anaconda3/envs/myenv/bin/python)
frame #46: _PyEval_EvalFrameDefault + 0x1e1b (0x55eb34db21ab in /home/user/anaconda3/envs/myenv/bin/python)
frame #47: _PyEval_EvalCodeWithName + 0x2f9 (0x55eb34cf6059 in /home/user/anaconda3/envs/myenv/bin/python)
frame #48: _PyFunction_FastCallDict + 0x1d4 (0x55eb34cf7134 in /home/user/anaconda3/envs/myenv/bin/python)
frame #49: _PyObject_Call_Prepend + 0x63 (0x55eb34d0da03 in /home/user/anaconda3/envs/myenv/bin/python)
frame #50: + 0x16fbaa (0x55eb34d50baa in /home/user/anaconda3/envs/myenv/bin/python)
frame #51: _PyObject_FastCallKeywords + 0x4ab (0x55eb34d5961b in /home/user/anaconda3/envs/myenv/bin/python)
frame #52: _PyEval_EvalFrameDefault + 0x4bc6 (0x55eb34db4f56 in /home/user/anaconda3/envs/myenv/bin/python)
frame #53: _PyFunction_FastCallDict + 0x10b (0x55eb34cf706b in /home/user/anaconda3/envs/myenv/bin/python)
frame #54: _PyObject_Call_Prepend + 0x63 (0x55eb34d0da03 in /home/user/anaconda3/envs/myenv/bin/python)
frame #55: PyObject_Call + 0x62 (0x55eb34d028d2 in /home/user/anaconda3/envs/myenv/bin/python)
frame #56: _PyEval_EvalFrameDefault + 0x1e1b (0x55eb34db21ab in /home/user/anaconda3/envs/myenv/bin/python)
frame #57: _PyEval_EvalCodeWithName + 0x2f9 (0x55eb34cf6059 in /home/user/anaconda3/envs/myenv/bin/python)
frame #58: _PyFunction_FastCallDict + 0x1d4 (0x55eb34cf7134 in /home/user/anaconda3/envs/myenv/bin/python)
frame #59: _PyObject_Call_Prepend + 0x63 (0x55eb34d0da03 in /home/user/anaconda3/envs/myenv/bin/python)
frame #60: + 0x16fbaa (0x55eb34d50baa in /home/user/anaconda3/envs/myenv/bin/python)
frame #61: _PyObject_FastCallKeywords + 0x4ab (0x55eb34d5961b in /home/user/anaconda3/envs/myenv/bin/python)
frame #62: _PyEval_EvalFrameDefault + 0x4bc6 (0x55eb34db4f56 in /home/user/anaconda3/envs/myenv/bin/python)
frame #63: _PyFunction_FastCallDict + 0x10b (0x55eb34cf706b in /home/user/anaconda3/envs/myenv/bin/python)

terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered (insert_events at /opt/conda/conda-bld/pytorch_1579040055865/work/c10/cuda/CUDACachingAllocator.cpp:764)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f4c2071a627 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x1ab04 (0x7f4c2095ab04 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: + 0x1cbd1 (0x7f4c2095cbd1 in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x4d (0x7f4c20707b9d in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: + 0x687f7a (0x7f4c51b86f7a in /home/user/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x1ad526 (0x55eb34d8e526 in /home/user/anaconda3/envs/myenv/bin/python)
frame #6: + 0x10e392 (0x55eb34cef392 in /home/user/anaconda3/envs/myenv/bin/python)
frame #7: + 0x10e8d7 (0x55eb34cef8d7 in /home/user/anaconda3/envs/myenv/bin/python)
frame #8: + 0x10e8d7 (0x55eb34cef8d7 in /home/user/anaconda3/envs/myenv/bin/python)
frame #9: + 0xfdb78 (0x55eb34cdeb78 in /home/user/anaconda3/envs/myenv/bin/python)
frame #10: + 0x10eea7 (0x55eb34cefea7 in /home/user/anaconda3/envs/myenv/bin/python)
frame #11: + 0x10eebd (0x55eb34cefebd in /home/user/anaconda3/envs/myenv/bin/python)
frame #12: + 0x10eebd (0x55eb34cefebd in /home/user/anaconda3/envs/myenv/bin/python)
frame #13: + 0x10eebd (0x55eb34cefebd in /home/user/anaconda3/envs/myenv/bin/python)
frame #14: + 0x10eebd (0x55eb34cefebd in /home/user/anaconda3/envs/myenv/bin/python)
frame #15: + 0x10eebd (0x55eb34cefebd in /home/user/anaconda3/envs/myenv/bin/python)
frame #16: PyDict_SetItem + 0x4d2 (0x55eb34d57032 in /home/user/anaconda3/envs/myenv/bin/python)
frame #17: PyDict_SetItemString + 0x4f (0x55eb34d57abf in /home/user/anaconda3/envs/myenv/bin/python)
frame #18: PyImport_Cleanup + 0x9e (0x55eb34d8f0ae in /home/user/anaconda3/envs/myenv/bin/python)
frame #19: Py_FinalizeEx + 0x72 (0x55eb34e02782 in /home/user/anaconda3/envs/myenv/bin/python)
frame #20: + 0x23976b (0x55eb34e1a76b in /home/user/anaconda3/envs/myenv/bin/python)
frame #21: _Py_UnixMain + 0x80 (0x55eb34e1ae30 in /home/user/anaconda3/envs/myenv/bin/python)
frame #22: __libc_start_main + 0xf0 (0x7f4c60104830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #23: + 0x1df052 (0x55eb34dc0052 in /home/user/anaconda3/envs/myenv/bin/python)
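
One thing that may be worth checking for the kernel-size failure reported above: in modulated deformable convolution (DCNv2), every sampling location of the kernel has a 2-D offset and one modulation scalar, so the number of channels the offset branch must predict grows with the kernel area. The helper below is a hypothetical sketch (its name and signature are not part of maskrcnn_benchmark) that computes the expected counts. If any code splits the predicted tensor using constants that only hold for a 3x3 kernel (18 offset channels and 9 mask channels), larger kernels would pass wrongly shaped tensors to the CUDA op. Whether that is the cause of this particular error is an assumption, not something confirmed in this thread.

```python
def dcn_offset_mask_channels(kernel_size, deformable_groups=1, modulated=True):
    """Return (offset_channels, mask_channels) expected by a deformable conv.

    Hypothetical helper for illustration only.
    """
    if isinstance(kernel_size, int):
        kh = kw = kernel_size
    else:
        kh, kw = kernel_size
    taps = kh * kw  # number of sampling locations in the kernel
    offset_channels = 2 * taps * deformable_groups  # (dy, dx) per location
    mask_channels = taps * deformable_groups if modulated else 0
    return offset_channels, mask_channels


for k in (3, 5, 7):
    print(k, dcn_offset_mask_channels(k))
# 3 (18, 9)
# 5 (50, 25)
# 7 (98, 49)
```

Comparing these numbers with the shapes of the offset and mask tensors actually passed to the op for kernel_size=5 is a cheap first check before digging into the CUDA code.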


Jacobew commented Apr 19, 2020

@fmassa @chengyangfu I'd really appreciate it if you could give me some advice on debugging!
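
A general debugging note, as a sketch rather than a confirmed fix: CUDA kernels are launched asynchronously, so an error such as the cuBLAS failure or the illegal memory access in the log above can be reported at a later call than the one that caused it. Setting the CUDA_LAUNCH_BLOCKING=1 environment variable forces synchronous launches, which usually makes the stack trace point at the offending op. The helper names below are hypothetical; the command reuses the arguments from the reproduction above but drops the distributed launcher so that only one process is involved, which is an assumption about what is easiest to debug.

```python
import os


def blocking_cuda_env(base_env=None):
    """Copy an environment and enable synchronous CUDA kernel launches."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def single_process_command():
    """Reproduction command from the report, without torch.distributed.launch."""
    return [
        "python", "tools/train_net.py",
        "--config", "configs/dcn/e2e_mask_rcnn_mdconv_R_50_FPN_1x.yaml",
        "SOLVER.IMS_PER_BATCH", "2",
    ]


env = blocking_cuda_env()
print("CUDA_LAUNCH_BLOCKING=" + env["CUDA_LAUNCH_BLOCKING"], " ".join(single_process_command()))
```

The printed line can be pasted into a shell, or the pair can be passed to subprocess.run(cmd, env=env) from the repository root.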


Jacobew commented Apr 19, 2020

My environment:

2020-04-19 14:20:09,676 maskrcnn_benchmark INFO:
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.0

OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.12.3

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti
GPU 4: GeForce GTX 1080 Ti
GPU 5: GeForce GTX 1080 Ti
GPU 6: GeForce GTX 1080 Ti
GPU 7: GeForce GTX 1080 Ti

Nvidia driver version: 430.40
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.3.0
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.5.1.10
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.6.0.21

Versions of relevant libraries:
[pip] numpy==1.18.2
[pip] torch==1.4.0
[pip] torchvision==0.2.1
[conda] blas 1.0 mkl https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl 2020.0 166 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl-service 2.3.0 py37he904b0f_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl_fft 1.0.15 py37ha843d7b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl_random 1.1.0 py37hd6b4f25_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] pytorch 1.4.0 py3.7_cuda10.0.130_cudnn7.6.3_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
[conda] torchvision 0.2.1 py_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
Pillow (6.1.0)

Lyears pushed a commit to Lyears/maskrcnn-benchmark that referenced this pull request Jun 28, 2020
* make pixel indexes 0-based for bounding box in pascal voc dataset

* replacing all instances of torch.distributed.deprecated with torch.distributed

* replacing all instances of torch.distributed.deprecated with torch.distributed

* add GroupNorm

* add GroupNorm -- sort out yaml files

* use torch.nn.GroupNorm instead, replace 'use_gn' with 'conv_block' and use 'BaseStem'&'Bottleneck' to simply codes

* modification on 'group_norm' and 'conv_with_kaiming_uniform' function

* modification on yaml files in configs/gn_baselines/ and reduce the amount of indentation and code duplication

* use 'kaiming_uniform' to initialize resnet, disable gn after fc layer, and add dilation into ResNetHead

* agnostic-regression for bbox

* please set 'STRIDE_IN_1X1' to be 'False' when backbone use GN

* add README.md for GN

* add dcn from mmdetection