When I use apex on 2080ti, I get the following error, how can I solve it? #42

yeyuanzheng177 · 2019-12-09T09:25:41Z

RuntimeError: expected scalar type Float but found Half (data at /usr/local/lib/python3.5/dist-packages/torch/include/ATen/core/TensorMethods.h:1386)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f9f2dbdd441 in /usr/local/lib/python3.5/dist-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f9f2dbdcd7a in /usr/local/lib/python3.5/dist-packages/torch/lib/libc10.so)
frame #2: float* at::Tensor::data() const + 0xcf (0x7f9f1c69fa2f in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)
frame #3: dcn_v2_cuda_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int, int, int, int, int, int, int, int, int) + 0xbc0 (0x7f9f1c6a4b50 in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)
frame #4: dcn_v2_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int, int, int, int, int, int, int, int, int) + 0x8b (0x7f9f1c689c0b in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)
frame #5: + 0x1f91c (0x7f9f1c69591c in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)
frame #6: + 0x1f99e (0x7f9f1c69599e in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)
frame #7: + 0x1cdc0 (0x7f9f1c692dc0 in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)

frame #11: python3() [0x4e3423]
frame #14: THPFunction_apply(_object*, _object*) + 0x6b1 (0x7f9f2e3bf491 in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch_python.so)
frame #18: python3() [0x4e3537]
frame #22: python3() [0x4e3423]
frame #24: python3() [0x4f08be]
frame #26: python3() [0x55fbf6]
frame #30: python3() [0x4e3537]
frame #34: python3() [0x4e3423]
frame #36: python3() [0x4f08be]
frame #38: python3() [0x55fbf6]
frame #42: python3() [0x4e3537]
frame #46: python3() [0x4e3423]
frame #48: python3() [0x4f08be]
frame #50: python3() [0x55fbf6]
frame #54: python3() [0x4e3537]
frame #58: python3() [0x4e3423]
frame #60: python3() [0x4f08be]
frame #62: python3() [0x55fbf6]

ming71 · 2019-12-23T01:27:55Z

I get the same error. Have you worked it out?

JesseYang · 2020-04-18T13:24:22Z

This adds apex support with level O1. But I got the following error when running it.
RuntimeError: Function _DCNv2Backward returned an invalid gradient at index 1 - expected type torch.cuda.HalfTensor but got torch.cuda.FloatTensor

JesseYang · 2020-04-20T22:16:45Z

The problem is solved. Besides following the code of https://github.com/lbin/DCNv2, I added the following three lines before the return statement in the _backward function of _DCNv2 in dcn_v2.py:

grad_input = grad_input.half()
grad_offset = grad_offset.half()
grad_mask = grad_mask.half()
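
A note on the three casts above: hard-coding .half() assumes the inputs to the layer are always float16. If the same code is later run without apex, the inputs are float32 and the dtype check may fail in the other direction. A minimal sketch of a more general variant is to cast each gradient to the dtype of the input it belongs to. The helper name cast_grads_like is hypothetical and not part of DCNv2, and the tensor names in the usage lines only mirror the names used in this thread.

```python
import torch


def cast_grads_like(grads, inputs):
    """Cast each gradient to the dtype of the corresponding input.

    Hypothetical helper, not part of DCNv2. A gradient of None
    (input does not require grad) is passed through unchanged.
    """
    return tuple(
        None if g is None else g.to(inp.dtype)
        for g, inp in zip(grads, inputs)
    )


# Usage sketch with stand-in tensors: float32 gradients coming out of a
# float32-only kernel, float16 inputs coming from apex O1.
inputs = (torch.ones(2, dtype=torch.float16), torch.ones(2, dtype=torch.float16))
grads = (torch.ones(2, dtype=torch.float32), torch.ones(2, dtype=torch.float32))
grad_input, grad_offset = cast_grads_like(grads, inputs)
print(grad_input.dtype, grad_offset.dtype)
```

With float32 inputs the same call leaves the gradients as float32, so the backward function would not need a separate code path for training without apex.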

zhangjinsong3 · 2020-06-10T06:12:40Z

> The problem is solved. Besides following the code of https://github.com/lbin/DCNv2, I added the following three lines before the return statement in the _backward function of _DCNv2 in dcn_v2.py:
> grad_input = grad_input.half()
> grad_offset = grad_offset.half()
> grad_mask = grad_mask.half()

Did you modify any other code besides dcn_v2.py? I made those changes but still get the same error as @yeyuanzheng177.

GilbertTam · 2020-09-09T11:51:13Z

Me too. Does anyone have an update?

steven22tom · 2020-12-24T07:40:32Z

In dcn_v2.py, "_backend.dcn_v2_forward" and "_backend.dcn_v2_backward" only accept float32 input.
So if you use mixed precision (Apex/amp), you should convert the float16 tensors to float32 before calling them, and convert the final output from float32 back to float16.
For an example, see https://github.com/jasonkena/yolact/tree/amp/external/DCNv2
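
The conversion described above can be sketched roughly as follows. This is not the actual DCNv2 code: float32_only_op is a hypothetical stand-in for a float32-only extension call such as _backend.dcn_v2_forward, Float32Wrapper is a made-up name, and the elementwise multiply is only there so the example runs on CPU without the compiled extension.

```python
import torch


def float32_only_op(x, weight):
    # Hypothetical stand-in for a float32-only extension kernel.
    # It rejects any other dtype, like the error at the top of this thread.
    if x.dtype != torch.float32 or weight.dtype != torch.float32:
        raise RuntimeError("expected scalar type Float but found another dtype")
    return x * weight


class Float32Wrapper(torch.autograd.Function):
    """Run a float32-only op on inputs that may be float16."""

    @staticmethod
    def forward(ctx, x, weight):
        # Remember the original dtypes so backward can match them.
        ctx.in_dtypes = (x.dtype, weight.dtype)
        x32, w32 = x.float(), weight.float()
        ctx.save_for_backward(x32, w32)
        out = float32_only_op(x32, w32)
        # Convert the result back to the dtype of the incoming activation.
        return out.to(x.dtype)

    @staticmethod
    def backward(ctx, grad_out):
        x32, w32 = ctx.saved_tensors
        g32 = grad_out.float()
        # Gradients of x * weight, computed in float32.
        grad_x = g32 * w32
        grad_w = g32 * x32
        # Return each gradient in the dtype of its input.
        return grad_x.to(ctx.in_dtypes[0]), grad_w.to(ctx.in_dtypes[1])


# Usage sketch: a float16 activation and a float32 weight.
x = torch.ones(2, 3, dtype=torch.float16, requires_grad=True)
w = torch.full((2, 3), 2.0, requires_grad=True)
out = Float32Wrapper.apply(x, w)
out.float().sum().backward()
print(out.dtype, x.grad.dtype, w.grad.dtype)
```

The point of the design is that the float32-only call never sees a float16 tensor, while the surrounding network still receives outputs and gradients in the dtypes it passed in.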

ttjjmm · 2021-01-01T14:10:09Z

> In dcn_v2.py, "_backend.dcn_v2_forward" and "_backend.dcn_v2_backward" only accept float32 input.
> So if you use mixed precision (Apex/amp), you should convert the float16 tensors to float32 before calling them, and convert the final output from float32 back to float16.
> For an example, see https://github.com/jasonkena/yolact/tree/amp/external/DCNv2

@steven22tom Thank you for the hint, it works!
