When I use apex on a 2080 Ti, I get the following error. How can I solve it? #42

Open
yeyuanzheng177 opened this issue Dec 9, 2019 · 7 comments


@yeyuanzheng177

RuntimeError: expected scalar type Float but found Half (data at /usr/local/lib/python3.5/dist-packages/torch/include/ATen/core/TensorMethods.h:1386)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f9f2dbdd441 in /usr/local/lib/python3.5/dist-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f9f2dbdcd7a in /usr/local/lib/python3.5/dist-packages/torch/lib/libc10.so)
frame #2: float* at::Tensor::data() const + 0xcf (0x7f9f1c69fa2f in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)
frame #3: dcn_v2_cuda_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int, int, int, int, int, int, int, int, int) + 0xbc0 (0x7f9f1c6a4b50 in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)
frame #4: dcn_v2_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int, int, int, int, int, int, int, int, int) + 0x8b (0x7f9f1c689c0b in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x1f91c (0x7f9f1c69591c in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)
frame #6: <unknown function> + 0x1f99e (0x7f9f1c69599e in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)
frame #7: <unknown function> + 0x1cdc0 (0x7f9f1c692dc0 in /home/yyz/bigdisk/CenterNet-master0/src/lib/models/networks/DCNv2/_ext.cpython-35m-x86_64-linux-gnu.so)
<omitting python frames>
frame #11: python3() [0x4e3423]
frame #14: THPFunction_apply(_object*, _object*) + 0x6b1 (0x7f9f2e3bf491 in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch_python.so)
frame #18: python3() [0x4e3537]
frame #22: python3() [0x4e3423]
frame #24: python3() [0x4f08be]
frame #26: python3() [0x55fbf6]
frame #30: python3() [0x4e3537]
frame #34: python3() [0x4e3423]
frame #36: python3() [0x4f08be]
frame #38: python3() [0x55fbf6]
frame #42: python3() [0x4e3537]
frame #46: python3() [0x4e3423]
frame #48: python3() [0x4f08be]
frame #50: python3() [0x55fbf6]
frame #54: python3() [0x4e3537]
frame #58: python3() [0x4e3423]
frame #60: python3() [0x4f08be]
frame #62: python3() [0x55fbf6]

@ming71

ming71 commented Dec 23, 2019

I get the same error. Have you worked it out?

@JesseYang

This adds apex support at opt level O1, but I got the following error when running it:
RuntimeError: Function _DCNv2Backward returned an invalid gradient at index 1 - expected type torch.cuda.HalfTensor but got torch.cuda.FloatTensor

@JesseYang

The problem is solved. Besides following the code of https://github.com/lbin/DCNv2, I added the following three lines before the return in the _backward function of _DCNv2 in dcn_v2.py:

grad_input = grad_input.half()
grad_offset = grad_offset.half()
grad_mask = grad_mask.half()
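
A hard-coded `.half()` only matches runs where all three inputs really are float16. As a hedged variation on the lines above (not the code from the linked repository), each gradient can instead be cast to the dtype of the tensor it belongs to. The helper `match_dtypes` and the stand-in tensors `inp`, `offset`, and `mask` are hypothetical names for illustration; in the real backward they would correspond to the tensors saved during forward.

```python
import torch


def match_dtypes(grads, refs):
    """Cast each gradient to the dtype of its corresponding input tensor.

    `None` gradients (inputs that need no gradient) are passed through.
    """
    return tuple(
        g if g is None else g.to(r.dtype)
        for g, r in zip(grads, refs)
    )


# Stand-ins for the forward inputs (assumed names, CPU tensors for the demo).
inp = torch.zeros(2, dtype=torch.float16)     # cast to half by mixed precision
offset = torch.zeros(2, dtype=torch.float16)
mask = torch.zeros(2, dtype=torch.float32)    # pretend this one stayed float32

# Stand-ins for what a float32-only backward kernel would return.
grad_input = torch.ones(2)
grad_offset = torch.ones(2)
grad_mask = torch.ones(2)

grad_input, grad_offset, grad_mask = match_dtypes(
    (grad_input, grad_offset, grad_mask), (inp, offset, mask)
)
print(grad_input.dtype, grad_offset.dtype, grad_mask.dtype)
```

With this approach the same backward works whether the inputs arrive as float16 or float32, which is the mismatch the "returned an invalid gradient" error complains about.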

@zhangjinsong3

zhangjinsong3 commented Jun 10, 2020

> The problem is solved. Besides following the code of https://github.com/lbin/DCNv2, I added the following three lines before the return in the _backward function of _DCNv2 in dcn_v2.py:
>
> grad_input = grad_input.half()
> grad_offset = grad_offset.half()
> grad_mask = grad_mask.half()

Did you modify any other code besides dcn_v2.py? I made those changes but still get the same error as @yeyuanzheng177.

@GilbertTam

Me too. Does anyone have an update?

@steven22tom

steven22tom commented Dec 24, 2020

In dcn_v2.py, "_backend.dcn_v2_forward" and "_backend.dcn_v2_backward" only accept float32 input.
So if you use mixed precision (Apex/amp), you should convert the float16 inputs to float32 before calling them, and convert the final output from float32 back to float16.
For reference, see https://github.com/jasonkena/yolact/tree/amp/external/DCNv2
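
The cast-in, cast-out pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked repository: `float32_only_op` is a hypothetical stand-in for the compiled `_backend` kernels (it simply rejects non-float32 tensors), and `Float32Wrapped` is an assumed name. It runs on CPU so it can be tried without the DCNv2 extension.

```python
import torch
from torch.autograd import Function


def float32_only_op(x, w):
    """Hypothetical stand-in for a float32-only extension kernel."""
    if x.dtype != torch.float32 or w.dtype != torch.float32:
        raise RuntimeError("expected scalar type Float but found %s" % x.dtype)
    return x * w


class Float32Wrapped(Function):
    @staticmethod
    def forward(ctx, x, w):
        ctx.save_for_backward(x, w)
        # Upcast whatever dtype arrives (float16 under mixed precision) to float32.
        out = float32_only_op(x.float(), w.float())
        # Hand the result back in the caller's dtype.
        return out.to(x.dtype)

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        g = grad_out.float()
        # Compute gradients in float32, as a float32-only backward kernel would.
        grad_x = g * w.float()
        grad_w = g * x.float()
        # Each returned gradient must match the dtype of its input.
        return grad_x.to(x.dtype), grad_w.to(w.dtype)


x = torch.full((3,), 2.0, dtype=torch.float16, requires_grad=True)
w = torch.full((3,), 3.0, dtype=torch.float16, requires_grad=True)
out = Float32Wrapped.apply(x, w)
out.backward(torch.ones_like(out))
print(out.dtype, x.grad.dtype, w.grad.dtype)
```

Calling `float32_only_op(x, w)` directly with the float16 tensors raises the same kind of "expected scalar type Float" error reported at the top of this issue, while the wrapped version accepts them.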

@ttjjmm

ttjjmm commented Jan 1, 2021

> In dcn_v2.py, "_backend.dcn_v2_forward" and "_backend.dcn_v2_backward" only accept float32 input.
> So if you use mixed precision (Apex/amp), you should convert the float16 inputs to float32 before calling them, and convert the final output from float32 back to float16.
> For reference, see https://github.com/jasonkena/yolact/tree/amp/external/DCNv2

@steven22tom Thank you for the hint, it works!

7 participants