why you not support CPU #34

Open
azuryl opened this issue Oct 25, 2019 · 11 comments

Comments

@azuryl

azuryl commented Oct 25, 2019

Is it difficult to implement in code?

@yossibiton

yossibiton commented Nov 12, 2019

I agree - CPU version would be very useful for debugging purposes.

I'm trying to use the CenterNet code, which relies on your code (https://github.com/xingyizhou/CenterNet/tree/master/src/lib/models/networks/DCNv2).
It fails to run the inference demo with most of the models because of memory issues (I have a 4 GB GPU).

@abhigoku10

@azuryl @yossibiton @CharlesShang Do we have a CPU version of DCNv2? If not, when can we expect one?

@palver7
Contributor

palver7 commented Mar 17, 2020

@abhigoku10, @yossibiton, @azuryl, I have modified DCNv2 from this repository to add the CPU functionality. I have submitted a pull request to Charles Shang, but so far there is no response from him. Have a look and try my implementation: https://github.com/palver7/DCNv2 .

@CharlesShang Please have a look and comment/review on my pull request.

@macqueen09

@palver7 Your link https://github.com/palver7/DCNv2 returns a 404. Where can I get the CPU version of DCNv2?
Thanks very much.

@abhigoku10

@palver7 Thanks for sharing it, but I am getting a 404 error. Can you share the link?

@palver7
Contributor

palver7 commented May 21, 2020

Hi @macqueen09 @abhigoku10, Charles Shang has merged my repo into his, so DCNv2 in this repo can now run on either CPU or GPU. Because of this, I no longer need to maintain my repo and I deleted it. That is why you get the 404 error. You can download DCNv2 again and run python3 testcpu.py to see if it runs on your CPU.

@abhigoku10

@palver7 Can you share the location of your repo? I tried to find it but could not see it in your profile. Thanks for doing this.

@palver7
Contributor

palver7 commented May 22, 2020

@abhigoku10 I have deleted my DCNv2 repo. Check the README at https://github.com/CharlesShang/DCNv2 again. It now has a line that says to run python testcpu.py to check whether it runs on CPU. That line came from my merged repo.

Also, if you check the files inside the src/cpu directory, you will see that they now contain actual code instead of the previous "not implemented on cpu" error message placeholders. You can now use Charles' DCNv2 on CPU as well as GPU.

@abhigoku10

@palver7 @CharlesShang Thanks a lot for the work you have done!

@tabsun

tabsun commented Jun 19, 2020

Great work!
I have used your CPU version of DCNv2 in mmdetection for prediction and get correct results.
But the CPU DCNv2 is really slow. In my case one DCN operation takes 200-600 ms on the CPU, while the GPU needs only 3 ms. For networks with multiple DCN layers, speed is a real concern. I read the code looking for ways to speed it up, and my conclusion was "yeah, not much to do".
Do you have any advice for a better implementation, or is there another implementation we can refer to?

Update:
I added OpenMP to im2col; it is a good tool for speeding up loops.
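
For readers who want to try the same idea, here is a minimal sketch of where an OpenMP pragma can go in an im2col loop. It is an assumption-laden illustration, not the code from this repository: `im2col_omp` is a hypothetical helper, it implements plain (non-deformable) im2col for a single image in CHW layout, and it omits the offsets and modulation mask that DCNv2 uses. The point is only that each row of the column matrix is written by one loop iteration, so the outer loop can be parallelized safely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only (not the repository's function).
// Plain im2col for one image stored as CHW, square kernel, zero padding.
// Returns a matrix with (channels * kernel * kernel) rows and
// (out_h * out_w) columns, stored row-major.
std::vector<float> im2col_omp(const std::vector<float>& im,
                              int channels, int height, int width,
                              int kernel, int pad, int stride) {
    assert(im.size() == static_cast<std::size_t>(channels) * height * width);
    const int out_h = (height + 2 * pad - kernel) / stride + 1;
    const int out_w = (width + 2 * pad - kernel) / stride + 1;
    const int rows = channels * kernel * kernel;
    std::vector<float> col(static_cast<std::size_t>(rows) * out_h * out_w, 0.0f);

    // Every iteration of the outer loop writes a distinct row of `col`,
    // so iterations are independent and can run on separate threads.
    // When compiled without OpenMP, the loop simply runs serially.
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int r = 0; r < rows; ++r) {
        const int kw = r % kernel;             // kernel column
        const int kh = (r / kernel) % kernel;  // kernel row
        const int c = r / (kernel * kernel);   // input channel
        for (int oh = 0; oh < out_h; ++oh) {
            const int ih = oh * stride - pad + kh;
            for (int ow = 0; ow < out_w; ++ow) {
                const int iw = ow * stride - pad + kw;
                const bool inside =
                    ih >= 0 && ih < height && iw >= 0 && iw < width;
                const std::size_t dst =
                    (static_cast<std::size_t>(r) * out_h + oh) * out_w + ow;
                const std::size_t src =
                    (static_cast<std::size_t>(c) * height + ih) * width + iw;
                col[dst] = inside ? im[src] : 0.0f;  // zero outside the image
            }
        }
    }
    return col;
}
```

With GCC or Clang the pragma takes effect only when the file is compiled with `-fopenmp`. How much it helps depends on the layer size and the number of cores, so it is worth measuring before and after.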

@palver7
Contributor

palver7 commented Jun 23, 2020

@tabsun Hi, I am happy to hear the CPU implementation works for you. Thanks for sharing the OpenMP tip too. I was going to suggest that you try making a CPU version of the TH Cuda blas Sgemmbatched routine, since that is what Charles used (in the dcn_v2_cuda.cu file) to improve the CUDA version. I changed that to an ordinary TH float blas gemm because I could not find a CPU version of the CUDA batched gemm routine.
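
To make the batched gemm idea concrete, here is a rough sketch of what a CPU stand-in could look like: a loop that calls an ordinary gemm once per batch entry. Everything in it is an assumption for illustration. `gemm_naive` and `gemm_batched_naive` are hypothetical names, the matrices are row-major and stored contiguously per batch, and the inner gemm is a naive triple loop rather than a tuned BLAS call. The real CUDA routine has a different interface, so this only shows the structure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical naive gemm: C = alpha * A * B + beta * C, all row-major.
// A is m x k, B is k x n, C is m x n.
void gemm_naive(int m, int n, int k, float alpha,
                const float* a, const float* b, float beta, float* c) {
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            // Follow the usual BLAS convention: beta == 0 means C is not read.
            c[i * n + j] = (beta == 0.0f) ? alpha * acc
                                          : alpha * acc + beta * c[i * n + j];
        }
    }
}

// Hypothetical batched wrapper: `batch` independent problems of the same
// shape, stored back to back in a, b and c.
void gemm_batched_naive(int batch, int m, int n, int k, float alpha,
                        const std::vector<float>& a,
                        const std::vector<float>& b,
                        float beta, std::vector<float>& c) {
    assert(a.size() == static_cast<std::size_t>(batch) * m * k);
    assert(b.size() == static_cast<std::size_t>(batch) * k * n);
    assert(c.size() == static_cast<std::size_t>(batch) * m * n);
    // Batch entries do not share output memory, so this loop is also a
    // candidate for parallelization (for example with OpenMP).
    for (int i = 0; i < batch; ++i) {
        gemm_naive(m, n, k, alpha,
                   a.data() + static_cast<std::size_t>(i) * m * k,
                   b.data() + static_cast<std::size_t>(i) * k * n,
                   beta,
                   c.data() + static_cast<std::size_t>(i) * m * n);
    }
}
```

In practice the inner call would be replaced by an optimized gemm from a BLAS library; the naive loop is here only so the sketch has no external dependencies.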
