
Fast and generic implementation using OpenMP and CUDA #44

Open
shikishima-TasakiLab opened this issue Jun 29, 2021 · 3 comments · May be fixed by #45

Comments

@shikishima-TasakiLab commented Jun 29, 2021

I have implemented a module using OpenMP and CUDA that runs faster while maintaining the memory efficiency of your CuPy implementation.

shikishima-TasakiLab/Involution-PyTorch

It also supports TorchScript and 16-bit float.

shikishima-TasakiLab/Involution-PyTorch#1
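For context, the module being accelerated implements the involution operation (Li et al., CVPR 2021). A minimal stride-1 sketch in pure PyTorch is shown below using `F.unfold`; the class name `Involution2d` and the bottleneck layout are illustrative only, not the API of either repository's OpenMP/CUDA module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Involution2d(nn.Module):
    """Minimal reference sketch of involution (stride 1, illustrative names)."""

    def __init__(self, channels, kernel_size=3, groups=1, reduction=4):
        super().__init__()
        self.kernel_size = kernel_size
        self.groups = groups
        # Kernel generation: a 1x1 bottleneck that produces one K*K kernel
        # per spatial position and group.
        self.reduce = nn.Conv2d(channels, channels // reduction, 1)
        self.span = nn.Conv2d(channels // reduction,
                              kernel_size * kernel_size * groups, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        k, g = self.kernel_size, self.groups
        # Per-pixel kernels: (B, G, K*K, H, W)
        kernel = self.span(F.relu(self.reduce(x))).view(b, g, k * k, h, w)
        # Unfold K*K neighborhoods: (B, G, C//G, K*K, H, W)
        patches = F.unfold(x, k, padding=k // 2).view(b, g, c // g, k * k, h, w)
        # Multiply-accumulate over the K*K window (broadcast over channels)
        out = (kernel.unsqueeze(2) * patches).sum(dim=3)
        return out.view(b, c, h, w)
```

The `F.unfold` step is what the CuPy, OpenMP, and CUDA variants replace: materializing all K*K neighborhoods costs extra memory, which the fused kernels avoid.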

@d-li14 (Owner) commented Jun 29, 2021

Great work! It will help a lot in practice!
As mentioned in the README, would you please open a PR to contribute it to this repo? To be on the safe side, I will run some experiments to double-check the reimplementation's correctness before merging it into the main branch. Thanks.

@shikishima-TasakiLab (Author) commented Jun 29, 2021

I have opened a PR.
I did not resolve the conflicting parts of the README, so please add the module descriptions there as needed.

@d-li14 (Owner) commented Jun 29, 2021

OK, I will verify and merge it as soon as I can.
