
Fast and generic implementation using OpenMP and CUDA #44

Open
shikishima-TasakiLab opened this issue Jun 29, 2021 · 3 comments · May be fixed by #45

Comments

@shikishima-TasakiLab commented Jun 29, 2021

I have implemented a module using OpenMP and CUDA that runs faster while maintaining the memory efficiency of your CuPy implementation.

shikishima-TasakiLab/Involution-PyTorch

It also supports TorchScript and 16-bit float.

shikishima-TasakiLab/Involution-PyTorch#1
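For context, the module being accelerated implements the involution operation (Li et al., CVPR 2021). A minimal stride-1 sketch in pure PyTorch is shown below using `F.unfold`; the class name `Involution2d` and the bottleneck layout are illustrative only, not the API of either repository's OpenMP/CUDA module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Involution2d(nn.Module):
    """Minimal reference sketch of involution (stride 1, illustrative names)."""

    def __init__(self, channels, kernel_size=3, groups=1, reduction=4):
        super().__init__()
        self.kernel_size = kernel_size
        self.groups = groups
        # Kernel generation: a 1x1 bottleneck that produces one K*K kernel
        # per spatial position and group.
        self.reduce = nn.Conv2d(channels, channels // reduction, 1)
        self.span = nn.Conv2d(channels // reduction,
                              kernel_size * kernel_size * groups, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        k, g = self.kernel_size, self.groups
        # Per-pixel kernels: (B, G, K*K, H, W)
        kernel = self.span(F.relu(self.reduce(x))).view(b, g, k * k, h, w)
        # Unfold K*K neighborhoods: (B, G, C//G, K*K, H, W)
        patches = F.unfold(x, k, padding=k // 2).view(b, g, c // g, k * k, h, w)
        # Multiply-accumulate over the K*K window (broadcast over channels)
        out = (kernel.unsqueeze(2) * patches).sum(dim=3)
        return out.view(b, c, h, w)
```

The `F.unfold` step is what the CuPy, OpenMP, and CUDA variants replace: materializing all K*K neighborhoods costs extra memory, which the fused kernels avoid.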

@d-li14 (Owner) commented Jun 29, 2021

Great work! It will help a lot in practice!
As mentioned in the README, would you please open a PR to contribute it to this repo? To be on the safe side, I will run some experiments to double-check the reimplementation's correctness before merging it into the main branch. Thanks.

@shikishima-TasakiLab (Author) commented Jun 29, 2021

I have opened a PR.
I did not resolve the conflicting parts of the README, so please add the module descriptions there as needed.

@d-li14 (Owner) commented Jun 29, 2021

OK, I will verify and merge it as soon as I can.
