PyTorch CUDA extension compiled online (at runtime). We use pynvrtc (NVIDIA's Python bindings to NVRTC) to compile the CUDA source at runtime, and cupy to load the compiled code and wrap the resulting CUDA functions so they can be launched from Python.
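A minimal sketch of that workflow follows. It assumes a CUDA-capable GPU with `torch`, `numpy` and `cupy` installed, plus a pynvrtc release whose `Program` accepts Python `str` source (some older releases expected `bytes`). The kernel `scale_kernel`, the `scale()` wrapper and the block size are hypothetical illustrations, not this repository's code. The CUDA source is compiled to PTX with `pynvrtc.compiler.Program`, loaded through `cupy.cuda.function.Module`, and launched on PyTorch tensors by passing their raw device pointers.

```python
"""Sketch: compile a CUDA kernel at runtime with pynvrtc, launch it via cupy on torch tensors."""
import functools
from collections import namedtuple

# Hypothetical example kernel: y[i] = alpha * x[i] for float32 data.
# extern "C" disables C++ name mangling so get_function() can look it up by name.
KERNEL_SRC = r'''
extern "C" __global__
void scale_kernel(const float* __restrict__ x, float* __restrict__ y,
                  const float alpha, const int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = alpha * x[i];
    }
}
'''

THREADS_PER_BLOCK = 512  # hypothetical choice; any valid block size works

# cupy only reads the `.ptr` attribute of the stream object it is given.
Stream = namedtuple('Stream', ['ptr'])


def num_blocks(n, threads=THREADS_PER_BLOCK):
    """Number of blocks needed so that every element gets one thread."""
    return (n + threads - 1) // threads


@functools.lru_cache(maxsize=None)
def _get_kernel():
    """Compile KERNEL_SRC to PTX once (on first use) and load it as a cupy function."""
    # Imported lazily so that importing this file does not require a GPU.
    from pynvrtc.compiler import Program
    from cupy.cuda import function

    prog = Program(KERNEL_SRC, 'scale_kernel.cu')
    ptx = prog.compile()                  # PTX text returned as a str
    module = function.Module()
    module.load(bytes(ptx.encode()))      # load the PTX into the current CUDA context
    return module.get_function('scale_kernel')


def scale(x, alpha):
    """Return alpha * x for a float32 CUDA tensor using the runtime-compiled kernel."""
    import numpy as np
    import torch

    if not (x.is_cuda and x.dtype == torch.float32):
        raise TypeError('expected a float32 CUDA tensor')
    x = x.contiguous()                    # the kernel assumes a dense 1-D layout
    y = torch.empty_like(x)
    n = x.numel()
    if n == 0:                            # a zero-sized grid is not a valid launch
        return y
    kernel = _get_kernel()
    kernel(
        grid=(num_blocks(n), 1, 1),
        block=(THREADS_PER_BLOCK, 1, 1),
        # Tensors are passed as raw device pointers; scalars get explicit C types.
        args=[x.data_ptr(), y.data_ptr(), np.float32(alpha), np.int32(n)],
        # Launch on PyTorch's current stream so ordering with other torch ops holds.
        stream=Stream(ptr=torch.cuda.current_stream().cuda_stream),
    )
    return y
```

On a GPU machine, `x = torch.randn(1000, device='cuda'); torch.allclose(scale(x, 2.0), 2.0 * x)` should hold. Compiling lazily and caching the loaded function means NVRTC runs only once per process; to make such a call differentiable, it would typically be wrapped in a `torch.autograd.Function` with a matching backward kernel.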