It would be nice to be able to synchronize kernel threads while the kernel is running, similar to CUDA's `__syncthreads()`.
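
For reference, here is a minimal sketch of the CUDA behavior I mean. It only illustrates what `__syncthreads()` does in CUDA itself and is not a proposal for how the API should look here. The names `reverseInBlock` and `reverseOnDevice` are made up for this example, error checking is omitted, and it assumes a single block of at most 1024 threads.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: reverses `data` in place within a single thread block.
__global__ void reverseInBlock(int *data, int n)
{
    extern __shared__ int tile[];  // one int per thread, sized at launch time
    int t = threadIdx.x;

    if (t < n) tile[t] = data[t];

    // Block-wide barrier: no thread continues until every thread in the
    // block has finished writing its element into shared memory.
    __syncthreads();

    // Safe only because of the barrier: this reads a slot that a
    // different thread wrote.
    if (t < n) data[t] = tile[n - 1 - t];
}

// Hypothetical host helper: copies the values to the device, reverses them
// with one block of n threads, and copies the result back.
std::vector<int> reverseOnDevice(std::vector<int> values)
{
    int n = static_cast<int>(values.size());
    if (n == 0) return values;

    size_t bytes = n * sizeof(int);
    int *dev = nullptr;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, values.data(), bytes, cudaMemcpyHostToDevice);

    // Third launch parameter is the dynamic shared memory size in bytes.
    reverseInBlock<<<1, n, bytes>>>(dev, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(values.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return values;
}
```

Without the barrier, a thread could read `tile[n - 1 - t]` before the thread that owns that slot has written it. Note that `__syncthreads()` only synchronizes threads within one block, not across the whole grid.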