Description
I am unsure whether this is by design or an issue with the DPCPP runtime. However, every time I select an accelerator device, set up a queue on it, and submit a kernel, CPU utilization stays at 100% for the duration of the kernel's execution on the device, i.e., one full core is occupied.
This wasn't the case when using C for Metal. I believe Level Zero also allows the CPU to avoid busy waiting while the accelerator is executing a job. The same is true of CUDA, where CPU utilization is < 5% after offloading to the GPU.
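To put a number on the behavior described above, here is a minimal sketch that compares process CPU time with wall-clock time around a wait. It deliberately avoids any SYCL dependency: `cpu_to_wall_ratio`, `spin_wait`, and `blocking_wait` are hypothetical helpers written for illustration, and the two wait functions are only stand-ins for a host thread that polls versus one that blocks. It also assumes `std::clock()` reports process CPU time, which holds on POSIX systems but not on Windows.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical helper: ratio of process CPU time to wall-clock time
// spent while `wait_fn` runs. A value near 1.0 means one core was kept
// busy for the whole wait; a value near 0.0 means the thread was blocked.
// Assumption: std::clock() measures process CPU time (true on POSIX).
template <typename F>
double cpu_to_wall_ratio(F&& wait_fn) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    wait_fn();  // in a real test this would be the submit-and-wait call

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    const double cpu_s =
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    const double wall_s =
        std::chrono::duration<double>(wall_end - wall_start).count();
    return wall_s > 0.0 ? cpu_s / wall_s : 0.0;
}

// Stand-in for a runtime that polls for completion (busy wait).
inline void spin_wait(std::chrono::milliseconds d) {
    const auto deadline = std::chrono::steady_clock::now() + d;
    while (std::chrono::steady_clock::now() < deadline) {
        // keep polling the clock until the deadline passes
    }
}

// Stand-in for a runtime that blocks the host thread until completion.
inline void blocking_wait(std::chrono::milliseconds d) {
    std::this_thread::sleep_for(d);
}
```

Calling `cpu_to_wall_ratio([] { spin_wait(std::chrono::milliseconds(200)); })` should give a ratio close to 1, while the same call with `blocking_wait` should give a ratio close to 0. In a reproducer, the lambda would wrap the kernel submission and its wait on the SYCL queue, and a ratio near 1 would match the 100% utilization reported here.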
Is this something that is going to be addressed? This is critical to every single customer I work with.