When I install NumPy with pip install numpy and run an np.dot() workload, it uses only 4 threads (4 cores), even though my Windows on ARM64 device has 12 cores.
I suspect that because we do not set NUM_THREADS when building OpenBLAS for ARM64 in this script, the build falls back to the number of cores on the build machine as the value for NUM_THREADS.
To avoid this dependency on the build machine's core count, can we use a flag during the OpenBLAS build similar to what we do for x64 to make the number of threads configurable at runtime?
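For anyone who wants to check the behaviour on their own machine, here is a rough sketch that compares the logical core count with the thread count the loaded BLAS reports. It assumes the third-party threadpoolctl package is installed alongside NumPy; the helper name blas_thread_report is only illustrative and is not part of NumPy or OpenBLAS.

```python
import os


def blas_thread_report():
    """Return (logical_cores, blas_pools).

    blas_pools is a list of dicts as reported by threadpoolctl, one per
    BLAS library loaded in this process. It is empty if threadpoolctl or
    NumPy is not installed (both are assumptions of this sketch).
    """
    cores = os.cpu_count()
    try:
        import numpy  # noqa: F401  (loads NumPy's bundled BLAS into the process)
        from threadpoolctl import threadpool_info
    except ImportError:
        return cores, []
    # Keep only the BLAS thread pools (e.g. OpenBLAS), not OpenMP runtimes.
    pools = [p for p in threadpool_info() if p.get("user_api") == "blas"]
    return cores, pools


if __name__ == "__main__":
    cores, pools = blas_thread_report()
    print("logical cores:", cores)
    for pool in pools:
        # "num_threads" is the thread count the BLAS library is currently using.
        print(pool.get("internal_api"), pool.get("version"),
              "threads:", pool.get("num_threads"))
```

If the suspicion above is right, the reported thread count on the affected wheel would stay at 4 on the 12-core device, and setting OPENBLAS_NUM_THREADS to a larger value before importing NumPy would not be expected to raise it past the limit compiled into the library.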