I have built OpenBLAS with OpenMP and with NUM_CORES=12, which is the maximum across my machines.
Now, the docs say I should define OMP_NUM_THREADS. But suppose I don't define it and I run the code on a CPU with 4 hardware threads: will it use 12 threads, or will it ask the OpenMP implementation for the recommended number of threads for the system?
If OMP_NUM_THREADS really must be set, I suggest changing that behavior: when the user has not set OMP_NUM_THREADS, the expected behavior would be to use the maximum number of threads available on the system rather than the NUM_CORES value, which might not be optimal for that system.
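
To make the suggestion concrete, here is a rough sketch in C of the fallback I have in mind. It is not OpenBLAS's actual code: `choose_num_threads` and its parameters are hypothetical names I made up for illustration, and capping an explicit request at the build-time maximum is my own assumption.

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not part of OpenBLAS or OpenMP.
 *
 * env_value  : the OMP_NUM_THREADS string, or NULL if the variable is unset
 * hw_threads : number of hardware threads reported for this machine
 * build_max  : thread limit fixed at build time (12 in my case)
 */
int choose_num_threads(const char *env_value, int hw_threads, int build_max)
{
    if (build_max < 1) {
        build_max = 1;
    }

    /* An explicit, valid user setting wins, capped at the build-time limit
       (the cap is an assumption of this sketch). */
    if (env_value != NULL && *env_value != '\0') {
        char *end = NULL;
        long requested = strtol(env_value, &end, 10);
        if (end != env_value && requested >= 1) {
            return requested < build_max ? (int)requested : build_max;
        }
    }

    /* Unset or unparsable: fall back to what the machine has,
       never more than the build-time limit. */
    if (hw_threads < 1) {
        hw_threads = 1;
    }
    return hw_threads < build_max ? hw_threads : build_max;
}
```

In a real program the first argument would come from `getenv("OMP_NUM_THREADS")` and the second from something like `omp_get_num_procs()`. With my setup, `choose_num_threads(NULL, 4, 12)` would give 4 on the 4-thread machine instead of 12.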