Default value for the number of threads #89
Comments
It's a bit counterintuitive to me. Hyperthreading was created to fully utilize the CPU in memory-bound programs. If we are talking about limited CPU utilization on a VM, then in my opinion the library should not handle that case specially and should instead offer the most performant option for the base scenario. In general, performance should be tested on different systems and the best-performing default value chosen.
This was tested in the original llama.cpp. The C++ implementation already utilizes the full available compute of the physical cores; increasing the number of threads beyond that would only lower performance, for the reason mentioned above.
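To illustrate the distinction being argued here: `os.cpu_count()` reports *logical* CPUs, so on a hyperthreaded (SMT-2) machine, halving it approximates the physical core count. This is a sketch of that heuristic, not code from the library:

```python
import os

# os.cpu_count() counts *logical* CPUs, which on a hyperthreaded
# (SMT-2) machine is twice the number of physical cores. Halving it
# approximates the physical core count that compute-bound inference
# threads should target. On machines without SMT, this undercounts
# the physical cores by half -- the behavior the issue complains about.
logical_cpus = os.cpu_count() or 1
approx_physical = max(logical_cpus // 2, 1)
print(logical_cpus, approx_physical)
```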
Okay. If systems with hyperthreading see a regression, then this logic can stay for them, but why isn't the logic different for systems where all cores are physical?
Closing, as it is impossible to support every hardware configuration.
The current default value is cpu_count/2:
llama-cpp-python/llama_cpp/llama.py
Line 102 in b2a24bd
This value does not seem optimal for multicore systems. For example, a CPU with 8 physical cores will have 4 of them idle. Put simply, we get roughly a twofold slowdown (assuming there are no further nuances in model execution).
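For concreteness, the current default can be sketched as a small helper mirroring the expression at line 102 (the name `default_n_threads` is illustrative; the real code inlines the expression):

```python
import multiprocessing

def default_n_threads(cpu_count: int) -> int:
    # Sketch of the current default: half the logical CPU count,
    # floored, with a minimum of one thread.
    return max(cpu_count // 2, 1)

# On a machine reporting 8 CPUs the default is 4 threads, so if all
# 8 are physical cores, half of them sit idle during inference.
print(default_n_threads(8))   # 4
print(default_n_threads(16))  # 8
print(default_n_threads(1))   # 1

# An explicit override when constructing the model, e.g.
# Llama(model_path=..., n_threads=multiprocessing.cpu_count()),
# restores full utilization on machines without hyperthreading.
```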
Related issues: #71
In this discussion I would like to understand the motivation for this default value, as it does not seem obvious to most users.