Default value for the number of threads #89

Closed
avdosev opened this issue Apr 17, 2023 · 5 comments
Labels
hardware Hardware specific issue

Comments

@avdosev

avdosev commented Apr 17, 2023

The current default value is cpu_count/2:

self.n_threads = n_threads or max(multiprocessing.cpu_count() // 2, 1)

This value does not seem optimal for multicore systems. For example, on a CPU with 8 cores, 4 cores will sit idle. Put simply, we get roughly a 2x slowdown (assuming there are no other nuances in model execution).
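To make the behaviour of the quoted default concrete, here is a minimal standalone sketch. `resolve_n_threads` is a hypothetical helper name used only for illustration, not something from the library; it wraps the same expression so it can be run in isolation.

```python
import multiprocessing


def resolve_n_threads(n_threads=None):
    """Mirror the quoted default: use the caller's value, else half the CPU count.

    Hypothetical helper for illustration only.
    """
    # `or` treats None and 0 the same way, so both fall through to the default.
    # multiprocessing.cpu_count() reports logical CPUs, not physical cores.
    return n_threads or max(multiprocessing.cpu_count() // 2, 1)


if __name__ == "__main__":
    print("logical CPUs:", multiprocessing.cpu_count())
    print("default n_threads:", resolve_n_threads())
    print("explicit n_threads:", resolve_n_threads(8))
```

Note that an explicit value always wins, so the default only matters for users who never pass `n_threads`.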

Related issues: #71

In this discussion I would like to know the motivation for such a default value, as it seems that it is not obvious to most users.

@gjmulder
Contributor

  • Most physical systems are hyperthreaded
  • Hyperthreading doesn't seem to improve performance due to the memory I/O bound nature of llama.cpp
  • Might be invalid for VMs

@avdosev
Author

avdosev commented Apr 17, 2023

> Hyperthreading doesn't seem to improve performance due to the memory I/O bound nature of llama.cpp

That's a bit counterintuitive to me. Hyperthreading was created to fully utilize the CPU in memory-bound programs.

If we are talking about limited CPU utilization on a VM, then in my opinion the library should not try to handle that case and should instead offer the best-performing option for the base scenario.

In general, it is necessary to test performance on different systems and choose the best-performing default value.
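One way such a test could be sketched is shown below. Everything here is an assumption for illustration: `best_thread_count` is a hypothetical helper, and `run_once` stands for any callable that runs a fixed workload (for example, generating a fixed number of tokens) with the given number of threads.

```python
import time


def best_thread_count(run_once, candidates, repeats=3):
    """Time run_once(n) for each candidate n and return (fastest n, all timings).

    Hypothetical helper: run_once is supplied by the caller and is expected
    to run the same workload each time, using n threads.
    """
    timings = {}
    for n in candidates:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_once(n)
            samples.append(time.perf_counter() - start)
        # Keep the best of several runs to reduce noise from other processes.
        timings[n] = min(samples)
    best = min(timings, key=timings.get)
    return best, timings
```

Running this on a few machines, with and without hyperthreading, would give actual numbers to compare against the `cpu_count // 2` default.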

@Priestru

Priestru commented Apr 18, 2023

This was tested in the original llama. The C++ implementation already utilizes the full available compute of the physical cores, so increasing the number of threads beyond that would only lower performance, for the reason mentioned above.

@avdosev
Author

avdosev commented Apr 21, 2023

Okay. If systems with hyperthreading see reduced performance, then this logic can stay for them, but why is the logic not different for systems where all cores are physical?
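For illustration, a default that distinguishes the two cases might look roughly like the sketch below. This is not what the library does. `physical_core_count` is a hypothetical helper, `psutil` is an optional third-party package, and the fallback assumes 2-way hyperthreading, which is the same assumption the current default makes.

```python
import os


def physical_core_count():
    """Best-effort count of physical cores (hypothetical helper)."""
    logical = os.cpu_count() or 1
    try:
        import psutil  # optional third-party dependency

        # May return None when the platform cannot report physical cores.
        physical = psutil.cpu_count(logical=False)
    except ImportError:
        physical = None
    if physical is None:
        # Fallback: assume two logical CPUs per physical core.
        physical = max(logical // 2, 1)
    return physical


if __name__ == "__main__":
    print("logical CPUs:", os.cpu_count())
    print("physical cores (best effort):", physical_core_count())
```

On a machine without hyperthreading and with `psutil` installed, this would report all cores instead of half of them. Inside VMs or containers the reported numbers may still not match the CPUs actually available.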

@gjmulder
Contributor

Closing as it is impossible to support every config of hardware.

@gjmulder added the hardware (Hardware specific issue) label on May 12, 2023
@gjmulder closed this as not planned on May 15, 2023