Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slightly modify the detection of the number of CPU cores #44973

Merged
merged 9 commits into from Feb 20, 2023

Conversation

rschu1ze
Copy link
Member

@rschu1ze rschu1ze commented Jan 6, 2023

AMD offers 48/67/64 core systems (Milan), and soon CPUs with up to 96 cores (Genoa). Intel also sells Xeon CPUs with >32 cores. If the system has >= 32 logical cores then the current logic is to use only half of them. If SMT (HyperThreading) is on, then ClickHouse will effectively utilize only the physical cores (i.e. "disable" HyperThreading by means of the software).

However, if SMT is disabled (logical == physical core count), which is not uncommon, then ClickHouse uses only half the available physical cores.

With this PR, the 32-core threshold is retained but the number of used cores is only reduced if SMT is on.

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Remove the limitation that on systems with >=32 cores and SMT disabled ClickHouse uses only half of the cores

This might be a bit controversial but
1. AMD offers 48/67/64 (true) core systems already (Milan, (*)), up to
   96 cores/CPU soon (Genoa),
2. Intel sells a few Xeon CPUs with > cores (**),
and we shouldn't artificially limit core utilization on such sytems.

(*)  https://en.wikipedia.org/wiki/Epyc
(**) https://www.intel.com/content/www/us/en/products/details/processors/xeon/scalable/platinum/products.html
@rschu1ze
Copy link
Member Author

rschu1ze commented Jan 6, 2023

@nickitat Do you know if SMP/hyperthreading generally benefits/deteriorates performance? In case of the latter, it will be better to check programmatically if SMP is on and halve the max used cores only in this case.

@nickitat
Copy link
Member

nickitat commented Jan 6, 2023

looks like generally (e.g. if we take the whole set of clickbench queries) we should benefit from hyper threading. it was the case when we brought it back for 16 cores instances. it is because normal queries do have stalls on memory access and branch mispredictions. only heavy number crunching queries without lots of memory accesses (like Q29 from CB) could probably slow down, but it is fairly marginal case.
so if we run CB on machines with different number of cores I would bet that we will see better relative time with HT for both intel and amd even for cpu_count >= 32.

@rschu1ze
Copy link
Member Author

rschu1ze commented Jan 6, 2023

Tnx, this makes sense.
Even if HT isn't beneficial, overhead will be low I guess (1-3%?).
(I could actually not find not much material how HT impacts other DBs, so I guess it is not a big point of contention.)

@alexey-milovidov Feel free to merge or advice further.

@nickitat
Copy link
Member

nickitat commented Jan 6, 2023

imo it worth to do measurements anyway to be confident in what we are doing. e.g. launch all c6a.<n>xlarge, c5.<n>xlarge for n > 4 and run clickbench.

@rschu1ze
Copy link
Member Author

rschu1ze commented Jan 6, 2023

Some funny measurements of ClickBench on a 128-vCore / 64 physical core instance (m6i.32xlarge):

  • current HEAD: 64 threads --> user time: 3.463s
  • not artificially reducing the thread number: 128 threads --> user time: 3.677s

(averaging 6 measurements each)

So with HyperThreadin,g CPU consumption is surprisingly 6% worse. Of course, this could be because ClickBench is CPU-intensive.

I nevertheless think that the code should be made aware of HT so we don't underutilize >32-core systems with HT disabled. We can keep the existing threshold though.

@alexey-milovidov
Copy link
Member

Hyperthreading cores don't increase performance linearly.

If a query has a low number of cache misses and branch mispredictions, hyperthreading makes performance worse.
If a query has a high number of cache misses and branch mispredictions, hyperthreading makes performance better, but usually in the 10..50% range. Therefore, using all HT cores for a single query by default will worsen performance on many concurrent queries.

Cache misses mainly appear while doing large GROUP BY, DISTINCT, IN, and JOIN. But on a very large number of cores, the memory becomes the bottleneck earlier than we start using all hyper-threaded cores.

It will make sense to allow more cores by default, but queries should automatically stop using so many threads when concurrent queries start running. I can be done with #37285, but this setting is not enabled by default.

@alexey-milovidov
Copy link
Member

PS. SMT, not SMP.

@alexey-milovidov alexey-milovidov changed the title (draft) Remove SMP deactivation on big machines (draft) Remove SMT deactivation on big machines Jan 6, 2023
@rschu1ze rschu1ze changed the title (draft) Remove SMT deactivation on big machines Remove SMT deactivation on big machines Jan 22, 2023
@rschu1ze rschu1ze marked this pull request as ready for review January 22, 2023 17:30
@rschu1ze rschu1ze added the can be tested Allows running workflows for external contributors label Jan 22, 2023
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-performance Pull request with some performance improvements label Jan 23, 2023
@alexey-milovidov alexey-milovidov changed the title Remove SMT deactivation on big machines Slightly modify the detection of the number of CPU cores Jan 29, 2023
@serxa serxa self-assigned this Feb 19, 2023
@rschu1ze rschu1ze merged commit 135170b into master Feb 20, 2023
@rschu1ze rschu1ze deleted the remove-smp-limitation branch February 20, 2023 10:22
yokofly added a commit to timeplus-io/proton that referenced this pull request Jan 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
can be tested Allows running workflows for external contributors pr-performance Pull request with some performance improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants