Slightly modify the detection of the number of CPU cores #44973

rschu1ze · 2023-01-06T10:39:39Z

AMD offers 48/67/64 core systems (Milan), and soon CPUs with up to 96 cores (Genoa). Intel also sells Xeon CPUs with >32 cores. If the system has >= 32 logical cores then the current logic is to use only half of them. If SMT (HyperThreading) is on, then ClickHouse will effectively utilize only the physical cores (i.e. "disable" HyperThreading by means of the software).

However, if SMT is disabled (logical == physical core count), which is not uncommon, then ClickHouse uses only half the available physical cores.

With this PR, the 32-core threshold is retained but the number of used cores is only reduced if SMT is on.

Changelog category (leave one):

Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Remove the limitation that on systems with >=32 cores and SMT disabled ClickHouse uses only half of the cores

This might be a bit controversial but 1. AMD offers 48/67/64 (true) core systems already (Milan, (*)), up to 96 cores/CPU soon (Genoa), 2. Intel sells a few Xeon CPUs with > cores (**), and we shouldn't artificially limit core utilization on such sytems. (*) https://en.wikipedia.org/wiki/Epyc (**) https://www.intel.com/content/www/us/en/products/details/processors/xeon/scalable/platinum/products.html

rschu1ze · 2023-01-06T10:42:58Z

@nickitat Do you know if SMP/hyperthreading generally benefits/deteriorates performance? In case of the latter, it will be better to check programmatically if SMP is on and halve the max used cores only in this case.

nickitat · 2023-01-06T11:17:00Z

looks like generally (e.g. if we take the whole set of clickbench queries) we should benefit from hyper threading. it was the case when we brought it back for 16 cores instances. it is because normal queries do have stalls on memory access and branch mispredictions. only heavy number crunching queries without lots of memory accesses (like Q29 from CB) could probably slow down, but it is fairly marginal case.
so if we run CB on machines with different number of cores I would bet that we will see better relative time with HT for both intel and amd even for cpu_count >= 32.

rschu1ze · 2023-01-06T11:38:46Z

Tnx, this makes sense.
Even if HT isn't beneficial, overhead will be low I guess (1-3%?).
(I could actually not find not much material how HT impacts other DBs, so I guess it is not a big point of contention.)

@alexey-milovidov Feel free to merge or advice further.

nickitat · 2023-01-06T11:40:38Z

imo it worth to do measurements anyway to be confident in what we are doing. e.g. launch all c6a.<n>xlarge, c5.<n>xlarge for n > 4 and run clickbench.

rschu1ze · 2023-01-06T20:13:29Z

Some funny measurements of ClickBench on a 128-vCore / 64 physical core instance (m6i.32xlarge):

current HEAD: 64 threads --> user time: 3.463s
not artificially reducing the thread number: 128 threads --> user time: 3.677s

(averaging 6 measurements each)

So with HyperThreadin,g CPU consumption is surprisingly 6% worse. Of course, this could be because ClickBench is CPU-intensive.

I nevertheless think that the code should be made aware of HT so we don't underutilize >32-core systems with HT disabled. We can keep the existing threshold though.

alexey-milovidov · 2023-01-06T21:50:51Z

Hyperthreading cores don't increase performance linearly.

If a query has a low number of cache misses and branch mispredictions, hyperthreading makes performance worse.
If a query has a high number of cache misses and branch mispredictions, hyperthreading makes performance better, but usually in the 10..50% range. Therefore, using all HT cores for a single query by default will worsen performance on many concurrent queries.

Cache misses mainly appear while doing large GROUP BY, DISTINCT, IN, and JOIN. But on a very large number of cores, the memory becomes the bottleneck earlier than we start using all hyper-threaded cores.

It will make sense to allow more cores by default, but queries should automatically stop using so many threads when concurrent queries start running. I can be done with #37285, but this setting is not enabled by default.

alexey-milovidov · 2023-01-06T21:51:02Z

PS. SMT, not SMP.

src/Common/getNumberOfPhysicalCPUCores.cpp

porting ClickHouse/ClickHouse#44973 on Feb 20, 2023

Disable HyperThreading on big machines

a733303

alexey-milovidov changed the title ~~(draft) Remove SMP deactivation on big machines~~ (draft) Remove SMT deactivation on big machines Jan 6, 2023

rschu1ze changed the title ~~(draft) Remove SMT deactivation on big machines~~ Remove SMT deactivation on big machines Jan 22, 2023

rschu1ze added 2 commits January 22, 2023 16:22

Merge remote-tracking branch 'origin/master' into remove-smp-limitation

d7416f4

Some fixups

d07b7ce

rschu1ze marked this pull request as ready for review January 22, 2023 17:30

rschu1ze added the can be tested Allows running workflows for external contributors label Jan 22, 2023

robot-clickhouse-ci-1 added the pr-performance Pull request with some performance improvements label Jan 23, 2023

rschu1ze added 3 commits January 23, 2023 19:48

Fix ARM build

1a1fa4a

Yet another build fix

36d6af0

Fix style

094e42d

alexey-milovidov reviewed Jan 29, 2023

View reviewed changes

src/Common/getNumberOfPhysicalCPUCores.cpp Show resolved Hide resolved

alexey-milovidov changed the title ~~Remove SMT deactivation on big machines~~ Slightly modify the detection of the number of CPU cores Jan 29, 2023

rschu1ze added 2 commits January 29, 2023 17:25

Handle obscure case that /proc/cpuinfo is unmounted

1bf280c

Update getNumberOfPhysicalCPUCores.cpp

33a719b

serxa self-assigned this Feb 19, 2023

serxa reviewed Feb 19, 2023

View reviewed changes

src/Common/getNumberOfPhysicalCPUCores.cpp Show resolved Hide resolved

src/Common/getNumberOfPhysicalCPUCores.cpp Show resolved Hide resolved

serxa approved these changes Feb 20, 2023

View reviewed changes

rschu1ze merged commit 135170b into master Feb 20, 2023

rschu1ze deleted the remove-smp-limitation branch February 20, 2023 10:22

yokofly added a commit to timeplus-io/proton that referenced this pull request Jan 31, 2024

Slightly modify the detection of the number of CPU cores

d833879

porting ClickHouse/ClickHouse#44973 on Feb 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slightly modify the detection of the number of CPU cores #44973

Slightly modify the detection of the number of CPU cores #44973

rschu1ze commented Jan 6, 2023 •

edited

rschu1ze commented Jan 6, 2023

nickitat commented Jan 6, 2023

rschu1ze commented Jan 6, 2023 •

edited

nickitat commented Jan 6, 2023 •

edited

rschu1ze commented Jan 6, 2023

alexey-milovidov commented Jan 6, 2023

alexey-milovidov commented Jan 6, 2023

Slightly modify the detection of the number of CPU cores #44973

Slightly modify the detection of the number of CPU cores #44973

Conversation

rschu1ze commented Jan 6, 2023 • edited

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

rschu1ze commented Jan 6, 2023

nickitat commented Jan 6, 2023

rschu1ze commented Jan 6, 2023 • edited

nickitat commented Jan 6, 2023 • edited

rschu1ze commented Jan 6, 2023

alexey-milovidov commented Jan 6, 2023

alexey-milovidov commented Jan 6, 2023

rschu1ze commented Jan 6, 2023 •

edited

rschu1ze commented Jan 6, 2023 •

edited

nickitat commented Jan 6, 2023 •

edited