
PR - Ticket 51072 - improve autotune defaults #4126

Closed
389-ds-bot opened this issue Sep 13, 2020 · 18 comments
Labels: merged, pr

Comments


389-ds-bot commented Sep 13, 2020

Cloned from Pagure Pull-Request: https://pagure.io/389-ds-base/pull-request/51073


Bug Description: we have learnt that the CPU autotuning is too aggressive, potentially
decreasing throughput due to overhead in context switching and lock contention, and
that our memory tuning is not aggressive enough, at only 10% of the system memory.
Additionally, in containers we have access to memory limits and reservations, so we
can choose to be even more aggressive in our selection.

Fix Description: Change thread tuning to match the number of threads available on
the system. Change memory tuning to 25% of system memory by default. Finally, add
an environment variable, DS_MEMORY_PERCENTAGE, to containers, allowing more
aggressive tuning to be set. Later this could be set to a higher default value.
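
For illustration only, a minimal sketch of the tuning described above (this is not the actual 389-ds-base code; the function name, the 1-90 clamp, and the use of sysconf are assumptions for the example). It defaults the memory target to 25% of system memory and lets DS_MEMORY_PERCENTAGE override the percentage in containers:

/*
 * Illustrative sketch only: not the 389-ds-base implementation.
 * Default to 25% of system memory (was 10%), with a container
 * override via the DS_MEMORY_PERCENTAGE environment variable.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>

static uint64_t
autotune_memory_target(void)
{
    /* Total system memory in bytes (ignores cgroup limits for brevity). */
    uint64_t pages = (uint64_t)sysconf(_SC_PHYS_PAGES);
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGE_SIZE);
    uint64_t total = pages * page_size;

    /* New default: 25% of system memory. */
    long percent = 25;

    /* Container override, clamped to a sane range (assumed bounds). */
    const char *env = getenv("DS_MEMORY_PERCENTAGE");
    if (env != NULL) {
        long p = strtol(env, NULL, 10);
        if (p >= 1 && p <= 90) {
            percent = p;
        }
    }

    return (total / 100) * (uint64_t)percent;
}

int
main(void)
{
    printf("autotuned memory target: %llu bytes\n",
           (unsigned long long)autotune_memory_target());
    return 0;
}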

Resolves: #4125

Author: William Brown william@blackhats.net.au

Review by: ???

389-ds-bot added the merged and pr labels on Sep 13, 2020

Comment from firstyear (@Firstyear) at 2020-05-07 06:48:14

It's worth noting that even on basic tests, this was significantly faster on my machine:

before:
35 passed, 3 skipped, 150 warnings in 239.14s (0:03:59)
after:
35 passed, 3 skipped, 150 warnings in 173.03s (0:02:53)

That's roughly a 25% speedup.
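
(As a quick check of the figures above: 239.14 s down to 173.03 s is a reduction of (239.14 - 173.03) / 239.14 ≈ 27.6% in wall-clock time, so "roughly 25%" is, if anything, slightly conservative.)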


Comment from tbordaz (@tbordaz) at 2020-05-07 10:50:58

Is 'threads' the final nsslapd-threadnumber?
If yes, it looks very high.

The fix looks good but I have a doubt. The default was 30. Unless workers are very slow or doing very long jobs, I would expect 30 workers to be enough for most loads with rapid operations. By any chance do you have searchrate/modrate numbers that show #workers > 50 or 100 are beneficial?


Comment from firstyear (@Firstyear) at 2020-05-07 12:37:01

@tbordaz it sets the number of threads based on how many CPU hardware threads are presented by the OS. So if you have a 4-core machine, it's 4. If it's 256, it's 256. If you have an 8-core machine with hyperthreads, it would be 16.

The <512 there is a cap that we don't exceed. Not a minimum.
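
For readers skimming the thread, a minimal sketch of that CPU-matched tuning (not the actual patch code; the function name and the sysconf call are assumptions, with the 512 ceiling mentioned above):

/*
 * Illustrative sketch only: worker thread autotuning as described above.
 * Use the number of online CPU hardware threads reported by the OS,
 * capped at 512 (a ceiling, not a minimum).
 */
#include <stdio.h>
#include <unistd.h>

static long
autotune_threads(void)
{
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);
    if (cpus < 1) {
        cpus = 1;    /* fall back to a single worker if detection fails */
    }
    if (cpus > 512) {
        cpus = 512;  /* the <512 cap, never a minimum */
    }
    return cpus;
}

int
main(void)
{
    printf("autotuned worker threads: %ld\n", autotune_threads());
    return 0;
}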


Comment from vashirov (@vashirov) at 2020-05-07 13:08:44

Another data point: search rate on a 4 CPU machine, 1 client with 1-10 threads:

Client threads   4 workers    24 workers
 1                5837.800     6160.100
 2                8549.308    11939.750
 3                7687.200    17349.333
 4                9962.475     7191.825
 5               16807.538     8386.949
 6               17051.641     9447.744
 7               17788.385     9990.026
 8               18838.359    10417.077
 9               22419.949    10889.000
10               23950.950    11561.769


Comment from firstyear (@Firstyear) at 2020-05-07 13:46:29

@vashirov wow, those numbers are stunning. I think the scaling at the high end (10 client threads) is more important than the low numbers, since we need to consider high concurrency as a key workload for us.


Comment from lkrispen (@elkris) at 2020-05-07 16:25:05

Yes, it looks good at first sight. But you need to see what happens with a mixed load; I think write operations can easily block 4 threads and delay all binds and searches.

We should not optimize for one specific load pattern.


Comment from firstyear (@Firstyear) at 2020-05-08 01:32:18

I think that this may not be true going forward, as with LMDB on a concurrent cache design we can only have a single active writer, which means we could distinguish between read operations and write operations in the thread pool, to guarantee that bind/read is always separate.

The flip side of this is that we could also have many, many readers stalling writers, causing them to stall too.

Saying this, I still agree that @vashirov can do some more of his excellent load testing to check this patch :)


Comment from tbordaz (@tbordaz) at 2020-05-12 10:15:16

Thanks for all these runs !

@vashirov for the mixed load, is it a sync operation? Is it accounting MOD+SEARCH as one operation? It looks like the search rate is hidden by the mod rate.

For MODs, at the moment we cannot really conclude whether higher or lower worker counts are a benefit. For searches, there is still an unexpected, significant negative impact from increasing #workers.


Comment from vashirov (@vashirov) at 2020-05-12 10:26:14

@vashirov for the mixed load, is it a sync operation? Is it accounting MOD+SEARCH as one operation? It looks like the search rate is hidden by the mod rate.

It was a SRCH followed by the MOD:

[12/May/2020:08:23:26.169046247 +0200] conn=1868 op=3715 SRCH base="dc=example,dc=com" scope=2 filter="(uid=user.8306)" attrs="givenName sn mail"
[12/May/2020:08:23:26.169258473 +0200] conn=1868 op=3715 RESULT err=0 tag=101 nentries=1 etime=0.000289852
[12/May/2020:08:23:26.169500575 +0200] conn=1868 op=3716 MOD dn="uid=user.8306,ou=People,dc=example,dc=com"
[12/May/2020:08:23:26.181633127 +0200] conn=1868 op=3716 RESULT err=0 tag=103 nentries=0 etime=0.012312424

I will add another test with async SRCH and MOD.


Comment from firstyear (@Firstyear) at 2020-05-13 06:45:54

My analysis of what this shows is that search seems to improve with more threads, but something causes contention that leads to the loss - so fewer threads means less contention, which is why the CPU-matched thread count gives better search throughput and latency. It appears that in the mixed workload our writes heavily impact the searches, so I think our write path is likely preventing further search performance improvement. Regardless, it didn't make things worse, so I'm of course in favour of this change :)


Comment from mreynolds (@mreynolds389) at 2020-05-13 15:50:29

IMHO LGTM; like William said, it's not hurting the numbers. It's definitely an improvement, and if we need to fine-tune at a later date, so be it.


Comment from mreynolds (@mreynolds389) at 2020-05-13 15:58:09

Actually, there is something I'd like to see tested with this change: a machine with more CPUs/cores.

So we tested a 4-core machine and setting the thread number to 4 was great, but what about a 16-core system with varying worker threads? Do we see the same improvement if we set the thread number to 16 vs 32 or 8 or 4?

@vashirov - would it be hard to reserve a system with this hardware and run one more round of tests?


Comment from mreynolds (@mreynolds389) at 2020-06-02 18:29:11

When I did an investigation for another potential customer a few years ago, I also saw that setting the thread number to the number of cores gave the best performance. I think this is definitely an improvement over what we had. Ack.


Comment from firstyear (@Firstyear) at 2020-06-03 01:13:21

rebased onto 5eacf45e7caa50de2721f85d7fbee58767bcb8f0


Comment from firstyear (@Firstyear) at 2020-06-03 01:15:22

rebased onto 9a06935


Comment from firstyear (@Firstyear) at 2020-06-03 01:16:00

Pull-Request has been merged by Firstyear


Patch
51073.patch
