
Issue 4598 - Copy attribute syntax hashtables to each worker thread #5693

Open · wants to merge 1 commit into main
Conversation

mreynolds389
Contributor

Description:

There is a lot of contention around the rwlocks for the attribute syntax hashtables. Since syntaxes rarely change, we can just keep a copy in each thread and avoid locking.

Then the worker can check for changes on each loop and rebuild the hashtables as needed.

Did some code cleanup in schema.c

relates: #4598
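
As a rough illustration of the approach, here is a minimal sketch, assuming a hypothetical global version counter and hypothetical copy/free helpers (none of these names are the actual 389-ds symbols):

    /* Sketch only: illustrative names, not the real 389-ds API. */
    #include <stdint.h>
    #include <stdatomic.h>

    /* Bumped by the (rare) writer whenever a syntax changes. */
    static _Atomic uint64_t g_syntax_version = 1;

    /* Hypothetical helpers standing in for the real copy/cleanup code. */
    extern void copy_global_syntax_tables(void **name2asi, void **oid2asi);
    extern void free_local_syntax_tables(void *name2asi, void *oid2asi);

    typedef struct worker_ctx {
        uint64_t syntax_version; /* version the local copies were built from */
        void *name2asi;          /* thread-local copy of the name hashtable */
        void *oid2asi;           /* thread-local copy of the OID hashtable */
    } worker_ctx;

    /* Called at the top of each worker loop: rebuild the local copies only
     * when the global version has moved; otherwise every lookup in this
     * iteration runs against the local tables with no locking at all. */
    static void
    worker_check_syntax_tables(worker_ctx *ctx)
    {
        uint64_t current = atomic_load(&g_syntax_version);

        if (ctx->syntax_version != current) {
            free_local_syntax_tables(ctx->name2asi, ctx->oid2asi);
            copy_global_syntax_tables(&ctx->name2asi, &ctx->oid2asi);
            ctx->syntax_version = current;
        }
    }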

@progier389
Contributor

Still reviewing but I have a general comment:

  • I would have put the code that checks for version changes and duplicates the hash tables (in connection.c) in a separate function (and in libslapd rather than in ns-slapd).
    And the same goes for the cleanup code.
    The rationale is that we will likely want to reuse that code later in other threads (import threads and maybe some plugin threads); see the sketch below.
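
A sketch of what that factoring might look like; the function names and the libslapd placement below are assumptions drawn from this suggestion, not code from the PR (the worker_ctx type is reused from the sketch above):

    /* Hypothetical libslapd API; names are illustrative only. */

    /* Refresh the caller's thread-local syntax tables if the global
     * version has changed; a cheap no-op otherwise. */
    void slapi_attr_syntax_tables_refresh(worker_ctx *ctx);

    /* Free the caller's thread-local copies (e.g. at thread exit). */
    void slapi_attr_syntax_tables_cleanup(worker_ctx *ctx);

The worker loop in connection.c (and later any import or plugin thread) would then reduce to a single refresh call per iteration plus one cleanup call at thread exit.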

@mreynolds389
Contributor Author

Still reviewing but I have a general comment:

This is technically still a WIP. I still need to do some performance testing with it. I just wanted to get the working code into a PR :-)

* I would have put the code that checks for version changes and duplicates the hash tables (in connection.c) in a separate function (and in libslapd rather than in ns-slapd).
  And the same goes for the cleanup code.
  The rationale is that we will likely want to reuse that code later in other threads (import threads and maybe some plugin threads)

Sounds good, I'll work on that next...

Contributor

@progier389 left a comment


Except for a few minor points, it looks fine.
I hope we will see the impact in the performance tests.

P.S.: Avoid doing massive code cleanup in a PR that touches sensitive code:
it really makes the review much harder.
For example, I am not absolutely sure what was really changed in schema.c,
because the cleanup hides the functional changes! :-(
(I think it is all around the DSE_SCHEMA_USE_GLOBAL flag, but I am not absolutely sure.)

@mreynolds389
Contributor Author

Except for a few minor points, it looks fine. I hope we will see the impact in the performance tests.

P.S.: Avoid doing massive code cleanup in a PR that touches sensitive code: it really makes the review much harder. For example, I am not absolutely sure what was really changed in schema.c, because the cleanup hides the functional changes! :-( (I think it is all around the DSE_SCHEMA_USE_GLOBAL flag, but I am not absolutely sure.)

I know, I know. The cleanup started off simple and small, and then it turned into a can of worms. But yes, in schema.c the main change was to add a flag so we always update/access the global hash tables and not the thread-local hash tables (roughly as sketched below).
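
A hedged illustration of that kind of dispatch: DSE_SCHEMA_USE_GLOBAL is the flag named above, but the function, the flag value, and the table variables here are invented for the example.

    /* Illustrative only; not the real schema.c code. */
    #define DSE_SCHEMA_USE_GLOBAL 0x1 /* hypothetical value */

    extern void *global_name2asi; /* hypothetical global table */

    static void *
    syntax_table_select(worker_ctx *ctx, int flags)
    {
        if (flags & DSE_SCHEMA_USE_GLOBAL) {
            /* Schema updates must always read/write the global tables. */
            return global_name2asi;
        }
        /* Ordinary lookups can use the lock-free thread-local copy. */
        return ctx->name2asi;
    }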

@mreynolds389
Contributor Author

All changes made. I hope to get performance testing done next week.

Contributor

@progier389 left a comment

LGTM

@Firstyear
Contributor

There is a lot of contention around the rwlocks for the attribute syntax hashtables. Since syntaxes rarely change, we can just keep a copy in each thread and avoid locking.

Wouldn't contention on an rwlock imply that there are attempts to frequently write to these values?

Rwlocks with .read() are effectively free since it becomes an atomic ref count.

How did you determine there was contention here?

@mreynolds389
Contributor Author

There is a lot of contention around the rwlocks for the attribute syntax hashtables. Since syntaxes rarely change, we can just keep a copy in each thread and avoid locking.

Wouldn't contention on an rwlock imply that there are attempts to frequently write to these values?

Rwlocks with .read() are effectively free since it becomes an atomic ref count.

How did you determine there was contention here?

Thierry did this investigation in the parent issue:

#4598

Using modrate, removing these rwlocks gave a 5% performance increase. I'm not sure what tool he used to measure the contention, but that report is also in the parent issue.

@progier389
Contributor

I do not think that we are speaking of the usual lock contention here (as the lock is never held in write mode),
but rather memory cache contention.
FYI: I also noticed the same thing when I ran the performance tests for the lmdb import.
In the final tests, once the dn cache was disabled, according to '/usr/bin/perf c2c' the top four contentions were in the rwlock below attr_syntax_get_by_name_locking_optional (__pthread_rwlock_unlock and ___pthread_rwlock_rdlock).
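
For background, one plausible mechanism: even a read-only lock/unlock pair performs atomic read-modify-writes on the lock word to maintain the reader count, so with many concurrent readers the lock's cache line bounces between cores, which is exactly the kind of sharing event perf c2c flags. The hot path would look roughly like this (illustrative, not the actual 389-ds source; the table and lookup helper are hypothetical):

    #include <pthread.h>

    static pthread_rwlock_t syntax_lock = PTHREAD_RWLOCK_INITIALIZER;

    extern void *name2asi;                                     /* hypothetical table */
    extern void *hashtable_lookup(void *ht, const char *key);  /* hypothetical */

    static void *
    lookup_syntax(const char *name)
    {
        void *asi;

        pthread_rwlock_rdlock(&syntax_lock);   /* atomic RMW on the lock word */
        asi = hashtable_lookup(name2asi, name);
        pthread_rwlock_unlock(&syntax_lock);   /* second atomic RMW */
        return asi;
    }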

@Firstyear
Contributor

I do not think that we are speaking of the usual lock contention here (as the lock is never held in write mode), but rather memory cache contention. FYI: I also noticed the same thing when I ran the performance tests for the lmdb import. In the final tests, once the dn cache was disabled, according to '/usr/bin/perf c2c' the top four contentions were in the rwlock below attr_syntax_get_by_name_locking_optional (__pthread_rwlock_unlock and ___pthread_rwlock_rdlock).

Memory cache contention doesn't make sense though, because the pages will be in the shared state in the various CPU caches. By splitting this we'll be putting more pressure on the CPU caches, because we are no longer sharing the pages.

So I'm still really curious about this and about the assumed cause of what is occurring.

@progier389
Contributor

As I told you, the perf c2c tool reports these as the top cache contentions, so there is contention even if we do not understand why it happens! ;-)
That said, I can understand that you would like to know exactly why that is the case. I am a bit curious too (but not surprised, as it is not the first time I have seen that a mostly-read rwlock was surprisingly costly).
For my part, I learned to trust the "/usr/bin/perf" tool because it helped me understand why, in my first tests, the lmdb online import was ten times slower than the offline one (while the import algorithm is exactly the same).
Anyway, I guess we will see the result when we are able to run the performance tests on the new code.

@Firstyear
Contributor

I would certainly trust perf here, but I think understanding why something happens is going to give a better solution than just splitting up a structure and copying it (which might reduce the contention, but it's going to cause data duplication in the cache, meaning we get worse cache sharing and overall worse performance).

So understanding the problem seems like a better first step :)

@mreynolds389
Contributor Author

mreynolds389 commented Mar 16, 2023

Initial Performance Results

This was all done on a single system (client and server) with 16 cores.

Import

1-2% performance increase (only tested bdb)

Modrate

modrate --hostname localhost --port 389 --bindDN 'cn=dm' --bindPassword XXXXXXX --entryDN "uid=user[1-10000],ou=People,dc=example,dc=com" --attribute givenname --attribute mail --valueLength 12 --numThreads 5

No improvement

Searchrate

searchrate --hostname localhost --port 389 --bindDN 'cn=dm' --bindPassword XXXXXX --baseDN dc=example,dc=com --scope sub --filter FILTER --attribute givenName --attribute sn --attribute mail --numThreads 10

Single Filter: (uid=[1-10000])

19% improvement

Compound Filter: (&(uid=[1-10000])(objectclass=posixaccount))

15% improvement

Conclusion

The original issue said "modrate" got an improvement, but I don't see that. However, the search improvement is significant!

@mreynolds389
Contributor Author

mreynolds389 commented Mar 16, 2023

Also... For the import test I also tried adding hashtable copies to the bdb_import_foreman(), bdb_import_producer(), and bdb_public_bdb_import_main() threads, but it did not seem to help import performance (with bdb).

@mreynolds389
Contributor Author

Well, I was playing around with the test cases, and when I reran the same tests as above, the performance improvement was very small. Sigh. I had run those tests multiple times with each version and the results were very consistent (15-19% improvement); now they are not. I think I need to wait until we get those dedicated lab machines to run these tests correctly (separating the client from the server).

@mreynolds389
Contributor Author

OK, testing on the large lab machines shows no noticeable improvement with this fix :-( That doesn't mean this should be closed, as we are most likely hitting another bottleneck, so this fix "could" improve things down the road once other issues are addressed.

Description:

There is a lot of contention around the rwlocks for the attribute
syntax hashtables.  Since syntaxes rarely change, we can just keep a
copy in each thread and avoid locking.

Then the worker can check for changes on each loop and rebuild
the hashtables as needed.

Did some code cleanup in schema.c

relates: 389ds#4598

Reviewed by: progier (Thanks!)