Description
When profiling librdkafka we observed a significant amount of CPU time spent creating and destroying mutexes. In our test, built with gcc 4.8 and running on CentOS 7, around 15% of the time in the broker main thread was spent creating and destroying mutexes.
Our investigation showed that most of these mutexes were being created for reference counters on buffers. There is an atomics-based implementation of refcounts, but it appears to be enabled only on Windows builds. We experimented with using the atomics implementation whenever the compiler supports atomics (which gcc 4.8 does) and observed a throughput increase of around 20%.
Additionally, local queues were used for reading from the consumer queues. The local queue object also created and destroyed mutexes, even though this was not necessary. We experimented with not using mutexes for local queues, but saw no noticeable improvement from that change.
How to reproduce
We ran around 10,000 messages per second through our server running librdkafka, then ran perf to analyse how CPU time was being consumed.
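For reference, a typical way to capture such a profile with Linux perf looks like the following (the process name and duration are placeholders; the issue does not specify the exact invocation used):

```shell
# Sample on-CPU call stacks of the running server for 30 seconds
perf record -g -p "$(pidof our_server)" -- sleep 30

# Summarize where CPU time went; mutex churn shows up as
# pthread_mutex_init / pthread_mutex_destroy in the call graphs
perf report --stdio
```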
Checklist
Please provide the following information:
librdkafka version (release number or git tag): v0.11.4.2
Refcounts were initially atomic-based but were changed to mutexes in commit 880ae23; the commit message says that a proper performance comparison should be made between atomic- and mutex-based refcounts, but that was never done.
Two years later the Windows build was changed to use atomics, since locks are more costly on Windows: c417b55
We should probably switch the non-Windows refcounts back to atomics as well, but we'll do some measurements first, so this will not be included in the upcoming v0.11.5 release.
logs (with debug=.. as necessary) from librdkafka - Not relevant