
[pulsar-client-cpp] Excessive locking causes significant performance degradation #116

@AndrewJD79


Describe the bug
The statistics implementation in the C++ client has two concurrency issues.

  1. The ProducerStatsImpl (and ConsumerStatsImpl) classes use a single shared lock to protect access to internal data. The lock is taken for each sent or received message. Under high load this shared lock causes significant contention and performance degradation.
    The profiler shows that the sending and receiving threads block each other.

[Screenshot: original-profiling]

Since the sending and receiving functions access different subsets of members, they should be protected by different mutexes, or another approach should be chosen.
For example, after patching this issue I got about a 1/3 throughput improvement. As the screenshot below shows, the threads are waiting on I/O rather than on mutexes.
[Screenshot: pathed-profiling]
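
To make the suggestion concrete, here is a minimal sketch of splitting one shared lock into a send-side and a receive-side mutex. It is not the actual pulsar-client-cpp code: the class name `SplitLockStats`, its counters, and the method signatures are assumptions chosen for illustration, and only the standard library is used.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for ProducerStatsImpl; names and fields are illustrative only.
class SplitLockStats {
public:
    struct Snapshot {
        std::uint64_t numMsgsSent;
        std::uint64_t numBytesSent;
        std::uint64_t numAcksReceived;
        std::uint64_t numSendFailed;
    };

    // Send path: touches only the send-side counters, so it takes only sendMutex_.
    void messageSent(std::uint64_t payloadBytes) {
        std::lock_guard<std::mutex> lock(sendMutex_);
        ++numMsgsSent_;
        numBytesSent_ += payloadBytes;
    }

    // Receive path: touches only the ack-side counters, so it takes only receiveMutex_.
    void messageReceived(bool success) {
        std::lock_guard<std::mutex> lock(receiveMutex_);
        ++numAcksReceived_;
        if (!success) ++numSendFailed_;
    }

    // Reporting path (rare): takes both locks to read a consistent view.
    Snapshot snapshot() const {
        std::lock(sendMutex_, receiveMutex_);  // acquires both without deadlock
        std::lock_guard<std::mutex> sendLock(sendMutex_, std::adopt_lock);
        std::lock_guard<std::mutex> receiveLock(receiveMutex_, std::adopt_lock);
        Snapshot s{numMsgsSent_, numBytesSent_, numAcksReceived_, numSendFailed_};
        assert(s.numSendFailed <= s.numAcksReceived);  // failures are counted only on acks
        return s;
    }

private:
    mutable std::mutex sendMutex_;     // guards numMsgsSent_ and numBytesSent_
    mutable std::mutex receiveMutex_;  // guards numAcksReceived_ and numSendFailed_
    std::uint64_t numMsgsSent_ = 0;
    std::uint64_t numBytesSent_ = 0;
    std::uint64_t numAcksReceived_ = 0;
    std::uint64_t numSendFailed_ = 0;
};
```

With this layout the hot paths never contend with each other; only the periodic reporting path briefly holds both locks. Replacing the counters with `std::atomic` values would be another option if no multi-field consistency is required.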

  2. The ProducerStatsImpl implementation has a race between the destructor and the DeadlineTimer callback. Consider the following scenario:

    1. The ProducerStatsImpl destructor acquires the mutex.
    2. DeadlineTimer invokes the callback flushAndReset, which blocks on the mutex.
    3. The ProducerStatsImpl destructor calls timer.cancel, which cancels any pending operation but cannot cancel the callback that is already executing from step 2.
    4. The ProducerStatsImpl destructor releases the mutex.
    5. The DeadlineTimer callback acquires the mutex.
    6. The ProducerStatsImpl destructor destroys the object.
    7. The DeadlineTimer callback accesses deallocated memory.
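
One common way to close this kind of race is to have the callback hold only a `std::weak_ptr` to the stats object and promote it before touching any member. The sketch below illustrates that idea under stated assumptions: `ManualTimer` is a hypothetical stand-in for DeadlineTimer that lets a handler be "already dispatched" by hand, and `SafeStats` is an illustrative class, not the real ProducerStatsImpl.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for DeadlineTimer: stores one handler at a time.
class ManualTimer {
public:
    void asyncWait(std::function<void()> handler) {
        assert(handler);  // an empty handler would be a caller bug
        handler_ = std::move(handler);
    }
    // Drops a handler that has not been dispatched yet.
    void cancel() { handler_ = nullptr; }
    // Simulates the event loop dequeuing the handler: after this, cancel() cannot stop it.
    std::function<void()> takeDispatched() {
        std::function<void()> h;
        h.swap(handler_);
        return h;
    }

private:
    std::function<void()> handler_;
};

// Illustrative stats object whose timer callback never dereferences a dead object.
class SafeStats : public std::enable_shared_from_this<SafeStats> {
public:
    static std::shared_ptr<SafeStats> create(ManualTimer& timer, std::atomic<int>& flushes) {
        std::shared_ptr<SafeStats> self(new SafeStats(timer, flushes));
        self->scheduleFlush();  // needs a live shared_ptr, so not done in the constructor
        return self;
    }

    ~SafeStats() { timer_.cancel(); }

private:
    SafeStats(ManualTimer& timer, std::atomic<int>& flushes) : timer_(timer), flushes_(flushes) {}

    void scheduleFlush() {
        std::weak_ptr<SafeStats> weak = shared_from_this();
        timer_.asyncWait([weak] {
            std::shared_ptr<SafeStats> self = weak.lock();
            if (!self) return;      // object already destroyed (or being destroyed)
            self->flushAndReset();  // self keeps the object alive for the whole call
            self->scheduleFlush();  // re-arm for the next interval
        });
    }

    void flushAndReset() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++flushes_;  // external counter so callers can observe flushes
    }

    ManualTimer& timer_;  // must outlive this object
    std::atomic<int>& flushes_;
    std::mutex mutex_;
};
```

Because the callback either owns a `shared_ptr` for its whole duration or finds the `weak_ptr` expired, the destructor can no longer run between steps 2 and 7. The trade-off is that the object must be managed by `std::shared_ptr`, and its destructor may run on the timer thread if the callback holds the last reference.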

Are you willing to accept a PR for issue number one, or for both?
