This repository contains a set of corrected and extended ring-buffer implementations for the single-consumer, multiple-producer (SCMP) setting.
It is derived from an existing single-producer, single-consumer (SPSC) ring buffer implementation. The original SPSC code is correct under its intended concurrency model, but it exhibits concurrency bugs when used with multiple producers, primarily data races and incorrect publication ordering among concurrent producers.
In particular, when multiple producers try to reserve and publish slots concurrently, the original design does not provide a safe mechanism to:
- serialize slot reservation (avoid multiple producers writing the same slot),
- enforce in-order publication (ensure the consumer does not observe partially-written entries), or
- coordinate tail/commit progression across producers.
This repository provides multiple alternative fixes and a set of further optimizations.
```
.
├── data                  # Results
├── include
│   ├── common.hpp        # Common functions
│   ├── lock.hpp          # Simple locking
│   ├── notify.hpp        # Wait-for-notification
│   ├── optimized.hpp     # Optimized implementation
│   ├── single.hpp        # Single producer (original)
│   ├── spin.hpp          # Busy waiting for prior commits
│   ├── tail.hpp          # Change tail pointer to non-atomic
│   ├── yield.hpp         # Yielding in spin lock
│   └── free.hpp          # Lock-free producer (same as `single` but with `&` wrapping)
└── src
    ├── main.cpp          # Driver application
    └── single.cpp        # Driver for single producer
```

To check differences between implementations, run `diff` directly. For example:

```
diff include/spin.hpp include/notify.hpp
```

The target setting is multiple producers inserting into a shared ring buffer, with a single consumer draining it. In this setting, the original SPSC ring buffer (in `include/single.hpp`) is unsafe because it implicitly assumes that only one producer updates producer-side indices and publishes writes; the sketch after the following list marks where that assumption breaks.
With multiple producers, typical failure modes include:
- Duplicate slot reservation: two producers choose the same slot index and overwrite each other.
- Out-of-order publication: a later producer makes its slot visible before an earlier producer has finished writing, allowing the consumer to read incomplete or inconsistent data.
- Incorrect index advancement: concurrent updates to producer-side indices are not coordinated, causing missed entries or buffer corruption.
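To make the failure concrete, here is a minimal sketch of a typical SPSC producer path in the style of the original design. All identifiers (`SpscRing`, `head_`, `tail_`, `buf_`) are illustrative assumptions, not the repository's actual names; the point is only to show where the single-producer assumption matters.

```cpp
// Minimal SPSC-style producer sketch (assumed names, not the repository's
// code). Safe with one producer; with two, both can load the same `h`,
// write the same slot, and overwrite each other's element.
#include <atomic>
#include <cstddef>

template <typename T, size_t N>
struct SpscRing {
    T buf_[N];
    std::atomic<size_t> head_{0};  // producer-side index (next slot to write)
    std::atomic<size_t> tail_{0};  // consumer-side index (next slot to read)

    bool push(const T& v) {
        size_t h = head_.load(std::memory_order_relaxed);   // (1) read index
        if (h - tail_.load(std::memory_order_acquire) == N)
            return false;                                    // buffer full
        buf_[h % N] = v;                                     // (2) write slot
        head_.store(h + 1, std::memory_order_release);       // (3) publish
        return true;
    }
};
```

Steps (1)–(3) form a read-modify-write sequence on `head_` that is only safe when exactly one thread executes it; with several producers they interleave freely, producing the failure modes listed above.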
We implement three producer-side coordination strategies to resolve the above issues:
- **Locking (`lock`)**: Use a mutex to gain exclusive access to the producer critical section (reserve slot, write element, publish/advance tail).
  - Pros: simplest correctness argument; good at low producer counts.
  - Cons: contention increases rapidly as the producer count scales.
- **Spin-lock-style commit ordering (`spin`)**: Producers atomically claim space (reserve a slot) and then busy-spin until all earlier producers have published their writes, enforcing ordered commits (see the sketch after this list).
  - Pros: avoids kernel transitions; can be fast at moderate contention.
  - Cons: wastes CPU under high contention.
- **Wait-for-notification (`notify`)**: Producers atomically claim space, then block (or wait efficiently) until they are notified that earlier producers have published, instead of spinning.
  - Pros: reduces wasted CPU cycles under contention.
  - Cons: higher coordination overhead; sensitive to the notification strategy.
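As a concrete illustration of the reserve-then-ordered-publish idea behind `spin`, here is a minimal sketch. The identifiers (`McspRing`, `claim_`, `published_`, `read_`) and the protocol are deliberately simplified assumptions for illustration; they do not reproduce the code in `include/spin.hpp`.

```cpp
// Sketch of multi-producer reserve + ordered publish (assumed names; not the
// repository's actual API). Producers claim a slot with fetch_add, write it,
// then wait for all earlier slots to be published before publishing theirs.
#include <atomic>
#include <cstddef>

template <typename T, size_t N>
struct McspRing {
    T buf_[N];
    std::atomic<size_t> claim_{0};      // next slot any producer may reserve
    std::atomic<size_t> published_{0};  // consumer may read slots below this
    std::atomic<size_t> read_{0};       // consumer position

    void push(const T& v) {
        size_t slot = claim_.fetch_add(1, std::memory_order_relaxed); // reserve
        // Wait until the consumer has freed this physical slot.
        while (slot - read_.load(std::memory_order_acquire) >= N) { /* spin */ }
        buf_[slot % N] = v;                                           // write
        // Enforce in-order publication: earlier producers must publish first.
        // (`notify` would block here instead of spinning; `lock` serializes
        // the whole reserve/write/publish sequence with a mutex.)
        while (published_.load(std::memory_order_acquire) != slot) { /* spin */ }
        published_.store(slot + 1, std::memory_order_release);        // publish
    }

    bool pop(T& out) {   // single consumer
        size_t r = read_.load(std::memory_order_relaxed);
        if (r == published_.load(std::memory_order_acquire))
            return false;                                             // empty
        out = buf_[r % N];
        read_.store(r + 1, std::memory_order_release);
        return true;
    }
};
```

Because each producer publishes only after every earlier slot is published, the consumer can safely read everything below `published_` without ever observing a gap or a partially written entry.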
In addition to correctness fixes, we provide an optimized configuration (include/optimized.hpp) with several micro- and macro-optimizations:
- **Replace modulo operations with bit masking**: When `RING_SIZE` is a power of two, replace `idx % RING_SIZE` with `idx & (RING_SIZE - 1)` (a combined sketch of this and the yield fallback follows the list).
- **Make `SafeTail` non-atomic**: Under the chosen publication protocol, `SafeTail` can be demoted from atomic to non-atomic to reduce synchronization overhead (only safe if the specific ordering guarantees are preserved).
- **Relax memory barriers**: Some implementations include stronger-than-necessary fences. The optimized version relaxes barriers while preserving correctness (the trade-off is architecture- and compiler-sensitive, so review changes carefully when porting).
- **Adaptive strategy (spin → yield under contention)**: When contention is high, spinning becomes wasteful. The `yield` variant yields the CPU after a threshold, improving system-wide throughput and reducing tail latency.
- **Use locking upon resource overcommitment**: When the system is overcommitted (for example, `N` cores with `N+1` runnable producer threads), aggressive spinning can degrade performance. The optimized strategy can fall back to locking or blocking behavior in these regimes.
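The masking and yield-fallback points lend themselves to a short sketch. `RING_SIZE`, `SPIN_LIMIT`, and the helper names below are illustrative assumptions rather than the constants and functions used in `include/optimized.hpp`.

```cpp
// Illustrative sketch of two of the optimizations (assumed names): power-of-two
// index masking instead of modulo, and a spin loop that falls back to
// std::this_thread::yield() after a bounded number of iterations.
#include <atomic>
#include <cstddef>
#include <thread>

constexpr size_t RING_SIZE = 1024;               // must be a power of two
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be 2^k");

inline size_t slot_index(size_t idx) {
    return idx & (RING_SIZE - 1);                // same result as idx % RING_SIZE
}

// Spin until `published` reaches `slot`, yielding after SPIN_LIMIT attempts so
// an overcommitted machine does not burn a full core per blocked producer.
inline void wait_for_turn(const std::atomic<size_t>& published, size_t slot) {
    constexpr int SPIN_LIMIT = 128;              // illustrative threshold
    int spins = 0;
    while (published.load(std::memory_order_acquire) != slot) {
        if (++spins >= SPIN_LIMIT) {
            std::this_thread::yield();
            spins = 0;
        }
    }
}
```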
Build everything with:

```
make all
```

> [!NOTE]
> Change `TOTAL_CORES` in `include/common.hpp` to the number of cores on your machine (default: 32, the number of logical cores on our test machine).

> [!CAUTION]
> Use the `-DARM` flag to compile for the ARM architecture.
All experiments were run on a bare-metal server with:
- CPU: Intel Xeon E5-2630 v3 @ 2.40GHz
- Cores: 16 physical CPUs (32 logical cores)
- Memory: 64 GiB DRAM
We sweep the number of producer threads from 1 to 32 in logarithmic steps, holding the consumer configuration fixed, to observe scaling behavior and contention regimes.
- Warmup: ignore the first 5% of requests as warmup.
- Repetitions: 3 runs per data point.
- Aggregation: report the arithmetic mean across runs.
- Uncertainty: show 95% confidence intervals.
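For reference, the 95% confidence interval over the three runs can be computed with a standard Student's t interval (a textbook formulation; the repository's exact aggregation script is not shown here):

```math
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\qquad
s = \sqrt{\tfrac{1}{n-1}\sum_{i=1}^{n}\bigl(x_i-\bar{x}\bigr)^2},\qquad
\mathrm{CI}_{95\%} = \bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
```

with $t_{0.975,\,2} \approx 4.30$ for $n = 3$ runs.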
The figure below demonstrates how each strategy fixes the multi-producer data race compared to the original (buggy) baseline.

The figure below illustrates the performance impact of the successive optimizations.

The figure below shows the ablation studies isolating each optimization.
