The Kangaroo library is more than just a queue. It collects any data (exceptions, sensor data, ...) that should be gathered in one place. A separate Kangaroo instance is created for each data type. Each instance has one or more export manager implementations (developed by the user). These export managers can be extended by a converter (also developed by the user). The goal is to standardize how data is collected and exported. Updates to an export manager or converter implementation should have no impact on projects that already use it.
Add it to your .NET Framework or .NET project. Kangaroo supports .NET 10 and higher.
The Kangaroo.Benchmarks project uses BenchmarkDotNet to measure
Kangaroo's performance against equivalent approaches from the .NET BCL:
- TPL Dataflow (`System.Threading.Tasks.Dataflow`) – `BufferBlock<T>`, `BatchBlock<T>`, `ActionBlock<T>`, `TransformBlock<T>`, `BroadcastBlock<T>`
- `System.Threading.Channels` – raw `Channel<T>.Writer.TryWrite`
- `System.Collections.Concurrent.ConcurrentQueue<T>` – simple lock-free queue
Benchmarks were executed on:
BenchmarkDotNet v0.14.0, macOS 26.4.1 (25E253) [Darwin 25.4.0]
Apple M1 Pro, 1 CPU, 10 logical and 10 physical cores
.NET SDK 10.0.103 | Runtime .NET 10.0.3 (Arm64 RyuJIT AdvSIMD)
Measures how fast N items can be added to each queue / buffer without any export
being triggered. Kangaroo is the baseline (Ratio = 1.00).
| Method | N | Mean | Ratio | Rank | Allocated |
|---|---|---|---|---|---|
| ConcurrentQueue Enqueue | 100 | 1.782 µs | 0.27 | 1 | - |
| Channel TryWrite | 100 | 3.690 µs | 0.56 | 2 | - |
| Dataflow BufferBlock.Post | 100 | 4.457 µs | 0.68 | 3 | - |
| Kangaroo AddData | 100 | 6.583 µs | 1.00 | 4 | - |
| ConcurrentQueue Enqueue | 1 000 | 18.048 µs | 0.28 | 1 | 34 720 B |
| Dataflow BufferBlock.Post | 1 000 | 37.202 µs | 0.58 | 2 | 19 200 B |
| Channel TryWrite | 1 000 | 39.034 µs | 0.61 | 2 | 34 720 B |
| Kangaroo AddData | 1 000 | 64.700 µs | 1.00 | 3 | 50 592 B |
| ConcurrentQueue Enqueue | 10 000 | 122.434 µs | 0.35 | 1 | 265 440 B |
| Channel TryWrite | 10 000 | 185.497 µs | 0.53 | 2 | 265 440 B |
| Dataflow BufferBlock.Post | 10 000 | 327.192 µs | 0.94 | 3 | 135 232 B |
| Kangaroo AddData | 10 000 | 348.151 µs | 1.00 | 4 | 396 000 B |
Takeaway – The abstraction overhead of Kangaroo's `AddData` (category tagging, switch dispatch) costs roughly 1.6–1.9× compared to a raw `Channel.TryWrite` and roughly 3–4× compared to `ConcurrentQueue.Enqueue`. This is the expected trade-off for a richer feature set.
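
For orientation, the three raw write calls used as baselines look roughly like this on their own. This is a minimal sketch using only BCL types, not the benchmark source; the item type and the count `n` are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Channels;
using System.Threading.Tasks.Dataflow;

const int n = 1_000; // assumed item count, mirrors the N column above

// ConcurrentQueue<T>: lock-free enqueue with no consumer attached.
var queue = new ConcurrentQueue<int>();
for (var i = 0; i < n; i++) queue.Enqueue(i);

// Channel<T>: TryWrite on an unbounded channel succeeds while the channel is open.
var channel = Channel.CreateUnbounded<int>();
for (var i = 0; i < n; i++) channel.Writer.TryWrite(i);

// BufferBlock<T>: Post returns true once the block has accepted the item.
var buffer = new BufferBlock<int>();
for (var i = 0; i < n; i++) buffer.Post(i);

Console.WriteLine($"queued: {queue.Count}, buffered: {buffer.Count}");
```

None of these calls tags or dispatches the item, which is the work `AddData` adds on top.
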
Adds N items then drains and "exports" them all at once. Kangaroo manual export is the baseline.
| Method | N | Mean | Ratio | Rank | Allocated |
|---|---|---|---|---|---|
| ConcurrentQueue manual drain | 100 | 2.629 µs | 0.80 | 1 | 6.48 KB |
| Kangaroo batch export | 100 | 3.303 µs | 1.00 | 2 | 11.52 KB |
| Kangaroo size-triggered export | 100 | 4.131 µs | 1.25 | 3 | 11.52 KB |
| Dataflow BatchBlock→ActionBlock | 100 | 9.803 µs | 2.97 | 4 | 7.03 KB |
| ConcurrentQueue manual drain | 1 000 | 24.039 µs | 0.84 | 1 | 49.85 KB |
| Kangaroo batch export | 1 000 | 28.681 µs | 1.00 | 2 | 75.94 KB |
| Dataflow BatchBlock→ActionBlock | 1 000 | 30.447 µs | 1.06 | 3 | 28.19 KB |
| Kangaroo size-triggered export | 1 000 | 41.756 µs | 1.46 | 4 | 75.94 KB |
Takeaway – Kangaroo's synchronous batch export outperforms TPL Dataflow's `BatchBlock` + `ActionBlock` pipeline (3× faster for N=100, on par for N=1 000) because it avoids the async scheduling overhead. A bare `ConcurrentQueue` drain is faster still, but provides none of Kangaroo's categorisation, filtering, or converter features.
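
A bare `ConcurrentQueue` drain of the kind referred to above can be sketched as follows. This is an assumption about what such a baseline looks like, not the benchmark code; the `export` delegate is a hypothetical stand-in for the real sink.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

const int n = 1_000;
var queue = new ConcurrentQueue<string>();
var exported = new List<string>();

// Hypothetical export step: stands in for whatever sink receives the batch.
Action<IReadOnlyList<string>> export = items => exported.AddRange(items);

for (var i = 0; i < n; i++) queue.Enqueue($"item-{i}");

// Drain everything currently queued into one batch, then export it once.
var batch = new List<string>(queue.Count);
while (queue.TryDequeue(out var item)) batch.Add(item);
export(batch);

Console.WriteLine($"exported {exported.Count} items in one batch");
```

Everything runs inline on the calling thread, so there is no scheduling cost, but also no categorisation, filtering, or conversion.
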
Each item is processed immediately (MaxStoredObjects = 1).
Kangaroo direct export is the baseline.
| Method | N | Mean | Ratio | Rank | Allocated |
|---|---|---|---|---|---|
| Plain delegate loop | 100 | 426.0 ns | 0.10 | 1 | 3.13 KB |
| Kangaroo direct export | 100 | 4,391.7 ns | 1.00 | 2 | 17.08 KB |
| Dataflow ActionBlock (sequential) | 100 | 8,342.5 ns | 1.90 | 3 | 3.36 KB |
| Plain delegate loop | 1 000 | 4,202.8 ns | 0.10 | 1 | 31.25 KB |
| Dataflow ActionBlock (sequential) | 1 000 | 41,714.2 ns | 0.99 | 2 | 15.82 KB |
| Kangaroo direct export | 1 000 | 41,995.2 ns | 1.00 | 2 | 143.64 KB |
Takeaway – Kangaroo's synchronous per-item export is 1.9× faster than TPL Dataflow's sequential `ActionBlock` at N=100. At N=1 000 both are essentially neck-and-neck, while Kangaroo provides richer categorisation and converter features.
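
The two non-Kangaroo rows in this table differ mainly in where the callback runs. The sketch below contrasts them under simple assumptions (an `int` payload and a counter as the "export"); it is illustrative and not taken from the benchmark project.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

const int n = 1_000;
var viaLoop = 0;
var viaBlock = 0;

// Plain delegate loop: the export callback runs inline on the calling thread.
Action<int> export = _ => viaLoop++;
for (var i = 0; i < n; i++) export(i);

// Sequential ActionBlock: each item is queued and handled by a scheduled task,
// one at a time (MaxDegreeOfParallelism = 1 is also the default).
var block = new ActionBlock<int>(
    _ => viaBlock++,
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 });
for (var i = 0; i < n; i++) block.Post(i);

// Signal that no more items will arrive and wait until the block has processed all of them.
block.Complete();
await block.Completion;

Console.WriteLine($"loop: {viaLoop}, block: {viaBlock}");
```
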
Simulates repeated burst ingestion: items arrive in batches of BatchSize,
repeated BatchCount times. Kangaroo size-triggered burst is the baseline.
| Method | BatchSize | BatchCount | Mean | Ratio | Rank | Allocated |
|---|---|---|---|---|---|---|
| ConcurrentQueue threshold drain | 100 | 10 | 21.82 µs | 0.72 | 1 | 21.24 KB |
| Channel threshold drain | 100 | 10 | 22.05 µs | 0.73 | 1 | 19.92 KB |
| Kangaroo manual burst export | 100 | 10 | 27.02 µs | 0.89 | 2 | 39.72 KB |
| Kangaroo size-triggered bursts | 100 | 10 | 30.32 µs | 1.00 | 3 | 39.72 KB |
| Dataflow BatchBlock bursts | 100 | 10 | 35.07 µs | 1.16 | 4 | 15.3 KB |
| ConcurrentQueue threshold drain | 100 | 50 | 107.82 µs | 0.74 | 1 | 86.87 KB |
| Channel threshold drain | 100 | 50 | 109.42 µs | 0.75 | 1 | 85.55 KB |
| Kangaroo manual burst export | 100 | 50 | 131.80 µs | 0.91 | 2 | 165.03 KB |
| Dataflow BatchBlock bursts | 100 | 50 | 138.64 µs | 0.95 | 3 | 51.44 KB |
| Kangaroo size-triggered bursts | 100 | 50 | 145.33 µs | 1.00 | 4 | 165.03 KB |
| Dataflow BatchBlock bursts | 500 | 10 | 106.28 µs | 0.73 | 1 | 52.74 KB |
| ConcurrentQueue threshold drain | 500 | 10 | 108.66 µs | 0.74 | 2 | 96.62 KB |
| Channel threshold drain | 500 | 10 | 109.54 µs | 0.75 | 2 | 89.3 KB |
| Kangaroo manual burst export | 500 | 10 | 127.49 µs | 0.87 | 3 | 150.31 KB |
| Kangaroo size-triggered bursts | 500 | 10 | 145.97 µs | 1.00 | 4 | 150.31 KB |
| Dataflow BatchBlock bursts | 500 | 50 | 494.41 µs | 0.69 | 1 | 214.92 KB |
| ConcurrentQueue threshold drain | 500 | 50 | 539.66 µs | 0.76 | 2 | 412.24 KB |
| Channel threshold drain | 500 | 50 | 550.39 µs | 0.77 | 2 | 404.92 KB |
| Kangaroo manual burst export | 500 | 50 | 644.53 µs | 0.90 | 3 | 642.5 KB |
| Kangaroo size-triggered bursts | 500 | 50 | 712.62 µs | 1.00 | 4 | 642.5 KB |
Takeaway – Kangaroo's manual burst export takes 0.87–0.91× the time of the automatic size-triggered baseline and consistently outperforms Dataflow's `BatchBlock` at the smaller batch size (100). At the larger batch size (500, both ×10 and ×50), Dataflow's lower per-item allocation gives it an edge in raw throughput.
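
A "threshold drain" over a raw channel, as named in the table, might look like the sketch below. It assumes a single thread that both writes and drains, and it records only the size of each exported batch; the real benchmark may be structured differently.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;

const int batchSize = 100;  // mirrors the BatchSize column
const int batchCount = 10;  // mirrors the BatchCount column
var channel = Channel.CreateUnbounded<int>();
var exportedBatchSizes = new List<int>();
var pending = 0;

for (var b = 0; b < batchCount; b++)
{
    for (var i = 0; i < batchSize; i++)
    {
        channel.Writer.TryWrite(i);
        pending++;

        // Threshold reached: drain the channel and hand the batch to the exporter.
        if (pending >= batchSize)
        {
            var batch = new List<int>(pending);
            while (channel.Reader.TryRead(out var item)) batch.Add(item);
            exportedBatchSizes.Add(batch.Count); // hypothetical export: record the size
            pending = 0;
        }
    }
}

Console.WriteLine($"exported {exportedBatchSizes.Count} batches");
```
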
Adds N items, then filters and converts them during export. Kangaroo filter+convert pipeline is the baseline.
| Method | N | Mean | Ratio | Rank | Allocated |
|---|---|---|---|---|---|
| Channel manual filter+convert | 1 000 | 23.39 µs | 0.71 | 1 | 34.01 KB |
| ConcurrentQueue manual filter+convert | 1 000 | 24.21 µs | 0.73 | 2 | 41.45 KB |
| Kangaroo filter+convert pipeline | 1 000 | 33.04 µs | 1.00 | 3 | 79.2 KB |
| Kangaroo filter+convert direct | 1 000 | 98.20 µs | 2.97 | 4 | 396.83 KB |
| Dataflow TransformBlock pipeline | 1 000 | 107.50 µs | 3.25 | 5 | 74.35 KB |
| ConcurrentQueue manual filter+convert | 10 000 | 242.16 µs | 0.65 | 1 | 450.64 KB |
| Channel manual filter+convert | 10 000 | 246.52 µs | 0.66 | 1 | 387.2 KB |
| Kangaroo filter+convert pipeline | 10 000 | 371.46 µs | 1.00 | 2 | 747.64 KB |
| Dataflow TransformBlock pipeline | 10 000 | 793.77 µs | 2.14 | 3 | 650.69 KB |
| Kangaroo filter+convert direct | 10 000 | 968.92 µs | 2.61 | 4 | 3 982.77 KB |
Takeaway – Kangaroo's batch filter+convert pipeline is 2–3× faster than Dataflow's `TransformBlock` chain, thanks to avoiding async scheduling overhead. Manual implementations with raw queues are faster still but lack Kangaroo's built-in categorisation and converter infrastructure.
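
The manual filter+convert baseline amounts to a drain loop with two delegates. The following is a minimal sketch with a hypothetical filter (`keep`) and converter (`convert`); neither is taken from the benchmark source.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

const int n = 1_000;
var queue = new ConcurrentQueue<int>();
for (var i = 0; i < n; i++) queue.Enqueue(i);

// Hypothetical filter and converter, chosen only for illustration.
Func<int, bool> keep = value => value % 2 == 0;
Func<int, string> convert = value => $"value={value}";

// Drain the queue once; filter first so that rejected items are never converted.
var output = new List<string>();
while (queue.TryDequeue(out var item))
{
    if (!keep(item)) continue;
    output.Add(convert(item));
}

Console.WriteLine($"kept {output.Count} of {n} items");
```

The filter and converter are hard-wired into the loop here, which is where the speed comes from and also why this approach does not generalise across projects.
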
Routes N items to category-specific exporters. Kangaroo category routing + export is the baseline.
| Method | N | Mean | Ratio | Rank | Allocated |
|---|---|---|---|---|---|
| ConcurrentQueue dictionary routing | 1 000 | 22.18 µs | 0.52 | 1 | 62.31 KB |
| Channel dictionary routing | 1 000 | 27.64 µs | 0.65 | 2 | 48.96 KB |
| Kangaroo category routing + export | 1 000 | 42.51 µs | 1.00 | 3 | 100.5 KB |
| Kangaroo category direct export | 1 000 | 96.16 µs | 2.26 | 4 | 167.29 KB |
| Dataflow BroadcastBlock predicate routing | 1 000 | 760.91 µs | 17.90 | 5 | 94.29 KB |
| ConcurrentQueue dictionary routing | 10 000 | 209.27 µs | 0.40 | 1 | 602.27 KB |
| Channel dictionary routing | 10 000 | 271.54 µs | 0.52 | 2 | 476.92 KB |
| Kangaroo category routing + export | 10 000 | 526.26 µs | 1.00 | 3 | 959.54 KB |
| Kangaroo category direct export | 10 000 | 960.30 µs | 1.82 | 4 | 1 643.91 KB |
| Dataflow BroadcastBlock predicate routing | 10 000 | 7,976.20 µs | 15.16 | 5 | 1 069.64 KB |
Takeaway – Kangaroo's category routing is 15–18× faster than Dataflow's `BroadcastBlock` with predicate filtering. The `BroadcastBlock` approach copies every item to every target and filters afterwards, whereas Kangaroo dispatches directly to the correct exporter.
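
Direct dispatch, as opposed to broadcast-then-filter, can be illustrated with a plain dictionary lookup. This sketch shows the dictionary-routing idea from the raw-primitive rows, not Kangaroo's internals; the category names and the list-based "exporters" are assumptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical categories, each mapped to a list that stands in for an exporter.
var exporters = new Dictionary<string, List<int>>
{
    ["sensor"] = new(),
    ["error"] = new(),
};

var queue = new ConcurrentQueue<(string Category, int Value)>();
for (var i = 0; i < 1_000; i++)
    queue.Enqueue((i % 4 == 0 ? "error" : "sensor", i));

// One lookup per item sends it straight to the matching exporter;
// no other exporter ever sees the item.
while (queue.TryDequeue(out var entry))
{
    if (exporters.TryGetValue(entry.Category, out var sink))
        sink.Add(entry.Value);
}

Console.WriteLine($"sensor: {exporters["sensor"].Count}, error: {exporters["error"].Count}");
```
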
Multiple threads write ItemsPerProducer items concurrently.
Kangaroo concurrent AddData is the baseline.
| Method | Producers | Items/Producer | Mean | Ratio | Rank | Allocated |
|---|---|---|---|---|---|---|
| Channel concurrent TryWrite | 4 | 1 000 | 127.0 µs | 0.98 | 1 | 69.6 KB |
| Kangaroo concurrent AddData | 4 | 1 000 | 130.1 µs | 1.00 | 1 | 102.67 KB |
| Dataflow BufferBlock concurrent Post | 4 | 1 000 | 133.1 µs | 1.02 | 1 | 38.14 KB |
| ConcurrentQueue concurrent Enqueue | 4 | 1 000 | 268.7 µs | 2.07 | 2 | 69.18 KB |
| Kangaroo concurrent AddData | 4 | 10 000 | 1,179.0 µs | 1.00 | 1 | 1 550.37 KB |
| Channel concurrent TryWrite | 4 | 10 000 | 1,289.9 µs | 1.09 | 2 | 1 031.67 KB |
| Dataflow BufferBlock concurrent Post | 4 | 10 000 | 1,362.9 µs | 1.16 | 3 | 520.11 KB |
| ConcurrentQueue concurrent Enqueue | 4 | 10 000 | 3,097.9 µs | 2.63 | 4 | 1 031.25 KB |
| Channel concurrent TryWrite | 8 | 1 000 | 279.1 µs | 0.94 | 1 | 134.91 KB |
| Kangaroo concurrent AddData | 8 | 1 000 | 297.7 µs | 1.00 | 2 | 200.01 KB |
| Dataflow BufferBlock concurrent Post | 8 | 1 000 | 306.3 µs | 1.03 | 2 | 71.44 KB |
| ConcurrentQueue concurrent Enqueue | 8 | 1 000 | 1,881.9 µs | 6.32 | 3 | 134.49 KB |
| Channel concurrent TryWrite | 8 | 10 000 | 2,234.6 µs | 0.96 | 1 | 2 060.57 KB |
| Kangaroo concurrent AddData | 8 | 10 000 | 2,325.3 µs | 1.00 | 2 | 3 086.93 KB |
| Dataflow BufferBlock concurrent Post | 8 | 10 000 | 2,593.8 µs | 1.12 | 3 | 1 035.09 KB |
| ConcurrentQueue concurrent Enqueue | 8 | 10 000 | 20,059.1 µs | 8.63 | 4 | 2 057.06 KB |
Takeaway – Under contention (4–8 producers), Kangaroo matches `Channel` and `BufferBlock` throughput while dramatically outperforming `ConcurrentQueue`, which suffers from contention as the thread count grows (2–3× slower with 4 producers, 6–9× slower with 8).
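
A concurrent-producer setup of this shape can be sketched with `Task.Run` and an unbounded channel. The producer count and payload below are assumptions that mirror the table columns; the benchmark's actual thread setup is not shown in this README.

```csharp
using System;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

const int producers = 4;
const int itemsPerProducer = 1_000;
var channel = Channel.CreateUnbounded<int>();

// Start all producers at once; each writes its own share of the items.
var tasks = Enumerable.Range(0, producers)
    .Select(p => Task.Run(() =>
    {
        for (var i = 0; i < itemsPerProducer; i++)
            channel.Writer.TryWrite(p * itemsPerProducer + i);
    }))
    .ToArray();

await Task.WhenAll(tasks);
channel.Writer.Complete();

// Items written before Complete() remain readable afterwards.
var received = 0;
while (channel.Reader.TryRead(out _)) received++;

Console.WriteLine($"received {received} items from {producers} producers");
```
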
Each item is delivered to multiple exporters (fan-out). Kangaroo multi-exporter fan-out is the baseline.
| Method | Exporters | N | Mean | Ratio | Rank | Allocated |
|---|---|---|---|---|---|---|
| ConcurrentQueue sequential fan-out | 2 | 1 000 | 23.86 µs | 0.83 | 1 | 49.85 KB |
| Kangaroo multi-exporter fan-out | 2 | 1 000 | 28.87 µs | 1.00 | 2 | 83.8 KB |
| Channel cloned fan-out | 2 | 1 000 | 38.81 µs | 1.34 | 3 | 37.74 KB |
| Kangaroo multi-exporter direct | 2 | 1 000 | 53.80 µs | 1.86 | 4 | 143.66 KB |
| Dataflow BroadcastBlock fan-out | 2 | 1 000 | 777.22 µs | 26.93 | 5 | 65.75 KB |
| ConcurrentQueue sequential fan-out | 2 | 10 000 | 248.06 µs | 0.79 | 1 | 415.86 KB |
| Kangaroo multi-exporter fan-out | 2 | 10 000 | 312.48 µs | 1.00 | 2 | 801.92 KB |
| Channel cloned fan-out | 2 | 10 000 | 375.73 µs | 1.20 | 3 | 264.37 KB |
| Kangaroo multi-exporter direct | 2 | 10 000 | 537.32 µs | 1.72 | 4 | 1 409.29 KB |
| Dataflow BroadcastBlock fan-out | 2 | 10 000 | 7,329.01 µs | 23.45 | 5 | 692.57 KB |
| ConcurrentQueue sequential fan-out | 5 | 1 000 | 23.94 µs | 0.76 | 1 | 49.85 KB |
| Kangaroo multi-exporter fan-out | 5 | 1 000 | 31.72 µs | 1.00 | 2 | 107.38 KB |
| Kangaroo multi-exporter direct | 5 | 1 000 | 83.17 µs | 2.63 | 3 | 143.73 KB |
| Channel cloned fan-out | 5 | 1 000 | 95.11 µs | 3.01 | 4 | 94.32 KB |
| Dataflow BroadcastBlock fan-out | 5 | 1 000 | 3,647.36 µs | 115.25 | 5 | 299.97 KB |
| ConcurrentQueue sequential fan-out | 5 | 10 000 | 248.46 µs | 0.77 | 1 | 415.86 KB |
| Kangaroo multi-exporter fan-out | 5 | 10 000 | 323.00 µs | 1.00 | 2 | 1 036.43 KB |
| Kangaroo multi-exporter direct | 5 | 10 000 | 817.64 µs | 2.53 | 3 | 1 409.36 KB |
| Channel cloned fan-out | 5 | 10 000 | 939.70 µs | 2.91 | 4 | 660.88 KB |
| Dataflow BroadcastBlock fan-out | 5 | 10 000 | 36,404.35 µs | 112.71 | 5 | 3 082.89 KB |
Takeaway – Kangaroo is 23–115× faster than Dataflow's `BroadcastBlock` fan-out, delivering the largest relative advantage in the entire benchmark suite. As the exporter count grows from 2 to 5, Kangaroo's advantage widens because `BroadcastBlock` duplicates work per target while Kangaroo iterates exporters directly.
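
Iterating exporters directly means handing the same batch to each exporter in turn. The sketch below shows that idea with plain delegates; the exporter count and the counting "exporters" are hypothetical and do not reflect Kangaroo's export manager interface.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int exporterCount = 5; // mirrors the Exporters column
var received = new int[exporterCount];

// Each hypothetical exporter just counts the items it is handed.
var exporters = new Action<IReadOnlyList<int>>[exporterCount];
for (var e = 0; e < exporterCount; e++)
{
    var index = e; // capture a copy so each delegate updates its own slot
    exporters[e] = items => received[index] += items.Count;
}

// Sequential fan-out: the same batch instance is passed to every exporter.
var batch = Enumerable.Range(0, 1_000).ToList();
foreach (var export in exporters) export(batch);

Console.WriteLine(string.Join(", ", received));
```

The batch is shared by reference, so adding an exporter adds one more call rather than one more copy of the data.
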
Across 102 benchmarks covering 8 distinct scenarios, Kangaroo demonstrates competitive performance against raw .NET primitives while providing significantly richer functionality (categorisation, converters, multi-export, size-triggered batching).
| Scenario | vs. Dataflow | vs. Raw Primitives |
|---|---|---|
| Queueing throughput | ~1.1–1.7× slower | ~1.6–3.7× slower (expected) |
| Batch export | ~3× faster (N=100) | ~1.2× slower |
| Direct per-item export | ~1.9× faster (N=100) | ~10× slower vs. plain loop |
| Burst export | ~1.2× faster (small batch) | ~1.1–1.4× slower |
| Converter pipeline | ~2–3× faster | ~1.4–1.5× slower |
| Category routing | ~15–18× faster | ~1.9–2.5× slower |
| Concurrent producers | on par | 1× (matches Channel) |
| Multi-exporter fan-out | ~23–115× faster | ~1.2–1.3× slower |
The overall pattern: Kangaroo trades modest raw queueing throughput for a richer feature set, but recovers that cost and more in most end-to-end pipeline benchmarks where those features matter. In the scenarios that most closely resemble real-world usage (batch export, converter pipelines, category routing, concurrent ingestion, and multi-exporter fan-out), Kangaroo matches or significantly outperforms the TPL Dataflow equivalent.

`dotnet run --project Kangaroo.Benchmarks -c Release -- --filter "Kangaroo.Benchmarks*" --job short`

Full BenchmarkDotNet artifacts (HTML, CSV, JSON) are written to `BenchmarkDotNet.Artifacts/results/`.
Fork the project into your own repository. Make your changes and create a pull request with a description of what was added or removed and details explaining the changes in the code. If approved, the project owners will merge it.
This project is licensed under the MIT License (see LICENSE file).
Please file bugs and issues on the GitHub issues page for this project. This helps keep track of and document everything related to this repo. The code and documentation are released with no warranties or SLAs and are intended to be supported through a community-driven process.