A high-performance buffer pool for Rust with constant-time allocations
Loading GPT-2 checkpoints took 200ms. Profiling showed that 70% of that time went to buffer allocation, not I/O. With ZeroPool: 53ms (3.8x faster).
ZeroPool provides constant ~15ns allocation time regardless of buffer size, thread-safe operations, and near-linear multi-threaded scaling.
```rust
use zeropool::BufferPool;

let pool = BufferPool::new();

// Get a buffer
let mut buffer = pool.get(1024 * 1024); // 1MB

// Use it
file.read(&mut buffer)?;

// Return it
pool.put(buffer);
```

| Metric | Result |
|---|---|
| Allocation latency | 14.6ns (constant, any size) |
| vs `bytes` (1MB) | 2.6x faster |
| Multi-threaded (8 threads) | 32 TiB/s |
| Multi-threaded speedup | 2.1x faster than previous version |
| Real workload speedup | 3.8x (GPT-2 loading) |
- **Constant-time allocation** - ~15ns whether you need 1KB or 1MB
- **Thread-safe** - only ~1ns slower than single-threaded pools, but actually concurrent
- **Auto-configured** - adapts to your CPU topology (4-core → 4 shards, 20-core → 32 shards)
- **Simple API** - just `get()` and `put()`
```
 Thread 1       Thread 2       Thread N
┌─────────┐    ┌─────────┐    ┌─────────┐
│ TLS (4) │    │ TLS (4) │    │ TLS (4) │   ← 75% hit rate
└────┬────┘    └────┬────┘    └────┬────┘
     └──────────────┼──────────────┘
                    ↓
           ┌────────────────┐
           │  Sharded Pool  │   ← Power-of-2 shards
           │  [0][1]...[N]  │   ← Minimal contention
           └────────────────┘
```
- **Fast path**: thread-local cache (lock-free, ~15ns)
- **Slow path**: thread-affinity shard selection (better cache locality)
- **Optimization**: power-of-2 shards enable a bitwise AND instead of a modulo (see the sketch below)
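As a rough sketch of that shard-selection trick (a hypothetical `shard_index` helper, not ZeroPool's internal code): when the shard count is a power of two, `x % n` can be replaced by `x & (n - 1)`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical sketch of power-of-2 shard selection.
/// Requires `num_shards` to be a power of two for the mask to be valid.
fn shard_index(num_shards: usize) -> usize {
    debug_assert!(num_shards.is_power_of_two());
    let mut hasher = DefaultHasher::new();
    std::thread::current().id().hash(&mut hasher);
    // `x & (n - 1)` equals `x % n` when n is a power of two,
    // but compiles to a single AND instead of a division.
    (hasher.finish() as usize) & (num_shards - 1)
}
```

Because a thread's ID hashes to the same value for the life of the thread, every call from that thread lands on the same shard, which is what gives the cache-locality win.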
```rust
use zeropool::BufferPool;

let pool = BufferPool::builder()
    .tls_cache_size(8)            // Buffers per thread
    .min_buffer_size(512 * 1024)  // Keep buffers ≥ 512KB
    .max_buffers_per_shard(32)    // Max pooled buffers
    .num_shards(16)               // Override auto-detection
    .build();
```

Defaults (auto-configured based on CPU count):
- Shards: 4-128 (power-of-2, ~1 shard per 2 cores)
- TLS cache: 2-8 buffers per thread
- Min buffer size: 1MB
- Max per shard: 16-64 buffers
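One illustrative reading of the shard default, chosen to reproduce the examples above (4-core → 4 shards, 20-core → 32 shards); not ZeroPool's actual code, and the real heuristic may differ:

```rust
/// Illustrative guess at shard auto-configuration: round the CPU count
/// up to a power of two and clamp to the documented 4..=128 range.
/// Not ZeroPool's actual code.
fn default_num_shards(cpu_count: usize) -> usize {
    cpu_count.next_power_of_two().clamp(4, 128)
}

assert_eq!(default_num_shards(4), 4);   // 4-core  → 4 shards
assert_eq!(default_num_shards(20), 32); // 20-core → 32 shards
```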
Lock buffer memory in RAM to prevent swapping:
```rust
use zeropool::BufferPool;

let pool = BufferPool::builder()
    .pinned_memory(true)
    .build();
```

Useful for high-performance computing, security-sensitive data, or real-time systems. May require elevated privileges on some systems; falls back gracefully if pinning fails.
Choose between simple LIFO or intelligent CLOCK-Pro buffer eviction:
```rust
use zeropool::{BufferPool, EvictionPolicy};

let pool = BufferPool::builder()
    .eviction_policy(EvictionPolicy::ClockPro) // Better cache locality (default)
    .build();

let pool_lifo = BufferPool::builder()
    .eviction_policy(EvictionPolicy::Lifo) // Simple, lowest overhead
    .build();
```

**CLOCK-Pro** (default): uses access counters to favor recently-used buffers, preventing cache thrashing in mixed-size workloads. ~8 bytes of overhead per buffer.

**LIFO**: simple last-in-first-out eviction. Minimal memory overhead; best for uniform buffer sizes.
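To make the difference concrete, here is a simplified second-chance (clock) sweep of the kind CLOCK-Pro builds on; a sketch only, not the crate's actual eviction code:

```rust
/// One pooled buffer plus the small access counter that CLOCK-style
/// policies maintain (the crate documents ~8 bytes of overhead per buffer).
struct Slot {
    buffer: Vec<u8>,
    accesses: u8,
}

/// Simplified clock sweep: decrement counters as the hand passes and
/// evict the first buffer whose counter has reached zero.
/// Assumes `slots` is non-empty.
fn evict(slots: &mut Vec<Slot>, hand: &mut usize) -> Vec<u8> {
    loop {
        let i = *hand % slots.len();
        if slots[i].accesses == 0 {
            *hand = i; // next sweep resumes here
            return slots.remove(i).buffer; // cold buffer: evict it
        }
        slots[i].accesses -= 1; // recently used: give it another chance
        *hand = (i + 1) % slots.len();
    }
}
```

LIFO, by contrast, just pops the most recently returned buffer, which is why it carries essentially no bookkeeping.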
| Size | ZeroPool | No Pool | Lifeguard | Sharded-Slab | Bytes |
|---|---|---|---|---|---|
| 1KB | 14.9ns | 7.7ns | 7.0ns | 50.0ns | 8.4ns |
| 64KB | 14.6ns | 46.3ns | 6.9ns | 88.3ns | 49.1ns |
| 1MB | 14.6ns | 37.7ns | 6.9ns | 80.2ns | 39.4ns |
Constant latency across all sizes. 2.6x faster than `bytes` for large buffers (1MB).
| Threads | ZeroPool | No Pool | Sharded-Slab | Speedup vs Previous |
|---|---|---|---|---|
| 2 | 14.2 TiB/s | 2.7 TiB/s | 1.3 TiB/s | 1.38x ⚡ |
| 4 | 25.0 TiB/s | 5.1 TiB/s | 2.6 TiB/s | 1.34x ⚡ |
| 8 | 32.0 TiB/s | 7.7 TiB/s | 3.9 TiB/s | 1.14x ⚡ |
Near-linear scaling with thread-local shard affinity. 8.2x faster than sharded-slab at 8 threads.
Buffer reuse (1MB buffer, 1000 get/put cycles):
- ZeroPool: 6.1µs total (6.1ns/op)
- No pool: 600ns/op
- Lifeguard: 172ns/op
~98x faster than no pooling for realistic buffer reuse patterns.
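The measured pattern is roughly the following loop (an illustrative timing sketch built on the documented `get`/`put` API; the published numbers come from the benchmark suite):

```rust
use std::time::Instant;
use zeropool::BufferPool;

// 1000 get/put cycles on a 1MB buffer: after the first cycle the
// same buffer is recycled, so no further allocations happen.
let pool = BufferPool::new();
let start = Instant::now();
for _ in 0..1000 {
    let buf = pool.get(1024 * 1024);
    pool.put(buf);
}
println!("{:?} per cycle", start.elapsed() / 1000);
```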
Run yourself:
```sh
cargo bench
```

Benchmark environment:
- CPU: Intel i9-10900K @ 3.7GHz (10 cores, 20 threads, 5.3GHz turbo)
- RAM: 32GB DDR4
- OS: Linux 6.14.0
**Thread-local caching** (75% hit rate)
- Lock-free access to recently used buffers
- No atomic operations on fast path
- Zero cache-line bouncing
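A thread-local fast path of this shape can be sketched with `thread_local!` and a `RefCell` (illustrative only, assuming pooled buffers are plain `Vec<u8>`s; not the crate's internals):

```rust
use std::cell::RefCell;

thread_local! {
    // Per-thread stack of recycled buffers: no locks, no atomics,
    // and no cache lines shared with other threads.
    static TLS_CACHE: RefCell<Vec<Vec<u8>>> = RefCell::new(Vec::new());
}

/// Fast path: try to pop a recently returned buffer from this
/// thread's cache; on a miss the caller falls through to the shards.
fn get_fast(size: usize) -> Option<Vec<u8>> {
    TLS_CACHE.with(|cache| {
        let mut cache = cache.borrow_mut();
        match cache.pop() {
            Some(buf) if buf.capacity() >= size => Some(buf),
            Some(buf) => {
                cache.push(buf); // too small: put it back, take the slow path
                None
            }
            None => None,
        }
    })
}
```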
**Thread-local shard affinity**
- Each thread consistently uses the same shard (cache locality)
- `shard = hash(thread_id) & (num_shards - 1)` (no modulo)
- Minimal lock contention + better CPU cache utilization
- Auto-scales with CPU count
**First-fit allocation**
- O(1) instead of O(n) best-fit
- Perfect for predictable I/O buffer sizes (see the sketch below)
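For contrast, here is what first-fit vs. best-fit selection looks like over a shard's free list (a sketch; the crate's internal layout may differ). With predictable buffer sizes, first-fit usually succeeds on the first element, hence the effectively O(1) behavior:

```rust
/// First-fit: take the first buffer that is large enough. With uniform
/// sizes the first element almost always matches, so this is ~O(1).
fn first_fit(free: &mut Vec<Vec<u8>>, size: usize) -> Option<Vec<u8>> {
    let i = free.iter().position(|b| b.capacity() >= size)?;
    Some(free.swap_remove(i))
}

/// Best-fit for comparison: always scans the whole list for the
/// tightest match, which is O(n) on every allocation.
fn best_fit(free: &mut Vec<Vec<u8>>, size: usize) -> Option<Vec<u8>> {
    let i = free
        .iter()
        .enumerate()
        .filter(|(_, b)| b.capacity() >= size)
        .min_by_key(|&(_, b)| b.capacity())
        .map(|(i, _)| i)?;
    Some(free.swap_remove(i))
}
```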
**Zero-fill skip**
- Safe `unsafe` via capacity-checked `set_len()`
- I/O immediately overwrites buffers
- ~40% of the speedup vs `Vec::with_capacity`
`BufferPool` is `Clone` and thread-safe:

```rust
use zeropool::BufferPool;

let pool = BufferPool::new();

for _ in 0..4 {
    let pool = pool.clone();
    std::thread::spawn(move || {
        let buf = pool.get(1024);
        // Each thread gets its own TLS cache
        pool.put(buf);
    });
}
```

ZeroPool uses `unsafe` internally to skip zero-fills:
```rust
// SAFE: capacity checked before set_len()
if buffer.capacity() >= size {
    unsafe { buffer.set_len(size); }
}
```

Why this is safe:
- Capacity verified before length modification
- Buffers immediately overwritten by I/O operations
- No reads before writes in typical I/O patterns
All unsafe code is documented and audited.
- **High-performance I/O** - io_uring, async file loading, network buffers
- **LLM inference** - fast checkpoint loading (real-world: 200ms → 53ms for GPT-2; see the sketch below)
- **Data processing** - predictable buffer sizes, high allocation rate
- **Multi-threaded servers** - concurrent buffer allocation without contention
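As an example of the checkpoint-loading shape (a hypothetical `load_tensors` helper and paths, built on the documented `get`/`put` API):

```rust
use std::fs::File;
use std::io::Read;
use zeropool::BufferPool;

/// Hypothetical checkpoint loader: one pool serves every tensor read,
/// so the large read buffer is recycled instead of reallocated per file.
fn load_tensors(paths: &[&str]) -> std::io::Result<()> {
    let pool = BufferPool::new();
    for path in paths {
        let mut file = File::open(path)?;
        let len = file.metadata()?.len() as usize;
        let mut buf = pool.get(len);
        file.read_exact(&mut buf)?;
        // ... deserialize the tensor from `buf` here ...
        pool.put(buf);
    }
    Ok(())
}
```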
Dual licensed under Apache-2.0 or MIT.
PRs welcome! Please include benchmarks for performance changes.