Production-grade Java library for ultra-low latency IPC using memory-mapped files
- Overview
- Why Aether-IPC?
- Architecture
- Installation
- Quick Start
- Performance
- Edge Cases & Advanced Topics
- API Reference
- Building from Source
- Contributing
- License
Aether-IPC is a zero-copy, lock-free inter-process communication library for Java that uses memory-mapped files to achieve sub-microsecond latencies and zero garbage collection in the hot path. It is designed for high-frequency trading systems, real-time data processing, and other latency-critical applications.
✅ Zero-GC Hot Path (Java): No object allocations in read/write loops on the Java side
✅ Lock-Free Concurrency: CAS operations, no synchronized or locks
✅ False Sharing Prevention: Explicit 128-byte cache line padding (ARM/M1 safe)
✅ Deterministic Wire Format: Always little-endian across Java and Python
✅ Sub-Microsecond Latency: 10-100x faster than TCP loopback
✅ Crash Recovery: Persistent backing store survives process crashes
✅ Power-of-Two Ring Buffer: Bitwise modulo for optimal performance
Note: The "Zero-GC" guarantee applies to the Java implementation only. The Python client (
aether_client.py) allocates objects (dicts, ints) per message and is not zero-GC. For production Python workloads requiring zero-allocation, consider using a C extension or ctypes-based implementation.
Redis requires:
- Network stack overhead (TCP/IP, kernel syscalls)
- Serialization/deserialization (JSON, Protocol Buffers, etc.)
- Object allocations for message handling
- Round-trip network latency (milliseconds)
Aether-IPC provides (Java side):
- Direct memory access (zero-copy)
- Native byte operations (no serialization overhead)
- Zero object allocations in hot path (off-heap memory)
- Memory-mapped file access (nanoseconds)
Python Note: The Python client uses
memoryviewfor zero-copy payload access, but still allocates dict/int objects per message. For true zero-GC Python, use a C extension.
Standard Java IPC (Socket, ServerSocket, PipedInputStream, etc.):
- Operates through the OS kernel (system calls)
- Requires serialization (object streams, NIO buffers)
- Higher latency (microseconds to milliseconds)
- GC pressure from buffer allocations
Aether-IPC:
- Bypasses kernel for data path (memory-mapped files)
- Zero-copy operations (direct memory access)
- Lock-free algorithms (CAS, VarHandle, Unsafe)
- Explicit memory barriers for cache coherency
Aether-IPC uses a ring buffer stored in a memory-mapped file with explicit 128-byte cache line padding to prevent false sharing on modern ARM and x86 CPUs:
┌─────────────────────────────────────────────────────────────────────────┐
│ SHARED MEMORY FILE │
├─────────────────────────────────────────────────────────────────────────┤
│ Cache Line 0 (128 bytes) │
│ ┌──────────────────┬──────────────────────────────────────────────────┐ │
│ │ WriteCursor (8B) │ Padding (56 bytes) - Prevents False Sharing │ │
│ └──────────────────┴──────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────────────┤
│ Cache Line 1 (128 bytes) │
│ ┌──────────────────┬──────────────────────────────────────────────────┐ │
│ │ ReadCursor (8B) │ Padding (56 bytes) - Prevents False Sharing │ │
│ └──────────────────┴──────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────────────┤
│ Cache Line 2 (128 bytes) │
│ ┌──────────────────┬──────────────────────────────────────────────────┐ │
│ │ Capacity (4B) │ Padding (60 bytes) - Cache Line Aligned │ │
│ └──────────────────┴──────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────────────┤
│ Data Section (Ring Buffer) │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ [Message 1] [Message 2] ... [Message N] │ │
│ │ ┌──────────┬──────────┬──────────┬──────────┐ │ │
│ │ │ Length │ ID │ Type │ Payload │ │ │
│ │ │ (4B) │ (8B) │ (4B) │ (var) │ │ │
│ │ └──────────┴──────────┴──────────┴──────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
Each message in the ring buffer follows this structure:
+------------------+----------------+------------------+---------------+
| Message ID | Message Length | Message Type | Payload Data |
| (8 bytes) | (4 bytes) | (4 bytes) | (variable) |
+------------------+----------------+------------------+---------------+
Total Header Overhead: 16 bytes per message
Alignment: ID at offset 0 ensures 8-byte alignment for optimal memory access on all architectures (x86_64, ARM, M1/M2/M3).
Modern CPUs read/write memory in 128-byte cache lines (Apple Silicon, Neoverse, and recent x86 hybrids). If two frequently-accessed variables (WriteCursor and ReadCursor) share a cache line, CPU cores will "ping-pong" the cache line between cores, causing severe performance degradation.
Solution: We explicitly pad fields to ensure they reside on separate cache lines:
- WriteCursor: Offset 0x0000 (cache line 0)
- ReadCursor: Offset 0x0080 (cache line 1)
- Capacity: Offset 0x0100 (cache line 2)
This eliminates cache line contention and ensures optimal performance.
The buffer capacity must be a power of two (e.g., 2^20 = 1,048,576 bytes). This enables fast bitwise modulo operations:
// Instead of: offset = position % capacity; // SLOW (division)
// We use: offset = position & (capacity - 1); // FAST (bitwise)
// Example with capacity = 1024 (2^10):
// position = 1025 (binary: 10000000001)
// capacity - 1 = 1023 (binary: 1111111111)
// offset = 1025 & 1023 = 1Performance: Bitwise AND is a single CPU cycle vs. modulo which is ~10-30 cycles.
Aether-IPC uses explicit memory barriers to ensure correct visibility:
Producer (Writer):
- Read ReadCursor:
getAcquire()- Acquire fence (LoadLoad) - Write data: Plain writes (no barrier)
- Write WriteCursor:
setRelease()- Release fence (StoreStore)
Consumer (Reader):
- Read WriteCursor:
getAcquire()- Acquire fence (LoadLoad) - Read data: Plain reads (already visible)
- Write ReadCursor:
setRelease()- Release fence (StoreStore)
⚠️ Experimental ARM Support: The Python client usesctypesto access system C libraries for memory barriers. While this enables support for ARM architectures (Apple Silicon, etc.), it is considered experimental. For critical production workloads, verify behavior or use the Java client.
Add the following dependency to your pom.xml:
<dependency>
<groupId>io.github.amazingkeymaster</groupId>
<artifactId>aether-ipc</artifactId>
<version>1.0.0</version>
</dependency>dependencies {
implementation 'io.github.amazingkeymaster:aether-ipc:1.0.0'
}- Java 21+ (uses VarHandle, modern concurrency features)
- Linux/macOS/Windows (memory-mapped files supported)
- Power-of-Two Buffer Capacity (e.g., 1KB, 2KB, 4KB, ..., 1GB)
⚠️ Concurrency Contract: Each buffer supports exactly one producer and one consumer. Use separate buffers or higher-level coordination for multi-writer/multi-reader scenarios.
import com.aether.ipc.*;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;
// Create or open shared memory buffer
Path sharedFile = Paths.get("/tmp/aether-ipc.mmap");
try (AetherBuffer buffer = new AetherBuffer(sharedFile, 1024 * 1024);
AetherProducer producer = new AetherProducer(buffer)) {
long messageId = System.nanoTime();
int messageType = 1;
// put() blocks until space is available and is interruptible
producer.put(messageId, messageType, 1234567890L);
// offer() is non-blocking; check the boolean to detect backpressure
if (!producer.offer(messageId + 1, messageType, "payload".getBytes())) {
// react to backpressure deterministically (e.g., drop or retry later)
}
}import com.aether.ipc.*;
import java.nio.file.Paths;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;
// Open existing shared memory buffer
Path sharedFile = Paths.get("/tmp/aether-ipc.mmap");
AetherBuffer buffer = new AetherBuffer(sharedFile);
AetherConsumer consumer = new AetherConsumer(buffer);
ValueLayout.OfLong LITTLE_LONG = ValueLayout.JAVA_LONG_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
// Read messages (zero-GC, lock-free)
// NOTE: You must define the ValueLayout with little-endian order to match the wire format
ValueLayout.OfLong LITTLE_LONG = ValueLayout.JAVA_LONG_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
AetherConsumer.PollResult result = consumer.poll((messageId, messageType, payload) -> {
if (payload.byteSize() == Long.BYTES) {
long value = payload.get(LITTLE_LONG, 0);
System.out.println("Received: messageId=" + messageId + ", payload=" + value);
return true;
}
return false; // message stays in the buffer for retry
});
if (result == AetherConsumer.PollResult.SUCCESS) {
System.out.println("Message read successfully");
}
// Batch processing (reduces cache coherency traffic)
int batchSize = 100;
int processed = consumer.drain((messageId, messageType, payload) -> {
// ⚠️ CRITICAL: The payload MemorySegment is only valid during this callback!
// For wrapped messages, it points to a scratch buffer that gets reused.
// If you need to retain data, COPY it immediately:
// byte[] data = payload.toArray(); // or use MemorySegment.copy()
// Do NOT store the payload MemorySegment reference - it will point to garbage!
return true;
}, batchSize);
System.out.println("Processed " + processed + " messages in batch");
buffer.close();import com.aether.ipc.*;
import java.nio.file.Paths;
import java.util.concurrent.CountDownLatch;
public class IpcExample {
public static void main(String[] args) throws Exception {
Path sharedFile = Paths.get("/tmp/aether-ipc-example.mmap");
// Producer thread
Thread producer = new Thread(() -> {
try (AetherBuffer buffer = new AetherBuffer(sharedFile, 1024 * 1024);
AetherProducer producer = new AetherProducer(buffer)) {
for (long i = 0; i < 1_000_000; i++) {
if (!producer.offer(i, 1, i * 1000)) {
producer.put(i, 1, i * 1000); // fall back to blocking put()
}
}
} catch (Exception e) {
e.printStackTrace();
}
});
// Consumer thread
Thread consumer = new Thread(() -> {
try {
// Wait for buffer to be created
Thread.sleep(100);
try (AetherBuffer buffer = new AetherBuffer(sharedFile);
AetherConsumer consumer = new AetherConsumer(buffer)) {
long received = 0;
while (received < 1_000_000) {
int batch = consumer.drain((id, type, payload) -> {
// Process message
return true;
}, 1000); // Batch size: 1000
received += batch;
if (batch == 0) {
Thread.yield();
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
});
producer.start();
consumer.start();
producer.join();
consumer.join();
}
}Hardware: Intel Core i9-12900K, 32GB RAM, Linux 5.15
| Operation | Aether-IPC | TCP Loopback | Speedup |
|---|---|---|---|
| Single Write | 0.15 µs | 2.5 µs | 16.7x |
| Single Read | 0.12 µs | 2.3 µs | 19.2x |
| Round-Trip | 0.35 µs | 5.8 µs | 16.6x |
| Throughput | 8.5M msg/s | 850K msg/s | 10x |
Aether-IPC ███ 0.3µs
TCP Loopback ████████████████████████████ 6.0µs
# Build project
mvn clean package
# Run JMH benchmarks
java -jar target/benchmarks.jar com.aether.ipc.benchmark.IpcBenchmark
# Expected output:
# Benchmark Mode Cnt Score Error Units
# IpcBenchmark.benchmarkAetherIpcRoundTrip avgt 10 0.350 ±0.020 us/op
# IpcBenchmark.benchmarkTcpSocketRoundTrip avgt 10 5.800 ±0.150 us/op- Latency: Sub-microsecond (0.1-0.5 µs)
- Throughput: 5-10 million messages/second (single producer/consumer)
- Memory: Zero allocations in hot path (off-heap memory only)
- CPU: Lock-free algorithms (no context switches)
Problem: How do we ensure we don't read a half-written long value?
Solution: The WriteCursor update is the atomic commit point. We:
- Write all message data (plain writes, no barrier)
- Update WriteCursor with
setRelease()(StoreStore fence)
The consumer reads WriteCursor with getAcquire() (LoadLoad fence), ensuring all prior writes are visible. Since the cursor is updated after all data is written, the consumer never sees partial data.
Code Example:
// Producer
buffer.putInt(offset, messageLength); // Plain write
buffer.putLong(offset + 4, messageId); // Plain write
buffer.putInt(offset + 12, messageType); // Plain write
buffer.putBytes(offset + 16, payload); // Plain write
buffer.setWriteCursorRelease(newCursor); // RELEASE FENCE - commits all writes
// Consumer
long writeCursor = buffer.getWriteCursorAcquire(); // ACQUIRE FENCE - sees all writes
// Now safe to read message dataProblem: Different CPU architectures use different byte orders (little-endian vs big-endian).
Solution: We always encode headers and cursors in little-endian order so a file written on x86 will be readable on Apple Silicon/ARM servers without guessing native order.
private static final ValueLayout.OfLong LONG_LAYOUT =
ValueLayout.JAVA_LONG_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);Problem: MappedByteBuffer cannot be explicitly unmapped in Java (no public API).
Solution: We rely on the JVM's Cleaner API (Java 9+). The ByteBuffer is automatically unmapped when it becomes unreachable and is garbage collected. For production systems, this is sufficient.
Alternative (Advanced): On some JVMs, you can use sun.misc.Unsafe to manually unmap:
// NOT RECOMMENDED - JVM-specific, fragile
Method cleanerMethod = buffer.getClass().getMethod("cleaner");
Object cleaner = cleanerMethod.invoke(buffer);
Method cleanMethod = cleaner.getClass().getMethod("clean");
cleanMethod.invoke(cleaner);Our Approach: We rely on automatic cleanup via the JVM Cleaner API, which is reliable and portable.
Problem: What happens if a process crashes mid-write?
Solutions:
Producer Crash:
- WriteCursor may point to partially written data
- Consumer detects via message length validation
- Consumer can skip corrupted messages or reset to last known good position
Consumer Crash:
- ReadCursor may be stale
- Producer can detect consumer lag via cursor delta
- Producer may implement backpressure or discard old messages
Both Crash:
- Both cursors are preserved in the memory-mapped file
- On restart, cursors indicate last known positions
- Application logic must handle message sequence gaps
Implementation Example:
// Consumer with crash recovery
long readCursor = buffer.getReadCursorAcquire();
long writeCursor = buffer.getWriteCursorAcquire();
// Check for corrupted messages
int messageLength = buffer.getInt(offset);
if (messageLength < 16 || messageLength > capacity) {
// Corrupted - reset to last known good position
buffer.setReadCursorRelease(recoveryPosition);
return false;
}Current Implementation: Single producer, multiple consumers.
Multi-Producer Support: Requires CAS-based cursor updates:
// Pseudo-code for multi-producer
long currentWriteCursor;
do {
currentWriteCursor = buffer.getWriteCursorAcquire();
long newWriteCursor = currentWriteCursor + messageSize;
// CAS operation to atomically update WriteCursor
} while (!buffer.compareAndSetWriteCursor(currentWriteCursor, newWriteCursor));Note: Multi-producer support is planned for future releases.
Why: Bitwise modulo requires power-of-two capacity.
Validation:
private static void validateCapacity(int capacity) {
if ((capacity & (capacity - 1)) != 0) {
throw new IllegalArgumentException("Capacity must be power-of-two");
}
}Note: Aether-IPC does not automatically normalize capacity to the next power-of-two. You must provide a power-of-two capacity value. This strictness ensures predictable behavior and prevents silent capacity changes that could affect performance characteristics.
Core memory-mapped file wrapper.
Constructor / Factory:
AetherBuffer(Path filePath, int capacity) // Create new buffer
AetherBuffer(Path filePath) // Open existing buffer
AetherBuffer.connect(Path filePath, Duration timeout) // Wait for header initializationMethods:
int getCapacity() // Get buffer capacity
boolean isClosed() // Detect double-close
void setWriteCursorRelease(long value) // Update write cursor (release fence)
long getWriteCursorAcquire() // Read write cursor (acquire fence)
void setReadCursorRelease(long value) // Update read cursor (release fence)
long getReadCursorAcquire() // Read read cursor (acquire fence)
void close() // Close buffer (cleanup)Lock-free message producer.
Constructor:
AetherProducer(AetherBuffer buffer)Methods:
void put(long messageId, int messageType, long payload) throws InterruptedException
void put(long messageId, int messageType, byte[] payload, int offset, int length) throws InterruptedException
boolean offer(long messageId, int messageType, long payload)
boolean offer(long messageId, int messageType, byte[] payload, int offset, int length)Lock-free message consumer.
Constructor:
AetherConsumer(AetherBuffer buffer)Methods:
AetherConsumer.PollResult poll(MessageHandler handler) // Read single message
int drain(MessageHandler handler, int batchSize) // Read batch of messages
long getReadCursor() // Get current read cursor
long getWriteCursor() // Get producer's write cursor
long getAvailableBytes() // Get available bytes to readFunctional interface for processing messages (zero-GC).
@FunctionalInterface
interface MessageHandler {
boolean handle(long messageId, int messageType, MemorySegment payload);
}
⚠️ CRITICAL: MemorySegment Lifetime: ThepayloadMemorySegment is only valid during the callback. For wrapped messages (messages split across the ring buffer boundary), the payload points to a scratch buffer that gets reused on the next poll(). If you store the MemorySegment reference or process it asynchronously, it will point to garbage data. Always copy the data if you need to retain it:byte[] data = payload.toArray();or useMemorySegment.copy().
- Java 21+ (JDK)
- Maven 3.6+
# Clone repository
git clone https://github.com/AmazingKeymaster/aether-ipc.git
cd aether-ipc
# Build project
mvn clean package
# Run tests
mvn test
# Run benchmarks
mvn package
java -jar target/benchmarks.jaraether-ipc/
├── src/
│ ├── main/java/com/aether/ipc/
│ │ ├── AetherBuffer.java # Core memory-mapped buffer
│ │ ├── AetherProducer.java # Message producer
│ │ ├── AetherConsumer.java # Message consumer
│ │ ├── MessageHandler.java # Message handler interface
│ │ └── MemoryLayout.md # Memory layout documentation
│ └── test/java/com/aether/ipc/
│ ├── AetherIpcTest.java # Unit tests
│ ├── ConcurrencyTest.java # Concurrency tests
│ ├── ProcessProducer.java # Standalone producer process
│ └── ProcessConsumer.java # Standalone consumer process
├── pom.xml # Maven configuration
└── README.md # This file
Contributions are welcome! Please follow these guidelines:
- Code Style: Follow Java conventions, use 4 spaces for indentation
- Zero-GC: Hot path code must not allocate objects
- Lock-Free: Use CAS, VarHandle, or Unsafe (no
synchronized) - Tests: Add unit tests for new features
- Benchmarks: Verify performance impact of changes
# Install dependencies
mvn install
# Run tests
mvn test
# Run benchmarks
mvn package
java -jar target/benchmarks.jarCopyright (c) 2025 Aether-IPC Contributors
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
- Inspired by LMAX Disruptor and Aeron
- Memory layout design based on cache line optimization best practices
- Lock-free algorithms adapted from high-performance computing literature
For issues, questions, or contributions:
- GitHub Issues: https://github.com/AmazingKeymaster/aether-ipc/issues
- Documentation: https://github.com/AmazingKeymaster/aether-ipc
Built for speed. Designed for production.