AlphaSudo/rimg

rule-image

Compile-once, memory-map-many: a Java AOT rule-image compiler and runtime for immutable datasets.

Take a large, read-mostly dataset (rule sets, policy graphs, feature flags, GeoIP ranges, signature databases), compile it ahead-of-time into a flat binary .rimg image, and memory-map it at runtime with near-zero heap cost. Built on the Java FFM API (MemorySegment), benchmarked with virtual threads, and designed with a forward path to Valhalla value-class views.


The headline number

| Metric | Heap (POJO) baseline | rule-image mapped | Delta |
|---|---|---|---|
| Heap after load | 853.80 MB | 8.07 MB | 105× less heap |
| Load / open time | 6,683 ms | 145 ms | 45× faster startup |
| Reload / swap time | 6,595 ms | 278 ms | 23× faster reload |

GeoIP-style showcase, 5 million synthetic entries. Full report: measurements/geoip-memory-showcase/win-5m-thin/report.md

At 100k entries with inflated metadata payloads:

| Metric | Heap baseline | rule-image mapped |
|---|---|---|
| Heap after load | 1.47 GB | 7.33 MB |

Heap delta: 1.46 GB freed

Full report: measurements/geoip-memory-showcase/win-100k-fat-23_5k/report.md


What it does

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ Extract  │───▶│ Compile  │───▶│   Map    │───▶│  Serve   │
│ (Postgres│    │ JSON →   │    │ mmap +   │    │ zero-    │
│  / JSON) │    │  .rimg   │    │ FFM API  │    │ alloc    │
└──────────┘    └──────────┘    └──────────┘    │ lookups  │
                                                └──────────┘
  1. Extract — pull rows from PostgreSQL (or feed normalized JSON directly)
  2. Compile — build a versioned .rimg binary with MPHF index + optional Bloom filter
  3. Map — FileChannel.map() + Arena.ofShared() → MemorySegment
  4. Serve — absolute-offset reads, no heap allocation on the hot path
  5. Swap — atomic hot-swap under load, zero-downtime refresh
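
The Map and Serve steps above can be sketched as follows. This is a minimal illustration rather than the repository's reader: the class name `MappedImageSketch` and its methods are hypothetical, and it assumes JDK 22 or later, where the FFM API is final. It maps a file read-only through a shared arena and serves little-endian reads at absolute offsets, returning primitives so that a lookup allocates nothing on the heap.

```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical reader: maps a file read-only and serves absolute-offset reads. */
final class MappedImageSketch implements AutoCloseable {
    // Little-endian, unaligned layouts: packed fields need not be naturally aligned.
    private static final ValueLayout.OfInt INT_LE =
            ValueLayout.JAVA_INT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
    private static final ValueLayout.OfLong LONG_LE =
            ValueLayout.JAVA_LONG_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);

    private final Arena arena;
    private final MemorySegment segment;

    private MappedImageSketch(Arena arena, MemorySegment segment) {
        this.arena = arena;
        this.segment = segment;
    }

    /** Maps the whole file; the mapping stays valid until close(). */
    static MappedImageSketch open(Path file) throws IOException {
        Arena arena = Arena.ofShared(); // shared: the segment is readable from any thread
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MemorySegment seg = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), arena);
            return new MappedImageSketch(arena, seg);
        } catch (IOException | RuntimeException e) {
            arena.close(); // do not leak the arena if mapping fails
            throw e;
        }
    }

    // Absolute-offset reads: bounds-checked, primitive results, no heap allocation.
    int intAt(long offset)   { return segment.get(INT_LE, offset); }
    long longAt(long offset) { return segment.get(LONG_LE, offset); }
    long byteSize()          { return segment.byteSize(); }

    @Override
    public void close() { arena.close(); } // releases the mapping
}
```

The mapping outlives the `FileChannel`, so the channel can be closed right after `map()`. Closing the arena invalidates the segment for every thread, which is why a swap has to coordinate with in-flight readers.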

Honest benchmarks

Where rule-image wins: memory, startup, reload

| Workload | Heap | Mapped | Ratio |
|---|---|---|---|
| 5M thin GeoIP entries (heap) | 853 MB | 8 MB | 105× |
| 100k fat-payload entries (heap) | 1.47 GB | 7.33 MB | 200× |
| 5M thin (load time) | 6.6 s | 145 ms | 45× |
| 100k fat (reload time) | 6.5 s | 278 ms | 23× |

Where heap still wins: warm lookup latency

| Benchmark | Heap | Mapped | Ratio |
|---|---|---|---|
| JMH single warm lookup | 21 ns/op | 79 ns/op | 3.7× slower |
| JMH composed (N=10) | 400 ns/op | 684 ns/op | 1.7× slower |
| JMH composed (N=100) | 4.3 µs/op | 8.4 µs/op | 1.9× slower |

Full JMH matrix: measurements/week3-report.md

Versus Netflix Hollow (head-to-head on Linux)

Benchmarked against Netflix Hollow on the same GeoIP-like workload. Reports:

Versus shared-store miss paths (FF4J Redis / JDBC)

| Path | Avg latency |
|---|---|
| FF4J uncached Redis | 301 µs |
| FF4J uncached network JDBC | 679 µs |
| FF4J warm Redis cache | 0.85 µs |
| rule-image warm lookup | sub-µs |

The strongest case for rule-image: replacing repeated remote metadata fetch with a local compiled snapshot.

Hot-swap chaos test

  • 10,000 concurrent virtual-thread readers
  • Forced swap every 500ms for 5 continuous minutes
  • Zero segfaults, zero stale reads, zero lost evaluations
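
One way to get this kind of swap safety is reference counting: publish the new image with a single atomic write, and close the old one only after its last reader has released it. The sketch below illustrates that idea under stated assumptions. `HotSwapSketch` and `Snapshot` are hypothetical names, and the repository's actual swap mechanism may differ (its decision records cover hot-swap).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical hot-swap holder; not taken from the repository. */
final class HotSwapSketch<T extends AutoCloseable> {

    /** A published image plus a reference count (starts at 1: the holder's own reference). */
    static final class Snapshot<T extends AutoCloseable> {
        final T value;
        private final AtomicInteger refs = new AtomicInteger(1);

        Snapshot(T value) { this.value = value; }

        /** Takes a reference unless the snapshot has already been fully released. */
        boolean tryRetain() {
            for (;;) {
                int n = refs.get();
                if (n == 0) return false;
                if (refs.compareAndSet(n, n + 1)) return true;
            }
        }

        /** Drops a reference; the last one out closes the underlying resource. */
        void release() {
            if (refs.decrementAndGet() == 0) {
                try {
                    value.close();
                } catch (Exception e) {
                    throw new IllegalStateException("failed to close retired image", e);
                }
            }
        }
    }

    private final AtomicReference<Snapshot<T>> current;

    HotSwapSketch(T initial) {
        current = new AtomicReference<>(new Snapshot<>(initial));
    }

    /** Readers call acquire(), read from snapshot.value, then call release(). */
    Snapshot<T> acquire() {
        for (;;) {
            Snapshot<T> s = current.get();
            if (s.tryRetain()) return s; // retry if s was retired in between
        }
    }

    /** Publishes the new image atomically and drops the holder's reference to the old one. */
    void swap(T next) {
        Snapshot<T> old = current.getAndSet(new Snapshot<>(next));
        old.release();
    }
}
```

A reader that acquired the old snapshot keeps a valid image until it calls `release()`, while every new `acquire()` sees the new one. The cost is two atomic operations per lease, so a service would typically acquire once per request rather than once per lookup.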

Quick start

Prerequisites: JDK 26 (Temurin recommended), Gradle wrapper included.

# Clone
git clone https://github.com/AlphaSudo/rimg.git
cd rimg

# Build + test
./gradlew test

# Run the GeoIP memory showcase (100k entries)
./gradlew :geoip-showcase:run --args="--entries 100000 --lookups 10000 --warmup-lookups 2000"

# Compile a custom .rimg from JSON
./gradlew :compiler:run --args="--input fixtures/rules-10000.json --out rules.rimg"

# Inspect a compiled image
./gradlew :inspector:run --args="stats --input rules.rimg"

Windows (PowerShell)

.\gradlew.bat test
powershell -ExecutionPolicy Bypass -File .\scripts\Run-GeoIpMemoryShowcase.ps1 -Entries 100000 -Lookups 10000 -WarmupLookups 2000

Linux benchmarks

./gradlew test :benchmark:jmhJar
./scripts/bench-linux-harness.sh
sudo ./scripts/bench-linux-cold-cache.sh

Architecture

Modules

| Module | Purpose |
|---|---|
| format-spec | Binary layout spec, Bloom filter, hash utilities |
| compiler | AOT compiler: JSON → .rimg with MPHF index |
| runtime | Mapped reader, header validation, Linux page-fault mitigations |
| codegen | Schema-driven accessor generator |
| benchmark | JMH microbenchmarks (warm, cold, concurrent, composed) |
| inspector | CLI: hexdump, stats, format validation |
| service-harness | Feature-flagged service with heap\|rimg + platform\|virtual modes |
| load-driver | HTTP load generator for the service harness |
| geoip-showcase | Dedicated heap-vs-mapped memory showcase |
| hollow-showcase | Netflix Hollow comparison harness |
| record-showcase | Real-data heap-vs-mapped benchmark |
| postgres-export | PostgreSQL → normalized JSON exporter |

Binary format (v0.4)

┌────────────────────────────┐
│ Header (magic, version,    │
│   CRC32, SHA-256, flags)   │
├────────────────────────────┤
│ MPHF bucket seeds [int[]]  │
│ Slot offset table [long[]] │
│ Bloom filter (optional)    │
├────────────────────────────┤
│ Entry region               │
│  key + ruleId + priority   │
│  + action (packed, LE)     │
└────────────────────────────┘

Full spec: format-spec/SPEC.md
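
To make the header's role concrete, here is a small validation sketch. The layout in it is an assumption for illustration only: a 12-byte little-endian header with magic at offset 0, a 16-bit version at offset 4, 16-bit flags at offset 6, and a CRC32 of the remaining bytes at offset 8. The real v0.4 offsets, the magic value, and the SHA-256 field are defined in format-spec/SPEC.md and are not reproduced here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32;

/** Hypothetical header check; offsets and magic value are assumptions, not the v0.4 spec. */
final class HeaderCheckSketch {
    static final int MAGIC = 0x474D4952; // bytes 'R','I','M','G' when stored little-endian (assumed)
    static final int HEADER_BYTES = 12;  // magic(4) + version(2) + flags(2) + crc32(4), assumed

    /** Returns the format version if magic and checksum match; throws otherwise. */
    static int validate(byte[] image) {
        if (image.length < HEADER_BYTES) {
            throw new IllegalArgumentException("truncated header");
        }
        ByteBuffer buf = ByteBuffer.wrap(image).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != MAGIC) {
            throw new IllegalArgumentException("bad magic");
        }
        int version = Short.toUnsignedInt(buf.getShort(4));
        long storedCrc = Integer.toUnsignedLong(buf.getInt(8));
        if (storedCrc != crcOfBody(image)) {
            throw new IllegalArgumentException("checksum mismatch");
        }
        return version;
    }

    /** CRC32 over everything after the header. */
    static long crcOfBody(byte[] image) {
        CRC32 crc = new CRC32();
        crc.update(image, HEADER_BYTES, image.length - HEADER_BYTES);
        return crc.getValue();
    }

    /** Builds a tiny image in the assumed layout, for demonstration only. */
    static byte[] build(int version, byte[] body) {
        byte[] image = new byte[HEADER_BYTES + body.length];
        System.arraycopy(body, 0, image, HEADER_BYTES, body.length);
        ByteBuffer buf = ByteBuffer.wrap(image).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0, MAGIC).putShort(4, (short) version).putShort(6, (short) 0);
        buf.putInt(8, (int) crcOfBody(image));
        return image;
    }
}
```

Checking the magic and checksum before serving any lookup means a truncated or corrupted file is rejected at open time instead of producing wrong answers later.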

Page-fault mitigations

Virtual threads + mmap have a real operational subtlety: hard page faults stall the carrier thread (not fixed by JEP 491). This repo implements and benchmarks:

  • Pre-touch — sequential walk, one byte per 4 KiB page
  • madvise(WILLNEED) — FFM downcall, best tested cold-path mitigation
  • mlock — pin pages in RAM for strict tail-latency SLAs
  • Carrier pool tuning — jdk.virtualThreadScheduler.parallelism sweep

Details: docs/ADR.md §6.1
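
The pre-touch mitigation listed above is the simplest to illustrate. The sketch below is a hypothetical helper, not the repository's implementation: it walks a `MemorySegment` sequentially and reads one byte per page, assuming a fixed 4 KiB page size as in the bullet. It requires JDK 22 or later.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

/** Hypothetical pre-touch helper; the page size is assumed, not queried from the OS. */
final class PreTouchSketch {
    static final long PAGE_BYTES = 4096; // assumption: 4 KiB pages

    // Written once per walk so the JIT cannot discard the reads as dead code.
    private static volatile int sink;

    /** Reads one byte per page so each page is faulted in; returns the number of pages touched. */
    static long preTouch(MemorySegment segment) {
        long pages = 0;
        int acc = 0;
        for (long off = 0; off < segment.byteSize(); off += PAGE_BYTES) {
            acc += segment.get(ValueLayout.JAVA_BYTE, off);
            pages++;
        }
        sink = acc;
        return pages;
    }
}
```

Running the walk on a platform thread right after mapping, before traffic is admitted, moves the page faults off the virtual-thread carriers. Unlike mlock, it does not stop the OS from evicting the pages again under memory pressure.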


Who this is for

Best-fit workload shapes:

  • Feature/config evaluation services
  • Policy and authorization lookup
  • Pricing, routing, eligibility, decision-table services
  • Replay/state-transition services with mostly-static dispatch metadata
  • Large immutable lookup datasets (GeoIP-style range mappings)

Poor fit: CRUD-heavy, write-heavy, or highly dynamic per-request data.


Real-data pipeline

The repo includes a working end-to-end path from a real PostgreSQL database:

  1. Export with postgres-export → normalized JSON
  2. Compile with compiler → .rimg
  3. Benchmark with record-showcase → heap-vs-mapped comparison

Example config: examples/postgres-rule-export-config.example.json


What we do NOT claim

  • "This is faster than everything" — warm lookup still loses to heap
  • "Virtual threads + mmap always win" — they don't; see the data
  • "Production-ready for all workloads" — this is a PoC with real evidence

The honest positioning: rule-image wins on memory shape, startup, reload, and cold/miss-path avoidance. If your current heap path is already warm and cheap, this is not automatically better. The data is in the repo — judge for yourself.


Deeper reading

| Document | What it covers |
|---|---|
| Architecture Decision Record | Full technical rationale, prior art matrix, risk analysis |
| Architecture guide | Detailed comparison methodology and benchmark interpretation |
| Blog post draft | "Mmap + Virtual Threads + Panama: A Pattern, Not a Revolution" |
| Valhalla engagement | Value-class view pattern for valhalla-dev |
| Format specification | Binary layout v0.4 |
| Week 3 report | Full Phase 3 benchmark artifacts |
| Phase 3 closeout | What was proven, what remains |
| Decision docs | 7 decision records (hot-swap, layout, trust, go/no-go) |

Requirements

  • JDK 26 (Temurin) is the documented prerequisite. JDK 22+ is the minimum (FFM GA); JDK 24+ is recommended (JEP 491).
  • Linux for full feature support (madvise, mlock, perf stat)
  • Windows works for development and most benchmarks; no madvise/mlock

License

Apache License 2.0
