Most stream processing systems recompute too much.
Ripple only recomputes what actually changed.
→ Distributed, type-safe, incremental stream processing in OCaml
→ 250ns stabilization at 10K symbols
→ 2M+ events/sec throughput
→ Deterministic replay and recovery
Built from first principles after studying incremental computation models used in production trading infrastructure.
Traditional systems:
price update → recompute entire window → serialize full record → latency spike
Ripple:
price update → recompute 3 affected nodes → serialize delta → done in 250ns
10,000 symbols in the graph.
One trade arrives.
3 nodes recompute. 9,997 don't.
Ripple is built on incremental computation graphs:
- Nodes represent computations
- Edges represent dependencies
- Changes propagate only to affected nodes
- Stabilization is deterministic and minimal
Instead of recomputing everything, we recompute exactly what changed.
```
Input Events → [Leaf Nodes] → [Map]  → [Incr Fold]  → [Output] → Deltas
               (set value)    (pure)   (O(1) update)  (diff)     (bin_prot)
```
A min-heap tracks dirty nodes in topological order. The incremental fold subtracts the old value, adds the new. O(1) per changed parent — not O(N), not O(log N). O(1). Regardless of graph size.
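The subtract-old/add-new idea can be sketched in a few lines. This is an illustrative sketch, not Ripple's actual API; the names (`fold_state`, `update`) are hypothetical:

```ocaml
(* Incremental sum: when one input changes, update the aggregate in O(1)
   by subtracting the old contribution and adding the new one, instead of
   re-folding all N inputs. Illustrative only — not Ripple's real API. *)
type fold_state = { mutable total : float }

let update state ~old_value ~new_value =
  (* O(1) regardless of how many other inputs feed the fold *)
  state.total <- state.total -. old_value +. new_value

let () =
  let s = { total = 0. } in
  (* build up initial state: each input contributes once *)
  List.iter (fun v -> update s ~old_value:0. ~new_value:v) [ 1.; 2.; 3. ];
  assert (s.total = 6.);
  (* one price update: the input worth 2. moves to 5. — one O(1) call *)
  update s ~old_value:2. ~new_value:5.;
  assert (s.total = 9.)
```

The same trick generalizes to any fold over an invertible operation (sum, count, mean via sum/count); non-invertible folds like min/max need extra structure.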
Ripple is composed of:
- Graph Engine — incremental dependency tracking with heap-based propagation
- Delta Layer — idempotent state transitions with algebraic guarantees
- Schema System — type-safe schemas with compile-time compatibility checking
- Transport — sequence-ordered distributed propagation with CRC-32C integrity
- Checkpointing — deterministic replay and recovery
- Coordinator / Workers — horizontal scaling via consistent hashing
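The coordinator's hashing scheme can be sketched as follows. All names here are illustrative assumptions, not Ripple's coordinator API:

```ocaml
(* Consistent-hashing sketch: each worker owns a point on a ring; a key is
   assigned to the first worker clockwise from its hash, so adding or
   removing a worker only moves the keys between it and its neighbour.
   Illustrative names only — not Ripple's actual coordinator interface. *)
let ring_size = 1024

let hash_point s = Hashtbl.hash s mod ring_size

let assign workers key =
  let k = hash_point key in
  let sorted =
    List.sort
      (fun (_, a) (_, b) -> compare a b)
      (List.map (fun w -> (w, hash_point w)) workers)
  in
  (* first worker whose ring point is >= k, wrapping around if needed *)
  match List.find_opt (fun (_, p) -> p >= k) sorted with
  | Some (w, _) -> w
  | None -> fst (List.hd sorted)

let () =
  let workers = [ "worker-a"; "worker-b"; "worker-c" ] in
  let w = assign workers "AAPL" in
  (* assignment is deterministic: the same key always lands on the same worker *)
  assert (w = assign workers "AAPL");
  assert (List.mem w workers)
```

Production schemes typically place many virtual points per worker to even out the load; this sketch uses one point per worker for brevity.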
Design principles:
- Deterministic execution (injectable effects for time, randomness)
- Minimal recomputation (O(1) incremental fold)
- Type-safe boundaries (schemas derived from types)
- Replayable state (crash at any point, recover correctly)
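Injectable effects are what make replay byte-for-byte reproducible: the engine never reads the wall clock directly. A minimal sketch of the pattern, with hypothetical module names (`Clock`, `Fake_clock`) rather than Ripple's real interfaces:

```ocaml
(* Deterministic execution via injected effects: code depends on a Clock
   signature, not on Unix time. In tests and replay, a fake clock yields
   the same sequence on every run. Sketch only — names are illustrative. *)
module type Clock = sig
  val now : unit -> float
end

module Fake_clock (Init : sig
  val start : float
end) : Clock = struct
  let t = ref Init.start

  let now () =
    let v = !t in
    t := v +. 1.;
    v
end

let () =
  let module C = Fake_clock (struct let start = 100. end) in
  assert (C.now () = 100.);
  assert (C.now () = 101.) (* identical sequence on every run *)
```

A production `Clock` would be backed by the real time source; swapping modules is the only change between live and replay runs.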
```
lib/
├── kernel/         Effect injection, domain types (Trade, Vwap)
├── graph/          Core engine: heap-based stabilization, incremental fold
├── schema/         Type-safe schemas, delta algebra, compatibility checker
├── wire/           Binary protocol with CRC-32C integrity
├── transport/      Sequence-ordered delta buffer with gap detection
├── checkpoint/     Snapshot/restore with pluggable stores (memory, disk, S3)
├── window/         Tumbling/sliding/session windows, watermark tracking
├── time_series/    Aggregators: count, sum, mean, vwap, stddev
├── topology/       Pipeline composition and validation
├── observability/  Prometheus metrics, W3C tracing, introspection, alerts
├── coordinator/    Consistent hashing, partition assignment, failure detection
├── worker/         Lifecycle state machine, engine loop
├── rpc/            Async RPC delta transport
├── connector/      Source/sink connectors (file, Kafka)
└── ripple/         Top-level facade
```
| Operation | Measured |
|---|---|
| Stabilization (10K symbols) | 250 ns |
| bin_prot serde roundtrip | 82 ns (12M+/sec) |
| Schema compatibility check | 128 ns |
| VWAP pipeline throughput | 2.16M events/sec |
| 6M event replay recovery | 2.1 seconds |
| Heap growth over 1M events | 0.1% |
- Deterministic replay — same inputs always produce same outputs
- Idempotent updates — apply(d, apply(d, v)) = apply(d, v)
- Bounded recomputation — O(depth) per event, not O(graph size)
- Crash recovery — 100/100 random crash points recover correctly
| Category | Count | What |
|---|---|---|
| Inline expect tests | 117 | Every module |
| Property-based tests | 11 | Algebraic laws verified across 6,500+ random inputs |
| Load tests | 4 | Sustained throughput, memory stability, latency, O(1) recomputation |
| Chaos tests | 3 | Crash at 100 random points, all produce correct output |
| Integration tests | 1 | Docker Compose with Redpanda (Kafka) + MinIO (S3) |
Pre-commit hook gates every commit against benchmark regression. Nothing lands if performance degrades.
The incremental computation model is incredibly powerful — but it's been locked inside single-process libraries.
Ripple is an attempt to:
- Bring incremental computation to distributed systems
- Preserve determinism at scale
- Stop recomputing work that doesn't need recomputing
This is not a wrapper around existing stream processors. This is a different execution model.
Ripple is not:
- A Kafka clone
- A Spark alternative
- A batch processing system
Ripple is:
- An incremental computation engine
- A deterministic stream processor
- A system that minimizes work instead of scaling it
Deltas are composable, associative, and idempotent:
apply(Set(v), _) = Ok v -- replacement
apply(d, apply(d, v)) = apply(d, v) -- idempotent
apply(diff(old, new), old) = Ok new -- roundtrip
compose(d, Remove) = Remove -- annihilation
compose(d, Set(v)) = Set(v) -- right absorption (Set overwrites)
apply(compose(d1,d2), v) = apply(d2, apply(d1, v)) -- compatibility
This gives you effectively-once semantics without distributed transactions.
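The laws above can be modeled with a toy delta type. This sketch uses `option` where Ripple's real API uses `Ok`/result values, and omits `diff`; all names are illustrative:

```ocaml
(* A tiny delta algebra over an optional value, mirroring the laws above:
   Set overwrites, Remove deletes, apply is idempotent, and compose agrees
   with sequential application. Sketch only — Ripple's deltas are richer. *)
type 'a delta = Set of 'a | Remove

let apply d _v = match d with Set x -> Some x | Remove -> None

(* last write wins: Set and Remove both absorb whatever came before *)
let compose _d1 d2 = d2

let () =
  let v = Some 1 in
  (* idempotence: apply(d, apply(d, v)) = apply(d, v) *)
  assert (apply (Set 2) (apply (Set 2) v) = apply (Set 2) v);
  (* annihilation: compose(d, Remove) = Remove *)
  assert (compose (Set 2) Remove = Remove);
  (* compatibility: apply(compose(d1, d2), v) = apply(d2, apply(d1, v)) *)
  assert (apply (compose (Set 2) Remove) v = apply Remove (apply (Set 2) v))
```

Because re-applying a delta is a no-op, a worker that replays a delta after a crash converges to the same state, which is what makes the effectively-once claim work without coordination.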
```sh
# Install dependencies
opam install core ppx_jane ppx_expect ppx_inline_test core_bench async

# Build
make build

# Run tests
make test

# Run the VWAP demo (2M+ events/sec)
make demo

# Start a worker
make worker
curl localhost:9100/health   # OK
curl localhost:9100/metrics  # Prometheus format

# CLI
dune exec ripple-cli -- info
```
```sh
dune exec ripple-cli -- inspect schemas
```

| Target | What |
|---|---|
| make build | Compile |
| make test | All tests (117 inline + property + load + chaos) |
| make bench | Benchmarks |
| make check | Build + test + benchmark gate |
| make demo | VWAP pipeline, 100K events |
| make worker | Start worker process |
| make docs | Build documentation |
| make post | Generate project summary with live numbers |
| make install-hooks | Install pre-commit hook |
MIT
