copyleftdev/ripple

Ripple

Most stream processing systems recompute too much.

Ripple only recomputes what actually changed.

→ Distributed, type-safe, incremental stream processing in OCaml
→ 250ns stabilization at 10K symbols
→ 2M+ events/sec throughput
→ Deterministic replay and recovery

Built from first principles after studying incremental computation models used in production trading infrastructure.

Ripple live simulation — 50 symbols, only affected nodes recompute


What this looks like

Traditional systems:

price update → recompute entire window → serialize full record → latency spike

Ripple:

price update → recompute 3 affected nodes → serialize delta → done in 250ns
10,000 symbols in the graph.
One trade arrives.
3 nodes recompute. 9,997 don't.

Core Idea

Ripple is built on incremental computation graphs:

  • Nodes represent computations
  • Edges represent dependencies
  • Changes propagate only to affected nodes
  • Stabilization is deterministic and minimal

Instead of recomputing everything, we recompute exactly what changed.

Input Events → [Leaf Nodes] → [Map] → [Incr Fold] → [Output] → Deltas
                (set value)   (pure)   (O(1) update)  (diff)    (bin_prot)

A min-heap tracks dirty nodes in topological order. The incremental fold subtracts the old value, adds the new. O(1) per changed parent — not O(N), not O(log N). O(1). Regardless of graph size.
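
The incremental fold can be sketched in a few lines of plain OCaml (illustrative only, not Ripple's actual graph API): updating one leaf adjusts the cached aggregate by the difference, so the cost is independent of how many symbols the graph holds.

```ocaml
(* Minimal sketch of an O(1) incremental sum fold (illustrative, not
   Ripple's API). Updating one input adjusts the cached total by the
   difference instead of re-summing every input. *)
type fold = {
  inputs : float array;   (* leaf values, one per symbol *)
  mutable total : float;  (* cached aggregate *)
}

let create n = { inputs = Array.make n 0.0; total = 0.0 }

(* O(1) per changed input, regardless of Array.length f.inputs. *)
let set f i v =
  let old = f.inputs.(i) in
  f.inputs.(i) <- v;
  f.total <- f.total -. old +. v

let () =
  let f = create 10_000 in
  set f 0 5.0;
  set f 1 3.0;
  set f 0 2.0;   (* only this leaf's contribution changes *)
  assert (f.total = 5.0)
```

The real engine generalizes this: any fold whose operation has an inverse (sum, count, product over non-zeros) admits the same subtract-old, add-new update.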


Architecture

Ripple is composed of:

  • Graph Engine — incremental dependency tracking with heap-based propagation
  • Delta Layer — idempotent state transitions with algebraic guarantees
  • Schema System — type-safe schemas with compile-time compatibility checking
  • Transport — sequence-ordered distributed propagation with CRC-32C integrity
  • Checkpointing — deterministic replay and recovery
  • Coordinator / Workers — horizontal scaling via consistent hashing

Design principles:

  • Deterministic execution (injectable effects for time, randomness)
  • Minimal recomputation (O(1) incremental fold)
  • Type-safe boundaries (schemas derived from types)
  • Replayable state (crash at any point, recover correctly)
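
The first principle, deterministic execution via injectable effects, can be sketched like so (hypothetical names, not the actual kernel API): the clock is a value passed to the engine, so production uses a real clock while replay uses a scripted one.

```ocaml
(* Hedged sketch of effect injection for deterministic replay
   (illustrative, not Ripple's kernel API). The clock is a value
   handed to the engine, so a replay sees the exact same timestamps
   as the original run. *)
type effects = { now : unit -> float }

(* Production: real clock. *)
let live = { now = Sys.time }

(* Replay: timestamps scripted from a recorded run. *)
let replay ts =
  let i = ref (-1) in
  { now = (fun () -> incr i; ts.(!i)) }

let stamp fx event = (fx.now (), event)

let () =
  let fx = replay [| 1.0; 2.0 |] in
  assert (stamp fx "trade" = (1.0, "trade"));
  assert (stamp fx "trade" = (2.0, "trade"))
```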
lib/
├── kernel/          Effect injection, domain types (Trade, Vwap)
├── graph/           Core engine: heap-based stabilization, incremental fold
├── schema/          Type-safe schemas, delta algebra, compatibility checker
├── wire/            Binary protocol with CRC-32C integrity
├── transport/       Sequence-ordered delta buffer with gap detection
├── checkpoint/      Snapshot/restore with pluggable stores (memory, disk, S3)
├── window/          Tumbling/sliding/session windows, watermark tracking
├── time_series/     Aggregators: count, sum, mean, vwap, stddev
├── topology/        Pipeline composition and validation
├── observability/   Prometheus metrics, W3C tracing, introspection, alerts
├── coordinator/     Consistent hashing, partition assignment, failure detection
├── worker/          Lifecycle state machine, engine loop
├── rpc/             Async RPC delta transport
├── connector/       Source/sink connectors (file, Kafka)
└── ripple/          Top-level facade
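
As an example of what lives in `time_series/`, a VWAP aggregator can be approximated in a few lines (a hypothetical sketch, not the module's real interface): two running sums make each trade an O(1) update.

```ocaml
(* Hypothetical sketch of an incremental VWAP aggregator in the spirit
   of lib/time_series (not its real interface). VWAP = sum(p*q)/sum(q),
   maintained as two running sums so each trade is O(1). *)
type vwap = { mutable pv : float; mutable qty : float }

let empty () = { pv = 0.0; qty = 0.0 }

let on_trade t ~price ~qty =
  t.pv <- t.pv +. (price *. qty);
  t.qty <- t.qty +. qty

(* None until at least one trade has arrived. *)
let value t = if t.qty = 0.0 then None else Some (t.pv /. t.qty)

let () =
  let t = empty () in
  on_trade t ~price:10.0 ~qty:2.0;
  on_trade t ~price:20.0 ~qty:2.0;
  assert (value t = Some 15.0)
```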

Performance

Operation                     Measured
Stabilization (10K symbols)   250 ns
bin_prot serde roundtrip      82 ns (12M+/sec)
Schema compatibility check    128 ns
VWAP pipeline throughput      2.16M events/sec
6M event replay recovery      2.1 s
Heap growth over 1M events    0.1%

Guarantees

  • Deterministic replay — same inputs always produce same outputs
  • Idempotent updates — apply(d, apply(d, v)) = apply(d, v)
  • Bounded recomputation — O(depth) per event, not O(graph size)
  • Crash recovery — 100/100 random crash points recover correctly

Quality Bar

Category               Count   What
Inline expect tests    117     Every module
Property-based tests   11      Algebraic laws verified across 6,500+ random inputs
Load tests             4       Sustained throughput, memory stability, latency, O(1) recomputation
Chaos tests            3       Crash at 100 random points, all produce correct output
Integration tests      1       Docker Compose with Redpanda (Kafka) + MinIO (S3)

Pre-commit hook gates every commit against benchmark regression. Nothing lands if performance degrades.


Why Ripple Exists

The incremental computation model is incredibly powerful — but it's been locked inside single-process libraries.

Ripple is an attempt to:

  • Bring incremental computation to distributed systems
  • Preserve determinism at scale
  • Stop recomputing work that doesn't need recomputing

This is not a wrapper around existing stream processors. This is a different execution model.


Positioning

Ripple is not:

  • A Kafka clone
  • A Spark alternative
  • A batch processing system

Ripple is:

  • An incremental computation engine
  • A deterministic stream processor
  • A system that minimizes work instead of scaling it

Delta Algebra

Deltas are composable, associative, and idempotent:

apply(Set(v), _)              = Ok v                  -- replacement
apply(d, apply(d, v))         = apply(d, v)           -- idempotent
apply(diff(old, new), old)    = Ok new                -- roundtrip
compose(d, Remove)            = Remove                -- annihilation
compose(d, Set(v))            = Set(v)                -- right absorption (last write wins)
apply(compose(d1,d2), v)      = apply(d2, apply(d1, v))  -- compatibility

This gives you effectively-once semantics without distributed transactions.
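
The laws above can be checked mechanically. Below is an illustrative toy delta type (not Ripple's schema layer; it uses a float option where the real layer returns a result), with asserts for idempotence, roundtrip, annihilation, and composition compatibility:

```ocaml
(* Toy delta algebra illustrating the laws above (not Ripple's actual
   schema layer). A value is a float option; None means absent. *)
type delta = Set of float | Remove | Keep

let apply d v =
  match d with
  | Set x -> Some x   (* replacement: ignores the old value *)
  | Remove -> None
  | Keep -> v

(* compose d1 d2 = "apply d1, then d2"; the later delta wins unless
   it is Keep. *)
let compose d1 d2 = match d2 with Keep -> d1 | _ -> d2

let diff _old new_ = match new_ with Some x -> Set x | None -> Remove

let () =
  let v = Some 1.0 and d = Set 2.0 in
  assert (apply d (apply d v) = apply d v);          (* idempotent     *)
  assert (apply (diff v (Some 3.0)) v = Some 3.0);   (* roundtrip      *)
  assert (compose d Remove = Remove);                (* annihilation   *)
  assert (apply (compose d (Set 9.0)) v
          = apply (Set 9.0) (apply d v))             (* compatibility  *)
```

Idempotence is what makes redelivery safe: a transport that retries a delta after an ack is lost cannot corrupt state, which is where the effectively-once claim comes from.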


Quick Start

# Install dependencies
opam install core ppx_jane ppx_expect ppx_inline_test core_bench async

# Build
make build

# Run tests
make test

# Run the VWAP demo (2M+ events/sec)
make demo

# Start a worker
make worker
curl localhost:9100/health     # OK
curl localhost:9100/metrics    # Prometheus format

# CLI
dune exec ripple-cli -- info
dune exec ripple-cli -- inspect schemas

Makefile

Target              What
make build          Compile
make test           All tests (117 inline + property + load + chaos)
make bench          Benchmarks
make check          Build + test + benchmark gate
make demo           VWAP pipeline, 100K events
make worker         Start worker process
make docs           Build documentation
make post           Generate project summary with live numbers
make install-hooks  Install pre-commit hook

License

MIT
