-
-
Notifications
You must be signed in to change notification settings - Fork 0
Core Concepts
The three things eventferry is doing under the hood, in plain language: the outbox pattern, the three components that implement it, and the invariants they uphold. Read this once before any other page — every later section assumes you know which actor owns which step.
You have a database transaction that changes business state (INSERT INTO orders). You also need to publish an event about that change (orders.created on Kafka). The naive approach — commit the transaction, then publish — has a real failure mode: the process can die between the two, and the broker never hears about an order that absolutely happened. The reverse — publish first, then commit — has the opposite failure: the broker hears about an order that, on retry, the database rejects.
The outbox pattern fixes this by writing the event into the same database transaction as the business change, into a dedicated table. Either both rows commit, or neither does. A separate process (the "relay") reads the outbox table and publishes its rows to the broker, marking them as delivered when the broker acks. If the relay dies mid-batch, the rows it didn't ack stay claimable; another instance picks them up where the previous left off.
No dual-write. No lost messages. No out-of-order delivery within a single aggregate. Every event reaches the broker at least once.
eventferry splits the work across three roles. Each has a small, well-defined surface and can be swapped independently.
┌──────────────────────────────────────────────────────────────────┐
│ │
│ APPLICATION │
│ ─────────── │
│ │ │
│ │ inside a business transaction: │
│ │ store.enqueue(tx, { topic, aggregateId, payload }) │
│ ▼ │
│ ┌─────────┐ │
│ │ STORE │ ◄── PostgresStore | MysqlStore │
│ └─────────┘ │
│ │ │
│ │ outbox row committed alongside the business change │
│ ▼ │
│ ┌────────────┐ │
│ │ DATABASE │ │
│ └────────────┘ │
│ │ │
│ │ claimBatch / markDone / markFailed / requeue │
│ ▲ │
│ ┌─────────┐ │
│ │ RELAY │ ◄── core/Relay | PostgresStreamingRelay | │
│ └─────────┘ MysqlBinlogRelay │
│ │ │
│ │ publish(messages) │
│ ▼ │
│ ┌────────────┐ │
│ │ PUBLISHER │ ◄── KafkaPublisher │
│ └────────────┘ │
│ │ │
│ │ send via kafkajs or @confluentinc/kafka-javascript │
│ ▼ │
│ ┌──────────┐ │
│ │ BROKER │ │
│ └──────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
Owns the outbox table. Knows the SQL dialect, the schema, the indexes, and the claim query that survives concurrent relays without double-claiming.
PostgresStore and MysqlStore are the built-in implementations. Both speak the same contract — anything that satisfies the OutboxStore interface drops in (kept abstract in @eventferry/core/types.ts).
The store does only persistence. It does not publish. It does not retry. It does not know what a "broker" is.
The orchestrator. Polls (or is notified) for claimable rows, hands them to the publisher, and writes back the result — markDone on success, markFailed (with attempts + next retry) on failure, requeue on backpressure.
Three flavors ship today:
-
Relay(core, polling) — works against any store. DefaultpollIntervalMs: 1000, defaultbatchSize: 100. Cheap, reliable, sub-second latency with default settings. -
PostgresStreamingRelay— same store contract, but driven by Postgres logical replication (pgoutput). Sub-millisecond wake on commit, no polling cost. -
PostgresNotifyWaker— middle ground:LISTEN/NOTIFYwakes the polling relay on insert, so you keep the polling claim path but skip the poll interval when nothing's pending. -
MysqlBinlogRelay— equivalent for MySQL, driven byROW-mode binlog.
The relay knows nothing about Kafka either — it sees a Publisher interface and treats every broker the same way.
Talks to the broker. KafkaPublisher is the built-in implementation; it wraps either kafkajs (pure JS) or @confluentinc/kafka-javascript (librdkafka) — your choice via driver: "kafkajs" | "confluent".
The publisher classifies errors (retriable / poison / fatal / fenced / backpressure / quota), routes dead-lettered messages to ${topic}.dlq with enriched headers, and exposes hooks for OpenTelemetry / logging / health checks. See Kafka Publisher for the full surface.
A single outbox row moves through this state machine:
enqueue (inside business tx)
│
▼
┌─────────┐
│ pending │
└─────────┘
│
│ relay.claimBatch
▼
┌────────────┐
│ processing │ ─── publisher acks ───► ┌──────┐
└────────────┘ │ done │
│ └──────┘
│ │
│ publish fails │ purgeDone (optional)
▼ ▼
┌────────┐ (deleted)
│ failed │ ─── attempts > maxAttempts ─► ┌──────┐
└────────┘ │ dead │
▲ └──────┘
│ │
│ due retry, claimable again │ DLQ also published
│ ▼
└───────────────────────────────► Kafka topic.dlq
Status codes are integers in the database (pending=0, processing=1, done=2, failed=3, dead=4) — small, indexable, cheap. The TypeScript layer maps to string literals for ergonomics.
These are the guarantees eventferry holds. Production tests in the integration suite verify each one against real Postgres + MySQL containers.
store.enqueue(tx, msg) writes the outbox row inside the caller's transaction. The store accepts the active transaction handle (pg's PoolClient, mysql2's connection) — it doesn't open its own. If your business COMMIT succeeds, the event row is durable; if you ROLLBACK, the event vanishes too.
The claim query uses SKIP LOCKED (both Postgres and MySQL 8+) so concurrent relays never block each other and never see the same row twice. Bounce-tested: 6 parallel claimers over 30 rows yield exactly 30 claims, zero duplicates.
The claim query only returns a row if no other row for the same aggregate is currently processing or due-earlier. Translation: if you produce three orders.created events for the same aggregateId, only the first becomes claimable until it's done. Concurrent claimers competing on a single aggregate see exactly one row.
You can't accidentally publish event #2 before event #1 for the same aggregate. The downstream consumer's invariants (snapshot rebuilds, materialized views) get to assume this and break loudly if it's ever violated — but it never is.
Across different aggregates, the relay parallelizes freely. You can fan out a relay to 10 instances and they'll all run concurrently without ordering hazards.
When the relay claims a batch, it stamps each row with claimed_at = now(). If the relay crashes before calling markDone / markFailed, the row sits at processing forever — except the reaper (built into the same claim query) treats any row whose claimed_at is older than claimTimeoutMs as up-for-grabs again.
Default claimTimeoutMs: 60_000. Tune it so it's larger than your slowest typical batch publish (otherwise a slow run gets reclaimed mid-flight by another instance — double publish + the first instance's markDone becomes a no-op). The Postgres and MySQL reaper both use server-side intervals (now() - interval '60 seconds') so application clock skew is irrelevant.
The default contract. If the relay calls publisher.publish(batch), the broker either acks (row → done) or rejects (row → failed → retry → eventually DLQ). The only way an event row never reaches the broker is if it's poison from the start (MESSAGE_TOO_LARGE) or rejected by authentication — both go straight to DLQ, where you can ticket / replay / investigate.
For exactly-once-to-broker semantics, opt into the transactional producer (transactional: true, transactionalId: "..."). See Transactions and EOS for the full story.
- Configure the store → Postgres Adapter or MySQL Adapter
- Configure the publisher → Kafka Publisher
- Make events typed end-to-end → Type-Safe Events
- Classify failures correctly → Reliability and Error Handling
- Run multiple relays in production → Operations Guide
Repository · Issues · npm: @eventferry/all · MIT
Get going
Adapters
Type & schema
Security
Operational
- Transactions and EOS
- Admin Operations
- Observability
- Consuming Events
- Dead-Letter Queue
- Reliability and Error Handling
Operations
Reference