Skip to content

Core Concepts

SametGoktepe edited this page Jun 17, 2026 · 1 revision

The three things eventferry is doing under the hood, in plain language: the outbox pattern, the three components that implement it, and the invariants they uphold. Read this once before any other page — every later section assumes you know which actor owns which step.


The outbox pattern in one paragraph

You have a database transaction that changes business state (INSERT INTO orders). You also need to publish an event about that change (orders.created on Kafka). The naive approach — commit the transaction, then publish — has a real failure mode: the process can die between the two, and the broker never hears about an order that absolutely happened. The reverse — publish first, then commit — has the opposite failure: the broker hears about an order that, on retry, the database rejects.

The outbox pattern fixes this by writing the event into the same database transaction as the business change, into a dedicated table. Either both rows commit, or neither does. A separate process (the "relay") reads the outbox table and publishes its rows to the broker, marking them as delivered when the broker acks. If the relay dies mid-batch, the rows it didn't ack stay claimable; another instance picks them up where the previous left off.

No dual-write. No lost messages. No out-of-order delivery within a single aggregate. Every event reaches the broker at least once.


The three components

eventferry splits the work across three roles. Each has a small, well-defined surface and can be swapped independently.

┌──────────────────────────────────────────────────────────────────┐
│                                                                  │
│   APPLICATION                                                    │
│   ───────────                                                    │
│       │                                                          │
│       │  inside a business transaction:                          │
│       │  store.enqueue(tx, { topic, aggregateId, payload })      │
│       ▼                                                          │
│   ┌─────────┐                                                    │
│   │  STORE  │  ◄── PostgresStore | MysqlStore                    │
│   └─────────┘                                                    │
│       │                                                          │
│       │  outbox row committed alongside the business change      │
│       ▼                                                          │
│   ┌────────────┐                                                 │
│   │  DATABASE  │                                                 │
│   └────────────┘                                                 │
│       │                                                          │
│       │  claimBatch / markDone / markFailed / requeue            │
│       ▲                                                          │
│   ┌─────────┐                                                    │
│   │  RELAY  │  ◄── core/Relay | PostgresStreamingRelay |         │
│   └─────────┘      MysqlBinlogRelay                              │
│       │                                                          │
│       │  publish(messages)                                       │
│       ▼                                                          │
│   ┌────────────┐                                                 │
│   │ PUBLISHER  │  ◄── KafkaPublisher                             │
│   └────────────┘                                                 │
│       │                                                          │
│       │  send via kafkajs or @confluentinc/kafka-javascript      │
│       ▼                                                          │
│   ┌──────────┐                                                   │
│   │  BROKER  │                                                   │
│   └──────────┘                                                   │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

Store

Owns the outbox table. Knows the SQL dialect, the schema, the indexes, and the claim query that survives concurrent relays without double-claiming.

PostgresStore and MysqlStore are the built-in implementations. Both speak the same contract — anything that satisfies the OutboxStore interface drops in (kept abstract in @eventferry/core/types.ts).

The store does only persistence. It does not publish. It does not retry. It does not know what a "broker" is.

Relay

The orchestrator. Polls (or is notified) for claimable rows, hands them to the publisher, and writes back the result — markDone on success, markFailed (with attempts + next retry) on failure, requeue on backpressure.

Three flavors ship today:

  • Relay (core, polling) — works against any store. Default pollIntervalMs: 1000, default batchSize: 100. Cheap, reliable, sub-second latency with default settings.
  • PostgresStreamingRelay — same store contract, but driven by Postgres logical replication (pgoutput). Sub-millisecond wake on commit, no polling cost.
  • PostgresNotifyWaker — middle ground: LISTEN/NOTIFY wakes the polling relay on insert, so you keep the polling claim path but skip the poll interval when nothing's pending.
  • MysqlBinlogRelay — equivalent for MySQL, driven by ROW-mode binlog.

The relay knows nothing about Kafka either — it sees a Publisher interface and treats every broker the same way.

Publisher

Talks to the broker. KafkaPublisher is the built-in implementation; it wraps either kafkajs (pure JS) or @confluentinc/kafka-javascript (librdkafka) — your choice via driver: "kafkajs" | "confluent".

The publisher classifies errors (retriable / poison / fatal / fenced / backpressure / quota), routes dead-lettered messages to ${topic}.dlq with enriched headers, and exposes hooks for OpenTelemetry / logging / health checks. See Kafka Publisher for the full surface.


Lifecycle of an event row

A single outbox row moves through this state machine:

       enqueue (inside business tx)
            │
            ▼
       ┌─────────┐
       │ pending │
       └─────────┘
            │
            │  relay.claimBatch
            ▼
       ┌────────────┐
       │ processing │ ─── publisher acks ───► ┌──────┐
       └────────────┘                          │ done │
            │                                  └──────┘
            │                                      │
            │  publish fails                       │  purgeDone (optional)
            ▼                                      ▼
       ┌────────┐                             (deleted)
       │ failed │ ─── attempts > maxAttempts ─► ┌──────┐
       └────────┘                                │ dead │
            ▲                                    └──────┘
            │                                        │
            │  due retry, claimable again            │  DLQ also published
            │                                        ▼
            └───────────────────────────────►  Kafka topic.dlq

Status codes are integers in the database (pending=0, processing=1, done=2, failed=3, dead=4) — small, indexable, cheap. The TypeScript layer maps to string literals for ergonomics.


The invariants

These are the guarantees eventferry holds. Production tests in the integration suite verify each one against real Postgres + MySQL containers.

Atomic enqueue

store.enqueue(tx, msg) writes the outbox row inside the caller's transaction. The store accepts the active transaction handle (pg's PoolClient, mysql2's connection) — it doesn't open its own. If your business COMMIT succeeds, the event row is durable; if you ROLLBACK, the event vanishes too.

Exactly one relay claims each row

The claim query uses SKIP LOCKED (both Postgres and MySQL 8+) so concurrent relays never block each other and never see the same row twice. Bounce-tested: 6 parallel claimers over 30 rows yield exactly 30 claims, zero duplicates.

Strict head-of-aggregate ordering

The claim query only returns a row if no other row for the same aggregate is currently processing or due-earlier. Translation: if you produce three orders.created events for the same aggregateId, only the first becomes claimable until it's done. Concurrent claimers competing on a single aggregate see exactly one row.

You can't accidentally publish event #2 before event #1 for the same aggregate. The downstream consumer's invariants (snapshot rebuilds, materialized views) get to assume this and break loudly if it's ever violated — but it never is.

Across different aggregates, the relay parallelizes freely. You can fan out a relay to 10 instances and they'll all run concurrently without ordering hazards.

Crash-safe claim timeout

When the relay claims a batch, it stamps each row with claimed_at = now(). If the relay crashes before calling markDone / markFailed, the row sits at processing forever — except the reaper (built into the same claim query) treats any row whose claimed_at is older than claimTimeoutMs as up-for-grabs again.

Default claimTimeoutMs: 60_000. Tune it so it's larger than your slowest typical batch publish (otherwise a slow run gets reclaimed mid-flight by another instance — double publish + the first instance's markDone becomes a no-op). The Postgres and MySQL reaper both use server-side intervals (now() - interval '60 seconds') so application clock skew is irrelevant.

At-least-once delivery

The default contract. If the relay calls publisher.publish(batch), the broker either acks (row → done) or rejects (row → failed → retry → eventually DLQ). The only way an event row never reaches the broker is if it's poison from the start (MESSAGE_TOO_LARGE) or rejected by authentication — both go straight to DLQ, where you can ticket / replay / investigate.

For exactly-once-to-broker semantics, opt into the transactional producer (transactional: true, transactionalId: "..."). See Transactions and EOS for the full story.


Reading the rest of the wiki

Clone this wiki locally