@effect/cluster: Shard acquisition loop never completes with BunClusterSocket SQL storage #6155

@Necmttn

Bug Report

Versions

  • effect: 3.19.15
  • @effect/cluster: 0.56.1
  • @effect/platform-bun: 0.87.1
  • @effect/sql-pg: 0.50.1
  • Bun: 1.3.10
  • PostgreSQL: 17

Description

When using BunClusterSocket.layer({ storage: "sql" }), the runner registers in PostgreSQL and receives shard assignments, but never transitions to entity/cron registration. The shard acquisition loop runs indefinitely.

Expected behavior

The runner starts, acquires its shards, registers the ClusterCron entities, and logs "tick" every 10 seconds.

Actual behavior

The runner logs Shard acquisition loop / New shard assignments / RunnerStorage sync indefinitely. Entities are never registered, so the runner stays alive but is non-functional.

Reproduction

git clone https://github.com/noktadev/effect-cluster-shard-stall-repro
cd effect-cluster-shard-stall-repro
bun install
bun run db:up
bun run start
# Wait 30+ seconds - no "tick" output appears

Repo: https://github.com/noktadev/effect-cluster-shard-stall-repro

Minimal code

import { ClusterCron, ClusterWorkflowEngine } from "@effect/cluster";
import { BunClusterSocket, BunRuntime } from "@effect/platform-bun";
import { PgClient } from "@effect/sql-pg";
import { Cron, Effect, Either, Layer, Redacted } from "effect";

const TickCron = ClusterCron.make({
    name: "TickCron",
    cron: Cron.parse("*/10 * * * * *").pipe(Either.getOrThrow),
    execute: Effect.log("tick"),
});

const PgLive = PgClient.layer({
    url: Redacted.make("postgres://postgres:postgres@localhost:25432/cluster_test"),
});

const RunnerLive = BunClusterSocket.layer({ storage: "sql" });

const EnvLayer = Layer.mergeAll(TickCron, ClusterWorkflowEngine.layer).pipe(
    Layer.provideMerge(RunnerLive),
    Layer.provideMerge(PgLive),
);

Layer.launch(EnvLayer).pipe(BunRuntime.runMain);
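For comparison, the observations below note that the same program works when the socket-based runner is replaced with in-memory runner storage. A sketch of that substitution, keeping the rest of the layer wiring identical (the SingleRunner import path is an assumption; the issue quotes only the call itself):

```typescript
// Working variant per the observations: swap only the runner layer.
// ASSUMPTION: SingleRunner is imported from "@effect/cluster"; the issue
// quotes only the call `SingleRunner.layer({ runnerStorage: "memory" })`.
import { SingleRunner } from "@effect/cluster";

// Replaces `BunClusterSocket.layer({ storage: "sql" })` in RunnerLive above;
// with this layer the "tick" log appears every 10 seconds as expected.
const RunnerLive = SingleRunner.layer({ runnerStorage: "memory" });
```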

Observations

  • PostgreSQL cluster_runners table shows the runner registered with healthy: true
  • Shard assignments ARE received (visible in DEBUG logs as New shard assignments with shard IDs)
  • The runner never progresses past shard acquisition to entity registration
  • SingleRunner.layer({ runnerStorage: "memory" }) works perfectly
  • Truncating cluster_runners/messages/replies/locks and restarting sometimes (but not always) resolves the stall
  • Ghost runners accumulate across restarts (no deregistration on pod/process shutdown)
  • Discovered in production Kubernetes with Bun-based runners
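The truncate-and-restart workaround mentioned above can be run as a single statement. The exact table names are an assumption inferred from the cluster_ prefix in the observations; verify them against the deployed schema before running:

```sql
-- ASSUMPTION: table names inferred from "cluster_runners/messages/replies/
-- locks" in the observations; check the actual schema first.
TRUNCATE TABLE cluster_runners, cluster_messages, cluster_replies, cluster_locks;
```

Note that per the observations this only sometimes clears the stall, and ghost runners will re-accumulate on subsequent restarts.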
