Server initialization fails for RPC and alarm entry points after hibernation #329

@agcty

How we found this

We have a Y.js collaboration server that extends Server with RPC methods for document operations (read, patch, etc.). Chat agents in separate Durable Objects call these RPC methods to modify shared documents on behalf of users.

Initially we used idFromName() + .get() to obtain stubs for the Y.Doc DOs. This worked — until a DO hibernated. After hibernation, RPC calls would execute but changes wouldn't broadcast to connected WebSocket clients, because onStart() (which rehydrates connection tracking) never ran.

We switched to getServerByName() which fixed the broadcasting issue by triggering setName() via the x-partykit-room header fetch. But this introduced a new problem: importing from partyserver caused Vite's SSR dependency optimizer to pre-bundle the agents package (which depends on partyserver). In that pre-bundle context, cloudflare:workers is unavailable, so DurableObject is undefined and the Agent class fails with "Class extends value undefined is not a constructor or null".

We dug into the partyserver source to understand why idFromName doesn't work after hibernation and found the root cause: #_name is only restored via HTTP paths, never for RPC or alarm entry points. We're now working around this in our own subclass by persisting the room name to ctx.storage (details below), which lets us go back to plain idFromName + .get().

Description

After a Durable Object hibernates and wakes via an RPC call or alarm, Server.#_name is undefined. On the RPC path onStart() never runs; on the alarm path it runs without a name. This means:

  • RPC methods execute on an uninitialized server instance
  • alarm() calls #initialize() without restoring #_name, so any code in onStart() that reads this.name throws
  • Any state set up in onStart() (connection tracking, persistence callbacks, etc.) is missing

The root cause is that #_name is only restored through two paths, both of which depend on an HTTP request or an established WebSocket connection:

  1. fetch() — reads x-partykit-room from request headers
  2. webSocketMessage/Close/Error — reads connection.server from the WS attachment

RPC and alarm bypass both paths. The DO wakes, constructs a fresh instance, and the method executes immediately with no name and no initialization.
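
The difference between the entry points can be illustrated with a small toy model. This is only a sketch: ModelServer, callRpcAfterWake, and the field names are hypothetical stand-ins rather than partyserver internals, and the model assumes nothing beyond the behavior described above (name restored by fetch and WebSocket events, not by RPC).

```typescript
// Toy model only: the names below are illustrative, not partyserver internals.
class ModelServer {
  private roomName: string | undefined = undefined;
  private started = false;

  // Reading the name before it is set throws, as described in this issue.
  get name(): string {
    if (this.roomName === undefined) {
      throw new Error("Attempting to read .name before it was set");
    }
    return this.roomName;
  }

  private initialize(): void {
    if (this.started) return;
    this.started = true; // onStart() would run here
  }

  // Path 1: an HTTP fetch carries the room name in a header.
  fetch(headers: Record<string, string | undefined>): string {
    const room = headers["x-partykit-room"];
    if (room !== undefined) {
      this.roomName = room;
      this.initialize();
    }
    return this.name;
  }

  // Path 2: WebSocket events carry the name in the connection attachment.
  webSocketMessage(attachment: { server: string }): string {
    this.roomName = attachment.server;
    this.initialize();
    return this.name;
  }

  // RPC entry point: nothing restores the name before the method body runs.
  getData(): { room: string } {
    return { room: this.name };
  }
}

// Simulates a wake-up followed by an RPC call, optionally preceded by one of
// the two name-restoring paths.
function callRpcAfterWake(restoreVia: "none" | "fetch" | "websocket"): string {
  const server = new ModelServer(); // waking constructs a fresh instance
  if (restoreVia === "fetch") server.fetch({ "x-partykit-room": "room-123" });
  if (restoreVia === "websocket") server.webSocketMessage({ server: "room-123" });
  try {
    return server.getData().room;
  } catch (err) {
    return "threw: " + (err as Error).message;
  }
}
```

In this model, callRpcAfterWake("none") reports the thrown error, while the "fetch" and "websocket" variants return the room name because the name was restored before the RPC method ran.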

Reproduction

import { Server } from "partyserver"

class MyServer extends Server {
  static options = { hibernate: true }

  async onStart() {
    console.log("initialized:", this.name)
    // Set up connection tracking, persistence, etc.
  }

  // Public RPC method
  async getData() {
    // After hibernation, this.name throws:
    // "Attempting to read .name on MyServer before it was set"
    return { room: this.name, data: "..." }
  }
}

// Caller:
const id = env.MY_SERVER.idFromName("room-123")
const stub = env.MY_SERVER.get(id)
await stub.getData() // Fails after hibernation

alarm() variant

class MyServer extends Server {
  static options = { hibernate: true }

  async onStart() {
    // #_name is not set here when waking from an alarm, so reading this.name throws
    await this.ctx.storage.setAlarm(Date.now() + 60_000)
  }

  onAlarm() {
    console.log("alarm for room:", this.name) // throws
  }
}

The alarm() handler at index.ts:795-802 calls #initialize() without setting #_name first.

Current workaround

Use getServerByName() before making RPC calls. This sends a fetch with x-partykit-room to trigger setName(). But this:

  • Requires an extra HTTP round-trip before every RPC interaction
  • Forces callers to import from partyserver, which can cause bundler issues (e.g., Vite SSR pre-bundling pulls in the agents package which extends DurableObject from cloudflare:workers, failing in non-Worker contexts)
  • Defeats the purpose of RPC (direct method calls without HTTP overhead)

Suggested fix

Persist the room name to ctx.storage and restore it on cold start:

const ROOM_KEY = "__partyserver:room"

class Server extends DurableObject {
  constructor(ctx, env) {
    super(ctx, env)
    ctx.blockConcurrencyWhile(async () => {
      const name = await ctx.storage.get(ROOM_KEY)
      if (name) await this.setName(name)
    })
  }

  async setName(name) {
    // ... existing logic ...
    this.#_name = name
    // Persist for cold-start recovery
    if (this.#status !== "started") {
      await this.ctx.storage.put(ROOM_KEY, name)
      await this.#initialize()
    }
  }
}

This change:

  • Gates all entry points (RPC, alarm, fetch, WebSocket) via blockConcurrencyWhile until initialization completes
  • Is safe alongside the WebSocket handlers that also call setName(), because setName() is already idempotent for the same name
  • Costs one storage.get() per cold start (as written, it also repeats the storage.put() on restore, which could be skipped when the name was just read from storage)
  • Makes getServerByName()'s fetch hack unnecessary for callers that only need RPC
  • Fixes the alarm() bug as well

This aligns with the existing TODO comments in the codebase:

  • Line 60: // TODO: fix this to use RPC
  • Line 415: // TODO: this is a hack to set the server name, it'll be replaced with RPC later
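
The persist-and-restore idea can be exercised without a Workers runtime using a small simulation. This is a sketch under stated assumptions: FakeStorage is a hypothetical in-memory stand-in for ctx.storage, the ready promise stands in for blockConcurrencyWhile, and PersistentNameServer is an illustrative class, not the proposed Server implementation.

```typescript
const ROOM_KEY = "__partyserver:room";

// Hypothetical in-memory stand-in for ctx.storage. It outlives server
// instances, the way durable storage outlives hibernation.
class FakeStorage {
  private data: Record<string, string> = {};
  reads = 0;

  async get(key: string): Promise<string | undefined> {
    this.reads++;
    return this.data[key];
  }

  async put(key: string, value: string): Promise<void> {
    this.data[key] = value;
  }
}

class PersistentNameServer {
  private storage: FakeStorage;
  private roomName: string | undefined = undefined;
  private started = false;
  startCount = 0;
  // Stand-in for blockConcurrencyWhile: every entry point awaits this first.
  readonly ready: Promise<void>;

  constructor(storage: FakeStorage) {
    this.storage = storage;
    this.ready = this.restore();
  }

  private async restore(): Promise<void> {
    const stored = await this.storage.get(ROOM_KEY);
    if (stored !== undefined) await this.setName(stored);
  }

  async setName(name: string): Promise<void> {
    if (this.roomName !== undefined && this.roomName !== name) {
      throw new Error("name already set to a different value");
    }
    this.roomName = name;
    if (!this.started) {
      this.started = true;
      await this.storage.put(ROOM_KEY, name); // persist for cold-start recovery
      this.startCount++; // onStart() would run here
    }
  }

  // RPC-style entry point, gated on restoration.
  async getData(): Promise<{ room: string }> {
    await this.ready;
    if (this.roomName === undefined) throw new Error("name not set");
    return { room: this.roomName };
  }
}

async function demo(): Promise<{ room: string; startCount: number; reads: number }> {
  const storage = new FakeStorage();
  const first = new PersistentNameServer(storage);
  await first.ready;
  await first.setName("room-123"); // e.g. the initial fetch path sets the name

  // Hibernation: the instance is discarded, storage is kept.
  const second = new PersistentNameServer(storage);
  const { room } = await second.getData();
  return { room, startCount: second.startCount, reads: storage.reads };
}
```

In this simulation the second instance serves the RPC-style call with the restored name after a single storage read of its own, and its initialization step runs exactly once.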

Our local workaround

We implemented the storage-based fix in our own Server subclass (YServer), which lets us use plain idFromName + .get() again:

const ROOM_KEY = "__partyserver:room"

class YServer extends Server {
  constructor(ctx, env) {
    super(ctx, env)
    ctx.blockConcurrencyWhile(async () => {
      const name = await ctx.storage.get(ROOM_KEY)
      if (name) await this.setName(name)
    })
  }

  async onStart() {
    await this.ctx.storage.put(ROOM_KEY, this.name)
    // ... rest of initialization (load doc, rehydrate connections, etc.)
  }
}

This works well but ideally belongs in Server itself so all subclasses benefit — and so alarm() is also fixed without every consumer needing to implement the same pattern.

Relation to workerd#2240

The partyserver source references cloudflare/workerd#2240 as the reason the x-partykit-room header hack exists — Durable Objects don't expose their name via the runtime API. If ctx.id.name were available, Server could read it directly and none of this would be needed.

Until that platform fix lands, persisting the name to ctx.storage is the pragmatic workaround. It's the standard DO pattern (in-memory state is a cache, storage is the source of truth) and costs one storage.get() per cold start.
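
The two sources of the name could also be combined so that the storage lookup becomes a fallback once the runtime exposes the name. The following is a minimal sketch, assuming a future runtime populates ctx.id.name inside the Durable Object; RoomNameSource, resolveRoomName, and FALLBACK_ROOM_KEY are hypothetical names introduced only for illustration.

```typescript
const FALLBACK_ROOM_KEY = "__partyserver:room";

// Structural stand-in for the parts of the Durable Object state used here.
interface RoomNameSource {
  id: { name?: string };
  storage: { get(key: string): Promise<string | undefined> };
}

// Prefer the runtime-provided name when it exists; otherwise fall back to
// the name persisted in storage. Resolves to undefined if neither is set.
async function resolveRoomName(ctx: RoomNameSource): Promise<string | undefined> {
  if (ctx.id.name !== undefined) return ctx.id.name;
  return ctx.storage.get(FALLBACK_ROOM_KEY);
}
```

With a helper of this shape, the storage read would only happen on runtimes that do not provide the name, so the workaround would become a no-cost path once the platform fix is available.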

Environment

  • partyserver: 0.1.5
  • Cloudflare Workers with hibernatable WebSockets
