Experimental feature: Dynamic worker loading #4383

kentonv · 2025-06-20T02:19:01Z

This is an experimental-only feature and will probably remain so until we play around with it a bit. Similar to facets (#4123), the interface is likely to change based on testing. This is also strictly an implementation in workerd -- the production runtime would need a very different implementation.

A Worker loader binding allows you to start an isolate from code provided at runtime, and then talk to that isolate.

let id = "foo";
let worker = env.LOADER.get(id, () => {
  // This callback is called if the given worker ID is not loaded yet.
  // (If a Worker with the same ID is already loaded by this loader, the
  // existing worker is returned without calling the callback.)

  // The callback returns the Worker's code. It can be async.
  return {
    compatibilityDate: "2025-06-01",

    // Specify the main module, whose exports are the entrypoints.
    mainModule: "foo.js",

    // Specify a map of all modules. This is an object mapping module
    // names to source code.
    modules: {
      // Normal ES modules are specified just as strings.
      "foo.js":
        "export default {\n" +
        "  fetch(req, env, ctx) { return new Response('Hello'); }\n" +
        "}\n",
      // Non-ES module types are supported too, by specifying an object
      // with one property that indicates the type.
      "some.wasm": { wasm: WASM_DATA },
    },

    // The `env` is specified as an arbitrary object. This gets serialized and
    // passed into the Worker.
    env: {
      SOME_ENV_VAR: 123
    },

    // You can override the global outbound. When the worker calls global
    // `fetch()` (or `connect()`), it will be directed to this service binding.
    globalOutbound: env.MY_OUTBOUND_WORKER,
  };
});

// Now you can get its entrypoint.
let defaultEntrypoint = worker.getEntrypoint();
await defaultEntrypoint.fetch(...);

// You can get non-default entrypoints as well.
let someEntrypoint = worker.getEntrypoint("SomeEntrypointClass");

// You can also get DO classes. These can be used with facets (#4123).
let someClass = worker.getDurableObjectClass("SomeDoClass");

The test has more examples.

TODO (this PR):

Add a bunch more tests.

TODO (later PRs, probably):

Eviction if idle workers.
Extend env so you can actually provide resource bindings. Currently it can only contain simple serializable values.
Support hooking up tails and cache API.

kentonv · 2025-06-20T02:22:25Z

FYI @irvinebroque @threepointone @rohinlohe

jasnell · 2025-06-20T17:36:26Z

Definitely happy to see this progressing but I definitely think it's rather unfortunate for us to have introduce a bespoke syntax/API for this. It would be nice if there were alignment here with either/both Node.js worker threads and web workers API.

kentonv · 2025-06-20T17:51:16Z

It would be nice if there were alignment here with either/both Node.js worker threads and web workers API.

The API design here is intended to directly expose how our platform works. We could consider keeping it private, while building compatibility wrappers for specific standard APIs.

I suspect, though, that there will be a lot of friction trying to apply various standard APIs that were designed for different use cases. Both Node worker threads and Web Workers are primarily intended to allow parallelization, offloading work to a background thread. The dynamic worker loader API in this PR explicitly does not do that -- when you call into the worker, it runs in the _same_thread. Instead, our feature is more about sandboxing and dynamic code loading.

Moreover, it's an explicit design goal that the dynamic worker loader should be able to load any real Worker code that you could otherwise deploy with wrangler, etc. That pretty much necessitates that our API include concepts like bindings and compat dates, which obviously don't exist in any standard worker-loading API.

irvinebroque · 2025-06-23T23:03:33Z

it runs in the _same_thread

Wonder if there is a mode where it shouldn't? What if I'm building a platform?

src/workerd/server/workerd.capnp

src/workerd/server/server.c++

src/workerd/api/worker-loader.h

src/workerd/api/worker-loader.c++

src/workerd/api/worker-loader-test.js

jasnell

Overall, LGTM. Tests can be expanded. Otherwise rest is nits.

src/workerd/api/worker-loader.h

src/workerd/api/worker-loader.c++

src/workerd/api/actor-state.c++

kentonv · 2025-06-29T18:55:43Z

@irvinebroque

it runs in the _same_thread

Wonder if there is a mode where it shouldn't? What if I'm building a platform?

I think there's a lot of room for us to add features to support parallelism, but they are probably orthogonal to this feature...

kentonv · 2025-06-29T20:28:47Z

More tests added:

kentonv · 2025-06-30T15:19:45Z

Fixed cyclic reference bug introduced by anonymous loaders: b4a7f48

Then rebased on main.

src/workerd/api/worker-loader.c++

jasnell · 2025-06-30T15:35:56Z

Grr... thanks Github

This must be vestigial, but at present it isn't used anywhere. If we needed to keep it around, we'd probably want to clear it in `unlink()`, like we do with `WorkerService::waitUntilTasks`.

Previously, the FacetManager interface took an actor class channel number as input when creating a facet. This was a little odd as it was the only place where channel numbers could be exchanged for channels outside of IoChannelFactory itself. Now, IoChannelFactory has a `getActorClass(uint channel)` method that returns an object representing the actor class. This object has no methods, it can only be passed into `FacetManager`. This makes the code in `server.c++` look a little better. More importantly, though, it opens the possibility of facets being implemented by actor classes that didn't strictly come from channel numbers. Dynamic isolate loading, in particular, will create ActorClassChannels dynamically! Also, Dynamic Dispatch may be extended to support actor classes.

Somewhat similar to the previous commit, we will soon need a uniform way to refer to outgoing "fetchers", whether they be service bindings (defined by a channel number), dynamic dispatch targets, dynamically-loaded worker entrypoints, or even DO stubs. This will especially be needed so that we can hook in a way to embed these in the `env` object passed to a dynamic isolate.

I realize it would be even better to go all the way to `kj::Rc` here, but that would create a lot more noise, also requiring a coordinated internal change, so I'm punting that for the moment.

This can create reference cycles, so we verify that cycles have been broken during the unlink() phase at shutdown.

Many objects in `server.c++` would hold a regular reference to a parent `Service`. We should make these strong references now. With this change, it should now be the case that if someone holds onto a `SubrequestChannel`, it prevents the underlying `Service` from being destroyed. This will be important when we implement dynamic isolate loading, which will presumably involve evicting isolates before server shutdown.

Turns out we don't actually need to know that these are `Service`s anymore. Well, except in a weird special case involving tails, but that case was _already_ doing dynamic downcasting to query the type, so we can just do more of that.

(Not hooked up yet, though.)

This implements a Worker loader I/O channel for workerd. But as of this commit there's still no binding that actually populates these channels.

The binding is declared like: `(workerLoader = (id = "foo"))` Similar to memory cache bindings, if the same ID is used in different bindings within the same workerd process, they will bind to the same loader, meaning they share the same LRU cache of workers. (Well, at present there's no LRU, the isolates just stay alive forever, but...)

Dynamic isolates will typically hook up to an already-extant `SubrequestChannel`.

This is an alternative to a simple channel number, needed with dynamically-loaded classes.

On review, this name is not actually "displayed" anywhere. In fact, it's not used for much of anything, except maybe some internal error messages.

In this case, it doesn't share a cache with any other loader.

Loading the same isolate name again should not call the callback.

Prompt: ``` deps/workerd/src/workerd/api/worker-loader-test.js contains tests for the new "dynamic worker loader" feature, but needs more test cases. Please add the following tests: - Test non-string modules types. The source code of a module, instead of being a string, can be an object, like `{text: "blah"}`, to represent a module type other than an ES module. The types are `js` (ES module, but more explicit), `cjs` (CommonJS module), `text` (just a string), `data` (just an ArrayBuffer), and `json` (arbitrary JS object as long as it is JSON-serializable). - Test setting compat date / flags works. - Test code loader callback that does something asynchronous (can just wait for a 1ms timeout) - Test what happens if the code getter callback throws an exception (it should propagate to the caller when it tries to call into the worker). Please add the tests to the end of the file in the order listed above. The worker loader API, if you need to refer to it, is defined in deps/workerd/src/workerd/api/worker-loader.h (in JSG format). ``` I significantly rewrote the compatDateFlags test (Claude's version didn't really test dates), and made a couple other minor tweaks, but otherwise this is mostly Claude's work at a cost of $0.29. This actually caught a bug, fixed up in a928d92.

kentonv · 2025-06-30T15:47:57Z

Grr... thanks Github

That's weird, it lets me load it in the browser (after a click).

jasnell · 2025-06-30T15:50:45Z

I switched to github's new preview UX that they are testing and apparently they reduced the threshold on size of the diff. Switching back to the original UX and it works.

kentonv requested a review from jasnell June 20, 2025 02:19

kentonv requested review from a team as code owners June 20, 2025 02:19