Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement experimental module fallback service for testing #1423

Merged
merged 1 commit into from
Dec 8, 2023

Conversation

jasnell
Copy link
Member

@jasnell jasnell commented Nov 21, 2023

Adds the ability to load modules lazily from a local http fallback service. The fallback service is a simple http server that returns the Worker::Module configuration for a module dynamically rather than embedded within the worker configuration. This is only usable for local dev. See samples/module_fallback for an example.

Not all modules will use the fallback service. In the worker configuration, a module has to be explicitly configured to use the fallback. Also, because of the nature of the ESM modules and the way v8 handles them, there really is no hot-module-reloading. Once a module is loaded for a worker it cannot be replaced without restarting the worker. Currently that's only possible by either shutting workerd down or using the --watch feature which watches for changes in the source configuration file on disk. We'll need to try work around that separately.

The module loading using the fallback service is lazy-ish. The fallback service won't be engaged until the module is actually imported but since most imports are static it will appear as if the fallback service is being accessed immediately on worker startup for most cases. You'll see the laziness of the loader in action if you're using dynamic imports or require (e.g. import(...) or require(...)).

The fallback service is a simple http server that must return a properly JSON serialized form of the Worker::Module configuration as specified by the workerd.capnp schema. The request URL will be the specifier of the module in a file:// URL form (e.g. a module specifier of foo will be passed to the fallback service as file:///foo. No other headers or details are passed in the GET request.

For now, every request to the fallback service spins up its own kj::Thread that is joined synchronously with the current thread. We could update that to use a single persistent thread with a queue but the current approach is simpler and raw performance really isn't a concern at this point. We can always tweak that more later.

@mrbbot
Copy link
Contributor

mrbbot commented Nov 21, 2023

Thanks for putting up this PR! 🙂 It looks like we still need to specify all modules ahead-of-time, even if their contents are then fetched from the fallback service. For Vitest, we won't know which modules we need to load upfront, so would need a mechanism that requests from the module fallback service for all modules that couldn't be found.

Ideally we'd also have support for the following:

  • Distinguishing requires from import .../await import() so we can use an appropriate resolution algorithm and load the right version of a package's code
  • Splitting the referencing URL and specifier in the request to the fallback service, so we can easily tell if we're trying to load a Node.js built-in module (e.g. node:vm)

@jasnell jasnell force-pushed the jsnell/workerd-module-fallback-service branch from 45e589c to acf7491 Compare November 21, 2023 17:02
@jasnell
Copy link
Member Author

jasnell commented Nov 21, 2023

Ok, updated to support fallback for unknown modules. Also added a header to the fallback request that indicates import or require. The example also shows that unknown built-ins like node:vm can be handled by the fallback service as well. The referencing specifier is not included here, instead the specifier that you get is the import specifier resolved against the reference.

Copy link
Contributor

@mrbbot mrbbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this out with a bunch of differently typed modules including returning modules from the fallback service that make further requests to the fallback service. 🙂 All seems to work nicely, with Node 19.4.0. With Node 20.9.0, it looks like a Host header is required on requests, or else Node will return a 400 response.

src/workerd/server/workerd.capnp Outdated Show resolved Hide resolved
@jasnell jasnell force-pushed the jsnell/workerd-module-fallback-service branch from acf7491 to 4714dbb Compare November 21, 2023 18:57
@mrbbot
Copy link
Contributor

mrbbot commented Nov 27, 2023

Hey again! 👋 I've now got a MVP of the Vitest integration working using the fallback service and unsafe eval bindings. 🎉
Whilst developing this I hit a few more issues. I'm guessing most of these are fundamental restrictions of workerd's current module system, and could be fixed by your planned refactor.

  1. Returning a module from the fallback service with a different name than the requested path doesn't really work. The originally requested path is used when resolving modules from the returned module. This makes it tricky to implement Node-like module resolution. As an example, if module a/index.mjs contained import thing from "my-library", and the fallback service responded with...

    {
      "name": "node_modules/my-library/index.mjs",
      "esModule": "import dep from './dep.mjs'; export default dep;"
    }

    ...this would fail, as ./dep.mjs would be resolved relative to a/my-library, not node_modules/my-library/index.mjs. I'm working around this by returning a a/my-library "shim module" in this case, that re-exports ../node_modules/my-library/index.mjs causing a request to the fallback service with the correct path.

  2. Requests to the module fallback service don't differentiate between the referencing path and specifier, they only give you the URL of the module to load. Again, this makes it tricky to implement Node-like resolution, especially with things like import "vite-node/client" and import "@vitest/utils/error". I'm working around this by trying the basename of the URL's path as the specifier, then trying to add another segment of the path if that fails to resolve (i.e. first try client, then vite-node/client). This doesn't correctly disambiguate between modules like @vitest/expect & expect or vite-node/source-map & source-map, all of which are valid modules, and are currently hardcoded.

  3. Trying to import a node:* builtin outside the root will make a request to the module fallback service first, before trying to fetch from the root. For example, importing node:assert in a/b/c/index.mjs will try fetch /a/b/c/node:assert from the fallback service, then if that fails try to load /node:assert which succeeds. I'm working around this by just returning a 404 response from the fallback service if I detect this case, but ideally no request would be made. Notably, it looks like workerd only has this "try import from root on failure" behaviour for imports, not requires. For requires, I'm currently using the "shim module" workaround from earlier.

  4. Circular require() is not supported. Modules like chai try to require the root of their module in submodules: https://github.com/chaijs/chai/blob/744a16e1cc4e8a9c6d4499e1e520a0bc4c80ec18/lib/chai/utils/addProperty.js#L7. This isn't supported in workerd, but is supported in Node. I'm working around this by bundling chai with esbuild before returning it from the module fallback service.

  5. Fetching too many modules from the module fallback service causes the module registry's entries table to reallocate invalidating some pointers used when initialising synthetic modules, causing a use-after-free segfault. See https://gist.github.com/mrbbot/3247fb8fa0979caf546f6b5691f71f27 for a reproduction and ASAN stack trace. This is probably the biggest issue right now, as I can't workaround this without patching workerd, and should probably be fixed before this PR is merged. Right now, I've "fixed" the issue by preallocating a large entries table, but obviously this isn't the right fix.

    --- a/src/workerd/jsg/modules.h
    +++ b/src/workerd/jsg/modules.h
    @@ -378,7 +378,9 @@ class ModuleRegistryImpl final: public ModuleRegistry {
     public:
       KJ_DISALLOW_COPY_AND_MOVE(ModuleRegistryImpl);
    
    -  ModuleRegistryImpl(CompilationObserver& observer) : observer(observer) { }
    +  ModuleRegistryImpl(CompilationObserver& observer) : observer(observer) {
    +    entries.reserve(10000);
    +  }
    
       static kj::Own<ModuleRegistryImpl<TypeWrapper>> install(
           v8::Isolate* isolate,

@jasnell
Copy link
Member Author

jasnell commented Nov 27, 2023

@mrbbot ...

Returning a module from the fallback service with a different name than the requested path doesn't really work...

This becomes a bit tricky to implement but not impossible. It would largely mean that an individual ModuleInfo could be indexed multiple times in the entries table with different names. Not a huge refactor but I think the changes needed for this would also fix the reallocation issue so definitely worth doing.

Requests to the module fallback service don't differentiate between the referencing path and specifier, they only give you the URL of the module to load.

Just to make sure I understand, you'd like the request to the fallback service to include both the specifier of the module being imported/required and the specifier of the referent? Should be possible with a few additional minor tweaks.

For example, importing node:assert in a/b/c/index.mjs will try fetch /a/b/c/node:assert from the fallback service, then if that fails try to load /node:assert which succeeds

This one should be fixed with the refactor that treats specifiers as proper URLs but I'll verify that. This is a bit of a larger change so I might not do anything for this one immediately and just handle it as part of the larger refactor.

Circular require() is not supported. Modules like chai try to require the root of their module in submodules

Yeah, this is one that I've been wanting to revisit for a while. For now since you have a workaround let's leave it as is. After the larger refactor is done I'll revisit circular requires/imports in general.

@jasnell jasnell force-pushed the jsnell/workerd-module-fallback-service branch from 4714dbb to 6e13a3f Compare November 27, 2023 22:47
@mrbbot
Copy link
Contributor

mrbbot commented Nov 27, 2023

...the request to the fallback service to include both the specifier of the module being imported/required and the specifier of the referent?

@jasnell Yep, something like that. Here are some examples of the ideal behaviour I'm looking for:

  • import thing from "./thing.mjs" inside a/b/c/index.mjs should send file:///a/b/c/index.mjs as the referencing path, ./thing.mjs as the specifier, and import as the resolve method.
  • require("my-module") inside a/index.cjs should send file:///a/index.cjs as the referencing path, my-module as the specifier, and require as the resolve method
  • import "@vitest/utils/error" inside a/b/c/index.mjs should send file:///a/b/c/index.mjs as the referencing path, @vitest/utils/error as the specifier, and import as the resolve method
  • import {} from "#async_hooks" inside a/index.mjs should send file:///a/index.mjs as the referencing path, #async_hooks as the specifier, and import as the resolve method (seen in packages using import maps)

Essentially, rather than receiving something like file:///a/b/c/@vitest/utils/error where it's not clear which module needs to be loaded, I'd like the @vitest/utils/error to be separate.

If it's helpful, this is what my module fallback service implementation looks like at the moment too: https://github.com/cloudflare/workers-sdk/blob/bcoll/vitest-pool-workers/packages/vitest-pool-workers/src/pool/module-fallback.ts

@jasnell
Copy link
Member Author

jasnell commented Nov 27, 2023

ok, I'll likely be able to work up changes for this by end of day tomorrow.

@jasnell jasnell force-pushed the jsnell/workerd-module-fallback-service branch from 6e13a3f to fa9e8ca Compare November 30, 2023 21:21
@jasnell
Copy link
Member Author

jasnell commented Dec 1, 2023

@mrbbot ... ok, I've pushed up some update fixup commits that hopefully address most if not all of your needs. Be aware that it changes quite a bit.

  • The request url is now in the form /?specifier={specifier}&referrer={referrer}. It should be clear which is which. These will end up being percent-encoded. Look at the example fallback.js for an example of how to extract the values. Each of these will still end up being absolute specifiers but having both should hopefully give you the information you need. It's a bit of a larger change to give the raw, unresolved specifier and I'd rather avoid it for now.
  • An effort to detect the node:..., cloudflare:..., and workerd:... prefixes is now made so that those should always come through as just the bare specifiers without any additional prefixing.
  • The fallback service can return a 301 to alias one import specifier to another. For instance, if you know that the specifier /foo actually maps to /bar, then you can return a 301 whose localtion header is /bar. This mapping is cached so that future lookups on /foo will automatically map to resolving /bar instead of asking the fallback service again.
  • The fallback service does not have to return the "name" property in the module info detail. If it does, it MUST match the specifier given in the request. This is an additional sanity check that is entirely optional. Use the 301 redirect to map multiple specifiers to the same module.

I don't think it hits all of your cases in https://github.com/cloudflare/workerd/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-desc#issuecomment-1828786854 but it hopefully gets close and we can continue to iterate if necessary.

In the future, once my planned module registry refactor is done, some of the details on this may change. Your code will likely need to be able to deal with both the old and the new. My plan there is to introduce a x-version header in the request to the fallback service that will identify which version of the module registry is sending the request so you can differentiate. That's not there yet but it'll come with the refactor later on.

Copy link
Contributor

@mrbbot mrbbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasnell Thanks for making those changes. I was able to massively simplify my fallback service implementation. 😃 I've added some comments for some minor tweaks I needed to make to get things working.

Your code will likely need to be able to deal with both the old and the new.

I'm not too worried about this. miniflare pins its workerd version already, so we can just update the code when workerd changes. 👍

src/workerd/server/server.c++ Show resolved Hide resolved
src/workerd/server/server.c++ Show resolved Hide resolved
src/workerd/jsg/modules.c++ Show resolved Hide resolved
@jasnell jasnell force-pushed the jsnell/workerd-module-fallback-service branch 3 times, most recently from 9b4c699 to 3a61d6a Compare December 7, 2023 21:15
@jasnell jasnell force-pushed the jsnell/workerd-module-fallback-service branch 2 times, most recently from b4e66b6 to d3110f7 Compare December 8, 2023 16:33
Copy link
Contributor

@mrbbot mrbbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As requested, added the changes I needed to make to get this working as suggestions. 👍

src/workerd/jsg/modules.h Outdated Show resolved Hide resolved
src/workerd/jsg/modules.h Show resolved Hide resolved
src/workerd/server/server.c++ Outdated Show resolved Hide resolved
src/workerd/server/server.c++ Show resolved Hide resolved
src/workerd/server/server.c++ Outdated Show resolved Hide resolved
@jasnell jasnell force-pushed the jsnell/workerd-module-fallback-service branch 2 times, most recently from da328ed to d51e631 Compare December 8, 2023 20:46
Copy link
Contributor

@mrbbot mrbbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified this works with the current Vitest pool implementation without any additional patches. 🎉

@jasnell jasnell force-pushed the jsnell/workerd-module-fallback-service branch from d51e631 to 542e54a Compare December 8, 2023 21:25
Adds the ability to load modules lazily from a local http fallback
service. The fallback service is a simple http server that returns
the Worker::Module configuration for a module dynamically rather
than embedded within the worker configuration. This is only usable
for local dev. See samples/module_fallback for an example.

Co-authored-by: MrBBot <bcoll@cloudflare.com>
@jasnell jasnell force-pushed the jsnell/workerd-module-fallback-service branch from 542e54a to d14f17f Compare December 8, 2023 21:59
@jasnell
Copy link
Member Author

jasnell commented Dec 8, 2023

Ok, internal CI run looks good and shows this can land without requiring internal changes to land at the same time. Will get this merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants