Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[babel 8] Move @babel/register transform to a separate worker #14025

Merged
merged 11 commits into from
Dec 29, 2021

Conversation

nicolo-ribaudo
Copy link
Member

@nicolo-ribaudo nicolo-ribaudo commented Dec 5, 2021

Q                       A
Fixed Issues? Closes #12814
Patch: Bug Fix?
Major: Breaking Change?
Minor: New Feature?
Tests Added + Pass? Yes
Documentation PR Link
Any Dependency Changes?
License MIT

Moving Babel to a worker allows us to run it asynchronously, which means that:

  • it supports .mjs config files
  • it lets us convert @babel/core to ESM while still supporting this CJS synchronous require hook
  • it makes it easier to avoid transforming plugins/files used by @babel/register with themselves (this check will be unnecessary in the future)

This PR is heavily inspired by #13199: it uses the same strategy to communicate synchronously (with Atomics.wait/Atomics.notify + receiveMessageOnPort), and it has the same architecture with LocalClient/WorkerClient to temporarily (until Babel 8) support both running in the same thread and in a separate thread.

This PR doesn't expose @babel/register/experimental-worker yet because it's .npmignored; I will open a semver-minor PR to expose it after that this one is merged (similarly to how #13199 was then exposed by #13398).

Unfortunately this diff won't be easy to review, so here is a commit-by-commit description:

  • Convert @babel/register to CommonJS
    • This package will never work as native ESM, so we should stop developing it as ESM (compiled to CJS).
    • It's ironic that I also had to migrate it from TS to JS, but it's because TS doesn't support (yet?) using .cts files (or .ts files without exports) when also using the --isolatedModules options. These files were almost untyped, so it's not a big loss.
    • GitHub shows packages/babel-register/src/browser.ts as deleted, but it has just been renamed to packages/babel-register/src/browser.js.
  • Support running @babel/register asynchronously in a worker
    1. packages/babel-register/src/cache.js was moved to packages/babel-register/src/worker/cache.js without any change. The diff doesn't show the rename because I left a "proxy" CJS package in the old location just in case soneone (for example, our tests) relies on it.
    2. The contents of packages/babel-register/src/node.js have mostly been moved to packages/babel-register/src/hook.js and the non-worker code paths (with all the Module._cache protection logic) should be the same; the differences are that it now calls into LocalClient rather than directly using @babel/core, and compileHook has been renamed to compileBabel7. Some parts of src/node.js have been moved to packages/babel-register/src/worker/transform.js because they need to run in the worker.
  • Test against reentrance adds a test for the behavior explained in the src/is-in-register-worker.js comment, but I also run those tests with the non-worker implementation since it doesn't hurt.
  • Test both implementations tests the new worker implementation (and the old one). I suggest disabling whitespace diff while reading this commit since all the tests have been indented.
  • Test with mjs config file adds a test to check that Babel can run asynchronously.
  • Don't expose @babel/register/experimental-worker yet this is just to avoid exposing the new implementation, so that we can have different commits for the Babel 8 support and for the new Babel 7 feature (so that one can go in the changelog, and the other commit can be prefixed with [babel 8] to make it easier to find while writing the Babel 8 changelog). Also, this PR blocks the ESM migration so I don't want to wait for the next minor 😛
  • [babel 8] Drawback: plugins must be stored in separate files This is the trade-off when using workers: you cannot pass functions to them. Luckily Babel plugins are not usually defined inline (either they are represented by strings, or they are in a config file), so most of our users won't notice the restriction. For those who do, the solution is to move the plugin to a separate file or to a config file.

@nicolo-ribaudo
Copy link
Member Author

nicolo-ribaudo commented Dec 5, 2021

@bmeck I know that you'd like to move ESM loaders to a separate thread so that they can also work for CJS require hooks; you might be interested in this as a real-world use case.

@babel-bot
Copy link
Collaborator

babel-bot commented Dec 5, 2021

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/50224/

@nicolo-ribaudo

This comment has been minimized.

@nicolo-ribaudo
Copy link
Member Author

The remaining failures are probably because workers don't flush their console output synchronously 🤔

@nicolo-ribaudo
Copy link
Member Author

nicolo-ribaudo commented Dec 5, 2021

To make sure that the console.logs from the worker are printed before the subsequent ones I tried to wait for the stdout to be flushed before waking up the main thread, however stdout is not flushed until the main thread wakes up and yields to the event loop (causing a deadlock).

This is because, by default Node.js first logs messages from the main thread, then yields to the event loop and logs messages from the worker:

Example

In this code the console.log("In worker!") call happens first, however it logs

Received Hi :)
In worker!
const {
  isMainThread,
  parentPort,
  Worker,
  MessageChannel,
  receiveMessageOnPort,
} = require("worker_threads");

if (isMainThread) {
  const worker = new Worker(__filename);

  const mc = new MessageChannel();
  const signal = new Int32Array(new SharedArrayBuffer(4));
  signal[0] = 0;

  worker.postMessage({ port: mc.port1, signal }, [mc.port1]);
  Atomics.wait(signal, 0, 0);

  const { message } = receiveMessageOnPort(mc.port2);
  console.log("Received", message);

  worker.unref();
} else {
  parentPort.on("message", ({ port, signal }) => {
    console.log("In worker!");

    port.postMessage("Hi :)");
    Atomics.store(signal, 0, 1);
    Atomics.notify(signal, 0);
  });
}

The reason for this behavior is that console.log in workers is always scheduled to the main thread asynchronously, while console.log in the main thread is sometimes synchronous (it depends on the OS and on how the node program is executed).

For now I'll accept this behavior and modify our tests not to rely on a specific ordering, and I'll look for a proper fix* if it ever ends up being a real-world problem (I don't think usually plugins console.log things that must absolutely be printed before anything else?).

* The proper fix is to intercept console.* calls, pass them to the main thread when waking it up and logging them immediately.

@nicolo-ribaudo nicolo-ribaudo force-pushed the register-experimental-worker branch 3 times, most recently from dffc56d to 7c41309 Compare December 5, 2021 22:11
static #worker_threads_cache;
/** @type {typeof import("worker_threads")} */
static get #worker_threads() {
return (WorkerClient.#worker_threads_cache ??= require("worker_threads"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use return require("worker_threads")?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh right, require caches internally.

const output = await spawnNodeAsync(
[testFileLog],
path.dirname(testFileLog),
{ NODE_OPTIONS: `-r ${registerFile}` },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{ NODE_OPTIONS: `-r ${registerFile}` },
{ NODE_OPTIONS: `--require ${registerFile}` },

});
it("works with the --require flag", async () => {
const output = await spawnNodeAsync(
["-r", registerFile, testFileLog],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
["-r", registerFile, testFileLog],
["--require", registerFile, testFileLog],

@bmeck
Copy link
Contributor

bmeck commented Dec 6, 2021

@nicolo-ribaudo it would be good to know exactly what moving to a Worker enables. Implementation to be ESM vs CJS is likely not compelling to various Node.js core contributors so any other expectations would be good. I doubt any performance benefits would come from this since it doesn't look like there will be a shared cache so likely it is some other driving force?

@nicolo-ribaudo
Copy link
Member Author

The driving force is that currently some parts of Babel are optionally async (for example config loading, but we have already the infrastructure to allow using async plugin/preset factories and parsers), and even if it's written in CJS Babel already supports ESM plugins which can only be loaded asynchronously.

@babel/register is currently the only exception in the Babel ecosystem where you must make sure that everything you use in Babel can run synchronously; by moving it to a worker we lift that restriction and make it behave like the other Babel packages.

test.js Outdated
@@ -0,0 +1,7 @@
const { isMainThread, Worker } = require("worker_threads");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should be removed.

babel.config.js Outdated
{
test: ["packages/babel-register/**/*.js"].map(normalize),
sourceType: "script",
parserOpts: { allowReturnOutsideFunction: true },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this option only needed by packages/babel-register/src/experimental-worker.js? If so can we modify the source so that we don't need to universally apply allowReturnOutsideFunction?

Comment on lines 7 to 8
* If @babel/register is imported using the -r/--require flag, we
* must the worker will have the same flag and we must avoid registering
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* If @babel/register is imported using the -r/--require flag, we
* must the worker will have the same flag and we must avoid registering
* If @babel/register is imported using the -r/--require flag,
* the worker will have the same flag and we must avoid registering

@nicolo-ribaudo
Copy link
Member Author

I re-reviewed this (usually I don't review my own PRs, but it has been almost a month since I wrote this code), and I'm now merging it with a single ✔️ since it shouldn't affect the Babel 7 behavior anyway.

@nicolo-ribaudo nicolo-ribaudo changed the title [babel 8] Move @babel/register transforms to a separate worker [babel 8] Move @babel/register transform to a separate worker Dec 29, 2021
@nicolo-ribaudo nicolo-ribaudo merged commit e77e3de into babel:main Dec 29, 2021
@nicolo-ribaudo nicolo-ribaudo deleted the register-experimental-worker branch December 29, 2021 15:33
@overlookmotel
Copy link
Contributor

Sorry I totally missed this PR before the commit landed. @nicolo-ribaudo A few questions:

Performance?

Do you have any figures on how much slower it performs in a worker? (I assume it will be at least somewhat slower due to added overhead of communication back and forth with worker)

Reduce worker communication volume

Could performance be improved by keeping the source map cache (maps var) inside the worker?

  • client.transform() would return the transformed code only (not the map).
  • map would be stored in memory inside the worker.
  • Add client method client.retrieveMap() to get source map object for a file from the worker.
  • retrieveSourceMap() callback passed to sourceMapSupport.install() would call client.retrieveMap(filename) to get the map on demand.

i.e. Source map object is only passed from worker to main thread if/when it's needed, rather than upfront for every file which is transformed. Unless the application throws an error, this will not happen at all.

Internal module cache

In Babel 8, you intend to remove the code to maintain a separate module cache for modules required by Babel internally.

While the internal module cache logic can safely be removed from compile.js, I think nodeWrapper.js is still not completely safe to remove.

In main thread, hook.js requires 'source-map-support' package. source-map-support in turn requires a dependency 'from-buffer'. This happens before babel-register is active, so from-buffer is not transpiled, and the untranspiled version will be left in the module cache.

If application code calls require('from-buffer'), Babel's transform will not be executed on the file, which it should be.

Same applies to 'source-map' package, and it's dependency 'whatwg-url'.

@overlookmotel
Copy link
Contributor

Actually, would it be more performant to deal with the code cache in main thread instead of in the worker?

Usually, the majority of files will be unchanged since babel-register was last run, so transformed code can just be retrieved from the cache. If cache was handled in the main thread, this would avoid a large number of round trips to the worker.

If overhead of calling into the worker is significant, I imagine this could avoid much of the penalty.

@nicolo-ribaudo
Copy link
Member Author

nicolo-ribaudo commented Jan 6, 2022

Do you have any figures on how much slower it performs in a worker? (I assume it will be at least somewhat slower due to added overhead of communication back and forth with worker)

We investigated this when we moved parsing in @babel/eslint-parser to a worker, and the performance impact has been neutral (#13199 (comment)). I will investigate how moving the cache of @babel/register affects it.

In Babel 8, you intend to remove the code to maintain a separate module cache for modules required by Babel internally.

While the internal module cache logic can safely be removed from compile.js, I think nodeWrapper.js is still not completely safe to remove.

In main thread, hook.js requires 'source-map-support' package. source-map-support in turn requires a dependency 'from-buffer'. This happens before babel-register is active, so from-buffer is not transpiled, and the untranspiled version will be left in the module cache.

We can lazy-require source-map-support, so that it's imported after setting up @babel/register. It feels safer to touch Module/require internals as less as possible!

@nicolo-ribaudo nicolo-ribaudo added this to In progress in Move to native ES modules via automation Jan 14, 2022
@nicolo-ribaudo nicolo-ribaudo moved this from In progress to Done in Move to native ES modules Jan 14, 2022
@github-actions github-actions bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Apr 8, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 8, 2022
@nicolo-ribaudo nicolo-ribaudo added this to the v8.0.0 milestone Aug 8, 2023
@nicolo-ribaudo nicolo-ribaudo added PR: Breaking Change 💥 A type of pull request used for our changelog categories for next major release and removed babel 8 labels Aug 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated A closed issue/PR that is archived due to age. Recommended to make a new issue pkg: register PR: Breaking Change 💥 A type of pull request used for our changelog categories for next major release PR: Needs Docs
Development

Successfully merging this pull request may close these issues.

None yet

7 participants