Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ESM node processes being unable to fork into other scripts #1814

Merged
merged 11 commits into from
Jul 13, 2022

Conversation

devversion
Copy link
Contributor

Currently, Node processes instantiated through the --esm flag result
in a child process being created so that the ESM loader can be
registered. This works fine and is reasonable.

The child process approach to register ESM hooks currently prevents
the NodeJS fork method from being used because the execArgv
propagated into forked processes causes ts-node (which is also
propagated as child exec script -- this is good because it allows nested
type resolution to work) to always execute the original entry-point,
causing potential infinite loops because the designated fork module
script is not executed as expected.

This commit fixes this by not encoding the entry-point information into
the state that is captured as part of the execArgv. Instead the
entry-point information is always retrieved from the parsed rest command
line arguments in the final stage (phase4).

Fixes #1812.

@devversion devversion marked this pull request as ready for review June 24, 2022 13:58
@codecov
Copy link

codecov bot commented Jun 24, 2022

Codecov Report

Merging #1814 (58ddc33) into main (ad01f49) will increase coverage by 0.07%.
The diff coverage is 97.29%.

Impacted Files Coverage Δ
src/bin.ts 87.70% <95.00%> (+0.54%) ⬆️
src/child/argv-payload.ts 100.00% <100.00%> (ø)
src/child/child-entrypoint.ts 92.85% <100.00%> (+1.19%) ⬆️
src/child/spawn-child.ts 73.68% <100.00%> (-6.32%) ⬇️
src/configuration.ts 85.60% <0.00%> (-0.47%) ⬇️

@devversion
Copy link
Contributor Author

@cspotcode Just FYI: I re-pushed because I didn't realize older versions are tested as well. Switched the tests from the node next resolution to just esnext and added another test for nodenext that is guarded for TS 4.7.

The nightly node failures seem to also happen in main and I will ignore these as part of this PR.

@cspotcode
Copy link
Collaborator

Yeah the nightly node failure is due to some ESM changes they're making, you can ignore those. Also, please don't hesitate to ping me for guidance on any other test failures, to avoid frustration. Sometimes having such a big test matrix means it can be difficult for a new contributor to understand what's going on, and I might have some hard-earned occult intuition to help out.

@devversion
Copy link
Contributor Author

Okay, I pushed again. Should fix the previous CI failures due the use of import.meta.url in TS versions which don't know about it. Other than that, I think the tests should be passing now.

https://github.com/TypeStrong/ts-node/runs/7043357305?check_suite_focus=true#step:15:609 is a little confusing to me, but that might have been just flakiness. I tried with the same Node version and couldn't reproduce.

@devversion
Copy link
Contributor Author

Hey, just FYI that this PR is ready from my side now. No rush, please let me know if I can do anything. Thanks!

@cspotcode cspotcode added this to the 10.8.3 milestone Jul 2, 2022
@@ -48,7 +48,7 @@ export function main(
const state: BootstrapState = {
shouldUseChildProcess: false,
isInChildProcess: false,
entrypoint: __filename,
tsNodeScript: __filename,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this renamed to indicate that this is the path to ts-node's bootstrapper, but not the user's desired entrypoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I renamed that mostly to avoid the ambiguity here (at least ambiguity to me). Happy to revert if you would want it the old way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming is good, makes sense to me.

* Stages before (phase 4) can and will be cached by the child process through the Brotli
* configuration and entry-point information is only reliable in the final phase. More
* details can be found in here: https://github.com/TypeStrong/ts-node/issues/1812.
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding all these comments.

It makes sense to me that we must repeat resolution in phase 4. Are there any situations where we know, without any risk, that we can skip this duplicate resolution? It would be a performance optimization.

If we cannot safely do this, then we shouldn't. But if there's a safe way to do this, then I might try my hand at adding it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could re-use this when the argvResult has not changed. We only need the rerun this function due to the patches in the child loader entry-point. We could indicate this change through a flag and request re-computation.

This feels like a micro optimization through, not worth it personally as this is just some simple logical expressions.

src/bin.ts Show resolved Hide resolved
@cspotcode
Copy link
Collaborator

Thanks @devversion, I have a couple questions but otherwise this is excellent work.

src/bin.ts Outdated

/** Unresolved. May point to a symlink, not realpath. May be missing file extension */
const entryPointPath = executeEntrypoint
? resolve(cwd, restArgs[0])
Copy link
Collaborator

@cspotcode cspotcode Jul 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cwd might be the wrong path, if the child process was forked with a different working directory than the parent. The cwd option has always been confusing, let me see if I can figure out why we're using it in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is an issue with the cwd option. It's quite ambiguous: It doesn't actually control the working directory of the invoked process (kind of surprising). It's mainly used for the script to be resolved and e.g. the tsconfig

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cspotcode I added a new commit (which might be considered a breaking change? to me it sounds like a fix):

  • The --cwd option now also sets the working directory properly
  • The working directory is properly preserved when we transition into the child process for ESM.
  • Added test scenarios for non-ESM --cwd and also for the scenario you outlined in this comment (fork cwd)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to double-check. The cwd option has never been great, IMO, and as far as I remember, is only there for backwards compatibility to support some legacy use-cases.

It makes ts-node's logic pretend to be in that cwd but without actually changing the process's cwd. So it affects how ts-node's --cwdMode locates the tsconfig, and affects paths in typechecking errors. --cwdMode isn't even the default anymore because it was confusing and the newer default is better. Just goes to show, I don't love the cwd option.

As confusing as it may be, pretty sure we should not be changing it to affect process.cwd(). But I'll have to double-check the old PRs where I cleaned up this logic; it was a couple years ago. I'm not at my computer; will check later today.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible for projects to intentionally launch child processes in a different cwd, right? Either by passing the cwd directly to fork options, or by changing the parent's working directory before forking. We'll need to support that.

Copy link
Contributor Author

@devversion devversion Jul 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Forking into different working directories should still work. I added a test for this in ESM, but should also make sure with CJS.

We are using the cwd option in a couple of places, and naturally I also expected it to change the initial working directory, similar to how e.g. yarn --cwd works. This allows me to directly call TS node scripts e.g. in the package.json without needing more code or an extra script.

I assume folks also expected the same here, because the option works that way in e.g. yarn and the docs kind of describe it that way.

Copy link
Contributor Author

@devversion devversion Jul 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this, and added a test. Regardless of what you decide, the additional tests can be helpful. Happy to revert the cwd changes. I believe the semantics should be that way, or the option should be removed altogether (even though it would actually be useful that way -- like yarn --cwd). Proposed semantics:

  • --cwd changes the working directory before the user entry-point is invoked
  • The working directory flag is omitted when forking into child processes. It's up to the user then/or defaults to process.cwd() (which might be the initial --cwd if specified)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I think removal makes most sense to reduce complexity. The option predates my involvement with the project, it's confusing, and node nor tsc has an equivalent --cwd option. There's also programmatic registration scenarios, where ts-node is being registered into an existing process and probably shouldn't be changing the cwd. Could make the cwd option a CLI-only thing, but it's really just messy feature creep.

I make frequent use of git -C so I understand the appeal of --cwd in general. But in this case, it feels like a maintenance liability to keep it around.

Can't remove till the next major, though, cuz it'd be a breaking change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it definitely makes some things more difficult. Not too bad IMO though.

Anyway though, I agree that the removal is a good consideration. What are the next steps for this PR?

Copy link
Collaborator

@cspotcode cspotcode Jul 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only thing left is: entrypoint resolution in the first child should use --cwd, vs subsequent forked children should use process.cwd().

The first child process spawned by our --esm flag needs to obey the CLI's --cwd option for backwards compatibility. (ugh annoying) So if you are in /project and you run ts-node --cwd foo bar.ts then the entrypoint needs to be resolved to /project/foo/bar.ts

Subsequently forked children should behave like vanilla node and resolve their entrypoint relative to process.cwd()
If you process.chdir('/home/user') and do fork('script.ts'), the child's entrypoint needs to resolve to /home/user/script.ts.


I think this means we need to pass a boolean to child processes telling them if they are the first child spawned by --esm, or they are a subsequent child spawned by fork(). child-entrypoint.ts will need to modify the execArgv array to pass this flag to subsequent children. One method is a new --is-first-child=true|false flag, another method is to modify and re-compress the brotli payload.


Maybe there are other benefits to re-compressing the brotli payload: #1831
I'm not suggesting we implement #1831 in this PR, more thinking that re-compressing the brotli payload in this PR will have benefits for future work.

src/bin.ts Outdated
return callInChild(state);
// Note: When transitioning into the child-process after `phase2`,
// the updated working directory needs to be preserved.
return callInChild(state, state.phase2Result.targetCwd);
Copy link
Contributor Author

@devversion devversion Jul 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explicit cwd pass-through is not strictly needed since spawn would just use the working directory we switched to, but I think being explicit is helping with readability and also makes this safer

Currently, Node processes instantiated through the `--esm` flag result
in a child process being created so that the ESM loader can be
registered. This works fine and is reasonable.

The child process approach to register ESM hooks currently prevents
the NodeJS `fork` method from being used because the `execArgv`
propagated into forked processes causes `ts-node` (which is also
propagated as child exec script -- this is good because it allows nested
type resolution to work) to always execute the original entry-point,
causing potential infinite loops because the designated fork module
script is not executed as expected.

This commit fixes this by not encoding the entry-point information into
the state that is captured as part of the `execArgv`. Instead the
entry-point information is always retrieved from the parsed rest command
line arguments in the final stage (`phase4`).

Fixes TypeStrong#1812.
…hild process

Currently the `--esm` option does not necessarily do what the
documentation suggests. i.e. the script does not run as if the working
directory is the specified directory.

This commit fixes this, so that the option is useful for TSConfig
resolution, as well as for controlling the script working directory.

Also fixes that the CWD encoded in the bootstrap brotli state for the
ESM child process messes with the entry-point resolution, if e.g. the
entry-point in `child_process.fork` is relative to a specified `cwd`.
@devversion
Copy link
Contributor Author

@cspotcode Ah thanks for helping out here. I was actually working quite some time today and yesterday on always using the brotli payload (even without esm). I have something for this pretty soon, but I'm not sure if we want to land it as part of this PR.

It's a breaking change as you realized as well because it will always use the same preloaded project. I think that is worth doing though.

@devversion
Copy link
Contributor Author

do you want me to push that here or how should we proceed? The cwd option will behave like before (like you suggested)

@cspotcode
Copy link
Collaborator

cspotcode commented Jul 12, 2022

@cspotcode Ah thanks for helping out here. I was actually working quite some time today and yesterday on always using the brotli payload (even without esm). I have something for this pretty soon, but I'm not sure if we want to land it as part of this PR.

It's a breaking change as you realized as well because it will always use the same preloaded project. I think that is worth doing though.

Oh that's great, probably it should be a separate PR tied to #1831 to be shipped in the next major. I think as soon as this PR lands and ships, I should switch gears fully to powering through all the breaking changes for the next major. That'll also drop support for ancient node and TS versions, so it's simplify the development process.

Sorry if I was stepping on your toes a bit, TBH sometimes PRs get dropped when people get tired of my notes about backwards-compat, so I just carry them over the finish line myself. :)


My changes:

I removed the sanitizeArgv logic in favor of #1834 and/or #1831
I removed the process.chdir()
I modified the child process to re-compress the brotli payload to tell grandchild processes that they should resolve entrypoints relative to process.cwd() like vanilla node. I tried my best to put comments linking this logic to #1834 for continued discussion

If the tests pass, then I think this is ready to merge and publish. But please do let me know if I missed anything.

@cspotcode
Copy link
Collaborator

Oops, I forgot to update the tests since they use --cwd, I assume that's why CI failed. I also haven't code-reviewed them yet, I might have time tonight for that. I'll post here before I attempt further code changes to avoid conflicting with your work.

});

test.suite(
'with NodeNext TypeScript resolution and `.mts` extension',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this .mts variant still required? I see that the previous test case used regexp matching to avoid import.meta.url. But if we can avoid that issue via transpileOnly, then I think this test case is identical to the previous one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely yes. Although these tests are still a little different:

  • One loads the worker through an absolute path
  • One loads the worker relatively

@cspotcode
Copy link
Collaborator

To avoid duplicated effort: I've started fixing the tests, and I'll push what I have soon.

process.exitCode = 1;

const projectDir = dirname(fileURLToPath(import.meta.url));
const workerProcess = fork(join(projectDir, 'worker.mts'), [], {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this test case use an absolute path for the worker's entrypoint? Is it testing something that hasn't occurred to me? I figure it should be same as the other tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See response above. One test, with .mts I used import.meta.url to construct an absolute path to the worker. The other tests were not using import.meta (and couldn't --without transpile-only) and use a relative path.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok cool, sounds like all the things we need to test are:

  1. entrypoint script runs with correct process.cwd()
  2. fork() entrypoint resolves with absolute path
  3. fork() entrypoint resolves with relative path relative to process.cwd(), not parent's ts-node --cwd flag

...for both CJS and ESM, with 3. disabled in the non-ESM case since that bug will be fixed by other tickets.

I'll update the PR now.

disable non --esm test with comment about known bug and link to tickets
make tests set cwd for fork() call, to be sure it is respected and not overridden by --cwd
@cspotcode
Copy link
Collaborator

@devversion I think this is ready to merge and publish. I removed the .mts tests because, as far as I can tell, it is sufficient to test .ts ESM. I'd appreciate your review on that change, but otherwise I think we're ready to go.

As soon as this is merged, I can publish 10.9.0 (https://github.com/TypeStrong/ts-node/milestone/16), and then I think I can start merging breaking changes for 11.0.0 (will combine the https://github.com/TypeStrong/ts-node/milestone/13 and https://github.com/TypeStrong/ts-node/milestone/17 milestones)

@cspotcode cspotcode merged commit 32d07e2 into TypeStrong:main Jul 13, 2022
@devversion
Copy link
Contributor Author

@cspotcode Thanks for fixing up the rest. I was working on it already but you came first. Much appreciated.

crapStone pushed a commit to Calciumdibromid/CaBr2 that referenced this pull request Jul 18, 2022
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [ts-node](https://typestrong.org/ts-node) ([source](https://github.com/TypeStrong/ts-node)) | devDependencies | minor | [`10.8.2` -> `10.9.1`](https://renovatebot.com/diffs/npm/ts-node/10.8.2/10.9.1) |

---

### Release Notes

<details>
<summary>TypeStrong/ts-node</summary>

### [`v10.9.1`](https://github.com/TypeStrong/ts-node/releases/tag/v10.9.1)

[Compare Source](TypeStrong/ts-node@v10.9.0...v10.9.1)

**Fixed**

-   Workaround nodejs bug introduced in 18.6.0 ([#&#8203;1838](TypeStrong/ts-node#1838)) [@&#8203;cspotcode](https://github.com/cspotcode)
    -   Only affects projects on node >=18.6.0 using `--esm`
    -   Older versions of node and projects without `--esm` are unaffected

https://github.com/TypeStrong/ts-node/milestone/18?closed=1

### [`v10.9.0`](https://github.com/TypeStrong/ts-node/releases/tag/v10.9.0)

[Compare Source](TypeStrong/ts-node@v10.8.2...v10.9.0)

**Added**

-   `--project` accepts path to a directory containing a `tsconfig.json` ([#&#8203;1829](TypeStrong/ts-node#1829), [#&#8203;1830](TypeStrong/ts-node#1830)) [@&#8203;cspotcode](https://github.com/cspotcode)
    -   previously it required an explicit filename
-   Added helpful error message when swc version is too old to support our configuration ([#&#8203;1802](TypeStrong/ts-node#1802)) [@&#8203;cspotcode](https://github.com/cspotcode)
-   Added `experimentalTsImportSpecifiers` option which allows using voluntary `.ts` file extensions in import specifiers (undocumented except for [API docs](https://typestrong.org/ts-node/api/interfaces/CreateOptions.html#experimentalTsImportSpecifiers)) ([#&#8203;1815](TypeStrong/ts-node#1815)) [@&#8203;cspotcode](https://github.com/cspotcode)

**Fixed**

-   Fixed bug where `child_process.fork()` would erroneously execute the parent's entrypoint script, not the intended child script ([#&#8203;1812](TypeStrong/ts-node#1812), [#&#8203;1814](TypeStrong/ts-node#1814)) [@&#8203;devversion](https://github.com/devversion)
-   Fixed support for jsx modes `"react-jsx"` and `"react-jsxdev"` in swc transpiler ([#&#8203;1800](TypeStrong/ts-node#1800), [#&#8203;1802](TypeStrong/ts-node#1802)) [@&#8203;cspotcode](https://github.com/cspotcode)
-   Fixed support for import assertions in swc transpiler ([#&#8203;1817](TypeStrong/ts-node#1817), [#&#8203;1802](TypeStrong/ts-node#1802)) [@&#8203;cspotcode](https://github.com/cspotcode)
-   Fixed bug where calling `repl.evalCode()` with code not ending in a newline would not update the typechecker accordingly ([#&#8203;1764](TypeStrong/ts-node#1764), [#&#8203;1824](TypeStrong/ts-node#1824)) [@&#8203;cspotcode](https://github.com/cspotcode)

https://github.com/TypeStrong/ts-node/milestone/16?closed=1

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, click this checkbox.

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzMi4xMTEuMSIsInVwZGF0ZWRJblZlciI6IjMyLjExMi4wIn0=-->

Co-authored-by: cabr2-bot <cabr2.help@gmail.com>
Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1461
Reviewed-by: Epsilon_02 <epsilon_02@noreply.codeberg.org>
Co-authored-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Co-committed-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESM flag breaks NodeJS child process fork
2 participants