Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite to make package scope traversing clearer #3

Merged
merged 4 commits into from
Nov 26, 2018
Merged

Conversation

GeoffreyBooth
Copy link
Owner

@GeoffreyBooth GeoffreyBooth commented Nov 22, 2018

So I started rewriting the proposal to make clear that the searching up the path for the nearest package.json applied to not just absolute URL specifiers but also to relative specifiers and deep imports, and it was getting very repetitive so I ditched the previous structure. The proposal got a lot shorter as a result, which I assume must be a good thing. I don’t think I actually changed anything that was being proposed, just that the text is now much better organized (I hope); @jkrems and @guybedford please let me know if any of my rewriting screwed anything up.

@bmeck and @robpalme I added some language about exceptions being thrown for “double import” of files, please feel free to expand or rewrite as you see fit.

Formatted version for easy reading: https://github.com/GeoffreyBooth/node-import-file-specifier-resolution-proposal/blob/rewrite/README.md (basically just reread the entire “Proposal” section).

README.md Outdated Show resolved Hide resolved
README.md Outdated
import S3 from 'aws-sdk/clients/s3.js';
// aws-sdk has a top-level folder named clients containing a CommonJS file named s3.js
```
There is the possibility of `import` and `createRequireFromPath` both importing the same file into the same package scope, potentially the former as ESM and the latter as CommonJS. Node needs to throw an exception in such situations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example - do we throw on the first ES module import to a file that was imported through commonjs require, or do we throw on the first require to something that is an ES module? But then doesn't that stop dual mode packages from working? I don't get this logic.

Can we make it less assertive until this is clear at least.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what @robpalme was bringing up in #1. I don’t have an answer.

I can revise this to be more like “this is something we will need to work out” for now.

Else
Load the file as CommonJS.
Else we reach the file system root without finding a package.json
Load the file as ESM.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above you clearly state: There are (at least) two parts to module resolution: location and interpretation., which I completely agree with.

Then in this spec you conflate the two.

Perhaps lets maintain that separation in the specs entirely. While the package exports proposal does need to check the package boundary / format, that could still be a separate process (and a shared cache in implementations).

So I was thinking of something more like:

**GET_MODULE_FORMAT(url)**:
1. Let _pjson_ be READ_PACKAGE_BOUNDARY(url)
1. If _pjson_ is _undefined_ or has an _exports_ or _mode_ then,
  1. Return "esm"
1. Return "cjs"

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need to update the resolution spec to cover all this, assuming you like this proposal.

Where am I conflating the two? I was trying to avoid discussing location other than when necessary, like to explain the concept of package scopes. The stuff about entry points and dual-mode packages should arguably go in the package exports proposal, or we could merge the two.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guybedford can you clarify if any the changes I would want to make here please.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GeoffreyBooth I guess my point is that the short spec I wrote above is exactly equivalent to what you have written.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So do you think including it here is relevant and if so does it replace something else?

cc/ @GeoffreyBooth

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guybedford So re locating vs interpreting, I'm skipping the locating of the file to be imported, e.g. foo.js, but I'm describing the search for package.json and therefore the package scope. I think the latter is important to include here, as it's a new concept and really the central concept to this proposal. I think it should be spelled out in more detail and more plain language than the spec style you wrote above, to make sure people understand; but the spec snippet you wrote here should certainly go in the resolution spec document, or we could include it here in addition to the plain language psuedocode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I didn't reread the spec properly. What you have now is effectively the READ_PACKAGE_BOUNDARY function I was pointing to there. Looks good.

Copy link
Owner Author

@GeoffreyBooth GeoffreyBooth Nov 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, whew 😀 I'm also thinking that the term package scope makes more sense than package boundary. If you think of block or function scopes in JavaScript, that boundary would be the {, which is rarely what we want to be talking about; we're usually discussing the scope inside the boundary. I think the same applies here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GeoffreyBooth okay, this makes more sense now. But let me make sure I got it:

  1. in scope Given a non-absolute specifier, convert to absolute URL.
  2. in scope Starting from the parent folder for the absolute URL of the imported module:
    a. Request to check for and load package.json from current folder. (actual loading out of scope)
    b. if found and parsed, return the parsed data and the path to the package.json.
    c. if found and not loaded, return with error details for the invalid package.json
    d. If not found:
    1. If no parent folder exists, return with "no package scope" (not sure)
    2. Else move to parent folder and continue from 2a

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GeoffreyBooth if you have time, I want to make sure I got it right before I make a mess 😄

README.md Outdated

#### `import` statements of CommonJS files within CommonJS packages (“deep imports”)
The ESM and CommonJS versions of a dual-mode package are really two distinct packages, and would be treated as such by Node if both were imported. Unless there is a legitimate use case for such behavior, though, we expect Node would throw an exception if a user tries to import the same package in both modes into the same package scope. Different modes of the same package _should_ be importable if a package boundary separates them, for example if a user’s project imports ESM `lodash` and that project has a dependency which itself imports CommonJS `lodash`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the expectation that the parent is now affecting access control to modules? This is a new concept...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can rationalise the dual identities by saying that package.json forwards to (two, or more) independent files/modules, i.e. package identity doesn't exist, module identity is what matters.

The corrolary of this is that, for dual-mode packages, we must prevent ESM entrypoints colliding with the main/CJS entrypoint, as that would result in a single identity for the package.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can rationalise the dual identities by saying that package.json forwards to (two, or more) independent files/modules, i.e. package identity doesn’t exist, module identity is what matters.

By “module” here, do you mean a package (a folder with source files and a package.json) or a single file, like a .js file that exports something? I’m not really following this sentence.

I look at it as a “dual-mode” package really being two packages, based on its two entry points. This is no different really than the lots of packages today that have multiple entry points, like the "browser" key that points to a browser build. They’re separate “packages” that happen to overlap in the same folder with the same package.json.

for dual-mode packages, we must prevent ESM entrypoints colliding with the main/CJS entrypoint, as that would result in a single identity for the package.

Say you did create a package.json with "main": "index.js" and "exports": "index.js", following the package exports proposal. If index.js contained any import or export statements, requireing the package from CommonJS would fail; likewise if index.js contained any references to the CommonJS globals like require or module.exports. I suppose it could guard those references, like if (exports) exports = ..., so that it wouldn’t throw in ESM mode but would actually export something in CommonJS mode. But really at this point it’s people trying to game the system, and since Node already lets them override the module loading machinery I’m not sure it’s worth it to try to hard to stomp out such things. We could stop it by throwing an error if the ESM and CJS entry points both point to the same file, if we want to be very safe.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the expectation that the parent is now affecting access control to modules? This is a new concept…

How so? What does “parent” mean here?

All I meant by this is that say I have an app like this:

~/app/package.json
~/app/app.js
~/app/node_modules/lodash (dual-mode ESM/CJS package)
~/app/node_modules/request/package.json
~/app/node_modules/request/index.js
~/app/node_modules/request/node_modules/lodash (dual-mode ESM/CJS package)

In app.js, my ESM app has import 'lodash'. And then in request/index.js, Request has require('lodash'). In the context of my app, I just pulled in the ESM lodash; while Request gets the CJS lodash. The two imports are in separate package scopes, so the two packages get different lodashes.

Or say it’s current Node and everything is CJS, but the same folders and files as above. ~/app/package.json includes "lodash": "2.0.0" while request/package.json includes "lodash": "1.0.0". My app would get Lodash 2.0.0, while Request would get Lodash 1.0.0. We already have this concept of separate dependency versions in separate package scopes; we would just be extending it from version numbers to include parse goals (ESM or CJS).

And just as my app couldn’t require both Lodash 1.0.0 and 2.0.0, it shouldn’t be able to import/require both the ESM and CJS versions of Lodash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And just as my app couldn’t require both Lodash 1.0.0 and 2.0.0, it shouldn’t be able to import/require both the ESM and CJS versions of Lodash.

Your app can require both Lodash 1.0.0 and Lodash 2.0.0 if they are installed locally under different aliases.

The problem with this one is that its a race as to which one will get the valid package and which one will error, which seems brittle, error-prone and hard to debug. I don't know that there is a way to tackle this without it being a race, unless we entirely disable one or the other (which was my comment).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain how a race could happen? We were discussing this in #1. Wouldn't Node resolve all the import statements before evaluating any code, and therefore if the code contains any requires they'll always be evaluated second?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you can have dynamic import. You can also run CommonJS first via -r or other process hooks like in loaders.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what's the solution? Just accept that race conditions could happen if the user is determined enough? If they want to avoid them, they should be careful not to mix the esm and cjs versions of the same package?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that we should at least not enforce a non-deterministic solution here.

Personally I would be fine with import getting the ESM version and require getting the CJS version, I think that would align with most user intuitions for dual mode packages. If we really don't want this then the solution is simple - don't have dual mode packages at all!

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we shift the discussion of “what should we do in this case” to #1? And keep this focused on what the language in this PR should say. For now, how about I change the language to something like “it remains to be decided what Node should do in this case.”

README.md Outdated

An ESM JavaScript file importing a CommonJS package, where the entire import specifier is the CommonJS package name, uses the specifier resolution algorithm that CommonJS uses now to resolve the main entry point of the package. For example, where `underscore` and `pad` are both CommonJS packages:
These are currently unsupported but reserved for future use. Browsers support specifiers like `'/app.js'` to be relative to the base URL of the page, whereas in CommonJS a specifier starting with `/` refers to the root of the file system. We would like to find a solution that conforms Node closer to browsers for `/`-leading specifiers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we now have the concept of a package boundary, I'd suggest defining "/" to mean package root. It would throw if used by a file not contained in any package.

We use this definition in the module loader I maintain. It is a popular feature. The motivating use-case is to avoid hard-to-read prefixes, e.g. "../../../.."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be weary of defining "/" as the package root as it is not a clearly browser compatible pattern at all as there is no way browsers could ever implement this.

I agree this is a worthy use case, but it should be done at least with en eye to browser compatibility.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robpalme Mapping / to package root was the original draft of this proposal (you can look in the git history) but we removed it because we weren’t sure of the implications of that for browser compatibility, hence the current language.

I think there’s pretty much no reason to map / to file system root, as that would generally only work on the author’s system; and if you still really wanted it, you could still achieve it using a file:/// URL. So the question would be how to provide a mapping to package root, whether via / or ~ or something else, that’s compatible (or at least not a conflict) with browsers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add something to the effect of that:

While this proposal does not offer a solution to the common challenges where modules within a package may want to refer to modules relative to the root of the package, specifically through a special prefix like / or ~... etc. It should be noted that the proposal intends to retain the resolution behaviours that current make it possible for modules to use their own package name as a bare specifier to achieve similarly suited outcomes.

If this seems to fit, then the question is, is it correct 😊

Copy link
Collaborator

@SMotaal SMotaal Nov 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we had import: and the idea that packages are domains means that import:/… is the same as import://this-package/ (ie package root), and that import:that-package/ is actually import://that-package/. However, assuming we want import: scheme default to // as the prefix of a bare pathname then import:/ and import:this-package/ are both shorthands for import://this-package/ just like import:that-package is for import://that-package/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly, there are many ways to achieve this one!

The package exports proposal could even be extended to support it - exports: { "mypkg": "./" }.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, and I think that people looking for that convenience to connotational are mostly those who have relied on tooling that handled that while transpiling their source (or special/hybrid platforms) so it is really outside the scope of this proposal.

That said, do you think it at all justified to at least try hinting at this it is with the a statement along the lines of my suggestion above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me, it's nice to mention this one and that we are compatible with various approaches here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just note that / already has significant meaning to URLs in the file: scheme. It goes to the file: URL's base namespace which could be a host:

> new URL('/', 'file:///app').href
'file:///'
> new URL('//', 'file:///app').href
'file:///'
> new URL('/', 'file://laptop/app').href
'file://laptop/'
> new URL('//', 'file://desktop/app').href
'file://desktop/'

Per relevant tests you can see https://github.com/web-platform-tests/wpt/blob/4ba3cd24c69a859d8c7864d15958d2656ef16173/url/resources/urltestdata.json#L5539

Windows drive letters have some special behavior and drop the host though due to what appears to be legacy issues.

README.md Outdated

#### `import` statements of CommonJS files within CommonJS packages (“deep imports”)
The ESM and CommonJS versions of a dual-mode package are really two distinct packages, and would be treated as such by Node if both were imported. Unless there is a legitimate use case for such behavior, though, we expect Node would throw an exception if a user tries to import the same package in both modes into the same package scope. Different modes of the same package _should_ be importable if a package boundary separates them, for example if a user’s project imports ESM `lodash` and that project has a dependency which itself imports CommonJS `lodash`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can rationalise the dual identities by saying that package.json forwards to (two, or more) independent files/modules, i.e. package identity doesn't exist, module identity is what matters.

The corrolary of this is that, for dual-mode packages, we must prevent ESM entrypoints colliding with the main/CJS entrypoint, as that would result in a single identity for the package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants