Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default to sourceType:script when parsing files. #7501

Closed

Conversation

loganfsmyth
Copy link
Member

Q                       A
Fixed Issues? Fixes #6242
Patch: Bug Fix?
Major: Breaking Change? Y
Minor: New Feature?
Tests Added + Pass? Yes
Documentation PR
Any Dependency Changes?
License MIT

Alright, let's have this discussion since #6242 never really took off, and most of my TODOs in there have landed.

I am of the opinion that Babel 7 should parse .js files using the script parse goal, and .mjs files with the module parse goal. In Babel 6, and 7 up to present, the default has been module. I also want to make clear that you can still easily set sourceType:module in your Babel config to keep the current behavior, but the important question here is the default, because it influences many other parts of Babel's transformations.

There are a few primary issues surrounding using module as the default:

  • Many perfectly valid Script files will fail to parse. This means that users attempting to run Babel on things in node_modules for instance will fail to parse.
  • Defaulting to module means Babel converts top-level this to undefined and injects use strict, which can break when run against arbitrary files
  • When performing transformations, Babel needs to know whether to inject import or require, and it does so based on the sourceType. Defaulting to module means we may inject import into valid CommonJS modules, breaking them.
  • When writing files to disk with babel-cli we'd ideally add .mjs to files that are modules, and .js to scripts. If we default to module then we'd potentially output .mjs extensions on CommonJS files.

These issues become more problematic when you consider that people are more frequently running Babel across arbitrary node_modules dependencies. Realistically this pattern is only going to increase in frequency for forseeable future.

Node now follows this same pattern, with .js meaning CommonJS and .mjs meaning ESM. There is potential for a field to land in package.json that would allow toggling this behavior, and once that lands we could for instance consider a sourceType: packageMode option or something, but even with that the default in Node will remain .js for CommonJS.

Given all of this, it seems like it would be in the best interest of Babel to follow that same pattern, by requiring users to either use .mjs or opt into specific ESM behavior in their Babel configuration.

@loganfsmyth loganfsmyth added PR: Spec Compliance 👓 A type of pull request used for our changelog categories PR: Breaking Change 💥 A type of pull request used for our changelog categories for next major release labels Mar 5, 2018
@@ -39,6 +39,7 @@ program.option(
"List of extensions to hook into [.es6,.js,.es,.jsx,.mjs]",
collect,
);
program.option("--source-type [script|module|unambiguous]", "");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be made to assume "module" for .mjs files?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.mjs files are hard-coded to module currently, so this would only effect other types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, makes sense.

@loganfsmyth
Copy link
Member Author

Also curious to hear from @gaearon if you have thoughts. This would certainly affect CRA though realistically you may already just want to set sourceType: unambiguous since that is what Webpack does.

@hulkish
Copy link
Contributor

hulkish commented Mar 5, 2018

If this lands as currently proposed, I suggest reporting a warning for when sourceType is not defined in .babelrc.

Or, we could just keep scriptType: module as default and warn for the opposite when .mjs extension is detected? This would only apply to babel-cli, i suppose (not quite as helpful).

@loganfsmyth
Copy link
Member Author

Oh that's something I can clarify. The error right now is quite helpful. If Babel is parsing as script and hits an import or export, the message is:

SyntaxError: the-file-path.js: 'import' and 'export' may appear only with 'sourceType: "module"'
Consider renaming the file to '.mjs', or setting sourceType:module or sourceType:unambiguous in your Babel config for this file.

Which I think is helps. We can certainly expand that too if we find it is needed.

@babel-bot
Copy link
Collaborator

babel-bot commented Mar 5, 2018

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/7119/

@loganfsmyth
Copy link
Member Author

@jdalton I know this is something you have thoughts on. Where do you stand on this these days?

@Jessidhia
Copy link
Member

How are TypeScript files going to be handled?

I assume it's safe to hardcode those to modules too.

@loganfsmyth
Copy link
Member Author

@Kovensky Fair, I'd be fine with hard-coding .ts and .tsx to be sourceType:module too. It would be a question of if that would ever need overriding, and I hon't know the answer to that.

@Jessidhia
Copy link
Member

It would need overriding if babel supports import Identifier = and export = syntax. The only way for an --isolatedModules typescript file to not be a module is if they only have assign-style import/exports.

@loganfsmyth
Copy link
Member Author

@Kovensky How does TS itself differentiate these cases? I assume there is a flag in their config file?

@Jessidhia
Copy link
Member

TS specifies itself as unambiguous grammar. --isolatedModules (which is a prerequisite of the babel transform) makes it so that no global scripts are allowed; every file must be either a commonjs (with import Identifier = or export = syntax) or ES-like module, but they're still differentiated based on whether ES import or export are present 😞

@jdalton
Copy link
Member

jdalton commented Mar 6, 2018

\cc @weswigham @DanielRosenwasser

@loganfsmyth
Copy link
Member Author

Maybe we should consider having Babylon just ignore the sourceType open when the typescript Babylon plugin is enabled, and have it handle the unambiguous logic itself? Then we could make the necessary changes to babel-core to allow Babel's typescript plugin to automatically only activate itself when the file extension is .ts/.tsx? We've talked about that type of toggling already so that you could safely enable the Flow and Typescript in the same project without them conflicting or requiring the users to use overrides themselves.

@weswigham
Copy link

weswigham commented Mar 6, 2018

In the browser, .js is king (and the default), and in node, tons of effort is (actively) being poured into making .js as close to the default as possible (mode flags, loaders, etc - and still not finalized). I believe it's premature to call .mjs acceptable as the default mode of operation for any transpiler (I know we're holding off on it at typescript because 1. we really dislike .mjs since it'll cause our resolver complexity to bloom considerably and seriously damage the ecosystem (many definitions assume babel-style transparent cjs interop), and 2. there's the possibility that its usage ends up really low and isn't worth supporting if it's not implemented well, given the large existing cjs ecosystem) prior to any implementation that actually uses .mjs shipping without an experimental flag. Besides, in the unlikely event that someone actually loads things they didn't write (ie, from node_modules) and had problems loading some files, they could override the default parse goal at that point, no? Why would you want the default mode to be the old unfancy cjs-y mode? I don't think I've ever seen or used a babel-using project that just transplied things inside commonjs modules or script files, myself. Babel's tagline used to be along the lines of "The JavaScript of tomorrow, today!", no? Is ESM no longer an accepted part of "tomorrow"?

Y'all should be tracking the node modules WG if you're not already, and use the progress you see there help inform where you invest your time right now, but at least on our end we're planning on sitting on it until node's plan is more complete. Like, if "mode": "esm" merges and stays in, it'd be silly not to continue to consider js as a module by default, since both old users and most new users would expect it.

@ljharb
Copy link
Member

ljharb commented Mar 6, 2018

No mater what ships in node, CJS will remain the default, and babel will absolutely need to support mjs when it ships unflagged in node.

I often use babel without using ESM; converting syntax is nice but having to remember not to pollute the ecosystem with a .default is less so.

@weswigham
Copy link

@ljharb What about all the browser people who just wanna transpile their async/await down to normal esm modules for browser use by default? Node's default isn't the browser's default. Your default isn't theirs.

@bmeck
Copy link
Contributor

bmeck commented Mar 6, 2018

@weswigham even with "mode": "esm" the default is CJS and .mjs would be supported. I'm not sure I understand your statement that .js should be a module by default in that situation. babel would still need to read the package.json field to find the mode which would be CJS by default and still support .mjs.

@ljharb
Copy link
Member

ljharb commented Mar 6, 2018

@weswigham the browser’s default is Script, unless you specify type=module. I agree with you - Babel’s default should match the browser’s.

@weswigham
Copy link

weswigham commented Mar 6, 2018

@ljharb And that file is a .js file.

I'm saying that without knowing if the code is targeted for node or a browser, you can't actually know if an arbitrary .js file is intended as a module or a script without inspecting it and/or making assumptions.

Changing the default behavior here just changes the default platform assumption.

IMO, the current default is usable and it is configurable so that a given audience can achieve their desired behavior, anyway (while providing a configuration that works out of the box for most people); so why would you sacrifice that usability? Like, webpack's last release's push was for zero-config starts for webdev projects; this pushes babel in the opposite direction for web projects.

@TheLarkInn
Copy link
Contributor

I think the big question for me is what we do actually gain by not doing this? By doing nothing, we're actively encouraging the community to continue with no particular choice of side-channel for the time being, which seems harmful to the future of the community moreso than suggesting .mjs for now.

I couldn't agree more here. This is the exact same reason webpack supports .mjs as a "module type".

One thing I'd like clarified: does this mean that babel will not transpile module statements and leave as is if this PR lands? (Unless otherwise configured to do so). This in combination with the million who are using webpack will see some incredibly beneficial size reductions in their linked bundles because they are explicitly opting (in one way or another) to using ESM (and it will not be down converted to cjs)

@appsforartists
Copy link

My hunch is that people just don't like mjs, and they're afraid that Babel is popular enough that its decisions will add legitimacy to a solution that they don't want to see adopted at all.

I looked at the Unambiguous Grammar spec, and while I laud its aims, I also understand why browser vendors may be unmotivated to push it forward. If you're importing a script in the browser, you're either using <script type="module"> or import {} from "";, so the browser is never ambiguous. I also expect they'd be hesitant to add anything to the critical path for parsing/execution.

That said, node and Babel could certainly follow Unambiguous Grammar spec without waiting for browsers to endorse it.

@ljharb
Copy link
Member

ljharb commented Mar 6, 2018

@TheLarkInn this PR only deals with sourceType; transforming import/export is a separate issue (albeit, one that only applies when the sourceType is module).

@appsforartists if node and babel are doing things "without waiting for browsers", then any argument that relates to browser interop imo vanishes; I don't think that'd be a good idea for any of the "camps" here.

@appsforartists
Copy link

@ljharb Unambiguous Grammar feels orthogonal to browsers. Imports are never ambiguous for browsers in the first place.

@loganfsmyth
Copy link
Member Author

@TheLarkInn

One thing I'd like clarified: does this mean that babel will not transpile module statements and leave as is if this PR lands? (Unless otherwise configured to do so).

I hadn't thought about it, but that does seem like a conversation we should have pre-RC. I'd probably be open to that, since it's always easier to add a plugin than remove it in Babel. Maybe continue that discussion in another issue?

@loganfsmyth
Copy link
Member Author

@appsforartists

My hunch is that people just don't like mjs, and they're afraid that Babel is popular enough that its decisions will add legitimacy to a solution that they don't want to see adopted at all.

It does seem that way, and I do understand the concern there because they are right that people associate very strongly with .js. Ultimately, I have a hard time seeing any way for the community as a whole to adopt a side-channel that will satisfy every usecase that isn't either in the filename, or in the file content. There are so many competing groups trying to decide what is best for their specific case.

To me, a file extension is the least offensive way to handle that without typing the solution to one specific ecosystem. I can easily see individual communities adopting other options, or simply assuming that code is ESM, and that may well be the right choice for that specific community. Babel cannot hope to have a default that keeps every one of those happy with their individual ecosystem choices, so we're essentially left with the choice of choosing the least offensive option (in my opinion .mjs), or letting our users continue to get random errors and have a painful experience in some cases to avoid making a decision.

That said, node and Babel could certainly follow Unambiguous Grammar spec

Unless there's real indication of Node pursuing that approach and it having a real standard, I don't think I'd be comfortable to have Babel default to that.

@Jessidhia
Copy link
Member

Jessidhia commented Mar 7, 2018

Whatever side channel the working group decides on, it will strictly be a way for opting in to modules instead of opting out. Opting out is guaranteed to break a ton of real world code.

If the default is script, we can just add support to whatever side-channels are decided on (such as .mjs, even if it eventually turns out to not be the recommended one) as they are decided on by the WG, in any future minor update to babel-core, without compatibility risk.

If we keep module as the default, we'll just keep on silently doing the wrong thing for the vast majority of 3rd party code, and then when the WG gets their ducks in order we're going to have to make another breaking change to implement their algorithm.

As for changing the default, a lot of code happens to run fine in both strict and sloppy mode, so you won't notice anything if all that babel ends up doing is writing a "use strict".

But combine that with transform-runtime, preset-env with useBuiltIns: "usage", or any plugin that needs to add an import to a file, and you easily create invalid "mixed" modules, that have require, static imports and exports/module.exports assignment all in the same file. If that output were to then be treated as a module (which it would in unambiguous grammar too, it has a static import), you then get a ReferenceError, but only at runtime. Babel will silently take your valid script (even strict mode script) and write an invalid module.

If we know the input is not a module, then the aforementioned plugins know to write commonjs requires instead (well, env will once it uses the correct helper module). If we think the input is not a module but it has module syntax, it's easy to raise an error telling you what to do in your .babelrc. If we think the input is a module but it's not, we corrupt your script and there's nothing you can do to detect other than seeing if it crashes at runtime.

@weswigham
Copy link

weswigham commented Mar 7, 2018

Can anyone here say that they realistically expect .mjs to be entirely unsupported?

I think the conceit is that everyone knows .mjs is likely to be available, but that it's also likely that a another solution with some set of tradeoffs is also implemented, since on it's own it's very not-friendly-to-use alongside the existing cjs package ecosystem 😄

The core of the issue is that changing babel's default to be script for any js is that it insists that, by default, only .mjs files can be modules. This is false for a browser environment, and very likely to be generally false in node, in one way or another. That's a really bad understanding of the world, by default; doing a little bit more would allow you to be way less naggy in this space. But since y'all have no problem breaking all your consumers every 3 months, nbd; there's no brakes on this train, and I guess you just have to implement to whatever partially done work is there right now, since you can just break everyone again and redo some work later anyway.

TBQH, at TS since we power editor services for a bunch of editors, we'll probably continue to be using an unambiguous-style heuristic for almost forever because it generally works, with an opt-in to whatever node's final behavior ends up as it'd be crazy for an editor to ask "hey, I'm not sure how to handle this loose .js file that's clearly a module, you wanna tell me?" before providing useful tools for a given file (and there's no way we'd break everyone using all the editors we power, because that'd be extreme).

@Jessidhia
Copy link
Member

Jessidhia commented Mar 7, 2018

Editor services are a completely different world from compilers. Editor services will only be of use on files that a user actually wants to look at and edit, have to be able to work with incomplete code, and even if the user opens a file that the editor services guesses wrong, the worst thing that can happen is bad syntax highlighting and code completion.

Babel is a compiler. If we guess wrong, we write broken code. Even if TSC guesses wrong, it can throw an error under --isolatedModules that your code is not actually unambiguous enough to tell, and that's something that TS can afford to do as it's a language based on JS, it's not JS.

TSC is also not used to compile third party code, but the ecosystem is moving in such a way that babel, with preset-env, will be used to compile all third party code as well. Third party code that, in a lot of cases, has already been transformed from a module into a script. Third party code that we can correctly parse as module, because its original source code was a module, and so it doesn't have sloppy mode syntax. Third party code that we will break by then emitting extra module-related garbage, because we think it's a module.

@weswigham
Copy link

Editor services are a completely different world from compilers.

Many modern compilers are written as services, as service behavior is a superset of the analysis a compiler needs to perform (and is quite useful to have). A compiler pretty much falls out for free from a language service, while building a service on top of a compiler is really difficult.

it's not JS

And I'm not talking about typescript. I'm talking about JS. I'm talking about when vscode opens a .js file what's going to happen. What's even reasonable. Just because TS is marketed as a seperate language doesn't mean that's all its' tools handle, ffs - the TS service does the whole intellisense gig for all kinds of vanilla JS (and TS and JS with closure-compiler type annotations because we like to be a magical as possible). And I'm saying this will never be the default mode of operation an editor takes when handling JS because it'd just be silly to ask a user for so much before they even start reading code.

TS is also not used to compile third party code

It is. All the time. Especially within editors. We probably consume more of it than almost anything else. That's why a decent heuristic is so important because there's so much varied code out there. And way more old code than new code. So much old code. Nobody updates old code unless they're forced to, really.

Third party code that we can correctly parse as module, because its original source code was a module, and so it doesn't have sloppy mode syntax. Third party code that we will break by then emitting extra module-related garbage.

Do you have real examples? Because TBQH most of the ones I can think of are caught at parse time inside babel/ts, such as with or block-scoped function declarations. Top-level this being loose by default instead of strict by default is worse, since a loose mode this leaks environment information that was supposed to be removed.

@Jessidhia
Copy link
Member

Jessidhia commented Mar 7, 2018

// src/index.mjs
import 'react' // or any commonjs module without a modules build that uses ES6+ stdlib
// webpack 4 config
module.exports = {
  mode: 'development',
  // note: does not exclude node_modules
  modules: { rules: [
    { test: /\.m?js$/, exclude: /core-js/, loader: 'babel-loader' }
  ] }
}
{
  "presets": [[ "@babel/env", { "useBuiltIns": "usage", "modules": false } ]]
}

This will write import for a bunch of core-js dependencies on top of all your CJS dependencies, which will cause webpack to either fail to parse your script, or to incorrectly parse it as a module with no exports.

webpack:///./node_modules/object-assign/index.js?:82
module.exports = shouldUseNative() ? Object.assign : function (target, source) {
               ^

TypeError: Cannot assign to read only property 'exports' of object '#<Object>'

(module.exports is readonly in webpack because it still lets you look at everything else in module; in actual modules it'd throw a ReferenceError)

Note, there is actually a bug with preset-env in that it does not respect the sourceType (#7457), so even setting sourceType to "script" won't help here; but this is exactly the behavior it would have with our current behavior of sourceType defaulting to "module" anyway.

@weswigham
Copy link

weswigham commented Mar 7, 2018

@Kovensky modules: false effectively means input module kind should equal output module kind, no? Since it's disabling the transform, not asserting that you have no modules? - it's then up to webpack to determine the correct module kind to work with for each file in the build, no?

@Jessidhia
Copy link
Member

Jessidhia commented Mar 7, 2018

@weswigham the problem here is not our modules transform (which would, at worst, just take sloppy code and accidentally make it strict); the problem is when the modules transform is disabled (which it should! you should never use transform-modules-commonjs with webpack).

If the modules transform is disabled, then the output module kind should equal the input module kind. The input module kind is "module", because that's our current default. So we would write a module in the output, with import 'core-js/modules/es6.foo.bar' for the useBuiltIns polyfills, even after that bug I mentioned is fixed.


IIRC there were talks of changing the default of modules to false as well, but I don't remember what the conclusion was. I assume it was postponed until we know what to do when writing output for node.

@weswigham
Copy link

weswigham commented Mar 7, 2018

The input module kind is "module", because that's our current default.

Y'all should default to something like unambiguous and not script, then, cause you're not doing a great job of preserving the intent of the code as written by passing it on while assuming it's one or the other... You could get the exact same kind of issues but with cjs stuff instead of module stuff if you just swap the default...

@loganfsmyth
Copy link
Member Author

loganfsmyth commented Mar 7, 2018

@weswigham

You could get the exact same kind of issues but with cjs stuff instead of module stuff if you just swap the default...

The fact that it's not exactly the same is an important factor for me.

  • If we default to module and it was actually cjs, we've injected an import and you'll get a runtime error in your code with no clear reason why it has happened.
  • If we default to script and it was actually supposed to be a module, 99% of the time the file will have an import or export statement that causes parsing to fail up front, and users will get a super clear message saying "hey it looks like this is a module, is that right? If so please set .js files to be modules in your config, or rename it to be .mjs.

Defaulting to script means we can give super useful feedback to our users right up front. Defaulting to module makes that much harder.

Y'all should default to something like unambiguous and not script, then, cause you're not doing a great job of preserving the intent of the code as written by passing it on while assuming it's one or the other...

But unambiguous is in no way standards track and attempts to make it so failed. It's also trivially easy to either opt into it if you want, and if it does eventually become part of the spec it is easy to make it the default instead of script without breaking anyone's setup.

My concern with unambiguous, at this point in time, is that it encourages people to publish code that relies on unambiguous to be present in all tooling that processes those files, not just now, but for the existence of those module's lifetimes. Given that there is no current plans for standardization on it, this seems like a massive problem to me. If you're random tooling that parses a file for metadata where the specifics of module vs script don't matter as much, all the more power to you, unambiguous makes sense.

If we're trying to build a package repository like the npm registry, it makes zero sense for people to publish ESM as .js because the answer is that we just haven't settled as a community on an answer yet. If we encourage people now to publish and assume .js as unambiguous is a set thing, we're still setting a mandate for the community, but we're setting one that is way more likely to change than the existence of .mjs.

  • If we encourage .mjs for now and next year Node also handles something, it's an easy change to also support that.
  • If we encourage .js as ESM now, and next year Node settles on something without unambiguous, we've now got a registry full of packages that no longer work right. More importantly, we've gone an additional year training users to expect ESM to work in .js files automatically, and it'll be that much harder to transition them to another mental model.

.mjs is the safest option for ESM until the community settles on a common solution broadly.

@appsforartists
Copy link

appsforartists commented Mar 7, 2018

If we encourage .js as ESM now, and next year Node settles on something without unambiguous, we've now got a registry full of packages that no longer work right.

Aren't we already a few years down this path, with authors publishing ESM as .js?

@novemberborn
Copy link
Contributor

@loganfsmyth

(though I know you already have how Babel searches for .babelrc, so I bet you also don't love the idea of a flag in package.json?)

Apologies everyone for the inside baseball — @loganfsmyth I'm sure this'll be fine.

Currently transform-modules-commonjs bails entirely if the file has not sourceType:module, so by defaulting to script it should be easier to run CommonJS code through Babel.

👍

We don't currently error if we detect module and exports inside an ESM module, but that is something I'd very much like to do, no matter the decision here.

If you could loop me in at that stage that'd be appreciated. I'll be advocating for an option to disable that behavior 😄

@loganfsmyth
Copy link
Member Author

loganfsmyth commented Mar 7, 2018

@appsforartists

Aren't we already a few years down this path, with authors publishing ESM as .js?

Some have, but to me this point seems like the sunk cost fallacy. We're talking about the an unspecificed amount of time into the future where we'll need to maintain this ecosystem. I don't think locking the ecosystem to unambiguous based on reliance of undefined behavior at the beginning makes the decision the right one for the future of the community. And again, if it turns out .mjs means module and .js means unambiguous in the future, we've lost nothing, whereas if unambiguous is not adopted, we've now spent even more time creating an incompatible ecosystem.

@novemberborn

If you could loop me in at that stage that'd be appreciated. I'll be advocating for an option to disable that behavior

Fair. Let's continue that discussion on Slack so as not to diverge this thread.

@loganfsmyth
Copy link
Member Author

Hey all, we've talked about this a bunch both here and in Babel's chat. I really appreciate all the discussions we've had.

We've settled on postponing this change for the 7.x release because it will be such a large breaking change. The behavior will essentially be remaining like it is now, which is roughly how it was in Babel 6, assuming everything is a module and hoping for the best, and relying on users to configure Babel if things go wrong with that. It's important to not discard the good will of the community itself, especially given that we've been in beta for quite a while and people are unlikely to be expecting such large changes at this phase.

I'd still like to seriously consider this as an early change for the next major release of Babel, which would allow us time to test it out and see how it feels to people, but we'll cross that bridge when we come to it.

@loganfsmyth loganfsmyth deleted the default-sourcetype-script branch May 23, 2018 06:18
@lock lock bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Oct 4, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 4, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated A closed issue/PR that is archived due to age. Recommended to make a new issue PR: Breaking Change 💥 A type of pull request used for our changelog categories for next major release PR: Spec Compliance 👓 A type of pull request used for our changelog categories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[8.x] Gain consensus on path forward for "sourceType" for parsing modules vs scripts