Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that Identifier source mappings explicitly start and stop on the generated range #8380

Merged
merged 2 commits into from Jul 27, 2018

Conversation

Projects
None yet
4 participants
@loganfsmyth
Copy link
Member

commented Jul 24, 2018

Q                       A
Fixed Issues?
Patch: Bug Fix?
Major: Breaking Change? N, assuming nothing was relying on exact semantics of Babel's output
Minor: New Feature?
Tests Added + Pass? Yes
Documentation PR Link
Any Dependency Changes?
License MIT

Babel's current source map generation logic is a little handwavy in how it generates mappings. Essentially what it does is:

  • Any time a piece of code is output, it looks at the current "active" original location, and links the output code to that original-code location
  • As the generator traverses, whenever a new AST node is entered, it sets the node's start location as "active", if it has a location
  • Whenever the generator leaves a node, it sets the location back to whatever it was before.

This has served us for a while, but it has some downsides:

  • Any mid-node tokens use the start location as their location, because our AST doesn't actually know where random commas were in the original user code.
  • Setting the location back to the previous location on traversing out of a node can lead to unintuitive mappings in some cases, one of which this PR aims to resolve.

In a perfect world, the printers would assign exact locations, or make explicit best-effort attempts to pick a good location, but currently that isn't what we do. While I'm totally for exploring that, it'd be a massive change and it's always hard to tell what other things would break in tooling that consumes Babel's output maps.

This PR attempts to address one specific case where having exact location information is a big improvement: Identifier nodes. Consider a simple case like this:

foo();

With the algorithm I described above, we get:

  1. Push column 0 as the start of ExpressionStatement
  2. Push column 0 as the start of CallExpression
  3. Push column 0 as the start of Identifier
  4. Print "foo"
  5. Pop out of Identifier, restore CallExpression 0 as position
  6. Print "("
  7. Print ")"
  8. Pop out of CallExpression, restore ExpressionStatement 0 as position

What this means is that there is only a single mapping for this whole line. That is a pain point for any tooling that wants to perform further transformations on the output that Babel produces. Prime examples of this are Rollup and Webpack (and other bundlers I'm sure), which may want to replace foo with ns.foo if the code had import foo from "foo"; for instance.

As mentioned, it would be great to have a better "unified theory of map generation", but this PR aims to address this one specific case.

This PR adds two special cases:

  1. The start of an identifier always gets its own mapping, even if the current active location is already the original location of the identifier. This primarily will affect cases where the identifier was prefixed with whitespace, but it is realistically dependent on the transforms being performed. You can see this in the screenshot below since the spaces are standalone.
  2. The end of an identifier is used as the next "active" location, overriding the standard "pop" behavior, meaning that the next token will not be linked to the original location of the identifier's first character. This makes the () in the screenshot below map better.

These can be seen in this comparison, with the right being the old mappings, and the left being the new ones:

screen shot 2018-07-24 at 4 19 41 pm

You can see that the various identifiers are now accurately mapped with their own specific range.

On the off chance you're wondering why we don't do this for all nodes, the main issue is that using the "end" location of a node in general isn't super effective once you have to take generic transformation into account. If an identifier gets moved, there's technically no guarantee that the end of the original location actually maps what's actually getting referenced in the original code. Consider

foo(); // 0

getting transformed into

var _tmp = foo; // 1
_tmp(); // 2

then this patch will make the ; in 1 map to the ( in 0. It's not great, but on the other hand we don't really get something useful either way.

I've essentially decided to special-case Identifier to reduce the risk of breaking things, while still accomplishing useful improvements.

Edit

Also, once this lands, I'd like to backport it to 6.x.

@babel-bot

This comment has been minimized.

Copy link
Collaborator

commented Jul 24, 2018

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/8699/

@loganfsmyth loganfsmyth changed the title Ensure that Identifier source mappings explicitly start and stop and on the generated range Ensure that Identifier source mappings explicitly start and stop on the generated range Jul 25, 2018

@hzoo

hzoo approved these changes Jul 26, 2018

Copy link
Member

left a comment

not that familiar with the area as usual but 👍

@babel babel deleted a comment from babel-bot Jul 27, 2018

@hzoo hzoo merged commit 5eb193c into babel:master Jul 27, 2018

4 checks passed

babel/repl REPL preview is available
Details
ci/circleci Your tests passed on CircleCI!
Details
codecov/project 80.43% (target 80%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@loganfsmyth loganfsmyth deleted the loganfsmyth:identifier-mapping-specific branch Aug 19, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.