Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve source map generation #759

Merged
merged 9 commits into from Apr 9, 2023

Conversation

forivall
Copy link
Contributor

@forivall forivall commented Nov 9, 2022

Fixes #541

Let me know what kind of refactoring will please you! I'd love to get this in to improve source-map debugging. I implemented it kinda weird to minimize theoretical perf impact when sourcemaps arent requested.

@codecov
Copy link

codecov bot commented Nov 9, 2022

Codecov Report

Merging #759 (d0862b6) into main (7e5b06a) will increase coverage by 0.00%.
The diff coverage is 89.65%.

@@           Coverage Diff           @@
##             main     #759   +/-   ##
=======================================
  Coverage   88.32%   88.33%           
=======================================
  Files          55       55           
  Lines        5955     6001   +46     
  Branches     1415     1422    +7     
=======================================
+ Hits         5260     5301   +41     
- Misses        430      432    +2     
- Partials      265      268    +3     
Impacted Files Coverage Δ
src/Options.ts 100.00% <ø> (ø)
src/computeSourceMap.ts 88.09% <87.17%> (-0.80%) ⬇️
src/transformers/RootTransformer.ts 93.39% <88.88%> (-0.30%) ⬇️
src/TokenProcessor.ts 95.38% <100.00%> (+0.22%) ⬆️
src/index.ts 92.15% <100.00%> (+0.15%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@github-actions
Copy link

github-actions bot commented Nov 9, 2022

Benchmark results

Before this PR: 405 thousand lines per second
After this PR: 403.3 thousand lines per second

Measured change: 0.4% slower (1.18% slower to 0.16% faster)
Summary: Likely no significant difference

this _shouldnt_ theoretically cause any issues, as any mappings
generated in the snapshot should be overwritten. if needed, we can clear
out the mappings generated after `this.tokenIndex` in `restoreToSnapshot`
@forivall
Copy link
Contributor Author

forivall commented Nov 9, 2022

On my machine, latest fix brings down the perf loss to

Measured change: 2.6% slower (2.74% slower to 0.94% slower)

Additional experimentation shows that if I omit the mappings from TokenProcessor's snapshot(), the perf loss is completely gone. Worth it?

Copy link
Owner

@alangpierce alangpierce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I think this will also fix #667 . I still need to take a deeper look at the details, but overall this looks like a good approach.

My main hesitations with detailed source maps in the past have been around complexity and performance, so thanks for implementing it in a way that's relatively lightweight in TokenProcessor. In addition to the automated benchmark (which doesn't include source maps), would you mind doing a comparison of performance without source maps vs simple source maps vs detailed source maps? (You could modify the sucrase.transform args in benchmark.ts and then run yarn benchmark jest-dev with different configs.)

Additional experimentation shows that if I omit the mappings from TokenProcessor's snapshot(), the perf loss is completely gone. Worth it?

I think it would be good to avoid the .slice() somehow. As-is, it adds n^2 time for a file needing many snapshots. Two thoughts:

  • You could pass a flag into TokenProcessor to know if detailed source maps are enabled, and use that flag to disable any sourcemap-related work if it won't be used. Hopefully the JIT would recognize that the flag is always false and there would be very little overhead in that case.
  • I think just skipping .slice() is actually ok, since all use cases re-walk the same tokens after restore and would set any updated values. It's certainly a little fragile, but I think should be fine with an explanation comment. It's also relatively low-stakes.

Also, I'm curious what your use case is for column mappings (as opposed to just line mappings). I know there's #541 and #667, just curious if you had another use case in mind.

src/Options.ts Outdated
@@ -12,6 +12,7 @@ export interface SourceMapOptions {
* file.
*/
compiledFilename: string;
simple?: boolean;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a JSDoc explanation here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, I just wanted to get things working and feedback on naming, etc. before digging into the polish of full documentation 😄

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking now that the best approach for this PR is to actually not have a simple option, and instead just change the source map behavior to always compute detailed source maps. If there's demand for the simple mode, it can be added later, but I'd rather keep the options simple and avoid the need to maintain multiple modes. Could you remove this option and the implementation and tests for simple mode?

@forivall
Copy link
Contributor Author

In addition to the automated benchmark (which doesn't include source maps), would you mind doing a comparison of performance without source maps vs simple source maps vs detailed source maps?

Sure! I'll try get to that on Friday

I think just skipping .slice() is actually ok, since all use cases re-walk the same tokens after restore and would set any updated values. It's certainly a little fragile, but I think should be fine with an explanation comment. It's also relatively low-stakes.

Yeah, that's my thinking too, as I noted in the commit message (which i forgot to push up 😅 ):

this shouldnt theoretically cause any issues, as any mappings
generated in the snapshot should be overwritten. if needed, we can clear
out the mappings generated after this.tokenIndex in restoreToSnapshot


My main use case is to improve debugging - for example, the chrome debugger allows setting breakpoints on specific expressions in the middle of a statement. Not super high-stakes, but I was in a mood to look at the code to see if i could figure out a solution, and i found that it was relatively straightforward to add the necessary support code in TokenProcessor, and then I really wanted to see if it worked, and was pleased that it did!

@forivall
Copy link
Contributor Author

forivall commented Nov 21, 2022

Sorry for the delay. Here's the results from the benchmark script i based off of the existing benchmark script:

Speeds for No sourcemaps: 767899, 775036, 775274, 775891, 776099
Speeds for Full sourcemaps: 533671, 534539, 535301, 535712, 536317
Speeds for Simple sourcemaps: 700954, 703167, 703572, 703653, 703936

Benchmark results

No sourcemaps: 775.3 thousand lines per second
Full sourcemaps: 535.3 thousand lines per second
Simple sourcemaps: 703.6 thousand lines per second

Change

Full sourcemaps: 30.95% slower (30.96% slower to 31.03% slower)
Simple sourcemaps: 9.25% slower (9.31% slower to 9.27% slower)


So yeah, seems like full sourcemap generation is 3x slower than the existing simple sourcemap generation.

@alangpierce
Copy link
Owner

No problem on the delay, thanks for running the numbers!

The 30% slowdown does give me some hesitation, but I confirmed with some extra perf tests that there is minimal/no impact when source maps are disabled, so this change seems at least fine to release in an opt-in way.

I want to think a little more about the right approach from a configuration standpoint. I think there are two open questions here:

  • How many source map modes should there be? The current PR has three modes, "no source maps", "simple/fast source maps", and "full source maps". That gives the most flexibility, but I do also want Sucrase to be as simple as possible from a configuration standpoint, so I'd maybe lean toward getting rid of the simple mode entirely, especially if the perf difference is minor. 30% vs 10% is maybe not quite "minor", so the best approach doesn't seem obvious. My gut feel is that Sucrase is so fast that usually other tools/operations are the dominating performance issue, so most people would prefer the more correct source maps at the expense of some perf. (An even more aggressive approach would be to get rid of the "no source maps" mode and always generate a full source map, though that would require specifying a file path when calling directly.)
  • What is the default mode? The current PR makes "full source maps" the default mode, but the most defensive approach would be to keep "simple/fast source maps" as default. I think I agree with "full source maps" as the default, but want to call it out as another decision point.

I also want to take a closer look at the implementation and see if there's a way to speed it up and maybe also avoid the extra dependency. (In a perfect world, Sucrase would have zero dependencies, though a source map library feels pretty defensible.) Given that Sucrase still creates fairly predictable source maps and in most cases makes only minor changes to the code, there might be opportunity for optimization. Here's a profile I ran and at least ~half of the source map time is spent in the library, potentially more.

Screen Shot 2022-11-23 at 9 04 57 AM

That said, it's probably also reasonable to defer additional source map performance work and possibly speed it up later.

Copy link
Owner

@alangpierce alangpierce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! A few small things, but overall looks good! I tested and confirmed that this fixed chrome dev tools inline breakpoints and also fixes Jest's toMatchInlineSnapshot.

src/Options.ts Outdated
@@ -12,6 +12,7 @@ export interface SourceMapOptions {
* file.
*/
compiledFilename: string;
simple?: boolean;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking now that the best approach for this PR is to actually not have a simple option, and instead just change the source map behavior to always compute detailed source maps. If there's demand for the simple mode, it can be added later, but I'd rather keep the options simple and avoid the need to maintain multiple modes. Could you remove this option and the implementation and tests for simple mode?

export default class TokenProcessor {
private resultCode: string = "";
private resultMappings: Array<number | undefined> = new Array(this.tokens.length);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment explaining the format of this array?

return {
resultCode: this.resultCode,
tokenIndex: this.tokenIndex,
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment explaining why resultMappings doesn't need to be snapshotted?

@@ -217,6 +229,7 @@ export default class TokenProcessor {
copyToken(): void {
this.resultCode += this.previousWhitespaceAndComments();
this.appendTokenPrefix();
this.resultMappings[this.tokenIndex] = this.resultCode.length;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping that source maps provided a way to say that certain regions of code are unchanged, which would avoid the need for copyToken mappings and probably speed up and simplify the source maps, but unfortunately it looks like you really do need to define a mapping for basically every token, at least for the Chrome Dev Tools and Jest cases that I tested.

Comment on lines 444 to 451
if (newlineIndex >= 0) {
for (; i < mappings.length; i++) {
const mapping = mappings[i];
if (mapping !== undefined && mapping > newlineIndex) {
break;
}
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the file starts with a #!, the first line is treated as a comment and is not a token, so there shouldn't be a need to skip any mappings here, so I think this case can be removed and there's no need for the newlineIndex param.

maybeAddSegment(map, line, genColumn, filePath, line, sourceColumn);
for (
currentMapping = rawMappings[++j];
currentMapping === i || currentMapping === undefined;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a bounds check on j here as well?

}
}
}
// eslint-disable-next-line @typescript-eslint/no-unused-vars
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal to have this ignore comment, but you may be able to avoid it by adding ignoreRestSiblings to the rule config in .eslintrc.js. https://eslint.org/docs/latest/rules/no-unused-vars#ignorerestsiblings

return sourceColumns;
}

function computeDetailedSourceMap(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The logic all looks good to me, and it's nice to have it self-contained here. It might be better at some point to switch to pre-computing token column indices if source maps will always need all of them, but there's certainly a tradeoff, and this approach seems good.

fs.writeFileSync(path.join(outDir, "test.js.map"), JSON.stringify(result.sourceMap));
}
const {mappings} = result.sourceMap!;
assert.match(mappings, /^[^;]+(;[^;]+){3}$/); // 4 lines
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you actually just assert the entire string? It looks to me like it isn't so long, and I'd think of it like a snapshot test where any changes (e.g. changes to the column logic or a library upgrade) would be spot-checked, even though it's a bit opaque to someone reading the test.

file: "test.js",
});
if (!simple) {
if (process.env.WRITE_SOURCE_MAPS) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How have you been using this, out of curiosity? It might be something that could be added to the demo website instead (though certainly that would be out of scope for this PR), or it might be less necessary if you hard-code the expected mappings.

Copy link
Contributor Author

@forivall forivall Dec 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been using that with source map visualization tools - specifically https://evanw.github.io/source-map-visualization/ and https://sokra.github.io/source-map-visualization/ (via https://www.npmjs.com/package/source-map-visualize) - to validate that the mappings were correct.

This bit is entirely contained in 212f7e2 so i can just drop or revert that commit from the PR to remove this.

alangpierce added a commit that referenced this pull request Apr 3, 2023
Progress toward #541, prep work to help make #759 (and any future source map
work) more manually testable.

This PR splits out token information into a new collapased-by-default debug
section, then adds source map info to that section. The source map toggle does
three things:
* Show the actual mappings from the source map by line/column index.
* Show the JSON value of the source map.
* Extend the normal code output to include an inline source map with the full
  original source embedded, then link to the very helpful tool
  https://evanw.github.io/source-map-visualization/ , which accepts pasting the
  output code.
alangpierce added a commit that referenced this pull request Apr 8, 2023
Progress toward #541, prep work to help make #759 (and any future source map
work) more manually testable.

This PR splits out token information into a new collapased-by-default debug
section, then adds source map info to that section. The source map toggle does
three things:
* Show the actual mappings from the source map by line/column index.
* Show the JSON value of the source map.
* Extend the normal code output to include an inline source map with the full
  original source embedded, then link to the very helpful tool
  https://evanw.github.io/source-map-visualization/ , which accepts pasting the
  output code.
* Get rid of simple source map mode. For now, the detailed source maps will be
  the only mode. Also inline `computeDetailedSourceMap` now that it's the only
  case.
* Remove `WRITE_SOURCE_MAPS` in test and related details. In alangpierce#793 I added source
  map info to the demo website, which is a nicer system for manually verifying
  correctness.
* Update test code to assert the full source map and a JSON representation of
  the mappings.
* Add integration test for Jest inline snapshots.
* Change lint config to ignore unused rest siblings.
* Add bounds checks and change empty `for` loops to `while` (just a style
  preference).
* Add comments in a few more places.
* Remove unnecessary special case when handling hashbang lines.
@alangpierce
Copy link
Owner

alangpierce commented Apr 9, 2023

Thanks again for the contribution! I was revisiting this to get it through and pushed an update with a few changes:

  • Get rid of simple source map mode. For now, the detailed source maps will be
    the only mode. Also inline computeDetailedSourceMap now that it's the only
    case.
  • Remove WRITE_SOURCE_MAPS in test and related details. In Add source map debugging to website #793 I added source
    map info to the demo website, which is a nicer system for manually verifying
    correctness.
  • Update test code to assert the full source map and a JSON representation of
    the mappings.
  • Add integration test for Jest inline snapshots.
  • Change lint config to ignore unused rest siblings.
  • Add bounds checks and change empty for loops to while (just a style
    preference).
  • Add comments in a few more places.
  • Remove unnecessary special case when handling hashbang lines.

@alangpierce alangpierce self-requested a review April 9, 2023 03:52
@alangpierce alangpierce merged commit 2fafe71 into alangpierce:main Apr 9, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add column data to source map mappings
2 participants