Improve source map generation #759

forivall · 2022-11-09T03:51:43Z

Fixes #541

Let me know what kind of refactoring will please you! I'd love to get this in to improve source-map debugging. I implemented it kinda weird to minimize theoretical perf impact when sourcemaps arent requested.

codecov · 2022-11-09T05:39:49Z

Codecov Report

Merging #759 (d0862b6) into main (7e5b06a) will increase coverage by 0.00%.
The diff coverage is 89.65%.

@@           Coverage Diff           @@
##             main     #759   +/-   ##
=======================================
  Coverage   88.32%   88.33%           
=======================================
  Files          55       55           
  Lines        5955     6001   +46     
  Branches     1415     1422    +7     
=======================================
+ Hits         5260     5301   +41     
- Misses        430      432    +2     
- Partials      265      268    +3

Impacted Files	Coverage Δ
src/Options.ts	`100.00% <ø> (ø)`
src/computeSourceMap.ts	`88.09% <87.17%> (-0.80%)`	⬇️
src/transformers/RootTransformer.ts	`93.39% <88.88%> (-0.30%)`	⬇️
src/TokenProcessor.ts	`95.38% <100.00%> (+0.22%)`	⬆️
src/index.ts	`92.15% <100.00%> (+0.15%)`	⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

github-actions · 2022-11-09T05:40:40Z

Benchmark results

Before this PR: 405 thousand lines per second
After this PR: 403.3 thousand lines per second

Measured change: 0.4% slower (1.18% slower to 0.16% faster)
Summary: Likely no significant difference

this _shouldnt_ theoretically cause any issues, as any mappings generated in the snapshot should be overwritten. if needed, we can clear out the mappings generated after `this.tokenIndex` in `restoreToSnapshot`

forivall · 2022-11-09T07:03:49Z

On my machine, latest fix brings down the perf loss to

Measured change: 2.6% slower (2.74% slower to 0.94% slower)

Additional experimentation shows that if I omit the mappings from TokenProcessor's snapshot(), the perf loss is completely gone. Worth it?

alangpierce

Thank you! I think this will also fix #667 . I still need to take a deeper look at the details, but overall this looks like a good approach.

My main hesitations with detailed source maps in the past have been around complexity and performance, so thanks for implementing it in a way that's relatively lightweight in TokenProcessor. In addition to the automated benchmark (which doesn't include source maps), would you mind doing a comparison of performance without source maps vs simple source maps vs detailed source maps? (You could modify the sucrase.transform args in benchmark.ts and then run yarn benchmark jest-dev with different configs.)

Additional experimentation shows that if I omit the mappings from TokenProcessor's snapshot(), the perf loss is completely gone. Worth it?

I think it would be good to avoid the .slice() somehow. As-is, it adds n^2 time for a file needing many snapshots. Two thoughts:

You could pass a flag into TokenProcessor to know if detailed source maps are enabled, and use that flag to disable any sourcemap-related work if it won't be used. Hopefully the JIT would recognize that the flag is always false and there would be very little overhead in that case.
I think just skipping .slice() is actually ok, since all use cases re-walk the same tokens after restore and would set any updated values. It's certainly a little fragile, but I think should be fine with an explanation comment. It's also relatively low-stakes.

Also, I'm curious what your use case is for column mappings (as opposed to just line mappings). I know there's #541 and #667, just curious if you had another use case in mind.

alangpierce · 2022-11-09T07:39:56Z

src/Options.ts

@@ -12,6 +12,7 @@ export interface SourceMapOptions {
   * file.
   */
  compiledFilename: string;
+  simple?: boolean;


Could you add a JSDoc explanation here?

Absolutely, I just wanted to get things working and feedback on naming, etc. before digging into the polish of full documentation 😄

I'm thinking now that the best approach for this PR is to actually not have a simple option, and instead just change the source map behavior to always compute detailed source maps. If there's demand for the simple mode, it can be added later, but I'd rather keep the options simple and avoid the need to maintain multiple modes. Could you remove this option and the implementation and tests for simple mode?

forivall · 2022-11-10T01:17:45Z

In addition to the automated benchmark (which doesn't include source maps), would you mind doing a comparison of performance without source maps vs simple source maps vs detailed source maps?

Sure! I'll try get to that on Friday

I think just skipping .slice() is actually ok, since all use cases re-walk the same tokens after restore and would set any updated values. It's certainly a little fragile, but I think should be fine with an explanation comment. It's also relatively low-stakes.

Yeah, that's my thinking too, as I noted in the commit message (which i forgot to push up 😅 ):

this shouldnt theoretically cause any issues, as any mappings
generated in the snapshot should be overwritten. if needed, we can clear
out the mappings generated after this.tokenIndex in restoreToSnapshot

My main use case is to improve debugging - for example, the chrome debugger allows setting breakpoints on specific expressions in the middle of a statement. Not super high-stakes, but I was in a mood to look at the code to see if i could figure out a solution, and i found that it was relatively straightforward to add the necessary support code in TokenProcessor, and then I really wanted to see if it worked, and was pleased that it did!

forivall · 2022-11-21T09:12:39Z

Sorry for the delay. Here's the results from the benchmark script i based off of the existing benchmark script:

Speeds for No sourcemaps: 767899, 775036, 775274, 775891, 776099
Speeds for Full sourcemaps: 533671, 534539, 535301, 535712, 536317
Speeds for Simple sourcemaps: 700954, 703167, 703572, 703653, 703936

Benchmark results

No sourcemaps: 775.3 thousand lines per second
Full sourcemaps: 535.3 thousand lines per second
Simple sourcemaps: 703.6 thousand lines per second

Change

Full sourcemaps: 30.95% slower (30.96% slower to 31.03% slower)
Simple sourcemaps: 9.25% slower (9.31% slower to 9.27% slower)

So yeah, seems like full sourcemap generation is 3x slower than the existing simple sourcemap generation.

alangpierce · 2022-11-23T19:11:42Z

No problem on the delay, thanks for running the numbers!

The 30% slowdown does give me some hesitation, but I confirmed with some extra perf tests that there is minimal/no impact when source maps are disabled, so this change seems at least fine to release in an opt-in way.

I want to think a little more about the right approach from a configuration standpoint. I think there are two open questions here:

How many source map modes should there be? The current PR has three modes, "no source maps", "simple/fast source maps", and "full source maps". That gives the most flexibility, but I do also want Sucrase to be as simple as possible from a configuration standpoint, so I'd maybe lean toward getting rid of the simple mode entirely, especially if the perf difference is minor. 30% vs 10% is maybe not quite "minor", so the best approach doesn't seem obvious. My gut feel is that Sucrase is so fast that usually other tools/operations are the dominating performance issue, so most people would prefer the more correct source maps at the expense of some perf. (An even more aggressive approach would be to get rid of the "no source maps" mode and always generate a full source map, though that would require specifying a file path when calling directly.)
What is the default mode? The current PR makes "full source maps" the default mode, but the most defensive approach would be to keep "simple/fast source maps" as default. I think I agree with "full source maps" as the default, but want to call it out as another decision point.

I also want to take a closer look at the implementation and see if there's a way to speed it up and maybe also avoid the extra dependency. (In a perfect world, Sucrase would have zero dependencies, though a source map library feels pretty defensible.) Given that Sucrase still creates fairly predictable source maps and in most cases makes only minor changes to the code, there might be opportunity for optimization. Here's a profile I ran and at least ~half of the source map time is spent in the library, potentially more.

That said, it's probably also reasonable to defer additional source map performance work and possibly speed it up later.

alangpierce

Thanks! A few small things, but overall looks good! I tested and confirmed that this fixed chrome dev tools inline breakpoints and also fixes Jest's toMatchInlineSnapshot.

alangpierce · 2022-11-26T08:26:07Z

src/Options.ts

@@ -12,6 +12,7 @@ export interface SourceMapOptions {
   * file.
   */
  compiledFilename: string;
+  simple?: boolean;


I'm thinking now that the best approach for this PR is to actually not have a simple option, and instead just change the source map behavior to always compute detailed source maps. If there's demand for the simple mode, it can be added later, but I'd rather keep the options simple and avoid the need to maintain multiple modes. Could you remove this option and the implementation and tests for simple mode?

alangpierce · 2022-11-26T08:36:43Z

src/TokenProcessor.ts

 export default class TokenProcessor {
  private resultCode: string = "";
+  private resultMappings: Array<number | undefined> = new Array(this.tokens.length);


Could you add a comment explaining the format of this array?

alangpierce · 2022-11-26T08:37:13Z

src/TokenProcessor.ts

+    return {
+      resultCode: this.resultCode,
+      tokenIndex: this.tokenIndex,
+    };


Could you add a comment explaining why resultMappings doesn't need to be snapshotted?

alangpierce · 2022-11-26T08:54:18Z

src/TokenProcessor.ts

@@ -217,6 +229,7 @@ export default class TokenProcessor {
  copyToken(): void {
    this.resultCode += this.previousWhitespaceAndComments();
    this.appendTokenPrefix();
+    this.resultMappings[this.tokenIndex] = this.resultCode.length;


I was hoping that source maps provided a way to say that certain regions of code are unchanged, which would avoid the need for copyToken mappings and probably speed up and simplify the source maps, but unfortunately it looks like you really do need to define a mapping for basically every token, at least for the Chrome Dev Tools and Jest cases that I tested.

alangpierce · 2022-11-26T18:48:19Z

src/transformers/RootTransformer.ts

+    if (newlineIndex >= 0) {
+      for (; i < mappings.length; i++) {
+        const mapping = mappings[i];
+        if (mapping !== undefined && mapping > newlineIndex) {
+          break;
+        }
+      }
+    }


If the file starts with a #!, the first line is treated as a comment and is not a token, so there shouldn't be a need to skip any mappings here, so I think this case can be removed and there's no need for the newlineIndex param.

alangpierce · 2022-11-26T20:31:15Z

src/computeSourceMap.ts

+      maybeAddSegment(map, line, genColumn, filePath, line, sourceColumn);
+      for (
+        currentMapping = rawMappings[++j];
+        currentMapping === i || currentMapping === undefined;


Could you add a bounds check on j here as well?

alangpierce · 2022-11-26T20:38:06Z

src/computeSourceMap.ts

+      }
+    }
+  }
+  // eslint-disable-next-line @typescript-eslint/no-unused-vars


Not a big deal to have this ignore comment, but you may be able to avoid it by adding ignoreRestSiblings to the rule config in .eslintrc.js. https://eslint.org/docs/latest/rules/no-unused-vars#ignorerestsiblings

alangpierce · 2022-11-26T20:41:03Z

src/computeSourceMap.ts

+  return sourceColumns;
+}
+
+function computeDetailedSourceMap(


Thanks! The logic all looks good to me, and it's nice to have it self-contained here. It might be better at some point to switch to pre-computing token column indices if source maps will always need all of them, but there's certainly a tradeoff, and this approach seems good.

alangpierce · 2022-11-26T20:43:51Z

test/source-maps-test.ts

+        fs.writeFileSync(path.join(outDir, "test.js.map"), JSON.stringify(result.sourceMap));
+      }
+      const {mappings} = result.sourceMap!;
+      assert.match(mappings, /^[^;]+(;[^;]+){3}$/); // 4 lines


Could you actually just assert the entire string? It looks to me like it isn't so long, and I'd think of it like a snapshot test where any changes (e.g. changes to the column logic or a library upgrade) would be spot-checked, even though it's a bit opaque to someone reading the test.

alangpierce · 2022-11-26T20:48:12Z

test/source-maps-test.ts

      file: "test.js",
    });
+    if (!simple) {
+      if (process.env.WRITE_SOURCE_MAPS) {


How have you been using this, out of curiosity? It might be something that could be added to the demo website instead (though certainly that would be out of scope for this PR), or it might be less necessary if you hard-code the expected mappings.

I've been using that with source map visualization tools - specifically https://evanw.github.io/source-map-visualization/ and https://sokra.github.io/source-map-visualization/ (via https://www.npmjs.com/package/source-map-visualize) - to validate that the mappings were correct.

This bit is entirely contained in 212f7e2 so i can just drop or revert that commit from the PR to remove this.

Progress toward #541, prep work to help make #759 (and any future source map work) more manually testable. This PR splits out token information into a new collapased-by-default debug section, then adds source map info to that section. The source map toggle does three things: * Show the actual mappings from the source map by line/column index. * Show the JSON value of the source map. * Extend the normal code output to include an inline source map with the full original source embedded, then link to the very helpful tool https://evanw.github.io/source-map-visualization/ , which accepts pasting the output code.

* Get rid of simple source map mode. For now, the detailed source maps will be the only mode. Also inline `computeDetailedSourceMap` now that it's the only case. * Remove `WRITE_SOURCE_MAPS` in test and related details. In alangpierce#793 I added source map info to the demo website, which is a nicer system for manually verifying correctness. * Update test code to assert the full source map and a JSON representation of the mappings. * Add integration test for Jest inline snapshots. * Change lint config to ignore unused rest siblings. * Add bounds checks and change empty `for` loops to `while` (just a style preference). * Add comments in a few more places. * Remove unnecessary special case when handling hashbang lines.

alangpierce · 2023-04-09T03:52:01Z

Thanks again for the contribution! I was revisiting this to get it through and pushed an update with a few changes:

Get rid of simple source map mode. For now, the detailed source maps will be
the only mode. Also inline computeDetailedSourceMap now that it's the only
case.
Remove WRITE_SOURCE_MAPS in test and related details. In Add source map debugging to website #793 I added source
map info to the demo website, which is a nicer system for manually verifying
correctness.
Update test code to assert the full source map and a JSON representation of
the mappings.
Add integration test for Jest inline snapshots.
Change lint config to ignore unused rest siblings.
Add bounds checks and change empty for loops to while (just a style
preference).
Add comments in a few more places.
Remove unnecessary special case when handling hashbang lines.

forivall added 2 commits November 8, 2022 15:35

Improve source map generation

c97c253

Write generated sourcemap for manual inspection

212f7e2

forivall added 2 commits November 8, 2022 22:57

Source map token offset perf fix

aeacef1

perf: remove mappings from snapshot

a3dc5c3

this _shouldnt_ theoretically cause any issues, as any mappings generated in the snapshot should be overwritten. if needed, we can clear out the mappings generated after `this.tokenIndex` in `restoreToSnapshot`

alangpierce reviewed Nov 9, 2022

View reviewed changes

Update doc comment for simple sourcemaps

ba6fdf1

alangpierce requested changes Nov 26, 2022

View reviewed changes

alangpierce mentioned this pull request Apr 3, 2023

Add source map debugging to website #793

Merged

alangpierce added 2 commits April 7, 2023 22:44

Merge remote-tracking branch 'origin/main' into HEAD

186ba58

alangpierce self-requested a review April 9, 2023 03:52

alangpierce added 2 commits April 8, 2023 21:02

Update some additional comments

b05696d

Try updating integration test to update snapshot in CI

d0862b6

alangpierce approved these changes Apr 9, 2023

View reviewed changes

alangpierce merged commit 2fafe71 into alangpierce:main Apr 9, 2023
8 checks passed

alangpierce mentioned this pull request Apr 9, 2023

Jest plugin doesn't support inline snapshots #667

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve source map generation #759

Improve source map generation #759

forivall commented Nov 9, 2022 •

edited

codecov bot commented Nov 9, 2022 •

edited

github-actions bot commented Nov 9, 2022 •

edited

forivall commented Nov 9, 2022 •

edited

alangpierce left a comment

alangpierce Nov 9, 2022

forivall Nov 10, 2022

alangpierce Nov 26, 2022

forivall commented Nov 10, 2022

forivall commented Nov 21, 2022 •

edited

alangpierce commented Nov 23, 2022

alangpierce left a comment

alangpierce Nov 26, 2022

alangpierce Nov 26, 2022

alangpierce Nov 26, 2022

alangpierce Nov 26, 2022

alangpierce Nov 26, 2022

alangpierce Nov 26, 2022

alangpierce Nov 26, 2022

alangpierce Nov 26, 2022

alangpierce Nov 26, 2022

alangpierce Nov 26, 2022

forivall Dec 5, 2022 •

edited

alangpierce commented Apr 9, 2023 •

edited

Improve source map generation #759

Improve source map generation #759

Conversation

forivall commented Nov 9, 2022 • edited

codecov bot commented Nov 9, 2022 • edited

Codecov Report

github-actions bot commented Nov 9, 2022 • edited

Benchmark results

forivall commented Nov 9, 2022 • edited

alangpierce left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

forivall commented Nov 10, 2022

forivall commented Nov 21, 2022 • edited

Benchmark results

Change

alangpierce commented Nov 23, 2022

alangpierce left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

forivall Dec 5, 2022 • edited

Choose a reason for hiding this comment

alangpierce commented Apr 9, 2023 • edited

forivall commented Nov 9, 2022 •

edited

codecov bot commented Nov 9, 2022 •

edited

github-actions bot commented Nov 9, 2022 •

edited

forivall commented Nov 9, 2022 •

edited

forivall commented Nov 21, 2022 •

edited

forivall Dec 5, 2022 •

edited

alangpierce commented Apr 9, 2023 •

edited