New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve source map generation #759
Improve source map generation #759
Conversation
Codecov Report
@@ Coverage Diff @@
## main #759 +/- ##
=======================================
Coverage 88.32% 88.33%
=======================================
Files 55 55
Lines 5955 6001 +46
Branches 1415 1422 +7
=======================================
+ Hits 5260 5301 +41
- Misses 430 432 +2
- Partials 265 268 +3
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Benchmark resultsBefore this PR: 405 thousand lines per second Measured change: 0.4% slower (1.18% slower to 0.16% faster) |
this _shouldnt_ theoretically cause any issues, as any mappings generated in the snapshot should be overwritten. if needed, we can clear out the mappings generated after `this.tokenIndex` in `restoreToSnapshot`
On my machine, latest fix brings down the perf loss to
Additional experimentation shows that if I omit the mappings from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I think this will also fix #667 . I still need to take a deeper look at the details, but overall this looks like a good approach.
My main hesitations with detailed source maps in the past have been around complexity and performance, so thanks for implementing it in a way that's relatively lightweight in TokenProcessor. In addition to the automated benchmark (which doesn't include source maps), would you mind doing a comparison of performance without source maps vs simple source maps vs detailed source maps? (You could modify the sucrase.transform
args in benchmark.ts
and then run yarn benchmark jest-dev
with different configs.)
Additional experimentation shows that if I omit the mappings from TokenProcessor's snapshot(), the perf loss is completely gone. Worth it?
I think it would be good to avoid the .slice()
somehow. As-is, it adds n^2 time for a file needing many snapshots. Two thoughts:
- You could pass a flag into
TokenProcessor
to know if detailed source maps are enabled, and use that flag to disable any sourcemap-related work if it won't be used. Hopefully the JIT would recognize that the flag is always false and there would be very little overhead in that case. - I think just skipping
.slice()
is actually ok, since all use cases re-walk the same tokens after restore and would set any updated values. It's certainly a little fragile, but I think should be fine with an explanation comment. It's also relatively low-stakes.
Also, I'm curious what your use case is for column mappings (as opposed to just line mappings). I know there's #541 and #667, just curious if you had another use case in mind.
src/Options.ts
Outdated
@@ -12,6 +12,7 @@ export interface SourceMapOptions { | |||
* file. | |||
*/ | |||
compiledFilename: string; | |||
simple?: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a JSDoc explanation here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely, I just wanted to get things working and feedback on naming, etc. before digging into the polish of full documentation 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking now that the best approach for this PR is to actually not have a simple
option, and instead just change the source map behavior to always compute detailed source maps. If there's demand for the simple mode, it can be added later, but I'd rather keep the options simple and avoid the need to maintain multiple modes. Could you remove this option and the implementation and tests for simple mode?
Sure! I'll try get to that on Friday
Yeah, that's my thinking too, as I noted in the commit message (which i forgot to push up 😅 ):
My main use case is to improve debugging - for example, the chrome debugger allows setting breakpoints on specific expressions in the middle of a statement. Not super high-stakes, but I was in a mood to look at the code to see if i could figure out a solution, and i found that it was relatively straightforward to add the necessary support code in |
Sorry for the delay. Here's the results from the benchmark script i based off of the existing benchmark script: Speeds for No sourcemaps: 767899, 775036, 775274, 775891, 776099 Benchmark resultsNo sourcemaps: 775.3 thousand lines per second ChangeFull sourcemaps: 30.95% slower (30.96% slower to 31.03% slower) So yeah, seems like full sourcemap generation is 3x slower than the existing simple sourcemap generation. |
No problem on the delay, thanks for running the numbers! The 30% slowdown does give me some hesitation, but I confirmed with some extra perf tests that there is minimal/no impact when source maps are disabled, so this change seems at least fine to release in an opt-in way. I want to think a little more about the right approach from a configuration standpoint. I think there are two open questions here:
I also want to take a closer look at the implementation and see if there's a way to speed it up and maybe also avoid the extra dependency. (In a perfect world, Sucrase would have zero dependencies, though a source map library feels pretty defensible.) Given that Sucrase still creates fairly predictable source maps and in most cases makes only minor changes to the code, there might be opportunity for optimization. Here's a profile I ran and at least ~half of the source map time is spent in the library, potentially more. That said, it's probably also reasonable to defer additional source map performance work and possibly speed it up later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! A few small things, but overall looks good! I tested and confirmed that this fixed chrome dev tools inline breakpoints and also fixes Jest's toMatchInlineSnapshot
.
src/Options.ts
Outdated
@@ -12,6 +12,7 @@ export interface SourceMapOptions { | |||
* file. | |||
*/ | |||
compiledFilename: string; | |||
simple?: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking now that the best approach for this PR is to actually not have a simple
option, and instead just change the source map behavior to always compute detailed source maps. If there's demand for the simple mode, it can be added later, but I'd rather keep the options simple and avoid the need to maintain multiple modes. Could you remove this option and the implementation and tests for simple mode?
export default class TokenProcessor { | ||
private resultCode: string = ""; | ||
private resultMappings: Array<number | undefined> = new Array(this.tokens.length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment explaining the format of this array?
return { | ||
resultCode: this.resultCode, | ||
tokenIndex: this.tokenIndex, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment explaining why resultMappings
doesn't need to be snapshotted?
@@ -217,6 +229,7 @@ export default class TokenProcessor { | |||
copyToken(): void { | |||
this.resultCode += this.previousWhitespaceAndComments(); | |||
this.appendTokenPrefix(); | |||
this.resultMappings[this.tokenIndex] = this.resultCode.length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping that source maps provided a way to say that certain regions of code are unchanged, which would avoid the need for copyToken
mappings and probably speed up and simplify the source maps, but unfortunately it looks like you really do need to define a mapping for basically every token, at least for the Chrome Dev Tools and Jest cases that I tested.
src/transformers/RootTransformer.ts
Outdated
if (newlineIndex >= 0) { | ||
for (; i < mappings.length; i++) { | ||
const mapping = mappings[i]; | ||
if (mapping !== undefined && mapping > newlineIndex) { | ||
break; | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the file starts with a #!
, the first line is treated as a comment and is not a token, so there shouldn't be a need to skip any mappings here, so I think this case can be removed and there's no need for the newlineIndex
param.
src/computeSourceMap.ts
Outdated
maybeAddSegment(map, line, genColumn, filePath, line, sourceColumn); | ||
for ( | ||
currentMapping = rawMappings[++j]; | ||
currentMapping === i || currentMapping === undefined; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a bounds check on j
here as well?
src/computeSourceMap.ts
Outdated
} | ||
} | ||
} | ||
// eslint-disable-next-line @typescript-eslint/no-unused-vars |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal to have this ignore comment, but you may be able to avoid it by adding ignoreRestSiblings
to the rule config in .eslintrc.js
. https://eslint.org/docs/latest/rules/no-unused-vars#ignorerestsiblings
src/computeSourceMap.ts
Outdated
return sourceColumns; | ||
} | ||
|
||
function computeDetailedSourceMap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The logic all looks good to me, and it's nice to have it self-contained here. It might be better at some point to switch to pre-computing token column indices if source maps will always need all of them, but there's certainly a tradeoff, and this approach seems good.
test/source-maps-test.ts
Outdated
fs.writeFileSync(path.join(outDir, "test.js.map"), JSON.stringify(result.sourceMap)); | ||
} | ||
const {mappings} = result.sourceMap!; | ||
assert.match(mappings, /^[^;]+(;[^;]+){3}$/); // 4 lines |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you actually just assert the entire string? It looks to me like it isn't so long, and I'd think of it like a snapshot test where any changes (e.g. changes to the column logic or a library upgrade) would be spot-checked, even though it's a bit opaque to someone reading the test.
test/source-maps-test.ts
Outdated
file: "test.js", | ||
}); | ||
if (!simple) { | ||
if (process.env.WRITE_SOURCE_MAPS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How have you been using this, out of curiosity? It might be something that could be added to the demo website instead (though certainly that would be out of scope for this PR), or it might be less necessary if you hard-code the expected mappings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been using that with source map visualization tools - specifically https://evanw.github.io/source-map-visualization/ and https://sokra.github.io/source-map-visualization/ (via https://www.npmjs.com/package/source-map-visualize) - to validate that the mappings were correct.
This bit is entirely contained in 212f7e2
so i can just drop or revert that commit from the PR to remove this.
Progress toward #541, prep work to help make #759 (and any future source map work) more manually testable. This PR splits out token information into a new collapased-by-default debug section, then adds source map info to that section. The source map toggle does three things: * Show the actual mappings from the source map by line/column index. * Show the JSON value of the source map. * Extend the normal code output to include an inline source map with the full original source embedded, then link to the very helpful tool https://evanw.github.io/source-map-visualization/ , which accepts pasting the output code.
Progress toward #541, prep work to help make #759 (and any future source map work) more manually testable. This PR splits out token information into a new collapased-by-default debug section, then adds source map info to that section. The source map toggle does three things: * Show the actual mappings from the source map by line/column index. * Show the JSON value of the source map. * Extend the normal code output to include an inline source map with the full original source embedded, then link to the very helpful tool https://evanw.github.io/source-map-visualization/ , which accepts pasting the output code.
* Get rid of simple source map mode. For now, the detailed source maps will be the only mode. Also inline `computeDetailedSourceMap` now that it's the only case. * Remove `WRITE_SOURCE_MAPS` in test and related details. In alangpierce#793 I added source map info to the demo website, which is a nicer system for manually verifying correctness. * Update test code to assert the full source map and a JSON representation of the mappings. * Add integration test for Jest inline snapshots. * Change lint config to ignore unused rest siblings. * Add bounds checks and change empty `for` loops to `while` (just a style preference). * Add comments in a few more places. * Remove unnecessary special case when handling hashbang lines.
Thanks again for the contribution! I was revisiting this to get it through and pushed an update with a few changes:
|
Fixes #541
Let me know what kind of refactoring will please you! I'd love to get this in to improve source-map debugging. I implemented it kinda weird to minimize theoretical perf impact when sourcemaps arent requested.