Fix flow-remove-types for input with multi-byte characters #7781

mourner · 2019-06-03T20:21:58Z

Closes #7779 by implementing the approach described by @mroch in #7779 (comment).

Remaining work:

~~Try eliminating all string manipulation and buffer.write calls in favor of purely byte-based operations for performance.~~ Not worth it — bottleneck is in parsing.
Benchmark the change to make sure performance didn't regress.
Address any review suggestions.
Sign the CLA.

facebook-github-bot · 2019-06-03T20:22:10Z

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

facebook-github-bot · 2019-06-03T20:42:16Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

mourner · 2019-06-03T20:49:29Z

@mroch this is ready for review. While the performance didn't regress, there's no point in optimizing the code much because almost all time is spent in flow-parser. Preliminary benchmarks show that it's about 2.5–4 times slower than the previous flow-remove-types version based on babylon. 😭

facebook-github-bot

@motiz88 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

motiz88

Thanks @mourner for catching this and working on a fix. I have a concern regarding source maps and some other minor comments, but this looks good in principle.

motiz88 · 2019-06-19T11:08:51Z

packages/flow-remove-types/index.js

@@ -397,6 +407,7 @@ function getTrailingLineNode(context, node) {
 // Creates a zero-width "node" with a value to splice at that position.
 // WARNING: This is only safe to use when source maps are off!
 function getSpliceNodeAtPos(context, pos, loc, value) {
+  context.bytesAdded += value.length;


This is technically mixing JS string length (UTF-16 code units) and UTF-8 bytes. Not a problem yet because the only call to getSpliceNodeAtPos uses a plain ASCII string (' =>'), but maybe we should future-proof this?
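One possible way to future-proof the counter is to measure the spliced value in UTF-8 bytes instead of UTF-16 code units. The sketch below is only an illustration: `byteLengthOf` is a hypothetical helper name, not something in the PR, and it assumes `context.bytesAdded` is meant to track UTF-8 bytes.

```javascript
// Hypothetical helper (not part of the PR): count the UTF-8 bytes that a
// spliced value adds, rather than its JS string length (UTF-16 code units).
function byteLengthOf(value) {
  return Buffer.byteLength(value, 'utf8');
}

// For plain ASCII the two measures agree, which is why ' =>' is safe today.
console.log(' =>'.length, byteLengthOf(' =>')); // 3 3

// For non-ASCII text they diverge.
console.log('λ'.length, byteLengthOf('λ')); // 1 2
console.log('🐈'.length, byteLengthOf('🐈')); // 2 4
```

Under that assumption, the added line in `getSpliceNodeAtPos` could read `context.bytesAdded += Buffer.byteLength(value, 'utf8');`.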

motiz88 · 2019-06-19T11:19:39Z

packages/flow-remove-types/index.js

-      return (result += source.slice(lastPos));
+      offset += sourceBuffer.copy(buf, offset, lastPos, sourceBuffer.length);
+
+      return buf.toString('utf8', 0, offset);
    },
    generateMap: function() {


How does this issue translate to source maps? I'm suddenly worried that the source map spec is silent on what "columns" mean, but I'd imagine it's meant to be interpreted more like "code points" than "bytes". So we might need to do some Unicode bookkeeping in generateSourceMappings as well.
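If the parser reports columns in UTF-8 bytes and source map consumers expect a different unit, the bookkeeping could look roughly like the sketch below. It assumes the target unit is UTF-16 code units (what JavaScript string indexing uses) and that the byte column falls on a character boundary. `byteColumnToUtf16Column` is a hypothetical name, not a function in flow-remove-types.

```javascript
// Hypothetical helper: convert a 0-based column measured in UTF-8 bytes
// into a column measured in UTF-16 code units for the same line.
// Assumes byteColumn lands on a character boundary.
function byteColumnToUtf16Column(lineText, byteColumn) {
  const lineBuffer = Buffer.from(lineText, 'utf8');
  // Decode only the bytes before the column, then count UTF-16 code units.
  return lineBuffer.toString('utf8', 0, byteColumn).length;
}

// 'var λ ' is 7 bytes but 6 UTF-16 code units, so the '=' moves from 7 to 6.
console.log(byteColumnToUtf16Column('var λ = 1;', 7)); // 6
```

If code points turn out to be the right unit instead, counting `Array.from(prefix).length` on the decoded prefix would be the corresponding change.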

motiz88 · 2019-06-19T11:29:01Z

packages/flow-remove-types/test/source.js

@@ -1,6 +1,8 @@
 /* @flow */
 // @nolint

+// multi-byte chars: Гарного дня, котики!


Can we add some non-BMP (multi UTF-16 code unit) characters to the test? e.g. emoji: '🐈'.length == 2, Buffer.from('🐈').length == 4

Also, I'd add a test case with non-ASCII characters in the actual code that we process, e.g.

var lambda: λ = (α: number): number => α;
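A small standalone script can show why such test cases matter. It assumes, as the commit title "byte-based indexing" suggests, that the parser reports node ranges as UTF-8 byte offsets. The offsets here are computed by hand as a stand-in for real parser output.

```javascript
// An emoji before the annotation shifts byte offsets away from string indices.
const source = '// 🐈\nvar x: number = 1;';
const annotation = ': number';

// Stand-in for parser output: the annotation's range in UTF-8 bytes.
const charStart = source.indexOf(annotation);
const byteStart = Buffer.byteLength(source.slice(0, charStart), 'utf8');
const byteEnd = byteStart + Buffer.byteLength(annotation, 'utf8');

// Incorrect: byte offsets applied to a JS string (indexed in UTF-16 units).
const wrong = source.slice(0, byteStart) + source.slice(byteEnd);

// Correct: byte offsets applied to a Buffer holding the UTF-8 source.
const buf = Buffer.from(source, 'utf8');
const right = Buffer.concat([
  buf.subarray(0, byteStart),
  buf.subarray(byteEnd),
]).toString('utf8');

console.log(JSON.stringify(wrong)); // the annotation is only partly removed
console.log(JSON.stringify(right)); // "// 🐈\nvar x = 1;"
```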

thehogfather · 2019-08-17T01:51:48Z

@mourner have you still got time to look into this and address the comments?

mourner · 2019-08-23T16:19:29Z

@thehogfather apologies — can't get to it at the moment but hopefully I'll find some time in a week or two. The only comment left unaddressed is this one:

> How does this issue translate to source maps? I'm suddenly worried that the source map spec is silent on what "columns" mean, but I'd imagine it's meant to be interpreted more like "code points" than "bytes". So we might need to do some Unicode bookkeeping in generateSourceMappings as well.

If anyone's up to helping out on the source maps part, I'd appreciate it!

byte-based indexing to fix multi-byte characters input

5c00d61

closes facebook#7779

facebook-github-bot added the CLA Signed label Jun 3, 2019

mourner marked this pull request as ready for review June 3, 2019 20:45

fix regression in noop case

a170b55

nmote assigned mroch Jun 6, 2019

motiz88 self-requested a review June 19, 2019 10:59

facebook-github-bot reviewed Jun 19, 2019

motiz88 suggested changes Jun 19, 2019

address some of the review comments, better tests

787ef1a

goodmind added the Stalled Issues and PRs that are stalled. label Jul 27, 2019
