Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract string parsing to a separate package #14772

Merged

Conversation

nicolo-ribaudo
Copy link
Member

@nicolo-ribaudo nicolo-ribaudo commented Jul 19, 2022

Q                       A
Fixed Issues?
Patch: Bug Fix?
Major: Breaking Change?
Minor: New Feature?
Tests Added + Pass? Yes
Documentation PR Link
Any Dependency Changes?
License MIT

This was hard, but it lets us avoid an external dependency with some logic that we already have in #14757.

@liuxingbaoyu you should be able to do something like this:

let unterminatedCalled = false;
try {
  const error = () => { throw new Error(); };
  cooked = parseStringContents("template", raw, 0, 0, 0, {
    unterminated() { unterminatedCalled = true; },
    strictNumericEscape: error,
    invalidEscapeSequence: error,
    numericSeparatorInEscapeSequence: error,
    unexpectedNumericSeparator: error,
    invalidDigit: error,
    invalidCodePoint: error,
  });
  if (!unterminatedCalled) cooked = null; // we must consume the whole raw string
} catch {}

This PR mostly moves code around, with one main change: instead of directly updating this.state.{pos,curLine,lineStart}, we keep track of them in variables and we update them only after returning from the helper parser functions.

@nicolo-ribaudo nicolo-ribaudo added PR: Internal 🏠 A type of pull request used for our changelog categories pkg: parser labels Jul 19, 2022
@JLHwung JLHwung self-requested a review Jul 20, 2022
@liuxingbaoyu
Copy link
Contributor

liuxingbaoyu commented Jul 20, 2022

Amazing!

babel/babel.config.js

Lines 208 to 214 in 486d27a

{
test: [
"packages/babel-generator",
"packages/babel-plugin-proposal-decorators",
].map(normalize),
plugins: ["babel-plugin-transform-charcodes"],
},

Can you add it here?

@nicolo-ribaudo nicolo-ribaudo force-pushed the parser-extract-string-parsing branch from 43cc94a to 9971813 Compare Jul 20, 2022
@babel-bot
Copy link
Collaborator

babel-bot commented Jul 20, 2022

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/52568/

@@ -20,7 +20,7 @@
"extra": {
"rawValue": "\\u{}",
"raw": "\"\\u{}\"",
"expressionValue": "null"
"expressionValue": ""
Copy link
Contributor

@JLHwung JLHwung Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new behaviour here is better than current main. Should we set expressionValue to be null when the string can not be successfully parsed?

Copy link
Member Author

@nicolo-ribaudo nicolo-ribaudo Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit I don't know why the behavior changed 😬
But yes, null would be even better.

Copy link
Member Author

@nicolo-ribaudo nicolo-ribaudo Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, expressionValue is a property of directives that represents the .value property of the originally parsed StringLiteral. I feat that making it null would break anything relying on the fact that StringLiteral.value is a string, so I prefer the behavior currently implemented by this PR of just replacing invalid escapes with "" (for example, ab\u{}cd is abcd).

@nicolo-ribaudo nicolo-ribaudo force-pushed the parser-extract-string-parsing branch from 700d6f0 to 23418cd Compare Jul 20, 2022
Copy link
Contributor

@JLHwung JLHwung left a comment

Awesome!

Copy link
Contributor

@liuxingbaoyu liuxingbaoyu left a comment

I did a lax benchmark with no significant performance penalty.

current 1 jquery 3.6: 40.11 ops/sec ±5.28% (25ms)
current 4 jquery 3.6: 9.5 ops/sec ±3.78% (105ms)
current 16 jquery 3.6: 2.22 ops/sec ±7.14% (451ms)
current 64 jquery 3.6: 0.54 ops/sec ±2.83% (1861ms)
baseline 1 jquery 3.6: 41.42 ops/sec ±5.53% (24ms)
baseline 4 jquery 3.6: 9.89 ops/sec ±2.5% (101ms)
baseline 16 jquery 3.6: 2.36 ops/sec ±5% (424ms)
baseline 64 jquery 3.6: 0.55 ops/sec ±9.05% (1830ms)

@nicolo-ribaudo
Copy link
Member Author

nicolo-ribaudo commented Jul 20, 2022

Good! I believe rewriting all the tokenizer like this could actually improve performance, because we would skip all the this.state.* accesses and the this.... prototype lookup to call other tokenizer methods.

@nicolo-ribaudo nicolo-ribaudo merged commit ea5ff29 into babel:main Jul 20, 2022
38 checks passed
@nicolo-ribaudo nicolo-ribaudo deleted the parser-extract-string-parsing branch Jul 20, 2022
"devDependencies": {
"charcodes": "^0.2.0"
},
Copy link
Contributor

@merceyz merceyz Jul 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if @babel/helper-string-parser will be bundled so asking to be on the safe side; shouldn't this be a normal dependency?

Copy link
Member Author

@nicolo-ribaudo nicolo-ribaudo Jul 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a plugin that inlines charcodes.*** into the corresponding codepoints: https://github.com/xtuc/charcodes/blob/master/packages/babel-plugin-transform-charcodes/README.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg: parser PR: Internal 🏠 A type of pull request used for our changelog categories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants