-
Notifications
You must be signed in to change notification settings - Fork 22
Avoid ambiguity in regexp-based extraction #211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@@ -1350,7 +1350,7 @@ | |||
<emu-alg> | |||
1. Let _tokens_ be the List of tokens obtained by parsing _source_ according to <emu-xref href="#sec-ecmascript-language-lexical-grammar">ECMA-262's lexical grammar</emu-xref>. | |||
1. For each nonterminal _token_ in _tokens_, in reverse order, do | |||
1. If _token_ is not |SingleLineComment| or |MultiLineComment|, return *null*. | |||
1. If _token_ is not |SingleLineComment|, return *null*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also check for disallowed chars here so that both implementation match up
@@ -1376,25 +1376,17 @@ | |||
1. Set _position_ to _position_ + 1. | |||
1. If _second_ is U+002F (SOLIDUS), then | |||
1. Let _comment_ be the substring of _lineStr_ from _position_ to _lineLength_. | |||
1. If _comment_ contains the code point U+0022 (QUOTATION MARK), U+0027 (APOSTROPHE), U+002F (SOLIDUS), or U+0060 (GRAVE ACCENT), then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/
-> */
@@ -1350,7 +1350,7 @@ | |||
<emu-alg> | |||
1. Let _tokens_ be the List of tokens obtained by parsing _source_ according to <emu-xref href="#sec-ecmascript-language-lexical-grammar">ECMA-262's lexical grammar</emu-xref>. | |||
1. For each nonterminal _token_ in _tokens_, in reverse order, do | |||
1. If _token_ is not |SingleLineComment| or |MultiLineComment|, return *null*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it have to skip over whitespace tokens here?
Some context (cc @DanielRosenwasser) When we were first exploring the So when we presented the specification to tc39 last year we pitched offering both approaches. We got some pushback that the regex approach should have the same functionality as the parsing approach (never returning different results) or at least that one approach should be a subset of the other. This PR is @nicolo-ribaudo's attempt at updating the regex approach to align with that goal but we wanted to make sure that this approach still works well for TypeScript and VSCode. |
This PR changes:
`
,'
,"
, or/
.The notes still need to be updated.