Add ES2023 tokenization #9
Labels: feature (Addition of something new) · part: parser (File-to-tree conversion) · spec: es2023 (Conformance to https://262.ecma-international.org/14.0/)
arhadthedev added the feature, spec: es2023, and part: parser labels on Mar 24, 2024
arhadthedev added a commit that referenced this issue on Mar 26, 2024
We start with a way to report syntax errors.
This was referenced Mar 27, 2024
arhadthedev added a commit that referenced this issue on Apr 6, 2024
arhadthedev added a commit that referenced this issue on Apr 7, 2024
Now we have a global sample table of the tokens the whole lexer (tokenizer) can process, and all tests refer to it as their parameter type for substitution.
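A minimal sketch of such a shared table (the names and token kinds here are assumptions for illustration, not the project's actual code):

```rust
// Hypothetical sketch: a single global table of (source fragment,
// expected token kind) pairs that every lexer test draws its
// parameters from.
#[derive(Debug, PartialEq)]
pub enum TokenKind {
    Punctuator,
    NumericLiteral,
    IdentifierName,
}

// The one source of truth for what the tokenizer can recognize;
// parameterized tests substitute entries from this table.
pub const TOKEN_SAMPLES: &[(&str, TokenKind)] = &[
    ("{", TokenKind::Punctuator),
    ("42", TokenKind::NumericLiteral),
    ("foo", TokenKind::IdentifierName),
];
```

Keeping the table in one place means adding a token kind extends every test at once instead of being copied per test file.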
arhadthedev added a commit that referenced this issue on Apr 7, 2024
arhadthedev added a commit that referenced this issue on Apr 11, 2024
Replace `|foo| foo.1` with `|(_, bar)| bar`. No more opaque meaning behind numeric fields; we now name these fields at the point of use.
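The shape of the change, as a hedged illustration (the function and data are made up):

```rust
// Illustrative only: replace positional tuple access inside a closure
// with destructuring, so the kept field is named where it is used.
fn second_fields(pairs: &[(&str, i32)]) -> Vec<i32> {
    // Before the change this read `.map(|pair| pair.1)`, leaving the
    // meaning of `.1` opaque; destructuring names the field in place.
    pairs.iter().map(|&(_, value)| value).collect()
}
```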
arhadthedev added a commit that referenced this issue on Apr 14, 2024
This was referenced Jun 5, 2024
arhadthedev added a commit that referenced this issue on Jun 9, 2024
`get_next_token` tried to get the unparsed tail by chopping off a recognized token regardless of whether we got a token or a parse error. As a result, an error led to a panic. This PR moves the tail extraction into the processing of parse success. It also adds a regression test that uses the `claims` crate.
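A minimal sketch of the fix, assuming a simplified single-character "token" (the real lexer's types differ):

```rust
// Sketch: compute the unparsed tail only on the success path, so a
// parse error is returned to the caller instead of panicking while
// slicing past a token that was never recognized.
fn get_next_token(input: &str) -> Result<(char, &str), String> {
    match input.chars().next() {
        // Success: chop the recognized token off and keep the tail.
        Some(token) => Ok((token, &input[token.len_utf8()..])),
        // Failure: no tail extraction happens on this path any more.
        None => Err("unexpected end of input".to_string()),
    }
}
```

A regression test along these lines could then use `claims::assert_err!(get_next_token(""))` instead of expecting a panic.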
arhadthedev added a commit that referenced this issue on Jun 18, 2024
arhadthedev added a commit that referenced this issue on Jun 18, 2024
Prepare the parser for exposure of the `InputElementRegExp`, `InputElementRegExpOrTemplateTail`, and `InputElementHashbangOrRegExp` extra goal symbols by implementing unified logic for processing them all. Since all goal symbols mostly share the non-terminals on their right side, we can create an enum listing all possible right-side non-terminals. This allows us to write per-goal logic that copies into the unified enum, and to slightly modify the existing `extract_token` to make it a shared processor of the unified enum. For more details and the ultimate purpose, see the description of `GoalSymbols` in the parent issue.
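A rough sketch of the unified-enum idea (the variant names are illustrative, loosely based on the lexical grammar, not the crate's actual types):

```rust
// Sketch: one enum covering the right-side non-terminals that the
// goal symbols mostly share; each goal maps its own parse result into
// this enum, and a single processor then handles it for every goal.
enum UnifiedNonTerminal {
    WhiteSpace,
    LineTerminator,
    Comment(String),
    CommonToken(String),
}

// Shared processing, in the spirit of the modified `extract_token`.
fn describe(symbol: &UnifiedNonTerminal) -> &'static str {
    match symbol {
        UnifiedNonTerminal::WhiteSpace => "white space",
        UnifiedNonTerminal::LineTerminator => "line terminator",
        UnifiedNonTerminal::Comment(_) => "comment",
        UnifiedNonTerminal::CommonToken(_) => "common token",
    }
}
```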
arhadthedev added a commit that referenced this issue on Jun 21, 2024
According to the ECMAScript specification, ReservedWord always accompanies IdentifierName (included in CommonToken) in the grammar definition (<https://262.ecma-international.org/14.0/#sec-names-and-keywords>):

> The syntactic grammar defines Identifier as an IdentifierName that is not a ReservedWord.

So we make every goal symbol that supports CommonToken support ReservedWord as well.
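An illustration of the rule being applied (the word list is truncated and the function is not the project's code):

```rust
// Per the quoted rule, an IdentifierName only becomes an Identifier
// when it is not a ReservedWord, so any goal symbol that can yield
// IdentifierName must also be able to yield ReservedWord.
const RESERVED_WORDS: &[&str] = &[
    "await", "break", "case", "class", "const", // truncated list
];

fn is_reserved_word(name: &str) -> bool {
    RESERVED_WORDS.contains(&name)
}
```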
arhadthedev added a commit that referenced this issue on Jun 21, 2024
According to the ECMAScript specification, ReservedWord always accompanies IdentifierName (included in CommonToken) in the grammar definition (<https://262.ecma-international.org/14.0/#sec-names-and-keywords>):

> The syntactic grammar defines Identifier as an IdentifierName that is not a ReservedWord.

So we make every goal symbol that supports CommonToken support ReservedWord as well.
arhadthedev added a commit that referenced this issue on Jun 21, 2024
arhadthedev added a commit that referenced this issue on Jun 21, 2024
From <https://262.ecma-international.org/14.0/#sec-ecmascript-language-lexical-grammar>:

> There are several situations where the identification of lexical input elements is sensitive to the syntactic grammar context that is consuming the input elements. This requires multiple goal symbols for the lexical grammar.
arhadthedev added a commit that referenced this issue on Jun 21, 2024
From <https://262.ecma-international.org/14.0/#sec-ecmascript-language-lexical-grammar>:

> There are several situations where the identification of lexical input elements is sensitive to the syntactic grammar context that is consuming the input elements. This requires multiple goal symbols for the lexical grammar.
arhadthedev added a commit that referenced this issue on Jun 22, 2024
Some rules match for certain goal symbols and error for others. Check this.
arhadthedev added a commit that referenced this issue on Jun 22, 2024
Some rules match for certain goal symbols and error for others. Now we check this.
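A toy illustration of the goal sensitivity being checked (the goal-symbol names follow the spec; the logic is a stand-in, not the crate's code):

```rust
#[derive(Clone, Copy)]
enum Goal {
    InputElementDiv,
    InputElementRegExp,
}

// Stand-in for the real rules: a leading `/` starts a regular
// expression literal only under the RegExp goal; under the Div goal
// the same input must instead lex as a division punctuator.
fn classify_slash(goal: Goal) -> &'static str {
    match goal {
        Goal::InputElementRegExp => "RegularExpressionLiteral",
        Goal::InputElementDiv => "DivPunctuator",
    }
}
```

A goal-sensitivity test then feeds the same input under both goals and asserts a match for one and an error (or different token) for the other.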
arhadthedev added a commit that referenced this issue on Jun 26, 2024
For some reason, such a format is easier to read.
arhadthedev added a commit that referenced this issue on Jun 26, 2024
For some reason, such a format is easier to read.
arhadthedev added a commit that referenced this issue on Jun 26, 2024
It allows us to get rid of a mut-argument after caller-side cloning.
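The pattern in question, sketched with made-up names:

```rust
// Sketch: once the caller clones up front, the callee no longer needs
// a `&mut` parameter; it takes ownership and mutates its own copy.
#[derive(Clone)]
struct ParserState {
    position: usize,
}

// Before: `fn advance(state: &mut ParserState)` mutated the caller's
// value in place. After: mutation stays local to the owned argument,
// and the caller's original value is untouched.
fn advance(mut state: ParserState) -> ParserState {
    state.position += 1;
    state
}
```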
arhadthedev added a commit that referenced this issue on Jun 26, 2024
It allows us to get rid of a mut-argument after caller-side cloning.
arhadthedev added a commit that referenced this issue on Jun 26, 2024
arhadthedev added a commit that referenced this issue on Jun 26, 2024
This was referenced Jun 27, 2024
For starters, it can be a `Tokenizer` struct right inside the crate. The struct should recognize the lexical grammar described in https://262.ecma-international.org/14.0/#sec-ecmascript-language-lexical-grammar.
The struct should provide methods to switch between goal symbols on the fly. The spec section linked above gives the following reasoning for such a feature:
Syntactic grammar context is defined by the parser state, so it is the parser that needs to switch the current lexical grammar goal symbol to adjust tokenization.
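The switching API could look something like this minimal sketch (the goal-symbol names come from the spec; the struct layout and method names are assumptions):

```rust
// The five goal symbols of the ES2023 lexical grammar.
#[derive(Clone, Copy, PartialEq, Debug)]
enum GoalSymbol {
    InputElementDiv,
    InputElementRegExp,
    InputElementRegExpOrTemplateTail,
    InputElementTemplateTail,
    InputElementHashbangOrRegExp,
}

struct Tokenizer<'a> {
    rest: &'a str,
    goal: GoalSymbol,
}

impl<'a> Tokenizer<'a> {
    fn new(input: &'a str) -> Self {
        // A hashbang comment is only permitted at the very start.
        Self { rest: input, goal: GoalSymbol::InputElementHashbangOrRegExp }
    }

    // The parser calls this whenever its state demands another goal.
    fn set_goal(&mut self, goal: GoalSymbol) {
        self.goal = goal;
    }
}
```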
The user entry point is the `Tokenizer` object with its `get_next_symbol` method.

Grammar parameter implementation
For notation like `Production[Param1, Param2]`, add the following comment into the code:

Cover grammar
Also add a note on what cover grammar is.
Todo list: