
Basic after-parse tokenization interface #221

Merged
merged 1 commit into from Mar 17, 2023

Conversation

c42f
Member

@c42f c42f commented Mar 16, 2023

Implement a `tokenize()` function which retrieves the tokens *after* parsing.

Going through the parser isn't hugely more expensive than plain tokenization, and allows us to be more precise and complete.

For example it automatically:

  • Determines when contextual keywords are keywords vs. identifiers. For example, the `outer` in `outer = 1` is an identifier, but a keyword in `for outer i = 1:10`.
  • Validates numeric literals (e.g. detecting overflow cases like `10e1000` and flagging them as errors).
  • Splits or combines ambiguous tokens. For example, splitting the `...` in `import ...A` into three separate `.` tokens.
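As a rough sketch of how this interface might be used (assuming `tokenize` is exported from `JuliaSyntax` and that tokens expose their kind via `kind` and their source text via `untokenize`, as in this package's token API — exact names are an assumption here, check the package docs):

```julia
using JuliaSyntax

src = "for outer i = 1:10 end"

# Tokenize by running the full parser, then recovering the token
# stream. Contextual keywords like `outer` should come back with a
# keyword kind here, rather than as plain identifiers.
for tok in JuliaSyntax.tokenize(src)
    println(kind(tok), " => ", repr(untokenize(tok, src)))
end
```

The same call on `"outer = 1"` should instead report `outer` as an identifier, since the parser knows it is not in `for`-loop position.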

@codecov

codecov bot commented Mar 16, 2023

Codecov Report

Merging #221 (654128f) into main (2720980) will decrease coverage by 0.04%.
The diff coverage is 86.20%.

❗ Current head 654128f differs from pull request most recent head 8b71bbd. Consider uploading reports for the commit 8b71bbd to get more accurate results

@@            Coverage Diff             @@
##             main     #221      +/-   ##
==========================================
- Coverage   96.30%   96.27%   -0.04%     
==========================================
  Files          15       15              
  Lines        3869     3888      +19     
==========================================
+ Hits         3726     3743      +17     
- Misses        143      145       +2     
Impacted Files       Coverage Δ
src/JuliaSyntax.jl   100.00% <ø> (ø)
src/tokenize.jl      98.35% <80.00%> (ø)
src/parser_api.jl    89.39% <89.47%> (+0.03%) ⬆️


@c42f c42f merged commit 36909cd into main Mar 17, 2023
20 of 21 checks passed
@c42f c42f deleted the c42f/tokens-API branch March 17, 2023 10:59
c42f added a commit that referenced this pull request Mar 17, 2023