-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Don't consume the dot in decimal literals if not followed by digit since 0.5.0 #891
Conversation
See ethereum/solidity#4172. This unblocks parsing member access expressions where the primary expression can't consume the dot in order to parse the `.member` postfix expression.
🦋 Changeset detectedLatest commit: 84fd161 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
@@ -3692,6 +3692,21 @@ codegen_language_macros::compile!(Language( | |||
not_followed_by = Fragment(IdentifierStart) | |||
) | |||
), | |||
// Since 0.5.0, only consume a dot if it's followed by a digit: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand the fix here:
- For input
5
, it will always be matched by the first definition. - For input
5.
, it will be matched by the second definition till version0.5.0
, and rejected afterwards. - For input
5.6
, it will always be matched by the last definition.
Inserting this new definition into the 3rd position just overrides the last one, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's important to think in terms of accepted lexemes by the scanner here. From 0.5.0 upwards, if we have 5.member()
, the old definitions did not allow us to lex 5
as a standalone literal:
- it's followed by a dot, so 1st definition fails
- it's >=0.5.0, so the second rule is disabled
- it's not just
.[0-9]
, so the old 3rd definition fails - it's not
[0-9].[0-9]
, so it also fails.
This rule matches the logic introduced in the Solidity's lexer for 0.5.0, which is to stop and return a Token::Number if there's a period ahead that's not followed by a digit: https://github.com/ethereum/solidity/blob/cc79c91b93d322cd38beec08a1ce575d3622c7e4/liblangutil/Scanner.cpp#L972. Now, since we have Optional(Sequence([Atom("."),Fragment(DecimalDigits)]))
, we enforce that if there's a dot, it must have a decimal ahead; we must either consume the dot with digits after or nothing, following the first digit.
This allows us to lex 5
and then be able to parse .member()
as the postfix member access operator, since otherwise consuming the dot and lexing 5.
as a decimal literal would've made that impossible.
We can't modify the last rule that works across all versions, because <0.5.0 5.member()
did not parse correctly exactly because the solc lexer consumed the dot, as per above, so we can't enable that behaviour prior to 0.5.0.
Does this make sense? Do you have a better idea how to express that behaviour more clearly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks for explaining!
I think my concern here is that the new definition basically supersedes other definitions. In this case, what do you think of the following:
An integer (without a dot or a fraction) is enabled in all versions
- stays the same
An integer and a dot (without a fraction) is enabled till "0.5.0"
- add the
fraction
here and make it optional
- add the
An integer, a dot, and a fraction is enabled from "0.5.0"
- add this new one (all three are required)
A dot and a fraction (without an integer) is enabled in all versions
- stays the same
An integer, a dot, and a fraction is enabled in all versions
- removed, as it is replaced by the second definition.
This has the additional benefit of keeping the specific scanners mutually exclusive, which is going to allow building a state machine for the scanner. Most of the TokenDefinition
s in the DSL are built for this reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good in theory and I've tried it, but we still need to be able to optionally skip the (. DECIMAL_DIGITS)
. That is because the first rule has to have a negative lookahead for .
; otherwise we end up accepting 5.member()
as valid in < 0.5.0.
I've pushed a version that is hopefully more acceptable, since a bit more disjoint now. Whenever we move to compiling these down to a state machine, I'm happy to revisit the definitions 🤞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left one question above. Thanks!
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and publish to npm yourself or [setup this action to publish automatically](https://github.com/changesets/action#with-publishing). If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated. # Releases ## @nomicfoundation/slang@0.14.0 ### Minor Changes - [#753](#753) [`b35c763`](b35c763) Thanks [@AntonyBlakey](https://github.com/AntonyBlakey)! - Add tree query implementation as `Query::parse` and `Cursor::query` - [#755](#755) [`8c260fc`](8c260fc) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - support parsing NatSpec comments - [#908](#908) [`ab3688b`](ab3688b) Thanks [@Xanewok](https://github.com/Xanewok)! - Changed the cst.NodeType in TS to use more descriptive string values rather than 0/1 integers - [#886](#886) [`0125717`](0125717) Thanks [@Xanewok](https://github.com/Xanewok)! - Add `TokenKind::is_trivia` - [#887](#887) [`dff1201`](dff1201) Thanks [@Xanewok](https://github.com/Xanewok)! - Add support for constant function modifier removed in 0.5.0 - [#885](#885) [`a9bd8da`](a9bd8da) Thanks [@Xanewok](https://github.com/Xanewok)! - Flatten the trivia syntax nodes into sibling tokens - [#908](#908) [`ab3688b`](ab3688b) Thanks [@Xanewok](https://github.com/Xanewok)! - Add RuleNode/TokenNode::toJSON() in the TypeScript API ### Patch Changes - [#801](#801) [`ecbba49`](ecbba49) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - unreserve pragma keywords in all versions - [#869](#869) [`951b58d`](951b58d) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - support dots in yul identifiers from `0.5.8` till `0.7.0` - [#890](#890) [`1ff8599`](1ff8599) Thanks [@Xanewok](https://github.com/Xanewok)! - Mark `override` as being a valid attribute only after 0.6.0 - [#800](#800) [`d1827ff`](d1827ff) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - support unicode characters in string literals up to `0.7.0` - [#797](#797) [`86f36d7`](86f36d7) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - fix source locations for unicode characters in error reports - [#854](#854) [`4b8970b`](4b8970b) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - parse line breaks without newlines - [#844](#844) [`f62de9e`](f62de9e) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - fix parsing empty `/**/` comments - [#799](#799) [`303dda9`](303dda9) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - prevent parsing multiple literals under `StringExpression` before `0.5.14` - [#847](#847) [`6b6f260`](6b6f260) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - prioritize parsing `MultiLineComment` over `MultiLineNatSpecComment` - [#796](#796) [`59e1e53`](59e1e53) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - add `public` and `internal` to `UnnamedFunctionAttribute` till `0.5.0` - [#756](#756) [`e839817`](e839817) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - fix parsing `payable` primary expressions - [#851](#851) [`67dfde8`](67dfde8) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - fix selection order of prefix/postfix AST fields - [#857](#857) [`f677d5e`](f677d5e) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - rename `FieldName` to `NodeLabel` - [#852](#852) [`ca79eca`](ca79eca) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - allow parsing `ColonEqual` as two separate tokens before `0.5.5` - [#889](#889) [`ce5050f`](ce5050f) Thanks [@Xanewok](https://github.com/Xanewok)! - Support `delete` as an expression rather than a statement - [#923](#923) [`bb30fc1`](bb30fc1) Thanks [@Xanewok](https://github.com/Xanewok)! - Support arbitrary ASCII escape sequences in string literals until 0.4.25 - [#887](#887) [`dff1201`](dff1201) Thanks [@Xanewok](https://github.com/Xanewok)! - Support view and pure function modifiers only from 0.4.16 - [#800](#800) [`d1827ff`](d1827ff) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - rename `AsciiStringLiteral` to `StringLiteral` - [#838](#838) [`ad98d1c`](ad98d1c) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - upgrade to rust `1.76.0` - [#849](#849) [`5c42e0e`](5c42e0e) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - add `override` and `virtual` to `ConstructorAttribute` - [#862](#862) [`5e37ea0`](5e37ea0) Thanks [@Xanewok](https://github.com/Xanewok)! - allow call options as a post-fix expression - [#786](#786) [`0bfa6b7`](0bfa6b7) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - support Yul label statements before `0.5.0` - [#839](#839) [`2d698eb`](2d698eb) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - support string literals in version pragmas - [#891](#891) [`70c9d7d`](70c9d7d) Thanks [@Xanewok](https://github.com/Xanewok)! - Fix parsing `<NUMBER>.member` member access expression - [#842](#842) [`2069126`](2069126) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - add `private` to `UnnamedFunctionAttribute` till `0.5.0` - [#840](#840) [`7fb0d20`](7fb0d20) Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - allow `var` in `TupleDeconstructionStatement` before `0.5.0` Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
See ethereum/solidity#4172.
This unblocks parsing member access expressions where the primary
expression can't consume the dot in order to parse the
.member
postfixexpression.