Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Don't consume the dot in decimal literals if not followed by digit since 0.5.0 #891

Merged
merged 5 commits into from
Mar 19, 2024

Conversation

Xanewok
Copy link
Contributor

@Xanewok Xanewok commented Mar 9, 2024

See ethereum/solidity#4172.

This unblocks parsing member access expressions where the primary
expression can't consume the dot in order to parse the .member postfix
expression.

See ethereum/solidity#4172.

This unblocks parsing member access expressions where the primary
expression can't consume the dot in order to parse the `.member` postfix
expression.
@Xanewok Xanewok requested a review from a team as a code owner March 9, 2024 16:46
Copy link

changeset-bot bot commented Mar 9, 2024

🦋 Changeset detected

Latest commit: 84fd161

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@nomicfoundation/slang Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@Xanewok Xanewok changed the title fix: After 0.5.0 we can't consume the dot if not followed by digit fix: Don't consume the dot in decimal literals if not followed by digit since 0.5.0 Mar 9, 2024
@@ -3692,6 +3692,21 @@ codegen_language_macros::compile!(Language(
not_followed_by = Fragment(IdentifierStart)
)
),
// Since 0.5.0, only consume a dot if it's followed by a digit:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the fix here:

  • For input 5, it will always be matched by the first definition.
  • For input 5., it will be matched by the second definition till version 0.5.0, and rejected afterwards.
  • For input 5.6, it will always be matched by the last definition.

Inserting this new definition into the 3rd position just overrides the last one, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important to think in terms of accepted lexemes by the scanner here. From 0.5.0 upwards, if we have 5.member(), the old definitions did not allow us to lex 5 as a standalone literal:

  • it's followed by a dot, so 1st definition fails
  • it's >=0.5.0, so the second rule is disabled
  • it's not just .[0-9], so the old 3rd definition fails
  • it's not [0-9].[0-9], so it also fails.

This rule matches the logic introduced in the Solidity's lexer for 0.5.0, which is to stop and return a Token::Number if there's a period ahead that's not followed by a digit: https://github.com/ethereum/solidity/blob/cc79c91b93d322cd38beec08a1ce575d3622c7e4/liblangutil/Scanner.cpp#L972. Now, since we have Optional(Sequence([Atom("."),Fragment(DecimalDigits)])), we enforce that if there's a dot, it must have a decimal ahead; we must either consume the dot with digits after or nothing, following the first digit.

This allows us to lex 5 and then be able to parse .member() as the postfix member access operator, since otherwise consuming the dot and lexing 5. as a decimal literal would've made that impossible.

We can't modify the last rule that works across all versions, because <0.5.0 5.member() did not parse correctly exactly because the solc lexer consumed the dot, as per above, so we can't enable that behaviour prior to 0.5.0.

Does this make sense? Do you have a better idea how to express that behaviour more clearly?

Copy link
Collaborator

@OmarTawfik OmarTawfik Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks for explaining!
I think my concern here is that the new definition basically supersedes other definitions. In this case, what do you think of the following:

  • An integer (without a dot or a fraction) is enabled in all versions
    • stays the same
  • An integer and a dot (without a fraction) is enabled till "0.5.0"
    • add the fraction here and make it optional
  • An integer, a dot, and a fraction is enabled from "0.5.0"
    • add this new one (all three are required)
  • A dot and a fraction (without an integer) is enabled in all versions
    • stays the same
  • An integer, a dot, and a fraction is enabled in all versions
    • removed, as it is replaced by the second definition.

This has the additional benefit of keeping the specific scanners mutually exclusive, which is going to allow building a state machine for the scanner. Most of the TokenDefinitions in the DSL are built for this reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good in theory and I've tried it, but we still need to be able to optionally skip the (. DECIMAL_DIGITS). That is because the first rule has to have a negative lookahead for .; otherwise we end up accepting 5.member() as valid in < 0.5.0.

I've pushed a version that is hopefully more acceptable, since a bit more disjoint now. Whenever we move to compiling these down to a state machine, I'm happy to revisit the definitions 🤞

Copy link
Collaborator

@OmarTawfik OmarTawfik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one question above. Thanks!

@Xanewok Xanewok enabled auto-merge March 19, 2024 16:20
@Xanewok Xanewok added this pull request to the merge queue Mar 19, 2024
Merged via the queue into NomicFoundation:main with commit 70c9d7d Mar 19, 2024
1 check passed
@Xanewok Xanewok deleted the parse-numeric-methods branch March 19, 2024 16:33
@github-actions github-actions bot mentioned this pull request Mar 14, 2024
github-merge-queue bot pushed a commit that referenced this pull request Apr 12, 2024
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and publish to npm
yourself or [setup this action to publish
automatically](https://github.com/changesets/action#with-publishing). If
you're not ready to do a release yet, that's fine, whenever you add more
changesets to main, this PR will be updated.


# Releases
## @nomicfoundation/slang@0.14.0

### Minor Changes

- [#753](#753)
[`b35c763`](b35c763)
Thanks [@AntonyBlakey](https://github.com/AntonyBlakey)! - Add tree
query implementation as `Query::parse` and `Cursor::query`

- [#755](#755)
[`8c260fc`](8c260fc)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - support parsing
NatSpec comments

- [#908](#908)
[`ab3688b`](ab3688b)
Thanks [@Xanewok](https://github.com/Xanewok)! - Changed the
cst.NodeType in TS to use more descriptive string values rather than 0/1
integers

- [#886](#886)
[`0125717`](0125717)
Thanks [@Xanewok](https://github.com/Xanewok)! - Add
`TokenKind::is_trivia`

- [#887](#887)
[`dff1201`](dff1201)
Thanks [@Xanewok](https://github.com/Xanewok)! - Add support for
constant function modifier removed in 0.5.0

- [#885](#885)
[`a9bd8da`](a9bd8da)
Thanks [@Xanewok](https://github.com/Xanewok)! - Flatten the trivia
syntax nodes into sibling tokens

- [#908](#908)
[`ab3688b`](ab3688b)
Thanks [@Xanewok](https://github.com/Xanewok)! - Add
RuleNode/TokenNode::toJSON() in the TypeScript API

### Patch Changes

- [#801](#801)
[`ecbba49`](ecbba49)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - unreserve pragma
keywords in all versions

- [#869](#869)
[`951b58d`](951b58d)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - support dots in
yul identifiers from `0.5.8` till `0.7.0`

- [#890](#890)
[`1ff8599`](1ff8599)
Thanks [@Xanewok](https://github.com/Xanewok)! - Mark `override` as
being a valid attribute only after 0.6.0

- [#800](#800)
[`d1827ff`](d1827ff)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - support unicode
characters in string literals up to `0.7.0`

- [#797](#797)
[`86f36d7`](86f36d7)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - fix source
locations for unicode characters in error reports

- [#854](#854)
[`4b8970b`](4b8970b)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - parse line breaks
without newlines

- [#844](#844)
[`f62de9e`](f62de9e)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - fix parsing empty
`/**/` comments

- [#799](#799)
[`303dda9`](303dda9)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - prevent parsing
multiple literals under `StringExpression` before `0.5.14`

- [#847](#847)
[`6b6f260`](6b6f260)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - prioritize
parsing `MultiLineComment` over `MultiLineNatSpecComment`

- [#796](#796)
[`59e1e53`](59e1e53)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - add `public` and
`internal` to `UnnamedFunctionAttribute` till `0.5.0`

- [#756](#756)
[`e839817`](e839817)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - fix parsing
`payable` primary expressions

- [#851](#851)
[`67dfde8`](67dfde8)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - fix selection
order of prefix/postfix AST fields

- [#857](#857)
[`f677d5e`](f677d5e)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - rename
`FieldName` to `NodeLabel`

- [#852](#852)
[`ca79eca`](ca79eca)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - allow parsing
`ColonEqual` as two separate tokens before `0.5.5`

- [#889](#889)
[`ce5050f`](ce5050f)
Thanks [@Xanewok](https://github.com/Xanewok)! - Support `delete` as an
expression rather than a statement

- [#923](#923)
[`bb30fc1`](bb30fc1)
Thanks [@Xanewok](https://github.com/Xanewok)! - Support arbitrary ASCII
escape sequences in string literals until 0.4.25

- [#887](#887)
[`dff1201`](dff1201)
Thanks [@Xanewok](https://github.com/Xanewok)! - Support view and pure
function modifiers only from 0.4.16

- [#800](#800)
[`d1827ff`](d1827ff)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - rename
`AsciiStringLiteral` to `StringLiteral`

- [#838](#838)
[`ad98d1c`](ad98d1c)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - upgrade to rust
`1.76.0`

- [#849](#849)
[`5c42e0e`](5c42e0e)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - add `override`
and `virtual` to `ConstructorAttribute`

- [#862](#862)
[`5e37ea0`](5e37ea0)
Thanks [@Xanewok](https://github.com/Xanewok)! - allow call options as a
post-fix expression

- [#786](#786)
[`0bfa6b7`](0bfa6b7)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - support Yul label
statements before `0.5.0`

- [#839](#839)
[`2d698eb`](2d698eb)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - support string
literals in version pragmas

- [#891](#891)
[`70c9d7d`](70c9d7d)
Thanks [@Xanewok](https://github.com/Xanewok)! - Fix parsing
`<NUMBER>.member` member access expression

- [#842](#842)
[`2069126`](2069126)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - add `private` to
`UnnamedFunctionAttribute` till `0.5.0`

- [#840](#840)
[`7fb0d20`](7fb0d20)
Thanks [@OmarTawfik](https://github.com/OmarTawfik)! - allow `var` in
`TupleDeconstructionStatement` before `0.5.0`

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants