Skip to content

Changing Alex/Happy parser to a megaparsec-based one#401

Merged
rodrigogribeiro merged 20 commits into
mainfrom
megaparsec-parser
Jun 1, 2026
Merged

Changing Alex/Happy parser to a megaparsec-based one#401
rodrigogribeiro merged 20 commits into
mainfrom
megaparsec-parser

Conversation

@rodrigogribeiro

Copy link
Copy Markdown
Collaborator
  • Added test cases for the parser
  • Changing the syntax of ternary expression to usual C syntax, since megaparsec allows backtracking parsers.

@mbenke

mbenke commented May 25, 2026

Copy link
Copy Markdown
Collaborator

Forgotten semicolon in a class declaration leads to slightly misleading error message:

forall t.class t:Catenable {
  function cat(x:t) -> memory(bytes) 
}

<input>:7:3:
  |
7 |   function cat(x:t) -> memory(bytes) 
  |   ^
unexpected 'f'
expecting '}'

@mbenke

mbenke commented May 25, 2026

Copy link
Copy Markdown
Collaborator

This parses in a weird way (it's a parse error on main):

contract Pars {
function main() -> (){  let f:word; 42(); }
}

yields

$ esolc tmp/pars.solc  -g -s --dump-ast
tmp/pars.solc
> AST after name resolution for /home/ben/work/review/tmp/pars.solc
contract Pars {
   constructor () {
   }
   function main () -> () {
      let f : word ;
      42;
      ();
   }
}

@rodrigogribeiro

Copy link
Copy Markdown
Collaborator Author

contract Pars {
function main() -> (){ let f:word; 42(); }
}

It's weird because the same code piece returned:

contract Pars {
   function main () -> () {
      let f : word ;
      42;
      ();
   }
}

@rodrigogribeiro

Copy link
Copy Markdown
Collaborator Author

Forgotten semicolon in a class declaration leads to slightly misleading error message:

forall t.class t:Catenable {
  function cat(x:t) -> memory(bytes) 
}

<input>:7:3:
  |
7 |   function cat(x:t) -> memory(bytes) 
  |   ^
unexpected 'f'
expecting '}'

I've fixed it by improving the error message locally for signatures in class definitions. Now, it returns:

<input>:3:1:
  |
3 | }
  | ^
unexpected '}'
expecting "->" or ';' after function signature

@mbenke

mbenke commented May 29, 2026

Copy link
Copy Markdown
Collaborator

I think I understand the weird 42() behaviour now:

exprOrAssignP = do
  lhs <- expP
  choice
    [ do rhs <- equalsP *> expP; _ <- semicolon; return (Assign lhs rhs),
      do rhs <- symbol "+=" *> expP; _ <- semicolon; return (StmtPlusEq lhs rhs),
      do rhs <- symbol "-=" *> expP; _ <- semicolon; return (StmtMinusEq lhs rhs),
      StmtExp lhs <$ optional semicolon
    ]

The semicolon is optional so 42() gets parsed the same as 42; ();
This is probably done to parse Rust-like f(x:word) -> word { x+1 } right?

Perhaps we should change the syntax to f(x:word) -> word = x+1?

"lam",
"type",
"pragma"
]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be easier to read if they are sorted alphabetically

Comment on lines +44 to +49
[ do
_ <- symbol "."
entries <- braces (itemEntryP `sepBy` comma)
hids <- option [] hidingP
_ <- semicolon
return (ImportOnly path (SelectItems entries hids)),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be abstracted out and named to avoid repetitions and improve readability. Ditto for subsequent 'do's in importP

Comment thread src/Solcore/Frontend/Parser/Decl.hs Outdated
_ <- symbol "@"
lib <- identifier
_ <- char '.'
sc

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed? doesn't lexeme, hence also identifier consume space?

Comment thread src/Solcore/Frontend/Parser/Decl.hs Outdated
Comment thread src/Solcore/Frontend/Parser/Decl.hs Outdated
Comment thread src/Solcore/Frontend/Parser/Decl.hs
Comment thread src/Solcore/Frontend/Parser/Decl.hs
Comment thread src/Solcore/Frontend/Parser/Decl.hs Outdated
Comment thread src/Solcore/Frontend/Parser/SolcoreTypes.hs Outdated
Comment thread src/Solcore/Frontend/Parser/SolcoreTypes.hs Outdated
Comment thread src/Solcore/Frontend/Parser/SolcoreTypes.hs Outdated
Comment thread src/Solcore/Frontend/Parser/SolcoreTypes.hs Outdated

@mbenke mbenke left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done, the suggestions are mostly code golf, feel free to ignore the ones you do not like and keep the ones you do.

The 42() corner case is probably not worth the effort now, but worth keeping in mind when we redesign the syntax.

Once #415 is merged, it will need to be reflected in the new parser.

Comment thread src/Solcore/Frontend/Parser/Stmt.hs Outdated
Comment thread src/Solcore/Frontend/Parser/Stmt.hs Outdated
Comment thread src/Solcore/Frontend/Parser/Stmt.hs Outdated
@rodrigogribeiro

Copy link
Copy Markdown
Collaborator Author

The semicolon is optional so 42() gets parsed the same as 42; (); This is probably done to parse Rust-like f(x:word) -> word { x+1 } right?

Exactly!

Perhaps we should change the syntax to f(x:word) -> word = x+1?

We could discuss when deciding the language syntax.

rodrigogribeiro and others added 16 commits June 1, 2026 06:21
merging

update cabal

Fixing the .gitignore

removing unnecessary comments
* Improving error messages in class def parsing
* Adding test cases.
Co-authored-by: Marcin Benke <marcin.benke@argot.org>
Co-authored-by: Marcin Benke <marcin.benke@argot.org>
Co-authored-by: Marcin Benke <marcin.benke@argot.org>
Co-authored-by: Marcin Benke <marcin.benke@argot.org>
Co-authored-by: Marcin Benke <marcin.benke@argot.org>
Co-authored-by: Marcin Benke <marcin.benke@argot.org>
Co-authored-by: Marcin Benke <marcin.benke@argot.org>
Co-authored-by: Marcin Benke <marcin.benke@argot.org>
Co-authored-by: Marcin Benke <marcin.benke@argot.org>
Co-authored-by: Marcin Benke <marcin.benke@argot.org>
Co-authored-by: Marcin Benke <marcin.benke@argot.org>
Co-authored-by: Marcin Benke <marcin.benke@argot.org>
Co-authored-by: Marcin Benke <marcin.benke@argot.org>
@rodrigogribeiro rodrigogribeiro merged commit 782bc13 into main Jun 1, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants