Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frontend parsing breakdown #4131

Open
gtrepta opened this issue Mar 25, 2024 · 8 comments
Open

Frontend parsing breakdown #4131

gtrepta opened this issue Mar 25, 2024 · 8 comments
Assignees

Comments

@gtrepta
Copy link
Contributor

gtrepta commented Mar 25, 2024

(WIP)

Kompile calls parseDefinition, which calls out to the DefinitionParsing class that handles all the frontend outer/inner parsing. Its first step is to do outer parsing, things it does here that are noteworthy:

  • A parsing utility method called slurp is used to parse the modules in the definition file, resolve file paths in requires clauses, and recursively slurp those files into the definition.
  • Automatically parses and adds prelude.md to the definition unless --no-prelude was passed
  • Resolves the main/syntax modules, throwing an error if it can't do that
  • Checks for modules with duplicate names and throws an error if they aren't the same module (a module could somehow be parsed in twice, for instance)
  • Checks user list declarations for proper structure
  • Checks bracket productions for proper structure as well
  • Convert user lists into kore representable productions
  • Regex terminals are converted, with special handling of lookahead/lookbehind assertions.
  • Syntax priority/associativity sentences are generated from priority blocks, and production attributes.
  • Temporarily add sort declarations for configuration cells and fragments that are possibly being used as nonterminals in productions.
  • The modules are flat (imports in the module are just import sentences) by this point, they get resolved into kore modules where the imports are modules themselves.
  • Syntax synonyms get resolved (ie. if you have syntax Num = Int, then any productions with Num nonterminals become Int)
  • Trims any modules from the definition that can't be reached through the main module's imports. Also includes some builtin modules that are/were necessary (MAP, K-REFLECTION, STDIN-STREAM, STDOUT-STREAM. K-IO if --coverage is used)

At this point it's time to do inner parsing. This is done in two steps, first on the configuration declarations and then on the rest of the definition.

Briefly, Resolving the configuration bubbles does a few things:

  • Adds the DEFAULT-CONFIGURATION to the main module's imports if it doesn't have it's own configuration declaration
  • Adds the MAP module to the main module's imports as well, as it's necessary for configuration initialization. (An error will be thrown if it can't somehow find the MAP module in the definition!)
  • Parses the configuration bubbles across the entire definition.
  • Adds the parsed configurations to the modules in the definition.
  • Checks for duplicate K cells or usage of reserved cell names.
  • Adds the syntax/rules associated with the parsed configuration to the modules.
@Baltoli
Copy link
Collaborator

Baltoli commented Mar 26, 2024

I don't think I can find any real code that's actually using the sort synonym feature: https://github.com/search?q=org%3Aruntimeverification+%2Fsyntax.*%3D%2F+path%3A*.k+OR+path%3A*.md+NOT+%2Flexical%2F+NOT+%2F%3A%3A%3D%2F&type=code

@dwightguth @ehildenb the feature was added without context here (#791). Any idea whether this is something we need to keep supporting?

@Baltoli
Copy link
Collaborator

Baltoli commented Mar 26, 2024

(also - this is great stuff @gtrepta, thanks for putting it together!)

@Baltoli
Copy link
Collaborator

Baltoli commented Mar 26, 2024

Related: #1278

@Baltoli
Copy link
Collaborator

Baltoli commented Mar 26, 2024

Synonyms were historically used in KEVM to select different implementations of a sort depending on what backend you're on; it's not a lot of code and seems like a useful feature so we should document this and keep the code.

@Baltoli Baltoli assigned Baltoli and gtrepta and unassigned Baltoli Mar 26, 2024
@Baltoli
Copy link
Collaborator

Baltoli commented Mar 26, 2024

There are some subtle differences between how Pyk and the Java frontend handle outer parsing; @tothtamas28 has documented these - please link the relevant PRs here for @gtrepta

@gtrepta
Copy link
Contributor Author

gtrepta commented Mar 27, 2024

Just updated the outer parsing section with more details. Notably there are a couple of structural checks and generation of syntax priority/associativity sentences from syntax declarations.

@tothtamas28
Copy link
Contributor

There are some subtle differences between how Pyk and the Java frontend handle outer parsing

Known differences are the following.

Parser

  • UserList is not a ProductionItem but a ProductionLike (i.e. something that can inhabit a PriorityBlock). Reason: UserList is not allowed to appear with other ProductionItem-s.

Lexer

@gtrepta
Copy link
Contributor Author

gtrepta commented Mar 29, 2024

Another difference I've noticed is that the pyk parsed definition does not contain any source location information. I've opened an issue on the pyk repo for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants