Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feedback on the ANTLR grammer #6173

Open
Beakerboy opened this issue Oct 31, 2023 · 8 comments
Open

Feedback on the ANTLR grammer #6173

Beakerboy opened this issue Oct 31, 2023 · 8 comments
Labels
support Whether you're using Rubberduck or you've forked it and have questions, never hesitate to ask!

Comments

@Beakerboy
Copy link

Beakerboy commented Oct 31, 2023

I’m using ANTLR to parse VBA files and see Rubberduck cited on the most recent version of the grammer file.

When exporting a Form object from Excel VBA, the header looks like this:

VERSION 5.00
Begin {C62A69F0-16DC-11CE-9E98-00AA00574A4F} Login 
   Caption         =   "Please Log In"
   ClientHeight    =   1920
   ClientLeft      =   120
   ClientTop       =   465
   ClientWidth     =   2295
   OleObjectBlob   =   "Login.frx":0000
   StartUpPosition =   1  'CenterOwner
End

This file fails to parse on three places:

  • The current grammer file mandates a CLASS after the version number
  • There cannot be anything after the BEGIN
  • Variables have to be assigned a literal, so literal COLON literal is not allowed

I have an updated grammer file that I was hoping to get some feedback on. Let me know if there is a reason the main file should not be updated with my changes.

I’ve:

  • made CLASS optional
  • Allowed for an optional GUID followed by an identifier
  • Allowed for an optional COLON followed by a literal

i Imagine the “real” changes will need to be more nuanced than I’ve made it. For example

  • should the grammer maybe be ’5.00’ | (DOUBLELITERAL WS CLASS)
  • Should STRINGLITERAL COLON DIGIT DIGIT DIGIT DIGIT be considered a separate literal type?
@Beakerboy Beakerboy added the support Whether you're using Rubberduck or you've forked it and have questions, never hesitate to ask! label Oct 31, 2023
@Vogel612
Copy link
Member

We're doing some preprocessing steps and I don't remember off the top of my head whether we actually parse exported Form components. There is basically multiple different coexisting gestalts of VBA code, not all of which we target (i.e. we have a separate grammar for dealing with preprocessor directives and we parse both exported as well as 'code pane' code in Rubberduck).

@MDoerner probably has smarter things to say about this than me :)

@MDoerner
Copy link
Contributor

MDoerner commented Nov 1, 2023

We actually only parse the form headers in order not to fail on them. Afterwards, we ignore that part of the parse tree.

The current grammar's rules around forms are purely based on examples I have seen as there does not seem to be a specification anywhere.

If you want, you can open a PR for the changes. However, if this too much hassle for you, I could also incorporate the changes and see whether our tests still pass.

@Beakerboy
Copy link
Author

if this too much hassle for you

Not at all. I was planning on doing that, but thought I’d start the conversation here beforehand. There is a separate Visual Basic 6 grammar that seems to have the changes I was planning on making. Is that to be expected?

@MDoerner
Copy link
Contributor

MDoerner commented Nov 1, 2023

Oh, now I realize that you referred to the VBA grammar on the Antlr repo. I think we provided that at some point. However, our grammar at https://github.com/rubberduck-vba/Rubberduck/blob/next/Rubberduck.Parsing/Grammar/VBAParser.g4 has evolved since.

Two bugs in there were actually only fixed last month.

@Beakerboy
Copy link
Author

Nice! I didn’t notice a separate grammar within this repository. Should the latest copy be sent over to the ANTLR group? I noticed there were some recent changes over there to fix bugs with line continuation and line numbering.

@MDoerner
Copy link
Contributor

MDoerner commented Nov 1, 2023

By the way, our grammar assumes that precompiler directives are already taken care of. To do that we first parse with our precompiler parser https://github.com/rubberduck-vba/Rubberduck/blob/next/Rubberduck.Parsing/Preprocessing/VBAConditionalCompilationParser.g4, evaluate which code is dead and then set the corresponding tokens to hidden before reusing the token stream for the actual parse.

@MDoerner
Copy link
Contributor

MDoerner commented Nov 1, 2023

Hm, one could see whether one can merge enhancements in.

@Beakerboy
Copy link
Author

I’m just dealing with Linting raw VBA code that is exported from Excel without Rubberduck, so I don’t think the precompiler directives will come into play.

I just published a github action to lint VBA code as part of a CI / CD process. I also created a github organization called VBA-actions, so if any of you have any actions you would like to publish separately from the Rubberduck project, let me know and I can add you to the org.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
support Whether you're using Rubberduck or you've forked it and have questions, never hesitate to ask!
Projects
None yet
Development

No branches or pull requests

3 participants