Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ANTLR 2.x with updated ANTLR 4 #3055

Closed
benfry opened this issue Jan 22, 2015 · 19 comments
Closed

Replace ANTLR 2.x with updated ANTLR 4 #3055

benfry opened this issue Jan 22, 2015 · 19 comments
Labels
enhancement help wanted We have very little time and would like some help

Comments

@benfry
Copy link
Contributor

benfry commented Jan 22, 2015

We're using a very old version of ANTLR. @fjenett did some work to move us to ANTLR 4, but there hasn't been enough testing (or someone committed to finishing that work) to include it in 3.x yet.

See also #3054, which is about updating the grammar (the java.g file) to support newer Java language features.

@benfry benfry added help wanted We have very little time and would like some help enhancement labels Jan 22, 2015
@benfry
Copy link
Contributor Author

benfry commented Jan 22, 2015

Link to Florian's earlier work: https://github.com/fjenett/processing-preprocessor-antlr4

Note that because this is so fundamental to what the PDE does, it needs a lot of testing and a committed maintainer (or at least longer than "hey this is working / incorporate it / bye"). We can't incorporate a "mostly working" set of code.

@JakubValtar
Copy link
Contributor

I just happen to have a working ANTLR4 prototype branch based on Florian's work with complementary Android branch I worked on in August... it was more of a proof of concept work, but tests are green and I can run Java and Android sketches from PDE. Needs update to current HEAD, new error reporting, better tests. API for other modes will need some thought.

https://github.com/JakubValtar/processing/tree/feature-antlr4
https://github.com/JakubValtar/processing-android/tree/feature-antlr4-android

It's a great responsibility and I have to consider if I will be able to do good enough job and maintain it. However, I can at least make it work in current version in the meantime.

@benfry
Copy link
Contributor Author

benfry commented Jan 24, 2015

Cool, thanks for looking into it. Keep us posted on how it evolves, and/or someone may also be able to help build on your work as you've done w/ Florian's.

Here's an excerpt from an email from Florian:

As a question remains how we should handle errors from the parser. ANTLR-4 comes with a nice listener interface that let's one subscribe to the parser. This would make error handling a lot cleaner and easier for us … but would mean that JavaBuild would need to implement the listener interface:
https://github.com/fjenett/processing/blob/antlr4-preproc/app/src/processing/mode/java/preproc/PdePreprocessor.java#L127
The compilation and runtime error handling needs some rethinking too. The way i generate the .java files leaves the headerOffset solution not functioning anymore as all in-between imports are moved to the beginning of the file. One way of solving this could be that i return a mapping between sketch and java lines so we could do a simple reverse lookup. That would make integration with PDE-X easier too i assume.
As a goodie i am using the parser to fish out the arguments from the size() call. Currently it is only printed to console but could easily be returned through the ProcessorResult …
Another one: if an api keyword is used as variable name an error will be generated. For example the "Function" sketch in the examples has "size" as a method parameter.
Empty sketches raise an error too. This is mainly because the parser can not handle empty sketches at the moment.

And some back-and-forth with Florian responding to Manindra about incorporating ANTLR 4 with the PDE X code (which is now being merged into the main mode):

One way of solving this could be that i return a mapping between sketch and java lines so we could do a simple reverse lookup. That would make integration with PDE-X easier too i assume.
This has been a long standing problem with PDE X. I've recently been able to achieve not just line matching, but also precise offset matching (character by character). It isn't 100% accurate, but works 95% of the time.

The ANTLR parser generates a full parse tree including line numbers and char-position-in-line information for each token. To generate Java from this i update the parse tree itself and then call getText() on it. So the updated parse tree does have the information we need to map Java back to the sketch code.

In fact, PDE X does a lot of preprocessing itself. PDE X takes the sketch code, converts it into compilable Java code(so that it can be checked for compilation errors) by making the following changes:

  • Removal of import statements and moving them to the top of the java source file (using regex)
  • Conversion of int(), char(), etc to PApplet.parseInt(), etc. (using regex)
  • Replacing '#' with 0xff for color representation (using regex)
  • Converts all 'color' datatypes to int (using regex)
  • Appends class declaration statement after determining the mode the sketch is in - ACTIVE or STATIC
  • Convert all decimal literals (4.5, 3.14, etc) by appending a f. (4.5f, 3.14f, etc) (using Eclipse JDT ASTVisitor api)
  • Add 'public' access specifier to all methods which don't have an access specifier. (using Eclipse JDT ASTVisitor api)
    Let me know if any of this could be of use to you.

My preprocessor does all these through a parse tree visitor. Let me finish adding it to the old Java mode and then we can have a look at how useful it is to the PDE-X. From my point of view it is much cleaner than the regex based parsing and having only one preprocessor in the PDE would be favourable too. Ideally we find a way to produce the JDT-AST directly from the ANTLR parse tree without having to re-parse the generated Java code. Or?

@JakubValtar
Copy link
Contributor

Yeah, that's also a possibility to update the code and hand it to someone else.

I think I already fixed some of the issues... I have to revisit my branch and make a list, it's been four months since I touched it. Also, it would be great to talk about the whole system later, what is PDE X doing and what should this preprocessor exactly do and what not.

@JakubValtar
Copy link
Contributor

It's code time! https://github.com/JakubValtar/processing/tree/feature-antlr-4.5

Merged with current master. Ant build works, tests are green, size() is parsed into sketchXxxx() methods, you can run sketches from PDE. No error reporting yet.

Edit: just found Florian's other repository. Will update my branch with improvements he made.

@benfry
Copy link
Contributor Author

benfry commented Feb 21, 2015

Cool, thanks for the update.

@benfry
Copy link
Contributor Author

benfry commented Mar 30, 2015

@JakubValtar what's your status on this?

@JakubValtar
Copy link
Contributor

@benfry Sorry about lack of updates, I was really busy recently. I'm finishing multiple projects this week, will have time to work on this next week.

@benfry
Copy link
Contributor Author

benfry commented Mar 30, 2015

Thanks; let me know how it goes.

@JakubValtar
Copy link
Contributor

@benfry Hi, several things got delayed and sadly I'm not able to work on this one before my graduation mid-June. It would be probably best to tackle this during the summer.

@clankill3r
Copy link

@JakubValtar Did you happen to find time yet?

@sampottinger
Copy link

Hey there! Just to keep threads updated... I started continuing @JakubValtar's work over at sampottinger#15 but, in addition to fast forwarding to current master, it uses #5753 as its base. I have a build working over there again though there are some specific errors that still need to be handled. I'll redirect sampottinger#15 from my fork to this repo after #5753.

@JakubValtar
Copy link
Contributor

Hey @sampottinger, I think the latest plan was to get rid of ANTLR altogether and use Eclipse JDT instead, because JDT keeps up with Java updates and it is easier to build custom features on top of it, whereas messing with ANTLR grammar definitions is something of a lost art.

JDT is already powering the live error checking and advanced features like Rename, Go to definition and Find all references and under all of that is JDT based PreprocessingService.java. Most of the building blocks needed are already there.

The problem now is JDT and ANTLR work in a different way and also report errors in a different way. You will have to figure out a way to fit the JDT based preprocessor into an ANTLR shaped hole.

@clankill3r
Copy link

clankill3r commented Mar 28, 2019 via email

@sampottinger
Copy link

sampottinger commented Mar 29, 2019

Hello @JakubValtar and @clankill3r! Thanks for your notes back. 😄


Inline Responses

I think the latest plan was to get rid of ANTLR altogether and use Eclipse JDT instead ... messing with ANTLR grammar definitions is something of a lost art.

I love it ha ha! This would simplify things quite a bit. Is there a place where that was discussed or was this started at some point?


under all of that is JDT based PreprocessingService.java. Most of the building blocks needed are already there.

Hum I could maybe use your help in understanding this... I see we do edits via SourceUtils prior to compilation presumably since the JDT only allows typical Java grammar. However, it looks like SourceUtils only supports regex. Is that true? If so, aren't some of the processing language features' preprocessing inherently context free? For example, I thought that was why PreprocessingService still requires ANTLR to find the sketchMode. Not sure if this is fair but is it possible that ANTLR-based pre-processing might be more reliable at least for some features?


The problem now is JDT and ANTLR work in a different way and also report errors in a different way.

Yeah I agree there might be some re-plumbing to do. I am curious though, while not necessarily justifying the retention of ANTLR, wouldn't the ANTLR message be preferred in some situations? I've noticed the JDT errors are sometimes poorly attributed in the case of grammatically incorrect input like below.

void setup() {
}

void draw() {
}

class Test {
  
  public int getInt(x) {
  }
  
}

I see multiple lines (some correct) flagged in JDT but ANTLR identifies the localized issue. (On my branch the user gets Syntax error. Hint: Issue with parameter near 'x)'? whereas on processing master using JDT one gets the right line with Error on variableDeclarator but it also incorrectly highlights a closing curly with Syntax error on '}' delete this).


Questions for the community
I would love to simplify the build pipeline. That said, my quick hot take:

  • The change to remove ANTLR might cut a bit deeper than upgrading it. Certainly we still should get rid of ANTLR if possible but I wonder if it make sense to complete Move to ANTLR 4 with Java 11 language features and localization of more syntax errors. sampottinger/processing#15 and then remove ANTLR in a subsequent PR? That might also get the features out into the public a bit faster.
  • Is ANTLR still better at catching and describing grammatical issues (in Processing context) compared to the JDT because the later can only evaluate in the context of Java grammar?
  • Just real fast... what about non-Java Processing users (python, ruby, etc?. Don't they require ANTLR as well? I suppose either way they get stuck with some work though in moving to 4.

Either way, thank you both very much! We've been missing some pretty (ahem) nice features for a few years now and I'm excited to hopefully get one option (ANTLR + JDT) or the other (JDT no ANTLR) in.

@JakubValtar
Copy link
Contributor

@clankill3r

So the first step would be to get rid of all ANTLR stuff?

Not sure I understand. We can't just remove it, because without a preprocessor you wouldn't be able to run your sketch. It needs to be replaced by a preprocessor with equal functionality.

Also what influence has this on the primitive color type?

JDT is aware of color type and handles it correctly. ANTLR 4 in this branch should handle it correctly as well.

@sampottinger You should talk with @benfry before you start working on this. Here is what I remember:

Reasons for removing ANTLR:

  • Both live error checker and build would run very similar code and give the same errors, which means we don't need to maintain two different preprocessors and classes to simplify the error messages. ANTLR 4 has different error messages than ANTLR we have now and will probably need a new set of simplifier rules.
  • Compared to ANTLR, it would take very little effort to update to a new Java version. If we update JDT jars the PreprocessingService and PDEX already work with Java 11. The only problem now is that you can't run the code, because it does not pass through ANTLR (expects Java 5, or Java 7 in this branch).
  • The grammar in this branch is for Java 7, it still needs to be updated. The latest I've seen was for Java 9, and it is not clear when there will be one for newer Java. JDT is probably going to be updated as long as Eclipse will exist. I'm not sure how active ANTLR community is, but there is a risk that there won't be a grammar for the new Java version.

Reasons for keeping ANTLR:

  • The ANTLR 4 grammar seems quite nice, much cleaner than ANTLR 2 grammar we have now. Sometimes it gives nicer error messages, It can also detect the sketch mode, do some rewriting and extract info during parsing. With JDT we have to do some preprocessing with manual parsing and regex, but most of this already works and is maintainable.

I see we do edits via SourceUtils prior to compilation presumably since the JDT only allows typical Java grammar.

That is correct, JDT can only parse Java. The idea is to first perform only the necessary edits to make the code parsable by JDT (move imports to the top, replace type constructors and hex literals, detect mode and wrap the sketch into a class). After that JDT can parse the code into AST and by walking the tree we can do more advanced changes (replace color type by int, turn 1.6 into 1.6f, add public to methods so you can write void setup() instead of public void setup()).

The build has some extra steps which need to happen, first three can be implemented by traversing AST, other two are trivial:

  • check if there is settings()
  • if not, extract certain calls from setup() and move them to generated settings()
  • parse size() parameters and check that they are constants
  • generate main()
  • add noLoop() after the end of setup() in static mode

However, it looks like SourceUtils only supports regex. Is that true?

SourceUtils are a set of methods for common preprocessing tasks. They use whatever is necessary to get the job done, it can be regex, AST visitor, or manual parsing. Most of them return edits for TextTransform, which keeps track of changes and gives you OffsetMapper to map between the Processing code and the Java code. Accurate mapping is needed for PDEX.

If so, aren't some of the processing language features' preprocessing inherently context free?

I'm not sure right now whether context-sensitive or context-free, but I don't think you can do all preprocessing with regex.

For example, I thought that was why PreprocessingService still requires ANTLR to find the sketchMode.

PreprocessingService is designed to be standalone and ready for new Java versions. The mode is detected manually before running ANTLR or JDT. If the top scope contains a method then it's active mode, if it contains a class then it's Java mode, otherwise it's static mode. The code which does that is pretty much standalone and can be moved to SourceUtils later (and it will be possible to simplify it by passing in code with scrubbed comments and literals).

Not sure if this is fair but is it possible that ANTLR-based pre-processing might be more reliable at least for some features? I am curious though, while not necessarily justifying the retention of ANTLR, wouldn't the ANTLR message be preferred in some situations?

It's possible that ANTLR can sometimes give a better error, but I think JDT is in a good spot – it received most of the work in the recent years. It can be further improved by better error message rewriting (the example you posted is not rewritten properly).

The change to remove ANTLR might cut a bit deeper than upgrading it. Certainly we still should get rid of ANTLR if possible but I wonder if it make sense to complete sampottinger#15 and then remove ANTLR in a subsequent PR? That might also get the features out into the public a bit faster.

It would enable most of the syntax changes up to Java 8, except lambdas. That's almost all the changes, but I'm still not sure if that's worth it. Build process is a central component which everyone has to use, and if something breaks there, you have a big problem. I wouldn't recommend doing any unnecessary changes there. We should probably run beta releases until the new build system stabilizes.

Is ANTLR still better at catching and describing grammatical issues (in Processing context) compared to the JDT because the later can only evaluate in the context of Java grammar?

No, the mapping is almost 1:1 and there are no errors introduced or lost. One thing which is still missing is mapping the error message contents back to Processing code (when color or type constructors are involved), so e.g. for problem with color type, it says int in the error message, for int() the error will say parseInt(). But it is fixable with some code, the mapping info is there, it was just not a priority yet. The positions of the errors are mapped though, so you still get the red squiggle under the problematic token.

Just real fast... what about non-Java Processing users (python, ruby, etc?. Don't they require ANTLR as well? I suppose either way they get stuck with some work though in moving to 4.

Android mode shares some of the infrastructure with Java mode, but other modes use their own build process and error checking.

Uf, that's a lot of text, but it's also a complicated issue. I hope it's useful! If you need some more info, feel free to ask me, I will try to get back to you as my time allows.

@sampottinger
Copy link

sampottinger commented Mar 31, 2019

Hey @JakubValtar! Thanks for writing back. Just to clarify for those following along, sampottinger#15 is a working (but still WIP) branch that supports Java 8 language features within the Processing IDE (including lambdas) that integrates with the JDT to produce error message properly in the editor. It is on my branch though because it is branched from changes waiting I have waiting for review in my other PR at #5753 (support for Java 11 / OpenJDK / OpenJFX).


Anyway... @JakubValtar, thanks so much for hanging with me here. Again, apologies if I'm missing some obvious stuff. Just come clarifying questions though...

Uf, that's a lot of text, but it's also a complicated issue. I hope it's useful! If you need some more info, feel free to ask me, I will try to get back to you as my time allows.

Thank you. I really really appreciate it.

It would enable most of the syntax changes up to Java 8, except lambdas.

Why not lambdas? sampottinger#15 does change the grammar to support it in ANTLR. Is there an issue elsewhere?

Build process is a central component which everyone has to use, and if something breaks there, you have a big problem.

I agree but the build system right now uses ANTLR and the JDT is just for the editor right? Wouldn't the removal of ANTLR to use SourceUtils in JavaBuild instead be a larger change than updating the ANTLR preprocessor already in place?

It's possible that ANTLR can sometimes give a better error, but I think JDT is in a good spot – it received most of the work in the recent years.

I agree that the JDT is pretty active. However, isn't ANTLR also under active development? Were you talking about the grammar here specifically or ANTLR in general?

No, the mapping is almost 1:1 and there are no errors introduced or lost.

On official master processing, do you not get incorrect lines highlighted for the example I posted?

Screen Shot 2019-03-30 at 6 07 25 PM

it can be regex, AST visitor, or manual parsing

I guess my thought here is that some of the features probably require a formalized grammar so are we stuck with ANTLR or ANTLR adjacent anyway?

Most of them return edits for TextTransform, which keeps track of changes and gives you OffsetMapper to map between the Processing code and the Java code. Accurate mapping is needed for PDEX.

Just curious, what makes these systems incompatible? In sampottinger#15, I actually removed use of SourceUtils and had the JDT use the output of the preprocessor but I modified PdeParseTreeListener to emit Edits.

The code which does that is pretty much standalone and can be moved to SourceUtils later (and it will be possible to simplify it by passing in code with scrubbed comments and literals).

I can take another crack at it but wouldn't this be easier with preprocessing generating the full AST?

ellipse(50, 50, 50, 50);
/**
 I pass in these like int ellipse(int x, int y, int width, int height)
 */
ellipse(10, 10, 50, 50);

@sampottinger
Copy link

sampottinger commented Apr 2, 2019

Hello there!

I know there's still some conversation ongoing and I respect if we decide to go another direction but, given that sometimes things schedule-wise get in the way of contributing code back, I went ahead and finished off sampottinger#15 just to provide a complete PR if we decide to take that path forward. Hopefully always good to have a PR ready to go? See sampottinger#15 for more details but it unifies the preprocessing mechanisms used in JavaBuild and PreprocessingService, adds additional tests, adds error rewriting for more types of syntax errors, and introduces localization to those syntax error rewrites (localization for ANTLR errors more specifically). This is of course after getting it caught up to master / #5753 and some edits in the grammar itself. As shown below, it supports the new language features like generic type inference and lambdas but also demonstrates having ANTLR emit edits that are interpretable by the JDT. I also provide English and Spanish localizations alongside some more limited support for the other languages.

If you are interested in trying it out, it is dependent on #5753 but you can preview what everything looks like (after sampottinger#15 and a few other PRs I have in flight) by going to my fork's master branch. Let me know what you think: https://github.com/sampottinger/processing!

Thanks so much for the conversation and help so far. I'll try to keep an eye on this thread for further comments. Pending discussion here, I'll can also try looking into the other direction (removing ANTLR and expanding use of SourceUtils). I also don't think any of those edits in sampottinger#15 precludes later steps. Still, really happy to take review on the current proposal! I think ANTLR could be viable especially in short term but also respect other opinions.

[Edit updated picture] 😛

Java 8 features along with example error message in Spanish processing IDE Second java error example

sampottinger added a commit to sampottinger/processing4 that referenced this issue Oct 6, 2019
Introduces ANTLR4 and Java 8 language feature support within IDE while also adding additional hooks for localization of syntax error messages, addressing processing/processing#3054 and processing/processing#3055.

The PR is broadly a continuation of processing/processing#3055, bringing it up to speed with the latest Processing master plus the changes introduced at processing/processing#5753. **Requires processing/processing#5753 as pre-requisite.** This introduces a number of edits beyond processing/processing#3055 beyond compatibility with current Processing master and processing/processing#5753 including:

 - Update to the grammar itself
 - Change ANTLR listeners to emit `TextTransform.Edit` to unify JDT-based `PreprocessingService` and `JavaBuild`, removing code with duplicate purpose.
 - Introduction of syntax error rewriting with support for localization.
 - Addition of complete localized strings set for English and Spanish.
 - Addition of partial localized strings set for other languages.
 - Refactor of ANTLR-related code for testability and readability
 - Expansion of tests including full parse tests for new Java features (type inference, lambdas).
sampottinger added a commit to sampottinger/processing4 that referenced this issue Oct 6, 2019
* Move to ANTLR 4 with Java 11 lang features and localization.

Introduces ANTLR4 and Java 8 language feature support within IDE while also adding additional hooks for localization of syntax error messages, addressing processing/processing#3054 and processing/processing#3055.

The PR is broadly a continuation of processing/processing#3055, bringing it up to speed with the latest Processing master plus the changes introduced at processing/processing#5753. **Requires processing/processing#5753 as pre-requisite.** This introduces a number of edits beyond processing/processing#3055 beyond compatibility with current Processing master and processing/processing#5753 including:

 - Update to the grammar itself
 - Change ANTLR listeners to emit `TextTransform.Edit` to unify JDT-based `PreprocessingService` and `JavaBuild`, removing code with duplicate purpose.
 - Introduction of syntax error rewriting with support for localization.
 - Addition of complete localized strings set for English and Spanish.
 - Addition of partial localized strings set for other languages.
 - Refactor of ANTLR-related code for testability and readability
 - Expansion of tests including full parse tests for new Java features (type inference, lambdas).

* Ask travis for ant upgrade prior to run.

* Ask Travis for java11 update.

* Add openjdk ppa

* Travis no confirmation on add ppa.

* Force newer ant on travis.

* Swtich ant download to www-us mirror.

* Switch ant to 1.10.7

* Start x for unit tests in travis.

* More complete start x in travis.

* Revert x in travis.

* Try x in services.
@benfry
Copy link
Contributor Author

benfry commented Jan 17, 2020

Fixed with benfry/processing4#1, further development is at https://github.com/processing/processing4

@benfry benfry closed this as completed Jan 17, 2020
@processing processing locked as resolved and limited conversation to collaborators Jan 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement help wanted We have very little time and would like some help
Projects
None yet
Development

No branches or pull requests

4 participants