Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-3403: Create and use ANTLR to parse IDL files #1588

Merged
merged 10 commits into from
Sep 11, 2023

Conversation

opwvhk
Copy link
Contributor

@opwvhk opwvhk commented Mar 8, 2022

What is the purpose of the change

Implement AVRO-3403: Migrate JavaCC to ANTLR

The implementation uses a shared ANTLR grammar, without actions or
predicates. This guarantees ANTLR can create usable parsers in any
language it supports. To implement IDL support for other languages, the
Java code to port is the class IdlReader.

The IDL reader using the ANTLR generated parser replaces the JavaCC
parser in the maven plugin and in the tools. The JavaCC parser has been
deprecated, but not removed.

Verifying this change

This change is huge, but consists roughly of these types of changes:

  1. A shared grammar; please verify if ANTLR generates valid code for your favourite language (I tested Java & Python)
  2. A parser in the IDLReader class in a new Java avro-idl module
  3. A lot of tests and some utilities copied from avro-compiler
  4. Using the new IDLReader class instead of the (now deprecated) IDL class

Note that reviews become a lot easier by reviewing per commit.

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@github-actions github-actions bot added build Java Pull Requests for Java binding labels Mar 8, 2022
@RyanSkraba RyanSkraba self-requested a review March 9, 2022 16:54
@opwvhk opwvhk force-pushed the AVRO-3403-migrate-JavaCC-to-ANTLR branch 2 times, most recently from 5d85cd1 to 36779cc Compare March 22, 2022 10:08
@opwvhk
Copy link
Contributor Author

opwvhk commented Mar 22, 2022

Added a change to the grammar (sorry for the way in which I did...) such that it the generated code can also work on Python.

Copy link
Member

@martin-g martin-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review yet the new IdlReader class and few related classes, mostly because it is big and I have zero experience with ANTLR. I will try do to it soon!

Apart from my comments there are also some CodeQL comments in the diff view which might be addressed.

@opwvhk opwvhk force-pushed the AVRO-3403-migrate-JavaCC-to-ANTLR branch from ce7b25f to 11fa739 Compare September 8, 2022 10:16
Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 14 potential problems in the proposed changes. Check the Files changed tab for more details.

@opwvhk opwvhk force-pushed the AVRO-3403-migrate-JavaCC-to-ANTLR branch from 0538c89 to 90694ab Compare November 8, 2022 12:41
@lmove
Copy link

lmove commented Nov 18, 2022

Hi! Our tool BreakBot found that this pull request introduces 3 breaking changes and they appear to impact 5 of your clients. You can find the full BreakBot report in our fork repository: report for PR#1588.

We hope this information is valuable to you, and apologize otherwise. If you're willing to help, we would kindly ask for your help to fill in a 5-minutes survey about the report. Your feedback will help us improve the tool and provide a better experience for users in the future!

@opwvhk
Copy link
Contributor Author

opwvhk commented Nov 18, 2022

The BreakBot report marks the @Deprecated annotations as breaking changes.

Set aside that these are not actually breaking changes, they were added on purpose to signal a migration is needed for future compatibility.

@tdegueul
Copy link

You are right. We include the impact of @Deprecated elements as it may give a better idea of which clients will be faced with a deprecated API. Thanks for your input @opwvhk!

Copy link
Member

@martin-g martin-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't spend too much time (the change list is huge!) but I didn't notice anything bad.

@opwvhk
Copy link
Contributor Author

opwvhk commented Nov 18, 2022

I didn't spend too much time (the change list is huge!) but I didn't notice anything bad.

Ah, yes... the tricky part about replacing the IDL parser is that that's unavoidable.

Thank you for taking the time though!

The subsequent PR's (#1589 & #1954) are significantly smaller 😅

Copy link
Contributor

@clesaec clesaec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit complex, but, as you said, unavoidable :-)
Why not put "Idl.g4" file in src/main/resources of Idl module ?
Need a rebase

@opwvhk
Copy link
Contributor Author

opwvhk commented Aug 3, 2023

Why not put "Idl.g4" file in src/main/resources of Idl module ?

Because this is a shared location, which allows the grammar to be reused for e g. a Python implementation.

The reason to migrate to ANTLR is partly because it is supported, but also because it supports multiple languages.

opwvhk and others added 5 commits August 9, 2023 09:58
This commit also copies several utilities from avro-compiler.
The build on Travis compiles files in a different order, causing the
tested new log lines to appear in the 'wrong' order. This removes these
new log lines.
Improved the code thanks to good review comments.
Integer/Long parsing now also allows the hexadecimal and octal syntax
that the grammar allows.
@opwvhk opwvhk force-pushed the AVRO-3403-migrate-JavaCC-to-ANTLR branch from 78ffe6c to b76d39d Compare August 10, 2023 09:19
@opwvhk
Copy link
Contributor Author

opwvhk commented Sep 11, 2023

After resolving the conflicts, I'd like to merge this one and move on to PR #1589 (making the IDL syntax a full .avsc equivalent). As there's 2 approvals other than mine, are there any objections to merge?

@opwvhk opwvhk merged commit cae0cb1 into apache:master Sep 11, 2023
13 checks passed
@opwvhk opwvhk deleted the AVRO-3403-migrate-JavaCC-to-ANTLR branch September 26, 2023 14:38
RanbirK pushed a commit to RanbirK/avro that referenced this pull request May 13, 2024
This large change encompasses a new avro-idl module, a shared ANTLR
grammar to allow implementations in other languages, and uses (copies
of) all existing IDL tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Java Pull Requests for Java binding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants