Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make ANTLR3 produce Reproducible output #209

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

hboutemy
Copy link

@hboutemy hboutemy commented Aug 8, 2022

as found while rebuilding projects using ANTLR3 (including ANTLR3 itself), there are non-reproducible outputs at 2 levels:

├── org/antlr/grammar/v3/ANTLRv3Parser.java
│ @@ -1,8 +1,8 @@
│ -// $ANTLR 3.5.2 org/antlr/grammar/v3/ANTLRv3.g 2022-08-08 12:16:39
│ +// $ANTLR 3.5.2 org/antlr/grammar/v3/ANTLRv3.g 2022-08-08 12:17:56
│
│      package org.antlr.grammar.v3;
│
│
│  import org.antlr.runtime.*;
│  import java.util.Stack;
│  import java.util.List;
│ @@ -427,15 +427,15 @@
│  				cnt7++;
│  			}
│
│  			EOF12=(Token)match(input,EOF,FOLLOW_EOF_in_grammarDef489); if (state.failed) return retval;
│  			if ( state.backtracking==0 ) stream_EOF.add(EOF12);
│
│  			// AST REWRITE
│ -			// elements: DOC_COMMENT, attrScope, optionsSpec, rule, tokensSpec, action, id
│ +			// elements: action, tokensSpec, attrScope, rule, optionsSpec, DOC_COMMENT, id
│  			// token labels:
│  			// rule labels: retval
│  			// token list labels:
│  			// rule list labels:
│  			// wildcard labels:
│  			if ( state.backtracking==0 ) {
│  			retval.tree = root_0;

this PR fixes the 2 issues in 2 separate commits:

  • removes the timestamp from generated sources
  • ensure order or rewritten AST by sorting

Signed-off-by: Hervé Boutemy <hboutemy@apache.org>
Signed-off-by: Hervé Boutemy <hboutemy@apache.org>
@parrt
Copy link
Member

parrt commented Aug 26, 2022

Is it removing the timestamp going to break any codes on it? I agree it was a dumb idea but I'm afraid to change it now. haha

@hboutemy
Copy link
Author

perhaps adding an option for users to choose if they want this timestamp or not is a better choice (like JAXB that provides -no-header option for such case)

@parrt
Copy link
Member

parrt commented Sep 16, 2022

I'm generally opposed to options, I'm afraid. In this case it's a fairly heavy change just to get rid of this date output, which in retrospect was definitely a mistake on my part. I agree that the should be reproducible but I'm not sure risking backward compatibility is worth it.

I do know that some companies simply remove that line using their build tools. Is this possible with maven?

@hboutemy
Copy link
Author

hboutemy commented Oct 8, 2022

yes, we do it with maven-replacer-plugin
I just added the workaround for ANTLR4 antlr/antlr4@270f13b

@hboutemy
Copy link
Author

hboutemy commented Oct 8, 2022

one question: there are 2 sources of non-reproducible bit

  1. the timestamp; that we can postprocess with maven-replacer-plugin
  2. the AST rewrite elements, like
			// AST REWRITE
-			// elements: stringArg, multiAttrFunction
+			// elements: multiAttrFunction, stringArg

fixed by the sorting in the commit on file tool/src/main/antlr3/org/antlr/grammar/v3/CodeGenTreeWalker.g

is it possible to fix the reproducibility issue for the elements, please? This would reduce the places where we need to postprocess

@greedy
Copy link

greedy commented Oct 24, 2022

Instead of totally dropping the timestamp, antlr could be updated to support SOURCE_DATE_EPOCH for the build-time-stamp. This should leave any existing use-cases for the build-stamp undisturbed but make it easy for users to opt-in to a deterministic mode using this standard.

I have also observed non-determinism in the order of methods in the "Delegated rules" section of the generated parser.

Other hazards from looking through the templates and the code:

  • throwsSpec is a HashSet<String>
  • getAllImportedRules uses a HashSet<Rule>
  • altToRuleRefMap and altToTokenRefMap are populated with HashMap<String, _> which are used to populate the result of getAllRuleRefsInAltsWithRewrites which is then used in the AST template

My audit is far from complete though.

Things to ponder: One could replace all HashSets and HashMaps by their Linked versions but that seems like a rather heavy hammer. It would also be nice if the collections that needed to have deterministic iteration had a distinguishing static type to avoid re-introducing problems.

@hboutemy
Copy link
Author

hboutemy commented Nov 6, 2022

@parrt elements order commit extracted to a separate PR: see #213
is it possible to merge this one, please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants