Parsing recovery #624

arifogel · 2017-10-30T05:33:25Z

Fixes Better recovery for parsers and lexers #610
For line-based configs, invalid lines are thrown out and recorded as unrecognized lines.
Failure should not be possible for such grammars except in severe pathological circumstances
Recovery can be disabled altogether with -disableunrecognized flag, yielding legacy-ish behavior
Recovery can trigger graceful failure with -urf false flag, which has no effect when -disableunrecognized is set.
Currently should work for following grammars:
- Cisco
- FlatJuniper
- FlatVyos
- Iptables
- EosRoutingTable
- IosRoutingTable
- NxosRoutingTable

This change is

dhalperi · 2017-10-30T17:45:23Z

Pretty tough to review as-is.

Expect to see unit tests of the code in various new classes involved in recovery (not end-to-end, unit tests of the functions themselves).

Reviewed 41 of 64 files at r1.
Review status: 41 of 64 files reviewed at latest revision, 27 unresolved discussions.

projects/pom.xml, line 200 at r1 (raw file):

              <goals><goal>add-test-source</goal></goals>
              <configuration>
                <sources><source>${project.build.directory}/generated-test-sources/antlr4</source> </sources>

Drop space?

projects/pom.xml, line 200 at r1 (raw file):

              <goals><goal>add-test-source</goal></goals>
              <configuration>
                <sources><source>${project.build.directory}/generated-test-sources/antlr4</source> </sources>

We should version the plugin in the top-level module, but only add antlr4 test sources in the module(s) that have them. just batfish?

projects/suppressions.xml, line 7 at r1 (raw file):

    "http://www.puppycrawl.com/dtds/suppressions_1_0.dtd">

<suppressions>

Explicitly listing the main/src and test/src seems a cleaner/simpler solution. See, e.g., https://github.com/spotify/dockerfile-maven/pull/60/files#diff-600376dffeb79835ede4a0b285078036

projects/batfish/pom.xml, line 113 at r1 (raw file):

          <!-- Test grammars -->
          <execution>
            <id>antlr4-test-simple</id>

Add <phase>generate-test-sources</phase>, I think link. That way the test sources will not be generated unless tests are to be run.

projects/batfish-common-protocol/src/main/java/org/batfish/common/CompositeBatfishException.java, line 58 at r1 (raw file):

  }

  public BatfishException asSingleException() {

Note to self: understand why this is here.

Add javadoc?

projects/batfish-common-protocol/src/main/java/org/batfish/common/util/CommonUtil.java, line 336 at r1 (raw file):

  public static <E, T extends E, E1 extends T, E2 extends T, S extends Set<E>>
      Set<E> immutableDifference(Set<E1> set1, Set<E2> set2, Supplier<S> setConstructor) {
    return Collections.unmodifiableSet(difference(set1, set2, setConstructor));

Any reason not to use Guava Sets for these? It already returns an unmodifiable view, lazily evaluated. You could also use ImmutableSet.copyOf(Sets.difference(set1, set2)) if there's worry that set1/set2 might change underneath you.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 1 at r1 (raw file):

package org.batfish.grammar;

I'm not precisely sure of the "mode" in which to review this. Clearly I am missing a lot of ANTLR background.

Should I expect implementation comments in this file to give me an overview of the necessary context? Should I expect the reader to be an expert on ANTLR?

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 23 at r1 (raw file):

import org.antlr.v4.runtime.tree.ParseTree;

public class BatfishANTRLErrorStrategy extends DefaultErrorStrategy {

Top-level javadoc would be helpful as to what this class does from the public API context. What types of errors can it recover and how does it do so?

Also a (non-javadoc) block comment as to how it does it would be useful for those reading the code.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 29 at r1 (raw file):

  private int _separatorToken;

  public BatfishANTRLErrorStrategy(

Javadoc - what are these parameters for?

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 30 at r1 (raw file):

  public BatfishANTRLErrorStrategy(
      int separatorToken, String minimumRequiredSeparatorText, String text) {

Inferring from the code, it looks like the first two parameters are specific to the strategy and the third is the actual text being parsed? I'd probably separate these into two different classes / functions for clarity and reusability.

F = new BatfishErrorStrategy(separatorToken, minimumRequiredSeparatorText);
E = F.forText(text);

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 32 at r1 (raw file):

      int separatorToken, String minimumRequiredSeparatorText, String text) {
    _lines =
        Collections.unmodifiableList(

ImmutableList.copyOf?

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 53 at r1 (raw file):

      String separator = separatorToken.getText();

      // Get the line number and separator text from the separator token

Comment seems wrong -- copy and paste of the one 5 lines up?

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 59 at r1 (raw file):

      recognizer.consume();

      nextToken = recognizer.getInputStream().LA(1);

?

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 91 at r1 (raw file):

  }

  private Token recover(Parser recognizer) {

what does this do? Expected pre-post conditions? etc.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 127 at r1 (raw file):

  }

  public void recoverInCurrentNode(Parser recognizer) {

javadoc on all public functions

(Without looking elsewhere to see how it's used:) does this need to be public?

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishCombinedParser.java, line 73 at r1 (raw file):

    _parser.initErrorListener(this);
    _parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
    if (!settings.getDisableUnrecognized()) {

? - don't get this.

comment?

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishGrammarErrorListener.java, line 36 at r1 (raw file):

  public void reportContextSensitivity(
      Parser arg0, DFA arg1, int arg2, int arg3, int arg4, ATNConfigSet arg5) {}

also delete blank line?

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishLexer.java, line 15 at r1 (raw file):

  private BatfishCombinedParser<?, ?> _parser;

  private BatfishLexerRecoveryStrategy _recoveryStrategy;

@Nullable

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishLexer.java, line 25 at r1 (raw file):

  }

  public BatfishLexerRecoveryStrategy getRecoveryStrategy() {

@Nullable

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishLexer.java, line 62 at r1 (raw file):

  }

  public void setRecoveryStrategy(BatfishLexerRecoveryStrategy recoveryStrategy) {

is this @Nullable?

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishLexerRecoveryStrategy.java, line 12 at r1 (raw file):

import org.batfish.common.util.CommonUtil;

public class BatfishLexerRecoveryStrategy {

javadoc

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishLexerRecoveryStrategy.java, line 29 at r1 (raw file):

  private static Set<Integer> whitespaceAndNewlines() {
    return CommonUtil.immutableUnion(whitespace(), newlines(), HashSet::new);

ImmutableSet.builder().addAll(whitespace()).addAll(newlines()).build(); ? (guessed at that syntax btw)

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishParserATNSimulator.java, line 8 at r1 (raw file):

import org.antlr.v4.runtime.atn.ParserATNSimulator;

public class BatfishParserATNSimulator extends ParserATNSimulator {

Javadoc. What does this do that's different than the default?

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishRecognitionException.java, line 8 at r1 (raw file):

import org.antlr.v4.runtime.Recognizer;

public class BatfishRecognitionException extends RecognitionException {

Looks like this adds nothing. Why?

final class?

Javadoc?

projects/batfish/src/main/antlr4/org/batfish/grammar/cisco/Cisco_bgp.g4, line 858 at r1 (raw file):

   { !_disableUnrecognized }?

   unrecognized_line

we now support all lines here? or?

projects/batfish/src/main/antlr4/org/batfish/grammar/flatjuniper/FlatJuniperParser.g4, line 120 at r1 (raw file):

s_vlans
:
   VLANS

I expected to see more changes to Juniper parsers to go with this change. What am I missing?

test_rigs/parsing-errors-tests/recovery-no-verbose.ref, line 13 at r1 (raw file):

        "as1r1" : {
          "Red flags" : {
            "1" : "MISCELLANEOUS: Unrecognized Line: 4:  ip address 10.12.11.1 255.255.255.0aaaaa:wq\nLINES BELOW LINE  ip address 10.12.11.1 255.255.255.0aaaaa:wq ARE UNLIKELY TO BE PROCESSED CORRECTLY",

"are unlikely to" vs "may not be"?

Comments from Reviewable

arifogel · 2017-10-30T18:16:39Z

Fair complaint. After adding javadocs I will be in a better position to address and add more granular tests.

Review status: 41 of 64 files reviewed at latest revision, 27 unresolved discussions.

projects/pom.xml, line 200 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

Drop space?

Will address.

projects/pom.xml, line 200 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

We should version the plugin in the top-level module, but only add antlr4 test sources in the module(s) that have them. just batfish?

Sure OK. Yes. Will address.

projects/suppressions.xml, line 7 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

Explicitly listing the main/src and test/src seems a cleaner/simpler solution. See, e.g., https://github.com/spotify/dockerfile-maven/pull/60/files#diff-600376dffeb79835ede4a0b285078036

I tried something similar but it didn't work. There is a slight difference at your link, though, so I can try that and get back to you.

projects/batfish/pom.xml, line 113 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

Add <phase>generate-test-sources</phase>, I think link. That way the test sources will not be generated unless tests are to be run.

Makes sense. Will address.

projects/batfish-common-protocol/src/main/java/org/batfish/common/CompositeBatfishException.java, line 58 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

Note to self: understand why this is here.

Add javadoc?

Actually not sure if I ended using this. But it is still a useful helpful function. I will add javadoc, and that should become clear.

projects/batfish-common-protocol/src/main/java/org/batfish/common/util/CommonUtil.java, line 336 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

Any reason not to use Guava Sets for these? It already returns an unmodifiable view, lazily evaluated. You could also use ImmutableSet.copyOf(Sets.difference(set1, set2)) if there's worry that set1/set2 might change underneath you.

My version allows you to specify the underlying set implementation. If there is a way to do that with Guava library functions, I'm happy to switch. I will investigate.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 1 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

I'm not precisely sure of the "mode" in which to review this. Clearly I am missing a lot of ANTLR background.

Should I expect implementation comments in this file to give me an overview of the necessary context? Should I expect the reader to be an expert on ANTLR?

I'll see what I can come up with in terms of documentation, but I hope not to explain the whole ANTLR recognition process in docs in this file.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 23 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

Top-level javadoc would be helpful as to what this class does from the public API context. What types of errors can it recover and how does it do so?

Also a (non-javadoc) block comment as to how it does it would be useful for those reading the code.

I'll see what I can do here.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 29 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

Javadoc - what are these parameters for?

Will add.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 30 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

Inferring from the code, it looks like the first two parameters are specific to the strategy and the third is the actual text being parsed? I'd probably separate these into two different classes / functions for clarity and reusability.
F = new BatfishErrorStrategy(separatorToken, minimumRequiredSeparatorText);
E = F.forText(text);

The text is used by the strategy to create ErrorNodes including the whole line being thrown out. Without access to the raw text, it would be complicated, error prone, and expensive to recover this text.
I'm not sure if this addresses your comment at all, but in any case I am unclear on the pattern you are suggesting.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 32 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

ImmutableList.copyOf?

Why copy it? No one else has a reference to it, so it seems like wasted work.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 53 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

Comment seems wrong -- copy and paste of the one 5 lines up?

You are correct. Will address.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 59 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

?

Are you asking for a comment? If so I will provide.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 91 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

what does this do? Expected pre-post conditions? etc.

Will document.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 127 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

javadoc on all public functions

(Without looking elsewhere to see how it's used:) does this need to be public?

It is used elsewhere.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishCombinedParser.java, line 73 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

? - don't get this.

comment?

If we are not doing recovery, none of the recovery infrastructure needs to be supplied (it creates overhead).

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishGrammarErrorListener.java, line 36 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

also delete blank line?

What blank line? I'll take a look in code in case the review system is hiding for me.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishLexer.java, line 15 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

@Nullable

Will do.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishLexer.java, line 25 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

@Nullable

Will do.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishLexer.java, line 62 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

is this @Nullable?

That's a philosophical question. If you set it, presumably you want it to do something. But since the field is not required to be non-null, I don't suppose there is any specific harm for calling it Nullable here.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishLexerRecoveryStrategy.java, line 12 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

javadoc

Will do.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishLexerRecoveryStrategy.java, line 29 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

ImmutableSet.builder().addAll(whitespace()).addAll(newlines()).build(); ? (guessed at that syntax btw)

Will address contingent on decision for my earlier comment about Guava set functions.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishParserATNSimulator.java, line 8 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

Javadoc. What does this do that's different than the default?

Will add

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishRecognitionException.java, line 8 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

Looks like this adds nothing. Why?

final class?

Javadoc?

It's just meant to be a general Recognition exception to be caught by the Parser. Unfortunately this one is hard to understand without seeing generated parser code. I will add javadoc.

projects/batfish/src/main/antlr4/org/batfish/grammar/cisco/Cisco_bgp.g4, line 858 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

we now support all lines here? or?

No special handling of unrecognized lines needed, as this whole PR generalizes it for all parsers of this kind.

projects/batfish/src/main/antlr4/org/batfish/grammar/flatjuniper/FlatJuniperParser.g4, line 120 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

I expected to see more changes to Juniper parsers to go with this change. What am I missing?

The intentional blank was supposed to go after the VLANS keyword. But the only reason I would have put a blank there is if an apply-groups was also valid there. The apply rule allows a blank, or apply-groups, or apply-groups-except, so it covers all needs here.

test_rigs/parsing-errors-tests/recovery-no-verbose.ref, line 13 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

"are unlikely to" vs "may not be"?

After this PR should be "may not be". I wil change.

Comments from Reviewable

corinaminer · 2017-10-30T18:18:47Z

Review status: 41 of 64 files reviewed at latest revision, 4 unresolved discussions.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 1 at r1 (raw file):

package org.batfish.grammar;

Class name has ANTRL instead of ANTLR.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 119 at r1 (raw file):

      recognizer.consume();
      return createErrorNode(recognizer, parent, lineIndex, separator);
    }

These if statements are opaque to me. Would definitely help to have a block comment outlining the three cases and what we want to do with the parse tree in each case.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 197 at r1 (raw file):

           */
          throw new InputMismatchException(recognizer);
        }

I don't understand why this block in the switch is necessary, since the block after case ATNState.STAR_LOOP_BACK: is identical.

Comments from Reviewable

arifogel · 2017-10-30T18:23:04Z

Review status: 41 of 64 files reviewed at latest revision, 4 unresolved discussions.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 1 at r1 (raw file):

Previously, corinaminer (Corina Miner) wrote…

Class name has ANTRL instead of ANTLR.

Good catch. Will fix.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 119 at r1 (raw file):

Previously, corinaminer (Corina Miner) wrote…

These if statements are opaque to me. Would definitely help to have a block comment outlining the three cases and what we want to do with the parse tree in each case.

Will document and add tests for each case.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTRLErrorStrategy.java, line 197 at r1 (raw file):

Previously, corinaminer (Corina Miner) wrote…

I don't understand why this block in the switch is necessary, since the block after case ATNState.STAR_LOOP_BACK: is identical.

Until the very last change, these had different content. But now that they are the same, they can be merged. Good catch.

Comments from Reviewable

arifogel · 2017-10-30T22:42:43Z

Review status: 41 of 64 files reviewed at latest revision, 3 unresolved discussions.

projects/batfish/pom.xml, line 113 at r1 (raw file):

Previously, arifogel (Ari Fogel) wrote…

Makes sense. Will address.

So I tried it. While I think it does bind it to the generate-test-sources phase, you should know that generate-test-sources is run during e.g. mvn install -DskipTests anyway. That is, tests are generated and compiled, but just not run.

Comments from Reviewable

arifogel · 2017-10-30T23:08:52Z

Review status: 41 of 64 files reviewed at latest revision, 3 unresolved discussions.

projects/batfish-common-protocol/src/main/java/org/batfish/common/util/CommonUtil.java, line 336 at r1 (raw file):

Previously, arifogel (Ari Fogel) wrote…

My version allows you to specify the underlying set implementation. If there is a way to do that with Guava library functions, I'm happy to switch. I will investigate.

I think there is a case for at least removing references to all these utility functions. I will defer to a future PR, and remove references and modifications to utility functions in this PR.

Comments from Reviewable

- compile test grammars - update eclipse metadata to use generated test grammars - add suppressions to checkstyle to force it not to analyze generated test grammars

- enabled when disableUnrecognized is false (this is DEFAULT) - requires specification of separator token, minimal string always present in separator token (e.g. NEWLINE, "\n")

dhalperi · 2017-11-01T16:29:03Z

projects/pom.xml, line 200 at r1 (raw file):

Previously, arifogel (Ari Fogel) wrote…

Will address.

Did not address.

Comments from Reviewable

arifogel · 2017-11-01T17:03:02Z

Review status: 24 of 63 files reviewed at latest revision, 2 unresolved discussions.

projects/pom.xml, line 200 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

Did not address.

Whoops got it now.

Comments from Reviewable

dhalperi · 2017-11-01T17:57:34Z

Reviewed 5 of 71 files at r1, 42 of 48 files at r2, 2 of 2 files at r3.
Review status: 62 of 63 files reviewed at latest revision, 14 unresolved discussions.

projects/batfish/pom.xml, line 113 at r1 (raw file):

Previously, arifogel (Ari Fogel) wrote…

So I tried it. While I think it does bind it to the generate-test-sources phase, you should know that generate-test-sources is run during e.g. mvn install -DskipTests anyway. That is, tests are generated and compiled, but just not run.

Yep, that's expected. There's a difference between not running tests (-DskipTests) and not running the test phase at all (-Dmaven.test.skip or -Dtest.skip, [I think]). The latter is almost never what you want.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java, line 28 at r3 (raw file):

 * Intended as a replacement for default ANTLR parser error recovery strategy. The basic idea here
 * is to throw out any lines containing invalidities, and try to parse as if those lines were never
 * there. Meanwhile, each invalid lines should show up in the parse tree as {@link ErrorNode} at an

typo: invalid line, not lines.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java, line 36 at r3 (raw file):

  /**
   * Construct a factory for making instances of {@link BatfishANTLRErrorStrategy} with supplied
   * {@link separatorToken} and {@link minimRequiredSeparatorText}.

these should be @code, I think, as those are not valid javadoc references.

typo in minimum.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java, line 99 at r3 (raw file):

   * @param recognizer The {@link Parser} to whom to delegate creation of each {@link ErrorNode}
   */
  private void consumeBlocksUntilWanted(Parser recognizer) {

I'm having a hard time connecting this function's doc to its name. Is the intent to turn something like

mtu abcdef 1234a 1500

into

mtu 1500
```

(skipping `abcdef` and `1234a` which are invalid numbers) ?

Or is the goal to skip the entire line?

The javadoc text in terms of "tokens" makes me think it's the first one, but then text in terms of "lines" makes me think maybe the latter, but the function name "consume blocks" makes me question both.

---

*[projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java, line 166 at r3](https://reviewable.io:443/reviews/batfish/batfish/624#-KxsYrYUpW_Sccsq5kkU:-KxsYrYUpW_Sccsq5kkV:bfper44) ([raw file](https://github.com/batfish/batfish/blob/114aa796ad4c3228c3145bfcbbfe0509a5b0f007/projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java#L166)):*
> ```Java
>    *     created @{link ErrorNode}.
>    */
>   private Token recover(Parser recognizer) {
> ```

should this be called `parser`? I ran into issues trying to find Recognizer#consume, when I realized the function was actually limited to Parsers :)

---

*[projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java, line 174 at r3](https://reviewable.io:443/reviews/batfish/batfish/624#-KxsVL5_mnPwJUFmqn4K:-KxsVL5_mnPwJUFmqn4L:bik40qi) ([raw file](https://github.com/batfish/batfish/blob/114aa796ad4c3228c3145bfcbbfe0509a5b0f007/projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java#L174)):*
> ```Java
>     IntervalSet followSet = new IntervalSet();
>     followSet.add(_separatorToken);
>     consumeUntil(recognizer, followSet);
> ```

I think this could be clearer as just:

```java
// Consume anything left on this line.
consumeUntil(recognizer, IntervalSet.of(_separatorToken));
```

or even using a `consumeUntilEndOfLine(recognizer)` helper method. FWICT you only ever use `consumeUntil` in this way, and you do so three times in the file.

---

*[projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java, line 179 at r3](https://reviewable.io:443/reviews/batfish/batfish/624#-Kxs_zOiZiN5UuiaBMt5:-KxsZA0WvdFSl1-EIPFu:b-3doq9b) ([raw file](https://github.com/batfish/batfish/blob/114aa796ad4c3228c3145bfcbbfe0509a5b0f007/projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java#L179)):*
> ```Java
>     Token separatorToken = recognizer.getCurrentToken();
>     int lineIndex = separatorToken.getLine() - 1;
>     String separator = separatorToken.getText();
> ```

Maybe I'm missing something, but why do we need to extract the separator string from a parse token -- don't we already know the separator string?

---

*[projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java, line 194 at r3](https://reviewable.io:443/reviews/batfish/batfish/624#-Kxsa4EtMb6PxpQu8PAa:-Kxsa4EtMb6PxpQu8PAb:b-w65aga) ([raw file](https://github.com/batfish/batfish/blob/114aa796ad4c3228c3145bfcbbfe0509a5b0f007/projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java#L194)):*
> ```Java
>       recognizer.consume();
>       return createErrorNode(recognizer, parent, lineIndex, separator);
>     }
> ```

I might restructure this in a simpler way:

1. At the top of the function, or as early as possible*, check for the case that we have a parent on the same line and throw.
    (*FWICT, this can be done before any of the work above this `if`, other than setting up parent and ctx variables. I may be wrong, tho – I only recursed a few levels deep in ANTLR functions.).

2. This code can get simplified:

    ```java
    if (parent != null) {
      // Delete this node from its parent, to be replaced by the error node we are about to create.
      List<ParseTree> parentChildren = parent.children;
      parentChildren.remove(parentChildren.size() - 1);
    }
    
    // Eat the separator token, enabling the parser to resume from the next line. (? is this right?)
    recognizer.consume();
    return createError(recognizer, parent, lineIndex, separator);
    ```

---

*[projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java, line 247 at r3](https://reviewable.io:443/reviews/batfish/batfish/624#-KxsdOrzx0PLuzT6WrJn:-KxsdOrzx0PLuzT6WrJo:bdj4yh6) ([raw file](https://github.com/batfish/batfish/blob/114aa796ad4c3228c3145bfcbbfe0509a5b0f007/projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java#L247)):*
> ```Java
>   public void sync(Parser recognizer) throws RecognitionException {
>     /*
>      * Copied from super
> ```

Which code here is not copied from super? If it's a clean break, just add a comment where the new code is?

---

*[projects/batfish-common-protocol/src/main/java/org/batfish/grammar/TestGrammarSettings.java, line 1 at r3](https://reviewable.io:443/reviews/batfish/batfish/624#-KxsfLDxWUBmZDmtRw8s:-KxsfLDxWUBmZDmtRw8t:b-9jrht) ([raw file](https://github.com/batfish/batfish/blob/114aa796ad4c3228c3145bfcbbfe0509a5b0f007/projects/batfish-common-protocol/src/main/java/org/batfish/grammar/TestGrammarSettings.java#L1)):*
> ```Java
> package org.batfish.grammar;
> ```

Put this in `src/test`, not `src/main`

---

*[projects/batfish-common-protocol/src/main/java/org/batfish/grammar/TestGrammarSettings.java, line 3 at r3](https://reviewable.io:443/reviews/batfish/batfish/624#-KxsfQhszwZBnbvDHP7H:-KxsfQhszwZBnbvDHP7I:b5fubzl) ([raw file](https://github.com/batfish/batfish/blob/114aa796ad4c3228c3145bfcbbfe0509a5b0f007/projects/batfish-common-protocol/src/main/java/org/batfish/grammar/TestGrammarSettings.java#L3)):*
> ```Java
> package org.batfish.grammar;
> 
> public class TestGrammarSettings implements GrammarSettings {
> ```

javadoc

---

*[projects/batfish/src/test/java/org/batfish/grammar/flatjuniper/FlatJuniperGrammarTest.java, line 78 at r3](https://reviewable.io:443/reviews/batfish/batfish/624#-KxshK45_b-gWLZTiNeY:-KxshK45_b-gWLZTiNeZ:b-altuu4) ([raw file](https://github.com/batfish/batfish/blob/114aa796ad4c3228c3145bfcbbfe0509a5b0f007/projects/batfish/src/test/java/org/batfish/grammar/flatjuniper/FlatJuniperGrammarTest.java#L78)):*
> ```Java
> 
>     assertThat(extractor.getNumSets(), equalTo(8));
>     assertThat(extractor.getNumErrorNodes(), equalTo(8));
> ```

Since the test is so opaque (magic numbers manually computed from text in a different file), I might add a sanity check on the total number of lines in the file, e.g.,

```
int totalSetsLines = recoveryTest.split('\n').stream().filter(s -> s.trim().startsWith('set')....count.
assertThat(error.getNumSets() + error.getNumErrorNodes(), equalTo(totalSetsLines));
```

---


*Comments from [Reviewable](https://reviewable.io:443/reviews/batfish/batfish/624)*
<!-- Sent from Reviewable.io -->

dhalperi · 2017-11-01T17:58:07Z

Reviewed 1 of 1 files at r4.
Review status: all files reviewed at latest revision, 2 unresolved discussions.

Comments from Reviewable

arifogel · 2017-11-01T19:45:56Z

Review status: 59 of 63 files reviewed at latest revision, 13 unresolved discussions.

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java, line 28 at r3 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

typo: invalid line, not lines.

Fixed

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java, line 36 at r3 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

these should be @code, I think, as those are not valid javadoc references.

typo in minimum.

Fixed

projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java, line 99 at r3 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

I'm having a hard time connecting this function's doc to its name. Is the intent to turn something like

mtu abcdef 1234a 1500

into

mtu 1500
```

(skipping `abcdef` and `1234a` which are invalid numbers) ?

Or is the goal to skip the entire line?

The javadoc text in terms of "tokens" makes me think it's the first one, but then text in terms of "lines" makes me think maybe the latter, but the function name "consume blocks" makes me question both.
</blockquote></details>

Adjusted javadoc for clarity.

---

*[projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java, line 166 at r3](https://reviewable.io:443/reviews/batfish/batfish/624#-KxsYrYUpW_Sccsq5kkU:-Kxt5miIjkLk7d8OkgJN:b-7uqqfs) ([raw file](https://github.com/batfish/batfish/blob/114aa796ad4c3228c3145bfcbbfe0509a5b0f007/projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java#L166)):*
<details><summary><i>Previously, dhalperi (Dan Halperin) wrote…</i></summary><blockquote>

should this be called `parser`? I ran into issues trying to find Recognizer#consume, when I realized the function was actually limited to Parsers :)
</blockquote></details>

Changed as suggested.

---

*[projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java, line 174 at r3](https://reviewable.io:443/reviews/batfish/batfish/624#-KxsVL5_mnPwJUFmqn4K:-Kxt5qAlsCYlfbmBwzlI:b-896fix) ([raw file](https://github.com/batfish/batfish/blob/114aa796ad4c3228c3145bfcbbfe0509a5b0f007/projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java#L174)):*
<details><summary><i>Previously, dhalperi (Dan Halperin) wrote…</i></summary><blockquote>

I think this could be clearer as just:

```java
// Consume anything left on this line.
consumeUntil(recognizer, IntervalSet.of(_separatorToken));
```

or even using a `consumeUntilEndOfLine(recognizer)` helper method. FWICT you only ever use `consumeUntil` in this way, and you do so three times in the file.
</blockquote></details>

Done.

---

*[projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java, line 179 at r3](https://reviewable.io:443/reviews/batfish/batfish/624#-Kxs_zOiZiN5UuiaBMt5:-Kxst3vHMaK7vKz-dvQu:bhmf0f9) ([raw file](https://github.com/batfish/batfish/blob/114aa796ad4c3228c3145bfcbbfe0509a5b0f007/projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java#L179)):*
<details><summary><i>Previously, dhalperi (Dan Halperin) wrote…</i></summary><blockquote>

Maybe I'm missing something, but why do we need to extract the separator string from a parse token -- don't we already know the separator string?
</blockquote></details>

We want to have all the text of the file in the parse tree. The separator string is a MINIMAL version of the separator token, but that token could have longer instantiations. We want to get the text for the specific instantiation at this point.
For reference, the minimum for most `NEWLINE` tokens we've defined is `"\n"`, but `"\r\n"` is also `lexed` as a single NEWLINE token.

---

*[projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java, line 194 at r3](https://reviewable.io:443/reviews/batfish/batfish/624#-Kxsa4EtMb6PxpQu8PAa:-Kxsz5ro0GG8HlRRz2RT:b-im08lw) ([raw file](https://github.com/batfish/batfish/blob/114aa796ad4c3228c3145bfcbbfe0509a5b0f007/projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java#L194)):*
<details><summary><i>Previously, dhalperi (Dan Halperin) wrote…</i></summary><blockquote>

I might restructure this in a simpler way:

1. At the top of the function, or as early as possible*, check for the case that we have a parent on the same line and throw.
    (*FWICT, this can be done before any of the work above this `if`, other than setting up parent and ctx variables. I may be wrong, tho – I only recursed a few levels deep in ANTLR functions.).

2. This code can get simplified:

    ```java
    if (parent != null) {
      // Delete this node from its parent, to be replaced by the error node we are about to create.
      List<ParseTree> parentChildren = parent.children;
      parentChildren.remove(parentChildren.size() - 1);
    }
    
    // Eat the separator token, enabling the parser to resume from the next line. (? is this right?)
    recognizer.consume();
    return createError(recognizer, parent, lineIndex, separator);
    ```
</blockquote></details>

Bug here. In the first base case (`parent == null`), `ctx` instead of `parent` should be passed to `createError`. Since tests were not crashing with NPE, I think I need to add another one that triggers this branch. Although it is possible that the logic in `sync` prevents this case from ever occurring. Need to explore.
In any case, I still refactored to address your comment.

---

*[projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java, line 247 at r3](https://reviewable.io:443/reviews/batfish/batfish/624#-KxsdOrzx0PLuzT6WrJn:-Kxt68UZV4-9IpTinG1B:bb6un1j) ([raw file](https://github.com/batfish/batfish/blob/114aa796ad4c3228c3145bfcbbfe0509a5b0f007/projects/batfish-common-protocol/src/main/java/org/batfish/grammar/BatfishANTLRErrorStrategy.java#L247)):*
<details><summary><i>Previously, dhalperi (Dan Halperin) wrote…</i></summary><blockquote>

Which code here is not copied from super? If it's a clean break, just add a comment where the new code is?
</blockquote></details>

Done

---

*[projects/batfish-common-protocol/src/test/java/org/batfish/grammar/TestGrammarSettings.java, line 1 at r3](https://reviewable.io:443/reviews/batfish/batfish/624#-KxsfLDxWUBmZDmtRw8s:-Kxt6DRmLROB_3xbi9it:bb6un1j) ([raw file](https://github.com/batfish/batfish/blob/114aa796ad4c3228c3145bfcbbfe0509a5b0f007/projects/batfish-common-protocol/src/test/java/org/batfish/grammar/TestGrammarSettings.java#L1)):*
<details><summary><i>Previously, dhalperi (Dan Halperin) wrote…</i></summary><blockquote>

Put this in `src/test`, not `src/main`
</blockquote></details>

Done

---

*[projects/batfish-common-protocol/src/test/java/org/batfish/grammar/TestGrammarSettings.java, line 3 at r3](https://reviewable.io:443/reviews/batfish/batfish/624#-KxsfQhszwZBnbvDHP7H:-Kxt6EJ9d0-7EZwAKAj4:b-we7dfl) ([raw file](https://github.com/batfish/batfish/blob/114aa796ad4c3228c3145bfcbbfe0509a5b0f007/projects/batfish-common-protocol/src/test/java/org/batfish/grammar/TestGrammarSettings.java#L3)):*
<details><summary><i>Previously, dhalperi (Dan Halperin) wrote…</i></summary><blockquote>

javadoc
</blockquote></details>

Done, and also added javadocs to `GrammarSettings`

---

*[projects/batfish/src/test/java/org/batfish/grammar/flatjuniper/FlatJuniperGrammarTest.java, line 78 at r3](https://reviewable.io:443/reviews/batfish/batfish/624#-KxshK45_b-gWLZTiNeY:-Kxt6LUn1hmEEwtxI1Jl:bgfgdig) ([raw file](https://github.com/batfish/batfish/blob/114aa796ad4c3228c3145bfcbbfe0509a5b0f007/projects/batfish/src/test/java/org/batfish/grammar/flatjuniper/FlatJuniperGrammarTest.java#L78)):*
<details><summary><i>Previously, dhalperi (Dan Halperin) wrote…</i></summary><blockquote>

Since the test is so opaque (magic numbers manually computed from text in a different file), I might add a sanity check on the total number of lines in the file, e.g.,

```
int totalSetsLines = recoveryTest.split('\n').stream().filter(s -> s.trim().startsWith('set')....count.
assertThat(error.getNumSets() + error.getNumErrorNodes(), equalTo(totalSetsLines));
```
</blockquote></details>

Addressed kind of. The number of lines referenced in the parse tree is only guaranteed to be less than or equal to the number of lines in the input text.

---


*Comments from [Reviewable](https://reviewable.io:443/reviews/batfish/batfish/624)*
<!-- Sent from Reviewable.io -->

dhalperi · 2017-11-01T22:15:28Z

Reviewed 2 of 49 files at r2, 1 of 1 files at r4, 5 of 5 files at r5.
Review status: all files reviewed at latest revision, 3 unresolved discussions.

Comments from Reviewable

dhalperi · 2017-11-01T22:21:22Z

At this point only unresolved comment is your own as to whether you're going to add a test to catch the potential NPE issue with recovery at the top level.

Review status: all files reviewed at latest revision, 1 unresolved discussion.

Comments from Reviewable

arifogel requested a review from dhalperi October 30, 2017 05:33

arifogel added 9 commits October 31, 2017 11:24

build updates

dbcb92a

- compile test grammars - update eclipse metadata to use generated test grammars - add suppressions to checkstyle to force it not to analyze generated test grammars

implement parser recovery

2644c2c

- enabled when disableUnrecognized is false (this is DEFAULT) - requires specification of separator token, minimal string always present in separator token (e.g. NEWLINE, "\n")

update select parsers to make use of recovery

dbe85af

add recovery test with associated grammar, text file

0f923ca

update cisco unrecognized lines infrastructure, add recovery test

4884b46

stop flatjuniper s_vlan rule incorrectly allowing blank 'set' line

cfbf6e0

flatjuniper recovery test

f47d47b

update parsing-errors test commands, add new refs

b46b47e

address reviewer comments

114aa79

arifogel force-pushed the ari-parsing-recovery branch from ec35a4e to 114aa79 Compare October 31, 2017 18:24

drop space in pom

2f9483e

address further review comments

d5d872e

arifogel merged commit 2832e4b into master Nov 1, 2017

arifogel deleted the ari-parsing-recovery branch November 9, 2017 18:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parsing recovery #624

Parsing recovery #624

arifogel commented Oct 30, 2017 •

edited

dhalperi commented Oct 30, 2017

arifogel commented Oct 30, 2017

corinaminer commented Oct 30, 2017

arifogel commented Oct 30, 2017

arifogel commented Oct 30, 2017

arifogel commented Oct 30, 2017

dhalperi commented Nov 1, 2017

arifogel commented Nov 1, 2017

dhalperi commented Nov 1, 2017

dhalperi commented Nov 1, 2017

arifogel commented Nov 1, 2017

dhalperi commented Nov 1, 2017

dhalperi commented Nov 1, 2017

Parsing recovery #624

Parsing recovery #624

Conversation

arifogel commented Oct 30, 2017 • edited

dhalperi commented Oct 30, 2017

arifogel commented Oct 30, 2017

corinaminer commented Oct 30, 2017

arifogel commented Oct 30, 2017

arifogel commented Oct 30, 2017

arifogel commented Oct 30, 2017

dhalperi commented Nov 1, 2017

arifogel commented Nov 1, 2017

dhalperi commented Nov 1, 2017

dhalperi commented Nov 1, 2017

arifogel commented Nov 1, 2017

dhalperi commented Nov 1, 2017

dhalperi commented Nov 1, 2017

arifogel commented Oct 30, 2017 •

edited