json to g4 only with "parser" cause some syntax error #43

0x7Fancy · 2024-01-18T09:28:18Z

In my experimental environment, I found json to g4 only with "parser" cause some syntax error, syntax parsing errors may lead to the possibility of losing a large amount of mutated data.

I made mincase lex.json:

{
    "<A>": [["<NUMBER>", "<STRING>", "\n"]],
    "<NUMBER>": [["10"], ["99"]],
    "<STRING>": [["(", "<HEXSTRING>", ")"]],
    "<HEXSTRING>": [["<CHAR>", "<HEXSTRING>"], []],
    "<CHAR>": [
            ["0"], ["1"], ["2"], ["3"], ["4"], ["5"], ["6"], ["7"],
            ["8"], ["9"], ["a"], ["b"], ["c"], ["d"], ["e"], ["f"]
    ]
}

Grammar-Mutator make it, generate Grammar.g4 is:

grammar Grammar;
entry
    : node_A EOF
    ;
node_A
    : node_NUMBER node_STRING '\n'
    ;
node_NUMBER
    : '10'
    | '99'
    ;
node_STRING
    : '(' node_HEXSTRING ')'
    ;
node_HEXSTRING
    : 
    | node_CHAR node_HEXSTRING
    ;
node_CHAR
    : '0'
    | '1'
    | '2'
    | '3'
    | '4'
    | '5'
    | '6'
    | '7'
    | '8'
    | '9'
    | 'a'
    | 'b'
    | 'c'
    | 'd'
    | 'e'
    | 'f'
    ;

we prepared input data seed1 / seed2, and use antlr4-parse to testing:

why is 10(10) parsed incorrectly? because antlr4 is divided into two stages: lexer and parser. during lexer stage, node_NUMBER:10 will be recognized as TOKEN, and in the parser stage, the result is node_NUMBER (node_NUMBER), so an error occurred.

in the antlr4 grammar, lex rules begin with an uppercase letter, parser rules begin with a lowercase letter, so we should tell antlr4 the lexical rules clearly, patch Grammar_patch.g4:

grammar Grammar_patch;
entry
    : node_A EOF
    ;
node_A
    : node_NUMBER Node_STRING '\n'
    ;
node_NUMBER
    : '10'
    | '99'
    ;
Node_STRING
    : '(' Node_HEXSTRING ')'
    ;
Node_HEXSTRING
    : 
    | Node_CHAR Node_HEXSTRING
    ;
Node_CHAR
    : '0'
    | '1'
    | '2'
    | '3'
    | '4'
    | '5'
    | '6'
    | '7'
    | '8'
    | '9'
    | 'a'
    | 'b'
    | 'c'
    | 'd'
    | 'e'
    | 'f'
    ;

testing again:

the "warning" prompts us it can match the empty string, this may cause antlr4 parsing backtrace issues, but we can easily mark it with fragment Node_HEXSTRING

maybe we can optimize the json to g4 generation code, to distinguish between lexer and parser?

The text was updated successfully, but these errors were encountered:

h1994st · 2024-01-18T16:30:00Z

If you have time to work on this, that would be great! Feel free to submit a PR and add your test cases.

I can take a look at this later when I am available. Probably not in recent weeks :(

0x7Fancy · 2024-01-19T09:27:38Z

okay, I'll try my best to provide

0x7Fancy · 2024-01-23T11:00:51Z

I tried to solve the problem but I found that Grammar-Mutator has to rely on AST to work, if I add lexical rules, this is not transparent to Grammar-Mutator.

In the above example, we focus on the input data of 10(12), respectively using Grammar.g4 and Grammar_patch.g4, we can see that Grammar.g4 only with grammar parser has a complete AST structure (in line with Grammar-Mutator expectations)

I think this is a trade-off in the Grammar-Mutator design. It loses part of the mutation data, but the program design is more concise, clear and direct.

so, currently Grammar-Mutator is perfect

h1994st added bug Something isn't working enhancement New feature or request help wanted Extra attention is needed labels Jan 18, 2024

0x7Fancy mentioned this issue Jan 23, 2024

fix: revert '+' syntax optimize && add growth check limit for 'random_recursive_mutation' #44

Merged

github-project-automation bot added this to Grammar mutator Aug 29, 2024

github-project-automation bot moved this to To do in Grammar mutator Aug 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

json to g4 only with "parser" cause some syntax error #43

json to g4 only with "parser" cause some syntax error #43

0x7Fancy commented Jan 18, 2024

h1994st commented Jan 18, 2024

0x7Fancy commented Jan 19, 2024

0x7Fancy commented Jan 23, 2024

json to g4 only with "parser" cause some syntax error #43

json to g4 only with "parser" cause some syntax error #43

Comments

0x7Fancy commented Jan 18, 2024

h1994st commented Jan 18, 2024

0x7Fancy commented Jan 19, 2024

0x7Fancy commented Jan 23, 2024