How to find the grammar file? #3652

walkertest · 2022-04-13T07:02:32Z

https://github.com/TarsCloud/TarsJava/tree/v1.7.x/tools/tars-maven-plugin/src/main/java/com/qq/tars/maven/parse

I have the generated Java files. How can I get back the ANTLR grammar file?
The original file is lost.

Thanks in advance.

KvanTTT · 2022-04-13T08:05:40Z

No way. You can ask the repository's author or try searching for it on Google.

walkertest · 2022-04-13T08:11:13Z

> No way. You can ask the repository's author or try searching for it on Google.

I have tried both, without success.
I think ANTLR could provide a javaFileToGrammerFile tool to solve problems like this.

KvanTTT · 2022-04-13T08:13:08Z

I don't think it's fully possible, because part of the information is lost. Also, I think it's a rare case, so such a tool won't be implemented.

kaby76 · 2022-04-18T12:43:55Z

One can likely reverse engineer the generated code, but there's little demand to write such a tool, especially for Antlr3, which is extremely old.

What you can try is this:

Grep for the embedded comments in the generated parser, e.g., // TarsParser.g:32:1: namespace_def : TARS_NAMESPACE TARS_IDENTIFIER LBRACE ( definition SEMI )+ RBRACE -> ^( TARS_NAMESPACE[$TARS_IDENTIFIER.text] ( definition )+ ) ;.

grep 'TarsParser.g:' TarsParser.java | sed 's/^[ \t]*//' > o1

Remove duplicate TarsParser.g:<some-line-seen-before>:.*: entries by hand-editing the grep results, saving the result as o2.
Remove the // TarsParser.g:...: prefix.

cat o2 | sed 's#^// TarsParser.g[:][0-9]*[:][0-9]*[:]##' > o3

Remove the tree rewrite rules, then patch up the missing ')'.

cat o3 | sed 's#[-][>].*$#;#' > o4

Add in token declarations (but not rules).

cat o4 | sed 's/ /\n/g' | grep '[A-Z][A-Z]' | sort -u
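
The manual steps above could also be scripted. The following is a minimal Python sketch, not part of the original workflow. It assumes the generated parser source has been read into a string, that rule-level comments start with a lower-case rule name followed by a colon, and that no quoted literal in a rule contains parentheses. The helper names (`extract_rules`, `strip_rewrite`, `collect_tokens`) are made up for illustration.

```python
import re

# Comments of the form "// TarsParser.g:32:1: namespace_def : ... ;"
COMMENT_RE = re.compile(r"^\s*//\s*TarsParser\.g:(\d+):\d+:\s*(.*)$")
# Assumption: a rule-level comment starts with a lower-case rule name and a colon.
RULE_RE = re.compile(r"^[a-z_][A-Za-z0-9_]*\s*:")
# Everything from "->" to the end of the line is treated as a tree rewrite.
REWRITE_RE = re.compile(r"\s*->.*$")
# Token names: an upper-case letter followed by upper-case letters, digits or "_".
TOKEN_RE = re.compile(r"\b[A-Z][A-Z0-9_]+\b")


def strip_rewrite(rule):
    """Drop the '-> ...' part and re-close any parenthesis it swallowed."""
    body = REWRITE_RE.sub("", rule).rstrip(" ;")
    missing = body.count("(") - body.count(")")
    return body + " )" * max(missing, 0) + " ;"


def extract_rules(java_source):
    """Return one cleaned rule per grammar line, in first-seen order."""
    seen, rules = set(), []
    for line in java_source.splitlines():
        m = COMMENT_RE.match(line)
        if not m:
            continue
        grammar_line, text = int(m.group(1)), m.group(2).strip()
        # Skip repeats of a grammar line and comments for sub-elements.
        if grammar_line in seen or not RULE_RE.match(text):
            continue
        seen.add(grammar_line)
        rules.append(strip_rewrite(text))
    return rules


def collect_tokens(rules):
    """Return the sorted, de-duplicated token names used in the rules."""
    names = set()
    for rule in rules:
        names.update(TOKEN_RE.findall(rule))
    return sorted(names)
```

For example, `extract_rules(open("TarsParser.java").read())` would produce the rule list, assuming the generated file is named TarsParser.java. The output still needs a review by hand, since the heuristics above can miss or mangle rules.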

If you do that (about 30 minutes of work), you get a start:

grammar x;

tokens
{
COLON;
COMMA;
EQ;
GT;
LBRACE;
LBRACKET;
LPAREN;
LT;
RBRACE;
RBRACKET;
RPAREN;
SEMI;
TARS_BOOL;
TARS_BYTE;
TARS_CONST;
TARS_DOUBLE;
TARS_ENUM;
TARS_FLOAT;
TARS_IDENTIFIER;
TARS_INCLUDE;
TARS_INT;
TARS_INTEGER_LITERAL;
TARS_INTERFACE;
TARS_KEY;
TARS_LONG;
TARS_MAP;
TARS_NAMESPACE;
TARS_OPERATION;
TARS_OPTIONAL;
TARS_OUT;
TARS_PARAM;
TARS_REF;
TARS_REQUIRE;
TARS_ROOT;
TARS_ROUTE_KEY;
TARS_SHORT;
TARS_STRING;
TARS_STRING_LITERAL;
TARS_STRUCT;
TARS_STRUCT_MEMBER;
TARS_UNSIGNED;
TARS_VECTOR;
TARS_VOID;
}

start : ( include_def )* ( namespace_def )+ ;
include_def : TARS_INCLUDE TARS_STRING_LITERAL ;
namespace_def : TARS_NAMESPACE TARS_IDENTIFIER LBRACE ( definition SEMI )+ RBRACE ;
definition : ( const_def | enum_def | struct_def | key_def | interface_def );
const_def : TARS_CONST type_primitive TARS_IDENTIFIER EQ v= const_initializer ;
enum_def : ( TARS_ENUM TARS_IDENTIFIER LBRACE TARS_IDENTIFIER ( COMMA TARS_IDENTIFIER )* ( COMMA )? RBRACE );
struct_def : TARS_STRUCT TARS_IDENTIFIER LBRACE ( struct_member SEMI )+ RBRACE ;
struct_member : TARS_INTEGER_LITERAL (r= TARS_REQUIRE |r= TARS_OPTIONAL ) type TARS_IDENTIFIER ( EQ v= const_initializer )? ;
key_def : TARS_KEY LBRACKET n= TARS_IDENTIFIER ( COMMA k+= TARS_IDENTIFIER )+ RBRACKET ;
interface_def : TARS_INTERFACE TARS_IDENTIFIER LBRACE ( operation SEMI )+ RBRACE ;
operation : type TARS_IDENTIFIER LPAREN ( param ( COMMA param )* )? RPAREN ;
param : ( TARS_ROUTE_KEY )? ( TARS_OUT )? type TARS_IDENTIFIER ;
type : ( type_primitive | type_vector | type_map | type_custom );
type_primitive : ( TARS_VOID );
type_vector : TARS_VECTOR LT type GT ;
type_map : TARS_MAP LT type COMMA type GT ;
type_custom : ( TARS_IDENTIFIER ) ;
const_initializer : ( TARS_INTEGER_LITERAL | TARS_FLOATING_POINT_LITERAL | TARS_STRING_LITERAL | TARS_FALSE | TARS_TRUE );
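
A reconstructed grammar like this can be checked for token names that the rules use but the tokens block does not declare. In the listing above, const_initializer refers to TARS_FLOATING_POINT_LITERAL, TARS_FALSE and TARS_TRUE, none of which appear in the tokens block. The following is a rough Python sketch of such a check. The function name `undeclared_tokens` is hypothetical, and the check assumes token names are all upper case and that the grammar has a single `tokens { ... }` block. Names it reports may legitimately be defined elsewhere, for instance as lexer rules.

```python
import re

# Token names: an upper-case letter followed by upper-case letters, digits or "_".
TOKEN_RE = re.compile(r"\b[A-Z][A-Z0-9_]+\b")
# The body of the first "tokens { ... }" block.
TOKENS_BLOCK_RE = re.compile(r"tokens\s*\{(.*?)\}", re.DOTALL)


def undeclared_tokens(grammar_text):
    """Return token names used in rules but missing from the tokens block."""
    block = TOKENS_BLOCK_RE.search(grammar_text)
    declared = set(TOKEN_RE.findall(block.group(1))) if block else set()
    # Everything outside the tokens block is treated as rule text.
    rules_text = TOKENS_BLOCK_RE.sub("", grammar_text)
    used = set(TOKEN_RE.findall(rules_text))
    return sorted(used - declared)
```

Each reported name needs either a token declaration or a lexer rule before the grammar will build cleanly.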

With this grammar in hand, I would strongly advise you to move to Antlr4. If you insist on constructing an AST rather than a CST, you can write a bottom-up visitor to synthesize the AST, but stay with Antlr4.
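
To illustrate the bottom-up idea, here is a small self-contained Python sketch. It does not use the Antlr4 runtime: `Token`, `Cst` and `Ast` are hypothetical stand-ins for parse-tree and AST node types, and the token texts are made up. It mirrors the namespace_def rewrite quoted earlier, `^( TARS_NAMESPACE[$TARS_IDENTIFIER.text] ( definition )+ )`, by visiting the children first and then building the parent node from their results.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    """Hypothetical leaf of the concrete syntax tree."""
    type: str
    text: str


@dataclass
class Cst:
    """Hypothetical concrete syntax tree node: rule name plus children."""
    rule: str
    children: list = field(default_factory=list)


@dataclass
class Ast:
    """Hypothetical abstract syntax tree node."""
    kind: str
    text: str = ""
    children: List["Ast"] = field(default_factory=list)


def to_ast(node):
    """Build the AST bottom-up: children are converted before their parent."""
    kids = [to_ast(c) if isinstance(c, Cst) else c for c in node.children]
    subtrees = [k for k in kids if isinstance(k, Ast)]
    if node.rule == "namespace_def":
        # Keep the identifier text, drop braces and semicolons.
        name = next(k.text for k in kids
                    if isinstance(k, Token) and k.type == "TARS_IDENTIFIER")
        return Ast("TARS_NAMESPACE", name, subtrees)
    # Default: keep the rule name and its converted subtrees, drop tokens.
    return Ast(node.rule, "", subtrees)
```

With a real Antlr4 parse tree, the same shape would live in a visitor whose methods return AST nodes, with one method per rule replacing the `if` on the rule name.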

walkertest · 2022-04-21T06:41:48Z

Thanks a lot.
Once I get the grammar file back, I will update the ANTLR version.

KvanTTT · 2022-04-21T14:52:52Z

> If you insist on an AST rather than CST constructed, you can write a bottom-up visitor to synthesize the AST, but keep with Antlr4.

If performance matters, I'd recommend using a bottom-up listener with the parser's BuildParseTree option turned off (setBuildParseTree(false) in the Java runtime) to skip creating the CST entirely. This reduces memory use and parse time, especially on big files.

walkertest · 2022-05-07T08:56:22Z

Hi,
I have translated the lexer back into an ANTLR3 grammar file (link: https://github.com/walkertest/TarsJava/blob/feature/antlrfind/tools/tars-maven-plugin/src/main/resources/antlr/TarsLexer.g) and ran into some questions (environment: ANTLR 3.5).
The first is that the Java comments in the generated code are slightly different.

The second is a difference in the COMMENT definition, like this:

Thanks in advance.

Update 2022-05-10:
These questions have been resolved.
Please disregard them.

walkertest · 2022-05-11T02:00:53Z

antlr/antlr3#208

I have moved this issue to antlr3. @KvanTTT @kaby76
I still have three remaining questions. Please help me.

Thanks in advance.

walkertest closed this as completed May 11, 2022

walkertest mentioned this issue May 11, 2022

The questions when i try to translate java class to antlr3 grammer file. antlr/antlr3#208

Open
