How to find the grammar file? #3652

Closed
walkertest opened this issue Apr 13, 2022 · 8 comments

Comments

@walkertest

https://github.com/TarsCloud/TarsJava/tree/v1.7.x/tools/tars-maven-plugin/src/main/java/com/qq/tars/maven/parse


I have the generated Java files. How can I get back the ANTLR grammar file?
The grammar file has been lost.

Thanks in advance.

@KvanTTT
Member

KvanTTT commented Apr 13, 2022

No way. You can try asking the repository's author or searching for it on Google.

@walkertest
Author

> No way. You can try to ask repository's author or try to search it over Google.

I have tried both, but with no result.
I think ANTLR could provide a javaFileToGrammerFile tool to solve this kind of problem.

@KvanTTT
Member

KvanTTT commented Apr 13, 2022

I don't think it's fully possible, because part of the information is lost. Also, I think it's a rare case, and such a tool won't be implemented.

@kaby76
Contributor

kaby76 commented Apr 18, 2022

One can likely reverse engineer the generated code, but there's little demand to write such a tool, especially for Antlr3, which is extremely old. Besides, here is what you can try:

  • grep for the embedded comments in the generated parser, e.g., // TarsParser.g:32:1: namespace_def : TARS_NAMESPACE TARS_IDENTIFIER LBRACE ( definition SEMI )+ RBRACE -> ^( TARS_NAMESPACE[$TARS_IDENTIFIER.text] ( definition )+ ) ;.
grep 'TarsParser.g:' TarsParser.java | sed 's/^[ \t]*//' > o1
  • Remove the duplicate TarsParser.g:<some-line-seen-before>:.*: lines by hand-editing the grep results in o1, saving the result as o2.
  • Remove the // TarsParser.g:...: prefix.
cat o2 | sed 's#^// TarsParser.g[:][0-9]*[:][0-9]*[:]##' > o3
  • Remove the tree rewrite rules (everything from -> to the end of the line), then patch up any ')' that this leaves missing.
cat o3 | sed 's#[-][>].*$#;#' > o4
  • Add in token declarations (but not rules), collecting the candidate names with:
cat o4 | sed 's/ /\n/g' | grep '[A-Z][A-Z]' | sort -u

If you do that (about 30 minutes of work), you get a starting point:

grammar x;

tokens
{
COLON;
COMMA;
EQ;
GT;
LBRACE;
LBRACKET;
LPAREN;
LT;
RBRACE;
RBRACKET;
RPAREN;
SEMI;
TARS_BOOL;
TARS_BYTE;
TARS_CONST;
TARS_DOUBLE;
TARS_ENUM;
TARS_FLOAT;
TARS_IDENTIFIER;
TARS_INCLUDE;
TARS_INT;
TARS_INTEGER_LITERAL;
TARS_INTERFACE;
TARS_KEY;
TARS_LONG;
TARS_MAP;
TARS_NAMESPACE;
TARS_OPERATION;
TARS_OPTIONAL;
TARS_OUT;
TARS_PARAM;
TARS_REF;
TARS_REQUIRE;
TARS_ROOT;
TARS_ROUTE_KEY;
TARS_SHORT;
TARS_STRING;
TARS_STRING_LITERAL;
TARS_STRUCT;
TARS_STRUCT_MEMBER;
TARS_UNSIGNED;
TARS_VECTOR;
TARS_VOID;
}

start : ( include_def )* ( namespace_def )+ ;
include_def : TARS_INCLUDE TARS_STRING_LITERAL ;
namespace_def : TARS_NAMESPACE TARS_IDENTIFIER LBRACE ( definition SEMI )+ RBRACE ;
definition : ( const_def | enum_def | struct_def | key_def | interface_def );
const_def : TARS_CONST type_primitive TARS_IDENTIFIER EQ v= const_initializer ;
enum_def : ( TARS_ENUM TARS_IDENTIFIER LBRACE TARS_IDENTIFIER ( COMMA TARS_IDENTIFIER )* ( COMMA )? RBRACE );
struct_def : TARS_STRUCT TARS_IDENTIFIER LBRACE ( struct_member SEMI )+ RBRACE ;
struct_member : TARS_INTEGER_LITERAL (r= TARS_REQUIRE |r= TARS_OPTIONAL ) type TARS_IDENTIFIER ( EQ v= const_initializer )? ;
key_def : TARS_KEY LBRACKET n= TARS_IDENTIFIER ( COMMA k+= TARS_IDENTIFIER )+ RBRACKET ;
interface_def : TARS_INTERFACE TARS_IDENTIFIER LBRACE ( operation SEMI )+ RBRACE ;
operation : type TARS_IDENTIFIER LPAREN ( param ( COMMA param )* )? RPAREN ;
param : ( TARS_ROUTE_KEY )? ( TARS_OUT )? type TARS_IDENTIFIER ;
type : ( type_primitive | type_vector | type_map | type_custom );
type_primitive : ( TARS_VOID );
type_vector : TARS_VECTOR LT type GT ;
type_map : TARS_MAP LT type COMMA type GT ;
type_custom : ( TARS_IDENTIFIER ) ;
const_initializer : ( TARS_INTEGER_LITERAL | TARS_FLOATING_POINT_LITERAL | TARS_STRING_LITERAL | TARS_FALSE | TARS_TRUE );
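The extraction steps above can also be scripted. Below is a minimal Python sketch. It assumes the generated parser still carries ANTLR3's "// TarsParser.g:LINE:COL: ..." comments, and it assumes (a heuristic, not something ANTLR guarantees) that a comment whose text begins with "lowercase_name :" is a rule header. Instead of hand-editing out duplicates, it keeps the first header seen for each rule name. It cuts each rule at "->" the same way the sed step does, appends any ')' the cut leaves unbalanced, and writes the tokens block in ANTLR4's comma-separated form. The script name recover_grammar.py and all function names are hypothetical.

```python
import re
import sys

# An ANTLR3-generated source comment such as
#   // TarsParser.g:32:1: namespace_def : TARS_NAMESPACE ... -> ^( ... ) ;
# Heuristic (assumption): a comment whose text starts with "<lowercase_name> :"
# is a rule header; comments for sub-blocks and alternatives do not match.
RULE_HEADER = re.compile(
    r"^\s*//\s*[\w.]+\.g:\d+:\d+:\s*(?P<body>(?P<name>[a-z]\w*)\s*:.*)$"
)
# Upper-case identifiers (two or more characters) are treated as token names.
TOKEN_REF = re.compile(r"\b[A-Z][A-Z0-9_]+\b")


def recover_rules(java_source: str) -> list[str]:
    """Return one reconstructed parser rule per rule name, in source order."""
    rules: dict[str, str] = {}  # rule name -> rule text; first occurrence wins
    for line in java_source.splitlines():
        m = RULE_HEADER.match(line)
        if not m or m.group("name") in rules:
            continue  # not a rule header, or a duplicate of one already kept
        body = m.group("body").rstrip()
        # Drop the tree rewrite (from the first "->" on), like the sed step.
        body = re.sub(r"\s*->.*$", " ;", body)
        # Append any ')' that the cut left unbalanced, before the final ';'.
        missing = body.count("(") - body.count(")")
        if missing > 0:
            body = body.rstrip(" ;") + " )" * missing + " ;"
        rules[m.group("name")] = body
    return list(rules.values())


def token_names(rules: list[str]) -> list[str]:
    """Sorted, de-duplicated token names referenced by the rules."""
    return sorted({tok for rule in rules for tok in TOKEN_REF.findall(rule)})


def to_grammar(name: str, rules: list[str]) -> str:
    """Assemble a combined grammar; ANTLR4 separates tokens{} entries with commas."""
    tokens = ",\n    ".join(token_names(rules))
    return f"grammar {name};\n\ntokens {{\n    {tokens}\n}}\n\n" + "\n".join(rules) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python recover_grammar.py TarsParser.java > x.g4
    with open(sys.argv[1], encoding="utf-8") as src:
        print(to_grammar("x", recover_rules(src.read())), end="")
```

Rules whose alternatives each carried their own rewrite are still cut after the first alternative, just as with the sed pipeline (this is likely why type_primitive above lists only TARS_VOID), so those rules still need to be completed by hand.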

With this grammar in hand, I would strongly advise you to move to Antlr4. If you insist on constructing an AST rather than a CST, you can write a bottom-up visitor to synthesize the AST, but stay with Antlr4.
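To illustrate the bottom-up approach, here is a minimal Python sketch that turns a CST for the type rules above into a small AST. It uses hypothetical hand-built node classes (Terminal, RuleNode) instead of the ANTLR4 runtime so that it runs on its own; with a generated ANTLR4 visitor, the same logic would go into the per-rule visit methods. It also assumes that type_primitive accepts TARS_STRING, which the truncated rule above does not yet do.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for parse-tree (CST) nodes, so the sketch runs
# without the ANTLR4 runtime or a generated parser.
@dataclass
class Terminal:
    type: str
    text: str


@dataclass
class RuleNode:
    rule: str
    children: list = field(default_factory=list)


# AST node types: only the information worth keeping, no punctuation tokens.
@dataclass
class Primitive:
    name: str


@dataclass
class Named:
    name: str


@dataclass
class VectorType:
    elem: object


@dataclass
class MapType:
    key: object
    value: object


class TypeAstBuilder:
    """Bottom-up visitor: visit the children first, then combine their results."""

    def visit(self, node):
        if isinstance(node, Terminal):
            return node
        results = [self.visit(child) for child in node.children]
        return getattr(self, "visit_" + node.rule)(results)

    def visit_type(self, kids):
        return kids[0]  # type : type_primitive | type_vector | type_map | type_custom

    def visit_type_primitive(self, kids):
        return Primitive(kids[0].text)

    def visit_type_custom(self, kids):
        return Named(kids[0].text)

    def visit_type_vector(self, kids):
        return VectorType(kids[2])  # TARS_VECTOR LT type GT

    def visit_type_map(self, kids):
        return MapType(kids[2], kids[4])  # TARS_MAP LT type COMMA type GT


# A hand-built CST for map<string, vector<Foo>>.
cst = RuleNode("type", [RuleNode("type_map", [
    Terminal("TARS_MAP", "map"), Terminal("LT", "<"),
    RuleNode("type", [RuleNode("type_primitive", [Terminal("TARS_STRING", "string")])]),
    Terminal("COMMA", ","),
    RuleNode("type", [RuleNode("type_vector", [
        Terminal("TARS_VECTOR", "vector"), Terminal("LT", "<"),
        RuleNode("type", [RuleNode("type_custom", [Terminal("TARS_IDENTIFIER", "Foo")])]),
        Terminal("GT", ">"),
    ])]),
    Terminal("GT", ">"),
])])
print(TypeAstBuilder().visit(cst))
# MapType(key=Primitive(name='string'), value=VectorType(elem=Named(name='Foo')))
```

Because each visit method only sees the already-synthesized results of its children, punctuation such as LT, COMMA and GT simply never reaches the AST.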

@walkertest
Author

Thanks a lot.
If I get back the grammar file, I will upgrade the ANTLR version.

@KvanTTT
Member

KvanTTT commented Apr 21, 2022

> If you insist on an AST rather than CST constructed, you can write a bottom-up visitor to synthesize the AST, but keep with Antlr4.

If performance matters, I'd recommend using a bottom-up listener with the IsParseTreeExists option turned off, so that the CST is never created at all. This matters for memory use and performance, especially on big files.
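A minimal Python sketch of the listener approach, under stated assumptions: the AST is built on a stack as rule-entry, token and rule-exit events arrive, so no parse tree needs to exist. The method names enter_rule, terminal and exit_rule are hypothetical stand-ins for a parse listener's callbacks, and the event sequence is written out by hand rather than produced by a real parser. How tree construction is switched off depends on the runtime; the ANTLR4 Java runtime, for example, has Parser.setBuildParseTree(false), and listeners that should fire during parsing are registered with Parser.addParseListener.

```python
from dataclasses import dataclass


@dataclass
class Named:
    name: str


@dataclass
class MapType:
    key: object
    value: object


class StackAstBuilder:
    """Listener-style AST construction: values live on a stack, no tree is kept."""

    def __init__(self):
        self.values = []  # token (type, text) pairs and synthesized AST values
        self.marks = []   # stack height recorded when each rule was entered

    def enter_rule(self, rule):
        self.marks.append(len(self.values))

    def terminal(self, token_type, text):
        self.values.append((token_type, text))

    def exit_rule(self, rule):
        start = self.marks.pop()
        kids = self.values[start:]  # everything produced inside this rule
        del self.values[start:]
        self.values.append(self.reduce(rule, kids))

    def reduce(self, rule, kids):
        if rule == "type_custom":
            return Named(kids[0][1])
        if rule == "type_map":  # TARS_MAP LT type COMMA type GT
            return MapType(kids[2], kids[4])
        return kids[0]  # e.g. type: pass the single alternative through


# Events a parser would emit for map<Foo, Bar>, written out by hand here.
events = [
    ("enter", "type"), ("enter", "type_map"),
    ("token", "TARS_MAP", "map"), ("token", "LT", "<"),
    ("enter", "type"), ("enter", "type_custom"), ("token", "TARS_IDENTIFIER", "Foo"),
    ("exit", "type_custom"), ("exit", "type"),
    ("token", "COMMA", ","),
    ("enter", "type"), ("enter", "type_custom"), ("token", "TARS_IDENTIFIER", "Bar"),
    ("exit", "type_custom"), ("exit", "type"),
    ("token", "GT", ">"),
    ("exit", "type_map"), ("exit", "type"),
]
builder = StackAstBuilder()
handlers = {"enter": builder.enter_rule, "token": builder.terminal, "exit": builder.exit_rule}
for kind, *args in events:
    handlers[kind](*args)
print(builder.values)  # [MapType(key=Named(name='Foo'), value=Named(name='Bar'))]
```

Recording the stack height on rule entry is what lets each exit collect exactly the values produced inside that rule, so only the unfinished values are ever held in memory.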

@walkertest
Author

walkertest commented May 7, 2022

Hi, bro.
I have translated the lexer back into an ANTLR3 grammar file (link: https://github.com/walkertest/TarsJava/blob/feature/antlrfind/tools/tars-maven-plugin/src/main/resources/antlr/TarsLexer.g) and ran into these problems (environment: ANTLR 3.5).
The first is that the comments in the generated Java code are a little different.

The second is a difference in the COMMENT definition.

Thanks in advance.


Update 2022-05-10:
These problems have been fixed, so please disregard them.

@walkertest
Author

I have moved this issue to the antlr3 repository: antlr/antlr3#208

@KvanTTT @kaby76 I have three remaining questions. Please help me.

Thanks in advance.
