Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the verilog/systemverilog grammar generates parsers that behave differently when the target is java or c++ #2401

Closed
ogheri opened this issue Nov 24, 2021 · 18 comments
Labels
system-verilog target:cpp Grammars for C++ target, https://github.com/antlr/antlr4/blob/master/doc/cpp-target.md target:java Grammars for Java target, https://github.com/antlr/antlr4/blob/master/doc/java-target.md verilog

Comments

@ogheri
Copy link

ogheri commented Nov 24, 2021

Hi dear all, I was interested in the verilog/systemverilog grammar and tried to use it with the cpp (c plus plus) target.
Well , when I do

java org.antlr.v4.Tool -Dlanguage=cpp SystemVerilogLexer.g4 SystemVerilogParser.g4 and then

g++ -o antlrparser *cpp -I. -I.....Cpp/runtime/src -L ...../runtime/Cpp/dist -lantlr4-runtime

that successfully creates the executable (provided I added a main.cpp that instantiates a Lexer and a Parser (by the way, why do I get a parser that is called SystemVerilogParserParser (with 2times Parser ? ?) ?) and then starts with the parsing rule "source_text" I get that all the "examples/bla bla" systemverilog designs are not parsed correctly...

doing the same in java, creating the java parser and lexer and running the java compiled and created equivalent main java program, instead, works fine ...

Please see also the attached snapshot...
VirtualBox_Ubuntu_24_11_2021_15_13_16

@KvanTTT KvanTTT added system-verilog target:cpp Grammars for C++ target, https://github.com/antlr/antlr4/blob/master/doc/cpp-target.md target:java Grammars for Java target, https://github.com/antlr/antlr4/blob/master/doc/java-target.md verilog labels Nov 24, 2021
@kaby76
Copy link
Contributor

kaby76 commented Nov 24, 2021

Right. Reading your main() driver, you call the lexer independently before the parser. You don't reset the lexer state before the parse. You really should try creating the CommonTokenStream and pass that to the parser without any intervening calls to "fill()" or "getTokens()". Just don't touch the token stream--let the parser deal with it. Or, you can call "reset()" on the CommonTokenStream, I think. In the driver I generate with trgen, I do something like this.

@ogheri
Copy link
Author

ogheri commented Nov 25, 2021 via email

@ogheri
Copy link
Author

ogheri commented Nov 25, 2021

thx, unfortunately still this code:

#include <iostream>
#include <fstream>

#include "antlr4-runtime.h"
#include "SystemVerilogLexer.h"
#include "SystemVerilogParserParser.h"

using namespace antlr4;

int main(int argc, const char **args) {
  std::ifstream ins;

  ins.open(args[1]);
  ANTLRInputStream input(ins);
  SystemVerilogLexer lexer(&input);
  CommonTokenStream tokens(&lexer);

 //  tokens.fill();
//   for (auto token : tokens.getTokens()) {
//     std::cout << token->toString() << std::endl;
//   }
//   tokens.reset();
  SystemVerilogParserParser parser(&tokens);
  tree::ParseTree* tree = parser.source_text();

  std::cout << tree->toStringTree(&parser) << std::endl << std::endl;

  return 0;
}

produces :

ogheri@ogheri-VirtualBox:~/grammars-v4-master/verilog/systemverilog$ ./antlrparser examples/async_fifo.sv  g++ -o antlrparser  *cpp -I. -I/usr/local/lib/antlr4/runtime/Cpp/runtime/src -L /usr/local/lib/antlr4/runtime/Cpp/dist -lantlr4-runtime 
line 4:7 no viable alternative at input 'module async_fifo'
line 4:18 token recognition error at: '#'
line 4:19 token recognition error at: '('
line 5:25 token recognition error at: '='
...

and this code:

/* Copyright (c) 2012-2017 The ANTLR Project. All rights reserved.
 * Use of this file is governed by the BSD 3-clause license that
 * can be found in the LICENSE.txt file in the project root.
 */

//
//  main.cpp
//  antlr4-cpp-demo
//
//  Created by Mike Lischke on 13.03.16.
//

#include <iostream>
#include <fstream>

#include "antlr4-runtime.h"
#include "SystemVerilogLexer.h"
#include "SystemVerilogParserParser.h"

using namespace antlr4;

int main(int argc, const char **args) {
  std::ifstream ins;

  ins.open(args[1]);
  ANTLRInputStream input(ins);
  SystemVerilogLexer lexer(&input);
  CommonTokenStream tokens(&lexer);

  tokens.fill();
  for (auto token : tokens.getTokens()) {
    std::cout << token->toString() << std::endl;
  }
  tokens.reset();
  SystemVerilogParserParser parser(&tokens);
  tree::ParseTree* tree = parser.source_text();

  std::cout << tree->toStringTree(&parser) << std::endl << std::endl;

  return 0;
}

produces:

ogheri@ogheri-VirtualBox:~/grammars-v4-master/verilog/systemverilog$ ./antlrparser examples/async_fifo.sv 
line 4:18 token recognition error at: '#'
line 4:19 token recognition error at: '('
line 5:25 token recognition error at: '='
...
line 4:7 no viable alternative at input 'module async_fifo'

the java one:

alias grun='java org.antlr.v4.runtime.misc.TestRig'

produces :

ogheri@ogheri-VirtualBox:~/grammars-v4-master/verilog/systemverilog$ grun SystemVerilogParser source_text   examples/async_fifo.sv   -tree -gui
Warning: TestRig moved to org.antlr.v4.gui.TestRig; calling automatically
(source_text (description (module_declaration (module_ansi_header (module_keyword module) (module_identifier (identifier async_fifo)) (parameter_port_list # ( (parameter_port_declaration (parameter_declaration parameter (data_type_or_implicit implicit_data_type) (list_of_param_assignments (param_assignment (parameter_identifier (identifier DATA_WIDTH)) = (constant_param_expression (constant_mintypmax_expression (constant_expression (constant_primary (primary_literal (number 8)))))))))) , (parameter_port_declaration (parameter_declaration parameter (data_type_or_implicit implicit_data_type) (list_of_param_assignments (param_assignment (parameter_identifier (identifier ADDR_WIDTH)) = (constant_param_expression (constant_mintypmax_expression (constant_expression (constant_primary (primary_literal (number 3)))))))))) , (parameter_port_declaration (parameter_declaration parameter (data_type_or_implicit implicit_data_type) (list_of_param_assignments (param_assignment (parameter_identifier (identifier PIPELINE_LENGTH)) = (constant_param_expression (constant_mintypmax_expression (constant_expression (constant_primary (primary_literal (number 1)))))))))) )) ...

and the tree is seen in the snapshot, which seems much more correct to me ...
(but I admit I am still pretty inexpert and newbie for what concerns ANTLR) ...

Screenshot from 2021-11-25 08-49-09

@kaby76
Copy link
Contributor

kaby76 commented Nov 25, 2021

The systemverilog grammar appears to be a split grammar but it's not done right. For example, the parse fails at the start of a #. The lexer grammar doesn't have a token type for that. Further the parser grammar isn't really a parser grammar because it doesn't have a "parser grammar" declaration, instead a combined "grammar" declaration. So, it is a combined grammar and has a lexer grammar. The grammar is messed up.

@ogheri
Copy link
Author

ogheri commented Nov 25, 2021 via email

@KvanTTT
Copy link
Member

KvanTTT commented Nov 25, 2021

@ogheri please strip such long messages, keep only important info. It's not informative for other users and it interrupts messages flow.

@kaby76
Copy link
Contributor

kaby76 commented Nov 25, 2021

This is a bug in the grammar; it should be fixed. I've noticed malformed split/combined grammars elsewhere in grammars-v4 that I've had to fix recently. So, it would be a good idea to add a check for this to the Antlr tool, and/or Antlr Maven plugin (which calls the tool), and/or trgen to detect and flag malformed split/combined grammars. I'll make a note in Trash to add the feature. I don't have time to fix this grammar. If someone would like to do that, excellent. I am still working on the scraping problem for ISO C++, with a diversion into converting the C grammar into Lark syntax for the Lark parser generator.

@ogheri
Copy link
Author

ogheri commented Nov 25, 2021 via email

@msagca
Copy link
Contributor

msagca commented Nov 25, 2021

Hi @ogheri,
I'm the author of the SystemVerilog grammar. I know it's a mess. I've only (kind of) verified it for Python and Java targets. TBH, I was an ANTLR newbie when I wrote that grammar. I'm still reading through the The Definitive ANTLR 4 Reference. I'm currently working on an updated Verilog grammar in this fork. I will turn SystemVerilog into a combined grammar to fix this issue.

@msagca
Copy link
Contributor

msagca commented Nov 25, 2021

Hi @ogheri,
For now, you can use this grammar. If you experience any other issue, please let me know.

@ogheri
Copy link
Author

ogheri commented Nov 26, 2021

this now WORKS GREAT!!!! thx a lot Mustafa and....by the way...no need to say that it is a mess!
I know people contributing on freeware and open source things, do that on their spare time and without compensation and I am TRULY grateful to you and the other nice people here that helped me immensely!!!!
thx!!

@ogheri ogheri closed this as completed Nov 26, 2021
@ogheri
Copy link
Author

ogheri commented Nov 26, 2021

now for my needs and for the testing I did so far, i.e. on the 4 .sv systemverilog examples inside the "examples" directory plus just some of me I have not found errors...

@kaby76
Copy link
Contributor

kaby76 commented Nov 26, 2021

@ogheri Could you please check my PR #2402 as to whether it runs with the Cpp target? I understand that you are using a different grammar now, but the grammar here really should not be left broken. I cannot add reviewers in the GitHub system because I don't have write privileges to the repo, which is fine. Github is ineptly designed so I can't add a reviewer.

@ogheri
Copy link
Author

ogheri commented Nov 26, 2021 via email

@ogheri
Copy link
Author

ogheri commented Nov 26, 2021

@ogheri Could you please check my PR #2402 as to whether it runs with the Cpp target? I understand that you are using a different grammar now, but the grammar here really should not be left broken. I cannot add reviewers in the GitHub system because I don't have write privileges to the repo, which is fine. Github is ineptly designed so I can't add a reviewer.

Hi Ken and Kaby (or...ehr...is the same ? ?) (sorry if i am confused...)

YES it works perfectly with the cpp target and I do not have any warning any more now, FIRST

and ...SECOND.....is now SIGNIFICANTLY faster...
probably what you made makes the state machine discover the right production of the grammatic quicker than before

thx again to all of you (Mustafa included, though, for the original files, too...) !

@kaby76
Copy link
Contributor

kaby76 commented Nov 26, 2021

Hi Alex, Thank you for checking this quickly. I'm happy this is working. I have a few ideas on things to add to Trash for tranalyze. "Ken Domino" is my real name; "kaby" my user name, from a friend over 30 years ago. --Ken

@ogheri
Copy link
Author

ogheri commented Nov 26, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
system-verilog target:cpp Grammars for C++ target, https://github.com/antlr/antlr4/blob/master/doc/cpp-target.md target:java Grammars for Java target, https://github.com/antlr/antlr4/blob/master/doc/java-target.md verilog
Projects
None yet
Development

No branches or pull requests

5 participants
@KvanTTT @kaby76 @ogheri @msagca and others