Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Go target] Antlr Visitor does not work properly #4398

Closed
kaby76 opened this issue Aug 28, 2023 · 26 comments
Closed

[Go target] Antlr Visitor does not work properly #4398

kaby76 opened this issue Aug 28, 2023 · 26 comments

Comments

@kaby76
Copy link
Contributor

kaby76 commented Aug 28, 2023

The Visitor tree walker for the Go target does not seem to work properly, at least how it works in comparison to all the other targets. This is a continuation of #4395. I don't know why that was closed, the initial comment completely erased, and #4404 opened. In any case, #4398 is a detailed analysis on the Visitor code for the Go target.

Starting with the original grammar, I added debugging output to all visitor code "derived" in the parent struct of "BaseExprVisitor" (aka "Visitor"), the "BaseExprVisitor" struct, and the structs and functions in the Antlr4 Go runtime. The "workaround" (func (v *Visitor) Visit(t antlr.ParseTree) any ), which is claimed to fix the problem of getting func (v *Visitor) VisitStart(ctx *parser.StartContext) any to be called, does not really fix the issues with the Visitor code. The problem goes far beyond that. Replacing the VisitStart() function with a VisitExpr() or VisitDecl() function shows that the "work-around" does not work because VisitExpr() or VisitDecl() are not called. The default implementation generated by the Antlr tool does not actually traverse the parse tree! You can test this by simply adding debugging output to the generated "Visit" functions for Visit, BaseExprVisitor, and BaseParseTreeVisitor and see for yourself.

For an example that does work, see https://github.com/kaby76/antlr-4395. But, this implementation is not what I expected.

I am not clear what the Visitor code is supposed to look like. There is no example in the documentation.

VisitChildren() does not have an implementation

For all targets except Go, the base class code has an implementation that loops through all children of a node and calls Visit() on the child. There is no function like this anywhere in the Antlr Go runtime, nor is it generated by the tool. The Antlr tool generates code that contains VisitChildren() function calls.

func (v *BaseExprVisitor) VisitProg(ctx *ProgContext) interface{} {
	return v.VisitChildren(ctx)
}

For all targets, the code does exactly what is expected.

For Go, the only function that defines VisitChildren() is:

func (v *BaseParseTreeVisitor) VisitChildren(_ RuleNode) interface{}     { return nil }

I have to conclude that the Visitor code cannot work as is because there is no implementation for VisitChildren(). One must implement the function for Go.

Unclear how to define the derived Visitor struct--or interface

To use the code, all functions need to be defined for the composite visitor struct.

Is it this?

type Visitor struct {
    *parser.BaseExprVisitor
}

Or this?

type Visitor struct {
    parser.BaseExprVisitor
}

Or this?

type Visitor struct {
    parser.ExprVisitor
}

In any of these implementations, one needs to define every function associated with the visitor: Visit(), VisitChildren(), VisitProg(), VisitDecl(), VisitExpr(), VisitTerminal(), VisitErrorNode(). Otherwise, it either doesn't work or crashes.

In order to get the code to work, I essentially needed to reimplement the entire code from scratch.

If this is true, then this is a significant departure from how most people would understand how the code should work. Yes, Go does not support Inheritance. All the more, the documentation should provide an explicit, working example, complete with driver, grammar, visitor, go.mod, and makefile.

Personally, I don't use the Visitor code because semantic analysis is better served with the Listener code (for LR-attribute analysis), which people also claim to work.

@jimidle
Copy link
Collaborator

jimidle commented Aug 30, 2023 via email

@mkohlhaas
Copy link

Is it this?

type Visitor struct {
    *parser.BaseExprVisitor
}

Or this?

type Visitor struct {
    parser.BaseExprVisitor
}

Or this?

type Visitor struct {
    parser.ExprVisitor
}

The last one is not in accordance to the Java implementation.
The first one crashes.

Personally, I don't use the Visitor code because semantic analysis is better served with the Listener code (for LR-attribute analysis), which people also claim to work.

Most projects use the listener interface. But the visitor gives you more control because you explicitly have to - and want to, depending on your use case of course - call the appropriate visit methods.

Say e.g. you have the following grammar:

prog: (declaration| expression)+
...
<more stuff>
...

Then you can easily first go through all your declarations and afterwards your expressions.

@mkohlhaas
Copy link

mkohlhaas commented Sep 4, 2023

  • CSharp
  • Cpp
  • Dart
  • Java
  • JavaScript/TypeScript
  • Python3
  • Swift
  • PHP

I guess all the other languages have proper class/object inheritance and method overloading. Golang has only embedding and interfaces. Makes it much more difficult to write a port from Java.

@kaby76
Copy link
Contributor Author

kaby76 commented Sep 8, 2023

https://groups.google.com/g/antlr-discussion/c/gEiAPs-XGps

I cannot see how this person's code targeted for Go can work whatsoever. I have not replied there because the Google Antlr Discussion Group is only for announcements (but we haven't seen Antlr 4.13.1 announced either).

@cwarden
Copy link

cwarden commented Nov 28, 2023

@thesues provides a working example, (updated to antlr 4.13 in https://github.com/cwarden/antlr-calc-golang-example/blob/master/visitor/main.go) that looks like this:

type Visitor struct {
	parser.BaseCalcVisitor
}

func (v *Visitor) visitRule(node antlr.RuleNode) interface{} {
	node.Accept(v)
	return nil
}

func (v *Visitor) VisitStart(ctx *parser.StartContext) interface{} {
	return v.visitRule(ctx.Expression())
}
...
func main() {
	is := antlr.NewInputStream(input)

	// Create the Lexer
	lexer := parser.NewCalcLexer(is)
	tokens := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)

	// Create the Parser
	p := parser.NewCalcParser(tokens)

	v := &Visitor{}
	v.Start_().Accept(v)
}

@kaby76
Copy link
Contributor Author

kaby76 commented Nov 28, 2023

@cwarden That example illustrates my point. Every single production in the grammar requires an implementation of the visit function. If you do not implement the VisitStart() function, then the expression nodes are never visited. For all other targets, that is not the case because there is an implementation for VisitChildren(). There is none in the Go runtime. For a toy example like Calc.g4, implementing a handful of extra visitor functions in the chain of notes from the root is not hard. But, for a large grammar with hundreds of rules, this becomes an issue.

@mburbidg
Copy link

One other thing. Error handling is very awkward! The visitor methods should all return (interface{}, error). This code is very Java-centric, and doesn't appear to be written with a deep understanding of Go.

@jimidle
Copy link
Collaborator

jimidle commented Jan 24, 2024 via email

@mkohlhaas
Copy link

One other thing. Error handling is very awkward! The visitor methods should all return (interface{}, error). This code is very Java-centric, and doesn't appear to be written with a deep understanding of Go.

Please be constructive. A lot of code had to be written for this library. I appreciate the effort a lot!

@mburbidg
Copy link

I apologize, for my un-constructive comment.

I don't doubt that it works. And I appreciate the work that went into it. I think what is at issue is the WAY it works. Some very constructive suggestions have been made in this thread and some valid reasons as to why the way the code currently works is much less than ideal. Including the fact that the current implementation does not account for how error handling is done in Go. This applies to the listener as well as the visitor.

I would be willing to do some incremental work on this to improve it. It is unclear to me whether improvements would be considered or not. I don't want to do a lot of work, submit a PR and then have it rejected out of hand.

Is there a process for proposing a change, that is then discussed, and agreed to before work begins? I imagine that one of the issues is backwards compatibility. Is the runtime versioned, so that folks could adopt the changes as needed?

I would suggest making changes in stages. The first PR would be to change the return type of both the listener and visitor to include an error. The second would be to include default behavior to the visitor, similar to the default behavior present in some of the other implementations. Of course this cannot be done via inheritance, but there are other ways to do it in Go.

What do you think?

@ycl2018
Copy link

ycl2018 commented May 10, 2024

@kaby76 hey! i fixed this problem in my repo , just replace the field of BaseParseTreeVisitor to interface antlr.ParseTreeVisitor in the generated file pie_base_visitor.go. I rename this file to pie_base_visitor_modify.go in my repo, and use my own BaseParseVisitor which implements antlr.ParseTreeVisitor!, it works well!

@jimidle
Copy link
Collaborator

jimidle commented May 10, 2024 via email

@ycl2018
Copy link

ycl2018 commented May 11, 2024

@jimidle Yes, I shouldn't have modified the generated code. But I can't complete my Visitor through other means. I think this should be a problem with the implementation of antlr4 go runtime. The generated BaseVisitor has a *BaseParseTreeVisitor field embedded in it which is a struct rather than interface{}(what i do in my repo is to fix this), so any implementation that 'inherits' BaseVisitor cannot really override its VisitChildren() or other methods. and *BaseParseTreeVisitor does not implement methods other than the Visit method, so it cannot be used at all.

As you mentioned, in the Java version implementation, the BaseVisitor inherits AbstractParseTreeVisitor, which has a complete implementation of VisitChildren and so on other methods. And users can implement custom methods by overriding its methods.

You should not need to change the generated code. I have a dozen visitors. You need to embed the generated base visitor in your own struct and go from there. I still have not had time to make a demo of this. Jim

On May 9, 2024, at 21:29, ycl2018 @.***> wrote: hey! i fixed this problem in my repo https://github.com/ycl2018/pie-go , just replace the field of BaseParseTreeVisitor to interface antlr.ParseTreeVisitor in the generated file pie_base_visitor.go https://github.com/ycl2018/pie-go/blob/main/gen/pie_base_visitor_modify.go. I rename this file to pie_base_visitor_modify.go in my repo, and use my own BaseParseVisitor https://github.com/ycl2018/pie-go/blob/main/tree/basevisitor.go which implements antlr.ParseTreeVisitor!, it works well! — Reply to this email directly, view it on GitHub <#4398 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ7TMDANQG4CBLM5HEA2MTZBQ5JRAVCNFSM6AAAAAA4B3MC36VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMBTG44TEMZSGM. You are receiving this because you commented.

@jimidle
Copy link
Collaborator

jimidle commented May 11, 2024

Go is not Java. There is no need for an interface. All you do is this (Listener here but it is the same):

// Listener is a listener that collects information about the program and performs a kind of
// lint analysis.
type Listener struct {
	parsing.BaseMVBListener. // Go is NOT Java. There is no inheritence
	errCount int
        // Whatever else you want in your Listener
}

func (s *Listener) HasErrors() bool {
	return s.errCount > 0
}

func (t *Listener) EnterEveryRule(ctx antlr.ParserRuleContext) {
	// fmt.Println("Entering rule", ctx.GetRuleIndex(), "at line", ctx.GetStart().GetLine(), "col", ctx.GetStart().GetColumn())
}

func (t *Listener) ExitEveryRule(ctx antlr.ParserRuleContext) {
	//fmt.Println("Exiting rule", ctx.GetRuleIndex(), "at line", ctx.GetStart().GetLine(), "col", ctx.GetStart().GetColumn())
}

func (t *Listener) ExitTunit(ctx *parsing.TunitContext) {
  // Overrides generated base func
  }
  
lexer.SetInputStream(t.InStream)
tokstream.SetTokenSource(t.Lexer)
parser.SetTokenStream(t.TokStream)
tree := parser.TranslationUnit()
// check errors
antlr.ParseTreeWalkerDefault.Walk(&Listener{}, tree)

@jimidle
Copy link
Collaborator

jimidle commented May 11, 2024

The embedded * of the abstract visitor is wrong, (shoudl not be a pointer) but for reasons that I cannot now remember I decided not to fix that. I think too much of the runtime expected it to be a pointer. I probably have a note somewhere telling me to fix it at some point. That was code I inherited.

But all you need do is embedded in your various vistors and start accept() ing.

// Code generated from MVB.g4 by ANTLR 4.12.0. DO NOT EDIT.

package parsing // MVB
import "github.com/antlr4-go/antlr/v4"

type BaseMVBVisitor struct {
	*antlr.BaseParseTreeVisitor
}

func (v *BaseMVBVisitor) VisitTranslationUnit(ctx *TranslationUnitContext) interface{} {
	return v.VisitChildren(ctx)
}
...

type MyVisitor struct {
  BaseMVBVisitor
}

func (v * MyVisitor) VisitTranslationUnit(ctx * TranslationUnitContext) someresult {
  x = ctx.something.accept(v)
  // do something
  // return the reult you want
}

And so on.

@mburbidg
Copy link

mburbidg commented May 18, 2024

I would like to give this one more shot. I am building a grammar and front-end for a very large language. I want to use the Visitor pattern, not the Listener because of the flexibility it provides in controlling the visit path and error handling. Many of my Visit methods look as follows:

func (v *Visitor) VisitLinearCatalogModifyingStatementAlt(ctx *gen.LinearCatalogModifyingStatementAltContext) interface{} {
	return ctx.LinearCatalogModifyingStatement().Accept(v)
}

These nodes have a single child and essentially call Accept on that single child.

To remove a lot of boilerplate code, I'd like to be able to provide a VisitChildren method that is used by default, if a more specific method is not provided. It would look something like this:

func (v *Visitor) VisitChilrdren(ctx *?) interface{} {
    return ctx.GetChild(0).Accept()
}

The Go runtime looks like it was designed to enable this, but as it is currently constructed it does not. You have to provide an implementation of every single Visit Method. There's no default traversal and you cannot provide a default traversal, without modifying generated code. I believe this is what @kaby76 was trying to convey as well as the fact that the other runtimes provide this capability.

As you know, Go does provide the ability to "override" functions on an embedded struct. I think the problem with the current implementation is where the VisitChildren function is declared.

A default implementation of this method is provided on the antlr.BaseParseTreeVisitor as follows.

func (v *BaseParseTreeVisitor) VisitChildren(_ RuleNode) interface{}     { return nil }

I believe if you moved this default implementation to the generated base class (e.g. BaseMVBVisitor) then it could be overriden in my visitor (e.g. MyVisitor).

This would provide significant reduction of boilerplate code for many real grammars. As has been pointed out, for toy and sample grammars this is not an issue. Just provide a Visit function for every rule.

@jimidle
Copy link
Collaborator

jimidle commented May 18, 2024

There is no need to provide an implementation of every visit function. The default in the base visitor should accept automatically on. its children. I am afraid I don't have time to create a full example. I think you can override visitChildren, but IIRC that just calls accept on the children. I am not sure it would be that useful to try and do everything in that func.

Maybe you can provide your code to me (privately if you wish) and I can see what you are trying to achieve.

To the person asking for an error signature - while that is indeed a common pattern in Go, it is not how you walk parser trees. Create an error collector and make it avaiable in the struct that embeds BaseVisitor, you can also create nodes that mark errors and unresolved. Remember that this is a parse tree walker - generally, you would construct some kind of AST using it. Don't throw exceptions or return errors in the walker.

At some point I would like to generate a generic visitor and listener. I touched the listener and visitor the least in the huge changes made to make the runtime work. It probably is time to give the walkers some attention.

type JimVisitor struct {
        // Base generated by ANTLR
	parsing.BaseMVBVisitor
	// Add things like an error collector here
}

func (v *JimVisitor) VisitChildren(node antlr.RuleNode) interface{} {
       // This is what the default implementation does, I just added a print
	fmt.Println("Visiting children. Rule", node.GetRuleContext().GetRuleIndex())
	return v.BaseMVBVisitor.VisitChildren(node)
}

// Override any other visitors here - this example is from a working compiler

Here is how the very base of the visitor is defined (this is all code in the source of course):

type ParseTreeVisitor interface {
	Visit(tree ParseTree) interface{}
	VisitChildren(node RuleNode) interface{}
	VisitTerminal(node TerminalNode) interface{}
	VisitErrorNode(node ErrorNode) interface{}
}


type BaseParseTreeVisitor struct{}

var _ ParseTreeVisitor = &BaseParseTreeVisitor{}

func (v *BaseParseTreeVisitor) Visit(tree ParseTree) interface{}         { return tree.Accept(v) }
func (v *BaseParseTreeVisitor) VisitChildren(_ RuleNode) interface{}     { return nil }
func (v *BaseParseTreeVisitor) VisitTerminal(_ TerminalNode) interface{} { return nil }
func (v *BaseParseTreeVisitor) VisitErrorNode(_ ErrorNode) interface{}   { return nil }

/// the generated code then usesu:


type BaseMVBVisitor struct {
        // Note that I obviously missed changing this from a pointer - I will do so, but the functionality is not affected
	*antlr.BaseParseTreeVisitor
}

func (v *BaseMVBVisitor) VisitTranslationUnit(ctx *TranslationUnitContext) interface{} {
	return v.VisitChildren(ctx)
}

So I embed the generated visitor in my own structure as above, and supply implementation of just those things I need.

You can ovveride visitChildren. The Accept functions are auto generated in the parser code.

func (s *TranslationUnitContext) Accept(visitor antlr.ParseTreeVisitor) interface{} {
	switch t := visitor.(type) {
	case MVBVisitor:
		return t.VisitTranslationUnit(s)

	default:
		return t.VisitChildren(s)
	}
}

Now, do I think that the tree walking is complete and needs no improvement? No.

One last thing is because of that pointer (which is definitely a bug), the BaseParseTreeVisitor will be nil unless you initlialize it. Maybe that is what is throwing people off. I will fix that for sure.

For now, you could remove the * from generated BaseVisitor

@jimidle
Copy link
Collaborator

jimidle commented May 18, 2024

@parrt - let's close this - I have supplied a number of examples and at some point I will create a full working repo. I don't think I can add any more here. If we do a 4.14.x then I will try and create generic Listeners and Walkers, though that will likely be a breaking change, which is difficult to do with go semver expectations, but I suppose people could stick with 4.13.1

@parrt parrt closed this as completed May 18, 2024
@jimidle
Copy link
Collaborator

jimidle commented May 18, 2024

One final comment though...

I have just implemented the visitor with Generics, fixed the interface definitions and fixed the generation of that stupid pointer. It will be another day or so before I push that to the development branch and the main tree. I guess I will give in and also get rid of the exp imports ;)

@mburbidg
Copy link

mburbidg commented May 18, 2024

I'm not sure why this does not work for me. As far as I can see, I'm set up just as in your example. But for me, VisitChildren is never called. In my example, I would expect VisitChildren to get called when ctx.ProgramActivity().Accept(v) is called, since I've commented out the corresponding visit function.

Here's is my visitor:

package parser

import (
	"fmt"
	"github.com/antlr4-go/antlr/v4"
	"github.com/mburbidg/mygql/gql/parser/ast"
	"github.com/mburbidg/mygql/gql/parser/gen"
	"github.com/mburbidg/mygql/utils"
	"strconv"
	"strings"
)

type Visitor struct {
	gen.BaseGQLVisitor
}

func NewVisitor() gen.GQLVisitor {
	return &Visitor{}
}

func (v *Visitor) VisitChildren(node antlr.RuleNode) interface{} {
	fmt.Println("Visiting children. Rule", node.GetRuleContext().GetRuleIndex())
	return v.BaseGQLVisitor.VisitChildren(node)
}
...
// Root of the tree visit function
func (v *Visitor) VisitGqlProgram(ctx *gen.GqlProgramContext) interface{} {
	pgm := &ast.GQLProgram{}
	if ctx.ProgramActivity() != nil {
		activity := ctx.ProgramActivity().Accept(v)
		pgm.Activity = activity
	}
	if ctx.SessionCloseCommand() != nil {
		pgm.CloseOnCompletion = true
	}
	return pgm
}

// Commented out visit function.
//func (v *Visitor) VisitProgramActivity(ctx *gen.ProgramActivityContext) interface{} {
//	return ctx.TransactionActivity().Accept(v)
//}

Here's my code that invokes the parser and visits the resulting tree.

	src := `CREATE SCHEMA IF NOT EXISTS /mymovies`
	input := antlr.NewInputStream(src)
	lexer := gen.NewGQLLexer(input)
	stream := antlr.NewCommonTokenStream(lexer, 0)
	p := gen.NewGQLParser(stream)
	errListener := newErrorListener()
	p.AddErrorListener(errListener)
	p.BuildParseTrees = true
	tree := p.GqlProgram()
	assert.Equal(t, 0, len(errListener.errors))
	visitor := NewVisitor()
	tree.Accept(visitor)

@YuriyKortev
Copy link

There are a lot of words in this discussion, but in fact the visitor does not work for Go.

There is no need to provide an implementation of every visit function.`

If some visit function not implemented, it will call generated method which call BaseParseTreeVisitor.VisitChildren which do nothing. Custom VisitChildren implementation will not help, because generated visitor knows nothing about custom visitor which embeds it.

@jimidle
Copy link
Collaborator

jimidle commented Jun 3, 2024

Yes standby for some updates

@onsi
Copy link

onsi commented Oct 2, 2024

@jimidle (gentle nudge) - any updates on this? Do you have ideas/direction for a PR? Perhaps some of us following this thread could contribute a fix?

@jimidle
Copy link
Collaborator

jimidle commented Oct 3, 2024 via email

@JoshSharpe
Copy link

I have a fix and a huge reduction in generated code but have not had time to complete it for release. It will be done before the next release. But it is not practical to use generics in the walker as go doesn’t have double dispatch

I've just started looking into how the code works to try to put up a PR and wanted to share a few thoughts. Hopefully as a summary of the above comments.

I think the patch can be broken down into parts:

  1. The visitChildren baked in should include some default behavior that checks for children and visits each. Here is a simple example:
	results := []interface{}{}
	children := node.GetChildren()
	for _, child := range children {

		termNode, ok := child.(TerminalNode)
		if ok {
			fmt.Println("termNode: ")
			newRes := v.Visit(termNode)
			results = append(results, newRes)
			continue
		}

		ruleNode, ok := child.(RuleNode)
		if ok {
			fmt.Println("ruleNode: ")

			newRes := v.Visit(ruleNode)
			results = append(results, newRes)
			continue
		}
	}
	fmt.Println("results: ", results)

	return results

I have shown in my local that this will help visit each child node. However, there are still issues so I haven't yet created a PR.

  1. Overwriting a single visitor node does not get picked up. This is because the visitor is using the "lower-level" object to determine what to do. This problem is harder to resolve because it is architectural. It's why people mention they have to re-write each node. In fact, the code above that would exist in tree.go right now would call the Visit/Accept method defined there. It would not even call the generated base code, much less the user-defined code on top of the generated code.

I believe this means that the ParseTreeVisitor may have to be part of the generated base_visitor code as well. By moving the VisitChildren to the base visitor it correctly calls. This still does not truly enable "overwrites" of the base function since it is referring to the lower type.

Sorry this does not offer any direct help or solutions. I am happy to help make code changes @jimidle but it seems you are currently doing it. If you'd like help feel free to post where you are or some docs that explain the direction you'd prefer.

It's a tricky problem to solve in go. I think the generated base_visitor struct has to also be used directly. Meaning that the pattern may be to generate the code as-is but any overwrite occur in the same package by moving the method into an impl file. Anytime code generation occurs again that method would have to be removed.

@jimidle
Copy link
Collaborator

jimidle commented Oct 14, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants