Fix issue #79: Incorrect parsing of complex ++ expressions #215

fioriandrea · 2023-10-11T18:54:39Z

As the issue pointed out, there was an error while parsing complex post-increment expressions involving field expressions. The fix on the parser was easy enough. However, while working on the parser, I realized that there was a bug in the compiler too. For example:

./goawk 'BEGIN { $0="3 4 5 6 7 8 9"; a=3; print ($($a++)++); print $0; print a }' prints 7\n3 4 7 6 7 8 9\n3, whereas

gawk 'BEGIN { $0="3 4 5 6 7 8 9"; a=3; print ($($a++)++); print $0; print a }' prints 7\n3 4 6 6 8 8 9\n3

./goawk 'BEGIN { n = split("12345", a, ""); i = 1; a[a[a[1]++]++]++; for (i=1;i<=n;i++) printf a[i]; print }' prints 34345\n, whereas

gawk 'BEGIN { n = split("12345", a, ""); i = 1; a[a[a[1]++]++]++; for (i=1;i<=n;i++) printf a[i]; print }' prints 33345\n.

This pull request fixes both issues. I uncommented the TODO tests in interp/interp_test.go and added one more.

Fixes #79

benhoyt · 2023-10-11T19:29:09Z

This looks great at a glance, and fixes a long-standing issue. Thanks @fioriandrea! I'll take a look at the code change more closely in the next few days.

benhoyt

Thanks, this is good work, and as I've noted, it has uncovered some other (related) issues! Let me know how you'd like to proceed.

parser/parser.go

internal/compiler/compiler.go

benhoyt · 2023-10-14T03:42:48Z

internal/compiler/compiler.go

 		} else {
-			c.expr(e.Expr)
+			c.incrExpr(e)


You've actually uncovered (or made me notice) a larger problem with GoAWK here! All the augmented assignment operations that assign to an lvalue -- ++ / --, += and other augmented assignment operators, and sub / gsub (because they can assign to an lvalue) -- are buggy when used as expressions too.

They evaluate the field/array index twice. For example, in the above, c.expr(e.Expr) evaluates the field/array index for FieldExpr/IndexExpr, and then c.assign(e.Expr) evaluates it again. This is a bug, because any side effects in the index expression happen twice.

I've added a bunch of tests in this Gist that show this.
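The double evaluation described above can be illustrated with a minimal Go sketch. This is a toy model, not GoAWK's compiler or VM code: it models `a[i++]++`, and the names `state`, `indexExpr`, `buggyPostIncr`, and `fixedPostIncr` are hypothetical, chosen only for this example.

```go
package main

import "fmt"

// state is a toy model of an AWK program's variables: one scalar i and one array a.
type state struct {
	i int
	a map[int]int
}

// indexExpr models the side-effecting index expression i++:
// it returns the old value of i and then increments i.
func (s *state) indexExpr() int {
	old := s.i
	s.i++
	return old
}

// buggyPostIncr models a[i++]++ compiled as "load the value, then assign",
// where each half evaluates the index expression on its own.
func (s *state) buggyPostIncr() int {
	// First evaluation of i++ selects the element to read.
	old := s.a[s.indexExpr()]
	// Second evaluation of i++ selects a different element to write.
	s.a[s.indexExpr()] = old + 1
	return old
}

// fixedPostIncr evaluates the index once and reuses it for load and store.
func (s *state) fixedPostIncr() int {
	idx := s.indexExpr()
	old := s.a[idx]
	s.a[idx] = old + 1
	return old
}

func newState() *state {
	return &state{i: 1, a: map[int]int{1: 10, 2: 20}}
}

func main() {
	b := newState()
	b.buggyPostIncr()
	fmt.Println(b.i, b.a[1], b.a[2]) // 3 10 11

	f := newState()
	f.fixedPostIncr()
	fmt.Println(f.i, f.a[1], f.a[2]) // 2 11 20
}
```

In the buggy version `i` advances by two and the wrong element is updated; in the fixed version the index is computed once and both the read and the write use it.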

So we actually need to step back a bit and fix it in expr for IncrExpr, AugAssignExpr, and CallExpr for the sub / gsub cases. I had a play with this today, but it's not trivial!

My current thinking is to not use c.expr() directly on the expression, but to use a function that only handles lvalues and saves the index away somewhere to be used by c.assign later. But I'm not sure -- there's bound to be a more elegant way to solve this.

If you'd like to suggest a good way to fix all of these, go ahead. Otherwise we can reduce this PR to just fixing the parsing issue and I'll address the compiler bugs in a separate PR.

fioriandrea · 2023-10-14

One possible solution could be to introduce a data structure on the virtual machine side (like a parallel stack just for the indices) where you can save the indices and then use them within c.assign by fetching them with specific instructions.
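As a rough illustration of the parallel-stack idea, here is a small Go sketch of a toy VM. It is not GoAWK's VM: the `vm` type, the `indices` side stack, and the `saveIndex`, `loadSaved`, and `storeSaved` operations are all hypothetical names invented for this example.

```go
package main

import "fmt"

// vm is a toy machine with a value stack plus a side stack that holds
// only saved indices (the hypothetical "parallel stack").
type vm struct {
	stack   []int
	indices []int
	array   map[int]int
}

func (m *vm) push(v int) { m.stack = append(m.stack, v) }

func (m *vm) pop() int {
	v := m.stack[len(m.stack)-1]
	m.stack = m.stack[:len(m.stack)-1]
	return v
}

// saveIndex moves the already-evaluated index from the value stack
// to the index stack.
func (m *vm) saveIndex() { m.indices = append(m.indices, m.pop()) }

// loadSaved pushes array[index] using the saved index, without consuming it.
func (m *vm) loadSaved() { m.push(m.array[m.indices[len(m.indices)-1]]) }

// storeSaved pops the saved index and a value, then assigns array[index] = value.
func (m *vm) storeSaved() {
	idx := m.indices[len(m.indices)-1]
	m.indices = m.indices[:len(m.indices)-1]
	m.array[idx] = m.pop()
}

// postIncr models array[index]++ with the index evaluated exactly once.
func (m *vm) postIncr(index int) int {
	m.push(index) // stands in for evaluating the index expression
	m.saveIndex()
	m.loadSaved() // value stack: old
	old := m.stack[len(m.stack)-1]
	m.push(old + 1) // value stack: old new
	m.storeSaved()  // value stack: old
	return m.pop()
}

func main() {
	m := &vm{array: map[int]int{3: 5}}
	fmt.Println(m.postIncr(3), m.array[3]) // 5 6
}
```

The trade-off in this sketch is an extra data structure and extra instructions in exchange for leaving the value stack layout untouched.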

Another type of solution could be to create functions similar to my function incrExpr that handle all cases (one for IncrExpr with pre-increment, one for IncrExpr with post-increment, one for AugAssignExpr, and one for sub and gsub, all with switch statements similar to those in my incrExpr function). I played a bit with this approach and it seemed to work, but there was lots of repetitive code. Maybe it can be factored somehow. Also, I didn't know exactly how to handle sub/gsub without introducing a new instruction in the VM, since sub/gsub push two values on the stack.

A final approach (which is what I committed) is to save the index on the stack and retrieve it using the Roll instruction (similar to the roll instruction in Forth), which allows you to move an element from a lower position on the stack to the top. So, essentially, the solution is similar to the first one, but instead of creating a new data structure, the stack is reused. I understand that this approach might be inefficient, as you need to "shift" the stack by a few elements each time you call Roll. I was tempted to just copy the value without shifting the stack, but that would leave a lot of garbage on the stack. Also, the code I committed for this approach could probably be cleaned up a bit.

Let me know what you think.

benhoyt · 2023-10-19

Thanks for this! I think your basic approach is great. I've pushed a couple of commits to simplify a little: we don't need the full "roll-by-n" functionality, just what I've called Rote (rotate 3, which is called ROT in Forth). So I've replaced your Roll opcode with Rote, which has the stack effect a b c -- b c a.
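For readers unfamiliar with Forth, the stack effect a b c -- b c a can be sketched in a few lines of Go. This is only an illustration on a slice of strings, not the opcode implementation from the PR; the function name `rote` is borrowed from the comment above and the rest is assumed.

```go
package main

import "fmt"

// rote rotates the top three stack elements: a b c -- b c a.
// The third-from-top element moves to the top.
// It assumes the stack holds at least three elements.
func rote(stack []string) {
	n := len(stack)
	stack[n-3], stack[n-2], stack[n-1] = stack[n-2], stack[n-1], stack[n-3]
}

func main() {
	// The top of the stack is the last element of the slice.
	stack := []string{"x", "a", "b", "c"}
	rote(stack)
	fmt.Println(stack) // [x b c a]
}
```

One plausible use, stated here as an assumption rather than a description of the merged code: if a saved index sits beneath two values on the stack, a rotation like this brings the index back to the top so that an assignment can consume it.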

I've also tweaked the names of a couple of functions and removed some redundancy.

In addition, I removed the new ast.IsIndexed function, as I'd rather not export a new function just for that (it's really an implementation detail of the compiler). So I just did a type switch for those, which is only one line longer.
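A type switch of the kind mentioned above might look like the following sketch. The node types here are simplified stand-ins defined locally for the example: their names mirror FieldExpr and IndexExpr from the discussion, but their fields and the `compileLvalue` function are hypothetical.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for AST node types.
type Expr interface{}

type VarExpr struct{ Name string }

type FieldExpr struct{ Index Expr }

type IndexExpr struct {
	Array string
	Index []Expr
}

// compileLvalue decides inline, with a type switch, whether the lvalue
// has an index that must be evaluated once and saved. No exported
// helper function is needed for the check.
func compileLvalue(e Expr) string {
	switch e.(type) {
	case *FieldExpr, *IndexExpr:
		return "indexed"
	default:
		return "plain"
	}
}

func main() {
	fmt.Println(compileLvalue(&FieldExpr{Index: 1}))                          // indexed
	fmt.Println(compileLvalue(&IndexExpr{Array: "a", Index: []Expr{"key"}})) // indexed
	fmt.Println(compileLvalue(&VarExpr{Name: "x"}))                          // plain
}
```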

Let me know what you think, and if you can see anything else that needs adjusting before merging the PR.

Oh, just in case: @fioriandrea

fioriandrea · 2023-10-19

I think the code can be merged. I just committed one last additional test for nested gsub expressions. I think this case is worth testing, since gsub/sub, as opposed to the other augmented assignment operators, pushes two values on the stack.

benhoyt · 2023-10-19T19:04:23Z

Thanks again @fioriandrea for the fix!

fioriandrea added 3 commits October 11, 2023 18:31

fix parsing issue for nested field expressions and post-increment/post-decrement operators; there is still a bug in the compiler and vm for this kind of expression

ede779e

Fix bug on nested post-increment operations in the compiler

41bd338

don't skip test parse1

29c8436

benhoyt requested changes Oct 14, 2023

fioriandrea and others added 10 commits October 14, 2023 12:29

Rename temporary variable in switch condition while parsing DOLLAR

dc7ff2d

add comment explaining why special handling for field expressions and post-increment is necessary

488c1ab

Restore optimization for IncrExpr in stmt

a0941a2

Fix augmented assignment and gsub / sub for indexed expressions

328d568

Implement Roll instruction to save the index on the stack and retrieve it at a later point through Roll

Drive-by fix to fuzz test (go vet flagged this)

7837367

Remove ast.IsIndexed; simplify and reduce some redundancy; Roll -> Rote

af7ef5a

Test that we evaluate Right before Left in AugAssignExpr

4031ee7

Oops, reinstate blank line

ba72474

Remove debugging comments from opcodes.go

7e591de

add test for nested gsub expressions

e7adc65

benhoyt merged commit 9241da4 into benhoyt:master Oct 19, 2023
10 checks passed
