Error parsing "cond && var=value" #166

Closed
benhoyt opened this issue Jan 23, 2023 · 3 comments · Fixed by #170


@benhoyt
Owner

benhoyt commented Jan 23, 2023

This is valid in Gawk, onetrueawk, and mawk, but not in GoAWK (incidentally, frawk has the same issue).

$ goawk 'BEGIN { if (1 && x=1) print "t" }'
<cmdline>:1:19: expected ) instead of =
BEGIN { if (1 && x=1) print "t" }
                  ^

# Compare
$ gawk 'BEGIN { if (1 && x=1) print "t" }'
t

Discovered from the results at https://github.com/juntuu/advent_of_code_2022 (see results)

There are probably more issues this repo has found in GoAWK -- would be worth going through all the failures to find other unique issues.

@juntuu
Contributor

juntuu commented Jan 26, 2023

I haven't gone through all the cases in my repo, but there are three differences (from all gawk, onetrueawk and mawk) I remember having noticed:

  1. the $++a one, which already has an open issue
  2. the one in this issue, which also occurs with logical or (||) and with the augmented assignment operators (+= etc.)
  3. non-associative binary operators with an assignment as the right-hand side:
    • comparison operators: 1 < x=2, which the others parse as 1 < (x=2) but goawk parses as (1 < x) = 2
    • regex match operators: 1 ~ x=2

I'm not sure what "the correct" parse for 2. and 3. would be in terms of the POSIX spec, but all the other major implementations seem to behave the same way.

Also, 2. and 3. might be somewhat related. At least both parse similarly (using the gawk pretty-printing option):

$ gawk -o- 'BEGIN { 1 < x=2 < y=3 }'
BEGIN {
	1 < (x = 2 < (y = 3))
}

$ gawk -o- 'BEGIN { 1 && x=2 && y=3 }'
BEGIN {
	1 && (x = 2 && (y = 3))
}

Going on a tangent...

I just noticed while writing this that the handling of the non-associative operators is all over the place when there are no assignment expressions involved:

gawk accepts chaining of match-ops (left-associative), but not comparisons:

$ gawk -o- 'BEGIN { 1 ~ 2 ~ 3 }'
BEGIN {
	(1 ~ 2) ~ 3
}

$ gawk -o- 'BEGIN { 1 < 2 < 3 }'
gawk: cmd. line:1: BEGIN { 1 < 2 < 3 }
gawk: cmd. line:1:               ^ syntax error

onetrueawk rejects both with a similar error message:

$ nawk 'BEGIN { 1 ~ 2 ~ 3 }'
nawk: syntax error at source line 1
 context is
	BEGIN { 1 ~ 2 >>>  ~ <<<
nawk: illegal statement at source line 1
nawk: illegal statement at source line 1

and mawk accepts both (left-associative):

$ mawk -W dump 'BEGIN { 1 < 2 < 3 }'
BEGIN
000 .	pushd	1
002 .	pushd	2
004 .	lt
005 .	pushd	3
007 .	lt
008 .	pop
009 .	exit0

In conclusion, I'd hope that any important awk code would be written in more unambiguous style 😅

@benhoyt
Owner Author

benhoyt commented Jan 26, 2023

Thanks @juntuu! After I submitted this issue I did go through all the solutions in your repo that were failing in GoAWK, and that's the list I came up with too. Yep, I figured I'd address the cond || var += value and similar cases at the same time -- they're no doubt manifestations of the same issue.

As far as the POSIX spec goes, it says that && and || are higher-precedence than lvalue=expr (and augmented assignment), which is how I'm parsing it. However, it also says that && and || are left-associative and lvalue=expr is right-associative. I actually don't really know what that means! I thought that associativity was only needed when operators have the same precedence. The POSIX spec says, "In expression evaluation, where the grammar is formally ambiguous, higher precedence operators shall be evaluated before lower precedence operators." I'll have to take a closer look and improve my understanding here...
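
To make the precedence question concrete, here is a minimal sketch of a precedence-climbing parser in Go. It is not GoAWK's parser: the operator table, the `Parse` function, and the space-separated token format are all assumptions made for illustration. It shows that grouping strictly by precedence yields `(1 && x) = 1`, and that associativity only comes into play between operators at the same level.

```go
package main

import (
	"fmt"
	"strings"
)

// Binding power of each operator; a higher number binds tighter.
// Only the operators needed for this illustration are listed.
var precedence = map[string]int{"=": 1, "||": 2, "&&": 3, "<": 4}

// Right-associative operators (just assignment in this sketch).
var rightAssoc = map[string]bool{"=": true}

type parser struct {
	toks []string
	pos  int
}

// peek returns the current token, or "" at end of input.
func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

// expr parses operators whose precedence is at least minPrec and returns
// a fully parenthesized string that shows the resulting grouping.
func (p *parser) expr(minPrec int) string {
	left := p.peek() // operand: a number or a variable name
	p.pos++
	for {
		op := p.peek()
		prec, ok := precedence[op]
		if !ok || prec < minPrec {
			return left
		}
		p.pos++
		next := prec + 1 // left-associative: right side must bind tighter
		if rightAssoc[op] {
			next = prec // right-associative: right side may reuse this level
		}
		right := p.expr(next)
		left = "(" + left + " " + op + " " + right + ")"
	}
}

// Parse groups a space-separated expression strictly by precedence.
func Parse(src string) string {
	p := &parser{toks: strings.Fields(src)}
	return p.expr(1)
}

func main() {
	fmt.Println(Parse("1 && x = 1"))  // ((1 && x) = 1)
	fmt.Println(Parse("a = b = c"))   // (a = (b = c))
	fmt.Println(Parse("a && b && c")) // ((a && b) && c)
}
```

The first result is the grouping GoAWK rejected, since `(1 && x)` is not an lvalue. The other two show associativity deciding the grouping only when the same precedence level repeats.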

benhoyt added a commit that referenced this issue Feb 12, 2023
Expressions like "1 && x=1" aren't really valid (IMO), because
assignments are lower-precedence than binary operators, but onetrueawk,
Gawk, and mawk all support this for logical, match and comparison
operators.

The other awks support this by using a yacc grammar which supports
backtracking, and as Vitus13 said on reddit: "If there are two
syntactically valid parsings and one is a semantic error, the error
handling may resolve the ambiguity towards the valid parsing. In this
case, you can only assign to L values, so trying to assign to (1&&x)
doesn't make any sense."

In GoAWK, this requires a form of backtracking (I call it "partial
backtracking" because it's not actually backing up the lexer). It works
by parsing as (1&&x)=1 according to the operator precedence, then
determining that you're trying to assign something that isn't an
lvalue, then confirming that 1&&x is a binary expression, that the "x"
part is an lvalue, and that the operator (&& in this case) is one that
the other awks handle similarly for this case.

Also make the error message a bit clearer when you don't have an lvalue
on the left hand side of an assignment, like "rand() = 1".

Fixes #166
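
The "partial backtracking" described in the commit message above can be sketched roughly as follows. This is an illustration only, not GoAWK's actual code: the `Expr` node types, the `rewritable` operator set, and the `FixAssign` helper are hypothetical names, and only plain variables are treated as lvalues here (real AWK lvalues also include fields and array elements).

```go
package main

import "fmt"

// Expr is a hypothetical, minimal AST; GoAWK's real node types differ.
type Expr interface{ String() string }

type Num struct{ Val string }
type Var struct{ Name string }
type Binary struct {
	Left  Expr
	Op    string
	Right Expr
}
type Assign struct {
	Left  Expr
	Right Expr
}

func (n Num) String() string { return n.Val }
func (v Var) String() string { return v.Name }
func (b Binary) String() string {
	return "(" + b.Left.String() + " " + b.Op + " " + b.Right.String() + ")"
}
func (a Assign) String() string {
	return "(" + a.Left.String() + " = " + a.Right.String() + ")"
}

// Operators for which the rewrite is allowed in this sketch:
// logical, comparison, and match operators.
var rewritable = map[string]bool{
	"&&": true, "||": true,
	"<": true, "<=": true, "==": true, "!=": true, ">": true, ">=": true,
	"~": true, "!~": true,
}

// FixAssign builds the node for "left = right". If left is not an lvalue
// but is a binary expression whose right operand is one, the assignment is
// pushed down: (a op x) = v becomes a op (x = v).
func FixAssign(left, right Expr) (Expr, error) {
	if _, ok := left.(Var); ok {
		return Assign{Left: left, Right: right}, nil
	}
	if b, ok := left.(Binary); ok && rewritable[b.Op] {
		if _, ok := b.Right.(Var); ok {
			inner := Assign{Left: b.Right, Right: right}
			return Binary{Left: b.Left, Op: b.Op, Right: inner}, nil
		}
	}
	return nil, fmt.Errorf("expected lvalue before =, got %s", left)
}

func main() {
	// 1 && x=2 && y=3, parsed by precedence as (1 && x) = ((2 && y) = 3).
	inner, _ := FixAssign(Binary{Num{"2"}, "&&", Var{"y"}}, Num{"3"})
	outer, _ := FixAssign(Binary{Num{"1"}, "&&", Var{"x"}}, inner)
	fmt.Println(outer) // (1 && (x = (2 && (y = 3))))

	// An operator outside the set is still an error in this sketch.
	_, err := FixAssign(Binary{Num{"1"}, "+", Var{"x"}}, Num{"1"})
	fmt.Println(err) // expected lvalue before =, got (1 + x)
}
```

The nested case reproduces the grouping that gawk's pretty-printer reports for `1 && x=2 && y=3`, namely `1 && (x = 2 && (y = 3))`.
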

@benhoyt
Owner Author

benhoyt commented Feb 12, 2023

Hi @juntuu. FWIW, I've fixed the $++a one in #168, and I'm planning to fix the 1&&x=1 one in #170 (which I don't love, but oh well).

benhoyt added a commit that referenced this issue Feb 17, 2023