Correct grammar of in-out expressions #194

amartini51 · 2023-10-10T00:31:25Z

Fixes: #193

Co-authored-by: Sarah Canto <61300898+sarahnicoleboo@users.noreply.github.com>

amartini51 · 2023-10-10T00:31:36Z

The current grammar dates back to commit b0bc885 for Swift 1. The original RST langref grammar doesn't have specific discussion of in-out expressions; the older HTML langref says:

inout indicates that the argument will be passed as an "in-out" parameter. The caller must pass an lvalue decorated with the & prefix operator as the argument. Semantically, the value of the argument is passed "in" to the callee to a local value, and that local value is stored back "out" to the lvalue when the callee exits. This is normally indistinguishable from pass-by-reference semantics.

Neither of the langref formal grammars include a production for in-out-expression.

amartini51 · 2023-10-10T00:35:58Z

@lorentey Is this something you can review? You and I looked at some corrections and additions around in-out in 2020 (merged in commit 7c9e800).

This grammar avoids productions like &x+y -- even though x+y is a valid expression, you can't use it this way in an in-out expression. However, it does still overproduce -- for example, a.b() is a valid postfix expression but &a.b() isn't a valid in-out expression. Co-authored-by: Joe Groff <jgroff@apple.com>

lorentey · 2023-10-10T23:53:42Z

Interesting! I am not a Swift grammar expert, but to my naive, innocent eyes this does look like a better match with InOutExpr in AST. Not sure who would be a good person to confirm -- let's try @ahoppen, maybe?

ahoppen

The change looks correct to me.

Moving up one level in the grammar introduces additional overgeneration, but otherwise code like f(&(x)) doesn't match.

amartini51 · 2023-10-11T17:39:54Z

@ahoppen Thanks for taking a look. Reading it again this morning, I see that 'postfix-expression' doesn't include parenthesized-expression, but you can use parentheses in an in-out expression.

One option is to define 'in-out-expression' as follows:

in-out-expression → & primary-expression

This overproduces even more — allowing literals, closures, and conditional expressions inside the in-out expression where the aren't actually allowed. Elsewhere in the reference, we often handle places where the grammar overproduces by explaining the limitations in prose.

Do we have a list of what we specifically do support after the & for an in-out expression? For example, are key paths ever allowed?

Trying code out in the REPL, we allow parentheses for grouping, dot for member lookup, square brackets for array subscripts. An alternate approach that might be feasible here, if the list of allowed in-out things is small, is to build them up in the in-out-expression grammar specifically by repeating/re-using bits of other grammar rules:

Grammar of an in-out expression:

in-out-expression → & in-out-expression-tail
in-out-expression-tail → in-out-expression-tail . decimal-digits
in-out-expression-tail → in-out-expression-tail . identifier
in-out-expression-tail → in-out-expression-tail [ function-call-argument-list ]
in-out-expression-tail → ( in-out-expression-tail )
in-out-expression-tail → identifier

It doesn't look like we support tuple expressions — but the error I get suggests maybe we do and I need to spell something more explicitly:

var x = 10; var y = 10
func f(_ z: inout (Int, Int)) { z = (100, 200) }
f( &(x, y) )
// error: repl.swift:3:1: error: type of expression is ambiguous without more context
// f( &(x, y) )
// ^~~~~~~~~~~~

There isn't a clear point in the expression grammar where everything produced by that rule is allowed in an in-out expression, but nothing else, so we will always either over- or under-produce if we write in-out-expression in terms of existing productions. Combined rules from the parts of the grammar that are definitely supported to make these production rules, although this list might be incomplete: - The grammar for . is borrowed from explicit-member-expression - The grammar for [] is borrowed from subscript-expression - The grammar for () is adapted from parenthesized-expression

ahoppen · 2023-10-11T20:06:43Z

The Swift parser itself allows any expression after &. parseExprUnary calls back into parseExprUnary after consuming the &.

https://github.com/apple/swift/blob/e298fb5a046cde1b588a8e39ba622ac6e17b5dba/lib/Parse/ParseExpr.cpp#L616-L626

Any enforcement that you can’t pass a literal after the ampersand is semantic. For example the diagnostic of passing an integer literal is the same as when passing a let constant.

func foo(_ x: inout Int) {}

foo(&1) // error: Cannot pass immutable value as inout argument: literals are not mutable

let myInt = 1
foo(&myInt) // error: Cannot pass immutable value as inout argument: 'myInt' is a 'let' constant

I’m not entirely sure what’s used to represent parseExprUnary in the grammar but I think that’s what we should use as a child here.

CC @rintaro In case you have opinions / thoughts as well.

rintaro · 2023-10-11T20:42:21Z

Grammar of an in-out expression:

in-out-expression → & in-out-expression-tail
in-out-expression-tail → in-out-expression-tail . decimal-digits
in-out-expression-tail → in-out-expression-tail . identifier
in-out-expression-tail → in-out-expression-tail [ function-call-argument-list ]
in-out-expression-tail → ( in-out-expression-tail )
in-out-expression-tail → identifier

This is not enough. At least it's missing

in-out-expression-tail → 'self'
in-out-expression-tail → in-out-expression-tail . 'self'

And somewhat surprisingly, the following code also works

struct S {
  static var shared: Self = S()
}

func foo(_ x: inout S) {}
foo(&.shared)

As @ahoppen mentioned, the parser parses any prefix-expression (parseExprUnary). And the semantic analysis decides if it accepts the operand. IMO, as the grammar, it's OK to say just

in-out-expression → '&' postfix-expression

NOTE: not prefix-expression because both &&foo or &!foo is parsed as && or &! prefix-operator respectively.

rintaro · 2023-10-12T19:32:03Z

On second thought,

the parser parses any prefix-expression (parseExprUnary). And the semantic analysis decides if it accepts the operand.

This is similar to keypath expression. Parser parses any postfix components, but in the current grammar, key-path-components only accepts certain suffixes. I don't have strong opinion on whether the grammar should reflect technically-semantic things, or not . But at least we should be consistent.

amartini51 · 2023-10-12T21:58:34Z

Thanks, @rintaro.

TSPL doesn't generally distinguish between parsing, lexing, semantic analysis, etc. — so its formal grammar doesn't specifically try to include content that parses but is semantically invalid. If there's something that isn't valid Swift code, we try to exclude it from the grammar, regardless of which stage of compilation rejects it.

Defining 'in-out-expression' in terms of 'prefix-expression' as in your comment doesn't entirely work, because grouping parentheses appear in the grammar in the production for 'primary-expression', but they're valid after the &. I see three approaches we could take:

Define 'in-out-expression' in terms of 'postfix-expression' and re-implement the productions for grouping parentheses as part of 'in-out-expression'. ~~I don't particularly like this — it still overproduces, and it's less than ideal to have grouping parentheses defined in multiple places in the grammar.~~ [EDIT: That's incorrect; 'postfix-expression' includes 'primary-expression', which includes 'parenthesized-expression'. No need to re-implement grouping.]
Define 'in-out-expression' in terms of 'primary-expression' and add a prose description of what's actually allowed. We've used this to handle overproduction elsewhere in the grammar, where it's not practical to reduce it in the grammar.
Define 'in-out-expression' in terms of an explicit list of what grouping and postfixes are allowed, similar to how 'key-path-expression' handles things. (That approach was added by commit 2757f7a and accumulated additional pieces over time.)

If the third option produces a reasonable grammar, that's my initial preference. That said, it might not be practical — adding implicit member expressions and self expressions might make the list of productions here unwieldy.

For the second option, I'd add a paragraph before the "for more information" line like the following:

The expression is a combination of the following kinds of expression:

An identifier or self

Access to a member, including properties, tuple elements, and .self

Access to an array subscript

Member access can be explicit like f(&x.y) or implicit like f(&.z).
The expression can also include grouping parentheses.

The previous grammar didn't include `self` or `.self` or implicit member expressions, as Rintaro Ishizaki <rishizaki@apple.com> noted on the PR. Duplicating those productions here, with modifications to prevent them from pulling in all of primary-expression or postfix-expression, would make this grammar a bit too long and unwieldy.

The old grammar didn't allow things like f(&x.y) which are legal in Swift. The new grammar allows a bunch of extra stuff -- literals, closures, and conditional expressions -- but that's a better problem to have. See also #194, where we're discussing a more specific fix. Fixes: #193 (partially)

sarahnicoleboo · 2023-10-26T08:25:54Z

Hi everyone.
I'm the person who originally posted the issue about this in #193

I've read through this thread along with the partial fix that was merged.
I just wanted to leave my thoughts here:

I think that @rintaro proposed the fix that I believe is the most correct with & postfix-expression
This allows everything that is syntactically valid and pushes rejection to semantic analysis.
A language grammar is really a syntactic representation of the language, not semantic.
And of course we want the grammar to be readable and not too complex thus we end up with this mix between concrete and abstract grammar that defines the Swift grammar.
A side effect of this is that there's going to be overproduction and lots of it.
I could provide dozens of examples here of overproductions in other areas of the grammar.

With that being said, I also see the concern of consistency when looking at how keypath-expression is handled.
It's an idea of "If overproduction is limited there, where else should it be limited?"

I'm not sure it's my place to be chiming in on that decision since it's really a design decision.
I just wanted to share my thoughts and also point out that the partial fix doesn't address the original issue that I brought up (&s.a).
s.a was produced through the production postfix-expression. So if we limit it to & primary-expression the original example is still not producible.
Thus, defining it in terms of primary-expression and adding a prose specification would not work.

Thanks for the consideration.

amartini51 · 2023-11-02T20:52:46Z

This got incorrectly (and accidentally) closed as part of merging #199 because both PRs resolve the same GitHub issue. EDIT: Or possibly because commit 028a2ec links to this PR.

The grammar for postfix-expression produces this case, and includes primary-expression. This grammar also allows for production of parentheses because primary-expression produces parenthesized-expression. Co-authored-by: Sarah Canto <61300898+sarahnicoleboo@users.noreply.github.com>

Kept this branch's changes to the grammar.

In-out supports more than just simple variables.

c8ccdbf

Co-authored-by: Sarah Canto <61300898+sarahnicoleboo@users.noreply.github.com>

amartini51 requested a review from lorentey October 10, 2023 00:34

lorentey requested a review from ahoppen October 10, 2023 23:53

stzn mentioned this pull request Oct 11, 2023

in-out expressionの修正 stzn/the-swift-programming-language-jp#479

Open

ahoppen approved these changes Oct 11, 2023

View reviewed changes

Generalize to allow parenthesized expression.

3d0044b

Moving up one level in the grammar introduces additional overgeneration, but otherwise code like f(&(x)) doesn't match.

ahoppen requested a review from rintaro October 11, 2023 20:06

amartini51 mentioned this pull request Oct 19, 2023

Fix in-out grammar underproduction #199

Merged

amartini51 closed this in 028a2ec Oct 20, 2023

amartini51 mentioned this pull request Oct 20, 2023

Error in grammar regarding in-out-expression #193

Open

amartini51 reopened this Nov 2, 2023

amartini51 and others added 2 commits November 2, 2023 14:05

Track 'main' to resolve merge conflicts.

46fb198

Kept this branch's changes to the grammar.

amartini51 marked this pull request as draft December 19, 2023 23:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Correct grammar of in-out expressions #194

Correct grammar of in-out expressions #194

amartini51 commented Oct 10, 2023

amartini51 commented Oct 10, 2023 •

edited

amartini51 commented Oct 10, 2023

lorentey commented Oct 10, 2023

ahoppen left a comment

amartini51 commented Oct 11, 2023

ahoppen commented Oct 11, 2023

rintaro commented Oct 11, 2023 •

edited

rintaro commented Oct 12, 2023 •

edited

amartini51 commented Oct 12, 2023 •

edited

sarahnicoleboo commented Oct 26, 2023 •

edited

amartini51 commented Nov 2, 2023 •

edited

Correct grammar of in-out expressions #194

Are you sure you want to change the base?

Correct grammar of in-out expressions #194

Conversation

amartini51 commented Oct 10, 2023

amartini51 commented Oct 10, 2023 • edited

amartini51 commented Oct 10, 2023

lorentey commented Oct 10, 2023

ahoppen left a comment

Choose a reason for hiding this comment

amartini51 commented Oct 11, 2023

ahoppen commented Oct 11, 2023

rintaro commented Oct 11, 2023 • edited

rintaro commented Oct 12, 2023 • edited

amartini51 commented Oct 12, 2023 • edited

sarahnicoleboo commented Oct 26, 2023 • edited

amartini51 commented Nov 2, 2023 • edited

amartini51 commented Oct 10, 2023 •

edited

rintaro commented Oct 11, 2023 •

edited

rintaro commented Oct 12, 2023 •

edited

amartini51 commented Oct 12, 2023 •

edited

sarahnicoleboo commented Oct 26, 2023 •

edited

amartini51 commented Nov 2, 2023 •

edited