spec: clarify comprehension scope issues #84

alandonovan · 2019-10-04T15:52:38Z

The Go implementation of Starlark follows the Python3 semantics---specifically, the outermost for loop's sequence expression is evaluated outside the new lexical block---and its spec describes this in some detail (https://github.com/bazelbuild/starlark/blob/master/spec.md#comprehensions). By contrast, the Java implementation evaluates the for-operand sequence in the outer lexical block, which resembles Python2.

So:

$ cat a.star
x = [1]
dummy = [(1,)]
print([x for x in x])
print([x for x in x for _ in dummy])

$ python2 a.star
[1]
Traceback (most recent call last):
  File "a.star", line 4, in <module>
    print([x for x in x for _ in dummy])
TypeError: 'int' object is not iterable

$ python3 a.star
[1]
[1]

$ starlark a.star # Go
[1]
[1]

$ blaze-bin/third_party/bazel/src/main/java/com/google/devtools/starlark/Starlark a.star
[1]
Traceback (most recent call last):
        File "", line 4
                print([x for x in x for _ in dummy])
        File "", line 4, in print
                x
local variable 'x' is referenced before assignment.

We should decide on the semantics and make the implementations agree. I propose we aim for the Python3-compatible behavior, for "least surprise". (The difference is entirely in the name resolver.)

The text was updated successfully, but these errors were encountered:

alandonovan · 2019-10-04T15:53:23Z

@brandjon

brandjon · 2019-10-04T16:36:37Z

I dislike the asymmetry of Python's semantics here. But this probably doesn't cause problems for implementors, or much surprise for users. Unlike other deviations for Python, e.g. not supporting implicit string concatenation, there's little tangible benefit to deviating here -- it'd just be "improving" the language in an abstract way.

So I'm fine with codifying the Python 3 behavior in Starlark if Alan is.

laurentlb · 2019-10-07T14:34:39Z

The only user-visible difference is when the loop variable conflicts with the sequence expression?
This is a safe change (meaning, it cannot break any user)?

If so, I'm okay with it.

brandjon · 2019-10-09T23:18:44Z

This is not 100% safe to existing code.

$ cat ~/b.star
x = [1]
dummy = [(1,)]
print([x for x in x for _ in dummy])
print([x for _ in dummy for x in x])

$ ./blaze-bin/third_party/bazel/src/main/java/com/google/devtools/starlark/Starlark ~/b.star
[1]
[1]

$ python3 ~/b.star
[1]
Traceback (most recent call last):
  File "/usr/local/google/home/brandjon/b.star", line 4, in <module>
    print([x for _ in dummy for x in x])
  File "/usr/local/google/home/brandjon/b.star", line 4, in <listcomp>
    print([x for _ in dummy for x in x])
UnboundLocalError: local variable 'x' referenced before assignment

I think the change for consistency's sake may be worth it anyway.

alandonovan · 2020-07-31T14:42:03Z

In addition, we need to specify the behavior of [() for x in () for y in y]. All four implementations (Python2,3, Starlark/{Go,Java}) agree that the answer is [], that is, for y in y is not a static error, but the behavior cannot be derived from the Starlark spec, and AFAICT Python doesn't even attempt to specify it.

Recall that the first for clause (for x in ()) is special, and its operand is resolved in the enclosing scope and thus cannot reference the new x. But for all subsequent for clauses, the vars are bound before the body. I suspect the reason for this behavior is that it resembles the desugaring of comprehensions to nested loop statements, in which the scope of all bindings is the entire function, including the lines before the binding.

Similarly, the operand of the second for clause may refer to vars bound by the third:

>>> [() for x in () for y in () for z in y]
[]

Again, all four implementations agree but the spec says nothing.

@stepancheg

brandjon · 2020-07-31T14:58:55Z

Yes, in Python, the first for-clause's iterable expression is evaluated in the outer scope, while everything else occurs inside the closure created by the comprehension. This can be confirmed by looking at the generated bytecode.

I suspect the reason for this behavior is that it resembles the desugaring of comprehensions to nested loop statements

It's actually an intentional design decision to benefit generator expressions, whose implementation leaks over to other kinds of comprehensions where there's little or no benefit. In generator expressions, this causes the RHS of the first for clause to be evaluated eagerly and therefore catch errors more quickly. There's nothing more that can be eagerly evaluated without forcing iteration over some of the generator expression. (Presumably they're not worried about supporting generator expressions where the first RHS is undefined until a later statement executed between the generator's construction and its first evaluation.)

Where do we stand on this github issue? My memory is that we decided (and implemented) that comprehensions define variables in their own local scopes. I furthermore thought that we did away with Python 3's special casing of the first RHS, so that [() for x in x] is an error even when x is properly defined outside the comprehension. If so, I'm happy with that state of affairs.

- first loop operand is resolved in enclosing environment - all loop vars are bound before any expression is resolved See bazelbuild/starlark#84 Change-Id: I2c4283719cf78cea27dccba1ffb0932857254909

brandjon · 2020-07-31T15:05:10Z

I see from the PR discussion there's also a point about whether each for clause of a comprehension forms its own nested scope or whether they're all part of one flat scope? The difference being whether [None for _ in y for y in [...]] (where y is not otherwise defined) is a static or a dynamic error.

I'm not sure I have strong opinions about this. The two possible design philosophies are

Match what Python does (preferring Python 3 over Python 2 where they differ), to allow for least surprise.
"Improve" upon Python by simplifying away its warts, or by prohibiting features that lead to common errors (like banning implicit string concatenation and unparenthesized singleton tuples). The precedent here is more on the side of disallowing things than changing the semantics of things.

alandonovan · 2020-07-31T15:14:44Z

Jon, we follow Python3: a comprehension creates a single lexical block. The first loop's operand is resolved in the enclosing environment. Then all vars are bound; their scope is the entire block. Then all expressions are resolved.

(BTW, terms: a binding has a scope, which is the region of text in which its name refers to it. The environment is a tree of blocks, which map names to bindings.)

In the interests of sanity I think I would prefer to stick with the Python behavior. If the inner loop body is ever executed, you'll discover fwd-ref mistakes quickly. BTW it is possible to exploit the Python behavior by having the second loop conditionally execute the reference to the var of the third loop only on its second iteration [... for y in ([1] if firsttime() else z) for z in ...] but no-one should do that.

brandjon · 2020-07-31T16:07:47Z

If the resolution is to do exactly the same scoping behavior as Python 3, that's fine be me, and it's just a matter of spec wording and implementation.

stepancheg · 2020-07-31T17:32:09Z

I greatly appreciate increasing the strictness of the spec.

Replicating Python 3 behaviour is good.

Though I think Python 3 developers made a mistake by defining only one scope for the whole comprehension, we could do better (create scope for each for clause: that is probably what users intuitively expect).

(I guess Python did not bother with static error reporting due to its very dynamic nature, like locals() function or exec statement or for historical reasons, but Starlark is much stricter and can report errors better).

If the inner loop body is ever executed, you'll discover fwd-ref mistakes quickly.

This can be said about any static error. But a lot of definitions are very infrequently executed code.

As some evidence, when I enabled strict scope resolution in our repositories (each variable must be resolved to global or local), I have discovered a huge amount of broken code. Like this:

def create_target(name):
  if name.startswith("_"):
    fail("target name must not start with underscore: {}".format(target_name))
  ...

Can you spot an error here? But Starlark can, and that's great.

But either way is fine with me, it is not a life and death issue.

ndmitchell · 2020-08-03T18:05:39Z

I asked @stepancheg for an example of how you can observe the scope difference - he replied with:

def foo():
    z = []
    print([x for x in [1] for y in z for z in []])
foo()

The z is in scope, but not yet bound. I find the Python3 behaviour here quite confusing - for in a list comp does different things each time. I'd be a fan of defining the scope the way it would be in a statically-typed language. And I agree with @stepancheg that detecting this statically (which Starlark would permit) is a great thing.

brandjon · 2020-08-03T18:27:40Z

See my above comment on simplifying things versus matching Python behavior. Fixing this means yet another difference, and I'm dubious that this kind of error really occurs in practice. It's easier to deviate from Python when we're outright banning a feature that commonly leads to errors, rather than changing semantics of something for the sake of purity. That said, I don't feel super strongly.

- first loop operand is resolved in enclosing environment - all loop vars are bound before any expression is resolved See bazelbuild/starlark#84

alandonovan · 2020-08-24T15:35:35Z

The remaining task is to copy the changes from google/starlark-go#299 into this repo.

alandonovan · 2020-11-13T14:19:16Z

The remaining task is to copy the changes from google/starlark-go#299 into this repo.

Done by #115.

alandonovan mentioned this issue Jan 6, 2020

comprehensions temporarily rebind global variables #92

Closed

alandonovan mentioned this issue Jul 31, 2020

spec: clarify scope rules for comprehensions google/starlark-go#299

Merged

alandonovan pushed a commit to google/starlark-go that referenced this issue Aug 4, 2020

spec: clarify scope rules for comprehensions (#299)

4379bb3

- first loop operand is resolved in enclosing environment - all loop vars are bound before any expression is resolved See bazelbuild/starlark#84

alandonovan closed this as completed Nov 13, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

spec: clarify comprehension scope issues #84

spec: clarify comprehension scope issues #84

alandonovan commented Oct 4, 2019 •

edited

alandonovan commented Oct 4, 2019

brandjon commented Oct 4, 2019

laurentlb commented Oct 7, 2019

brandjon commented Oct 9, 2019

alandonovan commented Jul 31, 2020 •

edited

brandjon commented Jul 31, 2020

brandjon commented Jul 31, 2020

alandonovan commented Jul 31, 2020

brandjon commented Jul 31, 2020

stepancheg commented Jul 31, 2020

ndmitchell commented Aug 3, 2020

brandjon commented Aug 3, 2020

alandonovan commented Aug 24, 2020

alandonovan commented Nov 13, 2020

spec: clarify comprehension scope issues #84

spec: clarify comprehension scope issues #84

Comments

alandonovan commented Oct 4, 2019 • edited

alandonovan commented Oct 4, 2019

brandjon commented Oct 4, 2019

laurentlb commented Oct 7, 2019

brandjon commented Oct 9, 2019

alandonovan commented Jul 31, 2020 • edited

brandjon commented Jul 31, 2020

brandjon commented Jul 31, 2020

alandonovan commented Jul 31, 2020

brandjon commented Jul 31, 2020

stepancheg commented Jul 31, 2020

ndmitchell commented Aug 3, 2020

brandjon commented Aug 3, 2020

alandonovan commented Aug 24, 2020

alandonovan commented Nov 13, 2020

alandonovan commented Oct 4, 2019 •

edited

alandonovan commented Jul 31, 2020 •

edited